I'm really no expert in this sorting thing, but I guess the real advantage of the sort algorithm in this thread is that it can easily be parallelized. As far as I remember, MMX had some way to sort arbitrary bytes with this method in parallel: each 64-bit register holds 8 bytes, so comparing 2 registers handles up to 16 bytes (8 pairs compared and swapped at once) in a few MMX assembler instructions. Naturally, larger values require more registers. Of course, with only 16 values, starting a separate thread (on another processor or core) is quite an overhead, and so is switching from floating point to MMX on my AMD. But if that overhead has already been taken care of...
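
To make the register idea a bit more concrete, here is a rough sketch in Python rather than real MMX assembler. It assumes the algorithm in this thread boils down to a fixed sequence of compare-and-swap steps (a sorting network), which I have not verified. Each "register" is modelled as a list of 8 bytes, and `compare_exchange`, `sort_columns` and `NETWORK_4` are made-up names for illustration only. On real hardware the lane-wise min/max would be done by packed instructions such as PMINUB/PMAXUB.

```python
LANES = 8  # one 64-bit MMX register holds 8 bytes


def compare_exchange(a, b):
    """Lane-wise compare-and-swap of two 8-byte 'registers'.

    Returns (lo, hi) where lo[i] is the smaller and hi[i] the larger
    of a[i] and b[i]. All 8 lanes are handled in one step, which is
    what a packed min/max instruction pair would do in hardware.
    """
    assert len(a) == LANES and len(b) == LANES
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    return lo, hi


# Standard 5-comparator sorting network for 4 inputs.
# Each pair (i, j) means: afterwards register i holds the smaller
# values and register j the larger ones, lane by lane.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort_columns(regs):
    """Sort each of the 8 columns across 4 registers.

    This does NOT sort all 32 bytes into one sequence; it sorts
    8 independent groups of 4 bytes at the same time.
    """
    regs = [list(r) for r in regs]  # work on copies
    for i, j in NETWORK_4:
        regs[i], regs[j] = compare_exchange(regs[i], regs[j])
    return regs
```

Note that a single compare of two registers only orders the 8 pairs, lane by lane. To get all 16 (or 32) bytes into one sorted sequence you would still need extra shuffle or merge steps, which this sketch leaves out.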