`.sort` -- why is it a mergesort?

jamshark70 · September 30, 2020, 1:08am

On another forum, the topic of sorting algorithms came up (someone even went as far as to implement a proper quicksort in Pure Data)… which made me wonder what a QS would look like in SC, and how it would perform.

SC has a quickSort method, and it seems to perform better than mergesort. So I wonder why we are using the slower algorithm as the default sort.

(
~bench = { |array|
	"mergesort:".postln;
	bench { array.copy.sort };
	"quicksort:".postln;
	bench { array.copy.quickSort(_ < _) };
};
)

// randomly ordered data
~bench.((1..100000).scramble);

mergesort:
time to run: 1.0105632369998 seconds.
quicksort:
time to run: 0.34835768400012 seconds.

// worst case #1: already in order
~bench.((1..100000));

mergesort:
time to run: 0.72621700100012 seconds.
quicksort:
time to run: 0.24629733899997 seconds.

// worst case #2: reverse order
~bench.((1..100000).reverse);

mergesort:
time to run: 0.8452806790001 seconds.
quicksort:
time to run: 0.26317427200001 seconds.

hjh

shiihs · October 1, 2020, 1:53pm

Just a wild thought without any investigation: is there perhaps a difference in stability of the sorting algorithms? (sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted). If you sort - say - midi events based on start time, I guess having a stable sort could be important.

jamshark70 · October 2, 2020, 6:18am

OK, yes, that was it. The merge sort is stable (but if you use the wrong comparison operator, it’s reverse stable):

a = Ptuple([Pwhite(1, 3, inf), Pwhite(1, 25, inf)]).asStream.nextN(20);
-> [ [ 3, 5 ], [ 2, 8 ], [ 2, 2 ], [ 3, 5 ], [ 1, 10 ], [ 2, 24 ], [ 2, 15 ], [ 1, 13 ], [ 2, 4 ], [ 1, 9 ], [ 2, 15 ], [ 3, 17 ], [ 2, 25 ], [ 3, 4 ], [ 1, 22 ], [ 1, 17 ], [ 1, 1 ], [ 2, 23 ], [ 2, 15 ], [ 1, 13 ] ]

// probably most of us take the shortcut of writing < here
b = a.copy.sort { |a, b| a[1] < b[1] };
-> [ [ 1, 1 ], [ 2, 2 ], [ 3, 4 ], [ 2, 4 ], [ 3, 5 ], [ 3, 5 ], [ 2, 8 ], [ 1, 9 ], [ 1, 10 ], [ 1, 13 ], [ 1, 13 ], [ 2, 15 ], [ 2, 15 ], [ 2, 15 ], [ 1, 17 ], [ 3, 17 ], [ 1, 22 ], [ 2, 23 ], [ 2, 24 ], [ 2, 25 ] ]

// but look what it does...
b.sort { |a, b| a[0] < b[0] };
-> [ [ 1, 22 ], [ 1, 17 ], [ 1, 13 ], [ 1, 13 ], [ 1, 10 ], [ 1, 9 ], [ 1, 1 ], [ 2, 25 ], [ 2, 24 ], [ 2, 23 ], [ 2, 15 ], [ 2, 15 ], [ 2, 15 ], [ 2, 8 ], [ 2, 4 ], [ 2, 2 ], [ 3, 17 ], [ 3, 5 ], [ 3, 5 ], [ 3, 4 ] ]

// but... use <= instead:
b = a.copy.sort { |a, b| a[1] <= b[1] };
b.sort { |a, b| a[0] <= b[0] };
-> [ [ 1, 1 ], [ 1, 9 ], [ 1, 10 ], [ 1, 13 ], [ 1, 13 ], [ 1, 17 ], [ 1, 22 ], [ 2, 2 ], [ 2, 4 ], [ 2, 8 ], [ 2, 15 ], [ 2, 15 ], [ 2, 15 ], [ 2, 23 ], [ 2, 24 ], [ 2, 25 ], [ 3, 4 ], [ 3, 5 ], [ 3, 5 ], [ 3, 17 ] ]

So there are a couple points of documentation for me to work on later: 1/ Use <= as the comparison, and 2/ In use cases where sort-stability doesn’t matter, users can cut the time in half by using quickSort.

hjh