# Sorting out sorting by

I often need to sort an array by some order relation. A simple example is ordering a number of matching rules by result size:

``````p = [\x -> [1, 2, 3], \y -> [1, 8, 1, 2], \z -> [16, 1], \a -> [1, 1, 2]];
p.sort { |a, b| a.value.size < b.value.size }
``````

This I usually refactor as:

``````f = { |x| x.value.size };
p.sort { |a, b| f.(a) < f.(b) }
``````

But I find a better solution is to use an order:

``````o = p.collect { |x| x.value.size };
p[o.order]
``````

Btw., performance-wise,

using an order can be much more efficient when `f` needs a little more calculation, because in the sorting algorithm, the comparison is done more than once. Here is a demonstration (difference of a factor of ten):

``````
(
var findTheNeedle =  { |x| x.detectIndex { |y| y < 4 } };
var haystacks = { (0..1000).scramble } ! 100;
bench {
haystacks.sort { |a, b| findTheNeedle.(a) < findTheNeedle.(b) };
}
)

(
var findTheNeedle =  { |x| x.detectIndex { |y| y < 4 } };
var haystacks = { (0..1000).scramble } ! 100;
bench {
var where = p.collect(findTheNeedle);
haystacks[where.order]
}
)
``````
1 Like

Yes, this is very nice. I think (?) it’s equivalent to the Schwartzian transform? It might be nice to have this in the class library, maybe as `sortOn`?

``````SequenceableCollection>>sortOn
arg keyFunc;
^this.at(this.collect(keyFunc).order)
``````
1 Like

I believe this would be slower than the existing sort methods while adding no new functionality.

Edit: I misread the key function – so there is new functionality, my mistake.

I’d be curious if there’s a more efficient implementation than `order`, though, which is really rather promiscuous in its use of memory and object allocation.

hjh

The idiom is common so it’s nice to have a name for it, or two even!

``````SequenceableCollection>>sortWith
arg keyFunc;
^this.copy.sort({arg p, q; keyFunc.value(p) <= keyFunc.value(q)})
``````

If `keyFunc` is fast then `sortWith` may be faster.

If `keyFunc` is slow then `sortOn` may be faster.

(`sortOn` runs `keyFunc` once for each element, `sortWith` runs it twice for each comparison.)

``````p = [\x -> [1,2,3], \y -> [1,8,1,2], \z -> [16,1], \a -> [1,1,2]];
p.sortOn({|x| x.value.size}) == p.sortWith({|x| x.value.size})
``````

Here’s a curious finding: `order` is likely to be pretty quick for Array, but slower – c. an order of magnitude slower – for other SequenceableCollections.

`.order` does two `flop` operations. `Array:flop` is implemented by a primitive; testing shows it to be faster than an equivalent `collect`. `SequenceableCollection:flop` is implemented in SuperCollider code and suffers accordingly:

``````(
~orderArrayByFlop = { |array|
[array, (0 .. array.size - 1)].flop
};

~orderArrayByCollect = { |array|
array.collect { |... pair| pair }
};
)

~orderArrayByCollect.([1, 2, 3]) == ~orderArrayByFlop.([1, 2, 3])
-> true  // good

(
var array = Array.fill(100000, { 1.0.rand });

bench { 100.do { ~orderArrayByFlop.(array) } };

bench { 100.do { ~orderArrayByCollect.(array) } };
)

time to run: 1.667288094 seconds.
time to run: 2.399677395 seconds.  // collect is slower

(
var array = List.fill(100000, { 1.0.rand });

bench { 100.do { ~orderArrayByFlop.(array) } };

bench { 100.do { ~orderArrayByCollect.(array) } };
)

time to run: 12.818654597 seconds.  // hoooo... wow
time to run: 2.349224316 seconds.  // this is unchanged from Array
``````

So while we’re looking at this, it might be worthwhile to move the current implementation of `order` into `Array`, and, for SequenceableCollection, change it to use `collect` loops instead.

hjh

1 Like

Yes, that was also my initial thought. But the problem that I found with this approach that you lose generality. You can’t specify the sorting relation (`<, >, >=`, or some hypthetical other function) any more.

What I haven’t understood about this idea is that in the implementation of `order` we are only ever calling `flop` on `Array`s, nothing else:

``````array = [this, (0..this.lastIndex)].flop; // returns an Array
``````

and

``````array.mergeSort(orderFunc).flop[1] // sorts the array
``````

It is a trade off, as @rdd indicated, I’ve added an example to the code in the first post.

OK, now I get it. As your example show, it is a `List` inside an array that makes it slow.

So here is an extended test. It shows that we just have to convert the collection into an array.

``````(
~orderArrayByFlop = { |array|
[array, (0 .. array.size - 1)].flop
};

~orderArrayByConvertAndFlop = { |array|
[array.asArray, (0 .. array.size - 1)].flop
};

~orderArrayByCollect = { |array|
array.collect { |... pair| pair }
};
)

~orderArrayByCollect.([1, 2, 3]) == ~orderArrayByFlop.([1, 2, 3]);
~orderArrayByCollect.([1, 2, 3]) == ~orderArrayByConvertAndFlop.([1, 2, 3]);
-> true  // all good

(
var array = Array.fill(100000, { 1.0.rand });

bench { 100.do { ~orderArrayByFlop.(array) } };

bench { 100.do { ~orderArrayByConvertAndFlop.(array) } };

bench { 100.do { ~orderArrayByCollect.(array) } };
)

time to run: 0.45237913400342 seconds.  // flop is fast
time to run: 0.44121945799998 seconds. // flop with convert is fast
time to run: 1.592607118997 seconds. // collect is slower

(
var array = List.fill(100000, { 1.0.rand });

bench { 100.do { ~orderArrayByFlop.(array) } };

bench { 100.do { ~orderArrayByConvertAndFlop.(array) } };

bench { 100.do { ~orderArrayByCollect.(array) } };
)

time to run: 6.7353264319972 seconds. // flop is slow
time to run: 0.52251220300241 seconds.  // flop with convert is fastest
time to run: 1.5765203389965 seconds. // collect is faster

``````

But the problem that I found with this approach that you lose generality. You can’t specify the sorting relation (`<, >, >=`, or some hypthetical other function) any more.

Do you mean this in relation to making a sort variant?

Perhaps the idiomatic SC approach would be something like:

``````sortOn
arg keyFunc, orderFunc = nil;
^this.at(this.collect(keyFunc).order(orderFunc))
``````

or:

``````sortWith
arg keyFunc, compareMsg = '<=';
^this.copy.sort({arg p, q; keyFunc.value(p).perform(compareMsg, keyFunc.value(q))})
``````

Ruby calls the decorate-sort-undecorate form `sort_by`, but SC already has `sortBy` for something else:

https://ruby-doc.org/core-3.0.2/Enumerable.html#method-i-sort_by

Based on further benchmarks, I think decorate-sort-undecorate is of relatively limited use in SC, unless the implementation can be seriously optimized.

One other thing to consider in `order` is its use of `mergeSort`. `quickSort` might outperform it (but might be slower, see below, which surprised me) and is valid in cases where there’s no need to preserve order within identical sort keys.

Anyway, comparing both types of sort against both types of decorate-sort:

``````(
~mergeSortOrder = { arg a, function;
var array, orderFunc;
// returns an array of indices that would sort the collection into order.
if(a.isEmpty) { [] } {
if (function.isNil) { function = { arg a, b; a <= b }; };
array = [a, (0..a.lastIndex)].flop;
orderFunc = { |a, b| function.value(a[0], b[0]) };
array.mergeSort(orderFunc).flop[1]
}
};

~quickSortOrder = { arg a, function;
var array, orderFunc;
// returns an array of indices that would sort the collection into order.
if(a.isEmpty) { [] } {
if (function.isNil) { function = { arg a, b; a <= b }; };
array = [a, (0..a.lastIndex)].flop;
orderFunc = { |a, b| function.value(a[0], b[0]) };
array.quickSort(orderFunc).flop[1]
}
};
)

(
var n = 100;
a = Array.fill(n, { |i| i -> Array.fill(rrand(1, 20), { 1.0.rand }) });

"mergesort".postln;
bench { 10.do { a.copy.mergeSort { |a, b| a.value.size < b.value.size } } };

"quicksort".postln;
bench { 10.do { a.copy.quickSort { |a, b| a.value.size < b.value.size } } };

"decorate-mergesort".postln;
bench { 10.do { a.at(~mergeSortOrder.(a.collect { |item| item.value.size })) } };

"decorate-quicksort".postln;
bench { 10.do { a.at(~quickSortOrder.(a.collect { |item| item.value.size })) } };
)

n = 100
mergesort
time to run: 0.018006851999985 seconds.
quicksort
time to run: 0.0097171379999281 seconds.
decorate-mergesort
time to run: 0.011257472000011 seconds.
decorate-quicksort
time to run: 0.0057003010000471 seconds.

n = 1000
mergesort
time to run: 0.086589871000001 seconds.
quicksort
time to run: 0.031966936000003 seconds.
decorate-mergesort
time to run: 0.094929352000008 seconds.
decorate-quicksort
time to run: 0.10641334999991 seconds.

n = 10000
mergesort
time to run: 1.075894006 seconds.
quicksort
time to run: 0.40626165700007 seconds.
decorate-mergesort
time to run: 1.2201362960001 seconds.
decorate-quicksort
time to run: 5.204760256 seconds.
``````

The decorate-sort is faster only for smaller arrays, and much slower for large arrays – which contradicts the theory that reducing the number of comparison-function calls would speed things up.

Needs work.

hjh

Here running:

``````x = R.directoryFilesRecursive(Platform.classLibraryDir)
x.size == 344
{x.sortOn({arg fn; File.mtime(fn)})}.bench
{x.sortWith({arg fn; File.mtime(fn)})}.bench
``````

prints:

``````time to run: 0.010503765999601 seconds.
time to run: 0.02712619399972 seconds.
``````

This function (whatever you call it, sort on a projected key) is a not uncommon thing to want, and sometimes the projection is slow, that’s all I meant.

Ps. I missed before that @julian had gone backwards and added a very nice example to the initial message! (Apologies)

In case anyone else did too, here it is again, with timings:

``````var findTheNeedle =  { |x| x.detectIndex { |y| y < 4 } };
var haystacks = { (0..10000).scramble } ! 1000;
bench {haystacks.sortOn(findTheNeedle)};
bench {haystacks.sortWith(findTheNeedle)};
``````

Timings:

``````time to run: 0.300721128 seconds.
time to run: 4.800341104 seconds.
``````

After (probably too much) effort, I was able to get a roughly 50% speed improvement over `order` as it is.

``````a = Array.fill(2000, { 1.0.rand });

bench { 100.do { a.order } };
``````

On my machine, this pretty consistently comes in around 1.96 - 1.99 seconds.

And reimplement a quicksort to use optimized methods from the new class (pasted below).

The new version is about 1.31 - 1.33 seconds. 1.97 / 1.32 = 1.49… = ~50% improvement.

`order` creates a new 2-element array for each input array element, and then has to index repeatedly into the sub-arrays. ArrayPair simply keeps a separate array for each “row,” so the comparison function can simply `at1` directly to the right value instead of `[index][0]`.

So then you can have the speed improvement of evaluating a slow “decorate” function only once per item, and better performance in `order` too.

``````(
var qs1 = { |array, start(0), end(array.size-1), comparison({ |a, b| a < b })|
var pivot, pivotI, i1, i2;
if(end - start >= 2) {  // must be at least 3 values for recursive branch
pivotI = rrand(start, end);
pivot = array.at1(pivotI);
i1 = start;
i2 = end;
while { i1 < i2 } {
case
// if @i1 should not come before pivot
{ comparison.value(pivot, array.at1(i1)) } {
// swap i1 to the upper partition
array.swap(i1, i2);
i2 = i2 - 1;
}
// else if i2 should come before pivot
{ comparison.value(array.at1(i2), pivot) } {
// swap i2 to the lower partition
array.swap(i1, i2);
i1 = i1 + 1;
}
{
i1 = i1 + 1;
if(i2 > i1) { i2 = i2 - 1 };
};
};
// here i1 == i2, but this may be the top of the low part, or bottom of the high part
// ensure that i1 is the bottom of the high part
if(comparison.(array.at1(i1), pivot)) {
i1 = i1 + 1;
};
qs1.(array, start, i1 - 1, comparison);
qs1.(array, i1, end, comparison);
} {
if(start < end) {  // 2 values
if(comparison.(array.at1(end), array.at1(start))) {
array.swap(start, end);
};
};
};
array
};

if(a.size != 2000) { a = Array.fill(2000, { 1.0.rand }) };

bench { 100.do {
var b = ArrayPair.newFromPair(a.copy, (0 .. a.size-1));
qs1.(b);
b.array2  // order
} };
)
``````

hjh