Sorting an array containing duplicate elements by the inverse number of element occurrences

prko · June 21, 2023, 2:52am

Dear users,

How do you sort an array of duplicate elements by the inverse number of occurrences of the elements?
The following is my way of doing it, but I am not sure if it is logically optimised. Could you give me some feedback?

Thanks in advance!

(
~occurrences = { |array|
	array.asSet.asArray
	.collect { |uniqueItem, uniqueIndex|
		Dictionary[\uniqueItem->uniqueItem, \occurrence->array.occurrencesOf(uniqueItem)]
	}.sortBy(\occurrence)
	.reverse.collect { |item| item.atAll([\uniqueItem, \occurrence]) }
	.do {|item| item.postcs }
}
)


~occurrences.([1, 2, 2, 33, 33, 33])


~array = [1, 300, 6, 213, 21, 98, 12, 8, 3, 164, 258, 338, 3, 23, 21, 161, 62, 168, 42, 359, 65, 207, 19, 8, 21, 358, 85, 336, 250, 65, 188, 54, 12, 78, 26, 88, 12, 299, 14, 24, 3, 29, 6, 3, 42, 117, 358, 217, 47, 162, 169, 119, 228, 367, 117, 170, 8, 320, 130, 168, 3, 6, 74, 361, 44, 244, 115, 6]

x = ~occurrences.(~array)

rdd · June 21, 2023, 4:27am

Perhaps:

[3, 2, 2, 1, 1, 1].asBag.sortedCounts == [1 -> 3, 2 -> 2, 3 -> 1].asSortedList

where Bag>>sortedCounts (the traditional Smalltalk-80 name) is something like:

+ Bag {
	sortedCounts {
		var answer = SortedList();
		this.contents.associationsDo { | anAssociation |
			answer.add(anAssociation.value -> anAssociation.key)
		};
		^answer
	}
}

prko · June 22, 2023, 7:44am

Thank you for this!
Why has this convenient method not been implemented? I hope it will be implemented.

jordan · June 22, 2023, 2:42pm

How about…

~f = {|arr|
	var weights = arr.collect({|a| a -> arr.occurrencesOf(a) }).asEvent;
	arr.sort({|l,r| weights[l] > weights[r] })
};

~f.([3, 2, 2, 1, 1, 1]) == [1, 1, 1, 2, 2, 3]

rdd · June 22, 2023, 10:32pm

I don’t know why this (and it’s sibling sortedElements) were left out.

You could suggest them for the class library if you like? Or just add them to your local extensions!

Ps. The SuperCollider library follows the Smalltalk model pretty closely, so when something useful is missing we can often just borrow a definition, from Squeak in this case.

rdd · June 22, 2023, 11:32pm

Pps. Apologies, a mistake, I left out the sort function!

+Bag {
	sortedCounts {
		var answer = SortedList(8) { | x y | x >= y };
		this.contents.associationsDo { | anAssociation |
			answer.add(anAssociation.value -> anAssociation.key)
		};
		^answer.array
	}
}

So rather:

[1, 2, 3, 1, 3, 1].asBag.sortedCounts == [3 -> 1, 2 -> 3, 1 -> 2]

Which is actually what you wanted, I think…

prko · June 22, 2023, 11:40pm

Thanks for your suggestion. asEvent is very good because it reduces unnecessary information in the post window.

(
~occurrences = { |array|
	array.asSet.asArray
	.collect { |uniqueItem, uniqueIndex|
		Dictionary[\uniqueItem->uniqueItem, \occurrence->array.occurrencesOf(uniqueItem)]
	}.sortBy(\occurrence)
	.reverse.collect { |item| item.atAll([\uniqueItem, \occurrence]).asEvent }
	.do {|item| item.postcs }
}
)

~occurrences.([3, 2, [2], 2, [2], 1, 1, 1])

prko · June 23, 2023, 12:00am

Thanks!

I did it:

github.com/supercollider/supercollider

Adding .sortedCounts to Bag and .countStats to Collection

opened 11:59PM - 22 Jun 23 UTC

prko

enhancement

## Motivation While using arrays, I encountered the need to know the number of …occurrences of the same elements. ## Description of Proposed Feature 1. The occurrences of the same elements of an instance of an array (and of any subclasses of the class Collection) are counted. 2. The result is sorted by the number of occurrences of the elements. https://scsynth.org/t/sorting-an-array-containing-duplicate-elements-by-the-inverse-number-of-element-occurrences/7822 ## Plan for Implementation The two methods are already ready to implement: **.sortedCounts** > +Bag { sortedCounts { The full code is in the following post: https://scsynth.org/t/sorting-an-array-containing-duplicate-elements-by-the-inverse-number-of-element-occurrences/7822/6?u=prko **.countStats** ``` + Collection { countStats { var answer = this.asSet.asArray.collect { |uniqueItem, uniqueIndex| Dictionary[\uniqueItem->uniqueItem, \occurrence->this.occurrencesOf(uniqueItem)] }.sortBy(\occurrence) .reverse.collect { |item| item.atAll([\uniqueItem, \occurrence]).asEvent } .do {|item| item.postcs }; ^answer } } ``` I did not write ```Bag.sortedCounts```, so I am not sure if I can make a pull request with that code. I can do it with ```.countStats```, but I am not sure if my implementation is good enough to release.

prko · June 23, 2023, 3:13am

@semiquaver
Hi, the following discussion is closed, so I am replying here:
I know that anyone can participate in the development of the SC. I have done it a few times, but it was very difficult for me and other developers had to work with my problematic drafts, so my conclusion is that my direct involvement in development is not helpful for the community and me.

There are very few ways for individuals like me to contribute to the development of the SC, one possibility might be funding with feature suggestions, but this does not seem to fit the character of the community. It is not easy.

Continuing the discussion from (crowd)Funding SuperCollider development?:

rdd · June 23, 2023, 4:08am

Good luck!

I’m afraid I was also wrong about sortedCounts & sortedElements being in St-80!

And they’re not in “A Little Smalltalk” (1987), which was, I vaguely recall, the reference for SuperCollider:

https://rmod-files.lille.inria.fr/FreeBooks/LittleSmalltalk/ALittleSmalltalk.pdf

So that’d explain why they’re missing…

They’re quite old though! i.e. p.356 of “Inside Smalltalk” (1990):

https://rmod-files.lille.inria.fr/FreeBooks/InsideST/InsideSmalltalkNoOCRed.pdf

Anyway, apologies about that…

rdd · June 23, 2023, 4:09am

Ppps. Squeak also has histogramOf which is a somewhat nicer name for asBag, and allows a select block…

"collection".histogramOf.itemCount($c) == 2

It looks something like:

+ Collection {

	collectInto { | aBlock aCollection |
		^aCollection.fillFromWith(this, aBlock)
	}

	fillFromWith { | aCollection aBlock |
		aCollection.do { | each |
			this.add(aBlock.value(each))
		}
	}

	histogramOf { | aBlockOrNil |
		var answer = Bag();
		this.collectInto(aBlockOrNil ? { | each | each }, answer);
		^answer
	}

}

semiquaver · June 23, 2023, 1:39pm

I’m in the same situation - I’m not very knowledgeable about coding - my only real experience is sclang (and little bash, some Forth in the 80s and some Scheme in the 90s). My thought is to attend the dev meeting tomorrow morning - perhaps there are some simple but annoying tasks that someone like me could do which could lighten the load for the more competent devs but I’m not sure!

My point was not directed at you in particular - I apologize. But I do think that there are many potential contributors here on the forum who don’t step forward for whatever reason and I think that broader participation might be good!

jamshark70 · June 24, 2023, 1:22am

A thought about efficiency:

The Bag approach iterates over the array once – O(n) – and each item performs a Dictionary lookup, which is not constant time, but (much) faster than O(n). In the end, it’s not exactly O(n) but dictionary lookup (especially IdentityDictionary) is very fast, so I would informally call it O(n…ish).

array.do { |item| array.occurrencesOf(item) } iterates over the array, and each item iterates over the array – so it’s O(n^2). We’d expect performance to degrade quickly, then.

And that’s exactly what happens. I even avoided re-counting occurrencesOf for duplicate items, but it still shows the rapidly rising execution times associated with O(n^2).

(
f = { |array|
	var counts = IdentityDictionary.new;
	array.do { |item|
		if(counts[item].isNil) {
			counts[item] = array.occurrencesOf(item);
		};
	};
	counts
};

g = { |array|
	var counts = IdentityBag.new;
	array.do { |item| counts.add(item) };
	counts
};

h = { |array|
	var occ = bench({ f.value(array) }, false);
	var bag = bench({ g.value(array) }, false);
	[occ, bag]
};
)

h.value(Array.fill(10, { 10.rand }));
-> [ 4.554699989967e-05, 2.4474999918311e-05 ]

h.value(Array.fill(100, { 100.rand }));
-> [ 0.0011201919996893, 9.4758999694022e-05 ]

h.value(Array.fill(1000, { 1000.rand }));
-> [ 0.02896626599977, 0.00027163100003236 ]

// 2.1 whole seconds is not a typo
h.value(Array.fill(10000, { 10000.rand }));
-> [ 2.1172122190001, 0.0036399169994183 ]

So it’s helpful at times to think about what is happening inside collection methods, especially when they could be hiding additional layers of iteration.

hjh

VIRTUALDOG · June 24, 2023, 2:45am

Dictionaries in SC are hash maps with linear probing – for the purposes of big-O notation they have constant time lookup.

jamshark70 · June 24, 2023, 6:58am

Thanks – I’d thought as much (it would have to be a pretty bad hash algorithm to incur a lot of scanning cost).

hjh

jordan · June 24, 2023, 10:01am

Did not know about Bag:itemCount. Still, given that the question only had 68 elements this is a speed up of ~0.00005 seconds (on my machine).

Here is an updated version using only the features currently in supercollider.

~f = {|arr|
	var weights = arr.asBag;
	arr.sort({|l,r| weights.itemCount(l) > weights.itemCount(r) })
};

~f.([1, 3, 2, 2, 1, 1, 1]) == [ 1, 1, 1, 1, 2, 2, 3 ]

prko · June 25, 2023, 6:23am

jordan:

Here is an updated version using only the features currently in supercollider.
~f = {|arr|
	var weights = arr.asBag;
	arr.sort({|l,r| weights.itemCount(l) > weights.itemCount(r) })
};

~f.([1, 3, 2, 2, 1, 1, 1]) == [ 1, 1, 1, 1, 2, 2, 3 ]

Thank you very much! That seems to be the most efficient code for the title of my question.

At the same time, I apologise for the incorrect title of this thread:
I now see that the thread has the wrong title. The correct title should be:
Constructing the statistic by the inverse number of element occurrences from an array containing duplicate elements.

Sorry for my poor wording…

Based on the discussion in this thread, I have updated my original code as follows:

(
~occurrences = { |array|
	array.asSet.asArray
	.collect { |uniqueItem, uniqueIndex|
		IdentityDictionary[\uniqueItem->uniqueItem, \occurrence->array.occurrencesOf(uniqueItem)]
	}.sortBy(\occurrence)
	.reverse.collect { |item| 
		item.atAll([\occurrence, \uniqueItem]).asEvent }
	.do {|item| item.postcs }
}
)

~occurrences.([1, 2, 3, 4, 5, \6, [7], 1, 2, 1])

Am I correct?

jordan · June 25, 2023, 9:33am

Isn’t this exactly what @rdd’s answer did?

Phrased without the class extension:

~sortedCounts = {|arr|
	var contents = arr.asBag.contents;
	var answer = SortedList(size: contents.size, function: {|l, r| l.value >=  r.value });
	contents.associationsDo { | anAssociation |
		answer.add(anAssociation.key -> anAssociation.value)
	};
	answer
};
~sortedCounts.([1, 2, 3, 4, 5, \6, [7], 1, 2, 1]) 
// SortedList[ (1 -> 3), (2 -> 2), ([ 7 ] -> 1), (4 -> 1), (5 -> 1), (3 -> 1), (6 -> 1) ]

prko · June 25, 2023, 9:46am

Yes, I have slightly changed it!