Sorting an array containing duplicate elements by the inverse number of element occurrences

Dear users,

How do you sort an array of duplicate elements by the inverse number of occurrences of the elements?
The following is my way of doing it, but I am not sure if it is logically optimised. Could you give me some feedback?

Thanks in advance!

(
~occurrences = { |array|
	array.asSet.asArray
	.collect { |uniqueItem, uniqueIndex|
		Dictionary[\uniqueItem->uniqueItem, \occurrence->array.occurrencesOf(uniqueItem)]
	}.sortBy(\occurrence)
	.reverse.collect { |item| item.atAll([\uniqueItem, \occurrence]) }
	.do {|item| item.postcs }
}
)


~occurrences.([1, 2, 2, 33, 33, 33])


~array = [1, 300, 6, 213, 21, 98, 12, 8, 3, 164, 258, 338, 3, 23, 21, 161, 62, 168, 42, 359, 65, 207, 19, 8, 21, 358, 85, 336, 250, 65, 188, 54, 12, 78, 26, 88, 12, 299, 14, 24, 3, 29, 6, 3, 42, 117, 358, 217, 47, 162, 169, 119, 228, 367, 117, 170, 8, 320, 130, 168, 3, 6, 74, 361, 44, 244, 115, 6]

x = ~occurrences.(~array)

Perhaps:

[3, 2, 2, 1, 1, 1].asBag.sortedCounts == [1 -> 3, 2 -> 2, 3 -> 1].asSortedList

where Bag>>sortedCounts (the traditional Smalltalk-80 name) is something like:

+ Bag {
	sortedCounts {
		var answer = SortedList();
		this.contents.associationsDo { | anAssociation |
			answer.add(anAssociation.value -> anAssociation.key)
		};
		^answer
	}
}
1 Like

Thank you for this!
Why has this convenient method not been implemented? I hope it will be implemented.

How about…

~f = {|arr|
	var weights = arr.collect({|a| a -> arr.occurrencesOf(a) }).asEvent;
	arr.sort({|l,r| weights[l] > weights[r] })
};

~f.([3, 2, 2, 1, 1, 1]) == [1, 1, 1, 2, 2, 3]

I don’t know why this (and it’s sibling sortedElements) were left out.

You could suggest them for the class library if you like? Or just add them to your local extensions!

Ps. The SuperCollider library follows the Smalltalk model pretty closely, so when something useful is missing we can often just borrow a definition, from Squeak in this case.

Pps. Apologies, a mistake, I left out the sort function!

+Bag {
	sortedCounts {
		var answer = SortedList(8) { | x y | x >= y };
		this.contents.associationsDo { | anAssociation |
			answer.add(anAssociation.value -> anAssociation.key)
		};
		^answer.array
	}
}

So rather:

[1, 2, 3, 1, 3, 1].asBag.sortedCounts == [3 -> 1, 2 -> 3, 1 -> 2]

Which is actually what you wanted, I think…

Thanks for your suggestion. asEvent is very good because it reduces unnecessary information in the post window.

(
~occurrences = { |array|
	array.asSet.asArray
	.collect { |uniqueItem, uniqueIndex|
		Dictionary[\uniqueItem->uniqueItem, \occurrence->array.occurrencesOf(uniqueItem)]
	}.sortBy(\occurrence)
	.reverse.collect { |item| item.atAll([\uniqueItem, \occurrence]).asEvent }
	.do {|item| item.postcs }
}
)

~occurrences.([3, 2, [2], 2, [2], 1, 1, 1]) 

Thanks!

I did it:

1 Like

@semiquaver
Hi, the following discussion is closed, so I am replying here:
I know that anyone can participate in the development of the SC. I have done it a few times, but it was very difficult for me and other developers had to work with my problematic drafts, so my conclusion is that my direct involvement in development is not helpful for the community and me.

There are very few ways for individuals like me to contribute to the development of the SC, one possibility might be funding with feature suggestions, but this does not seem to fit the character of the community. It is not easy.

Continuing the discussion from (crowd)Funding SuperCollider development?:

1 Like

Good luck!

I’m afraid I was also wrong about sortedCounts & sortedElements being in St-80!

And they’re not in “A Little Smalltalk” (1987), which was, I vaguely recall, the reference for SuperCollider:

https://rmod-files.lille.inria.fr/FreeBooks/LittleSmalltalk/ALittleSmalltalk.pdf

So that’d explain why they’re missing…

They’re quite old though! i.e. p.356 of “Inside Smalltalk” (1990):

https://rmod-files.lille.inria.fr/FreeBooks/InsideST/InsideSmalltalkNoOCRed.pdf

Anyway, apologies about that…

Ppps. Squeak also has histogramOf which is a somewhat nicer name for asBag, and allows a select block…

"collection".histogramOf.itemCount($c) == 2

It looks something like:

+ Collection {

	collectInto { | aBlock aCollection |
		^aCollection.fillFromWith(this, aBlock)
	}

	fillFromWith { | aCollection aBlock |
		aCollection.do { | each |
			this.add(aBlock.value(each))
		}
	}

	histogramOf { | aBlockOrNil |
		var answer = Bag();
		this.collectInto(aBlockOrNil ? { | each | each }, answer);
		^answer
	}

}

I’m in the same situation - I’m not very knowledgeable about coding - my only real experience is sclang (and little bash, some Forth in the 80s and some Scheme in the 90s). My thought is to attend the dev meeting tomorrow morning - perhaps there are some simple but annoying tasks that someone like me could do which could lighten the load for the more competent devs but I’m not sure!

My point was not directed at you in particular - I apologize. But I do think that there are many potential contributors here on the forum who don’t step forward for whatever reason and I think that broader participation might be good!

2 Likes

A thought about efficiency:

The Bag approach iterates over the array once – O(n) – and each item performs a Dictionary lookup, which is not constant time, but (much) faster than O(n). In the end, it’s not exactly O(n) but dictionary lookup (especially IdentityDictionary) is very fast, so I would informally call it O(n…ish).

array.do { |item| array.occurrencesOf(item) } iterates over the array, and each item iterates over the array – so it’s O(n^2). We’d expect performance to degrade quickly, then.

And that’s exactly what happens. I even avoided re-counting occurrencesOf for duplicate items, but it still shows the rapidly rising execution times associated with O(n^2).

(
f = { |array|
	var counts = IdentityDictionary.new;
	array.do { |item|
		if(counts[item].isNil) {
			counts[item] = array.occurrencesOf(item);
		};
	};
	counts
};

g = { |array|
	var counts = IdentityBag.new;
	array.do { |item| counts.add(item) };
	counts
};

h = { |array|
	var occ = bench({ f.value(array) }, false);
	var bag = bench({ g.value(array) }, false);
	[occ, bag]
};
)

h.value(Array.fill(10, { 10.rand }));
-> [ 4.554699989967e-05, 2.4474999918311e-05 ]

h.value(Array.fill(100, { 100.rand }));
-> [ 0.0011201919996893, 9.4758999694022e-05 ]

h.value(Array.fill(1000, { 1000.rand }));
-> [ 0.02896626599977, 0.00027163100003236 ]

// 2.1 whole seconds is not a typo
h.value(Array.fill(10000, { 10000.rand }));
-> [ 2.1172122190001, 0.0036399169994183 ]

So it’s helpful at times to think about what is happening inside collection methods, especially when they could be hiding additional layers of iteration.

hjh

1 Like

Dictionaries in SC are hash maps with linear probing – for the purposes of big-O notation they have constant time lookup.

2 Likes

Thanks – I’d thought as much (it would have to be a pretty bad hash algorithm to incur a lot of scanning cost).

hjh

1 Like

Did not know about Bag:itemCount. Still, given that the question only had 68 elements this is a speed up of ~0.00005 seconds (on my machine).

Here is an updated version using only the features currently in supercollider.

~f = {|arr|
	var weights = arr.asBag;
	arr.sort({|l,r| weights.itemCount(l) > weights.itemCount(r) })
};

~f.([1, 3, 2, 2, 1, 1, 1]) == [ 1, 1, 1, 1, 2, 2, 3 ]
1 Like

Thank you very much! That seems to be the most efficient code for the title of my question.

At the same time, I apologise for the incorrect title of this thread:
I now see that the thread has the wrong title. The correct title should be:
Constructing the statistic by the inverse number of element occurrences from an array containing duplicate elements.

Sorry for my poor wording…

Based on the discussion in this thread, I have updated my original code as follows:

(
~occurrences = { |array|
	array.asSet.asArray
	.collect { |uniqueItem, uniqueIndex|
		IdentityDictionary[\uniqueItem->uniqueItem, \occurrence->array.occurrencesOf(uniqueItem)]
	}.sortBy(\occurrence)
	.reverse.collect { |item| 
		item.atAll([\occurrence, \uniqueItem]).asEvent }
	.do {|item| item.postcs }
}
)

~occurrences.([1, 2, 3, 4, 5, \6, [7], 1, 2, 1])

Am I correct?

Isn’t this exactly what @rdd’s answer did?

Phrased without the class extension:

~sortedCounts = {|arr|
	var contents = arr.asBag.contents;
	var answer = SortedList(size: contents.size, function: {|l, r| l.value >=  r.value });
	contents.associationsDo { | anAssociation |
		answer.add(anAssociation.key -> anAssociation.value)
	};
	answer
};
~sortedCounts.([1, 2, 3, 4, 5, \6, [7], 1, 2, 1]) 
// SortedList[ (1 -> 3), (2 -> 2), ([ 7 ] -> 1), (4 -> 1), (5 -> 1), (3 -> 1), (6 -> 1) ]

1 Like

Yes, I have slightly changed it!