The writing and accessing time of Array, List, Event, IdentityDictionary and Dictionary: which is the fastest and surest for real-time processing?

Dear users and developers,

As far as I understand, the unordered collection classes like Event, IdentityDictionary, and Dictionary are especially useful when constructing an ordered collection with Array or List is inefficient or impossible. Which of them, however, is the best choice for real-time processing?

For example, recording data received in real time from many OSC or MIDI messages.

According to the following benchmarks, using Array seems faster than the unordered collections. However, my knowledge and some code written by users more experienced than me imply that Event might be better.

So, is the following benchmark meaningless in terms of speed and stability?

// Writing time of Array, Event, Dictionary and IdentityDictionary.

a = (0..9999);

benchmark { ~array = []; a.do { |i| ~array = ~array.add([("i"++i).asSymbol, i]) } }

benchmark { ~list = List[]; a.do { |i| ~list.add([("i"++i).asSymbol, i]) } }

benchmark { ~event = (); a.do { |i| ~event.put(("i"++i).asSymbol, i) } }

benchmark { ~identityDictionary = IdentityDictionary(); a.do { |i| ~identityDictionary.put(("i"++i).asSymbol, i) } }

benchmark { ~dictionary = Dictionary(); a.do { |i| ~dictionary.put(("i"++i).asSymbol, i) } }

// Accessing time of Array, Event, Dictionary and IdentityDictionary.

benchmark { a.do { |i| ~array[i] } }

benchmark { a.do { |i| ~list[i] } }

benchmark { a.do { |i| ~event[("i"++i).asSymbol] } }

benchmark { a.do { |i| ~identityDictionary[("i"++i).asSymbol] } }

benchmark { a.do { |i| ~dictionary[("i"++i).asSymbol] } }

The really important factors are: What type of access do you need? And, what types of operations do you need to do?

Arrays are very fast at accessing (and overwriting) items based on a consistent numeric position. Also fast for adding to the end. But they are much slower at inserting, deleting or searching for items.

Dictionaries are reasonably fast at accessing/overwriting items based on arbitrary keys (but this will always be slower than with arrays). Because the keys are arbitrary, this also covers “searching for” items (the search is optimized by the hashing strategy). Inserting items may be faster than with an array, but every so often the dictionary will have to rearrange all of the items in storage, so once in a while an insert will be slower.
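The “rearrange all of the items” point can be made concrete with a hedged illustration in Python rather than sclang (CPython’s dict is a hash table too, and the mechanism is the same idea): a hash table grows in occasional jumps, so most insertions reuse existing capacity, but every so often the whole table is resized and every entry rehashed.

```python
import sys

# Insert 1000 keys and record the reported size of the dict after each
# insert. The size only changes when the backing hash table is resized.
d = {}
sizes = []
for i in range(1000):
    d[i] = i
    sizes.append(sys.getsizeof(d))

# Count how often the backing storage actually grew: only a handful of
# times for 1000 inserts, which is why insertion is fast "on average"
# even though an individual insert is occasionally much more expensive.
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(resizes)
```

The exact resize schedule is an implementation detail of CPython, but the shape of the result (a few resizes for many inserts) is what matters for the amortized-cost argument.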

“recording the received data in real-time from many OSC or MIDI messages” – this use case implies a sequence. Sequences can be ordered by number, so an array would be the best choice.

I guess you might have been confused by my use of Events as the items of an Array in the real-time data-recording example. But note that the two situations are different.

  • The array will hold a potentially unlimited number of events, and each new note is added at the end (fast for arrays).
  • The items within the array are much smaller (note number, velocity, onset time, duration – and only these). In this case it makes sense to give names to those values: event[\midinote] is more descriptive code than event[1]. For this size of event, the hash lookup should be extremely fast anyway.
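The readability point in the second bullet can be sketched in Python rather than sclang (the field names below are just examples mirroring the ones in the text):

```python
# Positional storage: the meaning of each slot lives only in your memory
# or in a comment.
note_positional = [60, 100, 0.0, 0.5]           # which one is velocity?

# Named storage: the keys document themselves, at the cost of a (small,
# fast) hash lookup per access.
note_named = {"midinote": 60, "velocity": 100,
              "onset": 0.0, "duration": 0.5}

velocity = note_named["velocity"]
print(velocity)
```

For a record with only four fields, the hash lookup is cheap, and the self-describing access is usually worth it.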

So the question in the subject line – which is the fastest and surest – really needs to consider the use case. It wouldn’t be correct to say “always use xyz-type collection” – the reason why different types of collections exist is exactly because there are many use cases with different requirements.



Yes, this question arose while reading that code.

I will remember this. I mainly used arrays because I thought I didn’t need other collections. Nowadays, however, I sometimes use other collections to reduce comments and write more descriptive code. Previously I thought that the shorter the code, the better. Now, readability and understandability without comments sometimes seem essential, and I frequently use other collections in those cases.

Thanks always for your kind and detailed answers!

Also, some notes on benchmarking.

Benchmarks should be constructed so that the only difference between them is the performance aspect you want to test. The “accessing time” benchmarks are not accurate because the array and list tests simply use i, while the dictionary tests construct symbols within the loop. That is, the performance difference is not only due to the different implementations of at, but also due to the ++ and asSymbol operations, which are not relevant to the at timing.

I understand why you wrote it that way – because most of the time, dictionary keys are symbols rather than numbers (which leads to the overgeneralization that dictionary keys should/must be symbols). But this has accidentally introduced another uncontrolled difference, so we have no way to know how much of the performance difference is because of the methods, and how much because of inconsistent test scenarios.

I’d use numbers in the dictionaries.
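The fair-comparison principle can be sketched in Python rather than sclang: both loops below do exactly the same key work (plain integers), so any timing difference comes from the container’s lookup alone, not from building symbols inside one loop but not the other.

```python
import timeit

N = 10_000
arr = list(range(N))
dct = {i: i for i in range(N)}

def read_array():
    # Indexed access: same keys, same loop shape as the dict version.
    return [arr[i] for i in range(N)]

def read_dict():
    # Hashed access: the ONLY difference from read_array is the container.
    return [dct[i] for i in range(N)]

t_arr = timeit.timeit(read_array, number=100)
t_dct = timeit.timeit(read_dict, number=100)
print(t_arr, t_dct)
```

Both functions return identical results, so whatever difference the timer reports is attributable to the lookup mechanism and nothing else.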

Second, avoid unnecessary slow operations. Environment variables are actually entries in an identity dictionary! So your array tests also include dictionary lookups. Use declared or interpreter variables instead. It isn’t necessary to create sub-arrays either.

And a side note – benchmark is in sc3-plugins, so, before doing the comparison, I’ll replace it with bench from the main class library. (And another side note: it’s useful to run the test multiple times. I prefer to do at least 5 bench calls and average the results. However, in this case, I found that for “Array add,” I got 5-run averages ranging between 3 and 12 ms, so, not only was the execution time over-reported, the margin of error was almost as large as the measurement itself. I ended up bumping the number of iterations up to 1000.)
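The repeat-and-average idea is language-independent; here is a hedged sketch of it in Python (the snippet being timed is arbitrary), reporting both the mean and the spread so you can see whether the margin of error swamps the measurement:

```python
import timeit

def bench_avg(func, repeats=5, number=1000):
    # Run `func` `number` times per trial, `repeats` trials in total,
    # then report the mean time and the max-min spread in milliseconds.
    times = timeit.repeat(func, repeat=repeats, number=number)
    mean_ms = sum(times) / repeats * 1000
    spread_ms = (max(times) - min(times)) * 1000
    return mean_ms, spread_ms

mean_ms, spread_ms = bench_avg(lambda: [i * i for i in range(100)])
print(mean_ms, spread_ms)
```

When the spread is of the same order as the mean, as in the “Array add” case described above, the fix is more iterations per trial, not more decimal places.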

Let’s compare your version of the tests vs mine:

Test        Yours    Mine
Array add   22.02    1.84
List add    22.68    3.87
Event add   19.72    2.28
IDict add   19.29    2.20
Dict add    71.91    44.1
Array at     1.85    1.32
List at      6.29    1.44
Event at    18.82    1.72
IDict at    19.91    1.70
Dict at     38.2    17.14

In the array-add test, when I remove the [("i"++i).asSymbol, i] and switch to interpreter variables, execution time drops by more than 90%. This means that 90% of your measurement is actually the time to look up ~array and to construct the symbols and 2-item arrays, and only 10% is the add time plus the loop! This explains the surprising result that Event / IdentityDictionary add seemed faster: those tests were doing less work.

This underscores the importance of making sure the test scenarios are as absolutely identical as possible, except for the one thing you want to compare.

The third column does reflect the results I would expect. List is slightly slower than Array because each add or at performs one additional method dispatch. Event and IdentityDictionary are practically identical in performance, and a bit slower than Array. Dictionary is much slower because Event/IDict do the hash table lookup in the C++ backend, while Dictionary does it in (slower) sclang code.


PS My modified benchmarks:

// Writing time of Array, Event, Dictionary and IdentityDictionary.
z = (0..9999);

f = { |func, num = 5|
	var sum = 0;
	num.do {
		sum = sum + bench(func, false);
	};
	sum / num * 1000  // ok, let's do ms
};
f.({ a = []; z.do { |item| a = a.add(item) } }, 1000);

f.({ l = List[]; z.do { |item| l.add(item) } }, 1000)

f.({ e = (); z.do { |item| e.put(item, item) } }, 1000)

f.({ i = IdentityDictionary(); z.do { |item| i.put(item, item) } }, 1000)

f.({ d = Dictionary(); z.do { |item| d.put(item, item) } }, 1000)

// Accessing time of Array, Event, Dictionary and IdentityDictionary.

f.({ z.do { |i| a[i] } }, 1000)

f.({ z.do { |i| l[i] } }, 1000)

f.({ z.do { |i| e[i] } }, 1000)

f.({ z.do { |j| i[j] } }, 1000)  // note, the collection is 'i' so we need a different index name

f.({ z.do { |i| d[i] } }, 1000)

I appreciate your detailed explanation and the attached code! I now understand better how to construct benchmarks, and how fast each of the mentioned classes is in SuperCollider.

Just wanted to leave something here in case someone newer to programming reads this and thinks they should only ever use arrays (or whatever), or gets obsessed with writing the most optimised code.

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
Knuth 1974

The point being: first and foremost, write code that is easy to read and maintain; if and only if it is too slow, benchmark it, changing only the things you measure as slow and ignoring the rest.

SuperCollider in particular is built so that the language is fast enough for most things most of the time, with all the performance-critical audio code written in C++ as plugins.
