SynthDef optimization strategies (esp wrt non-sclang clients)

Don’t take the current implementation as standard, it’s very broken.

Here is a (pretty big) rewrite I’m working on:

My opinion is that this is generally a bad idea if you intend to make it produce optimised synthdefs, because the optimisations are actually rather complex and the ordering is not trivial. If this just does exactly what the user writes, then I think it is an okay idea (only okay because of wirebufs and the odd way people would have to write synthdefs), but I would rather see sclang made easier/possible to call from a C interface. That way there is only one implementation, and it shouldn’t be too hard.

FWIW, my parsing method would return the “good” version in your PlayBuf example, not the “bad” version (which is not to say there won’t be cases that will be more difficult). My plan was for the DSL to have less syntactic sugar than sclang and be easier to parse.

Calling sclang from C strikes me as quite difficult, but maybe I’m missing something.

It shouldn’t be too much harder than calling Lua (hopefully; I haven’t tried this yet!).

The difficulty is sorting the graph depth first (to reduce wirebufs) while keeping the ordering. I believe (but could be wrong, as I’ve only glanced at the code) that ScalaCollider also gets this wrong.

Just to clarify: I’m not knocking this idea, just trying to point out that the moment you start applying optimisations, things get very complex, to the point where a standard implementation should be provided.

Yes. In my opinion, sclang tries too hard to be helpful. Maybe it’s just me, but I find it easier to stick as close to the graph as possible.

Not yet. I find sclang’s mce quite confusing–I wish I had a dollar for every time I ended up with four output channels instead of two–and I’m curious to try alternative approaches. I may not come up with anything better, of course.


I think that would be something potentially handy. A shared and clean “intermediate language” for graph defs, with explicit representation, used among different clients.

Lua and C are BFFs: Lua 5.4 Reference Manual

What I’m doing is this: NodeSpec objects get pushed onto a stack when they are created. Each has a unique ID. NodeSpecs store the IDs of their inputs. When the graph is parsed, an initial pass through the stack maps the specs to UGens in creation order, and creates a mapping between their IDs and their “synthIndex”. A second pass then creates the wirebufs using this mapping. I don’t yet eliminate unused UGens, but that won’t be very hard. AFAIK this should work pretty well if the user’s code doesn’t go out of its way to build the graph out of order. I’m not sure how one would use fewer wirebufs.
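A minimal sketch of that two-pass scheme (purely illustrative; `NodeSpec`, its fields, and the stack layout here are assumptions, not your actual code):

```python
# Illustrative sketch of the two-pass scheme described above;
# NodeSpec, its fields, and the stack are assumptions, not the actual code.

class NodeSpec:
    stack = []      # specs in creation order
    next_id = 0

    def __init__(self, name, input_ids):
        self.id = NodeSpec.next_id          # unique ID
        NodeSpec.next_id += 1
        self.name = name
        self.input_ids = input_ids          # IDs of this spec's inputs
        NodeSpec.stack.append(self)         # pushed when created

def build_graph():
    # Pass 1: map each unique ID to a synthIndex (creation order).
    synth_index = {spec.id: i for i, spec in enumerate(NodeSpec.stack)}
    # Pass 2: wire inputs by translating the stored IDs through that mapping.
    return [(spec.name, [synth_index[i] for i in spec.input_ids])
            for spec in NodeSpec.stack]

osc = NodeSpec("SinOsc", [])
flt = NodeSpec("LPF", [osc.id])
out = NodeSpec("Out", [flt.id])
print(build_graph())  # [('SinOsc', []), ('LPF', [0]), ('Out', [1])]
```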

All that said, I didn’t even try to understand how the sclang parser works, and I may run into some cases that need a different approach than I’m currently using. But I don’t mind not supporting every path to creating a synthdef in sclang, as long as there aren’t synthdefs that sclang can create that I can’t.

A standard implementation would be very welcome if it can be easily imported into non-sclang code. But I’m not sure its clients should need to use sclang–I think it might be better to have a closer-to-the-graph representation that all clients including sclang would implement.

Think of the server as a register-based VM, where each wirebuf is a register. Multichannel expansion stores each channel in a register and then descends to the next operation, which requires at least as many wirebufs as there are channels. If you instead evaluate the totality of each channel in order (depth first), perhaps adding it into a sum or outputting the result, you might only need two wirebufs. Wirebufs might affect performance, but they definitely increase memory usage.
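A toy model of that register picture (not the server’s actual wirebuf allocator; it just counts peak live registers when summing an 8-channel signal under the two evaluation orders):

```python
# Toy model (not the server's actual wirebuf allocator): count the peak
# number of live "registers" needed to sum an N-channel signal under two
# evaluation orders. "cN" computes channel N's value into a fresh register;
# "+N" folds that register into a running sum and frees it.

def peak_registers(events):
    live = set()   # registers holding per-channel intermediates
    acc = 0        # 1 once the running-sum register exists
    peak = 0
    for ev in events:
        kind, ch = ev[0], ev[1:]
        if kind == "c":
            live.add(ch)
        else:  # "+"
            acc = 1
            peak = max(peak, len(live) + acc)  # input and sum live together
            live.discard(ch)
        peak = max(peak, len(live) + acc)
    return peak

N = 8
# breadth first: compute every channel, then sum them all
breadth = [f"c{i}" for i in range(N)] + [f"+{i}" for i in range(N)]
# depth first: compute one channel, fold it into the sum, reuse the register
depth = [ev for i in range(N) for ev in (f"c{i}", f"+{i}")]

print(peak_registers(breadth), peak_registers(depth))  # 9 2
```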

The largest performance benefit I have found (not to do with wirebufs, but with optimisations in general) was from this graph (I found it online but don’t have the source):

SynthDef(\t, {
	var out;
	var s = { |o, i|
		SinOsc.ar(
			[i, i + 0.0001] ** 2 * f.value(o, i - 1),
			f.value(o, i - 1) * 0.0001
		) * f.value(o, i - 1)
	};
	var f = { |o, i| if(i > 0, { s.value(o, i) }, o)};
	out = (f.value(60, 6) / 60).sum;
	Out.ar(0, out)
}).add

Without optimisations this produces circa 2000 UGens. With common subexpression elimination and a bunch of mathematical identities, this number goes down to circa 50, and the synth gets a performance boost of circa 3610% (not a typo).

You might want to check the MCE implementation in hsc3. It’s similar to some array-programming ideas (scalar and vector types only). The type limits the depth of the “tree”:

https://hackage.haskell.org/package/hsc3-0.20/docs/Sound-Sc3-Ugen-Mce.html
https://hackage.haskell.org/package/hsc3-0.20/docs/Sound-Sc3-Common-Mce.html

Why not just write it sensibly?

SynthDef(\t, {
	var out;
	var s = { |o, i|
		var mod = f.value(o, i - 1);
		SinOsc.ar(
			[i, i + 0.0001] ** 2 * mod,
			mod * 0.0001) * mod
	};
	var f = { |o, i, offset| if(i > 0, { s.value(o, i) }, o)};
	out = (f.value(60, 6) / 60).sum;
	Out.ar(0, out)
}).add;

I get 48 UGens that way.

That’s just an example that produces a bad case; that sort of thing comes up quite regularly in smaller quantities. The 48 comes from applying some maths identities.

I’m not saying that producing less-than-perfectly optimised synthdefs is bad! Just that applying any optimisation, especially sorting, will add a ton of complexity to do correctly. Also, users of your code might expect similar synth performance for equivalent code in sclang, particularly if they are using it to make a language client.

IMHO the user benefits more from a clean, easily maintainable parser than they do from guardrails. In the example, the user asks for 2000 UGens. I think they should get 2000 UGens! Just as a matter of taste, I don’t want my mistakes fixed behind the scenes, I want them brought to my attention so that I can learn not to make them.

An example where the user gets something that they can reasonably say they didn’t ask for would be a different matter. I’d be very interested in seeing such an example and seeing how my parser would handle it.


It may not be the best example of automatic optimization because calling the f function three times with the same arguments in a loop is not the most usual case and is easily improved.

The crucial cases are those where the user can’t quickly intervene.

On the other hand, common subexpression elimination is a widespread optimization (any C compiler does it, etc.). I don’t see any reason not to do it properly in SuperCollider, especially when the user cannot easily improve things manually.

It would not be difficult in my parser to eliminate duplicated nodes (i.e., nodes that differ only in their unique ID), keeping the earliest-created instance. Expressions beyond that are parsed by Lua, which one hopes knows best!

(But when do duplicated nodes happen except when the user explicitly creates them?)

The user might inadvertently create duplicate nodes in more complex graphs, when composing with reusable components, or with some other tool we are unaware of. Or maybe writing it in a particular way is simply more straightforward/clear.

Or maybe only generate a new key if there isn’t an identical subgraph.

I would be surprised if Lua could do that. Can you confirm?

Maybe I’m misunderstanding what you mean by subexpression. If you mean something like:

osc1 = SinOsc(110) * SinOsc(5 * 3)
osc2 = SinOsc(55 * 2) * SinOsc(15)

then elimination of osc2 would happen automatically by removing duplicate nodes, either before or after building the subtrees. Both osc1 and osc2 would point to the same instance of Mul(SinOsc(110), SinOsc(15)). There’s no need to handle subexpressions separately.
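For what it’s worth, that kind of sharing can be sketched as hash-consing; here is a toy Python version (the node representation is an assumption, not your parser; note that `5 * 3` is folded to `15` by the host language before the node is built):

```python
# Toy hash-consing sketch of duplicate-node elimination (illustrative;
# not the actual parser). Identical subtrees are interned so structurally
# equal expressions share a single instance.

_interned = {}

def node(name, *inputs):
    key = (name,) + inputs
    # setdefault returns the first instance ever stored under this key
    return _interned.setdefault(key, key)

def SinOsc(freq):
    return node("SinOsc", freq)

def Mul(a, b):
    return node("Mul", a, b)

osc1 = Mul(SinOsc(110), SinOsc(5 * 3))   # host language folds 5 * 3 to 15
osc2 = Mul(SinOsc(55 * 2), SinOsc(15))
print(osc1 is osc2)  # True: both are the same Mul(SinOsc(110), SinOsc(15))
```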

Anything not in the node tree might affect compile performance, but that is up to Lua, not me. Does that make sense?

No, that won’t work.

a = In.ar(0);
Out.ar(0, WhiteNoise.ar);
b = In.ar(0);

The bus’s contents change after the Out, so the second In.ar(0) can’t be treated as a duplicate of the first. That is why all the complexity of the PR I’ve made is needed.

Just to summarise the approach I’ve taken: I think it’s the only way if you want to apply all possible optimisations. You could do less by applying fewer optimisations.

UGens need the following metadata:

- whether they affect state (Out, BufWr…);
- whether they can be replaced by identical calls (noise can’t);
- which server-side resources (bus, buffer) they touch, and in some cases how they interact with them (read, write).

Then, when making the graph, you gather all UGens that affect state, and any that touch a resource get passed to a resource manager, which creates additional ‘weak’ edges between the UGens so ordering is preserved.
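A minimal sketch of such a resource manager (hypothetical names; per-bus bookkeeping only, where real read/write semantics would be richer):

```python
# Hypothetical sketch of the "weak edge" idea: per bus, each UGen that
# touches the bus is linked after the bus's previous writer, so relative
# order survives any later reordering. Names are illustrative only.

class ResourceManager:
    def __init__(self):
        self.last_writer = {}   # bus -> last UGen that wrote it
        self.weak_edges = []    # (earlier, later) ordering constraints

    def touch(self, ugen, bus, mode):
        prev = self.last_writer.get(bus)
        if prev is not None:
            self.weak_edges.append((prev, ugen))  # ugen must stay after prev
        if mode == "write":
            self.last_writer[bus] = ugen

rm = ResourceManager()
rm.touch("In_a", 0, "read")     # a = In.ar(0)
rm.touch("Out", 0, "write")     # Out.ar(0, WhiteNoise.ar)
rm.touch("In_b", 0, "read")     # b = In.ar(0)
print(rm.weak_edges)  # [('Out', 'In_b')]: In_b can't be merged with In_a
```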

Then, starting at the state-affecting UGens, walk upwards, checking for duplicates and whether each UGen can be replaced by an identical call, making sure to include the ‘weak’ edges as part of the call identity. When you get a match, jump back down the graph to the children of the replaced UGen. Also apply any rewrites here, jumping back down every time something gets applied.

Then calculate how many parents each node has, and sort them largest first. Sort the UGens that affect state too. Then, starting at the state-affecting UGens, walk upwards depth first, adding each newly encountered UGen to a list. Reverse the list; this gives you the execution order. Number them.
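That final pass is essentially a depth-first topological sort from the state-affecting UGens; a small sketch (using post-order appends, which handle shared inputs safely, in place of reversing a pre-order list; the graph and names are illustrative):

```python
# Sketch of the final ordering pass: depth first from the state-affecting
# UGens, visiting inputs most-parents-first ("largest first"), with
# post-order appends so every UGen lands after all of its inputs.

def execution_order(inputs_of, parent_count, sinks):
    order, seen = [], set()

    def walk(u):
        if u in seen:
            return
        seen.add(u)
        # visit the input with the most parents first
        for i in sorted(inputs_of.get(u, ()),
                        key=lambda x: -parent_count.get(x, 0)):
            walk(i)
        order.append(u)  # appended only after all its inputs are placed

    for s in sinks:
        walk(s)
    return order

# toy graph: WhiteNoise and In both feed Out (the state-affecting sink)
inputs_of = {"Out": ["WhiteNoise", "In"]}
parent_count = {"WhiteNoise": 1, "In": 1}
print(execution_order(inputs_of, parent_count, ["Out"]))
# ['WhiteNoise', 'In', 'Out']
```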

The alternative – which I have pointed out in other threads – is to mandate that the user explicitly specifies the order in such cases, e.g. with a method call like a.orderAfter(b). This is essentially the approach that Pure Data takes because users can force the ordering of unconnected canvases with dummy connections. To be clear: I do think that your approach is more elegant and provides a better UX, but it’s certainly not the only one.

Of course, the real solution to this problem would be to move all graph optimizations to the server so that clients do not have to deal with all this complexity. However, I don’t think this can be done without breaking backwards compatibility so we can only add it to the wishlist for SC4 :smiley:


Ah sorry, I meant that you basically need all that information if you do ordering and optimisation implicitly.

Having the user manually provide orderings would also offer more optimisation, as things that aren’t ordered can be ignored; however, you’d also have to order noise UGens with things like RandSeed, which the user might not expect to need ordering.


There are other ways; noise UGens, for example, can have a completely different type (in this case, a monadic form of UGen) to distinguish computations from expressions, or be instances of distinct classes (which influences optimizations). See https://hackage.haskell.org/package/hsc3-0.6/src/Help/hsc3.help.lhs