If I remember correctly, the issue was in trying to resolve something like this.
var a = DC.kr(3);
var b = DC.ar(2);
var c = DC.kr(2);
var ab = a * b; // Ar 6
var abc = ab / c; // Ar 3
var r1 = U.kr(abc);
// use abc later...
U.ar(abc);
What I was trying to do was (something like) take abc and optimise away the * b and / c, since they cancel each other out; this would leave just a, which is kr.
Further, since these are DCs and we know their values at compile time, you could turn the whole thing into a scalar.
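For comparison, arithmetic on plain number literals is already folded by sclang before any UGen is built; the open question is doing the same when the constants arrive wrapped in DC. A quick sketch using only stock calls to see the difference:
(
SynthDef(\plainFold, {
    // 3 * 2 / 2 is evaluated in sclang itself, so the def ends up with a
    // single constant rather than a chain of UGens.
    Out.kr(0, DC.kr(3 * 2 / 2));
}).dumpUGens;
)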
The issue I found was that some UGens (namely RecordBuf) will let you write something like U.ar(DC.kr(1)), but it behaves drastically differently from U.ar(DC.ar(1)). This is a bug in RecordBuf, as it ends up reading uninitialised memory, but it turned out there are quite a few places where this can occur, because the rate-checking logic can live both in the sclang UGen constructor (which the optimiser can't re-run at SynthDef compile time) and in the server-side UGen's initialisation.
I believe this bug can currently be triggered in sc by RecordBuf.ar(SinOsc.ar * 0).
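A minimal sketch of that situation, only for inspecting the built graph rather than running it (LocalBuf stands in for a real buffer): as far as I can tell, the existing multiply-by-zero shortcut in BinaryOpUGen removes the SinOsc entirely, so RecordBuf.ar ends up with a constant 0 where it was written expecting an audio-rate signal.
(
SynthDef(\recordBufRateMismatch, {
    // SinOsc.ar * 0 is folded to the constant 0 before the graph is written,
    // so the dump should show RecordBuf with a scalar input and no SinOsc.
    RecordBuf.ar(SinOsc.ar * 0, LocalBuf(44100));
}).dumpUGens;
)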
To get around this in a uniform way, I made all arithmetic optimisations that occur during compilation return a DC of the result, and I allowed arithmetic optimisations on DCs, since their values are known during compilation: DC.kr(1) + DC.kr(2) === DC.kr(3).
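To make that concrete, here is a sketch using only stock calls; the folded single-DC result is what the PR aims for, not what current SuperCollider prints:
(
SynthDef(\dcFold, {
    // Today this builds two DC UGens plus a BinaryOpUGen; with the PR's
    // constant folding on DCs it should reduce to a single DC.kr(3).
    Out.kr(0, DC.kr(1) + DC.kr(2));
}).dumpUGens;
)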
Anyway… my point was that you might be able to save a few UGens, but you will also end up with more DC objects, as you cannot reliably change the rate of a UGen without calling its sclang-side constructor. That means you'd also want some way to remove duplicate DCs, and given the current state of the SynthDef compiler, doing this reliably is not trivial.
Operations lifted to signals don't have to behave like the same operations on plain numbers. That's just how it is; this part is normal.
Yeah, but the situation makes sense; sometimes it just isn't worth it. To do this properly you would really need type checking with type inference inside the SynthDef compiler, and that is serious formal-logic work (take this with a grain of salt).
On a slightly more realistic note, the team here could do this, no problem, but only with an architect leading it and a concrete plan. Without that, ideas are just words.
While that happens, you will also fix bugs in places you didn't even know existed. For me, that is a meaningfully positive side effect of a long-term vision.
I've already implemented what I said; there is a PR, go check it out. I'm pretty proud of it.
You could take the arithmetic optimisations further than I have: mine only looks at immediately neighbouring operations, when you could traverse the entire AST. However, I think that would add a lot of complexity in keeping the rates correct without offering enough benefit.
Here is an example of something my version can’t do…
var abcd = a + b + c + d;
var abcd2 = abcd / 2;
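// r is really (a + b + c) / 2: the d/2 terms cancel, but only across non-adjacent operations, so a purely local pass misses it.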
var r = abcd2 - (d / 2);
This is a joke (but the paper is 100% real, by the Faust team; it's about how Faust's signal-processing pipelines, as opposed to the more traditional Faust block diagrams, are a representation borrowed from Haskell arrows).
Since the development discussion about this topic faded out at this point, I would like to pick it up again here: better optimisation, together with the bug fixes, is a really good thing … and I can see this was a lot of work.
We have discovered that there is a trade-off that needs to be resolved, between
automatic synth efficiency and
synth def build time.
For some applications, automatic optimisation is very important, for example, if you generate ugen graphs algorithmically (like an automatic synth def builder).
For other applications, a short build time for synth defs is very important, for example for live coding or interactive exploration (such as building dynamical systems with Fb1_ODE).
Having let this sink in a little, I wonder if the following could help?
a warning when genuinely redundant UGens are removed, so you can improve your code if it is hand-written; maybe the inefficient code was not intended. The question is whether this can be done for the right cases.
more importantly, a way to switch off a reasonable part of the optimisation. I think the topological sort should be kept, but the other passes may not always be necessary.
Of course, we’d need to discuss what the default behaviours should be.
In that PR you can disable optimisation passes. This could probably be improved in terms of interface and granularity, but generally it is removing duplicates that is slow; if you disable that, the performance hit is much less significant.
You could use this to disable certain optimisations for all Ndefs (for example).
From SynthDef…
classvar <>enableSorting = true;
classvar <>enableOptimisations = true;
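A sketch of how this might be used, assuming these classvars behave as their names suggest: keep the topological sort but skip the slower passes globally.
SynthDef.enableSorting = true;
SynthDef.enableOptimisations = false;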
Creating a warning for duplicates isn't really possible, because checking for equivalence is exactly the part that is slow.
You can disable optimisations and keep the topological sorting, but you can’t remove all the extra meta information needed in the UGen class. You could remove some of it, but this would result in a worse ordering.
The issue is that you need some optimisation passes, as otherwise behaviour would differ: PV UGens use optimisation passes to insert PV_Copy automatically. Still, the de-duplicator could be disabled without negative side effects.
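A concrete case where this matters, using only standard UGens; because two PV UGens read the same FFT chain, the automatically inserted PV_Copy should show up in the dump:
(
SynthDef(\pvShare, {
    var chain = FFT(LocalBuf(2048), WhiteNoise.ar(0.1));
    var low = PV_BrickWall(chain, -0.5);   // clears bins above the cutoff
    var high = PV_BrickWall(chain, 0.5);   // second reader of the same chain
    Out.ar(0, (IFFT(low) + IFFT(high)) * 0.1);
}).dumpUGens;
)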
But at the same time, all this is a bit relative. I spent some time with an experienced Haskeller learning how a perfectly written audio graph using Arrow notation can, depending on how you write it, make GHC generate intermediate code twenty times longer and more complex than an equivalent formulation.
This case shows a contradictory aspect of that division of labour. GHC works hard to optimise everything and keep the programmer thinking at a high level. At the same time, the Arrow implementation was never really good in Haskell; it started as a separate preprocessor with some bugs, and it's not as good as the rest of GHC.
The C++ programmer's psyche says that you need to know everything happening at every level below where you are coding, and that if you don't, your code will be trash. Fair enough, for them.
But, other times, it’s a contradiction between two aspects of the same reality.
In this particular example (inserting PV_Copy) it isn't too much of an issue because it is fast… but in general, allowing UGens (particularly those from quarks) to re-interpret the UGen graph and transform it somehow could be really powerful.
Here is a theoretical possibility where this kind of behaviour might be desirable…
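// Oversample, Undersample and the Shape2x mentioned below are hypothetical UGens; this is only a sketch of the idea.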
var in = In.ar(0);
var in_over = Oversample.ar(2, in);
var distort = Shape.ar(~buf, in_over);
var in_norm = Undersample.ar(2, distort);
Here, you can walk the graph and replace the call to Shape with a call to Shape2x that does the oversampling.
What do you mean by testing here? The way I am currently testing the PR is to do null tests with un-optimised synths.
I don't think it would be possible to present useful information to the user, and even if you could, acting on it would often require a large restructuring of the code just to remove the duplicate.
Consider two loops where the first loop's tenth entry and the second loop's eleventh entry produce the same UGen. For the user to remove this duplicate, they'd basically have to recreate the de-duplication logic already present in the SynthDef compiler. I think most cases would look like this, rather than being more obvious duplications.
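Something like this contrived sketch, where the coincidence is deliberate (Mix.fill and SinOsc are standard; whether the two 1100 Hz oscillators get merged is up to the de-duplication pass):
(
SynthDef(\hiddenDup, {
    var a = Mix.fill(16, { |i| SinOsc.ar(110 * (i + 1)) });  // 10th entry: 1100 Hz
    var b = Mix.fill(16, { |i| SinOsc.ar(100 * (i + 1)) });  // 11th entry: also 1100 Hz
    Out.ar(0, (a + b) * 0.01);
}).dumpUGens;
)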
There is no way for the compiler to distinguish between literally writing SinOsc.ar * SinOsc.ar and a duplicate that only emerges from the structure of the code, so it can't tell which removals would be worth reporting.
If the user wants to see how it is optimised, you can always call dumpUGens, but beyond that I don't think it is possible to provide useful information. Although this output could definitely be improved!
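For example, something like this should show two SinOsc entries in current SuperCollider and, with the PR's de-duplication enabled, only one:
(
SynthDef(\literalDup, {
    Out.ar(0, SinOsc.ar(440) * SinOsc.ar(440) * 0.1);
}).dumpUGens;
)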
Yes, there are too many cases and too many reasons why some UGen might be removed, which makes it hard to select the cases where it would be useful to see that this happens.
So let’s drop this part of the idea (posting information about removed ugens).
What would be the best way to strike and adjust a balance between optimisation and synth def build speed? And what would be the reasonable defaults?
I think that an improvement in optimisation should not slow down the default build speed. So perhaps there is a middle ground that could be the default, with adjustments for how far to go into detail?
Also, we could identify cases that can be clearly distinguished and pass them to the SynthDef as an argument.
I’ve posted benchmarks of the different optimisation levels on github.
And what would be the reasonable defaults?
I think everything should be turned on when using SynthDef explicitly, but only sorting and rewriting should be enabled when using Ndef. You could disable the rewriting for Ndef too, but then things like the automatic PV_Copy insertion wouldn't work, so there would be a discrepancy in behaviour.
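In the meantime, a rough sketch of a workaround, assuming the classvar above is read at the moment Ndef builds its SynthDef:
SynthDef.enableOptimisations = false;  // favour build speed while live coding
Ndef(\x, { SinOsc.ar(\freq.kr(220)) * 0.1 }).play;
SynthDef.enableOptimisations = true;   // restore the default afterwards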
Going off the benchmarks I made, this is a 50% to 100% increase in build time. I don't think that is too bad, given that it also fixes other issues.