If I remember correctly, the issue was in trying to resolve something like this.
var a = DC.kr(3);
var b = DC.ar(2);
var c = DC.kr(2);
var ab = a * b; // Ar 6
var abc = ab / c; // Ar 3
var r1 = U.kr(abc);
// use abc later...
U.ar(abc);
What I was trying to do was (something like) take abc and optimise away the * b and / c, since they cancel each other out; this would leave just a, which is kr.
Further, since these are DCs and we know their values at compile time, you could turn the whole thing into a scalar.
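For comparison, arithmetic on plain number literals is already folded by sclang before any UGen is built; the open question is doing the same when the constants arrive wrapped in DC. A quick sketch using only stock calls to see the difference:
(
SynthDef(\plainFold, {
    // 3 * 2 / 2 is evaluated in sclang itself, so the def ends up with a
    // single constant rather than a chain of UGens.
    Out.kr(0, DC.kr(3 * 2 / 2));
}).dumpUGens;
)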
The issue I found was that some UGens (namely RecordBuf) will let you write something like U.ar(DC.kr(1)), but it behaves drastically differently from U.ar(DC.ar(1)). This is a bug in RecordBuf, as it ends up reading uninitialised memory, but it turned out there are quite a few places where this can occur, because the rate-checking logic can live both in the sclang UGen constructor (which the optimiser can't re-run at SynthDef compile time) and in the server-side UGen's initialisation.
I believe this bug can currently be triggered in sc by RecordBuf.ar(SinOsc.ar * 0).
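A minimal sketch of that situation, only for inspecting the built graph rather than running it (LocalBuf stands in for a real buffer): as far as I can tell, the existing multiply-by-zero shortcut in BinaryOpUGen removes the SinOsc entirely, so RecordBuf.ar ends up with a constant 0 where it was written expecting an audio-rate signal.
(
SynthDef(\recordBufRateMismatch, {
    // SinOsc.ar * 0 is folded to the constant 0 before the graph is written,
    // so the dump should show RecordBuf with a scalar input and no SinOsc.
    RecordBuf.ar(SinOsc.ar * 0, LocalBuf(44100));
}).dumpUGens;
)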
To get around this in a uniform way, I made all arithmetic optimisations that occur during compilation return a DC of the result, and I allowed arithmetic optimisations on DCs, since their values are known during compilation: DC.kr(1) + DC.kr(2) === DC.kr(3).
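To make that concrete, here is a sketch using only stock calls; the folded single-DC result is what the PR aims for, not what current SuperCollider prints:
(
SynthDef(\dcFold, {
    // Today this builds two DC UGens plus a BinaryOpUGen; with the PR's
    // constant folding on DCs it should reduce to a single DC.kr(3).
    Out.kr(0, DC.kr(1) + DC.kr(2));
}).dumpUGens;
)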
Anyway… my point was that you might be able to save a few UGens, but you will also end up with more DC objects, as you cannot reliably change the rate of a UGen without calling its sclang-side constructor. That means you'd also want some way to remove duplicate DCs, and given the current state of the SynthDef compiler, doing this reliably is not trivial.
Operations lifted to signals don't have to behave like the same operations on plain numbers. That's just how it is; this part is normal.
Yeah, but the situation makes sense; sometimes it just isn't worth it. To do this properly you would really need type checking with type inference inside the SynthDef compiler, and that is serious formal-logic work (take this with a grain of salt).
On a slightly more realistic note, the team here could do this, no problem, but only with an architect leading it and a concrete plan. Without that, ideas are just words.
While that happens, you will also fix bugs in places you didn't even know existed. For me, that is a meaningfully positive side effect of a long-term vision.
I've already implemented what I said; there is a PR, go check it out. I'm pretty proud of it.
You could take the arithmetic optimisations further than I have: mine only looks at immediately neighbouring operations, when you could traverse the entire AST. However, I think that would add a lot of complexity in keeping the rates correct without offering enough benefit.
Here is an example of something my version can’t do…
var abcd = a + b + c + d;
var abcd2 = abcd / 2;
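// r is really (a + b + c) / 2: the d/2 terms cancel, but only across non-adjacent operations, so a purely local pass misses it.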
var r = abcd2 - (d / 2);
This is a joke (but the paper is 100% real, by the Faust team; it's about how Faust's signal-processing pipelines, as opposed to the more traditional Faust block diagrams, are a representation borrowed from Haskell arrows).
Since the development discussion about this topic faded out at this point, I would like to pick it up again here: better optimisation, together with the bug fixes, is a really good thing … and I can see this was a lot of work.
We have discovered that there is a trade-off that needs to be resolved, between
automatic synth efficiency and
synth def build time.
For some applications, automatic optimisation is very important, for example, if you generate ugen graphs algorithmically (like an automatic synth def builder).
For other applications, a short build time for synth defs is very important, for example for live coding or interactive exploration (such as building dynamical systems with Fb1_ODE).
Having let this sink in a little, I wonder if the following could help?
a warning when genuinely redundant UGens are removed, so you can improve your code if it is hand-written; maybe the inefficient code was not intended. The question is whether this can be done for the right cases.
more importantly, a way to switch off a reasonable part of the optimisation. I think the topological sort should be kept, but the other passes may not always be necessary.
Of course, we’d need to discuss what the default behaviours should be.
In that PR you can disable optimisation passes. This could probably be improved in terms of interface and granularity, but generally it is removing duplicates that is slow; if you disable that, the performance hit is much less significant.
You could use this to disable certain optimisations for all Ndefs (for example).
From SynthDef…
classvar <>enableSorting = true;
classvar <>enableOptimisations = true;
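A sketch of how this might be used, assuming these classvars behave as their names suggest: keep the topological sort but skip the slower passes globally.
SynthDef.enableSorting = true;
SynthDef.enableOptimisations = false;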
Creating a warning for duplicates isn't really possible, because checking for equivalence is exactly the part that is slow.
You can disable optimisations and keep the topological sorting, but you can’t remove all the extra meta information needed in the UGen class. You could remove some of it, but this would result in a worse ordering.
The issue is that you need some optimisation passes, as otherwise behaviour would differ: PV UGens use optimisation passes to insert PV_Copy automatically. Still, the de-duplicator could be disabled without negative side effects.
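A concrete case where this matters, using only standard UGens; because two PV UGens read the same FFT chain, the automatically inserted PV_Copy should show up in the dump:
(
SynthDef(\pvShare, {
    var chain = FFT(LocalBuf(2048), WhiteNoise.ar(0.1));
    var low = PV_BrickWall(chain, -0.5);   // clears bins above the cutoff
    var high = PV_BrickWall(chain, 0.5);   // second reader of the same chain
    Out.ar(0, (IFFT(low) + IFFT(high)) * 0.1);
}).dumpUGens;
)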
But at the same time, all this is a bit relative. I spent some time with an experienced Haskeller learning how a perfectly written audio graph using Arrow notation can, depending on how you write it, make GHC generate intermediate code twenty times longer and more complex than an equivalent formulation.
This case shows a contradictory aspect of that division of labour. GHC works hard to optimise everything and keep the programmer thinking at a high level. At the same time, the Arrow implementation was never really good in Haskell; it started as a separate preprocessor with some bugs, and it's not as good as the rest of GHC.
The C++ programmer's psyche says that you need to know everything happening at every level below where you are coding, and that if you don't, your code will be trash. Fair enough, for them.
But, other times, it’s a contradiction between two aspects of the same reality.
In this particular example (inserting PV_Copy) it isn't too much of an issue because it is fast… but in general, allowing UGens (particularly those from quarks) to re-interpret the UGen graph and transform it somehow could be really powerful.
Here is a theoretical possibility where this kind of behaviour might be desirable…
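// Oversample, Undersample and the Shape2x mentioned below are hypothetical UGens; this is only a sketch of the idea.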
var in = In.ar(0);
var in_over = Oversample.ar(2, in);
var distort = Shape.ar(~buf, in_over);
var in_norm = Undersample.ar(2, distort);
Here, you can walk the graph and replace the call to Shape with a call to Shape2x that does the oversampling.
What do you mean by testing here? The way I am currently testing the PR is to do null tests with un-optimised synths.
I don't think it would be possible to present useful information to the user, and even if you could, acting on it would often require a large restructuring of the code just to remove the duplicate.
Consider two loops where the first loop's tenth entry and the second loop's eleventh entry produce the same UGen. For the user to remove this duplicate, they'd basically have to recreate the de-duplication logic already present in the SynthDef compiler. I think most cases would look like this, rather than being more obvious duplications.
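Something like this contrived sketch, where the coincidence is deliberate (Mix.fill and SinOsc are standard; whether the two 1100 Hz oscillators get merged is up to the de-duplication pass):
(
SynthDef(\hiddenDup, {
    var a = Mix.fill(16, { |i| SinOsc.ar(110 * (i + 1)) });  // 10th entry: 1100 Hz
    var b = Mix.fill(16, { |i| SinOsc.ar(100 * (i + 1)) });  // 11th entry: also 1100 Hz
    Out.ar(0, (a + b) * 0.01);
}).dumpUGens;
)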
There is no way for the compiler to distinguish between literally writing SinOsc.ar * SinOsc.ar and a duplicate that only emerges from the structure of the code, so it can't tell which removals would be worth reporting.
If the user wants to see how it is optimised, you can always call dumpUGens, but beyond that I don't think it is possible to provide useful information. Although this output could definitely be improved!
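For example, something like this should show two SinOsc entries in current SuperCollider and, with the PR's de-duplication enabled, only one:
(
SynthDef(\literalDup, {
    Out.ar(0, SinOsc.ar(440) * SinOsc.ar(440) * 0.1);
}).dumpUGens;
)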
Yes, there are too many cases and too many reasons why some UGen might be removed, which makes it hard to select the cases where it would be useful to see that this happens.
So let’s drop this part of the idea (posting information about removed ugens).
What would be the best way to strike and adjust a balance between optimisation and synth def build speed? And what would be the reasonable defaults?
I think that an improvement in optimisation should not slow down the default build speed. So perhaps there is a middle ground that could be the default, with adjustments for how far to go into detail?
Also, we could identify cases that can be clearly distinguished and pass them to the SynthDef as an argument.
I’ve posted benchmarks of the different optimisation levels on github.
And what would be the reasonable defaults?
I think everything should be turned on when using SynthDef explicitly, but only sorting and rewriting should be enabled when using Ndef. You could disable the rewriting for Ndef too, but then things like the automatic PV_Copy insertion wouldn't work, so there would be a discrepancy in behaviour.
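In the meantime, a rough sketch of a workaround, assuming the classvar above is read at the moment Ndef builds its SynthDef:
SynthDef.enableOptimisations = false;  // favour build speed while live coding
Ndef(\x, { SinOsc.ar(\freq.kr(220)) * 0.1 }).play;
SynthDef.enableOptimisations = true;   // restore the default afterwards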
Going off the benchmarks I made, this is a 50% to 100% increase in build time. I don't think that is too bad, given that it also fixes other issues.