Proposal: Refactor synthdef ugen optimizations

muellmusik · July 25, 2024, 2:09am

That thread perhaps contains some useful discussion from when this came up a couple of years ago.

I posted a couple of test cases here: Classlib: Invert the SynthDef topological sort (to fix a wide-graph case) by jamshark70 · Pull Request #5811 · supercollider/supercollider · GitHub

muellmusik · July 25, 2024, 2:25am

I have this weird feeling that was tried and there was still some issue (there was a lot of discussion about this), but sounds good!

muellmusik · July 25, 2024, 2:27am

Again would be nice to have a clear spec that can at least be tested against. I had a go in the linked GH thread, but was never confirmed:

AFAIK, this has never been explicitly documented (!) but here’s a rough go:

Any UGens that have a side-effect (e.g. modifying a buffer, sending a message) need to occur in the chain before any UGens that come later in the user-specified order, to ensure that side-effects occur in the right order.

Where possible, reorder depth first to allow wireBufs to be reused.

Where appropriate, use code-elimination strategies to remove unused UGens, unnecessary MulAdds etc.

NB 1. is crucial and top priority, while 2. and 3. are nice to haves. If unsure, we should implement with that in mind.

Is that spec complete and accurate? Is it worded correctly and precisely?

jordan · July 25, 2024, 7:22am

Okay so there seems to be 4 cases that creat additional edges in the graph. These are properties of nodes (ugens).

Bus access.
This involves creating an edge between the last access and the next. We could avoid creating edges between consecutive groups of Outs and Ins as the order doesn’t matter, although it does with replace out making this a little complicated.
Buffer access.
Same as before, but for groups of reads and writes.
Random state.
There needs to be an edge from the last random seed to all ugens that touch random state - these ugens should also have an edge to the next random state change. This assumes things like LFNoise can be reordered within a random seed boundary (seems reasonable?)
All other state. If plugins touch state that isn’t listed above they need a way of indicating so. This can just do the dumb thing and creat edges between each subsequent node that has this property.

@muellmusik your first point mentions send messages, should that be included here? Does message order matter? (If so I guess it’s currently broken as it’s not width first?).

jordan · July 25, 2024, 9:49am

With two types of edges (children: strong, io order: weak) depth first sorting can be re-defined as limiting the length of the strong edges.

muellmusik · July 25, 2024, 3:48pm

Yeah, I don’t have a specific use case where for example multiple sends in the same synth need to come in the indicated order, but I think the standard should rather be, can we rule that out? I don’t think we can.

In principle a synth should always behave like the explicitly specified order, so if there’s any doubt I’d say treat as width first.

This is partly why I idly wondered if the sort was even still worth it. Happy to accept that the answer is probably yes, but memory and CPU are way cheaper than they were decades ago, so it seemed worth asking.

Spacechild1 · July 25, 2024, 4:43pm

See my comment in Proposal: Refactor synthdef ugen optimizations - #39 by Spacechild1. That’s why I was asking if anyone can present an example where there’s a significant and consistent performance difference. And again, we must not forget to test on Supernova, where each Synth has its own set of wire buffers!

muellmusik · July 25, 2024, 6:14pm

I suppose small computers may have more constrained resources, so might be worth testing on a Pi or Beagle?

Spacechild1 · July 25, 2024, 7:19pm

The crucial factor are the cache sizes. Isolated micro benchmarks often have the problem that the cache effects only really come into play in larger applications.

Just for reference: A 64Kb L1 Cache can theoretically hold (64 × 1024 / 4 / 64) = 256 wire buffers at a 64 sample block size. (In practice it is less because if associativity.) A 512 KB L2 cache can hold 2048 wire buffers. Above that buffers will start to spill into the L3 cache (which is shared by several cores). With scsynth it is hard to reach this limit, but in Supernova it can happen easily if you instantiate lots of Synths. I’m curious about my test results

Side note: this really shows a design problem with Scsynth (which Supernova was forced to inherit): the Ugen input and output pointers are fixed at construction and owned by the Unit. If the pointers were instead passed to the process function (which is typical for audio plugins), you’d only need a set of buffers per thread.

jordan · July 26, 2024, 9:39am

Not being pedantic about terms but I was thinking about this little use of the term ‘topological’ and implying that the ‘non-sorted’ isn’t a topological ordering (which is what the code implies via names) and I think it captures why this code is complex.

The default way the ugens are added to the children array is a topological ordering - it’s just not a good ordering for limiting the number of wirebufs.

This is important because the code is treating this children array as the execution order (which by definition must be a topological order), which makes doing replacements or insertions complex as you need to preserve a valid execution order (a valid topological order) while the nuanced logic of inserting, deleting and replacing ugen is taking place.

Juggling these two constrains is why the code appears written ‘inside-out’.

By removing this children/execution-order array and storing the ugens directly as antecedents and descendants making changes to the graph is significantly easier, as the ordering is no longer a concern, but dealt with in one final step.

Topological sorting helps keep the code easier to understand by separating concerns, assuming all the constraints are expressed in the graph (which currently they are not).

Limiting wirebufs is just one such topological ordering and whether we adopt it, I think, should be independent of whether we sort the graph which we should always do.

muellmusik · July 26, 2024, 6:17pm

Well, yes, in that strict sense you don’t really have a choice as the sclang syntax forces you to do this. Nevertheless I think it’s fair to say that the intended order of execution in sclang is very clear, if sometimes ‘inside-out’. (if I understand you correctly).

jordan · July 26, 2024, 6:30pm

I meant the way the SytnhDef optimisations are written is inside out as you have to mutate the graph from inside the nodes (ugens), keeping a valid topological order as you make substitutions. The addCopies from the PV ugens is one example of this, it continually calls things like indexUGens, which are graph wide process, rather than just changing the edges and creating new nodes.

muellmusik · July 26, 2024, 7:41pm

Yeah I think that goes back to the original addToSynth. I remember being surprised when I first realised that buildSynthDef is a classvar of UGen, but I don’t know what the original rationale was, and it’s changed a lot since then.

muellmusik · July 28, 2024, 1:51pm

Actually this could be better worded. UGens with side-effects need to be correctly ordered relative to UGens which depend on either the new or the previous state. A relevant example is ‘multichannel expanded’ PV UGens. The implicit PV_Copy instances need to run before anything later in the chain modifies the buffer. (Thus width first.)

jordan · July 28, 2024, 7:11pm

This is how I’m suggesting this be solved, by inserting ‘weak’ edges.

From

To

This way, PV1 can’t be executed before the copies, but the copies can still be nicely sorted to avoid excessive wire bufs.

So an execution order might be

Buffer
PV_Copy1
PV2
PV_Copy2
PV3
PV1

Whereas strictly evaluating them width first is unnecessary. Technically, the buffer could also be recycled and the copy avoided in certain situations, but that is a little more advanced than needed here (at least for now).

smoge · July 28, 2024, 7:23pm

That’s also the case after using and adapting graph libraries in Haskell. Both start with edges to manage dependencies and construct the graphs, which is the design used in most implementations.

The main difference is that sc constructs synthesis graphs step-by-step since that’s how the synthdef dsl works. Building a graph explicitly involves clearly defining the nodes and edges in most implementations. I believe most work elsewhere is done this way.

The solution I came up with for side effects was to have a type class or just a simple function with priorities. Elsewhere, it can be used in the topo sort recursive code (go style), for example, sortedSuccessors = sortBy (comparing (priorityFn ...., etc. (of course, type classes allow more fine-tunning)

instance HasSideEffect UGen where
  hasSideEffect = \case
    PV_Copy{} -> True
    PV_MagAbove{} -> True
    Out{} -> True
    _ -> False

sideEffectPriority :: UGen -> Int
sideEffectPriority ugen = case ugen of
  PV1     -> 1
  PV_Copy -> 2
  Buffer  -> 3
  _       -> 0

EDIT: I think recycling wire buffers, especially large ones, is usually possible. The benefit of large graphs with 50+ UGens can be major.

EDIT2: This is one of the code pearls: strong components can be discovered “on the fly” during a depth-first search—good correct references: https://logarithmic.net/pfh-files/blog/01208083168/tarjan.py AND https://cs.stanford.edu/~knuth/fasc12a+.pdf#page=10 (dates back to 1972). Besides, graphs are a well-studied problem. Just take a look around and see if there is a feeling about our old code.

jordan · July 28, 2024, 7:50pm

One oddity is groups of sequential Ins or Outs, which don’t need to be ordered, but ReplaceOut does. This is perhaps important because without this stipulation, there could be limits on the ordering of multichannel branches that end in an Out.

Spacechild1 · July 28, 2024, 11:48pm

This sounds like a great and – in hindsight! – obvious solution. Some UGens have implicit interdependencies and expressing these with additional graph edges makes them visible to the graph sorting algorithm.

BTW, some of these dependencies cannot be automatically deduced and need to expressed explicitly by the user. One common example is a BufWr followed by a BufRd:

var sig = SinOsc.ar;
var phase = Phasor.ar(0, 1, 0, BufFrames.kr(buf));
var writer = BufWr.ar(sig, buf, phase);
var reader = BufRd.ar(1, buf, phase - 44100, loop: 1);

(Taken from firstArg question - #2 by jamshark70)
Nothing in the example above guarantees that BufWr is scheduled before BufRd.
Currently, the workaround is to use the (rather obscure) !< operator:

var sig = SinOsc.ar;
var phase = Phasor.ar(0, 1, 0, BufFrames.kr(buf));
var writer = BufWr.ar(sig, buf, phase);
var reader = BufRd.ar(1, buf <! writer, phase - 44100, loop: 1);

This ensures that BufWr runs before BufRd. Note that <! actually creates a BinaryOpUGen with the operator firstArg which just discards the second argument.

I think a better solution would be to have a method that adds a “weak edge” between two UGens, e.g.:

var sig = SinOsc.ar;
var phase = Phasor.ar(0, 1, 0, BufFrames.kr(buf));
var writer = BufWr.ar(sig, buf, phase);
var reader = BufRd.ar(1, buf, phase - 44100, loop: 1).after(writer);

IMO this is much easier to understand than the <! operator. Also, it wouldn’t require an additional UGen.

muellmusik · July 28, 2024, 11:56pm

I really feel like adding an extra layer of specification is a mistake here, and should be unnecessary. The user has already explicitly specified intended order. This is straightforward and intuitive. The sorting algorithm should only optimise away from that where it is guaranteed there will be no issue. Yes, you could fix an exceptional case if needed with a method like that, but not violating the principle of least surprise should be a far higher priority than say minimising wirebuf usage in my opinion.

In the vast majority of cases an extra wirebuf is harmless. Better to supply a weird exception or some guidance for super memory or CPU constrained cases if that’s really needed.

muellmusik · July 29, 2024, 12:00am

Indeed! Similarly, another tricky one might be two unconnected chains of UGens, in the same synth, one with an In and one with an Out. (I don’t have a real example, but I think it’s conceivable.) The sort must not optimise the in chain to be after the out chain, as the Out will mix with what’s already on the bus, creating feedback where it’s not intended.