Proposal: SynthDefs could eliminate duplicate UGens

Do Sum3 and simd_unit guarantee memory alignment?

First, I’m not sure what SIMD has to do with the topic at hand. Second, you can trust that the people who wrote the SIMD code knew what they were doing. (Yes, wire buffers are always properly aligned for SIMD.)

All ugens would need to implement those optimizations

UGens themselves don’t need to implement anything. These optimizations are done purely in the language by the SynthDef builder. In the case of Sum3 and Sum4, a chain of two or three BinaryOpUGens (+) is replaced by a single UGen; the reasoning is that summing the three or four inputs in a single process function is much more efficient (fewer function calls, better pipelining).
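For example, summing four control inputs collapses the three '+' operations into a single Sum4 (output abridged; the exact ordering of the Sum4 inputs in the dump may differ):

SynthDef(\sum4, {
	|a, b, c, d|
	Out.kr(0, a + b + c + d)
}).dumpUGens

// something like:
// [0_Control, control, nil]
// [1_Sum4, control, [0_Control[3], 0_Control[2], 0_Control[1], 0_Control[0]]]
// [2_Out, control, [0, 1_Sum4]]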

build with LTO

Why would you need to build with LTO? In fact, LTO is practically irrelevant for most plugins because they typically consist of a single source file.

I think James’ method is safer since I’m not sure all elements will always be polished. Of course, that’s a guess.

It’s not an either-or question; these are completely different optimizations. It just happens that there are specific cases where they appear to conflict with each other.

where you say that a chain of binary compiled agents constructs a chain

I did not say that. Again, these optimizations happen in the language during SynthDef construction. Scsynth itself does no graph optimizations at all. Hence all this discussion about SIMD, compiler flags, etc. is moot, because it obviously doesn’t apply to sclang.

I would suggest actually studying the relevant parts of the ClassLibrary (in particular optimizeGraph).

I didn’t say that.

Ok, so it functions as Mix. supercollider/SCClassLibrary/Common/Audio/Mix.sc at develop · supercollider/supercollider · GitHub
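For instance, mixing three control inputs reduces to the same Sum3 you would get from writing the additions out by hand (output abridged and approximate):

SynthDef(\mix3, {
	|a, b, c|
	Out.kr(0, Mix([a, b, c]))
}).dumpUGens

// [0_Control, control, nil]
// [1_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
// [2_Out, control, [0, 1_Sum3]]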

I misunderstood what Sum3 was, since the word optimization was circulating around, and I may have been confused about it.

(sorry, no time to check the codebase for that now)

word optimization was circulating around

It was not “circulating around”; James proposed a specific optimization with a working implementation. If you had taken a quick look at the actual code, you would have noticed that it was not about C++ at all.

Don’t jump to quick conclusions! It tends to derail the discussion.

Mix is bread and butter in SC. I didn’t provide any absurd information per se; these are also valid ways to do things, even if they aren’t in the codebase today. Sorry if I disturbed the discussion (although I think there is no need for exaggeration in public messages). I should also apologize to the SC community, which I love so much, for my suboptimal behavior in this discussion.

But it’s also about defining what is redundant, and at what level.

The admittedly flawed benchmark I posted suggests that the number of UGen calls should be minimised, as the overhead of calling something outweighs any cache optimisation; that said, it is a rather limited benchmark and doesn’t quite replicate how the cache would behave in practice.

SynthDef(\s, {
	|a, b, c|
	var sig = a + b + c;
	Out.kr(0, sig)
}).dumpUGens

[0_Control, control, nil]
[1_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
[2_Out, control, [0, 1_Sum3]]

Since your optimisation pass happens after the UGens are created, it seems fine in the above case, and the question I raised doesn’t affect anything.

One interesting case, though not directly related to this proposal, is when the user has already abstracted out the common operation. For the two examples below, the (admittedly flawed) benchmark I posted would suggest that the second is about 1/6 faster.

SynthDef(\s, {
	|a, b, c, d|
	var sig = a + b;
	Out.kr(1, sig + d);
	Out.kr(0, sig + c)
}).dumpUGens

[0_Control, control, nil]
[1_+, control, [0_Control[0], 0_Control[1]]]
[2_+, control, [1_+, 0_Control[3]]]
[3_Out, control, [1, 2_+]]
[4_+, control, [1_+, 0_Control[2]]]
[5_Out, control, [0, 4_+]]

SynthDef(\s, {
	|a, b, c, d|
	Out.kr(1, a + b + d);
	Out.kr(0, a + b + c)
}).dumpUGens

[0_Control, control, nil]
[1_Sum3, control, [0_Control[3], 0_Control[1], 0_Control[0]]]
[2_Out, control, [1, 1_Sum3]]
[3_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
[4_Out, control, [0, 3_Sum3]]

There could be a compiler pass to transform one to the other. In this case it is relatively simple and could be done with a little bit of regex.

I’ve worked directly with graph representations, which have been studied extensively. I would suggest using a graph representation for anything like this, really. Regex for this kind of thing?

I only meant it would be an easy transformation…
Edit: although checking that the result isn’t used anywhere afterwards might be a bit hard, and I’m not sure SuperCollider already has this functionality built in.
Edit 2: nope, this is harder than I thought…

Yes, pattern matching on a graph.

Example of simplified pattern matching (not actual working code, just a simple illustration; for type-safe pattern matching, one could use type classes to define the methods and transformations, and a UGen GADT could also help encapsulate the types, etc.):

-- Minimal stand-in type so the sketch compiles; the real graph would have
-- many more constructors.
data UGen
  = Const Double
  | AddUGen UGen UGen
  | MulUGen UGen UGen
  | Sum3 UGen UGen UGen
  | Other String [UGen]
  deriving (Show)

optimizeUGen :: UGen -> UGen
-- identities and constant folding
optimizeUGen (AddUGen (Const 0) u) = optimizeUGen u
optimizeUGen (AddUGen u (Const 0)) = optimizeUGen u
optimizeUGen (MulUGen (Const 1) u) = optimizeUGen u
optimizeUGen (MulUGen u (Const 1)) = optimizeUGen u
optimizeUGen (MulUGen (Const 0) _) = Const 0
optimizeUGen (MulUGen _ (Const 0)) = Const 0
optimizeUGen (AddUGen (Const a) (Const b)) = Const (a + b)
optimizeUGen (MulUGen (Const a) (Const b)) = Const (a * b)

-- collapse nested additions into Sum3 (these must come before the generic
-- AddUGen case below, otherwise they would never match)
optimizeUGen (AddUGen (AddUGen a b) c) = Sum3 (optimizeUGen a) (optimizeUGen b) (optimizeUGen c)
optimizeUGen (AddUGen a (AddUGen b c)) = Sum3 (optimizeUGen a) (optimizeUGen b) (optimizeUGen c)

-- simple recursion examples
optimizeUGen (AddUGen u1 u2) =
  let ou1 = optimizeUGen u1
      ou2 = optimizeUGen u2
  in case (ou1, ou2) of
       (Const a, Const b) -> Const (a + b)
       _ -> AddUGen ou1 ou2

optimizeUGen (MulUGen u1 u2) =
  let ou1 = optimizeUGen u1
      ou2 = optimizeUGen u2
  in case (ou1, ou2) of
       (Const a, Const b) -> Const (a * b)
       _ -> MulUGen ou1 ou2

optimizeUGen ugen = ugen

-- e.g. optimizeUGen (AddUGen (AddUGen (Other "a" []) (Other "b" [])) (Other "c" []))
--      ==> Sum3 (Other "a" []) (Other "b" []) (Other "c" [])

Feel free to have a go implementing this.


I did already. I posted the results and benchmarks. What precisely are you referring to?

Turning this…

… into this…

Sorry, I didn’t realise this had been done before! Got a link? I would love to see how it’s done and how it updates all the descendants.

The snippet updates the entire graph (everything is done with recursion in Haskell; it’s really nothing special there). I did this with my own “ugens”, which I wrote from scratch in Haskell and C++. We can have a video call about it. I design better in Haskell, but it’s possible to just use C++ together with the topological sorting, buffer pool, and other things I wrote in C++.

Please, I’m not sure if there was sarcasm in your message, but let’s learn from each other; that’s better.

Ah, that’s cheating! The challenge is to do it on the 20-year-old legacy code.

No, I’m not being sarcastic, but there has perhaps been a miscommunication. I thought the point of this thread was to improve SC without breaking backwards compatibility (given it’s tagged as development), not to design a new system from scratch, which, of course, with hindsight, would make this task significantly easier to achieve.

It’s not about easier or harder. The idea that worked in my toy engine also fits the GraphDef structure; it’s the same. That’s my starting point. I may add something more, but that’s irrelevant here. The matter under discussion is working with graphs and using specific techniques.

That’s development and experimentation on the same data structure. Thus, the same algorithms will work in Haskell, C++20, or C99. If you can’t touch the GraphDef, that’s a design issue that needs to be resolved.

Sure, I’m just trying to keep this discussion on track, which is about a specific proposal that targets the current codebase. That being said, there are indeed other ways of optimizing the SynthDef graph – I have my own thoughts about this topic – but this should be discussed in another thread.

The only thing I did was suggest pattern matching on graphs instead of regex, which is, frankly speaking, a bit too hackish. If that doesn’t add anything, I don’t know what does.

I’m not pushing any particular idea; it just seems like a good approach for this.

I will leave this thread now. Thanks

… continuing as normal I guess??


Perhaps this could be its own topic, but I think it’s deeply connected to what is being discussed; let me know and I’ll move it if needed!

I’ve now figured out a little better how the optimisation pass works, and I have been able to get it to reduce more additions into Sum3s and Sum4s.

It was a simple matter of altering BinaryOpUGen:optimize(Sum3|Sum4|MulAdd...) to:

  1. stop checking that there is only one descendant
  2. stop removing the ‘input’.
  3. rerun optimizeGraph in SynthDef to retrigger the dead-code elimination.

This runs the risk of creating duplicates, which dovetails nicely with the rest of @jamshark70’s code!

I think the optimisation passes could be rewritten. It’s very odd that the graph (SynthDef) delegates to the nodes (UGen) to do all of the optimisation and mutate the state of the entire graph.

Dead code elimination is one such optimisation that could probably be done at the graph level.
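
For instance, a graph-level pass could simply mark everything reachable from the sink UGens and drop the rest. A rough sketch, not the existing class-library code: `isOutput` is a hypothetical predicate for sinks (Out, ReplaceOut, and friends), and details such as OutputProxy handling are glossed over.

// Rough sketch only: `children` is assumed to be the SynthDef's UGen
// array and `isOutput` a hypothetical predicate for sink UGens; the
// actual class library tracks this differently.
~removeDeadUGens = { |children, isOutput|
	var live = IdentitySet.new;
	var visit;
	visit = { |ugen|
		if (live.includes(ugen).not) {
			live.add(ugen);
			// walk backwards through any inputs that are themselves UGens
			ugen.inputs.do { |in|
				if (in.isKindOf(UGen)) { visit.(in) };
			};
		};
	};
	// mark everything reachable from a sink
	children.do { |ugen| if (isOutput.(ugen)) { visit.(ugen) } };
	// keep only the reachable UGens
	children.select { |ugen| live.includes(ugen) };
};

The appeal is that such a pass only reads the graph and returns a new children list, rather than each UGen mutating the SynthDef from the inside.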


Here are some examples.

A clear win.

SynthDef(\s, {
	|a, b, c, d|
	var sig = a + b;
	Out.kr(1, sig + d);
	Out.kr(0, sig + c);
}).dumpUGens

// Old
[0_Control, control, nil]
[1_+, control, [0_Control[0], 0_Control[1]]]
[2_+, control, [1_+, 0_Control[3]]]
[3_Out, control, [1, 2_+]]
[4_+, control, [1_+, 0_Control[2]]]
[5_Out, control, [0, 4_+]]

// New
[0_Control, control, nil]
[1_Sum3, control, [0_Control[3], 0_Control[1], 0_Control[0]]]
[2_Out, control, [1, 1_Sum3]]
[3_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
[4_Out, control, [0, 3_Sum3]]

A draw, perhaps?

SynthDef(\s, {
	|a, b, c, d|
	var sig = a + b;
	Out.kr(1, sig + d);
	Out.kr(0, sig + c);
	Out.kr(4, sig);
}).dumpUGens

// old
[0_Control, control, nil]
[1_+, control, [0_Control[0], 0_Control[1]]]
[2_+, control, [1_+, 0_Control[3]]]
[3_Out, control, [1, 2_+]]
[4_+, control, [1_+, 0_Control[2]]]
[5_Out, control, [0, 4_+]]
[6_Out, control, [4, 1_+]]

// new
[0_Control, control, nil]
[1_+, control, [0_Control[0], 0_Control[1]]]
[2_Out, control, [4, 1_+]]
[3_Sum3, control, [0_Control[3], 0_Control[1], 0_Control[0]]]
[4_Out, control, [1, 3_Sum3]]
[5_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
[6_Out, control, [0, 5_Sum3]]

A clear win, but it needs @jamshark70’s duplicate removal to improve further.

SynthDef(\t, {
	|a, b, c, d|
	var com = a + b;
	Out.kr(0, com + c);
	Out.kr(1, com + c);
}).dumpUGens;

// old
[0_Control, control, nil]
[1_+, control, [0_Control[0], 0_Control[1]]]
[2_+, control, [1_+, 0_Control[2]]]
[3_Out, control, [0, 2_+]]
[4_+, control, [1_+, 0_Control[2]]]
[5_Out, control, [1, 4_+]]

// new 
[0_Control, control, nil]
[1_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
[2_Out, control, [0, 1_Sum3]]
duplicate>> [3_Sum3, control, [0_Control[2], 0_Control[1], 0_Control[0]]]
[4_Out, control, [1, 3_Sum3]]

Again, all this is based on the assumption that Sum3 and Sum4 are cheap and that UGen calls are expensive.


Nice work!

I have thought the same thing! If the optimization passes were done “on” the UGens instead of “in” the UGens, it would be much easier to reason about. Whenever I look at the current graph optimization code, my head immediately begins to hurt :smiley:

Perhaps this could be its own topic

IMO it would warrant a new topic, also because this thread is already quite long.