Why you should always wrap Synth(...) and Synth:set in Server.default.bind { ... }

but I’ve never gotten the point of multi-client. Why not have a single server and single client, the latter of which receives and executes messages from performers? This would allow individual pieces to manage ownership of synth nodes, busses, etc. in a way that’s tailored to the work.

I totally agree. I guess multi-client setups are mainly used for free-form collaborative live coding – which is quite niche, TBH.

Same goes for single-client remote server setups: I just don’t see the benefit over a client and server on the same device, with the client receiving and forwarding messages to the server.

Once you have the controller/client and processor/server completely decoupled – which is generally a good thing – you are able to run them in different processes; the question is whether you really should. Most audio applications decide not to. In the case of SuperCollider, I think the rationale was that there is no real downside to running scsynth in a separate process by default (which I think wasn’t always the case), and for some people it even has slight upsides. Of course, there is one big downside: client and server run on different clocks, so deterministic (sub)sample-accurate scheduling is impossible. But as I sketched out in my last post, this could be solved. (I remember writing about this in more detail somewhere in the forum or on GitHub, but I can’t find it right now.)

If the context is embedded devices that only have the capacity to run scsynth, that’s more of a problem of the resource usage of sclang than a solid argument in favor of separate processes.

One nice thing about running the sclang process on the client machine is that you can use GUI objects. (You cannot run the GUI separate from sclang, at least not out of the box.)

Another point that I’m sure someone will mention is the ability to develop alternate clients in other languages. I don’t know the specifics of libscsynth, but shouldn’t it be at least theoretically possible to make an equivalent of an internal server in any language with a C FFI?

It surely is possible. I think you could already do this in Python or Lua with libscsynth. Having scsynth in another process is still useful for browser-based clients, though. On the other hand, scsynth can already be compiled for WebAssembly (Add WebAssembly (wasm) target for scsynth (rebased) by dylans · Pull Request #5571 · supercollider/supercollider · GitHub).

I think that for clients written in scripting languages it can still be a good idea to run scsynth in a separate process; otherwise a Server crash would bring down the whole interpreter. (Remember that we don’t want to lose our unsaved project after a Server crash.) Of course, this is not relevant if the “editor” part is already implemented in a dedicated process.

Yeah. I still can’t find the benefit. I made something like this:

(
fork {
	10.do {
		a = ParGroup.new;
		6.do {
			b = Group.new(a);
			10.do { { Out.ar(0, GVerb.ar(SinOsc.ar(rrand(2000, 3000), 0, 0.0001))) }.play(b) };
		};
		0.2.wait;
	}
}
)

and I can get maybe 600 of them going, vs 400 on the normal server. Versus 7 servers, where I can get 2800. I would love it if someone could show a case where supernova just blows scserver out of the water, but I haven’t been able to find that case myself.

Sam

I would love it if someone could show a case where supernova just blows scserver out of the water

Generally, supernova will never be faster than multiple servers, but ideally it should get close. You would trade some performance for much increased flexibility.

(As a side note: you should also give each Group its own Bus and only sum into the hardware outputs after all Groups have completed. The idea is to keep all data access local and avoid synchronization to achieve better scalability. Again, this is not documented…)
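A minimal sketch of that advice, adapting the example from above (the bus and group names are my own, and this is untested): each Group writes to its own private bus, and a separate summing stage placed after the ParGroup mixes into the hardware outputs.

(
a = ParGroup.new;
m = Group.after(a);  // summing stage, evaluated after the whole ParGroup
6.do {
	var bus = Bus.audio(s, 2);  // private stereo bus for this Group (GVerb is stereo)
	b = Group.new(a);
	10.do { { Out.ar(bus, GVerb.ar(SinOsc.ar(rrand(2000, 3000), 0, 0.0001))) }.play(b) };
	{ Out.ar(0, In.ar(bus, 2)) }.play(m);  // sum this Group's bus into the hardware outs
};
)

The point of the private busses is that each parallel branch only touches its own memory until the final mix.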

I don’t have time right now to test your code, but I will do it later. If you’re interested in investigating this further, can you open a new thread?


Thank you very much!
This is highly informative and helpful!

I have two questions:

  1. Could Synth be used with .onFree and .register inside `s.bind`? I see no obstacle, but I would like to know if there is something I am unaware of.

  2. Which is better when animating with the Pen class or when controlling windows: should the animation be delayed, or should the synths be left unwrapped by s.bind? There should be no significant difference, but I ask in case there are things I am not aware of.

Hmmm, not sure. Those methods are wrappers for NodeWatcher method calls. I don’t see any sendMsg in NodeWatcher.sc but there could be something hiding.

There would be a big difference. If you don’t schedule the OSC commands, their timing will be audibly worse than if you did. So I’d recommend the following:

  1. If you want accurate timing, particularly in anything beat-based, you should wrap the synths in s.bind and delay any synced visuals by s.latency seconds.
  2. If you want instant response to external input, don’t use s.bind, and display the visuals immediately.
  3. If your visuals are audio reactive, e.g. a volume meter sending messages back to sclang with SendReply, that counts as external input, so the visuals should also be displayed immediately in that case.
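For case 1, a minimal sketch (assuming a \ping SynthDef and a GUI update function of your own; both names are placeholders):

(
s.bind { Synth(\ping) };                  // \ping: a placeholder SynthDef
{ myUpdateView.value }.defer(s.latency);  // myUpdateView: your own GUI-updating function
)

The synth sounds s.latency seconds from now, and the deferred GUI update lands at the same moment.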

I don’t use GUIs in SuperCollider these days, so I’m not 100% sure I’ve covered all bases here – feel free to report back if you run into sync problems.


As far as I know, yes. Register and onFree are purely language side; there is no need to send anything to the server except for the normal Synth messages (which are produced by other methods). They wait for replies from the server, nothing else.

With makeBundle and bind, the function runs now, and you get any objects created within the function now – and also the message(s) are sent now! But the outgoing bundle is timestamped to be performed later in the server.

Second question, I agree with Nathan completely. To delay the visuals, use { ... GUI stuff ... }.defer(s.latency) (defer already is a delay mechanism – we just normally delay by 0).

hjh


Thank you for your kind answers!
I have more questions:

  1. The s.bind { ... } examples in your posts and in the Server help document use Out.ar to write the signal to the audio bus. Wouldn’t it be better, in terms of timing accuracy, to write the output of the SynthDef with OffsetOut.ar instead of Out.ar? OffsetOut.ar produces the correct sound in the following example:
(
fork { 
	SynthDef(\testOut, { |freq = 440, out = 0|
		var sig, env;
		sig = SinOsc.ar(freq) * 0.1;
		env = Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf);
		Out.ar(out, sig * env)
	}
	).add;
	
	s.sync;
	
	200.do { s.bind { Synth(\testOut) }; 0.01.wait } }
)

(
fork { 
	SynthDef(\testOffsetOut, { |freq = 440, out = 0|
		var sig, env;
		sig = SinOsc.ar(freq) * 0.1;
		env = Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf);
		OffsetOut.ar(out, sig * env)
	}
	).add;
	
	s.sync;
	
	200.do { s.bind { Synth(\testOffsetOut) }; 0.01.wait } }
)
  2. s.bind { ... } is shorter than s.makeBundle(0.2, { ... }), but it is still extra typing. Can it be wrapped in a function to reduce the typing? Wrapping s.bind { ... } in a function seems to work in the following examples, but I am not sure what will happen if the language-side algorithm or the SynthDef is more complex than in the example:
(
fork { s.bind { Synth(\testOffsetOut, [freq: 440, out: 1]) };
	0.1.wait;
	s.bind { Synth(\testOffsetOut, [freq: 660, out: 1]) };
	0.1.wait;
	s.bind { Synth(\testOffsetOut, [freq: 880, out: 1]) } 
} 
)

(
fork { 
	var synth = { |freq| s.bind { Synth(\testOffsetOut, [freq: freq]) } };
	synth.(440);
	0.1.wait;
	synth.(660);
	0.1.wait;
	synth.(880)
} 
)

(
fork { 
	var synth = { |freq| s.bind { Synth(\testOffsetOut, [freq: freq]) } };
	synth.(440);
	s.bind { Synth(\testOffsetOut, [freq: 440, out: 1]) };
	0.1.wait;
	synth.(660);
	s.bind { Synth(\testOffsetOut, [freq: 660, out: 1]) };
	0.1.wait;
	synth.(880);
	s.bind { Synth(\testOffsetOut, [freq: 880, out: 1]) } 
} 
)
  3. Can { ... }.play also be used wherever SynthDef(...).play can? I think not, because { ... }.play takes extra time to be sent to the server when the code block is evaluated. However, in the following examples, { ... }.play seems to work well as long as the sound length and repeat interval are not extremely short:
( // seems to work
fork { 
	var synth = { |freq|
		s.bind { { SinOsc.ar(freq) * 0.1 * Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf) }.play }
	};
	synth.(440);
	0.1.wait;
	synth.(660);
	0.1.wait;
	synth.(880);
	0.1.wait;
} 
)

( // does not work correctly:
fork { 
	var synth = { |freq| 
		s.bind { { SinOsc.ar(freq) * 0.1 * Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf) }.play } 
	};
	200.do { s.bind { synth.(440); 0.01.wait } }
} 
)

( // seems to work
fork { 
	var synth, funcSynth;
	
	SynthDef(\testOffsetOut_, { |freq = 440, out = 0|
		var sig, env;
		sig = SinOsc.ar(freq) * 0.1;
		env = Env.perc(0.01, 0.05, 0.2).ar(doneAction: Done.freeSelf);
		OffsetOut.ar(out, sig * env)
	}
	).add;
	
	s.sync;
	
	funcSynth = { |freq| 
		s.bind { { SinOsc.ar(freq) * 0.1 * Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf) }.play } 
	};
	synth = { |freq| 
		s.bind { Synth(\testOffsetOut_, [freq: freq, out: 1]) } 
	};
	
	funcSynth.(440);
	synth.(440);
	
	0.1.wait;
	
	funcSynth.(660);	
	synth.(660);
	
	0.1.wait;
	funcSynth.(880);
	synth.(880);
} 
)

Yes, but often it’s not critical.

If something is possible to execute outside of a function, then it’s possible to execute inside a function. (Actually everything runs inside a function. Interactive code gets compiled into a function, and then this function is executed just like any other.)

Here, it’s helpful to understand the message format instead of just regarding server abstractions as black boxes. SynthDef().play and {}.play both send a SynthDef-receive /d_recv message, with a second message (/s_new) embedded in it, to be executed when the SynthDef is ready for use. Whether this is a freestanding message or part of a bundle doesn’t matter.
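To make that concrete, here is a rough reconstruction of what gets sent (my own sketch, untested; the \ping def and the s_new arguments are made up):

(
var def = SynthDef(\ping, { Out.ar(0, SinOsc.ar(880) * 0.1 * Env.perc.ar(Done.freeSelf)) });
// /s_new is embedded as the completion message of /d_recv:
// the server performs it only once the SynthDef has been processed.
var completion = [\s_new, \ping, s.nextNodeID, 0, 1].asRawOSC;
s.sendMsg(\d_recv, def.asBytes, completion);
)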

What is odd about it is that bind is used for timing control, but, because the sounding part (/s_new) is the completion message belonging to an asynchronous command, the sounding part will not be timed precisely. So you can, but it won’t be exact (thus, not really much point to it).

hjh


3 posts were merged into an existing topic: Opinionated Advice for SuperCollider Beginners

Have you tested the same code on Linux, where supernova was developed?

Please see Why you should always wrap Synth(...) and Synth:set in Server.default.bind { ... } - #10 by Spacechild1. The fundamental issue of synchronization/scheduling overhead is the same on every OS.


Another issue is that in Supernova every Synth gets its own wire buffers and local busses, because it might execute in parallel with other Synths. This may cause significant memory overhead and cache misses. The smaller the Synths, the more pronounced the overhead. 16000 SinOsc synths is probably the point where the model breaks down… But then again, it’s not exactly a real-world test scenario :slight_smile:

However, future parallel server implementations should take this issue into account!

I always wondered why that is. The number of DSP threads is known in advance and parallelism can’t exceed this. Wouldn’t it be enough to have one set of wire buffers per DSP thread?

hjh

Synths are not pinned to specific threads. On every DSP tick, the DSP tasks are pushed to a thread pool and any DSP thread might pop and execute them. The wire buffers, however, have to be set when the Synth is created.

As a side note: this is much less of a problem when the DSP graph is fixed. In fact, I have been working on a multi-threaded version of Pd (GitHub - Spacechild1/pure-data at multi-threading) and I only need to create new signal contexts at “fork points”.

In SuperCollider, however, the DSP graph can be rearranged freely and in real-time. Tricky stuff…

I see. I’d assumed that, in one DSP tick, one synth would execute on one thread (could be a different thread next time), and there can never be more than one synth active in one thread – so I naively thought that the synth node could use wire buffers belonging to the thread. (I should also assume that Tim considered that and rejected it for some reason.)

I had quite good results from supernova in a piece where I was playing a lot of chords, although I can’t use it live because my MixerChannel sorting logic has sometimes crashed supernova due to too many group moves in rapid succession.

hjh

(I should also assume that Tim considered that and rejected it for some reason.)

I think one limiting factor was the wish to stay as compatible with scsynth as possible (which is a nice thing, of course!). Alternatively, one possible solution could be to have wire buffers per ParGroup and fix up all Units in a Graph when it is moved between Groups. But this would require a significant change in the plugin API.

It would be great if you could find a somewhat reproducible example and open a ticket on GitHub. I have already fixed a few Supernova bugs in the past, so there is a good chance I can fix this one as well. It would be great if Supernova were more stable, so that more users would feel comfortable using it in their projects and we could gather more practical experience with various forms of real-life usage.


I thought that ParGroup was a designator only of Synths that could be parallelized (e.g. don’t depend on each other), not a designator of which thread/executor they would be executed on? In which case, things inside a ParGroup are guaranteed to be executed in parallel (or at least have a high likelihood of this). But possibly I’m misunderstanding your suggestion?

I would imagine that the optimal solution would require only one set of wirebufs that would be re-used for every Synth, and in case of parallelized graph execution you would just need one set per thread/executor (not one per Synth). It’s been a while since I’ve looked at the architecture of supernova, maybe it doesn’t follow this - but in any case it should be at least theoretically the best option.

In my (very anecdotal) experience, supernova can be lower overhead for high UGen count synths (meaning: lots of SinOscs, e.g. for additive). I haven’t tested lots of independent synths, though. I can’t imagine this would be significantly different for non-threaded/no-ParGroup cases – Tim is a performance nerd and would never have released that :slight_smile: – but I could easily see a case where the cost of doing queue operations could overtake the cost of a trivial single SinOsc synth, in which case the performance would be noticeably worse.

That’s correct. I phrased that sentence very poorly. What I meant was:

One possible solution could be having dedicated wire buffers for each toplevel Node within a ParGroup, while child nodes reuse the wire buffers of their parent. This would guarantee that wire buffers are isolated, while avoiding the overhead of separate wire buffers for each and every Synth. However, it would require fixing up all Units in a Graph whenever it (or one of its parents) is moved between Groups.

I would imagine that the optimal solution would require only one set of wirebufs that would be re-used for every Synth, and in case of parallelized graph execution you would just need one set per thread/executor (not one per Synth)

For simple chains, that would be the ideal solution indeed. Unfortunately, Synths are graphs, so in practice you would need to traverse the SynthDef (more specifically, its unit specs and corresponding wire specs) and fix up all (audio-rate) wires for every Synth on every DSP tick - which would be prohibitively expensive.

but I could easily see a case where the cost of doing queue operations could overtake the cost of a trivial single SinOsc synth, in which case the performance would be noticeably worse.

Yes, that’s exactly what I think happens when you try to run 16000 SinOsc Synths in a single ParGroup. (I want to do some benchmarks, actually.)

I think there is some low-hanging fruit for optimization. Currently, Supernova makes the pessimistic assumption that a ParGroup may contain wildly different Synths/Nodes, so they are all scheduled as individual tasks. However, if all the Synths/Nodes are roughly equal in terms of CPU cost, it would be better to partition them into N tasks, where N is the number of DSP threads. On the user-facing side, this could be implemented as an additional (optional) argument for ParGroup. I have already put this on my TODO list :slight_smile:


Yes, looking over the wire implementations, I think I was less clear on how this worked than I thought. I had imagined that the wirebufs were effectively storage for temporaries while calculating the Synth graph (where e.g. a totally linear chain like SinOsc.ar(SinOsc.ar(SinOsc.ar(440))) would be aliased into a single wire, not counting constants), and that UGen inputs and outputs were pre-calculated during the graph build process as indexes into this array of wires. The actual implementation doesn’t look exactly like this; I need to refresh my memory a bit here…

It already feels like a mistake that ParGroup is a user-facing object (graph partitioning afaik is a problem with a clear solution, or at least a solution that will do as well as or better than any manual partitioning strategy a SuperCollider user might cook up – granted, given the design of the server, this is still tough to solve). I wonder if there’s a better / less manual approach here? Wouldn’t it be enough to sort the ParGroup by SynthDef when the graph is changed, and then have threads pull multiple nodes from the queue in cases where the total node count for a given SynthDef is significantly larger than the number of threads? The worst case of the sort could potentially ruin any performance benefits here, but there may be a pragmatic path to making this performant enough. It’s hard to imagine any solution that doesn’t make use of some kind of sort operation, however.

I guess there’s a meta-consideration here, which is that both SuperCollider servers are fundamentally not set up to efficiently process node counts in the many-thousands. I wonder if it’s worth the effort to optimize the “thousands of SinOsc Synths” case when this will always be highly non-optimal - and the general advice to any user would be to ABSOLUTELY avoid this. :slight_smile:

I had imagined that the wirebufs were effectively storage for temporaries while calculating the Synth graph (where e.g. a totally linear chain like SinOsc.ar(SinOsc.ar(SinOsc.ar(440))) would be aliased into a single wire, not counting constants), and that UGen inputs and outputs were pre-calculated during the graph build process as indexes into this array of wires.

That sounds about right. The actual wire buffers are just pointers into one contiguous array. In scsynth, the same array is used for all Synths and lives in HiddenWorld. In Supernova, each Synth has its own array.

Now, if you wanted to change the underlying array for a particular set of Synths, you would need to fixup all UGens so that the individual buffers point into the new array. For this you’d have to traverse the entire graph and lookup mWireIndex in the input and output specs of each Unit. You can probably get away with it if you only do it occasionally, but it’s not something you would do on each process tick.

It already feels like a mistake that ParGroup is a user-facing object (graph partitioning afaik is a problem with a clear solution).

In my understanding, graph partitioning only works if all the connections between nodes are static and visible. This is typically the case in a DAW, as VST plugins are only connected through their audio input and outputs. (Users can, of course, change the routing, in which case the graph would need to be recomputed.)

In scsynth/supernova, however, Nodes – or rather their UGens – communicate via busses, and bus indexes can be set dynamically. Moreover, I/O UGens are effectively black boxes, just like any other UGen, so the Server is not even aware of them.

IMO, ParGroup makes a lot of sense. One general problem with scsynth is that when you look at a Node, it is not immediately clear whether its children are supposed to run in series or parallel. ParGroup effectively says that all children are independent from each other, i.e. conceptually they run in parallel. Group, on the other hand, implies that children may run in series. Supernova happens to use this information to enable multiprocessing where possible, but in general I think it also helps to clarify the structure of the graph. If you mentally substitute Group with SerGroup, ParGroup starts to make more sense :slight_smile:
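In code, that reading might look like this (my own sketch; \voice is a hypothetical SynthDef):

(
var par = ParGroup.new;          // children are independent: may run in parallel
8.do { |i|
	var chain = Group.new(par);  // within one chain, order matters: runs in series
	Synth.tail(chain, \voice, [freq: 200 * (i + 1)]);  // \voice: placeholder
};
)

The structure itself documents which nodes are allowed to run concurrently, whether or not the server actually parallelizes them.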

One slight issue I have with Supernova’s multiprocessing is that it only supports “fork/join” multiprocessing. Another approach is “asynchronous pipelining”, which also allows parallelizing serial signal chains, albeit with one-block delays. My experimental multi-threaded Pd fork (GitHub - Spacechild1/pure-data at multi-threading) actually supports both.

Actually, Tim discusses and evaluates several multiprocessing strategies in his (amazing) master’s thesis. I cannot recommend it enough. It’s a joy to read!

Wouldn’t it be enough to sort the ParGroup by SynthDef

ParGroup may contain other Groups. You could compare them recursively to figure out whether they are equivalent, but I have a feeling it would be better to just let the user tell the Server…

I guess there’s a meta-consideration here, which is that both SuperCollider servers are fundamentally not set up to efficiently process node counts in the many-thousands.

Yes! One thing that would help is true multi-channel processing à la Max/MSP (and the upcoming Pd 0.54!). Sclang’s “multi-channel expansion” tends to hide the fact that the Server is fundamentally single-channel. Multi-channel processing not only improves cache locality, it also makes it possible to vectorize operations that would otherwise be impossible to vectorize, such as oscillators or filters. With proper AVX instructions you can effectively compute 8 oscillators for the price of 1 (well, almost :slight_smile: ).