Why you should always wrap Synth(...) and Synth:set in Server.default.bind { ... }

I would love this to be the solution. But in practice I just don’t get great performance with Supernova. I can run 16000+ SinOsc synths on a single sc_server without any distortion. A ParGroup on supernova can’t even play 2000. Maybe I’m doing it wrong, but proper documentation is…lacking.

I agree the multi-server approach is a kludge, but it is a very powerful one. Proper multi-threading would certainly be welcome though.

Sam

Thanks Christof, very much appreciate your expertise on this.

Maybe this is because I avoid networked setups in my practice in the interest of minimizing complexity and failure points, but I’ve never gotten the point of multi-client. Why not have a single server and single client, the latter of which receives and executes messages from performers? This would allow individual pieces to manage ownership of synth nodes, busses, etc. in a way that’s tailored to the work. Does anyone use multi-client setups outside of the live coding community?

Same goes for single-client remote server setups, I just don’t see the benefit over a client and server on the same device, with the client receiving and forwarding messages to the server. If the context is embedded devices that only have the capacity to run scsynth, that’s more of a problem of the resource usage of sclang than a solid argument in favor of separate processes.

Another point that I’m sure someone will mention is the ability to develop alternate clients in other languages. I don’t know the specifics of libscsynth, but shouldn’t it be at least theoretically possible to make an equivalent of an internal server in any language with a C FFI?

I can run 16000+ SinOsc synths on a single sc_server without any distortion. A ParGroup on supernova can’t even play 2000. Maybe I’m doing it wrong, but proper documentation is…lacking.

Dispatching a Node to a helper thread has a (roughly) constant cost. If your Synths are very lightweight, this cost can easily outweigh the benefits of parallelization. Instead of putting all your tiny Synths directly in a ParGroup, distribute them into a few Groups and put those into the ParGroup. The Groups will be executed in parallel, but the Synths inside each Group will run sequentially.

This is called “data partitioning” and it is an important concept for writing parallel programs. The basic idea is to minimize the ratio between dispatching/synchronization cost and actual workload.
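A minimal sketch of this partitioning, assuming the built-in \default SynthDef and 4 DSP threads (both the thread count and the Synth counts are illustrative):

```supercollider
// Instead of 1000 Synths directly in the ParGroup, make one Group per
// DSP thread and give each Group a share of the Synths. The Groups run
// in parallel; Synths inside each Group run sequentially.
(
var par = ParGroup.new;
var numThreads = 4; // assumption: 4 DSP threads
numThreads.do {
    var g = Group.new(par);
    250.do { Synth(\default, [freq: exprand(200, 2000)], g) };
};
)
```

This keeps the per-dispatch overhead amortized over 250 Synths instead of paying it once per Synth.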

Unfortunately, the documentation of ParGroup is very sparse and does not really explain how to use it effectively…

but I’ve never gotten the point of multi-client. Why not have a single server and single client, the latter of which receives and executes messages from performers? This would allow individual pieces to manage ownership of synth nodes, busses, etc. in a way that’s tailored to the work.

I totally agree. I guess multi-client setups are mainly used for free-form collaborative live coding – which is quite niche, TBH.

Same goes for single-client remote server setups, I just don’t see the benefit over a client and server on the same device, with the client receiving and forwarding messages to the server.

Once you have the controller/client and processor/server completely decoupled – which is generally a good thing – you are able to run them in different processes; the question is whether you really should. Most audio applications decide not to. In the case of SuperCollider, I think the rationale was that there is no real downside to running scsynth in a separate process by default (which I think wasn’t always the case), and for some people it even has slight upsides. Of course, there is one big downside: client and server run on different clocks, so it is impossible to achieve deterministic (sub)sample-accurate scheduling. But as I sketched out in my last post, this could be solved. (I remember I have written about this in more detail somewhere in the forum or on GitHub, but I can’t find it right now.)

If the context is embedded devices that only have the capacity to run scsynth, that’s more of a problem of the resource usage of sclang than a solid argument in favor of separate processes.

One nice thing about running the sclang process on the client machine is that you can use GUI objects. (You cannot run the GUI separate from sclang, at least not out of the box.)

Another point that I’m sure someone will mention is the ability to develop alternate clients in other languages. I don’t know the specifics of libscsynth, but shouldn’t it be at least theoretically possible to make an equivalent of an internal server in any language with a C FFI?

It surely is possible. I think you could already do this in Python or Lua with libscsynth. Having scsynth in another process is still useful for browser-based clients, though. On the other hand, scsynth can already be compiled for WebAssembly (Add WebAssembly (wasm) target for scsynth (rebased) by dylans · Pull Request #5571 · supercollider/supercollider · GitHub).

I think that for clients written in scripting languages it can still be a good idea to run scsynth in a separate process; otherwise a Server crash would bring down the whole interpreter. (Remember that we don’t want to lose our unsaved project after a Server crash.) Of course, this is not relevant if the “editor” part is already implemented in a dedicated process.

Yeah. I still can’t find the benefit. I made something like this:

(
fork {
	10.do {
		a = ParGroup.new;
		6.do {
			b = Group.new(a);
			10.do { { Out.ar(0, GVerb.ar(SinOsc.ar(rrand(2000, 3000), 0, 0.0001))) }.play(b) };
		};
		0.2.wait;
	}
}
)

and I can maybe get 600 of them going, versus 400 on the normal server, and versus 2800 across 7 servers. I would love it if someone could show a case where supernova just blows scserver out of the water, but I haven’t been able to find that case myself.

Sam

I would love it if someone could show a case where supernova just blows scserver out of the water

Generally, supernova will never be faster than multiple servers, but ideally it should get close. You would trade some performance for much increased flexibility.

(As a side note: you should also give each Group its own Bus and only sum into the hardware outputs after all Groups have completed. The idea is to keep all data access local and avoid synchronization to achieve better scalability. Again, this is not documented…)
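A sketch of that per-group bus layout. The \tone and \mix SynthDefs are hypothetical stand-ins: \tone is assumed to write its signal to its `out` argument, and \mix is assumed to copy (sum) its `in` bus to the hardware outputs:

```supercollider
// Each parallel Group writes to its own private Bus (local data access,
// no synchronization between Groups); a mixing stage after the ParGroup
// sums everything into the hardware outputs.
(
var par = ParGroup.new;
var mixGroup = Group.after(par); // runs after all parallel Groups have completed
4.do {
	var bus = Bus.audio(s, 2); // one private bus per Group
	var g = Group.new(par);
	100.do { Synth(\tone, [out: bus], g) };
	Synth(\mix, [in: bus, out: 0], mixGroup); // sum into hardware outs afterwards
};
)
```

The point is that no two parallel Groups ever touch the same bus, so their execution order within a tick doesn’t matter.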

I don’t have time right now to test your code, but I will do it later. If you’re interested in investigating this further, can you open a new thread?


Thank you very much!
This is highly informative and helpful!

I have two questions:

  1. Could Synth be used with .onFree and .register inside s.bind { ... }? I assume there is no obstacle, but I would like to know if there are things I am unaware of.

  2. When working with animation using the Pen class, or when controlling windows, which is better: should the animation be delayed, or should the synths not be wrapped in s.bind? There should be no significant difference, but I ask to be sure in case there are things I am not aware of.

Hmmm, not sure. Those methods are wrappers for NodeWatcher method calls. I don’t see any sendMsg in NodeWatcher.sc but there could be something hiding.

There would be a big difference. If you don’t schedule the OSC commands, their timing will be audibly worse than if you did. So I’d recommend the following:

  1. If you want accurate timing, particularly in anything beat-based, you should wrap the synths in s.bind and delay any synced visuals by s.latency seconds.
  2. If you want instant response to external input, don’t use s.bind, and display the visuals immediately.
  3. If your visuals are audio reactive, e.g. a volume meter sending messages back to sclang with SendReply, that counts as external input, so the visuals should also be displayed immediately in that case.

I don’t use GUIs in supercollider these days so I’m not 100% sure I’ve covered all bases here, feel free to report back if you run into sync problems.


As far as I know, yes. Register and onFree are purely language side; there is no need to send anything to the server except for the normal Synth messages (which are produced by other methods). They wait for replies from the server, nothing else.

With makeBundle and bind, the function runs now, and you get any objects created within the function now – and also the message(s) are sent now! But the outgoing bundle is timestamped to be performed later in the server.

On the second question, I agree with Nathan completely. To delay the visuals, use { ... GUI stuff ... }.defer(s.latency) (defer is already a delay mechanism – we just normally delay by 0).
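Put together, the audio/visual sync pattern might look like this (a sketch: \ping is a hypothetical SynthDef and `window` a hypothetical existing Window):

```supercollider
// Audio is scheduled s.latency seconds ahead by s.bind; deferring the
// GUI update by the same amount keeps sound and visuals lined up.
(
s.bind { Synth(\ping) };             // sounds s.latency seconds from now
{ window.refresh }.defer(s.latency); // GUI update delayed to match
)
```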

hjh


Thank you for your kind answers!
I have more questions:

  1. The s.bind { ... } examples in your posts and in the Server help document use Out.ar to write the signal to the audio bus. Wouldn’t it be better, in terms of timing accuracy, to write the output of the SynthDef with OffsetOut.ar instead of Out.ar? OffsetOut.ar produces the correct sound in the following example:
(
fork { 
	SynthDef(\testOut, { |freq = 440, out = 0|
		var sig, env;
		sig = SinOsc.ar(freq) * 0.1;
		env = Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf);
		Out.ar(out, sig * env)
	}
	).add;
	
	s.sync;
	
	200.do { s.bind { Synth(\testOut) }; 0.01.wait } }
)

(
fork { 
	SynthDef(\testOffsetOut, { |freq = 440, out = 0|
		var sig, env;
		sig = SinOsc.ar(freq) * 0.1;
		env = Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf);
		OffsetOut.ar(out, sig * env)
	}
	).add;
	
	s.sync;
	
	200.do { s.bind { Synth(\testOffsetOut) }; 0.01.wait } }
)
  2. s.bind { ... } is shorter than s.makeBundle(0.2, { ... }), but it is still extra typing. Can it be enclosed in a function to reduce the typing? The s.bind { ... } enclosed in a function seems to work in the following examples, but I am not sure what will happen if the language-side algorithms or SynthDefs are more complex than the example:
(
fork { s.bind { Synth(\testOffsetOut, [freq: 440, out: 1]) };
	0.1.wait;
	s.bind { Synth(\testOffsetOut, [freq: 660, out: 1]) };
	0.1.wait;
	s.bind { Synth(\testOffsetOut, [freq: 880, out: 1]) } 
} 
)

(
fork { 
	var synth = { |freq| s.bind { Synth(\testOffsetOut, [freq: freq]) } };
	synth.(440);
	0.1.wait;
	synth.(660);
	0.1.wait;
	synth.(880)
} 
)

(
fork { 
	var synth = { |freq| s.bind { Synth(\testOffsetOut, [freq: freq]) } };
	synth.(440);
	s.bind { Synth(\testOffsetOut, [freq: 440, out: 1]) };
	0.1.wait;
	synth.(660);
	s.bind { Synth(\testOffsetOut, [freq: 660, out: 1]) };
	0.1.wait;
	synth.(880);
	s.bind { Synth(\testOffsetOut, [freq: 880, out: 1]) } 
} 
)
  3. Can { ... }.play also be used where SynthDef(...).play can be used? I think not, because { ... }.play takes extra time to be sent to the server when the code block is evaluated. However, in the following examples, { ... }.play seems to work well when the sound length and repeat interval are not extremely short:
( // seems to work
fork { 
	var synth = { |freq|
		s.bind { { SinOsc.ar(freq) * 0.1 * Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf) }.play }
	};
	synth.(440);
	0.1.wait;
	synth.(660);
	0.1.wait;
	synth.(880);
	0.1.wait;
} 
)

( // does not work correctly:
fork { 
	var synth = { |freq| 
		s.bind { { SinOsc.ar(freq) * 0.1 * Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf) }.play } 
	};
	200.do { s.bind { synth.(440); 0.01.wait } }
} 
)

( // seems to work
fork { 
	var synth, funcSynth;
	
	SynthDef(\testOffsetOut_, { |freq = 440, out = 0|
		var sig, env;
		sig = SinOsc.ar(freq) * 0.1;
		env = Env.perc(0.01, 0.05, 0.2).ar(doneAction: Done.freeSelf);
		OffsetOut.ar(out, sig * env)
	}
	).add;
	
	s.sync;
	
	funcSynth = { |freq| 
		s.bind { { SinOsc.ar(freq) * 0.1 * Env.perc(0.01, 0.05, 0.2).ar(Done.freeSelf) }.play } 
	};
	synth = { |freq| 
		s.bind { Synth(\testOffsetOut_, [freq: freq, out: 1]) } 
	};
	
	funcSynth.(440);
	synth.(440);
	
	0.1.wait;
	
	funcSynth.(660);	
	synth.(660);
	
	0.1.wait;
	funcSynth.(880);
	synth.(880);
} 
)

Yes, but often it’s not critical.

If something is possible to execute outside of a function, then it’s possible to execute inside a function. (Actually everything runs inside a function. Interactive code gets compiled into a function, and then this function is executed just like any other.)

Here, it’s helpful to understand the message format instead of just regarding server abstractions as black boxes. SynthDef().play and {}.play both send a SynthDef-receive /d_recv message, with a second message (/s_new) embedded in it, to be executed when the SynthDef is ready for use. Whether this is a freestanding message or part of a bundle doesn’t matter.

What is odd about it is that bind is used for timing control; but because the sounding part (/s_new) is the completion message belonging to an asynchronous command, it will not be timed precisely. So you can, but it won’t be exact (thus, not really much point to it).

hjh



Have you tested the same code on Linux, where supernova was developed?

Please see Why you should always wrap Synth(...) and Synth:set in Server.default.bind { ... } - #10 by Spacechild1. The fundamental issue of synchronization/scheduling overhead is the same on every OS.


Another issue is that on Supernova every Synth gets its own wire buffers and local busses, because it might execute in parallel with other Synths. This can cause significant memory overhead and cache misses. The smaller the Synths, the more pronounced the overhead. 16000 SinOsc synths is probably the point where the model breaks down… But then again, it’s not exactly a real-world test scenario :slight_smile:

However, future parallel server implementations should take this issue into account!

I always wondered why that is. The number of DSP threads is known in advance and parallelism can’t exceed this. Wouldn’t it be enough to have one set of wire buffers per DSP thread?

hjh

Synths are not pinned to specific threads. On every DSP tick, the DSP tasks are pushed to a thread pool and any DSP thread might pop and execute them. The wire buffers, however, have to be set when the Synth is created.

As a side note: this is much less of a problem when the DSP graph is fixed. In fact, I have been working on a multi-threaded version of Pd (GitHub - Spacechild1/pure-data at multi-threading) and I only need to create new signal contexts at “fork points”.

In SuperCollider, however, the DSP graph can be rearranged freely and in real-time. Tricky stuff…

I see. I’d assumed that, in one DSP tick, one synth would execute on one thread (it could be a different thread next time), and there can never be more than one synth active on one thread – so I naively thought that the synth node could use wire buffers belonging to the thread. (I should also assume that Tim considered that and rejected it for some reason.)

I had quite good results from supernova in a piece where I was playing a lot of chords, although I can’t use it live because my MixerChannel sorting logic has sometimes crashed supernova due to too many group moves in rapid succession.

hjh

(I should also assume that Tim considered that and rejected it for some reason.)

I think one limiting factor was the wish to stay as compatible with scsynth as possible (which is a nice thing, of course!). Alternatively, one possible solution could be to have wire buffers per ParGroup and fix up all Units in a Graph when moved between Groups. But this would require a significant change to the plugin API.

It would be great if you could find a somewhat reproducible example and open a ticket on GitHub. I have already fixed a few Supernova bugs in the past, so there is a good chance I can fix this one as well. It would be great if Supernova were more stable, so that more users would feel comfortable using it in their projects and we could gather more practical experience with various forms of real-life usage.


I thought that ParGroup was only a designator of Synths that can be parallelized (i.e. that don’t depend on each other), not a designator of which thread/executor they will be executed on? In that case, things inside a ParGroup are guaranteed to be executed in parallel (or at least are very likely to be). But possibly I’m misunderstanding your suggestion?

I would imagine that the optimal solution would require only one set of wirebufs, re-used for every Synth; in the case of parallelized graph execution you would just need one set per thread/executor (not one per Synth). It’s been a while since I’ve looked at the architecture of supernova – maybe it doesn’t follow this – but in any case it should at least theoretically be the best option.

In my (very anecdotal) experience, supernova can have lower overhead for high-UGen-count synths (meaning: lots of SinOscs, e.g. for additive synthesis). I haven’t tested with lots of independent synths, though. I can’t imagine this would be significantly different for non-threaded/no-ParGroup cases – Tim is a performance nerd and would never have released it otherwise :slight_smile: – but I could easily see a case where the cost of queue operations could overtake the cost of a trivial single-SinOsc synth, in which case performance would be noticeably worse.