Imperfection of language-based timing

This goes back to an observation by my colleague Gerhard Eckel. I posted it a while ago on the mailing list but can’t find it in the archives anymore. As I think it’s an interesting topic, I’m reposting it here for future reference.

Timing in general is a crucial topic; in practice, users are mainly confronted with asynchronicity between server and language, latency, quantization, the Out/OffsetOut differences, etc. Sample-accurate timing isn’t relevant in all contexts, but in certain situations it’s absolutely necessary.

Starting point: you can have sample-accurate timing with synths, but not with language-based sequencing in RT. The reason for this is a kind of calibration that the different clocks responsible for SC timing and RT output must perform. Frankly, I am not aware of the details; there has been a lengthy discussion, and there was no consensus on whether this inaccuracy could be circumvented at all. A practical workaround is to use NRT synthesis, which gives you sample accuracy with patterns as well; this is shown in the second part of the example.

The following results come from SC 3.9.3 on OSX with built-in audio and standard driver blocksize 512 (samplerate 44100); other settings might produce even more deviations.

Note that SynthDef ‘dirac’ already uses OffsetOut, so this inaccuracy is independent of the coarser one you get with Out alone – see the OffsetOut helpfile for the reason for this (so for lang-based timing with short durations, always use OffsetOut).
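To illustrate the Out/OffsetOut difference mentioned above, here is a minimal sketch (not part of the original test; the SynthDef names are made up):

```supercollider
(
// Out starts the synth's output at the next control block boundary,
// so onsets are quantized to blockSize (normally 64 samples).
SynthDef(\clickOut, { |out = 0|
    Out.ar(out, Impulse.ar(0) * 0.1)
}).add;

// OffsetOut delays the output within the first block so that the onset
// matches the exact sample time implied by the bundle's timestamp.
SynthDef(\clickOffset, { |out = 0|
    OffsetOut.ar(out, Impulse.ar(0) * 0.1)
}).add;
)

// both are scheduled via a timestamped bundle, but only the OffsetOut
// version places the click sample-accurately within the control block
s.makeBundle(s.latency, { Synth(\clickOffset) });
```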

(
s = Server.local;
Server.default = s;
s.boot;
)


(
// use store for later NRT usage,
// define with dummy channel here for use with pattern recording,
// play reference pulse train of synth to channel 0, "real" Pbind pulse train to channel 1
// we need an out arg anyway for aPattern.record

SynthDef(\diracs, { |out = 0, duration = 60|
    OffsetOut.ar(out, [Impulse.ar(10), DC.ar(0)] * EnvGen.ar(Env([1, 1, 0], [duration + 0.01, 0]), doneAction: 2))
}).store;

SynthDef(\dirac, { |out = 0|
    OffsetOut.ar(out, [DC.ar(0), FreeSelf.kr(Impulse.ar(1))])
}).store;

// Synth as Pbind for pattern recording, needs legato = 1 to get equal duration
p = Pbind(\instrument, \diracs, \duration, 60);
q = Pbind(\instrument, \dirac, \dur, 0.1);
)


(
// record 60 sec of both pulse trains in realtime to compare,
// wait until finishing confirmed in post window
// mute to avoid playing hard clicks to speakers

c = TempoClock.();
t = 60;
s.volume.mute;

~date = Date.getDate.stamp;
~fileName = "diracsRT_synthL_patR" ++ "_" ++ ~date ++ ".aiff";

// Reference pulse train (Pbind with one synth) needs to know overall duration
r = Ppar([p <> Pbind(\dur, Pn(t, 1)), q]);

r.record(~fileName, clock: c, dur: t + 0.05);

// need CmdPeriod here as record doesn't stop (broken)
c.sched(t + 3, { CmdPeriod.run; s.volume.unmute });
)


(
// helper function for index search
~indicesOfEqualWithPrecision = { |seq, item, prec = 0.0001|
    var indices;
    seq.do { |val, i|
        if (item.equalWithPrecision(val, prec)) { indices = indices.add(i) }
    };
    indices
};

// analysing function

~analyse = { |fileName|
    ~soundFile = SoundFile.openRead(fileName);
    ~data = FloatArray.fill(~soundFile.numFrames * ~soundFile.numChannels, 0);
    ~soundFile.readData(~data);
    ~soundFile.close;
    ~stereo = ~data.clump(2).flop;
    ~dataL = ~stereo[0];
    ~dataR = ~stereo[1];


    // get indices of diracs
    ~leftIndices = ~indicesOfEqualWithPrecision.(~dataL, 1);
    ~rightIndices = ~indicesOfEqualWithPrecision.(~dataR, 1);

    // look at deltas
    ~leftDeltas = ~leftIndices.differentiate.drop(1);
    ~rightDeltas = ~rightIndices.differentiate.drop(1);

    // count occurrences of (possibly) different deltas

    ~leftDeltaSet = ~leftDeltas.asSet.collect { |x| [x, ~leftDeltas.occurrencesOf(x)] };
    ~rightDeltaSet = ~rightDeltas.asSet.collect { |x| [x, ~rightDeltas.occurrencesOf(x)] };

    "".postln;
    "occurences of sample deltas with single synth: ".postln;
    ~leftDeltaSet.postln;"".postln;

    "occurences of sample deltas with Pbind: ".postln;
    ~rightDeltaSet.postln;"".postln;
};
)


// analyse, takes some seconds

~analyse.(~fileName)

->

occurences of sample deltas with single synth: 
Set[ [ 4410, 600 ] ]

occurences of sample deltas with Pbind: 
Set[ [ 4446, 3 ], [ 4374, 3 ], [ 4410, 594 ] ]


// verify equal start

~leftIndices[0]

~rightIndices[0]


/////

(
// render 60 secs of the same stereo audio generation in NRT mode

c = TempoClock.();
t = 60;
s.volume.mute;

~date = Date.getDate.stamp;
~fileName = "diracsRT_synthL_patR" ++ "_" ++ ~date ++ ".aiff";

// Reference pulse train (Pbind with one synth) needs to know overall duration
r = Ppar([p <> Pbind(\dur, Pn(t + 0.05, 1)), q]);

// to be passed to render stereo
o = ServerOptions.new.numOutputBusChannels = 2;
r.render(~fileName, t + 0.05, options: o);
)

~analyse.(~fileName)

// perfect NRT timing
->

occurences of sample deltas with single synth: 
Set[ [ 4410, 600 ] ]

occurences of sample deltas with Pbind: 
Set[ [ 4410, 600 ] ]

Daniel


Here are the results of running this experiment on my laptop, sc 3.10-beta on arch linux 64bit, kernel 4.18.5-arch1-1-ARCH, with jack using 1024 samples buffer size and 48000 Hz sampling rate.

On my system instead, I get:

occurences of sample deltas with single synth:
Set[ [ 4800, 600 ] ]

occurences of sample deltas with Pbind:
Set[ [ 4800, 600 ] ]

Which looks quite ok to me.

Gives:
→ 7001
→ 7001


Thanks for the check, that’s very interesting !
So, if I got it right, you are reporting sample accuracy with language-based RT timing?
I’m very surprised that this is the case.

On my OSX tests the calibration happened every 10–20 seconds or so; it might be that there are larger calibration intervals on Linux. So you could check with e.g. t = 200 or 300 instead of 60.

BTW with 48000 Hz on OSX and built-in audio standard driver blocksize 512 I get similar results as with 44100 Hz

occurences of sample deltas with single synth: 
Set[ [ 4800, 600 ] ]

occurences of sample deltas with Pbind: 
Set[ [ 4834, 1 ], [ 4800, 592 ], [ 4833, 2 ], [ 4799, 2 ], [ 4767, 3 ] ]

With t=300, things don’t look as rosy anymore…

occurences of sample deltas with single synth:
Set[ [ 4800, 600 ] ]

occurences of sample deltas with Pbind:
Set[ [ 4810, 42 ], [ 4762, 4 ], [ 4803, 93 ], [ 4794, 77 ], [ 4756, 1 ], [ 4815, 11 ], [ 4792, 48 ], [ 4822, 36 ], [ 4827, 13 ], [ 4771, 3 ], [ 4770, 7 ], [ 4817, 13 ], [ 4808, 75 ], [ 4764, 7 ], [ 4834, 3 ], [ 4755, 3 ], [ 4765, 7 ], [ 4837, 4 ], [ 4818, 15 ], [ 4841, 3 ], [ 4790, 26 ], [ 4820, 26 ], [ 4782, 47 ], [ 4833, 8 ], [ 4838, 3 ], [ 4786, 29 ], [ 4816, 18 ], [ 4780, 40 ], [ 4775, 9 ], [ 4813, 28 ], [ 4789, 30 ], [ 4821, 19 ], [ 4832, 8 ], [ 4781, 40 ], [ 4814, 14 ], [ 4828, 19 ], [ 4757, 2 ], [ 47…etc…

Thanks for rechecking, that looks rather normal (unfortunately)

Ah, yes, for the single synth, duration would have to be replaced too, but the result would be exact anyway (probably)

The best we can do in RT is checking different combinations of samplerate, hardware (driver) and hardware driver buffer size to minimize deviations. IIRC with my setup a buffer size of 512 gave the best results.

You should write SynthDef(...).add, otherwise the parameters like duration are not sent from the pattern.

With .add and t=300 the picture is not very different

→ a Function

occurences of sample deltas with single synth:
Set[ [ 4800, 600 ] ]

occurences of sample deltas with Pbind:
Set[ [ 4832, 10 ], [ 4762, 8 ], [ 4801, 139 ], [ 4753, 2 ], [ 4824, 38 ], [ 4834, 6 ], [ 4813, 38 ], [ 4785, 39 ], [ 4759, 2 ], [ 4774, 15 ], [ 4817, 20 ], [ 4755, 1 ], [ 4841, 2 ], [ 4831, 7 ], [ 4770, 9 ], [ 4788, 38 ], [ 4758, 1 ], [ 4805, 89 ], [ 4792, 46 ], [ 4833, 4 ], [ 4793, 50 ], [ 4823, 25 ], [ 4816, 33 ], [ 4795, 73 ], [ 4806, 84 ], [ 4775, 11 ], [ 4830, 14 ], [ 4803, 95 ], [ 4781, 39 ], [ 4778, 25 ], [ 4780, 26 ], [ 4827, 19 ], [ 4768, 6 ], [ 4765, 8 ], [ 4783, 54 ], [ 4842, 1 ], [ 4822, 24 ], […etc…

aha, in this case it probably makes no difference.

‘store’ is needed for the NRT part of the example and is generally ok with patterns, not only in this case,
from the help: “Write the defFile and store it in the SynthDescLib specified by libname”.

So duration as extra arg is passed correctly:

SynthDef(\diracs, { |out = 0, duration = 60, amp = 0.1|
    OffsetOut.ar(out, [Impulse.ar(10), DC.ar(0)] * 
         EnvGen.ar(Env([amp, amp, 0], [duration + 0.01, 0]), doneAction: 2))
}).store;

p = Pbind(\instrument, \diracs, \duration, 0.5, \dur, Pn(1, 1), \amp, 0.3).play

Ultimately, a comment by James in a thread on the mailing list has shed more light on this: the vast amount of jitter seems to be produced by system action (NTP)! After a switch to OS 10.15, the large jumps in the realtime variants analyzed in the first post of this thread – and confirmed by @shiihs on Linux – have gone away! Besides, if you want to check the realtime tests, take the variants below, which contain an additional path definition necessary for newer SC versions.

https://www.listarc.bham.ac.uk/lists/sc-users/msg69793.html

On older macOS versions, disabling “Set date and time automatically” in System Preferences should do the trick. I’m not aware of the necessary terminal commands on Linux, but some googling (NTP) should help.

Besides, tests on supernova show sample-accurate timing with the original examples whereas scsynth gives maximum deviations of just one sample.

I’m very happy about this finding because it enables nice synthesis options with patterns that were unreliable so far.

E.g., a pulsar stream like the one below didn’t sound smooth. With the new settings and max deviations of one sample, this is perfectly ok now (at least for my ears). There might be cases where the supernova variant could turn out to be better; I’d have to do further tests.

// ATTENTION: with NTP you get nasty and sometimes loud bumps every some seconds

(
SynthDef(\sinePerc, { |out, freq = 400, att = 0.005, rel = 0.05, amp = 0.1|
	OffsetOut.ar(out, EnvGen.ar(Env.perc(att, rel), doneAction: 2) * SinOsc.ar(freq, 0, amp) ! 2)
}).add;
)

(
x = Pbind(
	\instrument, \sinePerc,
	\freq, 500,
	\dur, 0.005,
	\att, 0.002,
	\rel, 0.002
).play
)

x.stop

It could also be interesting for Windows users to check those issues, I have no machine available at the moment.

Results on OS 10.15.7 and SC 3.11.2 (and on OS 10.10.5 and SC 3.9.3 with disabled automatic date and time setting)

The latency-based timing variant goes back to a suggestion of Christof Ressi to take only one logical time, it doesn’t make a difference in this case, though.

scsynth language-based timing (test 1): max deviation of 1 sample
scsynth latency-based timing (test 2): max deviation of 1 sample

supernova language-based timing (test 1): sample-accurate
supernova latency-based timing (test 2): sample-accurate

There were no fundamental differences between 44.1 and 48 kHz. Further testing should check longer run times than one minute and other interfaces (checked with built-in out for now).

// for tests with scsynth
(
Server.scsynth;
s.reboot;
)

// for tests with supernova
(
Server.supernova;
s.reboot;
)

// prepare for test 1 - language-scheduled timing
(
// use store for later NRT usage,
// define with dummy channel here for use with pattern recording,
// play reference pulse train of synth to channel 0, "real" Pbind pulse train to channel 1
// we need an out arg anyway for aPattern.record

SynthDef(\diracs, { |out = 0, duration = 60|
    OffsetOut.ar(out, [Impulse.ar(10), DC.ar(0)] * EnvGen.ar(Env([1, 1, 0], [duration + 0.01, 0]), doneAction: 2))
}).store;

SynthDef(\dirac, { |out = 0|
    OffsetOut.ar(out, [DC.ar(0), FreeSelf.kr(Impulse.ar(1))])
}).store;

// Synth as Pbind for pattern recording, needs legato = 1 to get equal duration
p = Pbind(\instrument, \diracs, \duration, 60);
q = Pbind(\instrument, \dirac, \dur, 0.1);
)


// prepare for test 2 - latency-based timing
(
// use store for later NRT usage,
// define with dummy channel here for use with pattern recording,
// play reference pulse train of synth to channel 0, "real" Pbind pulse train to channel 1
// we need an out arg anyway for aPattern.record

SynthDef(\diracs, { |out = 0, duration = 60|
    OffsetOut.ar(out, [Impulse.ar(10), DC.ar(0)] * EnvGen.ar(Env([1, 1, 0], [duration + 0.01, 0]), doneAction: 2))
}).store;

SynthDef(\dirac, { |out = 0|
    OffsetOut.ar(out, [DC.ar(0), FreeSelf.kr(Impulse.ar(1))])
}).store;

// Synth as Pbind for pattern recording
p = Pbind(\instrument, \diracs, \duration, 60);

~latency = s.latency;

// we need a finite Pattern !
q = Pbind(
	\instrument, \dirac,
	\dur, 0,
	// dur sequencing to be inserted here
	\realDur, Pn(0.1, 601),
	\latency, Pfunc { |ev|
		var latency = ~latency;
		~latency = ~latency + ev[\realDur];
		latency
	}
);
)


// run test 1 or 2

(
// record 60 sec of both pulse trains in realtime to compare,
// wait until finishing confirmed in post window
// mute to avoid playing hard clicks to speakers

c = TempoClock.();
t = 60;
s.volume.mute;

~date = Date.getDate.stamp;
~fileName = Platform.recordingsDir +/+ "diracsRT_synthL_patR" ++ "_" ++ ~date ++ ".aiff";

// Reference pulse train (Pbind with one synth) needs to know overall duration
r = Ppar([p <> Pbind(\dur, Pn(t, 1)), q]);

r.record(~fileName, clock: c, dur: t + 0.05);

// need CmdPeriod here as record doesn't stop (broken)
c.sched(t + 3, { CmdPeriod.run; s.volume.unmute });
)


(
// helper function for index search
~indicesOfEqualWithPrecision = { |seq, item, prec = 0.0001|
    var indices;
    seq.do { |val, i|
        if (item.equalWithPrecision(val, prec)) { indices = indices.add(i) }
    };
    indices
};

// analysing function

~analyse = { |fileName|
    ~soundFile = SoundFile.openRead(fileName);
	~data = FloatArray.fill(~soundFile.numFrames * ~soundFile.numChannels, 0);
    ~soundFile.readData(~data);
    ~soundFile.close;
    ~stereo = ~data.clump(2).flop;
    ~dataL = ~stereo[0];
    ~dataR = ~stereo[1];


    // get indices of diracs
    ~leftIndices = ~indicesOfEqualWithPrecision.(~dataL, 1);
    ~rightIndices = ~indicesOfEqualWithPrecision.(~dataR, 1);

    // look at deltas
    ~leftDeltas = ~leftIndices.differentiate.drop(1);
    ~rightDeltas = ~rightIndices.differentiate.drop(1);

    // count occurrences of (possibly) different deltas

    ~leftDeltaSet = ~leftDeltas.asSet.collect { |x| [x, ~leftDeltas.occurrencesOf(x)] };
    ~rightDeltaSet = ~rightDeltas.asSet.collect { |x| [x, ~rightDeltas.occurrencesOf(x)] };

    "".postln;
    "occurences of sample deltas with single synth: ".postln;
    ~leftDeltaSet.postln;"".postln;

    "occurences of sample deltas with Pbind: ".postln;
    ~rightDeltaSet.postln;"".postln;
};
)


// analyse, takes some seconds

~analyse.(~fileName)


// results of test 1 and 2 with supernova (44100 Hz):
->

occurences of sample deltas with single synth:
Set[ [ 4410, 600 ] ]

occurences of sample deltas with Pbind:
Set[ [ 4410, 600 ] ]


// results of test 1 and 2 with scsynth (of course second set can vary)
->

occurences of sample deltas with single synth:
Set[ [ 4410, 600 ] ]

occurences of sample deltas with Pbind:
Set[ [ 4409, 8 ], [ 4411, 52 ], [ 4410, 540 ] ]

Hi all,
I’m working on a project in which I want to play long audio files and keep them synced with the language. Of course I ran into some timing issues. My issue is that there appears to be a fluctuating deviation between the time at the server and the TempoClock of up to 10 ms. Sample accuracy is not strictly needed, but I do want it to be on the order of 1 ms. A simple example which shows the issue is:

t = TempoClock.new; // LinkClock.new gives comparable results
o = Bus.control;
(
SynthDef("test",{
	var out;
	out = Sweep.kr();
	Out.kr(o,out);
}).play
)

d = o.getSynchronous - t.beats; // offset
(o.getSynchronous - t.beats - d) * 1000; // difference between TempoClock and synth Sweep in ms, is spitting out values between -10 and 10 for me

1000/s.sampleRate*s.options.blockSize; // blocksize in ms

I would expect an error on the order of the block size (control period), due to reading from a control bus. With my settings the block size corresponds to 1.3 ms, but the errors vary between -10 and 10 ms. Important to note: this is not an OSC latency issue, since I am using getSynchronous to read the value directly from the control bus.

Also I have performed the test as described in the first post of this thread, and the results are actually quite good:

occurences of sample deltas with single synth:  Set[[4800, 600]]
occurences of sample deltas with Pbind: Set[[4800, 413], [4801, 45], [4802, 1], [4799, 141]]

I guess the fact that this last performance test yields good results should provide insight into what is going well/wrong, but I can’t solve the puzzle yet myself. It might have to do with getSynchronous? But using another test I have verified that it is indeed synchronous: I can read and write to a control bus within a single control period. So I know that there is no latency in reading the control bus. That points me in the direction of the TempoClock. However, the results of the last test debunk that, as they show that Pbind, which relies on the clock, gives quite accurate scheduling results.

What do you guys think?

It’s actually quantized to the hardware block size, not the control block size. A 512-sample block would come to ~11 ms jitter.

hjh

Wait, is there a difference? I thought the control rate is once per hardware block. And currently I have s.options.blockSize = 64, which is ~1.3 ms.

No, not the same. See s.options.blockSize and s.options.hardwareBufferSize. (Though, as I understand it, setting hardwareBufferSize requests the driver to use this buffer size – whether the driver respects that or not is not completely under control of SC. And in Linux, hardwareBufferSize is ignored – it’s a JACK or Pipewire setting.)

If the audio driver is configured for a 512-sample buffer, and the SC block size is 64, then there are 8 control blocks per hardware buffer. SC processes all 8 of those as fast as possible. getSynchronous, then, has access only to the bus value at the end of that 8-block run. (getSynchronous means that its reply is synchronous, as distinguished from asynchronous. It doesn’t imply super-high-resolution timing.)

To gain timing access to control blocks in the middle of a hardware buffer cycle, use OSC timestamps. The /c_get replies will be asynchronous. This may be tricky to handle, depending on how fast you’re polling the bus, but the accuracy is much better.

s.boot;

(
fork {
	var cond = CondVar.new;
	var start;
	var replyCount = 0;
	var num = 5;
	var resp;
	
	b = Bus.control(s, 1);
	a = { Sweep.kr(0, 1) }.play(outbus: b);
	
	// ensure the synth is running
	OSCFunc({
		cond.signalAll
	}, '/n_go', s.addr, argTemplate: [a.nodeID]).oneShot;
	cond.wait;
	
	start = thisThread.beats;
	
	z = Array(num);
	
	// keeping the async replies lined up with the array is tricky
	// this is basically queuing the replies up
	resp = OSCFunc({ |msg|
		z[replyCount][1] = msg[2];
		replyCount = replyCount + 1;
		cond.signalAll;		
	}, '/c_set', s.addr, argTemplate: [b.index]);
	
	fork {
		num.do {
			z = z.add([
				thisThread.beats - start + s.latency,
				nil,
				b.getSynchronous + s.latency
			]);
			
			s.makeBundle(s.latency, { b.get { nil } });
			
			0.1.wait;
		};
	};
	
	cond.wait { replyCount >= num };
	
	a.free; b.free; resp.free;
	
	z.do(_.postln);
	"".postln;
	z.flop.collect(_.differentiate).do(_.postln);
};
)

[0.2, 0.18140590190887, 0.22321995422244]
[0.3, 0.28154194355011, 0.31609977483749]
[0.4, 0.38167801499367, 0.40897959172726]
[0.5, 0.4818140566349, 0.50185940861702]
[0.6, 0.58195012807846, 0.61795918345451]

[0.2, 0.1, 0.1, 0.1, 0.1]
[0.18140590190887, 0.10013604164124, 0.10013607144356, 0.10013604164124, 0.10013607144356]
[0.22321995422244, 0.092879820615053, 0.092879816889763, 0.092879816889763, 0.11609977483749]

The last block measures the time differences:

  1. Language side clock: accurate.
  2. Timestamped c_get measurements: 0.10013 vs 0.1, that’s pretty close. Maybe due to the difference between s.sampleRate and s.actualSampleRate.
  3. getSynchronous: 0.092 vs 0.1, quite poor.

hjh


Okay thanks! Good to understand that there is a difference between SC blocksize and hardware blocksize.

I can currently only use and test my laptop’s onboard Realtek audio device, and I cannot find which buffer size it uses.

I’m not sure it is true that the hardware buffer size determines the poor synchronous read performance. The following simple test shows that I am able to read synchronously at 64-sample resolution, which coincides with the SC block size, and perhaps, but probably not, with the hardware buffer size.

t = TempoClock.new; // LinkClock.new gives comparable results
o = Bus.control;
(
SynthDef("test",{
	var out;
	out = Sweep.kr();
	Out.kr(o,out);
}).play
)

~old = 0;

(// see if there is a whole number of blocks between the last two synchronous reads, for different assumed buffer sizes
var new, ar;
new = o.getSynchronous;
ar = [(new - ~old)/(64/s.sampleRate),(new - ~old)/(128/s.sampleRate),(new - ~old)/(256/s.sampleRate),(new - ~old)/(512/s.sampleRate)];
~old = new;
ar;
)
-> [584.99908447266, 292.49954223633, 146.24977111816, 73.124885559082]
-> [473.00720214844, 236.50360107422, 118.25180053711, 59.125900268555]
-> [1536.9873046875, 768.49365234375, 384.24682617188, 192.12341308594]
-> [1125.0, 562.5, 281.25, 140.625]
-> [750.0, 375.0, 187.5, 93.75]
-> [2040.0009155273, 1020.0004577637, 510.00022888184, 255.00011444092]
-> [674.99542236328, 337.49771118164, 168.74885559082, 84.37442779541]
-> [765.0146484375, 382.50732421875, 191.25366210938, 95.626831054688]
-> [1898.0026245117, 949.00131225586, 474.50065612793, 237.25032806396]
-> [540.00091552734, 270.00045776367, 135.00022888184, 67.500114440918]
-> [666.98455810547, 333.49227905273, 166.74613952637, 83.373069763184]

So the poor timing quality of getSynchronous is not due to it only having access once every 512 samples, it has access every 64 samples.

I am still curious what mechanism then is responsible for the results.

But I get your advice of using OSC messages with timestamps. I moved away from that because I found it rather hard/annoying, but I can manage that… The bigger problem is that it introduces a big latency, and I would like to use it with live MIDI input.
It seems as if I need to choose between accuracy and latency. Are there any ways in SuperCollider to get a nice compromise, i.e. latency on the order of 10 ms and accuracy on the order of 1 ms?


As you said, you need to choose between these two. Generally, accuracy cannot be smaller than the minimum latency. So the only way to improve timing is to decrease the hardware buffer size.


@jamshark70 is right that both the minimum latency and the timing accuracy for (immediate) OSC messages depend on the hardware buffer size and not on the Server control block size. This is because incoming OSC messages are only dispatched once per audio callback, before processing one or more DSP ticks (as fast as possible).

So the poor timing quality of getSynchronous is not due to it only having access once every 512 samples, it has access every 64 samples.

Out.kr writes to the control bus on every DSP tick, so the possible output of getSynchronous is not restricted. However, the time granularity of the synchronous buffer access is effectively quantized to the hardware buffer size (in the worst case), similar to immediate OSC messages, because there is a timing gap between the audio callbacks. (Remember: the audio callback runs one or more DSP ticks in quick succession.)
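A quick back-of-the-envelope check of these bounds (the sample rate and buffer sizes are assumed values; adjust them to your setup):

```supercollider
(
// worst-case time granularity of synchronous bus reads ~ one hardware buffer,
// best case ~ one control block (assumed: 44.1 kHz, 512 / 64 samples)
var sr = 44100, hwBuf = 512, block = 64;
("hardware buffer: " ++ (hwBuf / sr * 1000).round(0.01) ++ " ms").postln; // ~11.61 ms
("control block: " ++ (block / sr * 1000).round(0.01) ++ " ms").postln;   // ~1.45 ms
)
```

This matches the observation above: with a 512-sample hardware buffer, jitter of up to ~10 ms for getSynchronous is expected.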

Fun fact: a higher Server CPU load improves timing accuracy for synchronous bus reads because the individual DSP ticks take so long that they are spread out more evenly over the duration of the callback.

Finally, be aware that there is always language jitter! TempoClock is not guaranteed to wake up at the exact specified time interval because other tasks might be blocking the interpreter. (It also depends on the OSC scheduler.) That’s why you need to work with logical time (= scheduled OSC messages) if you need precise timing of Server events.
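As a sketch of what “working with logical time” means in practice (the \ping SynthDef is made up for this example; timing values are illustrative):

```supercollider
(
SynthDef(\ping, { |out = 0, freq = 440|
    var sig = SinOsc.ar(freq, 0, 0.1) * EnvGen.ar(Env.perc(0.001, 0.1), doneAction: 2);
    OffsetOut.ar(out, sig ! 2);
}).add;
)

(
// each iteration stamps its bundle with logical time + s.latency, so the
// server receives a precise target time even if sclang wakes up late
Routine {
    4.do { |i|
        s.makeBundle(s.latency, { Synth(\ping, [\freq, 300 + (i * 100)]) });
        0.25.wait;  // advances logical time by exactly 0.25 beats
    };
}.play;
)
```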


great discussion – this would be very good material for a Help doc…


Hi,

Okay, thanks! Let me try to get this right:

  • the hardware buffer (s.options.hardwareBufferSize) describes the chunks of digital audio fed to the audio device, which converts it to analog audio output;
  • the CPU needs to calculate all this audio before handing it over to the audio device, so it has exactly the period of one hardware buffer to do so. The moment the audio device requests a new buffer of audio is called the audio callback.
  • normally, in other audio software, the CPU calculates this audio using the same buffer size, but in SuperCollider the audio server typically computes several smaller chunks (s.options.blockSize) as fast as possible, one after another. I will call this the server buffer. Computing such a (smaller) chunk of audio on the CPU is also referred to as a DSP tick, and the rate at which this happens is also referred to as a control cycle, since control rate signals are computed once every DSP tick.
  • so typically in SuperCollider there are several DSP ticks per audio callback.

OSC messages

  • by the structure of SuperCollider, the standard way of communicating from language to server is by OSC messages, which have a considerable latency, up to 200 ms.
  • I am still confused about how OSC messages are handled.
  • So I have two questions. 1) Are OSC messages processed on every audio callback or on every DSP tick? It would also be interesting to know why (: Furthermore, 2) how does scsynth use this information to schedule things, and to what extent is it then possible to accurately time things? Because according to jamshark70 it is possible to accurately time things with OSC messages, while according to Spacechild1 OSC timing accuracy depends on the hardware buffer size (which means it is not so accurate). The description of SubsampleOffsetOut reads “When a synth is created from a time stamped osc-bundle, it starts calculation at the next possible block (normally 64 samples).” But this does not indicate when the OSC messages are processed.
  • once OSC messages are processed, things are scheduled with control-cycle granularity. If you want more precise scheduling, you can use OffsetOut or SubsampleOffsetOut.

Synchronous access

  • control rate signals written to a bus can be read and written synchronously from the language (meaning that they are read/written just the way language variables are read/written). So there is no OSC latency.
  • in the case where the hardware buffer is 512 samples, the server buffer is 64 samples, and the CPU load is not so high: the CPU will compute things as fast as possible, so on each audio callback it will compute 8 DSP ticks directly after each other. On each DSP tick it calculates 64 samples of audio signal and one sample of control rate signal, and with Out it writes this to some low-level buffer associated with the bus. So in time, the moments, x, of writing the control rate signal look like: xxxxxxxx_____________(long pause)______________xxxxxxxx. They are written 8 times in quick succession at the start of an audio callback, and then there is a pause until the next audio callback. So even though getSynchronous is instant, it is not accurate, since the server does not write to the bus evenly spaced through time.

Accuracy and latency

  • with OSC messages there is always a latency; you can timestamp OSC messages (make sure the timestamp is farther ahead than the latency), and the server will schedule the thing at the appropriate DSP tick; you can get more accuracy with OffsetOut at the cost of additional latency;
  • with synchronous access you can communicate with the server immediately, but accuracy depends on the hardware buffer vs. server buffer and the CPU load; worst-case accuracy is the hardware buffer, best case it is close to the server buffer;
  • so I went down the road of communicating with the server via synchronous access for timing-critical stuff in a live setting, because it removes the OSC latency; I see two ways to improve accuracy: 1) decrease the hardware buffer, 2) make something like OffsetOut (you could use an additional bus to communicate from language to server the precise timing point you want, and SynchronousOffsetOut should then make sure that what you want happens exactly at that moment). By the way, this synchronous access method is much more limited than OSC messages; you can only write to a bus, you cannot create a bus or a synth or whatever.


I would like to end with a more philosophical point. To me music is all about rhythm and timing. So I am a bit, lets say, confused that it appears to be quite a hassle to get timing right in SuperCollider. Before starting to use SuperCollider I did quite some research on alternatives, and I chose for SuperCollider mainly for how active its user base is, how elaborate the documentation is, and how many UGen there are. I also looked at ChucK, which might be a better suite for my need/interest for timing, but it appeared to be more academic and have a smaller user base, and maybe dying in some years. However, I do still believe it would be very nice if Supercollider would make it as easy and transparent as possible to achieve proper timing. I have the feeling that it is currently a bit of a thing for more advanced users, while this should not be case, in my humble opinion. And and and, I would like to put some work in this, under some guidance, and with the disclaimer that my work tempo is irregular and general slow. The direction I am thinking is providing better documentation, and making something like this SynchronousOffsetOut. However, real improvements would more be to get rid of the OSC communication protocol to reduce latency, but that’s a whole different story off course, and I have not the slightest clue about what I am even talking.

The actual transmission time of OSC messages may be < 1 ms on the local machine. Message timestamps need to account for the hardware buffer duration, which may be up to 50 ms. 200 ms is not a real number – it’s purely made up to be much bigger than any situation you’d encounter on a local machine.

Now, think about this for a minute.

If the server responds to the audio callback by processing n number of control blocks as fast as possible… what happens if an OSC message arrives in the middle of the hardware buffer duration, but all the control blocks have already finished processing?

The only possible thing to do at that point is to wait until next hardware buffer.

This is why OSC messages without timestamps are processed on DSP ticks.

But, if a message is timestamped for hardware buffer number 17384, control block #4 of 8 – if that message arrives before the indicated hardware buffer, then the server knows how many control blocks to wait before processing that message.

It is possible to accurately time things with OSC messages if there’s a timestamp.

If there is no timestamp, then timing accuracy depends on hardware buffer size.

It would not be valid to compare timing accuracy of timestamped vs non-timestamped messages.

They must arrive before the hardware buffer during which they need to be processed. They will be processed on the control block tick corresponding to their timestamp (or, if no timestamp, then as soon as possible). It would not make sense to have a timestamped OSC message belonging to control block #7 of 8 and process it at the beginning of the hardware buffer – you don’t want that synth to start at the beginning of the hardware buffer because it was scheduled for the 7th block – it makes sense to process the OSC message just as it’s needed.
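The bookkeeping hjh describes can be sketched as plain arithmetic (all values are assumed for illustration; this is not scsynth’s actual code):

```supercollider
(
var sr = 44100, hwBuf = 512, block = 64;
// a timestamp that falls on hardware buffer 17384, control block 4 of 8
var time = ((17384 * hwBuf) + (4 * block)) / sr;
var sampleIndex = (time * sr).round.asInteger;
("hardware buffer: " ++ sampleIndex.div(hwBuf)).postln;                        // -> 17384
("control block within buffer: " ++ (sampleIndex % hwBuf).div(block)).postln; // -> 4
)
```

If the message arrives before buffer 17384 starts, the server simply waits four control blocks into that buffer before processing it.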

hjh

Okay, thanks! Let me try to get this right:

You got it right, except for the following parts:

OSC messages

By the structure of SuperCollider, the standard way of communicating from language to server is by OSC messages, which have a considerable latency, up to 200 ms.

That’s for OSC bundles with timetags. OSC messages are sent out and received as fast as possible. That’s the difference between s.sendBundle and s.sendMsg.

Actually, you don’t have to use a fixed OSC bundle latency. In fact, Server.latency is only used by certain parts of the Class Library, most notably the EventStreamPlayer and Server.bind.
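For instance (a sketch, assuming a system that tolerates a smaller scheduling headroom than the 0.2 s default):

```supercollider
s.latency;        // 0.2 by default; used by EventStreamPlayer and Server.bind
s.latency = 0.05; // a smaller fixed latency, if your hardware buffer allows it

// This Pbind's events are now sent as bundles timestamped 0.05 s ahead:
Pbind(\degree, Pseq([0, 2, 4], 1)).play;
```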

So I have two questions.

  1. Are OSC messages processed every audio callback or every DSP tick? It would also be interesting to know why :)

Incoming OSC messages are only collected once per audio callback. In theory, we could also do this between control blocks but there is not much point in doing so. The behavior would depend heavily on the CPU load: with low CPU load, there is very little time for messages to slip between control blocks, but as the CPU load rises, messages are distributed more evenly. That’s not something I would want to depend on. Audio callbacks, on the other hand, happen at relatively regular intervals.

Furthermore, 2) how does SCsynth use this information to schedule things, and to what extent is it then possible to accurately time things? Because according to jamschak70 it is possible to accurately time things with OSC messages, while according to Spacechild1 OSC timing accuracy depends on the hardware buffer size (which means it is not so accurate).

I only said this regarding OSC messages! OSC bundles come with a timetag (except for immediate bundles).

The Server timing mechanism is explained in detail in the following guide: Scheduling and Server timing | SuperCollider 3.12.2 Help

This is how OSC messages/bundles are received and dispatched internally:

  1. a network thread continuously receives OSC messages/bundles and puts them on a lockfree queue

  2. on every audio callback, OSC messages/bundles are removed from the queue and handled as follows:
    a. OSC message or immediate OSC bundle → dispatch immediately
    b. OSC bundle with timetag → add to the scheduler priority queue

  3. for every control block the Server calculates the logical OSC time and then dispatches all scheduled bundles with a timetag between this and the next block. With every dispatched bundle, the (sub)sample offset is advanced accordingly, so it is visible to the actual Server command. For example, when you create a Synth (/s_new), the current (sub)sample offset is copied into the Graph structure, so that it is later visible to individual UGens, such as OffsetOut or SubsampleOffset.

Once OSC messages are processed, things are scheduled up to a control cycle. If you want more precise scheduling, you can use OffsetOut or SubsampleOffset.

OSC messages don’t have a timetag and are only dispatched at control block boundaries. OSC bundles with timetags, on the other hand, are scheduled at the appropriate (sub)sample offset. However, it’s the responsibility of the user to actually do something with this offset, e.g. by using OffsetOut instead of regular Out.
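A minimal sketch of that last point (the SynthDef name \click is made up for this example):

```supercollider
(
SynthDef(\click, { |out = 0|
    // OffsetOut applies the (sub)sample offset carried over from the
    // bundle's timetag, delaying its output within the control block;
    // a plain Out here would snap the impulse to the block boundary.
    OffsetOut.ar(out, Impulse.ar(0) * 0.5);
    Line.kr(0, 0, 0.05, doneAction: 2); // free the synth shortly after
}).add;
)

// The offset is only meaningful when scheduled via a timetagged bundle:
s.bind { Synth(\click) };
```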

(Side note: the (sub)sample offset can also be used by so-called unit commands, which are messages that are sent directly to a specific UGen. I’m using this feature in VSTPlugin to implement sample-accurate MIDI messages.)

So I am a bit, let's say, confused that it appears to be quite a hassle to get timing right in SuperCollider.

Yes, I would agree that timing in SuperCollider can be quite confusing, especially if you are not familiar with the internals.

I also looked at ChucK, which might be a better fit for my interest in timing, but it appeared to be more academic, to have a smaller user base, and to perhaps be dying out in a few years.

Did you have a look at Pd? Timing is dead simple because the message system and DSP processing are synchronous and fully deterministic. Depending on what you want to do, Pd might offer additional advantages. However, there are certain things that are objectively much easier to do in SC, generally anything that involves dynamic creation of Synth voices or reordering of Nodes.

However, I do still believe it would be very nice if SuperCollider made it as easy and transparent as possible to achieve proper timing.

The Pattern system (really the EventStreamPlayer) already does this because everything is automatically scheduled with Server latency. Otherwise there’s indeed a bit of a learning curve and many users don’t seem to know how to manually schedule Synths with precise timing, or why that might be necessary. (Server.bind is great for this purpose, but for some reason many users don’t seem to know about it…)
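For example (a sketch using the stock \default SynthDef):

```supercollider
// Everything inside the function is gathered into one OSC bundle,
// timestamped s.latency into the future, so both synths start on
// exactly the same sample:
s.bind {
    Synth(\default, [\freq, 440]);
    Synth(\default, [\freq, 660]);
};
```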

However, a real improvement would be to get rid of the OSC communication protocol to reduce latency,

Again, that’s a big misunderstanding because as I’ve explained above, OSC messages really have no significant latency. The latency of scheduled OSC bundles, on the other hand, is fully intentional and strictly necessary. In SC’s architecture the language (= Client) runs independently from the sound engine (= Server), so messages must be scheduled into the future to ensure proper timing. There’s just no way around it.

As I already hinted, Pd uses a completely different scheduling model where this kind of Server latency is not necessary. However, it comes with its own set of drawbacks. There is no silver bullet!

If you are interested in a comparison between different scheduling models, see my keynote at the recent SuperCollider symposium: Documentation | The Future of SuperCollider (The videos are not online yet.)