Server clock vs. Language clock

That would be amazing!

You are right, I misspoke when counting the packet sending as part of the server-to-client timing.

One of the first projects of miSCellaneous_lib was dedicated to a special case of this problem. I wanted to have synth values as control data in Pbinds. Therefore, I designed a mechanism that introduces some extra latency for the response and allows a synchronisation – though obviously not sample-exact, and this is also not necessary for control. I found that this framework was made obsolete by the introduction of synchronous buses and haven’t used it since. However, it still works and might be useful in some contexts. See “Guide to HS and HSpar” as a starting point.
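For reference, a minimal sketch of that synchronous-bus approach, assuming a local server with the shared memory interface (the bus, the LFO synth and the Pbind values are just placeholders):

(
~bus = Bus.control(s);
~lfo = { Out.kr(~bus, SinOsc.kr(0.3).range(200, 800)) }.play;

Pbind(
	\freq, Pfunc { ~bus.getSynchronous }, // read the synth value without a server round trip
	\dur, 0.25
).play;
)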

When I was working on my OSC sync clock quark (which I ended up abandoning because of these findings), I found that an “as soon as possible” ping roundtrip between two machines on the same LAN completed very quickly (on the order of 0.5 ms), but that the sync messages from the leader to follower clocks (which were timestamped, btw) exhibited timing jitter greater than the roundtrip time – but oddly, not when Mac was the receiver :face_with_raised_eyebrow:

  • Mac or Windows sending, Linux receiving: Weird pattern of jitter. (Also if Windows and Linux switch places.)
  • Anything sending, Mac receiving: Significantly less jitter.

Sync messages are going out to a broadcast address, which may affect the behavior – but it was puzzling to me that the only variable that mattered was the OS of the receiver.

This may not be immediately relevant to this specific question, but I guess, if we do get timestamps for server → client messages, there’s a chance that timing may still not be super accurate on Linux or Windows. But this was all years ago and it’s hazy in my memory.

OSCBundle is supposed to support something like this, but the implementation has a couple of mistakes. My ddwPlug quark fixes that with a method “sendOnTime” – it’s pretty neat: it can send SynthDefs immediately, sync to the server, and then adjust the latency so that the synth is exactly on time.
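The general idea (not the ddwPlug implementation, just a rough sketch with a placeholder \ping SynthDef) looks something like this:

(
fork {
	// remember when the synth should sound, before any waiting happens
	var target = SystemClock.seconds + 0.2;
	SynthDef(\ping, { |freq = 440|
		OffsetOut.ar(0, SinOsc.ar(freq) * EnvGen.ar(Env.perc, doneAction: 2) * 0.1);
	}).add;
	s.sync; // wait until the def has landed on the server
	// then send with whatever latency is left, so the onset still hits the target time
	s.makeBundle(max(0.0, target - SystemClock.seconds), { Synth(\ping) });
};
)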

hjh

I went on to test the server time stamps and found the same oddity as when testing the language clock in my first example: if latency is non-nil, the server time stamps are sample-locked, but when latency is set to nil, the same 4-second periodicity appears in the server time stamps and is also seen when inspecting audio produced by quantized patterns - values are off by pretty much the same amount within a 4-second cycle. I would have assumed latency = nil would result in a more randomized offset from the quantized value, but the values are very consistent.

So then I thought I could use this information to offset the delta times in a pattern with Prout (like the swingify Prout from the help files) to get better timing with latency = nil. The code I am working on forces me to work with latency set to nil, since the extra latency added by e.g. latency = 0.05 (which has been the smallest I could go in the past without getting late messages) breaks the ‘real-time feel’.

For this idea to work for e.g. a pattern quantized to 16th notes, the deviation between server time stamp and quantized value for any given 16th note in the current 4-second cycle should be the same as, or very close to, that of the previous 4-second cycle.

Below is the test, which on my setup shows that the majority (something like 95%) of 16th notes have the same deviation as in the previous cycle.

The test was done on a first-generation M1, SC 3.13.0, macOS 13.6.4, Apollo Solo interface, sr = 48000. I would be very curious to know if other SC users get similar results.

There are some odd cases where ‘this offset’ significantly differs from ‘previous offset’ (see post window). I don’t really know what to make of these cases.
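The \time SynthDef isn’t shown in this post; something along these lines (an assumption on my part) would do the job, reporting the server’s elapsed time via Sweep whenever the Pbind sets \trig:

(
SynthDef(\time, {
	var trig = \trig.tr(0); // trigger-rate control, set by the Pbind's \set events
	SendReply.kr(trig, '/time', Sweep.kr(0, 1)); // msg[3] = seconds since the synth started
}).add;
)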

(
s.latency = nil;
Pdef.removeAll;
~beats = 4;
~tatum = 1/~beats; // 0.25 beats = one 16th note
l = Array.newFrom(0 ! (~beats * 4)); // one slot per 16th note in the 4-second cycle
t = TempoClock(1);
x = Synth(\time);

Pdef(\analyze,
	Pbind(
		\type, \set,
		\id, x,
		\args, #[\trig],
		\trig, 1,
		\dur, Pseq([0.25], inf),
)).play(t, quant: 1);

OSCdef(\o, { |msg|
	// feedback starts after 4 seconds of play
	var i = (t.beats.round(~tatum) * ~beats).asInteger % (~beats * 4);
	var val = msg[3] - msg[3].round(~tatum); // deviation from the quantized value
	if (t.beats >= 4)
	{
		var dev = (val - l[i]);
		dev.debug('Deviation from previous cycle');
		if (dev.abs > 0.001)
		{
			val.debug('This offset');
			l[i].debug('Previous offset');
			i.debug('i of 16th note in 4-bar-loop')
		}
	};
	l[i] = val;
}, '/time');
)

At the risk of stating the obvious: it just isn’t possible to get predictable timing – without scheduling messages in advance as timestamped bundles – due to several factors:

  1. quantization effects due to the audio hardware blocksize
  2. jitter in the audio callback
  3. jitter in the language
  4. network jitter (on localhost, that’s actually the least problem!)

Make sure that you understand each of these factors!
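As a quick illustration of the difference, using the built-in \default SynthDef:

// immediate message: the onset depends on when it happens to arrive at the server
Synth(\default);

// timestamped bundle, scheduled 0.2 s ahead: the server can place the onset precisely
s.makeBundle(0.2, { Synth(\default) });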


Minimum usable latency will depend on the soundcard’s hardware buffer size. Reducing the HW buffer should also reduce the lower limit for messaging latency (although I haven’t tested to see what the practical relationship is).

EDIT: Did test. With HW = 2048 samples (at 48 kHz), 2048/48000 = 42.6667 ms and I could go to about s.latency = 0.067 but no lower. HW = 512, 512/48000 = 10.6667 ms and s.latency = 0.0175 was OK :+1:
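For anyone trying this at home, both settings live here (the hardware buffer size only takes effect after a reboot):

s.options.hardwareBufferSize = 512; // request 512-sample hardware buffers
s.reboot;
s.latency = 0.0175; // then lower this until you start seeing "late" messages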

hjh


I did some tests just now, and I think my code was running ok at a hw buffer size of 64 and 0.02 latency, 0.0175 got me in trouble. I think hw size 128 or 256 might work better for me, especially if I can get the latency even lower. Audio input with buffer size 128 still feels ‘instant’ at 48 kHz, I think 256 is fairly ok, at 512 samples / ~10.7 ms you are hurting. Anything under 5 ms is ok, 256 samples is just at that edge.

My main obstacle is the audio-to-MIDI conversion, which is done by MIDI GUITAR 2. MIDI GUITAR 3 is in beta testing but so far does not really show much improvement in latency. My testing with both MIDI GUITAR 2 and 3 on a prerecorded file seemed to indicate that regardless of the buffer size, which can be set to 64, 128 or 256, the latency is about 20-25 ms on average, with best cases around 12 ms and worst cases around 30 or 35 ms, with the beta version performing worse than vs. 2 on almost all parameters - they promise it can be improved. They - or really, I think, just one guy named Ole, who I think is Danish like myself and also pretty secretive - are using ML to train a model for the pitch recognition, and it can now also tell which string the note came from, which I think I can utilize somehow in future designs. Despite the latency, which I know is hard to overcome - FFT analysis with high confidence takes time - and all the garbage notes which are part of any audio-to-MIDI system, it is still a very impressive piece of software.

One issue with this whole setup is logging the SC clock time of notes played on the guitar: when a pattern is playing on a clock and I input new notes in time with the pattern, how much should I offset the timing to compensate for the overall latency of the input chain, taking into account also the output latency? With latency set to nil (I might just change that now in light of your input and my new testing) I am surprised how much I have to offset the readings of the clock - something like 55 ms. There are many jitter and latency sources, including SC processing time, which is around 5 ms average / 10 ms worst case (I am doing a lot of analysis) for a MIDI noteOn, with some less time-critical processing handled by routines. The system still feels real-time responsive. At the same time I am trying to optimize in all the ways I can, to try and shave milliseconds off the overall response time.

Ok, if I understand correctly, you are trying to play live with your guitar (triggering Synths on the Server) with as little latency as possible while at the same time you are playing Patterns.

One thing I’m wondering: why do you need to care about the latency of the Patterns? If the Patterns run independently from your live input, the latency can be just as high as needed. After all, you can hear the patterns and play along. Could it be that the patterns themselves are triggered/influenced by the live guitar input?

I did some tests just now, and I think my code was running ok at a hw buffer size of 64 and 0.02 latency, 0.0175 got me in trouble. I think hw size 128 or 256 might work better for me, especially if I can get the latency even lower.

This sounds like a contradiction to me. Larger hardware buffer size means higher latency and vice versa.

If you want to minimize overall latency, you first need to figure out the lowest possible hardware buffer size that gives a stable audio signal without dropouts. From there you would need to make a tradeoff between latency and accurate timing on the language side. You simply can’t have both!

One thing I’ve also mentioned elsewhere: if the Pattern sequencing is causing language jitter that ruins the live input, you may consider running both processes in dedicated SuperCollider instances. (It’s also possible to have multiple sclang instances control a single Server.)
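For example, a second sclang instance (e.g. started from a terminal) can attach to a server that the first instance has already booted; a sketch, assuming the default port:

// in the second sclang instance:
Server.default = s = Server.remote(\shared, NetAddr("127.0.0.1", 57110));
// once connected, this instance can create its own nodes on the same server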

If the Patterns run independently from your live input, the latency can be just as high as needed

The challenge is to start a pattern when I play a note on guitar (depending on the analyzed MIDI input). When this happens, the clock’s beats is set to 0 and the pattern is played on the clock with quant: 1, ergo immediate execution. Any latency other than nil will add to the latency of the first downbeat of the pattern. With latency = nil, the overall latency, judging from looking at the recorded audio in a sample editor, is around 30 ms, of which the audio-to-MIDI conversion is responsible for 20-25 ms. With a latency of e.g. 0.03, the latency would double, which is undesirable. I am messing around with a hacky way of dealing with it: s.latency = nil for the first downbeat, then 0.1 beats later set the latency to 0.03 and leave it there (sketched below). The timing will of course be a little strange at the very beginning of the pattern, but my initial tests seem to indicate that this is less of a perceptual problem than one would think. Once the pattern is going, the latency is much less of an issue. I noticed that regardless of the latency setting (if other than nil), the reported times from the server are less than the latency setting. E.g. with a latency of 0.2, the server time stamps are consistently around a value of approx. 0.175 added to each beat. I wonder why it is not closer to 0.2? I know the difference is small, just wondering…
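Roughly like this (the names and the 0.03 value are placeholders):

// called when the analyzed guitar input says "start the pattern now"
~startPattern = {
	s.latency = nil; // first downbeat goes out as soon as possible
	t.beats = 0;
	Pdef(\pattern).play(t, quant: 1);
	t.sched(0.1, { s.latency = 0.03; nil }); // shortly after, switch to timestamped bundles
};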

if you want to minimize overall latency, you first need to figure out the lowest possible hardware buffer size that gives a stable audio signal without dropouts.

Yes, I had it backwards, thinking a higher buffer size would allow less latency. With a buffer size of 64 I get occasional crackles in the audio, so I might have to up the size to 128. A buffer size of 64 and a latency of 0.02 almost works… I will have to do more testing.

One thing I’ve also mentioned elsewhere: if the Pattern sequencing is causing language jitter that ruins the live input, you may consider running both processes in dedicated SuperCollider instances. (It’s also possible to have multiple sclang instances control a single Server.)

I don’t really know if the patterns cause additional language jitter, and I am not sure how to test this. How would you go about running multiple sclang instances?

Ok, that’s what I thought. So you are really controlling the Patterns with your live input.

I noticed that regardless of the latency setting (if other than nil), the reported times from the server are less than the latency setting

In your test you start a Synth with a continuous ramp and send its values periodically back to the Client. It takes time before the Client receives these replies, so naturally the Server will report a (sample) time of N while the Client’s network thread already sees a time of N+k.

Also, please keep in mind that these values will slowly drift over time. The Client measures system time, but Sweep measures sample time.

of which the audio-to-MIDI conversion is responsible for 20-25 ms.

I guess you need to find a better audio to MIDI converter :slight_smile:

How would you go about running multiple sclang instances?

I think in your particular setup it won’t help anyway, because you also want to control the Pattern with the live input.


I guess you need to find a better audio to MIDI converter :slight_smile:

To be fair, the minimum latency cannot be lower than the period of the lowest pitch, which in the case of the e-guitar would be about 12 ms for the low E string (approx. 82 Hz). However, this assumes that we need the same latency for all strings, which is not necessarily true! With a guitar we can track the strings separately, so we may decide to track each string as fast as possible. After all, the period of the open high E string is only about 3 ms – quite a difference! It’s a bit of a trade-off: should all strings have the same latency, or the shortest respective latency, or something in between? IMO there cannot be a single solution that fits all playing styles. Actually, it would even be useful to switch tracking modes between modes of playing (e.g. riff playing on low strings vs. shredding on high strings).


Now, I just realized that MIDI GUITAR 2 is a plugin! This means it can’t track the strings independently, so the latency really cannot be shorter than 12 ms (in reality, probably twice as long). Are you using the plugin in the Server with VSTPlugin? That would explain why you are interested in Server->Client timing (which otherwise would be completely irrelevant). In that case you will indeed have problems achieving the lowest possible latency because you need a full Server->Client->Server roundtrip. For this kind of live-tracking, Pure Data or Max/MSP might be more appropriate because you can immediately handle the MIDI notes on the audio thread with no additional latency involved.

I think you should really look into proper MIDI guitar pickups! Not only is the tracking faster, you can directly receive the MIDI notes in sclang.

To be fair, the minimum latency cannot be lower than the period of the lowest pitch, which in the case of the e-guitar would be about 12 ms for the low E string (approx. 82 Hz).

That is true; however, the ML algorithm they use (I don’t know how it is set up) could, I think, theoretically allow for faster detection than one complete cycle of the fundamental, since at least the first 9 partials of an electric guitar signal have significant energy. However, when I have experimented with onset and pitch detection in SC, using vanilla and FluCoMa UGens, the time it takes to detect a note with high pitch confidence is considerably longer; but also, I am not upsampling the signal, which I think could speed up the detection.

The string detection feature is added in MIDI GUITAR 3, which is in beta testing.

For my usage of MG (which is most certainly different from most people’s, because I analyze the MIDI output, whereas most people probably just play the output) I would probably prefer a more uniform latency, because it would make the analysis easier.

Are you using the plugin in the Server with VSTPlugin ?

I have been using the standalone version of MIDI GUITAR 2 and receiving MIDI through MIDIdefs in SC, but when a VST3 version of MIDI GUITAR 3 is ready, I want to see if I get better results running MG as a VST plugin in SC. I guess it would still have to go through MIDIdefs.
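In case it helps once the VST3 is out, hosting a plugin on the server with the VSTPlugin extension looks roughly like this (a sketch; the plugin name, input bus and SynthDef name are placeholders):

(
SynthDef(\mgHost, { |out = 0|
	// run the guitar input through the hosted plugin
	Out.ar(out, VSTPlugin.ar(SoundIn.ar(0), 2));
}).add;
)

~host = VSTPluginController(Synth(\mgHost));
~host.open("MIDI Guitar", editor: true); // plugin name is a placeholder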

For this kind of live-tracking, Pure Data or Max/MSP might be more appropriate because you can immediately handle the MIDI notes on the audio thread with no additional latency involved.

Just thinking about porting my 7000-line code to Pure Data or Max makes my head ache :) I initially chose SC because I think all the logic controlling the behavior of the system would be nightmarish to achieve in Pd or Max, but also, I have no experience with Pd and only very minimal experience with Max. Can you explain what you meant with

…because you can immediately handle the MIDI notes on the audio thread with no additional latency involved.

?

I think you should really look into proper MIDI guitar pickups! Not only is the tracking faster, you can directly receive the MIDI notes in sclang.

At least up until recently, MG was as good as any hardware I have seen or tested. I think, for instance, that MG performs as well as the Roland pickup system. I saw a new Fishman solution which claims to deliver faster tracking, I think around 13 or 15 ms. It is hard to compare the different solutions based on these kinds of specs for a number of reasons: latency is not inherently steady (as you pointed out), the amount of bogus notes (false positives) is definitely also a big factor when analyzing (I do various kinds of filtering on the MIDI roll to remedy this), and also, how high is the pitch confidence? The pitch confidence for MG is extremely high, i.e. for the ‘real notes’ as estimated by me (as opposed to the false positives), MG very rarely misses the pitch - almost to the point where I would prefer slightly lower pitch confidence if it meant faster detection, at the expense of missing the pitch of a note now and then.

This would show SC is a truly Hegelian system, wherein, as in the master/slave dialectic, the slave is the one with true consciousness. “The truth of the independent consciousness is accordingly the consciousness of the servant…being a consciousness repressed within itself, it will enter into itself, and change around into the real and true independence.” That’s “SC4” right there.

But seriously, changing this hierarchy would be a big qualitative shift, with many different effects.

Ok, wow!

Just thinking about porting my 7000-line code to Pure Data or Max makes my head ache :)

Sure! If you have already written your project in SC, there’s not much point in porting it to Pd or Max unless you really hit a wall.

…because you can immediately handle the MIDI notes on the audio thread with no additional latency involved.

?

Pd is synchronous: for every block of 64 samples, Pd first does all the messaging (including clock timeouts, network I/O, MIDI I/O, etc.) and then does the audio processing. This means you can, for example, receive a MIDI message, send some Pd messages in turn, and Pd will immediately turn them into audio. Also, any sequencing in Pd is perfectly sample-accurate by default! There is no notion of “Server latency”.

SC is asynchronous: MIDI is received in sclang, then you need to send a message to the Server, which is (hopefully) received as soon as possible and then turned into audio. Even without Server latency, you may get a delay of a full hardware buffer size in the worst case. And then there is also language jitter.
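In sclang terms, that path looks roughly like this (a sketch with a placeholder \ping SynthDef):

MIDIClient.init;
MIDIIn.connectAll;

MIDIdef.noteOn(\live, { |vel, num|
	// with s.latency = nil this goes out "as soon as possible";
	// with a numeric latency it becomes a timestamped bundle (more delay, less jitter)
	s.bind { Synth(\ping, [\freq, num.midicps, \amp, vel / 127]) };
});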

Personally, the synchronous and deterministic nature of Pd is one of the reasons I still prefer it over SuperCollider, even though patching takes much more time than writing sclang code.

How much of it is due to the lack of optimization of sclang, and how much is server latency proper?

Assume that two processes A and B are running independently from each other. Process A might send a message to B just when the latter has already started or finished its cycle, so the message will only be dispatched at the next cycle of B.

Note that the Server computes control blocks in batches. For example, with a hardware buffer size of 256 samples, every audio callback will compute 4 blocks of 64 samples in a row. Now, if you’re out of luck, your message might be received just while or after the last block has been computed, after which the audio thread goes to sleep for the remaining time slice. This is the reason why message dispatching is quantized to the hardware buffer size (in the worst case) and not to the Server block size (typically 64 samples). The actual quantization depends on the CPU load. If the CPU load is low, the audio callback finishes very quickly and spends most of its time sleeping, so quantization approaches the hardware buffer period (e.g. 5.3 ms for 256 samples @ 48 kHz). If the CPU load is high, the quantization is less pronounced, as the callback spends less time sleeping and more time processing blocks, giving messages the opportunity to “sneak in”.
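In numbers (assuming 48 kHz):

// worst-case dispatch quantization for a few hardware buffer sizes at 48 kHz
[64, 128, 256, 512].do { |n|
	"% samples -> % ms".format(n, (n / 48000 * 1000).round(0.01)).postln;
};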


To illustrate further why you cannot “directly” turn MIDI data into audio in sclang:

First the MIDI timer thread needs to wake up and read incoming MIDI messages. Then it needs to obtain the global interpreter lock – which might currently be held by another sclang thread! Only then can it dispatch the MIDI message to the user code, which may in turn send messages to the Server. The Server will finally receive the message and dispatch it in the upcoming control block. Depending on the “phase” of the audio callback, the message may be dispatched in a few microseconds – or in a few milliseconds.


I believe not much effort has been put into fine-tuning and optimizing this. There must be ways to mitigate it.
First, optimizing the language: creating larger Slots (128 bits) and parallelizing the computation. MIDI is a sensitive part, and it could also have its own priority. And then, also trying to have some sort of “soft sync” with the server. (I know, we only know whether some “optimization” works by testing it, but it never got much attention.)

Or “hard sync” as you mentioned before.

Or maybe even think of another language (not sclang) that has a dedicated separate real-time clock in sync with the server.