Keeping sclang and scsynth in hard sync

Hello,

I think this question has been discussed before, but I couldn’t find the thread again in a quick search, which is why I dare to pose it again:

Wouldn’t it make sense for sclang and scsynth to use the sample clock of the audio device the server is connected to as a common time basis? I looked at the relevant code but cannot tell whether it is realistic to implement this as an optional mode. The goal would be to avoid the extremely tedious drift between the language and server time bases, which regularly trips me up (and, I guess, not only me).

I’d be more than happy to help develop such a mode if others also think this could make sense and solve all timing related problems when running language and server on the same machine.

Best regards,

Gerhard

That would be great! Though it would be good to have a switch to turn it off, particularly for Ableton Link support (which deliberately sacrifices sample-accuracy in exchange for “good enough” sync between multiple machines).

An option for sample accurate OSC timestamps would be valuable.

hjh

In a first experiment I have locked the oscTime in appIOProc in SC_CoreAudio.cpp to the sample clock in order to test whether the scheduling in SC_CoreAudioDriver::Run works reliably, which it does, i.e. I get long stretches of the expected sample-accurate results. My test case is a 1 kHz pulse train created by individual synths using OffsetOut. The sample and subsample offsets all make sense. Of course, every now and then there is a glitch because the language clock and the sample clock drift too far apart. This makes me confident that if the language time were synced to the sample time as well, things could work out.
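To make the "sample and subsample offsets" concrete, here is a minimal Python sketch of the arithmetic involved. It is not scsynth's code; the function name `locate` and the 44.1 kHz / 64-sample values are assumptions chosen so that the subsample part is non-zero.

```python
import math

def locate(event_time, sample_rate, block_size):
    """Split an event time (seconds since the sample clock started) into
    (block index, sample offset inside the block, subsample fraction)."""
    position = event_time * sample_rate      # position in samples, may be fractional
    whole = math.floor(position)
    return whole // block_size, whole % block_size, position - whole

# 1 kHz pulse train at 44.1 kHz with 64-sample blocks (assumed values)
for k in range(4):
    print(k, locate(k / 1000, 44100, 64))
```

The integer offset is what OffsetOut needs to place the pulse inside the block; the fractional part is the subsample offset.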

On the language side, would it make sense to have something like a SampleClock that works like the TempoClock, but uses the sample rate of the same audio device the server is connected to? This would probably also mean that the language has to open that device and run sample callbacks to resync its clock every block. Or are there smarter solutions to this?

I believe it’s technically not very feasible to schedule sclang wake-ups on the audio device clock - generally this kind of scheduling is only done on a kernel-managed audio thread, and running an entire sclang process on an audio thread has the potential to hurt overall audio performance. There may be platform- or driver-specific solutions to this, though?

If we assume that wake-ups will be scheduled on a non-audio clock, then we’re already in a problem space where we need to manage drift (this is pretty normal for audio applications) - right now this is done on the scsynth server, when scheduling OSC bundles. This makes the most sense for the overall architecture of SuperCollider, where there’s no guarantee that the server is even local to the sclang instance. Implementing a SampleClock on the sclang side would probably look like moving the logic that translates between clocks (accounts for drift etc.) from scsynth to the new SampleClock implementation.

Probably: when a wake-up needs to be scheduled, we compare our local “CPU time” to the reported “audio device time”, and choose a wake-up pragmatically so that we’re early enough to meet deadlines for our audio time. This step is where e.g. drift adjustment happens, since the relationship between “CPU time” and “audio device time” will be in flux. The drift compensation could (should?) basically be copied from the logic already used in scsynth (since scsynth is doing a roughly equivalent clock translation) - though there are platform-specific ways of dealing with this as well (at least for CoreAudio) that may do a better job.

There’s a separate problem for “sample accurate time” - namely, that sclang does not have a good way to represent accurate integer time, which is required for OSC time tags to get sample-accurate behavior. Properly supporting this would require a rational number implementation in sclang, which is … fairly non-trivial to implement, and VERY non-trivial to get predictable results from (considering how much of sclang assumes floating point time). Probably it’s reasonable to just use floating point time as it works now in sclang - as long as all of your events are being scheduled by the same sclang client, even if some fidelity is lost due to floating point accuracy, you’d still have a pretty firm guarantee that two events scheduled at the same floating point time would also occur at the same sample-time.

I guess what we are looking into here has some similarities with LinkClock. Does anyone know who implemented it?

First of all I should say that I am only trying to find a solution to the timing problems of SC for the case where the language and the server run on the SAME machine.

In my understanding there are TWO problems with the current solution: jitter and drift of the system time with respect to the sample clock. Both are compensated for in the server. This compensation leads to frequent small deviations from the desired points in time (one or two samples) and, less frequently, to larger ones (tens of samples).

I assume that the sample clock is much more stable than the system time, so fluctuations of the system time are probably the source of the jitter, causing even the smoothed sample rate to fluctuate too much. So much for jitter.

Naturally, there will always be a drift between the sample time and the system time. This is the problem I would like to address with a language-side SampleClock, which works like a TempoClock but regularly adjusts its time to the sample clock. When creating such a clock we will probably pass in the (running) server with which we want the clock to be in sync.

The OSC time-tags (and their representation as doubles) arriving from the language are precise enough to express the kind of temporal relations we want to express, even at highest sample rates and over long stretches of time.

The job of scheduling in the language is to send the time-tagged OSC bundles to the server in due time so that they can be scheduled there at the right time. This requires that the language and server time bases do not drift with respect to each other. Since the server has two time bases, one of which is the sample clock, and that is our reference when synthesising sound, we want the language to sync to this clock as well to avoid drift.

I am under the impression that we don’t need a rational number representation to achieve all this because we have enough bits in doubles to do the required calculations.
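As a rough check of that impression, the following Python sketch measures the spacing between adjacent doubles near a present-day NTP timestamp and expresses it in samples. The value `NTP_NOW` is only an approximate, assumed figure for "seconds since 1900", and `math.ulp` needs Python 3.9 or later.

```python
import math

NTP_NOW = 3.9e9    # rough seconds since 1900, assumed for illustration

def resolution_in_samples(seconds, sample_rate):
    """Spacing between adjacent doubles near `seconds`, expressed in samples."""
    return math.ulp(seconds) * sample_rate

for sr in (44100, 48000, 96000, 192000):
    print(sr, resolution_in_samples(NTP_NOW, sr))

# A double counts integers exactly up to 2**53, so a plain sample counter
# at 192 kHz stays exact for this many years:
years_at_192k = 2**53 / 192000 / (3600 * 24 * 365)
print(years_at_192k)
```

Under these assumptions the resolution stays below a tenth of a sample even at 192 kHz, which supports using doubles for scheduling, although it does not settle how far sub-sample accuracy can be pushed.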

If we assume that wake-ups will be scheduled on a non-audio clock, then we’re already in a problem space where we need to manage drift (this is pretty normal for audio applications)

Not necessarily. The client can actually be driven by the Server. This doesn’t have to happen synchronously. Instead the server can notify the client whenever it has computed a new block of audio. The client scheduler wakes up, increments its logical time and dispatches any scheduled functions/commands that fall in this time slice. The client might either run in a dedicated thread or process and simply waits on a (named) semaphore. Alternatively, the server might send a message over a (local) network connection, e.g. for remote servers. (In fact, a single remote server could drive several clients.)
Consequently, the server simply dispatches incoming OSC bundles based on logical sample time.
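A minimal Python sketch of such a server-driven client scheduler, to illustrate the idea. All names are hypothetical; in a real implementation `on_block` would be called after waiting on the semaphore or after receiving the server's notification message.

```python
import heapq

class BlockDrivenScheduler:
    """Client-side scheduler whose logical time only advances when the
    server reports that it has computed another block."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.logical_time = 0      # in samples
        self._queue = []           # entries: (time, sequence number, function)
        self._count = 0            # tie-breaker keeps insertion order stable

    def sched(self, time, func):
        heapq.heappush(self._queue, (time, self._count, func))
        self._count += 1

    def on_block(self):
        """Called once per server notification."""
        end = self.logical_time + self.block_size
        # Dispatch everything that falls into the current time slice.
        while self._queue and self._queue[0][0] < end:
            time, _, func = heapq.heappop(self._queue)
            func(time)             # the function sees its exact logical time
        self.logical_time = end

fired = []
scheduler = BlockDrivenScheduler(64)
scheduler.sched(100, fired.append)
scheduler.sched(10, fired.append)
scheduler.on_block()   # covers samples 0..63
scheduler.on_block()   # covers samples 64..127
print(fired)
```

Note that no system clock is read anywhere: the only notion of time is the block count delivered by the server.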

Of course, the logical sample time will drift from the system time. This is unavoidable. Personally, I hardly ever feel the need to work with system time, except maybe for long-running installations. Pure Data, for example, only works with logical sample time; you simply can’t schedule things precisely in terms of system time, but in turn you get deterministic sub-sample accuracy. If I want a section to be 60 seconds, I really want it to take 2880000 samples @ 48 kHz, not 60 seconds as measured with an atomic clock.

Personally, I would not even try to mix logical sample and system time. If you use the former, forget about the latter.

A more tricky case is controlling several servers from a single client. If all servers are local, the individual sample clocks won’t drift apart - at least if the servers are driven by the same audio device. But if you control several remote servers, then the individual sample clocks will drift. In that case, it might make more sense to stick with a single system time clock.

Another tricky thing is how to dispatch incoming OSC bundles relative to a sample clock. You can use a time DLL filter to compensate for drift, as scsynth is currently doing, but you somehow have to tell the OSC receiver to use the appropriate clock source. This is largely a matter of API design. E.g. OSCFunc could have an (optional) clock argument.
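For readers unfamiliar with the term, here is a small Python sketch of a second-order delay-locked loop that smooths jittery callback timestamps. It is a generic textbook-style filter, not scsynth's actual code; the class name, the 1 Hz bandwidth, the 100 ppm offset and the jitter amplitude are all assumptions made for the simulation.

```python
import math
import random

class TimeDLL:
    """Second-order delay-locked loop for smoothing callback timestamps."""

    def __init__(self, nominal_period, bandwidth_hz):
        omega = 2.0 * math.pi * bandwidth_hz * nominal_period
        self.b = math.sqrt(2.0) * omega   # proportional gain
        self.c = omega * omega            # integral gain
        self.period = nominal_period      # current estimate of the block period
        self.t0 = None                    # filtered time of the current block
        self.t1 = None                    # predicted time of the next block

    def update(self, system_time):
        if self.t0 is None:               # first callback: initialise the loop
            self.t0 = system_time
            self.t1 = system_time + self.period
            return self.t0
        error = system_time - self.t1     # measured minus predicted
        self.t0 = self.t1
        self.t1 += self.b * error + self.period
        self.period += self.c * error
        return self.t0

def simulate(blocks=2000, seed=1):
    rng = random.Random(seed)
    nominal = 512 / 48000                  # assumed block size and sample rate
    true_period = nominal * (1 + 100e-6)   # audio clock 100 ppm off the system clock
    jitter = 0.0005                        # +/- 0.5 ms callback jitter
    dll = TimeDLL(nominal, bandwidth_hz=1.0)
    for n in range(blocks):
        dll.update(n * true_period + rng.uniform(-jitter, jitter))
    return dll.period, true_period

print(simulate())
```

The filter tracks the drifting block period while suppressing most of the per-callback jitter; the bandwidth trades tracking speed against smoothness.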


Generally, much of SuperCollider’s complexity wouldn’t exist if a client could only ever control a single server… In that case, the whole client scheduler could simply be driven - asynchronously(!) - by the audio callback.

Yes, I think so too. As a preparation, I performed the following test:

After having deactivated the periodic calls to syncOSCOffsetWithTimeOfDay in both the language and the server (two lines of code commented) and using the sample clock as time basis in the server (two lines of code replaced), I could run 15 minutes of sample accurate synthesis without a single glitch. It seems that the default server latency of 200 ms allows for the drift accumulating over such a limited time span.
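A back-of-the-envelope Python sketch of how long a fixed latency budget can absorb the drift between two free-running clocks. The drift rates are assumed example values, not measurements from this test, and the result is an upper bound because language jitter and network latency also consume part of the 200 ms.

```python
def seconds_until_budget_used(latency, drift_ppm):
    """Time until two free-running clocks differ by `latency` seconds."""
    return latency / (drift_ppm * 1e-6)

for ppm in (10, 50, 100):    # assumed drift rates
    minutes = seconds_until_budget_used(0.2, ppm) / 60
    print(f"{ppm:>3} ppm: {minutes:.0f} min")
```

At an assumed 100 ppm the budget lasts roughly half an hour, so a glitch-free 15 minute run is plausible without any resynchronisation.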

I assume that a simple OSC message from the server every few seconds would be enough to continuously adjust the language time basis such that the two don’t drift too far apart - similar to what SC unsuccessfully tries in keeping the OSC time of server and language aligned.

I think a fundamental question is, should we

a) have a dedicated SampleClock (+ SampleTempoClock ?) that is driven by the server
b) let the entire language scheduler be driven by the server

I tend towards b) because we wouldn’t need to introduce new objects. Instead, both SystemClock and TempoClock would work on the logical sample time. The only thing that is needed is a switch in ServerOptions.

One downside of b) is that you can’t control more than one Server at a time. To me that would be a reasonable tradeoff, but other people might think differently.


I assume that a simple OSC message from the server every few seconds would be enough to continuously adjust the language time basis such that the two don’t drift too far apart

If the client is driven by the server, there is no need for synchronization at all: the language time base would be exactly the same as the server time base. Server scheduling would be 100% deterministic and sub-sample-accurate.

You would only need to synchronize when you interact with systems that are based on wall clock time. One example is sending/receiving timestamped OSC bundles to/from other programs. However, I think this could be handled transparently, i.e. the sample time <==> NTP time conversion can happen under the hood. Right now, I can’t think of anything else…

Are you saying this would be an option or the default? I always run 5 servers, so this would be very bad for me if it were the only way.

Sam

Are you saying this would be an option or the default?

Certainly an option :)


[…] I always run 5 servers, so this would be very bad for me if it were the only way.
Out of curiosity, what is the reason you run five servers?
Thanks, Peter

I’m running lots of things at once, and for real-time processing I need the buffer size to be at 64. I also have some CPU-intensive neural net things that basically take up a whole CPU (this is mostly due to the 64 sample hardware buffer). The nice thing about multiple servers is they each can run on their own core without me doing anything. So it just works across a multiprocessor system without having to deal with supernova shenanigans. It has its own shenanigans though, as I need to do some goofy loopback stuff to send audio between servers. I counted and I actually use 7 servers, which is down from 11 on the Intel machine (each M1 core is much faster than each Intel core and can handle more processes). This approach may become obsolete as machines get faster, but it is currently my way.


I think this would be too radical a shift, with all kinds of backwards-compatibility issues (e.g. support of multi and remote server setups). I would prefer a), i.e. introduce a new kind of TempoClock that - together with a properly configured server - uses the sample clock as time basis.

I have a proof-of-concept implementation of this idea working. In it, both the TempoClock and scsynth have the resyncing with the system time disabled. With scsynth switched to follow the sample clock, the two clocks did not drift enough to cause trouble for at least an hour - using the built-in audio system as well as external USB audio interfaces, and on different macOS versions (10.13.6 and 12.2).

This raises two questions:

  1. Do we actually need to have language and server in hard sync if they don’t really diverge, at least not more than a normal server latency would be able to compensate for, even over hours?

  2. What is wrong with the resyncing mechanism that it actually causes trouble instead of avoiding it?

I will nevertheless come up with a way to have the server regularly update the language about the sample time so that it could be used by a TempoClock to schedule sample accurate server events. My intention is to change as little as possible of the current implementation. I think your option b) would involve more radical changes to SC.

With my test implementation the following plays a perfectly sample-accurate 1 kHz pulse train for a bit more than a quarter of an hour at 48 kHz:

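(
// Reboot the server with the sample clock as its time basis.
// useSampleClock exists only in the test implementation described above.
Server.default.quit();
Server.default.options.useSampleClock = true;
Server.default.waitForBoot({
	// A single impulse that frees its synth right away.
	SynthDef.new(\dirac, {
		OffsetOut.ar(0, FreeSelf.kr(Impulse.ar(0)));
	}).add();
	Server.default.sync();
	// Let TempoClock follow the sample clock as well (test implementation only).
	TempoClock.useSampleClock(true);
	Routine.new({
		// One synth per millisecond gives the 1 kHz pulse train.
		1000000.do({
			Server.default.bind({ Synth.new(\dirac) });
			0.001.wait();
		});
	}).play();
});
)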

Of course it would have to be an optional feature. Actually, it would also work with remote servers! The problem is only about multiple local/remote servers because they do not share a common time base.

When switching the scsynth to follow the sample clock, the two clocks would not drift enough to cause trouble, for at least an hour - using the built-in audio system as well as external USB audio interfaces, and on different macOS versions (10.13.6 and 12.2).

It would be interesting to test this also on Windows and Linux. Also with audio interfaces that use an external clock. Anyway, the fact that it does eventually drift is indeed a problem. Some kind of synchronization mechanism is absolutely required.

more than a normal server latency would be able to compensate for

Server latency does not compensate for clock drift. Its purpose is to compensate for network latency and language jitter.

I will nevertheless come up with a way to have the server regularly update the language about the sample time so that it could be used by a TempoClock to schedule sample accurate server events.

You might have a look at what NTP clients do. They periodically query the NTP server to get the current “real” time, but instead of immediately resetting their own clock, which might cause a discontinuity, they typically “spread” the timing difference over a certain time frame. You can search for “NTP clock slew”.
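To illustrate the slewing idea in isolation, here is a small Python sketch. It is not how any particular NTP client is implemented; the class, its method names and the 500 ppm limit are assumptions for the example.

```python
class SlewedClock:
    """Follows a reference clock without jumping: a measured offset is
    worked off gradually, limited to `max_slew` seconds per second."""

    def __init__(self, max_slew=500e-6):   # 500 ppm, assumed limit
        self.max_slew = max_slew
        self.offset = 0.0                  # correction applied so far
        self.pending = 0.0                 # correction still to be applied

    def report_offset(self, measured_offset):
        """Reference says we are behind (+) or ahead (-) by this many seconds."""
        self.pending = measured_offset

    def advance(self, dt):
        """Advance by dt seconds of local time; returns the corrected increment."""
        limit = self.max_slew * dt
        step = max(-limit, min(limit, self.pending))
        self.pending -= step
        self.offset += step
        return dt + step

clock = SlewedClock()
clock.report_offset(0.001)                 # we are 1 ms behind the reference
elapsed = sum(clock.advance(0.01) for _ in range(300))
print(elapsed, clock.pending)
```

Because each increment stays positive and close to `dt`, scheduled events never see time jump or run backwards while the offset is being removed.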

That being said, I still think that it would be better to let the Server (optionally) drive the TempoClock. It does not require complex and brittle clock synchronization algorithms, it is 100% deterministic and it even allows for sub-sample-accuracy. However, it would require a partial rewrite of the language scheduler.


Thinking a bit further: having Server + language scheduler or TempoClock in hard sync would also enable a new kind of NRT synthesis which essentially works the same as RT synthesis. The only difference is that the Server also waits for the Client.

RT synthesis with sample clock:

  1. Server: notify Client → run scheduled messages/bundles → compute audio
  2. Client: wake up → advance logical time + run scheduled Routines → wait
    etc.

NRT synthesis with sample clock:

  1. Server: notify Client → wait
  2. Client: wake up → [receive Server replies → ] advance logical time + run scheduled Routines → notify Server → wait
  3. Server: wake up → run scheduled messages/bundles → compute audio
    etc.

(For NRT synthesis we probably wouldn’t use semaphores, instead we would have to synchronize via OSC messages, to make sure that OSC messages/bundles between Server and Client are dispatched deterministically.)
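The NRT handshake above can be sketched in a few lines of Python. The two semaphores merely stand in for whichever notification mechanism is eventually used (semaphores or OSC messages), and all names are hypothetical.

```python
import threading

def run_nrt(num_blocks, block_size=64):
    """Run server and client in lockstep; returns the order of operations."""
    log = []
    client_wakeup = threading.Semaphore(0)   # server -> client
    server_wakeup = threading.Semaphore(0)   # client -> server

    def server():
        sample_time = 0
        for _ in range(num_blocks):
            client_wakeup.release()          # 1. notify client
            server_wakeup.acquire()          #    NRT only: wait for the client
            log.append(("server computes", sample_time))  # 3. bundles + audio
            sample_time += block_size

    def client():
        logical_time = 0
        for _ in range(num_blocks):
            client_wakeup.acquire()          # 2. wake up
            log.append(("client runs", logical_time))     #    scheduled Routines
            logical_time += block_size
            server_wakeup.release()          #    notify server, then wait again

    threads = [threading.Thread(target=server), threading.Thread(target=client)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

print(run_nrt(2))
```

Removing the server's wait on `server_wakeup` gives the RT variant, where the server keeps computing audio whether or not the client has finished.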

I like the idea of allowing the server to drive a clock on the language side - it removes the need for the language to add additional clock primitives or know anything about a hardware clock at all. Probably there’s already a significant amount of logic in LinkClock that would apply equally to an external server clock.

Maybe you have some thoughts on a few questions @Spacechild1? In your model, you have:

  1. Server audio thread wakes at audioDeviceTime=1000 and notifies client that logicalTime=1000.
  2. Client wakes at audioDeviceTime=1000+n with a message that logicalTime=1000.
  3. Client processes any scheduled actions for logicalTime=1000, sending OSC “now” messages with a timestamp of scheduledTime=???.
  4. Based on the next item in its scheduler, the client schedules another wake-up at nextScheduledWakeup=logicalTime + delta.

So, my questions…

  • In [3], when does the client schedule “now” events, e.g. messages meant for the current logical time=1000? It can’t schedule them for time=1000 because this time has already passed on the server. Probably it schedules them for logicalTime + server.latency? But then, for debugging and code purposes, the server is processing events at audioDeviceTime=1200 that are nonetheless meant for logicalTime=1000 - this seems like a recipe for confusion? It would be nice to have an implementation where the logical time on both the client and the server were semantically identical, and the latency was implemented by waking up the client early and effectively hiding the offset from latency from most code? In other words: server sends wake-up message for client at audioDeviceTime=1000-latency with logicalTime=1000.
  • In [4], the client needs to tell SOMETHING about the logical time of its next wake-up (we can’t wake the client every 16 samples…). This presumably happens as an OSC message?

In sclang there’s no fundamental reason why events cannot be scheduled on different clocks, e.g. AppClock - or threads, e.g. a network callback thread. In order to reasonably schedule server events (meaning: send any OSC message at all…), we need SOME notion of what the current logical time is, even if it’s inaccurate. This imposes some requirements:

  1. In addition to scheduler-based wake-ups, the server needs to regularly update the sclang client’s logicalTime even if no events are being processed. This is fine, but these updates will happen with SOME granularity - either one of our choosing (so that we avoid flooding with updates every 16 samples) or the hardware buffer size.
  2. When scheduling events from anywhere that isn’t a SampleClock wake-up, we want a best possible estimate of what our current time is. Given the granularity in [1], we will need to infer a more accurate logicalTime. The only way to do this is to use some kind of accurate non-audio-device clock, and infer an accurate logicalTime based on ticks sent from the server.
  3. [2] above is basically LinkClock: an external clock that gets periodically synchronized to correct for drift or change. Given that, aren’t we basically stuck implementing a two-clock synchronized operation anyway? If that’s the case, why not just simplify things - have the server simply send out the equivalent of LinkClock ticks, and synchronize everything using one mechanism - one that’s already built and works well?
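Requirement 2 in the list above, inferring logicalTime between server ticks from a non-audio clock, could look roughly like this in Python. The class and its methods are hypothetical, and the sketch ignores drift correction and jitter filtering.

```python
class TickInterpolator:
    """Estimates the current logical sample time between server ticks."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.tick_samples = 0      # logical time reported by the last tick
        self.tick_system = 0.0     # system time at which that tick arrived

    def on_tick(self, sample_time, system_time):
        self.tick_samples = sample_time
        self.tick_system = system_time

    def estimate(self, system_time):
        """Extrapolate from the last tick using the nominal sample rate."""
        elapsed = system_time - self.tick_system
        return self.tick_samples + elapsed * self.sample_rate

interp = TickInterpolator(48000)
interp.on_tick(96000, 10.0)        # tick: sample 96000 arrived at t = 10.0 s
print(interp.estimate(10.5))       # estimate half a second later
```

In practice `system_time` would come from a monotonic clock, and the error of the estimate is bounded by the drift accumulated since the last tick plus the jitter of the tick's arrival time.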

Client wakes at audioDeviceTime=1000+n with a message that logicalTime=1000.

Waking up another process is a matter of microseconds. If the Client is currently idle, it will be woken up more or less immediately. If it is still busy with a previous task, that is the typical language jitter - which can be compensated with Server latency.

Note that even in the current implementation, the Client might wake up a bit later than the desired time point because std::condition_variable::wait_for is not 100% precise, either.

  • In [3], when does the client schedule “now” events, e.g. messages meant for the current logical time=1000? It can’t schedule them for time=1000 because this time has already passed on the server.

Yes, the Client would simply use the current logical time as the basis for scheduled OSC bundles. Of course, if latency is zero, the bundle will always be late, but this is also the case with the current implementation!

Generally, the Client can only schedule things for the future, never for the present. The only exception is an NRT Server that can wait on the Client (as outlined in my other post above).

OSC messages, on the other hand, are not scheduled; they are sent immediately, so the time base doesn’t matter.

Probably it schedules them for logicalTime + server.latency?

It simply schedules for logicalTime + delay. The latter can be Server.latency or any other value, just like with the current system.

But then, for debugging and code purposes, the server is processing events at audioDeviceTime=1200 that are nonetheless meant for logicalTime=1000

This only happens if the OSC bundle arrives late, just like with the current system.

  • this seems like a recipe for confusion? It would be nice to have an implementation where the logical time on both the client and the server were semantically identical, and the latency was implemented by waking up the client early and effectively hiding the offset from latency from most code?

What would waking early mean? For OSC messages it would make no difference. For OSC bundles the only effect would be that you get some hidden extra latency. This is not necessary because we already control the latency in the Client.

In [4], the client needs to tell SOMETHING about the logical time of its next wake-up (we can’t wake the client every 16 samples…). This presumably happens as an OSC message?

The Client simply dispatches all scheduled Routines that fall into the current time slice (= duration of a block). Something similar already happens in the current language scheduler: when the Client wakes up, it reads the current time, compares it against the desired schedule time - it might have been woken up late! - and dispatches all ready Routines.


In addition to scheduler-based wake-ups, the server needs to regularly update the sclang client’s logicalTime even if no events are being processed. This is fine, but these updates will happen with SOME granularity - either one of our choosing (so that we avoid flooding with updates every 16 samples) or the hardware buffer size.

We only have this problem if we want to introduce a dedicated SampleClock that should coexist with SystemClock and TempoClock. But why would we need this in the first place? I think it would just make things more complicated.

Personally, I still think it makes more sense to drive the entire Client scheduler from the audio callback (as an option, of course!), so that there is only one logical time (= sample time) that is used by SystemClock and TempoClock.

In sclang there’s no fundamental reason why events cannot be scheduled on different clocks, e.g. AppClock - or threads, e.g. a network callback thread. In order to reasonably schedule server events (meaning: send any OSC message at all…), we need SOME notion of what the current logical time is,

The AppClock implementation does not have to change at all: after receiving a signal, it reads the current logical time and compares it against the top of the priority queue to schedule the next UI clock event. Logical sample time and system time are close enough, so unless you schedule an event several hours in the future, this should be ok.

For sending/receiving OSC bundles (to other applications) we need to know the NTP time. This can be done by sampling the NTP time on the Server and sending it to the Client. This means that for every block we know the logical sample time + the corresponding NTP time. We might use a time DLL filter to estimate time points between blocks.
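A short Python sketch of the conversion this implies: one (sample time, NTP time) pair per block serves as the reference, and OSC time tags are encoded as 32.32 fixed point. The function names and the reference values are made up for illustration; only the 1900-to-1970 epoch offset and the time tag layout are standard.

```python
NTP_UNIX_OFFSET = 2208988800   # seconds between 1900-01-01 and 1970-01-01

def sample_to_ntp(sample_time, ref_sample, ref_ntp, sample_rate):
    """Map a logical sample time to NTP seconds using one reference pair."""
    return ref_ntp + (sample_time - ref_sample) / sample_rate

def to_osc_timetag(ntp_seconds):
    """Encode NTP seconds as a 64-bit OSC time tag (32.32 fixed point)."""
    whole = int(ntp_seconds)
    frac = int((ntp_seconds - whole) * 2**32)
    return (whole << 32) | frac

# Hypothetical reference: sample 48000 was sampled at Unix time 1700000000
ref_ntp = 1700000000 + NTP_UNIX_OFFSET
ntp = sample_to_ntp(72000, 48000, ref_ntp, 48000)
print(ntp, hex(to_osc_timetag(ntp)))
```

A time DLL filter would replace the single reference pair with a smoothed estimate, but the conversion itself stays this simple.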


Multiple (local) Servers are a bit tricky. One solution would be to set one Server as the master that drives the language scheduler. Each Server samples the current system time at the very first callback and uses it as an offset. Local servers running on the same audio device should not drift apart, so the respective logical sample times wouldn’t drift, either. However, there would be a slight constant offset - depending on the accuracy of the system clock and the time difference between the Server starts.

What I mean is: we cannot guarantee that, for the duration of one audio thread wake-up at audioClockTime=1000, we will be able to wake the client (which includes: sending a TCP/UDP message, waking a client thread, and potentially waiting for the global lock to be freed by another process) and have it get a response back to the server in time for it to process any audioClockTime=1000 events. If we want accurate scheduling, latency is a hard requirement - it’s already too late for us to make deadlines for audioClockTime=1000 once we’re in our 1000 audio callback. The only possibility here is to block the audio thread while waiting for a response - but an unbounded wait like this breaks all real-time guarantees, so this isn’t an option.

This can be accounted for by latency just fine, as it is now. This case just feels confusing to me because, as a user, I would expect that if I schedule an event for audioClockTime=1000 on the server, it occurs at that time and not latency seconds later. This can already be confusing with the current clock implementations; it only seems slightly worse to imply that we have “sample accurate clock sync” when in fact it would be quite hard to e.g. schedule events in a way that synced up with other clients to that audio clock.

When I say “wake early” - we can resolve the semantic weirdness here by simply applying the latency to our wake-up times rather than when we schedule bundles. In other words, when we want to process logicalTime=1000, we send the wake-up message to do this at audioClockTime=1000-latency so we have enough time to produce any messages scheduled for that time.

I don’t think it’s feasible or efficient to wake sclang every e.g. 32 samples. This could be ~300 microseconds - many thread sync mechanisms have a latency in the 10-100 microsecond range - this means we could be spending 1/3 of our time JUST waiting for threads to wake. If we want regular, predictable wake-ups, the only way I know to get them is to schedule them on a clock rather than using sync mechanisms. This means either scheduling on a non-audio clock and doing drift/jitter compensation, or simply making the client have its own audio thread that just does wake-up scheduling (but this misses the cool part of the whole proposal, namely that the server specifies wake-ups rather than having another audio thread in the client).

I understand that this may solve some problems elegantly. But consider: if we wake the client on the audio callback, this means that all client code run during that callback must be realtime-safe. All of a sudden, users without deep knowledge of realtime audio processing will have to make decisions about e.g. which parts of the Pattern / Event system they are able to use based on what might get called during this audio callback. This is an enormous burden to add to a system that has never had to support this. If I had to guess, I would say that most of the Pattern/Event system would be unreliable or unusable in this case? DEFINITELY no event scheduling/processing that I’m doing in any personal projects could reliably hit audio clock deadlines, though my cases are definitely more complex than the norm.

Putting this aside: other clocks are needed to schedule UI updates and interact with systems that are bound to specific threads - these will still need to exist. This is why e.g. things that were on AppClock (processed on the UI thread) cannot work on an audio clock.


Sorry, not wanting to shoot down the entire proposal, which I think is fundamentally very good - arbitrary server-based clocking is a fantastic idea (consider also, for example, being able to use audio-based impulse clocks, the kind of thing used in modular synthesis - this is another form where the server needs to send time updates to the client, which would drive its scheduler in some way). But tight coupling between the server and client here brings a LOT of baggage that is either impossible to work around, or deeply affects how sclang can possibly be used.

Generally, in RT synthesis, “now” is always too late. Unless you secretly change the meaning of “now” by adding extra latency :) I just don’t see the need. For OSC messages, we do not need any time base, they are just dispatched as soon as possible. For OSC bundles, users are already aware that they need to apply latency to avoid late bundles… The Pattern system hides this fact by automatically scheduling OSC bundles with Server.latency.

I don’t think it’s feasible or efficient to wake sclang every e.g. 32 samples. This could be ~300 microseconds - many thread sync mechanisms have a latency in the 10-100 microsecond range - this means we could be spending 1/3 of our time JUST waiting for threads to wake.

It is very much feasible. While developing the plugin bridge for VSTPlugin, I have made some IPC benchmarks. With platform semaphores, waking up a subprocess + waiting for a reply takes about 3-15 microseconds on my crappy Windows laptop (~3 us when running in a tight loop, ~15 us with a simulated workload between iterations). One-way wake-up times are significantly shorter, accordingly. Here’s the benchmark code: test/shm_test.cpp · master · Pure Data libraries / vstplugin · GitLab
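To put the two sets of numbers side by side, here is the arithmetic as a Python sketch. The block size and sample rate are assumed (32 frames at 96 kHz gives the ~300 microsecond period quoted above); the latencies are the figures mentioned in this exchange, not new measurements.

```python
def wakeup_overhead(block_size, sample_rate, wake_latency):
    """Fraction of each block period spent waiting for the client to wake."""
    return wake_latency / (block_size / sample_rate)

block, sr = 32, 96000    # assumed values, period of about 333 microseconds
cases = (
    ("pessimistic estimate", 100e-6),
    ("benchmark, simulated workload", 15e-6),
    ("benchmark, tight loop", 3e-6),
)
for label, latency in cases:
    print(f"{label}: {wakeup_overhead(block, sr, latency):.1%}")
```

With the benchmarked round-trip times the overhead is a few percent of the block period rather than a third, and one-way wake-ups would cost less still.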

Also, it’s not like the Client has to go to sleep after every wake up; instead it will often just decrement the semaphore and move on.

But consider: if we wake the client on the audio callback, this means that all client code run during that callback must be realtime-safe

No, with the RT Server, the Client runs independently. Yes, some Routines will take longer than others, but the scheduler will eventually catch up. This is the same with the current system. This kind of language jitter is one of the reasons why we need Server latency in the first place.

Only with the NRT Server would the Client be in hard sync.

Putting this aside: other clocks are needed to schedule UI updates and interact with systems that are bound to specific threads - these will still need to exist. This is why e.g. things that were on AppClock (processed on the UI thread) cannot work on an audio clock.

In my last post I tried to explain how this works. I don’t see a real problem there.

But tight coupling between the server and client here brings a LOT of baggage that is either impossible to work around, or deeply affects how sclang can possibly be used.

On the contrary, I claim that the user only has to set a Server option and otherwise would not have to be aware of it. I think it would not affect existing code significantly.

I guess the only case that’s worth discussing is the bundle case - since non-bundled messages aren’t scheduled and are processed ASAP, it doesn’t really matter WHICH clock we use: they happen when they happen. But these messages aren’t really compatible with most usages of bundled messages, since you’ll lose any ordering that you might have on the client side.

I’ll put aside my comments about keeping the “same” logical time between client and server (e.g. the reverse latency proposal) since this already diverges from the current way things work. This for me is a nice-to-have, but it wouldn’t be a regression if it wasn’t there.

I believe it will HAVE to go to sleep between wake-ups, otherwise other threads will not be able to acquire a language lock. Specifically, this means things like: UI can’t update, ScIDE messages can’t be processed, messages from other MIDI/network ports will not be processed.

Thanks, this clears up a few of my assumptions. Maybe to clarify, here’s what I imagine would happen with your proposal - maybe you can check if my assumptions here are correct?

  1. scsynth wakes up at audioClockTime=1000. It sends an OSC message to the client, something like [\audioClockTime, 1000].
  2. Some time in the future, the sclang TCP/UDP thread reads the incoming OSC message.
  3. TCP/UDP thread waits to acquire the language lock if sclang is already running on another thread (e.g. UI thread).
  4. TCP/UDP sets the current time of SampleClock, e.g. SampleClock.beats = 1000.
  5. This triggers the processing of all scheduled events with (1000 - bufferSize) < time <= 1000 on the TCP thread.
  6. New OSC bundles will be scheduled at SampleClock.beats + latency, meaning something like 1000 + 19200.
  7. OSC bundles are received by the server and executed at the appropriate times.

This feels workable - I would argue that this doesn’t require any additional server work at all - it just requires a monotonic server clock that sends every hardware buffer… Something like SendReply.ar(Impulse.ar(SampleRate.ir / hardwareBufferSize), '/audioClockTick', Phasor.ar(1, end:inf)).

At most, this needs (a) a mechanism to sync SendReply to the beginning of a hardware block, and (b) a Phasor replacement that can do proper OSC time tags with integer time.

If this were our model, I would maybe have four concerns:

  1. What are the performance implications of sending an OSC message every hardware buffer?
  2. What are the performance implications of receiving an OSC message, acquiring a language lock, and querying the scheduler at realistic worst-case buffer sizes (e.g. 3000 times/second for hardwareBufferSize = 32 at 96 kHz)?
  3. Driving a scheduler from the audio hardware clock means that our wake-up granularity is linked to our hardware buffer size. This has the effect of linking the latency requirements to the buffer size also. Since buffer size can easily change in ways that are not visible to the user / between audio devices, will this create user scenarios where a patch might have NO late messages with one audio device, and MANY late messages with another audio device?
  4. Is our worst-case wake-up time granularity coarse enough that there will be user-observable consequences? For example, something like: Routine({ inf.do { midiValue.set(x); (1/30).wait } }).play(SampleClock) looks like it will send MIDI messages at a rate of 30/second, but with a hardware buffer size of 4096 @ 44100, you’ll get only about 10 wake-ups per second, resulting in three MIDI messages being sent at once, every 1/10 of a second. Of course, one answer is: don’t run anything off of SampleClock except musical event stuff… But this limitation feels non-ideal?
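The bunching described in the last concern can be checked with a small Python sketch. It assumes a power-of-two buffer of 4096 frames at 44.1 kHz and a scheduler that dispatches everything falling into the current block at wake-up; the function name is hypothetical.

```python
from collections import Counter

def events_per_wakeup(event_rate, block_size, sample_rate, duration=1.0):
    """Count how many events scheduled at `event_rate` Hz fall into each
    block-sized wake-up window during `duration` seconds."""
    counts = Counter()
    n = 0
    while n / event_rate < duration:
        sample = round(n / event_rate * sample_rate)  # scheduled time in samples
        counts[sample // block_size] += 1
        n += 1
    return counts

counts = events_per_wakeup(30, 4096, 44100)
print("wake-ups per second:", 44100 / 4096)
print(sorted(counts.items()))
```

Under these assumptions every wake-up releases two or three of the 30 Hz events at once, which matches the concern that the nominal rate is only met on average.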