Keeping sclang and scsynth in hard sync

When the Client wakes up, the Server might already have posted to the semaphore several times. Or the Server might post to the semaphore while the Client is running a Routine. (This is almost guaranteed to happen because of language jitter and because the Server tends to process blocks in batches.) In that case, the Client can simply decrement the semaphore and run the next time slice. Maybe I was not precise enough: what I meant was that the Client does not have to sleep between time slices. Anyway, I am pretty sure that wake-up latency is not something to worry about.

Maybe to clarify, here’s what I imagine would happen with your proposal - maybe you can check if my assumptions here are correct?

To be clear: I don’t aim to drive the language client with OSC messages, at least not with the RT Server. Instead, I would use a named semaphore together with a lock-free FIFO in a shared memory segment. The language scheduler would run independently from the network thread, just like in the current system. There is no need for a SampleClock, either; SystemClock and TempoClock would all run on logical sample time.
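To illustrate, here is a minimal sketch (not actual SuperCollider code) of what the Client side could look like. It assumes Boost.Interprocess for the named semaphore and a trivial single-producer/single-consumer ring buffer in the shared memory segment; all type and variable names are made up:

// Minimal sketch of the proposed RT sync mechanism (not actual SuperCollider code).
// Assumes Boost.Interprocess for the named semaphore; the FIFO is a simple
// single-producer/single-consumer ring buffer living in the shared memory segment.
#include <boost/interprocess/sync/named_semaphore.hpp>
#include <atomic>
#include <cstddef>
#include <cstdint>

// One entry per audio block, written by the Server, read by the Client.
struct BlockInfo {
    int64_t sampleTime; // logical sample time at the start of the block
    double systemTime;  // estimated NTP/system time of the block
};

// Lock-free SPSC ring buffer (placed in the shared memory segment).
struct BlockFifo {
    static constexpr size_t capacity = 1024; // power of two
    std::atomic<uint32_t> writeHead { 0 };
    std::atomic<uint32_t> readHead { 0 };
    BlockInfo entries[capacity];

    bool pop(BlockInfo& out) {
        auto r = readHead.load(std::memory_order_relaxed);
        if (r == writeHead.load(std::memory_order_acquire))
            return false; // empty
        out = entries[r % capacity];
        readHead.store(r + 1, std::memory_order_release);
        return true;
    }
};

// Client-side loop: the Server posts the semaphore once per audio block.
void clientLoop(boost::interprocess::named_semaphore& sem, BlockFifo& fifo) {
    for (;;) {
        sem.wait(); // returns immediately if the Server is already ahead of us
        BlockInfo info;
        if (!fifo.pop(info))
            continue;
        // advance logical time to info.sampleTime and dispatch everything
        // scheduled within this time slice ...
    }
}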

With an NRT server, on the other hand, it would make sense to drive the language with OSC messages, because it gives us an easy way to sync with Server reply messages! Asynchronous commands are executed synchronously and the /done message is guaranteed to be delivered in the same slice, before computing any audio. This means that people can use action functions and OSC responders with the NRT Server and get deterministic results! Note that this would only work reliably with a TCP connection (a rough sketch of the handshake follows the list):

  1. The Server sends /tick to the Client and waits for incoming messages.
  2. The Client receives /tick and dispatches Routines.
  3. The Client might send OSC messages/bundles to the Server.
  4. [For each asynchronous command, the Client waits for the /done message.]
  5. Finally, the Client sends /tick_done to the Server and waits for more messages.
  6. The Server reads all incoming messages up to the /tick_done message.
  7. Finally, the Server computes a block of audio.
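For illustration, here is a rough sketch of the Server side of that handshake. All names are hypothetical (the real Server would go through its normal OSC/command machinery); the only point is the ordering of /tick, the incoming messages, /tick_done and the audio computation:

// Rough sketch of the Server side of the proposed NRT handshake
// (all names are hypothetical; only the /tick -> /tick_done ordering matters).
#include <cstdint>
#include <functional>
#include <string>

struct NrtServerHooks {
    std::function<void(int64_t)> sendTick;                  // send /tick with the logical sample time
    std::function<std::string()> receiveMessage;            // blocking read from the TCP connection
    std::function<void(const std::string&)> performMessage; // execute the command; replies (incl. /done) go out here
    std::function<void()> computeBlock;                     // render one block of audio
};

void runNrtServer(NrtServerHooks& server, int64_t numBlocks, int64_t blockSize) {
    for (int64_t block = 0; block < numBlocks; ++block) {
        // 1. tell the Client which time slice we are about to compute
        server.sendTick(block * blockSize);
        // 2.-6. read incoming messages until the Client signals /tick_done;
        // asynchronous commands run synchronously, so any /done replies are
        // delivered before we touch the audio
        for (;;) {
            auto address = server.receiveMessage();
            if (address == "/tick_done")
                break;
            server.performMessage(address);
        }
        // 7. finally, compute one block of audio
        server.computeBlock();
    }
}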

Since the buffer size can easily change in ways that are not visible to the user / between audio devices, will this create scenarios where a patch might have NO late messages with one audio device, and MANY late messages with another audio device?

The audio hardware buffer size already plays a role when trying to find the minimum workable Server latency! The sad answer is: the latency has to be adjusted per system.

looks like it will send MIDI messages at a rate of 30/second, but with a hardware buffer size of 4092 @ 44100, you’ll get only 10 wake-ups per second, resulting in three MIDI messages being sent at once, every 1/10 of a second

Uhhh, I forgot about MIDI. Thanks for pointing this out! Actually, Pd has this very problem. However, sclang has a similar problem: although MIDI messages are scheduled with latency to compensate for language jitter, this latency value is only actually used in the CoreMIDI backend; in the portmidi backend it is completely ignored! Check the implementation of prSendMIDIOut in SC_CoreMIDI.cpp and SC_PortMIDI.cpp. I remember discussing this issue on the mailing list 1-2 years ago.

For both kinds of schedulers, the solution could be to use a dedicated MIDI send thread for the portmidi backend.
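For illustration, such a send thread could look roughly like this (all names are made up; in the portmidi backend the actual output would end up in e.g. Pm_WriteShort()). Shutdown handling is omitted for brevity:

// Sketch of a dedicated MIDI send thread. It pops timestamped messages from a
// priority queue and sleeps until each message is due.
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct TimedMidiMessage {
    std::chrono::steady_clock::time_point time; // when to send
    int32_t data;                               // packed MIDI message
    bool operator>(const TimedMidiMessage& other) const { return time > other.time; }
};

class MidiSendThread {
public:
    explicit MidiSendThread(std::function<void(int32_t)> send)
        : send_(std::move(send)), thread_([this] { run(); }) {}

    void schedule(TimedMidiMessage msg) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push(msg); }
        cond_.notify_one(); // wake the thread in case this message is earlier
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            if (queue_.empty()) {
                cond_.wait(lock); // nothing to do
            } else if (queue_.top().time <= std::chrono::steady_clock::now()) {
                auto msg = queue_.top();
                queue_.pop();
                lock.unlock();
                send_(msg.data); // actual MIDI output happens here
                lock.lock();
            } else {
                // sleep until the next message is due (or we get notified)
                cond_.wait_until(lock, queue_.top().time);
            }
        }
    }

    std::function<void(int32_t)> send_;
    std::mutex mutex_;
    std::condition_variable cond_;
    std::priority_queue<TimedMidiMessage, std::vector<TimedMidiMessage>,
                        std::greater<TimedMidiMessage>> queue_;
    std::thread thread_; // started last, after the members above
};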

For the sample time language scheduler, we can do the same as for OSC bundle scheduling/dispatching: for each tick, the Server estimates the current NTP time with a DLL filter (like it currently does for OSC bundle dispatching) and sends it to the Client together with the logical sample time. In the Client, we would then know the (estimated) NTP time for each logical time point, and both MIDI backends could use it for their scheduling.
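In pseudo-code, the mapping on the Client side could be as simple as this (names are hypothetical):

// Hypothetical helper on the Client side: convert a logical sample time within
// the current time slice to an estimated NTP time, using the per-block info
// (logical sample time + DLL-estimated NTP time) sent by the Server.
#include <cstdint>

struct BlockTimeInfo {
    int64_t blockSampleTime; // logical sample time at the start of the block
    double blockNtpTime;     // DLL-estimated NTP time (in seconds) of that block
    double sampleRate;       // nominal sample rate
};

inline double sampleTimeToNtp(const BlockTimeInfo& info, int64_t sampleTime) {
    // offset within the block, converted to seconds
    double offset = (sampleTime - info.blockSampleTime) / info.sampleRate;
    return info.blockNtpTime + offset;
}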

A possibly naive question: Is it necessary for the client to run all scheduled tasks immediately upon a tick callback?

If the tick is for time 1000 and we expect this time slice to cover 1000.0 - 1000.023 (23 ms ~= 1024/44100), couldn’t the scheduler pop off all the items within the time slice and offset them by the difference of their scheduled time minus the time slice’s start time?

It’s a bit more complex that way but in theory you could have language activities between ticks. I had considered something like this for MIDISyncClock (but I think I never actually did it).

hjh

Yes, certainly! Otherwise everything would be aligned to block boundaries - which would be far from sample-accurate :slight_smile: The scheduler would pop items from the priority queue as long as they fall within the current time slice. For each item, it sets the current logical time accordingly and then executes the Routine.

but in theory you could have language activities between ticks

A Routine can, of course, schedule another Routine that falls within the same time slice. We might also release the language mutex between dispatching items, so that another thread has a chance to grab the mutex.

What I was thinking goes more like this: scztt raised the case of MIDI, where a large hardware buffer (say, nearly 100 ms) would effectively cause outgoing MIDI to be quantized to block boundaries, because tick = 1000 covering 1000.0 to 1000.1 (roughly) would immediately run a task scheduled for 1000.002 and one for 1000.07 at the same time.

And you’re saying that they would run with logical time set appropriately – but you haven’t said whether they are physically waking up as soon as possible after tick time or not (but it sounds to me like they would).

If you get a tick for 1000.0, could 1000.07 wait for 70 ms first, before firing?

I’m ignoring jitter in the tick transport layer, but is there any other reason why it has to be “pop, set logical time, go” instead of “pop, schedule for the real wake-up time and let that thread handle it”?

I guess that would be less efficient for small block sizes (which could be a deal-breaker), but more accurate for large blocks.

FWIW I’m well out of my depth here – speculating. I wouldn’t be surprised if there’s a very good reason why not to do that.

hjh

Hi @jamshark70,

you’re raising a good point here!

If you get a tick for 1000.0, could 1000.07 wait for 70 ms first, before firing?
I’m ignoring jitter in the tick transport layer, but is there any other reason why it has to be “pop, set logical time, go” instead of “pop, schedule for the real wake-up time and let that thread handle it”?

Yes, we might indeed sleep between Routine callbacks! Here’s a modified pseudo-version of the current scheduler code, showing both options:

static void schedRunFunc() {
    using namespace std::chrono;
    unique_lock<timed_mutex> lock(gLangMutex);
    // The scheduler may have already been stopped by the time we acquire this
    // lock, so we need to check the condition now.
    if (!gRunSched) {
        return;
    }

    VMGlobals* g = gMainVMGlobals;
    PyrObject* inQueue = slotRawObject(&g->process->sysSchedulerQueue);

    while (true) {
        assert(inQueue->size);

        // wait for next block
        lock.unlock();
        gServerSemaphore.wait();
        if (!gRunSched || gServerBlockQueue.empty())
            goto leave;
        auto blockInfo = gServerBlockQueue.pop();
        auto logicalSampleTime = blockInfo.sampleTime;
        auto deadline = logicalSampleTime + blockDuration;
        auto logicalSystemTime = blockInfo.systemTime;
        lock.lock();

        // dispatch all ready events
        while (inQueue->size > 1) {
            auto nextTime = slotRawFloat(inQueue->slots + 1);
            if (nextTime >= deadline)
                break; // not ready yet
        #if 1
            // a) execute task immediately
            // ....
            lock.unlock();
            // give another thread a chance to grab the language mutex
            lock.lock();
        #else
            // b) wait until the next event's logical system time.
            // NB: we need the loop because we might be woken up early, or because
            // an earlier event might have been scheduled while we were waiting.
            high_resolution_clock::time_point now;
            do {
                now = high_resolution_clock::now();
                // re-read the head of the queue; it might have changed in the meantime
                nextTime = slotRawFloat(inQueue->slots + 1);
                auto delta = nextTime - logicalSampleTime;
                auto schedPoint = oscTimeToChrono(logicalSystemTime) + sampleTimeToChrono(delta);
                if (now >= schedPoint)
                    break; // ready!
                // wait
                gSchedCond.wait_until(lock, schedPoint);
                if (!gRunSched)
                    goto leave;
            } while (inQueue->size > 1);
            // perform all events that are ready - might be more than one!
            while (inQueue->size > 1) {
                auto nextTime = slotRawFloat(inQueue->slots + 1);
                auto delta = nextTime - logicalSampleTime;
                auto schedPoint = oscTimeToChrono(logicalSystemTime) + sampleTimeToChrono(delta);
                if (nextTime >= deadline || schedPoint > now)
                    break; // not ready
                // execute task
                // ...
            }
        #endif
        }
    }
leave:
    return;
}

Now the question is: what does it buy us?

  • send timestamped OSC bundles to the Server: it doesn’t make a difference as they are scheduled anyway.

  • send OSC messages to the Server: In theory, they can now be sent with proper delays, but the Server still processes audio blocks in batches. I.e. the Client might be able to send OSC messages every 1.5 ms, but with a hardware buffer size of 1024 samples @ 48 kHz, the audio callback might only spend, like, 5 ms computing audio and sleep for 17 ms. We don’t really gain much here. If you need precise timing, you have to schedule as OSC bundles, or at least decrease the hardware buffer size.

  • send MIDI: assuming that we fix MIDIOut.latency, MIDI messages would be timestamped, so it wouldn’t make a difference.

  • receive MIDI: here it would help to maintain the relative timing between MIDI messages, which is important if you want to schedule things on the Server. If you only send OSC messages, you will again experience Server jitter, as explained above. Of course, there is always language jitter.

  • send/receive OSC messages to/from other applications: might help to preserve relative timing. (Of course, there will always be network jitter.)

  • send timestamped OSC bundles to other applications: no difference

  • receive timestamped OSC bundles: no difference, unless the timestamp falls within the current time slice and before a scheduled task (in this time slice). In this case, waiting before executing scheduled tasks gives the network thread a chance to grab the language mutex before the task runs, preventing the bundle from being “late”. With large hardware buffer sizes or block sizes, this can make a real difference!

I guess that would be less efficient for small block sizes (which could be a deal-breaker), but more accurate for large blocks.

I don’t see a problem regarding efficiency. After all, the current scheduler does exactly this. If there are no tasks, there are no (extra) wakeups.

On the other hand, waking up the Client for every audio block does indeed come with a performance cost, but we are talking about something like 0.5-2% of a core when idle. (When I run Pd with -noaudio -sleepgrain 1, meaning it would wake up every 1 ms, the idle CPU load is 1.5% of a single core.) Once the Client gets busy, there is less chance to actually go to sleep, so it becomes less of an issue.