TempoClock loop between slow operations + short waits causes sclang freeze

The following code causes the interpreter to freeze and become permanently unresponsive on my machine.

(
Routine({
	loop {
		var x;
		x = 3;
		10000.do { x = x.cos; };
		0.0001.wait;
	};
}).play;
)

The behavior depends on the wait time and the number of iterations. For example, with 1000.do or with 0.001.wait it is fine. With 100000.do it’s an immediate crash with a “grey count error” message, and the lowest order-of-magnitude wait time that doesn’t freeze or crash is 0.1 s. So it is not just short waits, but the combination of short waits and expensive operations. You might need to adjust the numbers to reproduce this on your machine, as I suspect it is a race condition.

Is there any workaround to this? I could empirically test what the minimum safe wait time is in seconds and use a bandage like max(1e-3 * thisThread.clock.tempo, intendedWaitInBeats).wait, but then the timing of the Routine will become musically inaccurate, and it’s still not a guarantee against a freeze. (Attempting to work around that by forking at each expensive operation is also a no-go, because then I get errors about the scheduler queue being full.)
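For reference, the band-aid I mean would look something like this (1e-3 seconds is just an empirically chosen floor for my machine, not a guaranteed safe value):

(
Routine({
	loop {
		var x = 3;
		var intendedWaitInBeats = 0.0001;
		10000.do { x = x.cos; };
		// clamp the wait to at least ~1 ms of physical time,
		// converted to beats via the clock's tempo (beats per second)
		max(1e-3 * thisThread.clock.tempo, intendedWaitInBeats).wait;
	};
}).play;
)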

EDIT: I have reproduced this issue in 3.13 on Linux. Originally found the issue on my client’s machine, 3.12 on macOS. The code I wrote for him worked fine in testing, and had safeguards against very short waits, but as he started using the software at a larger scale, the inter-wait operations became slower and his live setup started freezing at high tempos.

What version are you running?

3.13. I can try upgrading if you believe that is relevant. This is also repro’d on 3.12.

Yes, I think it is relevant, as I’m pretty sure I fixed this in “Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects” (supercollider/supercollider PR #6261 on GitHub).

I can unfortunately reproduce this on a fairly recent develop build (3.15.0-dev fb0f85a924 from around a month ago), macOS 14.7.2.

TempoClock is designed to try as hard as possible not to drift, under the musical use case of rhythmic wait times (or, wait times well above the human threshold for detecting separate attacks) with relatively short operations.

Part of that effort is that, when a scheduled activity yields control and is rescheduled (0.0001.wait), if the new scheduled time is earlier than the current physical time, TempoClock’s internal while loop will continue and pop the rescheduled item off the queue; it will again exceed its wait time, so the time deficit keeps growing.

The request here seems to be for TempoClock to detect a growing deficit automatically and allow logical time to catch up with physical time. But I’m not sure how to do that without breaking musical timing for the case of two parallel rhythmic threads where one of them occasionally incurs a time deficit, but not consistently so. In this latter case, the same design that causes the reported lockup is also the clock’s recovery mechanism from isolated time deficits.

In the absence of preemptive threading, I’m not sure this can be fixed without making SC’s main use case worse.

Edit: If the expected behavior for a specific case is to let it drift off of logical time, maybe try AppClock? It might still lock up, but it would be easier to fix that lockup in AppClock because it makes no promises about maintaining logical time consistency.

hjh

I get the hang on 3.15, but no grey count error (so far).

GC should be irrelevant to this issue.

hjh

Hopefully the issue under discussion here is the too-short wait time and what to do about it, not the grey count error…

I suspect this can be explained by the fact that frames aren’t collected normally. They are manually freed when returning from a frame in the interpreter, but the grey count isn’t decremented, which leads to an overflow, a negative shift (implementation-defined behaviour), and eventually a segfault on some systems, or the “grey count error” message. This is fixed from 3.14 onwards by changing the grey count’s type to a u64.

TempoClock and SystemClock have always worked like this. It’s a known and understood issue for the couple of decades that SC has existed; recent GC changes have not introduced a new hang that wasn’t there before.

In the TempoClock source code, there’s a while loop that can be summarized like this.

  • while(true) (outer loop)
    • if(top of the queue time > now) then sleep until reaching top of the queue time
    • update logical time = top queue time
    • while(top queue time <= now)
      • run the thing
      • if it returned a number, reschedule for logical time + that number
    • end while
  • end while

For example, if a task woke up at beat 100.0, takes 0.01 seconds to run, and reschedules itself for 0.001 seconds later, then the reschedule time is 100.001 but “now” is 100.01. So “while(top queue time <= now)” is true, the inner loop runs again, the next reschedule time is 100.002 while now is 100.02, and so on. The TempoClock thread never gets a chance to sleep, so it keeps its lock on the interpreter, and the interpreter hangs.
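The deficit growth can be sketched with plain arithmetic (a toy model of the inner loop, using the numbers above):

(
var logical = 100.0, now = 100.0;
var runTime = 0.01, waitTime = 0.001;
// each pass: the task body consumes 0.01 s of physical time,
// but the reschedule only advances logical time by 0.001 s,
// so the condition "logical <= now" never becomes false
3.do {
	now = now + runTime;
	logical = logical + waitTime;
	"logical: % now: % deficit: %\n".postf(logical, now, now - logical);
};
)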

Rescheduling based on the item’s logical time is the only valid way to preserve rhythm. “Fixing” it by always rescheduling for later than physical time would be wrong (and that’s what AppClock is for, anyway). Note also that it’s valid to reschedule for 0 beats later (Ppar and siblings depend on this).

The only thing I can think of is to set a threshold for either the number of times an item can run a deficit, or for the size of the deficit, and let the item drift off of logical time after that threshold is hit.
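In lieu of clock support for that, a user-side version of the same idea might look something like this (maxDeficitBeats and the use of elapsedBeats here are my assumptions, a sketch rather than a tested recipe):

(
Routine({
	var maxDeficitBeats = 0.05;  // assumption: tolerate up to this much deficit
	loop {
		var x = 3, deficit;
		10000.do { x = x.cos; };
		// how far logical time has fallen behind physical time, in beats
		deficit = thisThread.clock.elapsedBeats - thisThread.beats;
		if(deficit > maxDeficitBeats) {
			// drift off of logical time: wait long enough to catch up
			(deficit + 0.0001).wait;
		} {
			0.0001.wait;
		};
	};
}).play;
)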

hjh

Thanks! The scheduler is something I haven’t found the time to dig into yet.

So it’s as if the wait wasn’t even there. Yes, that’s a problem, but it also reinforces why the performance of sclang matters!

Perhaps a possible workaround could be a new method like…

{ ...expensive action... }.waitForOrElse(0.001) { |timeTaken|
	Error("Wait too short").throw;
};

Or, this could even be a part of the loop interface, as you’re only really likely to have problems with this when doing rapid iteration.

loop(minimumWait: 0.001) {
   ... Expensive stuff ...
   ... Always releases the lock ...
}

Is it possible to release the lock, but reschedule for the next available loop?

AppClock is designed to allow drift… does it have to be TempoClock? (I guess “it’s the default clock used by fork” but we need TempoClock for strict logical time cases, and this isn’t that.)

I kinda think the best solution is to ensure that AppClock releases its lock periodically and then use that.

hjh

FWIW: I just confirmed that AppClock does allow other activities –

(
r = Routine({
	loop {
		var x = 3;
		10000.do { x = x.cos; };
		0.0001.wait;
	};
}).play(AppClock);

AppClock.sched(1.0, { r.stop; "stopped".debug; });
)

// prints (1 second later):
stopped

(
r = Routine({
	loop {
		var x = 3;
		10000.do { x = x.cos; };
		0.0001.wait;
	};
}).play;

TempoClock.default.sched(1.0, { r.stop; "stopped".debug; });
)

// ^^ stop func never runs = hanging

So I believe the solution to the stated case is to use AppClock instead (and recheck the documentation to make sure this is written down somewhere).

I also printed out logical and physical time in these loops. With TempoClock, consistent with my explanation of the while loop behavior, when scheduling runs a deficit, logical time advances according to wait, getting further and further behind physical time (which, in an infinite loop case, means that the clock will never catch up with reality). (“Before” is before 10000.do; “after” is after “.do” and before “.wait”.)

// TempoClock.default
BEFORE: logical time = 217.187944623, physical time = 217.188259749
 AFTER: logical time = 217.187944623, physical time = 217.18958872
BEFORE: logical time = 217.188044623, physical time = 217.190512407
 AFTER: logical time = 217.188044623, physical time = 217.191982423
BEFORE: logical time = 217.188144623, physical time = 217.192802132
 AFTER: logical time = 217.188144623, physical time = 217.194019431

But in AppClock, logical time in the next iteration is always greater than the ending physical time of the previous iteration: the first “after” ends at physical time 232.836248716, and the next iteration starts at logical time 232.838330235 (where, with TempoClock, the second iteration’s logical time is earlier than the end of the previous iteration). Other scheduled activities can wake up in this gap.

// AppClock
BEFORE: logical time = 232.8353119, physical time = 232.835355601
 AFTER: logical time = 232.8353119, physical time = 232.836248716
BEFORE: logical time = 232.838330235, physical time = 232.838361538
 AFTER: logical time = 232.838330235, physical time = 232.839031765
BEFORE: logical time = 232.840721854, physical time = 232.840753487
 AFTER: logical time = 232.840721854, physical time = 232.841276844

hjh