DoesNotUnderstand error causing a crash

Unfortunately no simple reproducer, but I have a chain that goes like:

OSC message receipt → NotificationCenter → handler function → a “scene manager” Proto → selects a function from an array (based on OSC arg) → runs the function

At this point (within the chosen function), if I take an empty “process” (player) object and try to put it into the GUI, the GUI link-up tries to get info from the empty process, which fails (DoesNotUnderstand) and at this point sclang crashes.

Basically, a pretty deep stack, but I couldn’t find out how deep because dumpBackTrace truncates the printout :confused: But, if I do the faulty operation directly as a command (stack is much less deep), then it’s fine, the error is reported normally.

I can guard the offending calls in my code and that’s fine for now. But… DoesNotUnderstand became a lot more complicated in 3.14, with new primitives I think…? Probably some bugs there (very easy to miss something in a primitive).

hjh

A possibly related reproducer (not totally sure):

(
f = { |level = 100|
	protect {
		if(level > 0) {
			f.(level - 1)
		} {
			// "valid result".postln;
			f.doesnotunderstand;
		};
	};
	level
};
)

f.(22);  // grinds much longer than you think but eventually...
ERROR: Message 'doesnotunderstand' not understood.

f.(23);  // grinds, then crashes

The crash disappears if no error is thrown at the bottom of the recursion chain.

Curiously, though, if it’s Error("throw something else").throw at the bottom, rather than the misunderstood message, then f.(22) crashes too.

Without protect it can do a million recursive steps, no problem.

hjh

I am running: SC 3.14 Dell Laptop OS=Windows 11 RAM=16GB CPU=Intel i7

When I looked up: DoesNotUnderstandError I got this warning:
WARNING: SCDoc: In …_SuperCollider3_14-Install_\HelpSource\Classes\DoesNotUnderstandError.schelp

  • Method new has 4 args, but doc has 3 argument:: tags.

I ran your short code f = { |level = 100| …

Before running SC was using 96MB
I ran f.(22)
After the stack trace is posted:
SC is using 9GB & the total machine memory usage peeks at 97%
SC does not give back the 9GB.

After about 15 minutes (not using the machine) SC memory usage has droped to 4.27GB.

After another 3 hours (not using the machine) SC memory usage has dropped to 1.46GB.

A few more findings:

It isn’t specifically DoesNotUnderstandError – if I throw any error at that point in the code, it also crashes. But, even more bizarre, if I insert a this.dumpBackTrace somewhere before the error (which I did because I wanted to see what the call stack looks like from the OSC response), then it doesn’t crash, and I get the full call stack from the error dump…? That is weird.

The full stack, it turns out, isn’t so deep: 12 frames outside protect, 19 inside. Kinda… trivial…? Why is it crashing?

So the f test can’t be exactly the same. My live-set crash happens very quickly without a large memory spike, consistent with a stack trace of only 30-ish frames. f is chewing up RAM and taking a fairly long time at it.

In the f test, every level of recursion adds 4 stack frames. There are 3 frames at the beginning (Process:interpretPrintCmdLine, Interpreter:interpretPrintCmdLine and a < closed FunctionDef > generated by the interactive compiler), and 3 frames for Error-throw at the end. So f.(22) incurs 3 + (22*4) + 3 = 88+6 = 94 frames. If that’s taking 9+ GB of RAM, it’s … like… 90MB per frame? No way. So there is something wrong in getBackTrace for sure. Probably somebody at some point should valgrind that, though it’s probably only obliquely related to my original case.

Puzzled.

hjh

ERROR: bindBP preemptively throwing other error [hjh: that's my test error]

PROTECTED CALL STACK:
	a FunctionDef	0x7831d9433418
		sourceCode = "{ |bp, adverb, parms|
			var index = ~getIndexFromAdverb.(adverb, 0, \\fader),
			// hjh: the '.chan' call was failing
			mixer = Error(\"bindBP preemptively throwing other error\").throw, //  bp.v.chan,
			newMap;
			if(mixer.isNil) {
				mixer = bp.event[\\voicer];  // mixer as temp here
				if(mixer.notNil) { mixer = mixer.asMixer };  // mixer is really a Voicer at the start of this!
			};
			if(parms.isNil) {
				parms = ()  // setMute: false
			};
			// may be multiple mixers
			// note: 'do' handles [mixer, mixer...], and mi...etc..."
		arg bp = BP('nonexistent')
		arg adverb = 12
		arg parms = nil
		var index = 12
		var mixer = nil
		var newMap = nil
	a FunctionDef	0x598fa0872500
		sourceCode = "<an open Function>"
	a FunctionDef	0x598fa0870540
		sourceCode = "<an open Function>"
	Function:prTry	0x598f9fa7d100
		arg this = a Function
		var result = nil
		var thread = a Thread
		var next = a Function
		var wasInProtectedFunc = true
	Function:protect	0x598f9fa7c680
		arg this = a Function
		arg handler = a Function
		var result = nil
	Proto:use	0x598fa086ffc0
		arg this = a Proto
		arg func = a Function
		arg failSilentlyAfterFree = false
		var result = nil
		var saveEnvir = Environment[(pingCond -> a CondVar), (tempoCtl -> a GenericGlobalControl), (mainFx -> false), (focusWatcher -> a SimpleController), (mylib -> a SynthDescLib), 
  (hwOut -> class MixerChannel('hwOut', localhost, 2, 2)), (saveAddr -> a NetAddr(127.0.0.1, 57110)), (editor -> a Proto), (debugAddr -> a DebugNetAddr(127.0.0.1, 57110)), (rvb -> Synth('cll_freeverb2' : 1001)), 
  (longrvbmc -> class MixerChannel('longrvb', localhost, 2, 2)), (timer -> a DDWTimer), (shortrvb -> Synth('cll_freeverb2' : 1002)), (longr...etc...
	Proto:doesNotUnderstand	0x598fa0871dc0
		arg this = a Proto
		arg selector = bindBP
		arg args = nil
		var result = [BP('nonexistent'), 12, nil]
		var item = nil
	Object:chuck	0x598fa1d7b900
		arg this = BP('nonexistent')
		arg dest = a Proto
		arg adverb = 12
		arg parms = nil
	a FunctionDef	0x598fa49dfe28
		sourceCode = "<an open Function>"
	a FunctionDef	0x598fa0de11c0
		sourceCode = "<an open Function>"
	Function:prTry	0x598f9fa7d100
		arg this = a Function
		var result = nil
		var thread = a Thread
		var next = a Function
		var wasInProtectedFunc = true
	Function:protect	0x598f9fa7c680
		arg this = a Function
		arg handler = a Function
		var result = nil
	Environment:use	0x598fa0de0d40
		arg this = Environment[(pingCond -> a CondVar), (tempoCtl -> a GenericGlobalControl), (mainFx -> false), (focusWatcher -> a SimpleController), (mylib -> a SynthDescLib), 
  (hwOut -> class MixerChannel('hwOut', localhost, 2, 2)), (saveAddr -> a NetAddr(127.0.0.1, 57110)), (editor -> a Proto), (debugAddr -> a DebugNetAddr(127.0.0.1, 57110)), (rvb -> Synth('cll_freeverb2' : 1001)), 
  (longrvbmc -> class MixerChannel('longrvb', localhost, 2, 2)), (timer -> a DDWTimer), (shortrvb -> Synth('cll_freeverb2' : 1002)), (longr...etc...
		arg function = a Function
		var result = nil
		var saveEnvir = a Proto
	a FunctionDef	0x598fa49e02e8
		sourceCode = "{
		var sceneIndex = ~scene;
		var scenes = ~scenes;
		[~scene, ~scenes[~scene * 2]].debug(\"updateScene, calling func for\");
		topEnvironment.use { scenes[sceneIndex * 2 + 1].value };
		~updateSceneDisplay.();
	}"
		var sceneIndex = 21.0
		var scenes = [dummy, a Function, harmonics, a Function, alias, a Function, intro2, a Function, fmOvertones, a Function, distbell, a Function, thinkick, a Function, kalimba, a Function, guzheng, a Function, piano, a Function, ice, a Function, pmGrain, a Function, pulses, a Function, pulses2, a Function, bass, a Function, dxbass, a Function, drums1-retro, a Function, drums2-amb, a Function, drums3-tek, a Function, pad, a Function, wbass, a Function, testcrash, a Function]
	a FunctionDef	0x598fa49e1128
		sourceCode = "{ |index|
		~scene = index;
		~updateScene.();
		currentEnvironment
	}"
		arg index = 21.0
	a FunctionDef	0x598fa49e2688
		sourceCode = "<an open Function>"
		arg msg = [/sceneMenu, 22.0]
		var i = 21.0
	a FunctionDef	0x598f9fa81640
		sourceCode = "<an open Function>"
	a FunctionDef	0x598fa0870540
		sourceCode = "<an open Function>"
	Function:prTry	0x598f9fa7d100
		arg this = a Function
		var result = nil
		var thread = a Thread
		var next = nil
		var wasInProtectedFunc = false
	
CALL STACK:
	Exception:reportError
		arg this = <instance of Error>
	Nil:handleError
		arg this = nil
		arg error = <instance of Error>
	Thread:handleError
		arg this = <instance of Thread>
		arg error = <instance of Error>
	Object:throw
		arg this = <instance of Error>
	Function:protect
		arg this = <instance of Function>
		arg handler = <instance of Function>
		var result = <instance of Error>
	Proto:use
		arg this = <instance of Proto>
		arg func = <instance of Function>
		arg failSilentlyAfterFree = false
		var result = nil
		var saveEnvir = <instance of Environment>
	OSCMessageDispatcher:value
		arg this = <instance of OSCMessageDispatcher>
		arg msg = [*2]
		arg time = 109.674036664
		arg addr = <instance of NetAddr>
		arg recvPort = 57120
	< FunctionDef in Method Collection:collectInPlace >
		arg item = <instance of OSCMessageDispatcher>
		arg i = 0
	ArrayedCollection:do
		arg this = [*2]
		arg function = <instance of Function>
		var i = 0
	Collection:collectInPlace
		arg this = [*2]
		arg function = <instance of Function>
	FunctionList:value
		arg this = <instance of FunctionList>
		arg args = [*4]
		arg kwargs = [*0]
		var res = nil
	Main:recvOSCmessage
		arg this = <instance of Main>
		arg time = 109.674036664
		arg replyAddr = <instance of NetAddr>
		arg recvPort = 57120
		arg msg = [*2]

OK, so I just abstracted out a VSTPlugin-loader function and used it when creating a player object, and now it’s not even necessary to have an error for it to crash. It just crashes further down a different code path.

If I run my live setup from the command line (to try to gdb it), then it doesn’t crash that way. (Can’t do that onstage.) So I can’t even get info about where it’s crashing.

Look, I got a show on Friday. Last ditch effort will be to try to defer the operation to drop some of the stack frames. EDIT: This seems to be an OK workaround for now. Failing that, I guess I will have to revert to the latest SC 3.13 because Still, this is really not OK. I don’t have time to reboot the SC interpreter a hundred times in a day (yes, it really is like that). Never had this problem before.

hjh

In the above code, the f function is making a (recursive) call to itself from inside of the protect {} block. At run time this means that there is a protect fame wrapped around a protect frame wrapped around a protect frame and so on, N levels deep.

The following version has added a protect ‘error handler’ and a couple of post lines.

( // Version 2: added a protect error handler
f = { |level = 100|
	"In f with level=%\n".postf(level);
	protect {
		if(level > 0) {
			f.(level - 1)
		} {
			// "valid result".postln;
			f.doesnotunderstand;
		};
	}
	{ // protect error handler
		"In protect error handler with level=%\n".postf(level);
	};
	level
};

f.(5)
)

If we run this with a low number we can see the nesting of the protect blocks.

In f with level=5
In f with level=4
In f with level=3
In f with level=2
In f with level=1
In f with level=0
In protect error handler with level=0
In protect error handler with level=1
In protect error handler with level=2
In protect error handler with level=3
In protect error handler with level=4
In protect error handler with level=5
ERROR: Message ‘doesnotunderstand’ not understood.
Perhaps you misspelled ‘doesNotUnderstand’, …

In the next version, the recursive call to ‘f’ has been moved out of the protect {} block.

( // Version 3: call moved out of protect
f = { |level = 100|
	"In f with level=%\n".postf(level);

	if(level > 0) {
		f.(level - 1)
	} { // ELSE ... do something dangerous
		protect {
			f.doesnotunderstand;
		}
		{ // protect error handler
			"In protect error handler with level=%\n".postf(level);
		};
	};  // end-of: if-ELSE
	level
};
f.(100)
)

The code that causes the error is still in the protect block, but protect blocks are now not nested, and this version runs many times without any unexpected problems.

In f with level=100
In f with level=99
In f with level=98
In f with level=97
In f with level=96
In f with level=95
// many lines removed
In f with level=12
In f with level=11
In f with level=10
In f with level=9
In f with level=8
In f with level=7
In f with level=6
In f with level=5
In f with level=4
In f with level=3
In f with level=2
In f with level=1
In f with level=0
In protect error handler with level=0
ERROR: Message ‘doesnotunderstand’ not understood.
Perhaps you misspelled ‘doesNotUnderstand’, or meant to call ‘doesnotunderstand’ on another receiver?

The error when thrown is 100 (function call stack) levels deep, but only 1 level of protect is used.

It seem that the implementation of ‘protect’ cannot handle deep nesting of itself around itself.

If you have similar nesting of protect in your complex code, then this might be the cause of the problem you are experiencing. If not, then this might be unrelated.

This also works:
deep stack before calling function f, then deep stack in f before exception thrown:

I compacted both functions f & g, so that they take up less space in the post window for trace back.

(
f={|n|
“f.(%)\n”.postf(n);
if(n>0) {f.(n-1)} {
protect {f.doesnotunderstand}
{“f-handler,n=%\n”.postf(n)}};n};

g={|m,n|
“g.(%,%)\n”.postf(m,n);
if(m>0) {g.(m-1,n)} {protect {f.(n)}
{“g-handler,m=%\n”.postf(m)}};m};

g.(79,73)
)

Function g calls itself recursively m times, and then function g calls function f from within a protect block, and then (as before) function f calls itself n times recursively, and then from within a protect block causes an error/exception.

This runs fine with large numbers for m & n.

Both functions have an exception-handler for their protect block.
The exception is first caught in the exception-handler of the protect in function f.
But since SClang re-throws the exception after the handler is finished, we also see that
the exception is also caught by the exception-handler of the protect in function g.

OUTPUT:
g.(79,73)
g.(78,73)
g.(77,73)
g.(76,73)
g.(75,73)
g.(74,73)
. . .
g.(5,73)
g.(4,73)
g.(3,73)
g.(2,73)
g.(1,73)
g.(0,73)
f.(73)
f.(72)
f.(71)
f.(70)
f.(69)
. . .
f.(4)
f.(3)
f.(2)
f.(1)
f.(0)
f-handler,n=0
g-handler,m=0
ERROR: Message ‘doesnotunderstand’ not understood.
Perhaps you misspelled ‘doesNotUnderstand’. . .

Thanks for this. I think this is adjacent to, but not exactly the same as, my issue (and I can demonstrate that).

I’ll make two posts, because the more I look at it, the more I’m convinced these are separate issues. My recursive case was an attempt to produce a deep protect stack, but I think it’s triggering a different bug. So, continuing forward with this, I think it will be important not to mingle the cases.

Recursive case

Based on a bit more testing this morning, I suspect that the getBackTrace primitive doesn’t properly handle the case of reentry into the same function.

If I use some code-generation trickery to produce a deep chain of functions, where each function is compiled individually and guaranteed to be a distinct object, then the crash, high memory usage and poor performance disappear.

(
~iToVar = { |int|
	// letters are 0 to 25 i.e. mod 26
	// array size is predictable as the ceiling of log base 26 of int
	var array = Array.fill((log(int+1) / log(26)).ceil.asInteger max: 1, {
		var digit = int % 26;
		int = int div: 26;
		digit
	});
	var str = String(array.size);
	array.reverseDo { |ascii|
		str = str.add((97 + ascii).asAscii);
	};
	if(str == "pi") { str = str ++ "1" };
	"~" ++ str
};

~makeFuncChain = { |n|
	var c = CollStream.new;

	n.do { |i|
		if(i % 200 == 0) { c << "{\n" };
		c << ~iToVar.(i) << " = { protect { ";
		if(i == 0) {
			c << "Error(\"test\").throw"
		} {
			c << ~iToVar.(i-1) << ".value";
		};
		c << " } };\n";
		if(i % 200 == 199) { c << "}.value;\n\n" };
	};
	if(n % 200 != 0) { c << "}.value;\n\n" };

	c << ~iToVar.(n-1);
	c.collection.interpret  // returns head of func chain
};
)

For example, then ~makeFuncChain.(5) is compiling the following code:

{
~a = { protect { Error("test").throw } };
~b = { protect { ~a.value } };
~c = { protect { ~b.value } };
~d = { protect { ~c.value } };
~e = { protect { ~d.value } };
}.value;

~e

The outer { }.value block is to circumvent a limit on the number of literal functions allowed within a single function.

I can go up to multiple hundreds this way, without issue.

Let’s get a rough estimate of the performance profile:

a = Array.fill(100, { |i|
	var now;
	f = ~makeFuncChain.((i+1) * 10);
	now = Main.elapsedTime;
	try { f.value };
	Main.elapsedTime - now
});

I can’t explain the hiccup in the middle, but it is quasi-linear (as one would expect – there is no reason for stack frame objects to get bigger just because there are more of them). So this is a non-problematic case (and it’s closer to my real situation than the recursive case). Note that it’s going up to 1000 nested protect calls, and the total time to do the calls and then unwind the stack in getBackTrace is still within a couple ms. Not bad really (and rules out the idea that SC “can’t handle” stack-depth, as I was worrying in my real case).

In the truly recursive case, we’re observing an exponential increase in time. Using my original f function:

a = Array.fill(10, { |i|
	var now = Main.elapsedTime;
	try { f.value(i+5) };
	Main.elapsedTime - now
});

This suggests that getBackTrace is accidentally creating a recursive data structure, the size of which would grow exponentially per layer, and if it requires copying at each step, the performance would degrade.

That is, I took a shortcut to try to reproduce the issue, but ended up reproducing a different issue (still an unpleasant bug).

hjh

My real case

Here’s what I know:

  • The offending operation is being triggered by an OSC message (via OSCFunc, not the alternate code path that I wrote in the first message), e.g. ['/sceneMenu', 11.0]. The OSC layer has nothing insane going on.

  • The OSC response is to pull a function out of an array and evaluate it.

  • At this point, I’ve seen two distinct crashes:

    • The bp.v.chan crash. Here, the func being evaluated does BP(\something) =>.aNumber t where t calls into the function shown here, where v doesn’t exist so .chan throws an error, and it crashes then. TBH this is not a super-deep stack: OSCFunc stuff (6 stack frames), a protect, then the BP() =>... does a second protect, then dies.
    • A crash loading a VSTPlugin. Here, I didn’t have a crash when I just wrote the logic directly into the player object. But then I needed to load a couple of them, so I abstracted that part out into a separate function, and the combination of OSC + scene handler + calling that function crashed, where OSC + scene handler + VSTPlugin logic written inline did not crash.
  • Even more strangely, if I defer the evaluation of the scene function, then the crashes disappear. But the AppClock overhead in terms of stack frames is heavier than OSCFunc’s (11 frames vs 6)!

  • If I trigger the scene update by manual code instead of OSC, the crash disappears.

  • If I run my live set-up in a command-line sclang, then the original code (without the defer workaround) fails to crash. So I can’t get any stack traces from gdb :face_with_symbols_over_mouth: :face_with_symbols_over_mouth: :face_with_symbols_over_mouth:

Beyond that, I have no idea. I don’t see what is so exotic about any of this that it should take down the interpreter. And, since the boundaries of the crashing scenario are not known, I can only test my code before a gig and hope that it behaves the same on stage as at home.

hjh

is the problem introduced in 3.14?

maybe a case for git bisect to find the offending change if so?

OK, I won’t have time until after the show, though.

hjh

This thread is a little long, and if I understand correctly, there are two issues.

DoesNotUnderstand has a slow init classlib: DoesNotUnderstandError: defer expensive search until needed by jamshark70 · Pull Request #7105 · supercollider/supercollider · GitHub

protect invokes super-linear memory usage?

True, but not relevant to this thread.

… if the identical function object is reentered multiple times in the call stack (however that is also not relevant to the issue I was initially reporting).

This is the issue I’m really concerned about: DoesNotUnderstand error causing a crash - #10 by jamshark70

Should I close and delete this thread, and reopen with more focus? The process of analyzing the symptoms I was observing was not cleanly linear and organized, but that’s kinda normal. Would it help to remove all the junk?

hjh

I had a look at the protect issue in SC.lang code, and couldn’t find any reason for super linear growth. I’ll do some debugging in the interpreter when I get the chance.

I don’t actually understand what the issue is, I think I’d need some coffee to run to be able to follow.

I wasn’t suggesting this thread being messy was a bad thing, only looking to make sure all bugs are documented :slight_smile:

Thanks for having a look. I should log that formally.

Because it only happens when reentering the same function, my guess is that some data from the frame are being linked (somehow?) to the function object. Then, when encountering another stack frame with the same function object, it’s getting not only the function but also the extra data from the previous frame = a bit larger. Then the third picks up that composite. I haven’t looked at the C++ structures at all, so that’s a very loose conjecture, but it’s a crucial point that we can go up to 1000 levels of protect when each level is in a distinct function, but SC starts choking on the same at just a couple dozen recursive iterations of protect.

I don’t actually understand what the issue is

I… don’t either. Some magic combination of OSC response plus Proto usage (protect is here) plus a specific code path.

I think semiquaver is right that I should try to bisect it.

OK, cool. And it’s not that I’m unconcerned about the other issues, just that the one I highlighted could crash my set on Friday and those other two issues won’t :laughing:

hjh

The 2nd crash type (VSTPlugin) “feels like” a timing issue, particularly given the circumstance under which it does not happen.

Your original code worked (loading one VST), then you refactored your code to have a separate function for loading VSTs, and you used that to load multiple VSTs, is that correct? And then you get a crash?

Perhaps the load of the second VST started before the load of the first VST finished, and that caused the problem?

Also, if defer fixes the problem, why not use it?
I think your original concern was about stack depth, but now you think “protect with a deep stack” is not the issue, perhaps defer can be a work around for Friday?

Not exactly. My original code worked (loading one VST), and then I re-factored it, and then it crashed for one VST. That is, instead of function A directly calling the VSTPlugin methods, the crashing case was function A calls function B to call VSTPlugin stuff. ??? That’s the only difference.

The reason for refactoring was to load two plugins more easily, but it still crashed for only one.

Also note that the heavy lifting in VSTPlugin is in the server. The language is only sending OSC messages. You’re right that my code doesn’t control concurrency here, but, if sclang can send two /s_new messages in the same execution tick, then it should be able to send two /u_cmd, ... ..., /open messages in the same execution tick. There is nothing special about the /u_cmds (and FWIW the server doesn’t crash even when the requests are sent concurrently).

Of course, I’ve already committed that change into my environment.

I’m dissatisfied with the defer workaround because there’s no good reason why the behavior should be so radically different depending on whether the code path was initiated through Main:recvOSCmessage or Process:tick.

However, that dissatisfaction would not stop me from doing what’s necessary to have a successful show.

hjh

Well I am glad at least you have a temporary solution.
Good luck with the show tomorrow!
:level_slider: :musical_note: :control_knobs: :notes: