Loading buffers with samples exits server with code 0

You asked me to test the boundaries; that’s why I start with small values and then gradually increase them until it fails.
Now you say “400” is way too large. Sorry, but I don’t understand; this makes no sense to me.

@jamshark70 & @smoge
Is the debugging process over?
If so, I’ll try to build SuperCollider 3.13 and see if I can use FluCoMa with that version.

Thank you very much for your help.
Good luck to anyone who faces this problem in the future.

There are two objectives here:

  1. Find out what’s going wrong with the large number of requests.
  2. Set up a configuration on your machine that works.

You found the boundary of the problem – good.

Jordan’s comment, and my follow-up, were intended to say: if you want a working environment, and sending several hundred requests at once without a pause isn’t working, then reducing the number of simultaneous requests is likely to move you toward that working environment.

What you’re saying isn’t necessarily wrong – in principle, the server shouldn’t stall or crash under a large number of sequenced commands. But, if it does stall or crash in that condition, and if the cause can’t be identified (quickly), then it may be a reasonable step to modify the load sequence to avoid the stress condition, and then be able to get on with your work.
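As a rough sketch of that idea (not anyone’s exact code from this thread – the chunk size of 32 and the ~samplePaths and ~buffers variables are placeholders), you can read buffers in chunks and wait for the server between chunks, so only a bounded number of read requests are in flight at once:

(
fork {
	var paths = ~samplePaths ? [];     // assumed: an Array of file-path Strings
	~buffers = List.new;
	paths.clump(32).do { |chunk|       // 32 reads per chunk, chosen arbitrarily
		chunk.do { |path| ~buffers.add(Buffer.read(s, path)) };
		s.sync;                        // wait until the server has handled this chunk
	};
	"% buffers loaded".format(~buffers.size).postln;
};
)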

I for one think that having a working configuration that allows me to get on with my work is a rather sensible goal.

If I communicated that unclearly, I apologize.

hjh

I will quote a straight-to-the-point definition of shrinking from a blog post. My selection isn’t based on the “best” ideas or concepts behind the theory of property-based testing; I selected the parts that are simplest to understand and to apply, and that help in the context of this particular thread. If you want something more substantial, read the original articles.

Our goal is to provide simpler counterexamples. But what do we mean by “simpler” or “minimal”? The general notion has to do with information. We want to remove noise from the test case. We want to get rid of artefacts that do not participate in the error. We want our arguments to contain only the information needed to make our test case fail.

We can, therefore, understand shrinking as the process of removing some information from our arguments until all the information contained in these arguments is necessary for the test case to fail.

Later, there are some “hands-on” concepts for when you have to do this. This is to give a flavor of what you are looking for! See:

The first step toward being able to provide simpler counterexamples is to figure out a way to reduce the amount of information of each of the arguments that lead to the property failing.

To know where to start, we will enumerate some of the things we know and want about this shrinking process
[IMPORTANT: THIS IS RELATED TO THE EXAMPLE IN THE ARTICLE]:

  • Quantifying information, thus reducing it, highly depends on the considered type
  • It involves search: there is no known universal solution to find a minimal test case
  • Shrinking might not make sense for all values (zero) or types (functions)
  • We would like equal values to shrink in a similar deterministic way

To sum up, we need a way to express an optional, type-specific process that is not random but deterministic, and that involves search.
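As a flavor of what this could look like in sclang (a minimal sketch, not the article’s code – the failsFunc predicate is a hypothetical stand-in for “this input makes the test fail”), a greedy shrinker can drop list elements one by one while the failure persists:

(
~shrinkList = { |items, failsFunc|
	var current = items.copy, i = 0;
	while { i < current.size } {
		var candidate = current.copy;
		candidate.removeAt(i);
		if(failsFunc.(candidate)) {
			current = candidate;   // still fails without element i: keep the smaller list
		} {
			i = i + 1;             // element i is needed for the failure, so keep it
		};
	};
	current                        // a locally minimal failing subset
};

// toy usage: the "property" fails whenever the list contains 42
~shrinkList.([1, 42, 7, 9], { |xs| xs.includes(42) }).postln;   // -> [ 42 ]
)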

Grab those bits of information and guidance from here. I hope that is useful for you!

Source related to this post: Code your own QuickCheck (Shrink)



  • Side Note (off-topic)

Those three blog posts also help undo some myths and folklore (which, unfortunately, are not uncommon in the SC dev community, and not just about this topic), such as the idea that property-based tests are “complex” or “difficult to implement”, and other unfounded opinions I heard when I was trying to bring these ideas to the team here. I hope the community also improves in this sense: fewer of those attitudes (call them whatever you want) and more genuine dialog.

hm hm, so calling the scramble method isn’t a good idea.

fork {
	var numToTest = 50;
	var chunkSize = 50;
	var paths = PathName(Quarks.at("DirtSamples").localPath).deepFiles
	// .scramble // non deterministic when uncommented
	.keep(numToTest)
	.collect(_.fullPath);
	
	b = paths.collect { |path, i|
		if((i + 1) % chunkSize == 0) {
			s.sync;
		};
		Buffer.read(s, path);
	};
	
	"done".postln;
};

Aside from that, I can simplify a little, but it isn’t really helping me to understand the problem:

(
fork {
	var sampleNumber = 50;
	var chunkSize = 50;
	b = List.new;
	
	sampleNumber.do { |item, i|
		if((i + 1) % chunkSize == 0) {
			s.sync;
		};
		b.add(Buffer.read(s, "path/to/your/sample"));
	};
	
	"done".postln;
};
)

Well, “non-deterministic” is not good or bad by itself. (That’s another common assumption with poor justification; there is no real ground for it.)

Some of it can help you achieve deterministic results and “shrink” the problem to its simplest form.

.scramble may not always help with this goal, but it may in some specific cases.

For example, if the bug only happens with certain permutations of a list (here, a list of sound files to load into buffers) and you know that from previous tests, then some additional test methods would make sense. But that would be a stage of a more general, refined approach to testing.
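As a small sketch of that (the seed value is arbitrary): .scramble becomes reproducible if you fix the random seed of the current thread first, so a permutation that triggers the bug can be replayed exactly.

(
var paths = (1..10);            // stand-in for the list of sample paths
thisThread.randSeed = 1234;     // any fixed seed
paths.scramble.postln;          // the same order every time this block is run
)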

I can confirm that FluCoMa works with SuperCollider 3.13.
Thank you


I’m sorry to come back to this thread and bother you again.
I found a minimal example that reproduces the problem (I think it’s difficult to get more minimal than this).
I get the message:

Server 'localhost' exited with exit code 0.
server 'localhost' disconnected shared memory interface

just by loading one buffer with a long sample (281 Mo):
Buffer.read(s, "/path/to/my/sample")

I have another, older machine with half the RAM, an older CPU, fewer cores, etc., and I can run the above code (with the same sample) on it without problems.

For info, I rebuilt SuperCollider at version 3.13.


That’s a plot twist. Strangely, there are no other clues as to why this happens when you compare it to your older system.

That’s useful because it rules out a whole lot of possibilities: It seems less likely to me that it’s a thread sync problem (though not impossible – but I’d have expected a thread sync problem to result in a deadlock rather than an outright crash), and it also couldn’t be stress from overloading the command queue (which was a reasonable guess at the time, but which evidence now contradicts).

Do you get anything different from gdb when loading this one large file?

I’d use RelWithDebInfo for this test (and in general fwiw).

hjh

So there was no deadlock in World_WaitForQuit? How does that get called? (I want to understand.)

(as a matter of fact, there was more than one problem at the beginning of this thread.)

A Debug build will give more details, as far as I’m concerned.

Would a deadlock result in the server process exiting?

Anyway I’m a bit removed from the thread so perhaps just ignore my message.

hjh

The information returned by gdb seems pretty similar to the previous output:

First, I tried again by following these steps:

and I can’t reproduce the exit (as was the case with SuperDirt.start)
by running Buffer.read(s, "/path/to/my/sample") with the long 281 Mo sample.

If I boot the server without gdb and without running this:
Server.default = s = Server.remote(\debug, NetAddr("127.0.0.1", 57110), s.options);

I get a server exit with code 0 after running Buffer.read(s, "/path/to/my/sample") with the long 281 Mo sample.

After that, I tried attaching gdb to scsynth (sudo gdb -p 15311).
The server status bar turns from green to yellow.
I run Buffer.read(s, "/path/to/my/sample") with the long 281 Mo sample.
I can see Buffer(0, nil, nil, nil, /path/to/my/sample) in the post window, and no server exit.

This is the output of gdb (the whole output, including the where command, etc.):

GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 15311
[New LWP 15320]
[New LWP 15319]
[New LWP 15318]
[New LWP 15317]
[New LWP 15315]
[New LWP 15314]
[New LWP 15313]

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/jack/libjack.so.0

warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libcap.so.2

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/spa-0.2/support/libspa-support.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/spa-0.2/support/libspa-journal.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_eqbw.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/delay.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamAutoSat-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_reflector.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamComp-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/mvclpf24.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_chorusflanger.so

warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libcsound64.so.6.0

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamDynamicEQ-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGEQ31-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_autopan.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_dynamics_m.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_eq.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamTube-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGateX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/amp.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGrains-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/cs_chorus.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_dynamics_st.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/filter.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZaMultiCompX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_limiter.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamEQ2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/sine.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGate-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_pinknoise.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/cs_phaser.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_reverb.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_tremolo.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamCompX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_tubewarmth.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/mvchpf24.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_doubler.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamDelay-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/noise.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_sigmoid.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_vibrato.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_deesser.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_rotspeak.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_echo.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_pitch.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZaMaximX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/spa-0.2/support/libspa-dbus.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-rt.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-protocol-native.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-client-node.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-metadata.so
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007ed9c6698d61 in __futex_abstimed_wait_common64 (private=<optimized out>, 
    cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x6455f9d7ce80)
    at ./nptl/futex-internal.c:57

warning: 57	./nptl/futex-internal.c: Aucun fichier ou dossier de ce nom
(gdb) where
#0  0x00007ed9c6698d61 in __futex_abstimed_wait_common64 (
    private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, 
    futex_word=0x6455f9d7ce80) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, 
    abstime=0x0, clockid=0, expected=0, futex_word=0x6455f9d7ce80)
    at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (
    futex_word=futex_word@entry=0x6455f9d7ce80, expected=expected@entry=0, 
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
    private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x00007ed9c66a4f0f in do_futex_wait (sem=sem@entry=0x6455f9d7ce80, 
    abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x00007ed9c66a4fa8 in __new_sem_wait_slow64 (sem=0x6455f9d7ce80, 
    abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x00006455f908d8af in boost::sync::linux_::semaphore::wait (
    this=0x6455f9d7ce80)
    at /home/fabien/Logiciels_Son/SuperCollider/external_libraries/boost_sync/include/boost/sync/detail/semaphore/semaphore_posix.hpp:83
#6  World_WaitForQuit (inWorld=0x6455f9e32140, unload_plugins=true)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/SC_World.cpp:766
#7  0x00006455f904fb7f in operator() (__closure=0x7ffef0ea53e0)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/scsynth_main.cpp:460
--Type <RET> for more, q to quit, c to continue without paging--
#8  std::__invoke_impl<void, scsynth_main(int, char**)::<lambda()>&> (__f=...)
    at /usr/include/c++/13/bits/invoke.h:61
#9  std::__invoke_r<void, scsynth_main(int, char**)::<lambda()>&> (__fn=...)
    at /usr/include/c++/13/bits/invoke.h:111
#10 std::_Function_handler<void(), scsynth_main(int, char**)::<lambda()> >::_M_invoke (__functor=...) at /usr/include/c++/13/bits/std_function.h:290
#11 std::function<void ()>::operator()() const (this=0x7ffef0ea53e0)
    at /usr/include/c++/13/bits/std_function.h:591
#12 EventLoop::run(std::function<void ()>) (waitFunction=...)
    at /home/fabien/Logiciels_Son/SuperCollider/common/SC_EventLoop.hpp:38
#13 scsynth_main (argc=25, argv=<optimized out>)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/scsynth_main.cpp:460
#14 0x00007ed9c662a1ca in __libc_start_call_main (
    main=main@entry=0x6455f904d3a0 <main(int, char**)>, argc=argc@entry=25, 
    argv=argv@entry=0x7ffef0ea56e8)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#15 0x00007ed9c662a28b in __libc_start_main_impl (
    main=0x6455f904d3a0 <main(int, char**)>, argc=25, argv=0x7ffef0ea56e8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffef0ea56d8) at ../csu/libc-start.c:360
#16 0x00006455f904ea35 in _start ()
(gdb) 

For info, I ran a memtest on this machine last night: 9 passes and 0 errors.

If the server is inactive at the moment of trying to load the buffer, then it didn’t try to load the buffer, so nothing was tested.

The futex, then, is what was happening at the moment of attaching the process, and doesn’t provide any information about the buffer load problem. (I guess that’s consistent with my comment the other day, that an unexpected server exit doesn’t seem compatible with the idea of a deadlock.)

At the (gdb) prompt, running the command continue will reactivate the server, and then the status bar in the IDE turns green again.

So I think the steps should be: 1. Boot the server. 2. Attach gdb. 3. continue 4. Try to load the buffer and see if the server crashes.
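In concrete terms, something like this (a sketch: the gdb commands run in a terminal and are shown here as comments; the pid and sample path are placeholders):

s.boot;
// in a terminal:  sudo gdb -p <scsynth pid>
// at the (gdb) prompt:  continue
b = Buffer.read(s, "/path/to/my/sample");  // then watch whether the server exits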

(And, apologies for confusion about the process.)

Hm, so there must be some difference in the way that the server is starting, in gdb vs in sclang. At this time, I don’t have any idea what that could be.

hjh

Thank you very much.

I followed the steps you described.
After the status bar turns green again, I run Buffer.read(s, "/path/to/my/sample") with the long 281 Mo sample.
The server exits with

Server 'localhost' exited with exit code 0.
server 'localhost' disconnected shared memory interface

and gdb output:

(gdb) continue
Continuing.
[Thread 0x7cbba62006c0 (LWP 22373) exited]
[Thread 0x7cbba6c006c0 (LWP 22372) exited]
[Thread 0x7cbba76006c0 (LWP 22371) exited]
[Thread 0x7cbba80006c0 (LWP 22370) exited]
[Thread 0x7cbba8a006c0 (LWP 22368) exited]
[Thread 0x7cbc42c006c0 (LWP 22367) exited]
[Thread 0x7cbc436006c0 (LWP 22366) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.

Hm. This isn’t a crash, then. If the server crashes, gdb will report that something bad happened at a particular place in the code.

Here, something has sent a “kill” signal to scsynth.

According to Unix Stack Exchange, 1) you might find messages about this in a system log (the exact location depends on your OS/distribution) – probably dmesg – and 2) a likely culprit is the Linux kernel’s “out-of-memory killer.”

You can see if the OOM killer is involved by looking at the output of dmesg and finding messages such as:

[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
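As a sketch, you can also run that check from sclang (assuming dmesg output is readable by your user; some systems require sudo for it):

// post any OOM-killer lines from the kernel log
"dmesg | grep -iE 'out of memory|oom'".unixCmd;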

You’re not supposed to run out of memory – there should be swap space configured, so that even in a low memory condition, processes should still be able to allocate memory.

I’m not very familiar with these settings, but it’s looking like a plausible explanation: this particular system may configure memory overcommit differently from a system where SC loads the buffers without issue, and/or the memory usage profile of the problem system is very different from that of the working systems.

hjh

What is 281 Mo?

(and some characters)

Mega-octets, which is French for megabytes.

That’s one of the odd things here – it’s a big file but on modern systems, it’s not outrageous.

One other thought: if s.options.memSize is very large but below the threshold to kill the process, adding a quarter-gig audio file could push it over the threshold. One of the things those “memory overcommit” settings are supposed to handle is apps requesting more memory than they’re going to use right away. That doesn’t describe scsynth buffers, but it might describe the real-time memory pool.
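As a sketch (assuming a Linux system, where these /proc entries exist), the overcommit settings can be inspected from sclang:

// 0 = heuristic overcommit (default), 1 = always overcommit, 2 = strict accounting
"cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio".unixCmd;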

memSize is currently 2**21 KB, i.e. 2 GB. This is probably excessive. @kesey, are you sure that, say, 2**16 or 2**17 wouldn’t be enough?
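For instance (a sketch; pick whichever value turns out to be enough for your patches), the pool could be reduced before rebooting:

s.options.memSize = 65536;  // 2**16 KB = 64 MB for the real-time memory pool
s.reboot;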

hjh