Loading buffers with samples exits the server with code 0

Thank you for your answer.

I'll follow your advice and rebuild SC 3.13.0 after the debug session.

I have to confess that I'm totally confused, because in the link that you shared:

the bug is present with this setup:

### Operating system version

Ubuntu 24 LTS, SC 3.13.0

### FluCoMa Version

1.0.7

This is the first post (sorry for the redundant information), but later in the thread I can read:
“i’ve found a workaround - compiling the latest SuperCollider 3.14.0-dev from source fixes the issue, and works fine with 1.0.2 and 1.0.7 - so it sounds like it could be an issue with the SC 3.13 ubuntu package rather than flucoma”

Maybe I'm missing something here, but isn't the information in this thread in contradiction with your advice?

One last thing:
I have another machine with the same version of SuperCollider, 3.14.0-dev, but I built it more than a year ago.
I have no problem with Flucoma 1.0.7 and SuperCollider 3.14.0-dev on this machine.

So, if I understand your explanation about the boost library update:
a build of SuperCollider 3.14.0-dev from one year ago and a build of SuperCollider 3.14.0-dev from today are not the same software; the boost version has changed while the SC version stays the same.
It's confusing for me.
I wonder how many quarks or libraries that use boost, like Flucoma, are broken today by this change.

Sorry if these are silly questions or reflections.

Thank you.

This is the output of the bt command:

(gdb) bt
#0  0x0000758ff3098d61 in __futex_abstimed_wait_common64 (
    private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, 
    futex_word=0x5a615f94f950) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, 
    abstime=0x0, clockid=0, expected=0, futex_word=0x5a615f94f950)
    at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (
    futex_word=futex_word@entry=0x5a615f94f950, expected=expected@entry=0, 
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
    private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000758ff30a4f0f in do_futex_wait (sem=sem@entry=0x5a615f94f950, 
    abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000758ff30a4fa8 in __new_sem_wait_slow64 (sem=0x5a615f94f950, 
    abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x00005a615e463d0f in boost::sync::linux_::semaphore::wait (
    this=0x5a615f94f950)
    at /home/fabien/Logiciels_Son/SuperCollider/external_libraries/boost_sync/include/boost/sync/detail/semaphore/semaphore_posix.hpp:83
#6  World_WaitForQuit (inWorld=0x5a615fb245c0, unload_plugins=true)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/SC_World.cpp:765
#7  0x00005a615e4238df in operator() (__closure=0x7ffe45047860)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/scsynth_main.cpp:460
--Type <RET> for more, q to quit, c to continue without paging--

and if I type RET, I get this:

--Type <RET> for more, q to quit, c to continue without paging--
#8  std::__invoke_impl<void, scsynth_main(int, char**)::<lambda()>&> (__f=...)
    at /usr/include/c++/13/bits/invoke.h:61
#9  std::__invoke_r<void, scsynth_main(int, char**)::<lambda()>&> (__fn=...)
    at /usr/include/c++/13/bits/invoke.h:111
#10 std::_Function_handler<void(), scsynth_main(int, char**)::<lambda()> >::_M_invoke (__functor=...) at /usr/include/c++/13/bits/std_function.h:290
#11 std::function<void ()>::operator()() const (this=0x7ffe45047860)
    at /usr/include/c++/13/bits/std_function.h:591
#12 EventLoop::run(std::function<void ()>) (waitFunction=...)
    at /home/fabien/Logiciels_Son/SuperCollider/common/SC_EventLoop.hpp:38
#13 scsynth_main (argc=25, argv=<optimized out>)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/scsynth_main.cpp:460
#14 0x0000758ff302a1ca in __libc_start_call_main (
    main=main@entry=0x5a615e421240 <main(int, char**)>, argc=argc@entry=25, 
    argv=argv@entry=0x7ffe45047b68)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#15 0x0000758ff302a28b in __libc_start_main_impl (
    main=0x5a615e421240 <main(int, char**)>, argc=25, argv=0x7ffe45047b68, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffe45047b58) at ../csu/libc-start.c:360
#16 0x00005a615e422795 in _start ()
(gdb) 

which looks very similar to the output of the where command


I am not too deep into Flucoma and haven’t read the whole thread that I linked, I only linked the last comment which sums up the situation.

Around 2 months ago we introduced a boost upgrade via upgrade to boost 1.86 · supercollider/supercollider@10619ee · GitHub - everything after this commit uses the new boost version, which is currently incompatible with the flucoma plugins available for download on their website, because flucoma expects a different boost version from the one SuperCollider now ships.

It should be possible to get Flucoma running if you compile it with the newest SuperCollider version because then it would also use the updated boost version. But I haven’t tried this.
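As an aside (not from the thread): one way to see which boost version a given SuperCollider source tree vendors is to grep the vendored version header. The exact path under `external_libraries` is an assumption and may differ per checkout:

```shell
# Hypothetical check (run from the root of a SuperCollider clone; the
# location of the vendored boost headers is an assumption):
grep -r "BOOST_LIB_VERSION" external_libraries --include=version.hpp | head -n 1
```

The printed `BOOST_LIB_VERSION` string (e.g. `"1_86"`) tells you which boost the build will compile against, which is what matters for plugin compatibility.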

I think the easiest way to work with flucoma currently is to stick to 3.13.0.

The alternative would have been not being able to compile SuperCollider on newer macOS versions anymore, so there was really no way around this. The Flucoma people used some (valid) hack which now breaks things for them (only flucoma is affected).

Using development versions gives you the benefit of the most recent features, but it also brings some additional work; still, we are always happy about people using the development version and reporting errors!


two things:

  1. Memory corruption from the error fluid::client::copyReplyAddress(void*)

  2. Deadlock in World_WaitForQuit waiting on a semaphore (high number of threads)

Then, the program terminates…

There was a memory corruption with FluidManipulation before (or during?). So maybe both things have to do with this.

Can you test without the package? If it works, then recompile the package and try loading all the buffers again?

Thank you for the details.

Excuse my lack of knowledge in this domain.
I would have imagined that it would be possible to have different builds for different OSes (as there are different releases), and in this case to apply the boost update just to the macOS builds, but there is certainly a good reason for that not to be the case.


Without Flucoma, if I run the bt command in gdb,
I get:

sudo gdb -p 13285
[sudo] Mot de passe de fabien : 
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 13285
[New LWP 13294]
[New LWP 13293]
[New LWP 13292]
[New LWP 13291]
[New LWP 13289]
[New LWP 13288]
[New LWP 13287]

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/jack/libjack.so.0

warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libcap.so.2

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/spa-0.2/support/libspa-support.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/spa-0.2/support/libspa-journal.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_eqbw.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/delay.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamAutoSat-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_reflector.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamComp-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/mvclpf24.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_chorusflanger.so

warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libcsound64.so.6.0

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamDynamicEQ-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGEQ31-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_autopan.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_dynamics_m.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_eq.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamTube-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGateX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/amp.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGrains-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/cs_chorus.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_dynamics_st.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/filter.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZaMultiCompX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_limiter.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamEQ2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/sine.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamGate-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_pinknoise.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/cs_phaser.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_reverb.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_tremolo.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamCompX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_tubewarmth.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/mvchpf24.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_doubler.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZamDelay-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/noise.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_sigmoid.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_vibrato.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_deesser.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_rotspeak.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_echo.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/tap_pitch.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/ladspa/ZaMaximX2-ladspa.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/spa-0.2/support/libspa-dbus.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-rt.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-protocol-native.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-client-node.so

warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/pipewire-0.3/libpipewire-module-metadata.so
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007cf955e98d61 in __futex_abstimed_wait_common64 (private=<optimized out>, 
    cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x57fc3d019370)
    at ./nptl/futex-internal.c:57

warning: 57	./nptl/futex-internal.c: Aucun fichier ou dossier de ce nom
(gdb) bt
#0  0x00007cf955e98d61 in __futex_abstimed_wait_common64 (
    private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, 
    futex_word=0x57fc3d019370) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, 
    abstime=0x0, clockid=0, expected=0, futex_word=0x57fc3d019370)
    at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (
    futex_word=futex_word@entry=0x57fc3d019370, expected=expected@entry=0, 
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
    private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x00007cf955ea4f0f in do_futex_wait (sem=sem@entry=0x57fc3d019370, 
    abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x00007cf955ea4fa8 in __new_sem_wait_slow64 (sem=0x57fc3d019370, 
    abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x000057fc3bd8fd0f in boost::sync::linux_::semaphore::wait (
    this=0x57fc3d019370)
    at /home/fabien/Logiciels_Son/SuperCollider/external_libraries/boost_sync/include/boost/sync/detail/semaphore/semaphore_posix.hpp:83
#6  World_WaitForQuit (inWorld=0x57fc3d1b7640, unload_plugins=true)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/SC_World.cpp:765
#7  0x000057fc3bd4f8df in operator() (__closure=0x7fff8406fb80)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/scsynth_main.cpp:460
--Type <RET> for more, q to quit, c to continue without paging--

then I press RET and get:

--Type <RET> for more, q to quit, c to continue without paging--
#8  std::__invoke_impl<void, scsynth_main(int, char**)::<lambda()>&> (__f=...)
    at /usr/include/c++/13/bits/invoke.h:61
#9  std::__invoke_r<void, scsynth_main(int, char**)::<lambda()>&> (__fn=...)
    at /usr/include/c++/13/bits/invoke.h:111
#10 std::_Function_handler<void(), scsynth_main(int, char**)::<lambda()> >::_M_invoke (__functor=...) at /usr/include/c++/13/bits/std_function.h:290
#11 std::function<void ()>::operator()() const (this=0x7fff8406fb80)
    at /usr/include/c++/13/bits/std_function.h:591
#12 EventLoop::run(std::function<void ()>) (waitFunction=...)
    at /home/fabien/Logiciels_Son/SuperCollider/common/SC_EventLoop.hpp:38
#13 scsynth_main (argc=25, argv=<optimized out>)
    at /home/fabien/Logiciels_Son/SuperCollider/server/scsynth/scsynth_main.cpp:460
#14 0x00007cf955e2a1ca in __libc_start_call_main (
    main=main@entry=0x57fc3bd4d240 <main(int, char**)>, argc=argc@entry=25, 
    argv=argv@entry=0x7fff8406fe78)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#15 0x00007cf955e2a28b in __libc_start_main_impl (
    main=0x57fc3bd4d240 <main(int, char**)>, argc=25, argv=0x7fff8406fe78, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fff8406fe68) at ../csu/libc-start.c:360
#16 0x000057fc3bd4e795 in _start ()

This doesn't look so different from the previous ones.

  • Yeah. The deadlock in World_WaitForQuit happens in both cases - it’s not FluCoMa-specific

  • The scsynth process’s last words before being blocked on the semaphore:

__libc_start_main_impl
scsynth_main
EventLoop::run
World_WaitForQuit
boost::sync::linux_::semaphore::wait
__futex_abstimed_wait_common64

The bug is multithreading-related: a hang, a failed shutdown, maybe a race condition, or something nasty along those lines?

A relation to the boost update seems possible because of this frame:

boost::sync::linux_::semaphore::wait(this=0x57fc3d019370)

One would need to check if the update touched that.

@kesey Dude, you may be a bit tired of this, but you could do one more test. Check out the commit before the boost update, compile SuperCollider, and run everything as you did this time. If this started after a single commit, it has to be bisected.

@dscheiba, can you help him with the commit hash and git commands?


No problem, I'm not tired at all.
I've learned things here through the debug process, etc., and I want to say thank you, one more time, to everyone involved in this thread trying to help me, and maybe other users who might run into the same problem in the future.
I'm happy to do whatever it takes to resolve the problem.

A little help with the commit hash and git commands is very welcome.
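Not given in the thread itself, but a sketch of what those commands could look like, assuming the clone lives where the backtraces show it (/home/fabien/Logiciels_Son/SuperCollider), using the 10619ee hash quoted earlier; the build directory and flags are assumptions:

```shell
# Sketch (path and flags are assumptions): rebuild scsynth from the commit
# just before the boost 1.86 upgrade (10619ee).
cd ~/Logiciels_Son/SuperCollider
git fetch origin
git checkout 10619ee~1                  # "~1" selects the parent of the upgrade commit
git submodule update --init --recursive
cd build
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
make -j"$(nproc)"
```

If the problem turned out to span several commits, `git bisect start` followed by `git bisect bad <broken-hash>` and `git bisect good <working-hash>` automates narrowing it down.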


boost::sync is not an official boost library, though. It’s a library that Tim proposed to boost but that never got accepted, so SC ships the sources. Therefore it can’t be affected by a boost upgrade.


I just now saw what was and wasn’t touched. However, a lot of code was modified everywhere, such as boost::asio, filesystem, etc.

But Tim’s code’s 10th birthday is today in the repo. :partying_face: :partying_face: And it’s touché for me on that one.

It may also be useful to find a minimal example that reproduces the problem. Neither SuperDirt nor FluCoMa are “minimal.”

Such as perhaps:

(
fork {
	var numToTest = 50;
	var chunkSize = 50;
	var paths = PathName(Quarks.at("Dirt-Samples").localPath).deepFiles
	.scramble
	.keep(numToTest)
	.collect(_.fullPath);
	
	b = paths.collect { |path, i|
		if(i + 1 % chunkSize == 0) {
			s.sync;
		};
		Buffer.read(s, path);
	};
	
	"done".postln;
};
)

// after:
b.do(_.free);

Trying this with different numbers of files to read, and different chunk sizes, would help find the boundary of the problem.

I have to admit that I don’t understand the relationship between the several high-priority threads that smoge noted vs buffer reading. Buffer reading should be on the lower-priority thread, and in the server code, buffer reading is handled by “sequenced commands” which run sequentially in a queue, not in parallel.

I checked this empirically by sending two Buffer.read’s at the same time, the first for a half-gig file and the second for a11wlk01.wav. If the reads were parallel, the small file read should finish first. But the large read always finishes first. So it is queued.
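The empirical check described above could be sketched like this (the big-file path is an assumption; a11wlk01.wav ships with SuperCollider):

```supercollider
// Sketch of the queued-read check; replace the first path with any large file.
(
Buffer.read(s, "~/sounds/some_half_gig_file.wav".standardizePath,
	action: { "big read done".postln });
Buffer.read(s, Platform.resourceDir +/+ "sounds/a11wlk01.wav",
	action: { "small read done".postln });
)
// If the reads ran in parallel, the tiny file would finish first; because
// b_allocRead is a sequenced command, "big read done" posts first.
```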

Weird problem.

hjh


That’s the principle of shrinking in property-based testing. I thought about that too, even to document some kind of “safe zone” with no bugs.

Actually, World_WaitForQuit does nothing; the semaphore in World_WaitForQuit isn’t being correctly signaled (perhaps, in this case, a few other threads are holding resources that shutdown needs?), and the program doesn’t shut down correctly.

But there still can be many things related to thread synchronization…


Disclaimer: this was not a thorough analysis; it was done quickly, and I’m just an apprentice in life. Because of other threads (where one misplaced word is enough), I must add: if you spot a mistake, keep the conversation constructive, participate, and let’s always be respectful.

Your exchange with @Spacechild1 is not clear to me.
Should I rebuild SuperCollider from before the boost update, or not?

He’s the boss. Check with @jordan too.

This is what I get from testing with your code:
the server exits with Server 'localhost' exited with exit code 0. server 'localhost' disconnected shared memory interface just above these values:

var numToTest = 600;
var chunkSize = 550;

This is an average; sometimes it works with greater values, like (var numToTest = 800; var chunkSize = 750;).

I tried like this too:

(
fork {
	// var numToTest = 600;
	var chunkSize = 400;
	var paths = PathName(Quarks.at("Dirt-Samples").localPath).deepFiles
	.scramble
	// .keep(numToTest)
	.collect(_.fullPath);
	
	b = paths.collect { |path, i|
		if(i + 1 % chunkSize == 0) {
			s.sync;
		};
		Buffer.read(s, path);
	};
	
	"done".postln;
};
)

and a chunk size of 400 is the average when I load all the samples.

400 is way too large; try more like 10 for a reliable result - it should be fast enough.

I’ve found switching to TCP also helps greatly - don’t know how that works with tidal though …

@kesey, did you check if it does crash with TCP?

How can I switch to TCP?
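This isn’t answered inline in the thread; as far as I know (check the ServerOptions help file), the protocol is selected via the server’s options before booting:

```supercollider
// Sketch, per the ServerOptions help: select TCP before booting.
s.quit;                      // options only take effect on (re)boot
s.options.protocol = \tcp;   // the default is \udp
s.boot;
```

How Tidal/SuperDirt interacts with a TCP scsynth is a separate question, as noted above.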

I agree with Jordan (“400 is way too large; try more like 10 for a reliable result - it should be fast enough”).

How does it work with this?

var numToTest = 600;
var chunkSize = 50;

That’s going back to an earlier comment of mine – if it isn’t working to dump all of the b_allocRead requests onto the server at once, break them up into smaller chunks.

If you reported that it wasn’t working for a small chunkSize either, then I’d be very concerned. But if it does work by loading a large number of buffers in smaller chunks, then… at least that’s a workaround.

hjh