Safe(r) way to launch a subprocess?

jamshark70 · June 15, 2022, 9:35am

I’m finding that launching VCV Rack with unixCmd causes something to go wrong during SC’s shutdown sequence. One consequence is that port 57120 remains blocked after shutdown.

1. Run my setup (which launches a2jmidid and Rack).
2. After playing, close Rack manually.
3. Reboot interpreter.
4. NetAddr.langPort is now 57121. (The port will eventually be released, but that could be minutes or hours later.)
Oh also at this point, if I try to quit the IDE, it will hang until I force kill it from the command line.

I can’t think of any reason why subprocesses would cause goof-ups with network port management, or why already-terminated processes would block scide from shutting down, but there we are.

Is there anything different I can do so that it actually works properly?

Or do we have unix-pipe bugs that nobody else has found yet?

hjh

PS Actually the culprit might be a2jmidid… but I already have "kill -9 %".format(Library.at(\jackmidipid)).systemCmd; in ShutDown and it seems not to be doing anything.

Spacechild1 · June 15, 2022, 8:08pm

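On Unix environments a subprocess is typically created with fork() + exec(). fork() gives the child a copy of every file descriptor of the parent process (see the fork(2) Linux manual page), and those descriptors stay open across exec() unless the file/socket/pipe has the FD_CLOEXEC flag set. The flag can be set with fcntl (see the Stack Overflow question “What does the FD_CLOEXEC fcntl() flag do?”) or directly in open with O_CLOEXEC (see the open(2) Linux manual page). So a child process started from sclang can keep sclang’s UDP socket, and with it the port, open after sclang itself has quit.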

Is there anything different I can do so that it actually works properly?

a) don’t start and detach child processes from within sclang
b) open an issue on GitHub and ask to set the FD_CLOEXEC bit on all files and sockets
c) use Windows :-p

Spacechild1 · June 15, 2022, 8:14pm

I’ve now remembered that I’ve been bitten by this issue in the context of VSTPlugin. There was a certain plugin that would start a background daemon. Consequently, the Server would not boot anymore because the port appeared to be taken. Took me quite some time to figure out what was going on…

jamshark70 · June 15, 2022, 11:14pm

I’m not a C guru so… what’s the drawback?

I mean, c) is clearly a sarcastic answer, and a) is not exactly sarcastic but could be. So I don’t know if b) is something that should generally be done but often isn’t, or if it’s problematic in other ways.

Returning to my “PS” comment though – we do have a problem with ShutDown – the interpreter shuts down preemptively and doesn’t wait for ShutDown functions to finish. This might be a good thing in case a ShutDown function infloops (for example, someone could accidentally create a self-recursive data structure within the global Archive, and then the attempt to write the Archive to disk upon shutdown would never finish), but I tend to think that issuing a “kill” command by systemCmd (which 1/ I would expect to finish very quickly and 2/ is supposed to block for completion) should be given a chance to finish without the language cutting it off… I did try to manage the processes, but the relevant SC feature doesn’t quite work as one would expect.

hjh

Spacechild1 · June 15, 2022, 11:21pm

I’m not a C guru so… what’s the drawback?

I don’t think there’s a drawback; it just needs some work. Please report this issue on GitHub!

a) is actually a serious answer. Why do you need to launch VCV Rack from within sclang instead of the terminal?

jamshark70 · June 16, 2022, 1:12am

OK – thanks, it wasn’t clear to me.

My live setup is (or, can be):

SC.
a2jmidid as a MIDI bridge. (This is necessary because of a catch-22 between SC and Rack MIDI. If I MIDIClient.init first, then launch Rack, then SC doesn’t see Rack’s MIDI port. If I launch Rack first, then its MIDI input modules will not find SC, reverting to null. This requires manual intervention, which I don’t want. The solution is to open the bridge first. Then SC’s MIDIClient and Rack can both find the bridge.)
Rack.

It’s more reliable to launch a live setup on stage if it’s completely automated. I should be able to run one block of code, wait some seconds, and then be ready to play.

I would really rather not, for instance, have to open two terminal tabs, run a pair of batch scripts, then run something in SC. This increases the risk of mistakes.

Actually, I found yesterday that when I manually killed a2jmidid, then port 57120 was released – so the real problem is that, even though I have a shutdown action to kill that process, ShutDown doesn’t actually do it.

In light of that, I think, no, I would really rather not accept a clumsy onstage startup sequence because of a bug somewhere else.

A possible solution might be to introduce an intrinsic variable into ShutDown specifying a number of seconds to wait before finally closing the language. It’s currently assumed (I think) that shutdown will be instantaneous – I don’t believe this was a conscious design decision.

hjh

MarcinP · June 16, 2022, 1:15am

I have used .unixCmd somewhat extensively at times (though mostly on macOS). There are projects that would be very, very, very inconvenient to operate if that didn’t work reliably.

As to finding what’s wrong… did you try using the action in “unixCmd” to check whether the process finished and what the exit status was? Do .unixCmd and .systemCmd result in the same behavior? Does running the command in a new shell (sh -c \"command\") possibly circumvent this?

EDIT: there are some workarounds for a likely unrelated bug (though also with .unixCmd, so maybe related?) in GitHub issue #4175, “ssh command locks up GUI thread” (supercollider/supercollider). The takeaway is that sending a command to the background might change how SC behaves.

jamshark70 · June 16, 2022, 4:24am

a2jmidid doesn’t terminate. It should stay up until the end of the session.

In a terminal, you can kill it manually. I’d rather not open a terminal window for it, though. A “kill” command should also work (but it hasn’t been working).

When I’m seeing the problem, a2jmidid remains in “ps” output after the interpreter has already stopped – so the process doesn’t finish.

hjh

prko · June 16, 2022, 5:04am

jamshark70 wrote:
I’m finding that launching VCV Rack with unixCmd causes something to go wrong during SC’s shutdown sequence. One consequence is that port 57120 remains blocked after shutdown.

Run my setup (which launches a2jmidid and Rack).

After playing, close Rack manually.

Reboot interpreter.

NetAddr.langPort is now 57121. (The port will eventually be released, could be minutes or hours later.)

Oh also at this point, if I try to quit the IDE, it will hang until I force kill it from the command line.

The port number of a newly launched sclang increments because the port used by the previous sclang was not freed when that sclang quit.

I had a similar problem:

I used my main laptop (a MacBook Pro) and four very slow laptops with detachable keyboards. The four laptops were used as tablets to show sheet music to individual players. I wanted to control the sheet-music page displayed on all five machines so that they stayed in sync; all devices were connected via Wi-Fi. I used SC-IDE only on the MacBook Pro; on the other laptops, sclang was launched from the command prompt using a batch file. This mostly worked, but the four laptops were extremely slow and sclang sometimes had to be force-quit on them. The problem was that the port number used by the previous instance of sclang was not freed by the command prompt, so the newly launched instance of sclang took a new port number, and I could not control it. I could not resolve the problem from sclang or from the command prompt window that had previously run sclang. I tried opening a new command window from the previous one, but that did not help either, and sclang’s port number kept increasing.

How did I fix the problem?

I had to quit sclang and the command prompt window manually. Doing this by hand is very inefficient, so I automated it by sending an OSC message for stopping sclang to Max on each machine; as soon as Max received the OSC message, it executed a batch file containing the following three lines:

taskkill /f /im scsynth.exe
taskkill /f /im sclang.exe
taskkill /f /im conhost.exe

I do not use Linux, but I think you could resolve the problem in the same manner using Pd or Python. From my own experience, to reboot sclang without the port number incrementing, sclang should be quit from outside, by an application that runs entirely independently of sclang and its related processes. For example, if sclang was first launched in a terminal window, even that terminal window should be closed before launching a new instance of sclang.

Spacechild1 · June 16, 2022, 8:32am

@prko The problem here is not sclang but rather a detached child process that keeps running independently. See my explanation at the top of this thread (post #2).

Spacechild1 · June 16, 2022, 8:39am

Ahhh, so you do not want the child processes to outlive sclang. In this case your usage of unixCmd is less problematic and the actual issue is that somehow you did not manage to kill the processes. Can you show your exact shutdown code?

We should still address the issue with inherited file descriptors, though. At the very least there should be a warning in the help file.

prko · June 17, 2022, 2:15am

Yes, I know that the problem is related to a subprocess, even though I do not fully understand the technical details of your post.
In my post, I wanted to say that quitting sclang should be done not by a subprocess but by a process independent of sclang and of the terminal that runs sclang. The reason is that a subprocess that reboots sclang keeps holding the port of the previous sclang instance even after that instance has quit.

My post was unclear and too wordy. Sorry about that.

jamshark70 · June 17, 2022, 10:04am

Here is how I am launching a2jmidid – using Array:unixCmd, because this bypasses an enclosing shell process.

["a2jmidid", "-e", "-u"].unixCmd;
-> 11003

I confirm that the pid really does belong to a2jmidid and not to a shell.

"ps x | grep a2jmidid".unixCmd;
  11003 ?        SLl    0:00 a2jmidid -e -u

If I kill this ID, the process goes away – so the general approach is sound.

"kill -9 11003".systemCmd;

"ps x | grep a2jmidid".unixCmd;
-- it's gone

That’s what I expected with ShutDown, but… here’s a reproducer:

Library.put(\a2jpid, ["a2jmidid", "-e", "-u"].unixCmd);

MIDIClient.init;

ShutDown.add { "kill -9 %".format(Library.at(\a2jpid)).systemCmd };

// now Language --> Reboot interpreter

"ps x | grep a2jmidid".unixCmd;
-- oh, there it is

I found just now that the issue doesn’t reproduce without opening MIDIClient – so my theory now is that MIDIClient shutdown takes longer than a few milliseconds and blocks other shutdown actions, and then sclang says “pffffft this is taking too long, I’m exiting anyway,” leaving garbage behind.

hjh