Loading buffers with samples makes the server exit with code 0

Thanks, didn’t know that!

I’ve read the link you shared, thank you for that.
I have the overcommit_memory set to 0.

But I can’t find any message looking like:

[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB

nor like:

[1962.987529] myapp[3303]: segfault at 0 ip 00400559 sp 5bc7b1b0 error 6 in myapp[400000+1000]

I don’t know if I’m doing it right, but these are the steps I followed.
I boot the server in SC and make it exit with code 0 by calling Buffer.read(s, "/path/to/myLongSample").
Right after that I run sudo dmesg in a terminal and search the whole output for "Out of memory" or "segfault at" (as in the messages above), but I can’t find anything like that.
The only thing I find about OOM is this:

[    2.354513] systemd[1]: Listening on systemd-oomd.socket - Userspace Out-Of-Memory (OOM) Killer Socket.

I also searched /var/log/dmesg and found the same thing:

[    2.354513] systemd[1]: Listening on systemd-oomd.socket - Userspace Out-Of-Memory (OOM) Killer Socket.
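The search described above can also be done mechanically instead of by eye. Here is a minimal sketch, assuming the kernel words its messages roughly like the two samples quoted earlier; the pattern list and the function name `find_kill_messages` are illustrative, not an exhaustive catalogue of kernel messages.

```python
import re

# Patterns are assumptions based on the sample messages quoted in this thread;
# other kernel versions may word these lines differently.
KILL_PATTERNS = [
    re.compile(r"Out of memory", re.IGNORECASE),
    re.compile(r"Killed process \d+"),
    re.compile(r"oom-kill", re.IGNORECASE),
    re.compile(r"segfault at"),
]


def find_kill_messages(log_text):
    """Return the log lines that look like OOM-killer or segfault reports."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in KILL_PATTERNS)
    ]


# The three kinds of line that appear in this thread.
SAMPLE_LOG = """\
[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[1962.987529] myapp[3303]: segfault at 0 ip 00400559 sp 5bc7b1b0 error 6 in myapp[400000+1000]
[    2.354513] systemd[1]: Listening on systemd-oomd.socket - Userspace Out-Of-Memory (OOM) Killer Socket.
"""

for hit in find_kill_messages(SAMPLE_LOG):
    print(hit)
```

The systemd-oomd line is deliberately not matched: it only says that the userspace OOM daemon's socket is listening at boot, not that anything was killed. To scan a real log, pass the text of `sudo dmesg` (or of /var/log/dmesg) to `find_kill_messages` instead of `SAMPLE_LOG`.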

Aside from that I can confirm that I have a swap file:

cat /proc/swaps
Filename				Type		Size		Used	Priority
/swap.img                               file		33554428	0	-2
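To read that table in friendlier units: the Size and Used columns of /proc/swaps are in KiB. A small sketch that parses the output above (the function name `parse_swaps` is just for illustration):

```python
def parse_swaps(text):
    """Parse the content of /proc/swaps into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or unexpected lines
        name, kind, size, used, priority = fields
        entries.append({
            "filename": name,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


# The output quoted above, pasted as a string.
SAMPLE = (
    "Filename\t\t\t\tType\t\tSize\t\tUsed\tPriority\n"
    "/swap.img                               file\t\t33554428\t0\t-2\n"
)

for entry in parse_swaps(SAMPLE):
    size_gib = entry["size_kib"] / (1024 * 1024)
    print(f'{entry["filename"]}: {size_gib:.1f} GiB total, {entry["used_kib"]} KiB used')
```

So this is a swap file of about 32 GiB with nothing swapped out at the moment the command was run.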

I also tried setting memSize to 2 ** 16 in my startup file, but I get the same behaviour.
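For reference, the arithmetic behind that value: ServerOptions.memSize is given in kilobytes, so 2 ** 16 is 64 MB. As far as I understand, it sizes the server's real-time memory pool rather than the memory used for sample buffers, which would be consistent with the change making no difference here; treat that last point as an assumption. The helper name below is hypothetical.

```python
def memsize_kb_to_mb(mem_size_kb):
    """Convert a memSize value (kilobytes) to megabytes."""
    return mem_size_kb / 1024


default_mem_size = 8192    # SuperCollider's default memSize, in KB
tried_mem_size = 2 ** 16   # the value tried in this post, in KB

print(memsize_kb_to_mb(default_mem_size), "MB by default")
print(memsize_kb_to_mb(tried_mem_size), "MB with memSize = 2 ** 16")
```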

Sorry about that, I’ll use MB in the future.

I’m certainly out of my depth here – IANALS (I am not a Linux sysadmin).

What is clear from your last gdb result is that something external to scsynth is killing scsynth, and this seems to be in response to a relatively large memory allocation request. (It’s not a segfault; it’s not a memory request that was denied.)

I did a search for ways that Linux could kill a process automatically, and top of the list was the OOM killer. There might be other reasons, which I don’t know. In any case, if you do further searching, this is the area to look around in. The gdb result (“Program terminated with signal SIGKILL, Killed”) is unambiguous that the scsynth process is being killed by something outside of itself, and we know it isn’t user action (you didn’t “kill -9 scsynth” in a terminal) – so it pretty much has to be some OS-level governor.
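The same distinction that gdb reports can be seen from any parent process that launches the program itself. A minimal sketch in Python, for POSIX systems only: the child here is a stand-in sleeper process, not scsynth, and `describe_exit` is a hypothetical helper name.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Explain a subprocess return code; on POSIX, negative means 'killed by signal'."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# Stand-in child process (an assumption for the demo; replace with the real command).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Simulate an outside agent killing the process, as `kill -9` would.
child.send_signal(signal.SIGKILL)
child.wait()

print(describe_exit(child.returncode))
```

A process that crashes on its own usually shows a different signal (SIGSEGV for a segfault), and one that gives up after a failed allocation normally exits with a nonzero code instead. SIGKILL cannot be caught or raised accidentally by the program's own error handling, which is why it points to something outside the process.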

hjh


I finally found a solution to my problem, a little by chance to be honest.
After trying all possible settings for overcommit_memory and overcommit_ratio, I eliminated the possibility of an OOM killer issue.

At that point, I still thought it was probably coming from my kernel and started thinking about trying another one.

For your information, my kernel is as follows:
6.8.0-49-lowlatency #49.1-Ubuntu SMP PREEMPT_DYNAMIC

But before changing kernels, I tried some adjustments by editing the GRUB file /etc/default/grub.
I added the following:

GRUB_CMDLINE_LINUX_DEFAULT="preempt=full nohz=on nohz_full=all threadirqs rcu_nocbs=all rcutree.enable_rcu_lazy=1 quiet splash"

After doing that, my problem disappeared: I can load the long sample (which caused the server to exit with code 0), and even longer samples, without worries.
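For anyone repeating this: on Ubuntu, edits to /etc/default/grub typically take effect only after running sudo update-grub and rebooting, and /proc/cmdline shows what the running kernel was actually booted with. A small sketch for comparing the two; the function names are hypothetical, and quoted parameter values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def missing_params(expected, actual):
    """Return the expected parameter names that are absent or different in `actual`."""
    want = parse_cmdline(expected)
    have = parse_cmdline(actual)
    return [key for key, value in want.items() if have.get(key) != value]


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# The parameters from the GRUB line above (without "quiet splash").
EXPECTED = ("preempt=full nohz=on nohz_full=all threadirqs "
            "rcu_nocbs=all rcutree.enable_rcu_lazy=1")

# Illustrative example of a boot where the edit has not taken effect yet.
OLD_BOOT = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet splash"
print(missing_params(EXPECTED, OLD_BOOT))
```

On the real machine, `missing_params(EXPECTED, read_running_cmdline())` should return an empty list once the new settings are active.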

Thank you for your help


That is a remarkable saga – we did need to eliminate a lot of possibilities before you could arrive there.

Would never have guessed that. Glad it’s working now.

hjh

Yes, and that’s why I thank you all: your advice pointed me in the right direction.
