Experimental native PipeWire audio backend for scsynth (Linux), feedback wanted

Hi,

Most current Linux distros ship PipeWire as the system audio server. When you run scsynth with AUDIOAPI=jack on Ubuntu 22.10+, Fedora 34+, Arch, etc., libjack.so actually resolves to PipeWire’s libjack compatibility shim, and every audio callback is translated through it. The shim works, but it adds overhead.

I’ve written an experimental native PipeWire backend for scsynth that talks to PipeWire directly through pw_stream / pw_thread_loop, no shim. It’s full-duplex, uses -H for device targeting, -S for sample rate, -Z for buffer size, and integrates with the PipeWire graph the same way pw-cat or any other native client does. I’ve been using it daily for over a month.

Performance

Measured on my system: AMD Ryzen AI 9 365, Yamaha DM3 USB interface, PipeWire 1.0.5, 48 kHz / 1024-sample quantum. Both binaries were built from the same scsynth source; only the audio backend differs.

| N voices | PipeWire CPU | JACK-shim CPU | savings |
|---|---|---|---|
| 500 | 23% | 37% | −37% |
| 1000 | 36% | 69% | −48% |
| 1500 | 55% | 90% | −39% |
| 2000 | 72% | 128% | −43% |

(Default synthdef, N voices at distributed frequencies. The “JACK” baseline is the libjack-on-PipeWire shim, so what most Linux desktops actually run today, not jack2 with a real jackd.)
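
For anyone checking the last column: it reads as the CPU reduction relative to the JACK-shim figure. A minimal Python sketch of that arithmetic follows. The function name is illustrative, and the inputs are the rounded percentages from the table, so the results can differ from the posted column by about a point.

```python
def relative_savings(pw_cpu: float, jack_cpu: float) -> float:
    """CPU reduction of the PipeWire build relative to the JACK-shim build, in percent."""
    return 100.0 * (jack_cpu - pw_cpu) / jack_cpu

# (voices, PipeWire CPU %, JACK-shim CPU %) as posted in the table above
table = [(500, 23, 37), (1000, 36, 69), (1500, 55, 90), (2000, 72, 128)]

for voices, pw, jack in table:
    print(f"{voices:>5} voices: {relative_savings(pw, jack):.1f} % less CPU")
```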

Practical effect: roughly 40–50 % less CPU at the same DSP load, or equivalently ~50 % more sustainable voices before deadline misses. On this machine sustainable voice count goes from ~1500 to ~2500.

Try it

git clone --recurse-submodules -b pipewire-backend-experiment https://github.com/lucdoebereiner/supercollider.git
cd supercollider
mkdir build && cd build
cmake -DAUDIOAPI=pipewire -DSUPERNOVA=OFF ..
make -j$(nproc) scsynth
# verify the binary links pipewire and not libjack:
ldd server/scsynth/scsynth | grep -E 'pipewire|jack'
# expected: libpipewire-0.3.so.0  (no libjack line)
./server/scsynth/scsynth -u 57110

Build dependency on Debian/Ubuntu: sudo apt install libpipewire-0.3-dev
I believe on Fedora that is: sudo dnf install pipewire-devel

SUPERNOVA=OFF is only because supernova isn’t ported yet.

To switch back to the JACK build, run cmake -DAUDIOAPI=jack . in the build directory and rebuild.

Quick test

After building, start the new pw server:

./build/server/scsynth/scsynth -u 57110

You should see boot output ending with:

PipeWireDriver: negotiated 1024 samples @ 48000.0 Hz, 8 out / 8 in channel(s)
SC_AudioDriver: sample rate = 48000.000000, driver's block size = 1024
SuperCollider 3 server ready.
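
As a rough orientation, the negotiated values in this boot output fix the time budget of each audio callback. A small Python sketch of that arithmetic (the helper name is illustrative and not part of scsynth):

```python
def quantum_ms(frames: int, sample_rate: float) -> float:
    """Duration of one audio quantum in milliseconds."""
    return 1000.0 * frames / sample_rate

# Values from the boot output above: 1024 samples at 48000 Hz.
budget = quantum_ms(1024, 48000.0)
print(f"{budget:.2f} ms per callback")
```

At these settings a callback that needs more than about 21.3 ms misses its deadline.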

For an audio test, point sclang at the same port and play something:

Server.default.options.numOutputBusChannels = 8;
Server.default.options.numInputBusChannels  = 8;
Server.default.addr = NetAddr("127.0.0.1", 57110);
{ SinOsc.ar(440, 0, 0.2) ! 2 }.play;

Performance tests on your hardware

If you’d like to check whether the performance improvements hold on your machine:

git clone --recurse-submodules -b pipewire-backend-experiment \
    https://github.com/lucdoebereiner/supercollider.git
cd supercollider
mkdir build && cd build

# 1) Build the JACK backend variant and save the binary
cmake -DAUDIOAPI=jack -DSUPERNOVA=OFF ..
make -j$(nproc) scsynth
cp server/scsynth/scsynth ../tools/jack_signal_bench/scsynth_jack

# 2) Switch to the PipeWire backend and build again
cmake -DAUDIOAPI=pipewire .
make -j$(nproc) scsynth
cp server/scsynth/scsynth ../tools/jack_signal_bench/scsynth_pw

# 3) Run the sweep on both. Each cell takes ~10 s.
cd ../tools/jack_signal_bench
: > results.jsonl
for bin in scsynth_jack scsynth_pw; do
  for n in 100 500 1000 1500 2000; do
    python3 scsynth_sweep.py --binary ./$bin --synths $n --seconds 8 \
        --label "${bin}_n${n}" >> results.jsonl
    sleep 2
  done
done

# 4) Read the table
python3 -c "
import json
rows = [json.loads(l) for l in open('results.jsonl')]
for r in sorted(rows, key=lambda x: (x['binary'], x['synths_requested'])):
    print(f\"{r['binary']:<14} n={r['synths_requested']:>5}  avg={r['avg_mean']:>6.1f}%  peak_max={r['peak_max']:>6.1f}%  xruns={r['xruns']}\")"

This launches a fresh scsynth, loads the default synthdef, spawns N voices at
distributed frequencies, polls /status at 50 Hz for 8 seconds,
records avgCPU / peakCPU and any xrun lines from stderr, quits,
and emits one JSON summary line per cell. Hardware reports (lscpu output,
audio interface model, PipeWire version) plus the resulting table
would be very welcome in this thread.
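
To make the polling step concrete, here is a minimal Python sketch of a /status poll over UDP. It is not the code of scsynth_sweep.py, whose implementation is not shown in this thread. It assumes the /status.reply layout given in the SuperCollider Server Command Reference (five ints, two floats, two doubles); the function names and defaults are illustrative.

```python
import socket
import struct
import time

def osc_string(s: str) -> bytes:
    """Encode an OSC string: ASCII bytes, NUL-terminated, padded to a multiple of 4."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

# A /status message with no arguments.
STATUS_REQUEST = osc_string("/status") + osc_string(",")

def parse_status_reply(packet: bytes) -> dict:
    """Decode a /status.reply packet into a dict of named fields."""
    header = osc_string("/status.reply") + osc_string(",iiiiiffdd")
    if not packet.startswith(header):
        raise ValueError("not a /status.reply message")
    fields = struct.unpack_from(">iiiiiffdd", packet, len(header))
    names = ("unused", "ugens", "synths", "groups", "synthdefs",
             "avg_cpu", "peak_cpu", "nominal_sr", "actual_sr")
    return dict(zip(names, fields))

def poll_status(host="127.0.0.1", port=57110, seconds=8.0, rate_hz=50.0):
    """Poll /status at roughly rate_hz for `seconds`; return the decoded replies."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(1.0 / rate_hz)
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            sock.sendto(STATUS_REQUEST, (host, port))
            try:
                data, _ = sock.recvfrom(1024)
                replies.append(parse_status_reply(data))
            except (socket.timeout, ValueError):
                pass  # dropped or unrelated packet; skip this tick
            time.sleep(1.0 / rate_hz)
    return replies
```

With a server running on port 57110, something like `max(r["peak_cpu"] for r in poll_status())` would give the highest peak CPU seen during the window.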

Status. What works, what doesn’t

Works:

  • Full-duplex playback and capture, autoconnects to default sink / source
  • -H "outSink:inSource" device targeting (use pw-cli ls Node or wpctl status to find names)
  • -S for sample rate (any rate with a clean integer ratio to the PipeWire graph rate, 24k / 48k / 96k / 192k / 384k all confirmed working). You can set it via s.options.sampleRate.
  • -Z for buffer size (256, 512, 1024, 2048, …)
  • Format / latency reporting via PipeWire’s param_changed event
  • Adaptive quantum-change handling (PipeWire renegotiating the graph mid-session no longer breaks the driver)
  • Number of input/output channels via server options

Known gaps:

  • supernova not ported, still goes through the libjack shim
  • SoundIn end-to-end not verified yet, capture connects and reaches streaming state but I haven’t run a real passthrough synth
  • Cross-stream scheduling order between in/out streams not explicitly enforced (up to one quantum of capture-to-output latency in the worst case)
  • Rates with no clean integer ratio to the PipeWire graph rate (e.g. 44.1k against a 48k graph) abort cleanly with a diagnostic rather than working through internal re-blocking
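
To illustrate the sample-rate constraint mentioned in both lists, here is a small Python sketch of one plausible reading of "clean integer ratio": one rate is an exact integer multiple of the other. This is an assumption for illustration; the backend's actual check is not shown in this post and may differ.

```python
def has_integer_ratio(requested_hz: int, graph_hz: int) -> bool:
    """True if one rate is an exact integer multiple of the other."""
    hi, lo = max(requested_hz, graph_hz), min(requested_hz, graph_hz)
    return hi % lo == 0

graph = 48000  # assumed PipeWire graph rate, as in the measurements above
for rate in (24000, 48000, 96000, 192000, 384000, 44100):
    verdict = "ok" if has_integer_ratio(rate, graph) else "no clean ratio"
    print(f"{rate:>6} Hz against {graph} Hz graph: {verdict}")
```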

Discussion

I’d love feedback, especially:

  • Reports from different hardware / interfaces, does the speedup hold?
  • Anyone who can test with real jack2 (no PipeWire) and see whether the gap is similar or smaller
  • Opinions on whether this is worth a PR

Issue on GitHub: https://github.com/supercollider/supercollider/issues/7499
Branch: https://github.com/lucdoebereiner/supercollider/tree/pipewire-backend-experiment

Cool!

> Practical effect: roughly 40–50 % less CPU at the same DSP load, or equivalently ~50 % more sustainable voices before deadline misses. On this machine sustainable voice count goes from ~1500 to ~2500.

I find this really strange. If anything, I would expect the Jack shim to have a small and constant overhead. After all, the number of jack API calls in an audio callback is constant. Why should the CPU overhead of the shim increase with the workload of the audio callback? This doesn’t make any sense to me…

I have no access to a Linux machine at the moment, but I would be curious to try it out myself.

I’ll give this a shot, when I have a moment.

It actually hadn’t occurred to me to be dissatisfied with scsynth performance running through the jack shim, but if it can run more smoothly, that’s great!

hjh

Thanks! Good to be suspicious here. :slight_smile: I profiled both builds with perf at the same load and the answer is not what I had assumed.

Both builds execute the same amount of work, about 28.5 billion instructions over 8 seconds. The driver code itself is well under a tenth of a percent of the time on either side. So the libjack shim is most likely not adding measurable per-callback work, just as you said it shouldn’t.

The two builds differ in the clock speed at which those instructions run. The JACK build averaged 2.64 GHz while the PipeWire build averaged 4.56 GHz.

I’m on a Ryzen AI 9 365, which mixes Zen 5 and Zen 5c cores: ~5 GHz vs ~2.7 GHz peak. The kernel was placing the libjack shim’s audio thread on a Zen 5c core and the native pw_stream’s audio thread on a Zen 5 core. That accounts for the whole frequency gap. It has done this consistently.
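
To see which logical CPUs belong to which core type on a hybrid machine, one option is to group them by their advertised maximum frequency. The Python sketch below reads the standard Linux cpufreq sysfs files and assumes a cpufreq driver is loaded. The function name is illustrative, and on some systems per-core boost differences may produce more than two groups.

```python
from collections import defaultdict
from pathlib import Path
import re

def cpus_by_max_freq(sysfs_root: str = "/sys/devices/system/cpu") -> dict:
    """Map max frequency in MHz -> sorted list of logical CPU numbers."""
    groups = defaultdict(list)
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        match = re.fullmatch(r"cpu(\d+)", cpu_dir.name)
        freq_file = cpu_dir / "cpufreq" / "cpuinfo_max_freq"  # value is in kHz
        if match and freq_file.is_file():
            khz = int(freq_file.read_text().strip())
            groups[khz // 1000].append(int(match.group(1)))
    # Fastest group first.
    return {mhz: sorted(cpus) for mhz, cpus in sorted(groups.items(), reverse=True)}

if __name__ == "__main__":
    for mhz, cpus in cpus_by_max_freq().items():
        print(f"{mhz:>5} MHz: CPUs {cpus}")
```

The CPU numbers in the fastest group are the ones to pass to taskset -c when pinning.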

When I pin both builds to the same Zen 5 cores with taskset -c 0-3, the CPU usage at 500, 1000, and 2000 voices is within a few percent. At 1500 the JACK build is still about 25 % higher and takes three xruns in that run, against zero on PipeWire.

So the benchmark gap is repeatable in normal use, but the cause is not extra work in the shim. It is the kernel consistently placing the two backends’ audio threads on different core types. Results from a non-hybrid CPU would be very welcome.
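
As a sanity check on this explanation, the clock ratio can be compared with the CPU ratios from the first post's table. The Python sketch below assumes that both core types retire a similar number of instructions per cycle, which is an assumption and not something measured in this thread.

```python
jack_ghz, pw_ghz = 2.64, 4.56  # average clocks reported by perf stat above

# The same instruction count at a lower clock takes proportionally longer,
# assuming similar instructions-per-cycle on both core types (an assumption).
predicted_ratio = pw_ghz / jack_ghz

# (voices, PipeWire CPU %, JACK-shim CPU %) from the first post's table
table = [(500, 23, 37), (1000, 36, 69), (1500, 55, 90), (2000, 72, 128)]

print(f"predicted JACK/PipeWire CPU ratio: {predicted_ratio:.2f}")
for voices, pw, jack in table:
    print(f"{voices:>5} voices: measured ratio {jack / pw:.2f}")
```

The measured ratios scatter around the predicted value of roughly 1.7, which is consistent with the clock difference accounting for most of the gap.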

> Thanks! Good to be suspicious here. :slight_smile: I profiled both builds with perf at the same load and the answer is not what I had assumed.

Could you share the command line you used for perf?
Did you run scsynth and sclang without the scide GUI?

best, P

Hi Peter,

I needed to allow perf to read counters (this seems to be locked down by default on Ubuntu):

sudo sysctl kernel.perf_event_paranoid=1

Start scsynth under a fixed load (a few hundred to a few thousand default.scsyndef voices is enough), without scide. Then in another terminal grab its PID and run two measurements against it.

Hardware counters with total cycles, instructions, average frequency

 perf stat -p $(pidof scsynth) \
      -e task-clock,cycles,instructions,cache-references,cache-misses \
      -- sleep 8

This counts events for 8 seconds on the live process. The cycles line has a second column printed as # X.XXX GHz. That’s the clock frequency averaged over the window. This is how I saw 2.64 GHz on the JACK build vs 4.56 GHz on the PipeWire build. Instructions should be roughly the same on both builds for the same load, since the DSP code is identical.

Sampled call graph, to see which functions the time is spent in:

perf record -p $(pidof scsynth) -g --call-graph=dwarf -o perf.data -- sleep 8
perf report -i perf.data --stdio -g none --sort=overhead,dso,symbol | head -30

Each line in the report says “this function was on the CPU in N % of samples.” On both builds I saw the same top-10 list in roughly the same proportions (VarSaw_next_k, LPF_next, etc.).

This tells you what core and what frequency the audio thread actually ended up on:

ps -L -p $(pidof scsynth) -o tid,psr,policy,rtprio,comm
grep MHz /proc/cpuinfo

I reset this afterwards :slight_smile: sudo sysctl kernel.perf_event_paranoid=4

Run this against both AUDIOAPI=jack and AUDIOAPI=pipewire builds of the same scsynth source at the same load, and compare. I’d be very curious to hear what you find.

The issue with SuperCollider’s uneven performance in JACK compared to other JACK clients has been brought up before. See this thread:

Perhaps there are benefits to the PipeWire rewrite of the backend that are not only the difference between JACK and PW but other improvements to the audio thread (e.g. calling in or out, locking some resource) that are also beneficial to the performance. Investigating this could lead to improvements to the current JACK backend as well. This makes Luc’s work doubly interesting in my opinion.

Thanks! Yes, this discussion (and the discussions with you about this discussion) was a principal motivation for the PipeWire backend.

I added a small script that makes profiling easier. Once you have compiled both binaries, you can run tools/jack_signal_bench/forum_bench.sh (on the pipewire-backend-experiment branch of lucdoebereiner/supercollider on GitHub). It performs the tests and writes a summary of the results into a markdown file. If anyone is experimenting with this, please paste your results here.

Here’s what I get:

scsynth_jack   n=  100  avg=  22.2%  peak_max=  27.7%  xruns=0
scsynth_jack   n=  500  avg=  38.3%  peak_max=  63.4%  xruns=0
scsynth_jack   n= 1000  avg=  71.2%  peak_max=  75.5%  xruns=0
scsynth_jack   n= 1500  avg= 110.2%  peak_max= 379.2%  xruns=8
scsynth_jack   n= 2000  avg= 159.1%  peak_max= 415.5%  xruns=6
scsynth_pw     n=  100  avg=  16.8%  peak_max=  26.2%  xruns=0
scsynth_pw     n=  500  avg=  33.4%  peak_max=  48.9%  xruns=0
scsynth_pw     n= 1000  avg=  44.4%  peak_max=  58.2%  xruns=0
scsynth_pw     n= 1500  avg=  64.7%  peak_max=  85.9%  xruns=0
scsynth_pw     n= 2000  avg=  83.2%  peak_max= 113.3%  xruns=1

Also, warning to anyone else who wants to try this, after installing libpipewire-0.3-dev, I can no longer build scsynth @ Version-3.14.1:

/home/dlm/share/superc-dev/server/scsynth/SC_Jack.cpp:38:14: fatal error: jackey.h: No such file or directory
   38 | #    include <jackey.h>
      |              ^~~~~~~~~~
compilation terminated.

So absolutely do not sudo make install in the process of running this test, because you might find yourself unable to revert your system-wide-installed scsynth back to the stable version. (My bad for doing that by habit… now I have A Problem for which I haven’t been able to find an answer.)

Anyone know how to fix that btw?

Update: I deleted the whole build directory and ran it again, seems OK now…? (Also, at that time, I had experimented with make -j4 for parallel build but maybe SC can’t be built that way – the current successful run is without parallel building.)

hjh

Great! Thank you! This is really useful. PipeWire is consistently lower on CPU, from ~24 % lower at 100 voices to ~48 % lower at 2000. At 1500/2000 voices your JACK build is spiking a lot (peaks of 379 % / 415 %, 8 and 6 xruns) while PipeWire stays under 115 % with at most one xrun.
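
The percentages in this reply can be reproduced from the pasted result lines. The Python sketch below parses lines in the format printed by the table step of the sweep and computes the reduction in average CPU relative to the JACK build. The regular expression and function names are illustrative, and only a subset of the posted rows is used as sample input.

```python
import re

ROW = re.compile(
    r"(?P<binary>\S+)\s+n=\s*(?P<n>\d+)\s+avg=\s*(?P<avg>[\d.]+)%"
    r"\s+peak_max=\s*(?P<peak>[\d.]+)%\s+xruns=(?P<xruns>\d+)"
)

def parse_rows(text: str) -> dict:
    """Return {(binary, n): avg_cpu_percent} for each result line in text."""
    return {
        (m["binary"], int(m["n"])): float(m["avg"])
        for m in ROW.finditer(text)
    }

def savings_by_load(rows: dict, base="scsynth_jack", other="scsynth_pw") -> dict:
    """Percent reduction in average CPU of `other` relative to `base`, per voice count."""
    loads = sorted(n for (b, n) in rows if b == base and (other, n) in rows)
    return {n: 100.0 * (rows[base, n] - rows[other, n]) / rows[base, n] for n in loads}

sample = """
scsynth_jack   n=  100  avg=  22.2%  peak_max=  27.7%  xruns=0
scsynth_jack   n= 2000  avg= 159.1%  peak_max= 415.5%  xruns=6
scsynth_pw     n=  100  avg=  16.8%  peak_max=  26.2%  xruns=0
scsynth_pw     n= 2000  avg=  83.2%  peak_max= 113.3%  xruns=1
"""
for n, pct in savings_by_load(parse_rows(sample)).items():
    print(f"n={n}: {pct:.0f} % lower average CPU on PipeWire")
```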

Two quick things would help me interpret it:

  1. Your CPU model: lscpu | grep "Model name"

  2. Also could you do a pinned run, so we can tell whether the gap is just the kernel putting the two backends on different cores. Easiest is forum_bench.sh (it does an unpinned and a taskset-pinned pass automatically).
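
For pinning a single run by hand from Python, without the script, a minimal sketch follows. It sets the CPU affinity in the child process before exec, which is roughly what taskset does. The function name and the binary path in the usage comment are hypothetical, and this works on Linux only.

```python
import os
import subprocess

def run_pinned(cmd, cpus, **popen_kwargs):
    """Start cmd with its CPU affinity restricted to `cpus` (Linux only)."""
    def pin():
        # Runs in the child just before exec; pid 0 means "this process".
        os.sched_setaffinity(0, set(cpus))
    return subprocess.Popen(cmd, preexec_fn=pin, **popen_kwargs)

# Hypothetical usage, roughly equivalent to: taskset -c 0-3 ./scsynth_pw -u 57110
# proc = run_pinned(["./scsynth_pw", "-u", "57110"], range(4))
```

Threads created later by the process inherit the affinity mask, so the audio thread stays on the chosen CPUs unless something changes its affinity explicitly.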

It would also be interesting to compare with Supernova, which has its own Jack implementation.

> would be also interesting to compare with Supernova, which has its own Jack implementation.

yes, but that’s possibly got some thread priority issues (at least under pw-jack) -
https://scsynth.org/t/supernova-cannot-raise-thread-priority-kubuntu-studio/

I’d be happy to help out with some testing (given reasonably clear instructions; I’m still somewhat of a noob, but I might learn a few things) - primarily in july though, currently preparing a transatlantic move and it’s … not fun

At least on my machine, that proves definitively to be the case. The performance difference disappears when threads are pinned.

Here, I switched to the “performance” CPU governor, which I hadn’t done yesterday.

PipeWire vs JACK-shim — scsynth benchmark

  • CPU: 12th Gen Intel(R) Core™ i5-12500H
  • Logical CPUs: 16
  • Kernel: 6.8.0-111-lowlatency
  • PipeWire: pipewire
  • Load points: 500,1000,2000 voices of the default synthdef
  • Per-cell duration: 8 s, /status polled at 50 Hz
  • Channels: -i 0 -o 2

  • Pinned sweep: taskset -c 0-3 applied to both backends

Default scheduling (unpinned)

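| voices | PW avg | PW peak max | PW xruns | JACK avg | JACK peak max | JACK xruns | JACK − PW |
|---|---|---|---|---|---|---|---|
| 500 | 21.1% | 22.6% | 0 | 33.8% | 41.0% | 0 | +38% |
| 1000 | 41.5% | 67.5% | 0 | 71.6% | 77.2% | 0 | +42% |
| 2000 | 87.1% | 93.3% | 0 | 150.7% | 384.7% | 6 | +42% |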

Pinned to CPUs 0-3 (taskset)

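| voices | PW avg | PW peak max | PW xruns | JACK avg | JACK peak max | JACK xruns | JACK − PW |
|---|---|---|---|---|---|---|---|
| 500 | 21.3% | 23.2% | 0 | 20.0% | 21.5% | 0 | -7% |
| 1000 | 41.9% | 46.2% | 0 | 39.9% | 46.6% | 0 | -5% |
| 2000 | 84.4% | 94.0% | 0 | 87.9% | 94.2% | 0 | +4% |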

The jack shim, when CPU-pinned, actually performs slightly better except at extremely high load.

I also reran the other script in performance mode, but it doesn’t tell us anything really different from what I posted yesterday.

That test isn’t quite ready to run: Luc said “supernova not ported, still goes through the libjack shim.”

I just tried; seems fine (audio goes through; I haven’t checked with a high-quality mic, though; I didn’t check latency).

One reason to go ahead would be to have better feature parity for audio device selection. I’ve seen it a few times where a new user in Linux gets confused about the fact that s.options.device doesn’t choose device in Linux. Certainly it would remove a little static if linux users could also do ServerOptions.devices and choose devices the same way that Mac and Windows users do.

hjh

What I meant was comparing supernova’s Jack implementation with the scsynth Pipewire implementation, just to see if there is any noticeable difference in behavior. This might be particularly interesting because supernova pins the audio thread to a specific core.


Anyway, so I was right to question the performance claims. I think we can safely say that there is no notable difference as long as the threads are pinned, i.e. the Jack shim itself does not add any overhead.

Now my question is: why is there such a big and consistent difference without thread pinning?

Could it be that Pipewire intentionally pins the audio callback to a performance core, but not for the Jack shim?

I would also compare the thread priorities and look for any notable differences.
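
One way to make that comparison is to read each thread's scheduling policy, real-time priority, and last-used CPU from /proc, which is the same information the ps -L command earlier in the thread prints. The Python sketch below follows the field numbering in the proc(5) man page. The function names are illustrative, and this is Linux only.

```python
from pathlib import Path

POLICIES = {0: "OTHER", 1: "FIFO", 2: "RR", 3: "BATCH", 5: "IDLE", 6: "DEADLINE"}

def parse_task_stat(stat_line: str) -> dict:
    """Extract name, last CPU, RT priority and policy from a /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or ')'.
    head, _, tail = stat_line.rpartition(")")
    rest = tail.split()           # rest[0] is field 3 (state) in proc(5)
    tid_text, comm = head.split("(", 1)
    return {
        "tid": int(tid_text),
        "comm": comm,
        "cpu": int(rest[36]),     # field 39: CPU the thread last ran on
        "rtprio": int(rest[37]),  # field 40: real-time priority
        "policy": POLICIES.get(int(rest[38]), rest[38]),  # field 41: scheduling policy
    }

def thread_table(pid: int) -> list:
    """One dict per thread of `pid`, read from /proc (Linux only)."""
    tasks = Path(f"/proc/{pid}/task")
    return [parse_task_stat((t / "stat").read_text()) for t in sorted(tasks.iterdir())]
```

Running thread_table on the scsynth PID of each build and comparing the rows for the audio thread would show whether policy or priority differs between the two backends.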

I briefly tried symlinking supernova → ./scsynth_jack and rerunning the forum benchmark, but it crashed and the test script aborted. I’m curious, but I’m not interested enough to troubleshoot.

hjh

In a way, it is quite a desirable result that the difference is not just PipeWire being better than JACK in general, since this thread priority/performance behavior might also be exploitable in the JACK case, earning us a free performance upgrade. Nice!

In the long run it looks to me like being a native PipeWire client and not relying on a translation layer for an architecture that an increasing number of Linux users aren’t even running anymore is a reasonable move. SuperCollider will of course have to support JACK for some time, but from what I can see, the Linux future is leaning increasingly towards PipeWire. The investigation in this thread can also be seen as supporting this in how the OS by default treats the PipeWire server as more performance critical than the JACK one. These are the kinds of benefits we want SuperCollider to enjoy.

I think @jamshark70’s suggestion about exposing the choice to Linux users using the device in Server.options is good. The more users actively test both backends, the more we will learn about possible differences as they develop moving forward.

Dear Ludvig,

> In the long run it looks to me like being a native PipeWire client and not relying on a translation layer for an architecture that an increasing number of Linux users aren’t even running anymore is a reasonable move.

I personally do see the reason for this in the default installation of
pipewire in many linux distributions but not as the deliberate choice
of (pro audio) linux users. I don’t have any numbers on this though, and
possibly no one has.

The linux-audio-users mailing list might be a good place to discuss this.

[…]

> SuperCollider will of course have to support JACK for some time, but from what I can see, the Linux future is leaning increasingly towards PipeWire. The investigation in this thread can also be seen as supporting this in how the OS by default treats the PipeWire server as more performance critical than the JACK one.

It might have slipped by me, but can you remind me again how the OS does
treat it as more performance critical?

Thanks!
P

I am not sure I am interpreting you correctly, but what I am suggesting is just to offer also a PipeWire backend on Linux in addition to the current JACK backend. The device option should probably default to JACK so that nothing changes out of the box.

It is of course just a guess what percentage of those running JACK are actually running through a pw-jack layer and into PipeWire. But with the work of @Luc_Doebereiner we have the option of offering a PW alternative and I think we should, if for no other reason than to learn more about what the differences are in real-world use cases. Unless doing so involves some large developer burden that I am missing.

The consensus of the thread seems to be that when both server processes are pinned to the same cores, the performance boost of the PipeWire server goes away. Therefore, the PipeWire server is either asking to be treated differently or is somehow put on a higher-performing core by the OS or by PipeWire itself. @Spacechild1 understands these things better than I do; he posits:

Hi Ludvig, list,

thanks for your kind reply. It could indeed be interesting to take this
over to pd-list as well for comparison. I do not know how feasible a
native pipewire audio backend would be for Pd (but again, Christof can
possibly comment on this).

best, P

> In a way, it is quite a desirable result that the difference is not just PipeWire being better than JACK in general, since this thread priority/performance behavior might also be exploitable in the JACK case, earning us a free performance upgrade. Nice!

That’s what I was thinking as well.

> The investigation in this thread can also be seen as supporting this in how the OS by default treats the PipeWire server as more performance critical than the JACK one.

That’s what I have guessed, but we need more evidence. If that turns out to be really the case, then it is very likely a bug in the Jack shim. Of course, we can try to work around it in scsynth, but ideally it should be fixed by the Pipewire team because it likely affects many other existing Jack clients.

I think at this point we should reach out to the Pipewire team because they probably have a better idea what’s going on.


To be clear: I’m not arguing for or against a native Pipewire backend in scsynth/supernova. IMO we can only really discuss this after the current performance degradation has been resolved.