Scsynth behaves differently on macOS ARM vs. Intel builds

Hey there,

I’m noticing some very strange behaviour (or am I missing something?): the following piece of code behaves differently on my two Macs:

(
var blockSize = 32; // = 1 works on Intel, 32 doesn't
var tempo = 120/60;
var hexStep = { |chars, div=4|
	var phase = Phasor.ar(1, tempo/SampleRate.ir, 0, 4);
	var c = 0;
	var table = (0..15).collect(_.asBinaryDigits).collect(_[4..7]);
	var seq = chars.asString.asList
	.collect{|char| table.at( ("0x"++char).interpret) }.flatten
	.collect{|bool| c = c + bool };
	var length, step, repetitions, trig, index, rawStep;
	phase = phase * div;
	repetitions = (phase.trunc/seq.size).trunc;
	index = (phase).mod(seq.size).floor;
	rawStep = Select.ar(index, K2A.ar(seq));
	step = rawStep + (repetitions * seq.last);
	trig = Changed.ar(step);
	step
};

s.options.blockSize_(blockSize);
s.waitForBoot({
	{ hexStep.('1', 4) }.plot(4);
	5.wait; s.quit;
});
)

When I run this on my 2012 MacBook Pro (SC 3.14.1, macOS 13.7.4), I get this (incorrect) output:

[plot #1: image not preserved]

When I change the blockSize to 1 on the same machine, I get this (correct) plot:

[plot #2: image not preserved]

However, I cannot recreate the behaviour on my newer M2 machine (also SC 3.14.1, macOS 14.4.1): I always get the correct plot #2, no matter the blockSize. The sample rate was 48 kHz and hardwareBufferSize was 512 in all tests.
I also ran the same example on a macOS 10.13 Intel machine with SC 3.13, which showed the same behaviour as the other Intel Mac.

Does anybody know what could be going on? I have struggled quite a lot with getting the Select.ar-lookup right and preventing the jumping output on the first plot and can’t wrap my head around why this should have anything to do with the blockSize in the first place. Any hints would be greatly appreciated! :slight_smile:

All best and thank you for reading this lengthy post!
moritz

Would you mind converting this to a SynthDef and posting the output of .dumpUGens on both systems? They should be the same. This way we can tell whether the problem lies in sclang or in the audio server.

Yes, of course! The SynthDef would be:

(
var tempo = 120/60;
x = SynthDef(\test, { 
	var chars = '1', div = 4;
	var phase = Phasor.ar(1, tempo/SampleRate.ir, 0, 4);
	var c = 0;
	var table = (0..15).collect(_.asBinaryDigits).collect(_[4..7]);
	var seq = chars.asString.asList
	.collect{|char| table.at( ("0x"++char).interpret) }.flatten
	.collect{|bool| c = c + bool };
	var length, step, repetitions, trig, index, rawStep;
	phase = phase * div;
	repetitions = (phase.trunc/seq.size).trunc;
	index = (phase).mod(seq.size).floor;
	rawStep = Select.ar(index, K2A.ar(seq));
	step = rawStep + (repetitions * seq.last);
	trig = Changed.ar(step);
	step.poll
});
x.dumpUGens
)

Output on the M2 machine:

test
[ 0_SampleRate, scalar, [  ] ]
[ 1_/, scalar, [ 2.0, 0_SampleRate ] ]
[ 2_Phasor, audio, [ 1, 1_/, 0, 4, 0.0 ] ]
[ 3_*, audio, [ 2_Phasor, 4 ] ]
[ 4_trunc, audio, [ 3_*, 1 ] ]
[ 5_/, audio, [ 4_trunc, 4 ] ]
[ 6_trunc, audio, [ 5_/, 1 ] ]
[ 7_mod, audio, [ 3_*, 4 ] ]
[ 8_floor, audio, [ 7_mod ] ]
[ 9_K2A, audio, [ 0 ] ]
[ 10_K2A, audio, [ 0 ] ]
[ 11_K2A, audio, [ 0 ] ]
[ 12_K2A, audio, [ 1 ] ]
[ 13_Select, audio, [ 8_floor, 9_K2A, 10_K2A, 11_K2A, 12_K2A ] ]
[ 14_+, audio, [ 13_Select, 6_trunc ] ]
[ 15_Impulse, audio, [ 10, 0 ] ]
[ 16_Poll, audio, [ 15_Impulse, 14_+, -1, 18, 85, 71, 101, 110, 40, 66, 105, 110, 97, 114, 121, 79, 112, 85, 71, 101, 110, 41 ] ]

Output on the Intel MacBook Pro:

test
[0_SampleRate, scalar, []]
[1_/, scalar, [2.0, 0_SampleRate]]
[2_Phasor, audio, [1, 1_/, 0, 4, 0.0]]
[3_*, audio, [2_Phasor, 4]]
[4_trunc, audio, [3_*, 1]]
[5_/, audio, [4_trunc, 4]]
[6_trunc, audio, [5_/, 1]]
[7_mod, audio, [3_*, 4]]
[8_floor, audio, [7_mod]]
[9_K2A, audio, [0]]
[10_K2A, audio, [0]]
[11_K2A, audio, [0]]
[12_K2A, audio, [1]]
[13_Select, audio, [8_floor, 9_K2A, 10_K2A, 11_K2A, 12_K2A]]
[14_+, audio, [13_Select, 6_trunc]]
[15_Impulse, audio, [10, 0]]
[16_Poll, audio, [15_Impulse, 14_+, -1, 18, 85, 71, 101, 110, 40, 66, 105, 110, 97, 114, 121, 79, 112, 85, 71, 101, 110, 41]]

The two seem to be identical if you ignore whitespace.
I could maybe work on a simpler/better reproducer on Sunday? (-:
Thank you for looking into this!
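
The whitespace comparison can also be checked in sclang instead of by eye. This is only a minimal sketch: the two strings are one sample line copied from each dump above, and the helper name strip is made up for this illustration.

```supercollider
(
// One sample line copied from each dump above (M2 first, Intel second).
var arm = "[ 5_/, audio, [ 4_trunc, 4 ] ]";
var intel = "[5_/, audio, [4_trunc, 4]]";
// strip is a hypothetical helper: it drops every whitespace character.
var strip = { |str| str.reject(_.isSpace) };
(strip.(arm) == strip.(intel)).postln; // true if only the spacing differs
)
```

Pasting the full dumps into the two strings would compare the whole graph the same way.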

Often, when debugging this type of issue, it’s useful to expose intermediate signals – that is, there’s an assumption about what they should be, but the assumption might not be true.

If I change the last line of your plot example to [index, rawStep, repetitions, step], I get (Intel ThinkBook running Linux):

index and rawStep are fine, but repetitions updates one index later than you expect… let’s do a bit of polling.

(
var blockSize = 64; // = 1 works on Intel, 32 doesn't
var tempo = 120/60;
var hexStep = { |chars, div=4|
	var phase = Phasor.ar(1, tempo/SampleRate.ir, 0, 4);
	var c = 0;
	var table = (0..15).collect(_.asBinaryDigits).collect(_[4..7]);
	var seq = chars.asString.asList
	.collect{|char| table.at( ("0x"++char).interpret) }.flatten
	.collect{|bool| c = c + bool };
	var length, step, repetitions, trig, index, rawStep;
	var pt;
	phase = phase * div;
	pt = phase.trunc;
	repetitions = (pt / seq.size).trunc;
	index = (phase).mod(seq.size).floor;
	rawStep = Select.ar(index, K2A.ar(seq.debug("sequence")));
	step = rawStep + (repetitions * seq.last);
	trig = Changed.ar(step);
	[phase, pt, pt / seq.size, repetitions].poll(Changed.ar(pt));
	[index, rawStep, repetitions, step]
};

s.options.blockSize_(blockSize);
s.waitForBoot({
	{ hexStep.('1', 4) }.plot(4);
	5.wait; s.quit;
});
)

// and in the midst of that output, there's:
UGen Array [0]: 4
UGen Array [1]: 4
UGen Array [2]: 1
UGen Array [3]: 0  -- uh. what.

So (phase.trunc / seq.size) must be very slightly less than 1, truncating to 0.
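
To see why a quotient that is only slightly below 1 matters here, the truncation can be tried on the language side. This sketch assumes the quotient is the largest 32-bit float below 1.0 (that is, 1 - 2^-24); the exact value on the server is not known at this point in the thread.

```supercollider
(
// Assumption: the quotient is 1 - 2^-24, the largest 32-bit float below 1.0.
var quotient = 1 - 2.0.pow(-24);
quotient.postln;        // a value slightly below 1
quotient.trunc.postln;  // 0.0, because trunc drops the fractional part
(4 / 4).trunc.postln;   // 1.0, the value the patch expects at this step
)
```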

OK, time for the big guns.

(
var blockSize = 64; // = 1 works on Intel, 32 doesn't
var tempo = 120/60;
var hexStep = { |chars, div=4, buf|
	var phase = Phasor.ar(1, tempo/SampleRate.ir, 0, 4);
	var c = 0;
	var table = (0..15).collect(_.asBinaryDigits).collect(_[4..7]);
	var seq = chars.asString.asList
	.collect{|char| table.at( ("0x"++char).interpret) }.flatten
	.collect{|bool| c = c + bool };
	var length, step, repetitions, trig, index, rawStep;
	var pt, writeTrig, writeIndex;
	phase = phase * div;
	pt = phase.trunc;
	repetitions = (pt / seq.size).trunc;
	index = (phase).mod(seq.size).floor;
	rawStep = Select.ar(index, K2A.ar(seq.debug("sequence")));
	step = rawStep + (repetitions * seq.last);
	trig = Changed.ar(step);
	writeTrig = Changed.ar(pt);
	writeIndex = PulseCount.ar(writeTrig);
	BufWr.ar(
		Latch.ar([phase, pt, pt / seq.size, repetitions], writeTrig),
		buf, writeIndex, 0
	);
	[index, rawStep, repetitions, step]
};

s.options.blockSize_(blockSize);
s.waitForBoot({
	var cond = CondVar.new;
	b = Buffer.alloc(s, 1000, 4);
	s.sync;
	{ hexStep.('1', 4, b) }.plot(4);
	5.wait;
	fork {
		b.getToFloatArray(action: { |data|
			d = data;
			cond.signalAll;
		});
	};
	cond.wait;
	s.quit;
	d.postln;
});
)

// improvised exploratory code...

d[16..19]  // the droids we're looking for
-> FloatArray[4.0, 4.0, 0.99999994039536, 0.0]

e = FloatArray[4.0, 4.0, 1.0, 1.0];  // expected

(
f = { |float| float.as32Bits.asBinaryDigits(32).clump(8).collect(_.join).join(" ") };
(16..19).do { |i|
	"index = %\n".postf(i);
	f.(d[i]).post; "\tactual".postln;
	f.(e[i-16]).post; "\texpected".postln;
	"".postln;
};
)

index = 16
01000000 10000000 00000000 00000000	actual
01000000 10000000 00000000 00000000	expected

index = 17
01000000 10000000 00000000 00000000	actual
01000000 10000000 00000000 00000000	expected

index = 18
00111111 01111111 11111111 11111111	actual
00111111 10000000 00000000 00000000	expected

index = 19
00000000 00000000 00000000 00000000	actual
00111111 10000000 00000000 00000000	expected

So we’re fine right up to the division, and for some reason Intel botches it. (Which is not the first time Intel has botched division, IIRC.)
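
For reference, the two bit patterns at index = 18 can be decoded back into numbers in sclang. This is a small sketch that uses Float.from32Bits as the inverse of the as32Bits call above; the hex literals are the binary rows written out by hand.

```supercollider
(
// Bit patterns copied from the index = 18 rows above, written as hex.
var actual = Float.from32Bits(0x3F7FFFFF);   // 00111111 01111111 11111111 11111111
var expected = Float.from32Bits(0x3F800000); // 00111111 10000000 00000000 00000000
"actual:     %".format(actual).postln;
"expected:   %".format(expected).postln;
"difference: %".format(expected - actual).postln; // 2 ** -24, one step below 1.0
)
```

So the quotient is off by a single step in the last bit of the mantissa, which is enough for trunc to return 0 instead of 1.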

(It’s possible, though, that sclang’s conversion to double and then reconversion to 32 bits might obscure something wrong at i = 16 or 17… probably need to write the buffer to disk and pull bytes manually to bypass that, but I’m out of time for now.)

Workaround, I guess, would be to round rather than trunc-ing, or round to some moderate precision and then trunc: repetitions = (phase.trunc / seq.size).round(0.001).trunc; seems to do it.
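
A quick language-side check of that workaround, as a sketch only: it runs on sclang's 64-bit floats rather than on the server, and it assumes seq.size is 4 as in the example. Since pt is a whole number, the quotients are multiples of 0.25, so rounding to 0.001 should not disturb the correct values.

```supercollider
(
// Quotients pt / 4 for pt = 0..8, plus the off-by-one-bit value seen in the buffer.
var quotients = (0..8).collect { |pt| pt / 4 } ++ [Float.from32Bits(0x3F7FFFFF)];
quotients.do { |q|
	// Compare plain trunc with the round-then-trunc workaround for each quotient.
	"% -> trunc: %, round(0.001).trunc: %".format(
		q, q.trunc, q.round(0.001).trunc
	).postln;
};
)
```

Only the last entry should differ between the two columns.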

hjh

Might not be relevant, but trunc is currently broken and behaves like floor when the value is negative.

Or rather, there was a debate about what the behavior should be where one side of the debate was that it’s broken…

… but in this case, all the values are >= 0.

hjh

Thank you so much for digging in @jamshark70!
After a bit of trial and error, I think I just found the source of the inconsistency in the line repetitions = (phase.trunc/seq.size).trunc. In my example seq.size is control rate (right?). If I change it to audio rate, everything works as expected, no matter the blockSize (or computer):

(
var blockSize = 32;
var tempo = 120/60;
var hexStep = { |chars, div=4|
	var phase = Phasor.ar(1, tempo/SampleRate.ir, 0, 4);
	var c = 0;
	var table = (0..15).collect(_.asBinaryDigits).collect(_[4..7]);
	var seq = chars.asString.asList
	.collect{|char| table.at( ("0x"++char).interpret) }.flatten
	.collect{|bool| c = c + bool };
	var length, step, repetitions, trig, index, rawStep;
	phase = phase * div;
	repetitions = (phase.trunc/K2A.ar(seq.size)).trunc;
	index = (phase).mod(seq.size).floor;
	rawStep = Select.ar(index, K2A.ar(seq));
	step = rawStep + (repetitions * seq.last);
	trig = Changed.ar(step);
	step
};

s.options.blockSize_(blockSize);
s.waitForBoot({
	{ hexStep.('1', 4) }.plot(4);
	5.wait; s.quit;
});
)

If anybody knows why this behaves differently on different computers, I’d be very interested :nerd_face:

All best and thanks so much for spending so much time on this!
moritz

I’d try to run this with the same explicit hardwareBufferSize on both machines to confirm if that’s what causes different behavior.
Marcin
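
A minimal sketch of pinning those settings before boot, so both machines run with the same values. The numbers 512 and 48000 are the ones mentioned in the original post, and the blockSize of 32 is the failing case; all three are assumptions about what should be compared.

```supercollider
(
// Assumption: 512 frames and 48 kHz, the values mentioned in the original post.
s.options.hardwareBufferSize_(512);
s.options.sampleRate_(48000);
s.options.blockSize_(32);
s.waitForBoot({
	// Post what the server ended up with, for comparison across machines.
	"blockSize: %, sampleRate: %".format(s.options.blockSize, s.sampleRate).postln;
	s.quit;
});
)
```

The audio driver may not honour the requested hardware buffer size, so it is worth also checking the values the server reports in the post window at boot.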

Here are further results with SC 3.15.0-dev (latest build):

  • On an M1 Max machine (MacBook Pro 2021) running macOS 15.7.3:
    • Running macOS natively: the difference is observed.
    • Running operating systems via Parallels Desktop on the same machine shows mixed behaviour:
      • Windows 11 via Parallels Desktop: no difference.
      • macOS 26.2 via Parallels Desktop: difference.

From this test, it appears that the difference may not be attributable to the ARM versus Intel architecture… ??

One additional route that could be investigated: is there a difference between running the 3.14 Intel-only build on Apple Silicon, where it is emulated via Rosetta 2, and running an Apple Silicon native build?

We published a legacy build for 3.14.1 which is x64 only.

I just tried running the 3.14.1 legacy build on my M2 MacBook Pro - still getting the correct output (which differs from what I get on the Intel machine if I run my original example).

This sounds interesting @prko! By “difference observed” you mean a result like plot #2 and “no difference” you mean a result like plot #1 I suppose? :slight_smile:

All best,
moritz

Yes! This is what I wanted to say.

I could run SC3.15.0-dev via Rosetta-2 on macOS 15.7.3:

(
var blockSize = 1;
var tempo = 120/60;
var hexStep = { |chars, div=4|
	var phase = Phasor.ar(1, tempo/SampleRate.ir, 0, 4);
	var c = 0;
	var table = (0..15).collect(_.asBinaryDigits).collect(_[4..7]);
	var seq = chars.asString.asList
	.collect{|char| table.at( ("0x"++char).interpret) }.flatten
	.collect{|bool| c = c + bool };
	var length, step, repetitions, trig, index, rawStep;
	phase = phase * div;
	repetitions = (phase.trunc/K2A.ar(seq.size)).trunc;
	index = (phase).mod(seq.size).floor;
	rawStep = Select.ar(index, K2A.ar(seq));
	step = rawStep + (repetitions * seq.last);
	trig = Changed.ar(step);
	step
};

s.options.blockSize_(blockSize);
w = Window(blockSize.asString);
w.front;
s.waitForBoot({
	{ hexStep.('1', 4) }.plot(4, parent:w);
	5.wait; s.quit;
});
)