Spectral analysis of a buffer to determine if it is monophonic or polyphonic

I am trying to find a way to determine (or, more realistically, estimate) whether a buffer holds mono- or polyphonic material. I am not trying to extract the fundamentals; I just need to be able to determine if the buffer holds a single note (monophonic) or several notes (polyphonic). I have had a look at some of the Fluid UGens, like FluidBufChroma, but I am not sure how to use the information produced by the analysis to solve this.

I need this to work for electric guitar signals, which are very harmonic, with only a little inharmonic information in the attack phase of the signal. Below I am stacking sine waves to emulate the sound of the guitar, although in reality the amplitudes decay differently from this abstraction.

How could I go about this?

// Create two buffers for testing - a is monophonic - b is polyphonic
(
a = Buffer.alloc(s, 0.5 * s.sampleRate);
b = Buffer.alloc(s, 0.5 * s.sampleRate);

{
	var chord = [57, 61, 64]; // A major
	var env = Env.perc(0.2, 0.3).ar;
	var tone = {|midinote| 8.collect{|i| SinOsc.ar(midinote.midicps * (i + 1), 0, 0.1 * 1/(i+1)) }.sum };
	RecordBuf.ar([chord[0]].collect(tone.(_)).sum * env, a, loop: 0, doneAction: 2);
	RecordBuf.ar(chord.collect(tone.(_)).sum * env, b, loop: 0);
	0
}.play
)

// inspect
a.play // the note A
b.play // the chord A major
{ [PlayBuf.ar(1, a), PlayBuf.ar(1, b) ] }.plot(0.5)
// how to determine which one is polyphonic and which is monophonic?

FYI, FluCoMa questions might be better asked over here: https://discourse.flucoma.org/


Yes, I will try that, but I was also wondering if some of SC's own FFT classes could be used…

I think my attempt is very primitive (note that you should evaluate the code blocks below separately) and may not be accurate, but it seems to work:

(
{
	var sig = PlayBuf.ar(1, a);
	var n = 4;
	var f = FluidSineFeature.kr(sig, numPeaks:n, order: 1);
	SendReply.kr(Impulse.kr(2), '/strongFreqs', f);
}.play;
o = OSCFunc({ |msg|
	// msg = [path, nodeID, replyID, freq0 .. freq3]
	if (msg[3..3+3].sum != 0) { ~ratioA = msg[3..3+3] };
}, '/strongFreqs');
)
o.free;

(
{
	var sig = PlayBuf.ar(1, b);
	var n = 4;
	var f = FluidSineFeature.kr(sig, numPeaks:n, order: 1);
	SendReply.kr(Impulse.kr(2), '/strongFreqs', f);
}.play;
q = OSCFunc({ |msg| 
	if (msg[3..3+3].sum != 0) { ~ratioB = msg[3..3+3] };
}, '/strongFreqs');
)
q.free;

(~ratioA/~ratioA[0]).round(0.1) == (1.0 .. 4.0) // true: monophonic
(~ratioB/~ratioB[0]).round(0.1) == (1.0 .. 4.0) // false: polyphonic

Thanks for the code. When I run it, both of these tests answer 'false'. Did you get it to answer true?

(~ratioA/~ratioA[0]).round(0.1) == (1.0 .. 4.0) // true: monophonic
(~ratioB/~ratioB[0]).round(0.1) == (1.0 .. 4.0) // false: polyphonic

Another analysis that you might take a look at is the confidence value from FluidBufPitch.
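
Something like this, roughly (an untested sketch, not from the help file; the FluidBufStats step and the buffer names are just one way to get at the mean confidence):

(
// analyse with FluidBufPitch: the features buffer gets two channels
// per frame, [pitch, pitch confidence]
~features = Buffer.new(s);
~stats = Buffer.new(s);
FluidBufPitch.processBlocking(s, a, features: ~features, windowSize: 2048);
FluidBufStats.processBlocking(s, ~features, stats: ~stats, action: {
	// frame 0 of the stats buffer holds the per-channel means:
	// [mean pitch, mean pitch confidence]
	~stats.getn(0, 2, { |means| ("mean pitch confidence:" + means[1]).postln });
});
)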

I would try approaching this from a validation perspective. By that I mean: rather than imitating the sound with stacked sine tones, build a data set of "polyphonic recordings" and "monophonic recordings". That way you can test a few different analyses on many different examples and see what works best, and the examples will be more likely to generalize to your use case. Something like in the videos below (sorry, the demos are mostly in Max).

(replace the idea of "timbre" with "is it poly- or monophonic")


Looking forward to watching the videos. Yes, training a model to detect chords vs. single notes would definitely be a good way to get started with ML. From the little I know about the subject, this seems like a perfect problem for a neural network. I watched the one about picking out an oboe and a trombone (I think) a while ago; that was done in SC, right? So I guess I could extract some concepts for setting up ML in SC from that video, or do you have other or better suggestions for SC resources? I also just joined the FluCoMa Discord, which will probably be helpful in the future.

My result is as above. Could you test the code below again?

( // step 1: your code
a = Buffer.alloc(s, 0.5 * s.sampleRate);
b = Buffer.alloc(s, 0.5 * s.sampleRate);
{
	var chord = [57, 61, 64]; // A major
	var env = Env.perc(0.2, 0.3).ar;
	var tone = {|midinote| 8.collect{|i| SinOsc.ar(midinote.midicps * (i + 1), 0, 0.1 * 1/(i+1)) }.sum };
	RecordBuf.ar([chord[0]].collect(tone.(_)).sum * env, a, loop: 0, doneAction: 2);
	RecordBuf.ar(chord.collect(tone.(_)).sum * env, b, loop: 0);
	0
}.play
)

( // step 2: get info from a
{
	var sig = PlayBuf.ar(1, a);
	var n = 4;
	var f = FluidSineFeature.kr(sig, numPeaks:n, order: 1);
	SendReply.kr(Impulse.kr(2), '/strongFreqs', f);
}.play;
o = OSCFunc({ |msg| 
	if (msg[3..3+3].sum != 0) { ~ratioA = msg[3..3+3] };
}, '/strongFreqs');
)

o.free;  // step 3: free OSCFunc

(  // step 4: get info from b
{
	var sig = PlayBuf.ar(1, b);
	var n = 4;
	var f = FluidSineFeature.kr(sig, numPeaks:n, order: 1);
	SendReply.kr(Impulse.kr(2), '/strongFreqs', f);
}.play;
q = OSCFunc({ |msg| 
	if (msg[3..3+3].sum != 0) { ~ratioB = msg[3..3+3] };
}, '/strongFreqs');
)

q.free; // step 5: free OSCFunc

(~ratioA/~ratioA[0]).round(0.1) == (1.0 .. 4.0) // true: monophonic // step 6-1: test a

(~ratioB/~ratioB[0]).round(0.1) == (1.0 .. 4.0) // false: polyphonic // step 6-2: test b

I think this will only work in very limited cases…

I just tried again at both 44100 and 48000 (not that it should matter), same result: both false. I think the ML idea mentioned above might be the best way forward, even though it is a bigger plunge than I had hoped for :) Thanks for the effort.


Then my code must be wrong. I will recheck it. I hope I can find the problem.

Yes, the ML idea seems to be the best way!


@Thor_Madsen
I experienced the same phenomenon. To check it, I did the following:

( // step 1: your code
a = Buffer.alloc(s, 0.5 * s.sampleRate);
b = Buffer.alloc(s, 0.5 * s.sampleRate);
{
	var chord = [57, 61, 64]; // A major
	var env = Env.perc(0.2, 0.3).ar;
	var tone = {|midinote| 8.collect{|i| SinOsc.ar(midinote.midicps * (i + 1), 0, 0.1 * 1/(i+1)) }.sum };
	RecordBuf.ar([chord[0]].collect(tone.(_)).sum * env, a, loop: 0, doneAction: 2);
	RecordBuf.ar(chord.collect(tone.(_)).sum * env, b, loop: 0);
	0
}.play
)

( // step 2: get info from a
x = {
	var sig = PlayBuf.ar(1, a);
	var n = 4;
	var f = FluidSineFeature.kr(sig, numPeaks:n, order: 1);
	SendReply.kr(Impulse.kr(2), '/strongFreqs', f);
}.play;
o = OSCFunc({ |msg| 
	if (msg[3..3+3].sum != 0) { ~ratioA = msg[3..3+3] };
}, '/strongFreqs');
)

x.free; o.free;  // step 3: free OSCFunc and x.

~ratioA // -> [220.26164245605, 440.31237792969, 660.43664550781, 880.27203369141]

(  // step 4: get info from b
y = {
	var sig = PlayBuf.ar(1, b);
	var n = 4;
	var f = FluidSineFeature.kr(sig, numPeaks:n, order: 1);
	SendReply.kr(Impulse.kr(2), '/strongFreqs', f);
}.play;
q = OSCFunc({ |msg| 
	if (msg[3..3+3].sum != 0) { ~ratioB = msg[3..3+3] };
}, '/strongFreqs');
)

y.free; q.free; // step 5: free OSCFunc and y.

~ratioB // -> [237.49104309082, 349.51354980469, 565.98101806641, 435.85006713867]

~ratioA //-> [237.49104309082, 349.51354980469, 565.98101806641, 435.85006713867]
// <- it should be [220.26164245605, 440.31237792969, 660.43664550781, 880.27203369141]
// why is ~ratioA changed?

There was a disturbance when performing step 4 or 5: it changed the value of ~ratioA.

I repeated the process, re-reading the code, but I could not find any problem in the code above.

Then I recompiled sclang. The problem went away and I cannot reproduce it anymore. It is really strange: I can only reproduce it occasionally, but I do not know exactly when, how, or why.


@tedmoore - I watched the old trombone vs. oboe video and also studied the help file for FluidMLPClassifier, which seems to be an updated version of the example from the original 2021 video (which is great, btw). In the original video you used FluidMFCC, but in the help file you are using FluidBufPitch; why not FluidBufMFCC?

And a few questions regarding the example in the FluidMLPClassifier help doc:

  1. After training the neural network, what is the syntax to get a prediction for a new, previously untested buffer?
    I tried:
FluidBufPitch.processBlocking(s, ~tbone, features: ~pitch_features, windowSize:2048); // reusing the ~tbone buffer for testing
~mlp.predictPoint(~pitch_features, {|label| label.postln});

but I get

ERROR: FluidMLPClassifier - Wrong Point Size
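
(I suspect the point needs to match the dimensionality of the training points, i.e. one flattened frame of the analysis. A hedged sketch of what I would try; ~point is a new buffer and the frame index is arbitrary:)

(
// flatten ONE frame of the features so the point has the same
// dimensionality the classifier was trained on
~point = Buffer.new(s);
FluidBufPitch.processBlocking(s, ~tbone, features: ~pitch_features, windowSize: 2048);
FluidBufFlatten.processBlocking(s, ~pitch_features, startFrame: 100, numFrames: 1, destination: ~point);
~mlp.predictPoint(~point, { |label| label.postln });
)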

  2. When replacing
    FluidBufPitch.processBlocking(s,src,features: ~pitch_features,windowSize:2048);
    with
    FluidBufMFCC.processBlocking(s,src,features: ~pitch_features,windowSize:2048, startCoeff: 1);

and trying to run

~dataSet.dump({
	arg datadict;
	~labelSet.dump({
		arg labeldict;
		defer{
			FluidPlotter(dict:datadict).categories_(labeldict);
		};
	});
});

I get an error, which I assume has to do with needing to reduce the MFCCs to a 2D array for plotting. I can't figure out the syntax.
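
My best guess so far (an untested sketch, assuming FluidUMAP and FluidNormalize are available and that the MFCC points need reducing to two dimensions before FluidPlotter will accept them; ~reduced is a new dataset):

(
~reduced = FluidDataSet(s);
~umap = FluidUMAP(s, numDimensions: 2, numNeighbours: 15);
// reduce the multi-dimensional MFCC points to 2D, normalize to the
// 0..1 range FluidPlotter expects, then plot with the category labels
~umap.fitTransform(~dataSet, ~reduced, {
	FluidNormalize(s).fitTransform(~reduced, ~reduced, {
		~reduced.dump({ |datadict|
			~labelSet.dump({ |labeldict|
				defer { FluidPlotter(dict: datadict).categories_(labeldict) };
			});
		});
	});
});
)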

  3. In the help file it says '// I happen to know there are 252 frames in this buffer'. What if you don't know this in advance? Which syntax would you use? I tried changing .processBlocking to .process(…).wait and wrapping the whole thing in a routine. Then instead of 252.do{…} I did ~pitch_features.numFrames.do{…}, and also changed FluidBufFlatten.processBlocking to FluidBufFlatten.process(…).wait;

This works, but it is very slow compared to using .processBlocking. What would be the best way if you don't know the number of frames beforehand? Can you calculate the number of frames beforehand based on the number of frames in the buffer being analyzed?
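
(Two possibilities I am considering, though I am not sure about the padding details: with FluCoMa's default centre padding the analysis seems to produce floor(sourceFrames / hopSize) + 1 frames, so the count can be predicted from the hop size; alternatively, sync and query the features buffer afterwards. A sketch of both, assuming the default hop of windowSize/2:)

(
fork {
	var hop = 1024; // hopSize defaults to windowSize/2, i.e. 2048/2 here
	// option 1: predict the frame count from the hop size
	((~tbone.numFrames / hop).floor + 1).postln;
	// option 2: run the analysis, sync, then query the features buffer header
	FluidBufPitch.processBlocking(s, ~tbone, features: ~pitch_features, windowSize: 2048);
	s.sync;
	~pitch_features.updateInfo({ |buf| buf.numFrames.postln });
};
)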

Sorry for the barrage of questions :)

I don't remember why exactly, but the MLPClassifier can work with any input, so maybe it serves as a good example of trying different analyses to see what works best!

No worries at all. I'll be glad to help.

Can you post the whole of the code that you're running? It'll help me give better answers! (I realize it's a modification/extension of the example, but still.)

The ML approach does seem like a decent path forward for this. However, just to clarify, when I said,

It doesn't necessarily need to involve machine learning! Just trying a few different analyses and looking at them on a plot might show that the raw analyses themselves contain the information you're looking for!

Adding ML to this is like asking the neural network to find those distinctions for you (although plotting and inspecting one's data is pretty much never a bad idea).

When I examine a single note vs. a chord (in this case 4 notes) on the freqscope, it is pretty easy to spot the difference. Here are two snapshots taken at random times during each buffer. The picture of the single note clearly displays the harmonic series, and in the case of the chord, it is pretty clear that the picture represents more than one harmonic series.

Single note:

4-part chord:

I will keep digging into it.


Yes! My code snippet is doing exactly this!

~ratioA // -> [220.26164245605, 440.31237792969, 660.43664550781, 880.27203369141]

~ratioB // -> [237.49104309082, 349.51354980469, 565.98101806641, 435.85006713867]

(~ratioA/~ratioA[0]).round(0.1) == (1.0 .. 4.0) // true: monophonic // step 6-1: test a

(~ratioB/~ratioB[0]).round(0.1) == (1.0 .. 4.0) // false: polyphonic // step 6-2: test b

Just eyeballing this… you might see if the spectral kurtosis analysis from FluidBufSpectralShape distinguishes these well… (more info on kurtosis here too)
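
Roughly like this (untested, and I am going from memory on the channel order, so double-check the kurtosis index):

(
// compare mean spectral kurtosis for the mono and poly test buffers;
// FluidBufSpectralShape writes 7 channels per frame (centroid, spread,
// skewness, kurtosis, rolloff, flatness, crest), so kurtosis is channel 3
[a, b].do { |buf, i|
	var shape = Buffer.new(s);
	var stats = Buffer.new(s);
	FluidBufSpectralShape.processBlocking(s, buf, features: shape);
	FluidBufStats.processBlocking(s, shape, stats: stats, action: {
		stats.getn(0, 7, { |means| // frame 0 holds the per-channel means
			("buffer % mean kurtosis: %".format(["a", "b"][i], means[3])).postln;
		});
	});
};
)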

And as we were saying, MFCCs seem like a plausible route.

It seems I can just use FluidBufPitch and look at the pitch confidence; I don't know why I haven't thought of this before! Last night I tested this approach on a pool of notes and chords played on guitar, and with a pitch confidence threshold of 0.6 I was able to weed out all the chords (important) with only a couple of single notes being filtered out (which is OK). Just to illustrate: on my original buffer examples (at the top of the page), the pitch confidence is 0.73 for the single note and 0.35 for the chord, and even a perfect fifth (midinotes [57, 64]) produces a low confidence value, which is good and mildly surprising.


Yes, I will post my questions regarding ML in a new thread with the testing code included. Slightly OT question: how is the demo FluCoMa code run in a browser, like https://learn.flucoma.org/reference/mfcc/explain/ ? Is that Max code or SC code or something else under the hood?