Note-Based FFT Pitch-Shift

toneburst · June 19, 2021, 6:39pm

Does anyone have example code for an FFT/PV-based pitch-shift effect, where the pitch of the resynthesised signal can be shifted up/down on a musical scale (i.e. by MIDI notes)?

I’m thinking this UGen would be worth looking at:

https://doc.sccode.org/Classes/PV_BinShift.html

I’m guessing a simple offset of the IFFT partials won’t work well, because the relationships between the partials will be wrong. But I don’t have a conceptual grasp of how to calculate the stretch factor and shift amount for MIDI notes above/below a notional “original pitch” note (e.g. C3), or whether this can be done without first detecting the pitch of the input signal.

Does anyone have any thoughts?
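To make the concern about partial relationships concrete, here is the standard arithmetic (not specific to any UGen) for a harmonic input whose partials sit at integer multiples of f_0, comparing a fixed offset with a multiplicative scale:

```latex
\begin{aligned}
\text{harmonic input:} \quad & f_k = k\,f_0, \qquad k = 1, 2, 3, \dots \\
\text{offset by } \Delta f: \quad & f_k' = k\,f_0 + \Delta f \qquad (f_k'/f_1' \ne k \text{ for } k \ge 2 \text{ unless } \Delta f = 0) \\
\text{scale by } r: \quad & f_k' = r\,k\,f_0 \qquad (f_k'/f_1' = k \text{, ratios preserved}) \\
n \text{ semitones:} \quad & r = 2^{n/12}
\end{aligned}
```

For example, with f_0 = 400 Hz, offsetting by 400 Hz moves 400, 800, 1200 Hz to 800, 1200, 1600 Hz (ratios 2:3:4, which still imply a 400 Hz fundamental), while scaling by r = 2 gives 800, 1600, 2400 Hz, a true octave. Note that r depends only on n, not on f_0.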

dietcv · June 20, 2021, 5:49pm

(
x = {
        var in, out, amp, f0=400, fftSize=8192, winLen=2048, hopFrac=0.5,
        chain, mexp, fScaled, df, binShift, phaseShift,
        // Sine window on input and output (WOLA):
        inWinType=0, outWinType=0;
        // Hann window on input, nothing (rect) on output (sounds close to me):
        // inWinType=1, outWinType=0-1;
        // Rectangular windows in and out: inWinType=0-1, outWinType=0-1;
        amp = MouseX.kr(-60,10).dbamp;
        // amp = 0.1; // when MouseX used for fundamental
        in = SinOsc.ar(f0,0,amp); // API: freq, phase, mul
        // API: FFT(buffer, input, hop(0.5), wintype(0=sine), active, winsize)
        chain = FFT(LocalBuf(fftSize), in, hopFrac, inWinType, 1, winLen);
        mexp = MouseY.kr(-1.0,1.0); // exponent for one octave up or down
        mexp = mexp*(1-MouseButton.kr); // press mouse to hear original freq
        // mexp = 0; // no-scale test
        fScaled = f0 * (2.0 ** mexp);
        df = fScaled - f0;
        binShift = fftSize * (df / s.sampleRate);
        // v3.5.3+:
        chain = PV_BinShift(chain, stretch:1, shift:binShift, interp:1);
        // Also try no BinShift at all to hear the phaseShift alone below:
        phaseShift = 2 * pi * binShift * hopFrac * (winLen/fftSize);
        // The integrate argument hopefully appears in SC-v3.5.3:
        chain = PV_PhaseShift(chain, phaseShift, integrate:1);
        out = IFFT(chain,outWinType,winLen);
        Out.ar(0, out.dup);
}.play
)

This is from a Stanford workshop example that includes phase correction (the PV_PhaseShift step).
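A worked pass through the arithmetic in this example, assuming a 44.1 kHz server (the code reads s.sampleRate, so the actual numbers depend on your setup) and MouseY at the top of its range (mexp = 1, one octave up):

```latex
\begin{aligned}
f_{\text{scaled}} &= f_0 \cdot 2^{\text{mexp}} = 400 \cdot 2^{1} = 800\ \text{Hz} \\
\Delta f &= f_{\text{scaled}} - f_0 = 400\ \text{Hz} \\
\text{binShift} &= \text{fftSize} \cdot \frac{\Delta f}{f_s} = 8192 \cdot \frac{400}{44100} \approx 74.30\ \text{bins} \qquad (\text{bin width } f_s/\text{fftSize} \approx 5.38\ \text{Hz}) \\
\text{phaseShift} &= 2\pi \cdot \text{binShift} \cdot \text{hopFrac} \cdot \frac{\text{winLen}}{\text{fftSize}} = 2\pi \cdot 74.30 \cdot 0.5 \cdot \frac{2048}{8192} \approx 58.36\ \text{rad}
\end{aligned}
```

Because binShift is derived from a single Δf, every bin moves by the same number of Hz. With the sine test input there is only one partial, so it lands on 800 Hz; on a harmonic source the same code would behave as a frequency shift rather than a pitch scale.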

fmiramar · June 20, 2021, 7:07pm

All pitch shifters have their drawbacks, because pitch is a perceptual (and thus blurry) category, so it's worth trying several approaches to see what sounds best for your goal.

A granular approach could look like this:
http://sccode.org/1-570

You can also substitute PitchShiftPA for PitchShift; PitchShiftPA is a quark that aims to improve formant preservation on monophonic signals. Changing the pitch-detection algorithm (if one is used) also tends to affect the results.

fmiramar · June 20, 2021, 7:35pm

There is also this implementation, which is closer to commercial autotuners:

I’ve tried to compile it on Windows and Linux. On Windows, I could finish the compilation process. On Linux, although it created the .so UGen file without any major warnings, SC could not boot because it complained about this file. (I guess it has something to do with importing the Pd FFT libraries; maybe a developer could help solve it.)

Maybe the quickest alternative is to install the LADSPA plugin and load it using VSTPlugin:

http://tombaran.info/autotalent.html

toneburst · June 20, 2021, 8:34pm

Thanks very much everyone. I will work my way through your responses and see what works best.

I probably should have given a bit more context. I’m looking to combine pitch-shifting with the ability to “scrub” through FFT data in a buffer, manually (or with MIDI CC etc.), so that pre-recorded audio can be re-pitched and re-timed.

Thanks to the kind assistance of forum members, I’ve got the basis for the audio > FFT > buffer and scrubbing > IFFT setup, so my next task was to try the pitch-shifting part.

toneburst · June 20, 2021, 9:44pm

@dietcv what’s the significance of the value of “f0” here? Is it related to “fftSize” and “winLen”?

UPDATE: doh! Obviously f0 is the base frequency of the sine osc.

That does lead me to wonder though, if it’s possible to meaningfully pitch-shift without knowing the original pitch of the signal to be shifted.

dietcv · June 21, 2021, 6:45pm

Hey, there is a difference between pitch-shifting and pitch-scaling.
For pitch-shifting you need to know the fundamental frequency. f0 could be tracked with the Tartini UGen, for example, but that is something of a bottleneck for the whole process: how well it works really depends on the source, and it is only usable for monophonic signals. Pitch detection is a topic of its own; PitchShiftPA, mentioned above, implements other state-of-the-art algorithms, for example.
Without knowing the fundamental frequency, you can pitch-scale by octaves by multiplying the bin number of each FFT bin by the appropriate power of 2.

b = Buffer.read(s, Platform.resourceDir +/+ "sounds/a11wlk01.wav");
b.play;

(
x = {
    var in, amp, fftSize=8192, winLen=2048, hopFrac=0.25,
        chain1, chain2, chain3;
    amp = MouseX.kr(-60,10).dbamp;
    in = amp * PlayBuf.ar(1,b.bufnum,BufRateScale.kr(b.bufnum),loop:1);
    chain1 = FFT(LocalBuf(fftSize), in, hopFrac, 0, 1, winLen);
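    // copy the original spectrum and scale every bin index by 2 (one octave up)
    chain2 = PV_Copy(chain1,LocalBuf(fftSize));
    chain2 = PV_BinShift(chain2, stretch:2, shift:0, interp:1);
    // second copy, scaled by 0.5 (one octave down)
    chain3 = PV_Copy(chain1,LocalBuf(fftSize));
    chain3 = PV_BinShift(chain3,stretch:0.5,shift:0,interp:1);
    // mix: octave up + original, then add the octave down
    chain2 = PV_Add(chain2,chain1);
    chain3 = PV_Add(chain3,chain2);
    Out.ar(0, IFFT(chain3,0,winLen).dup);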
}.play;
)
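The example above uses stretch factors of 2 and 0.5, but PV_BinShift's stretch argument accepts any ratio, so intervals within an octave can be reached by passing the equal-tempered ratio for n semitones (n.midiratio, i.e. 2 ** (n / 12)). A minimal sketch, assuming the server is booted and b is the a11wlk01.wav buffer read above. Like the octave version, it scales bin positions without any phase correction, so artifacts are likely.

```supercollider
(
x = {
    var in, chain, semitones, stretch;
    var fftSize = 2048;
    in = PlayBuf.ar(1, b.bufnum, BufRateScale.kr(b.bufnum), loop: 1);
    chain = FFT(LocalBuf(fftSize), in);
    // MouseY picks a whole number of semitones between -12 and +12
    semitones = MouseY.kr(-12, 12).round;
    // equal-tempered ratio, 2 ** (semitones / 12); no f0 needed
    stretch = semitones.midiratio;
    // scale bin positions only (shift: 0), as in the octave example
    chain = PV_BinShift(chain, stretch: stretch, shift: 0, interp: 1);
    (IFFT(chain) * 0.5).dup
}.play;
)
```

Since only the ratio is applied, the harmonic relationships inside the signal are kept, but nothing ties the result to an absolute note; that would need pitch tracking.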

dietcv · June 21, 2021, 6:56pm

(
s.waitForBoot({

	s.sync;

	// the frame size for the analysis - experiment with other sizes (powers of 2)
	~windowSize = 2048;
	// the hop size
	~hopSize = 0.25;
	// Hann window
	~winType = 1;
	//different playback rates
	~rates = (1..12) * 0.1 + 0.4;

	// ~fftpath must already point to a folder containing one subfolder of sound files per category
	~fftbuffers = Dictionary.new;
	PathName(~fftpath).entries.do {
		arg subfolder;
		var soundfile, data;

		~fftbuffers.add(
			subfolder.folderName.asSymbol ->
			Array.fill(
				subfolder.entries.size,
				{
					arg i;
					// read size of Soundfiles
					soundfile = SoundFile.openRead(subfolder.entries[i].fullPath);

					protect {
						(
							pvRecBuf: ~rates.collect { |rate, i|
								Buffer.alloc(s, (soundfile.duration / rate).calcPVRecSize(~windowSize, ~hopSize))
							},

							sndBuf: Buffer.read(s, subfolder.entries[i].fullPath)
						)
					} { |err|
						if(err.notNil) {
							"Error opening '%'".format(subfolder.entries[i].fullPath.basename).warn;
						};
						soundfile.close
					};
				}
			);
		);
	};

// analyses one sound file at the given playback rate and records the FFT frames into recBuf; frees itself when done
SynthDef(\pvrec, {
	arg recBuf, soundBufnum, rate = 1;
    var in, chain;
    Line.kr(1, 1, BufDur.kr(soundBufnum) / rate, doneAction: 2);
    in = PlayBuf.ar(1, soundBufnum, rate * BufRateScale.kr(soundBufnum), loop: 0);
    chain = FFT(LocalBuf(~windowSize), in, ~hopSize, ~winType);
    chain = PV_RecordBuf(chain, recBuf, 0, 1, 0, ~hopSize, ~winType);
	}).add;

	s.sync;

~fftbuffers.do { |subdirArray|
	subdirArray.do { |buf|
		~rates.do { |rate, i|
			Synth(\pvrec, [
				\recBuf, buf.pvRecBuf[i],
				\soundBufnum, buf.sndBuf,
				\rate, rate
			]);
		};
	};
};

	s.sync;

	// the analysis synths run in real time and free themselves when finished
	"fft analysis started".postln;

});
)

(
SynthDef(\pvmouse, {
	arg out=0, recBuf=1, fftSize=2048;
	var in, chain, bufnum;
	bufnum = LocalBuf.new(fftSize);
	chain = PV_BufRd(bufnum, recBuf, MouseX.kr(0, 1));
	// wintype 1 (Hann) to match the ~winType used for the analysis
	Out.ar(out, IFFT(chain, 1).dup);
}).add;
)

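// \subfoldername is a placeholder for one of your subfolder names;
// pvRecBuf holds one buffer per playback rate, so pick one of them (here the first)
a = Synth(\pvmouse, [\recBuf, ~fftbuffers[\subfoldername][0][\pvRecBuf][0]]);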

You can use the code above to run the FFT analysis of audio files, store the results in a dictionary and scrub them later, for example by accessing the buffers in a Pbind.
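A minimal sketch of the Pbind idea, assuming the analysis block above has run (2048-point frames, Hann window). The SynthDef name \pvscrub and the variable ~scrubBuf are hypothetical, and \subfoldername is the same placeholder as above. Each event reads the stored frame nearest to \pos and plays it for the note's duration.

```supercollider
(
// first file in the subfolder, analysed at the first rate
~scrubBuf = ~fftbuffers[\subfoldername][0][\pvRecBuf][0];

SynthDef(\pvscrub, { |out = 0, recBuf, pos = 0, sustain = 0.25, amp = 0.3|
    var chain, sig, env;
    // read the frame at pos (0 = start, 1 = end) from the PV recording
    chain = PV_BufRd(LocalBuf(2048), recBuf, pos);
    // wintype 1 (Hann) to match the analysis
    sig = IFFT(chain, 1);
    // short envelope that frees the synth when the note ends
    env = EnvGen.kr(Env.linen(0.01, sustain, 0.05), doneAction: 2);
    Out.ar(out, (sig * env * amp).dup);
}).add;
)

(
p = Pbind(
    \instrument, \pvscrub,
    \recBuf, ~scrubBuf,
    \pos, Pwhite(0.0, 1.0),  // jump to a random point in the analysis on each note
    \dur, 0.25,
    \legato, 1
).play;
)

p.stop;
```

Replacing Pwhite with a Pseries or a Pseg gives a steady scan through the file instead of random jumps.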

toneburst · June 22, 2021, 9:25am

Thanks very much for the example. @jamshark70 proposed a slightly different solution which seems to work, so I’ve been working with that, but I may try your method, too.

I actually worked with SuperCollider quite intensively for a period last year, but seem to have forgotten everything, now, so I feel like I’m starting from scratch again. I’m excited by the possibilities again, though!

toneburst · June 22, 2021, 9:30am

Thanks for the tips. Auto-tune isn’t really what I’m after right now. I’m more interested in re-pitching resynthesised audio from a MIDI keyboard as a textural thing, somewhat like this Max4Live effect.

Having said that, I do like Autotune, so I may well look into this in the future.

dietcv · June 22, 2021, 12:19pm

I would be interested to hear how your rebuild of the Max4Live effect turns out.

toneburst · June 22, 2021, 12:38pm

I’m not intending to re-create all the functions of the effect. The core idea of arbitrary scrubbing through a buffer of FFT data, and re-pitching the result in some kind of musically meaningful way is what I’m interested in right now.

toneburst · June 22, 2021, 12:45pm

OK, makes sense.

I see.

Is it possible to pitch-scale for notes within an octave, though?

I don’t know if this is possible, but ideally, if there is a clearly audible fundamental pitch in the original signal, I’d like it to track MIDI notes when transposed from a keyboard, while more or less maintaining the harmonic relationships between the frequencies within the signal.

I don’t think the M4L effect I linked to above tracks the pitch of the original audio.

Perhaps this is a job better achieved with time-domain pitch-shifting methods, applied to the resynthesised signal, though.
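One way to sketch the “track the fundamental onto MIDI notes” idea: estimate f0, then scale the spectrum by target / f0, where the target is the played note's frequency. This uses the built-in Pitch UGen in place of Tartini (which lives in sc3-plugins); the names \pvtrack and \pvtrackNote are hypothetical, and b is assumed to hold a mono sound file such as the a11wlk01.wav buffer read earlier. On speech or polyphonic material the pitch estimate will jump around, so expect warbling.

```supercollider
(
SynthDef(\pvtrack, { |out = 0, buf, note = 60, amp = 0.5|
    var in, freq, hasFreq, ratio, chain;
    in = PlayBuf.ar(1, buf, BufRateScale.kr(buf), loop: 1);
    // Pitch returns [estimated frequency, hasFreq flag]
    # freq, hasFreq = Pitch.kr(in);
    // ratio that would move the detected f0 onto the target note
    ratio = note.midicps / freq.max(20);
    // if no pitch is detected, leave the spectrum unscaled
    ratio = Select.kr(hasFreq > 0.5, [1, ratio]);
    // limit to +/- 2 octaves and smooth out jumps
    ratio = Lag.kr(ratio.clip(0.25, 4), 0.05);
    chain = FFT(LocalBuf(2048), in);
    chain = PV_BinShift(chain, stretch: ratio, shift: 0, interp: 1);
    Out.ar(out, (IFFT(chain) * amp).dup);
}).add;
)

// then:
x = Synth(\pvtrack, [buf: b]);

(
// MIDI note numbers from a keyboard set the target (60 = middle C, called C3 in some DAWs)
MIDIClient.init;
MIDIIn.connectAll;
MIDIdef.noteOn(\pvtrackNote, { |vel, num| x.set(\note, num) });
)
```

Without the Pitch stage (ratio = (note - 60).midiratio), the keyboard would transpose relative to whatever pitch the source has, without locking onto absolute notes.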

dietcv · June 23, 2021, 12:03am

https://medias.ircam.fr/xdf671a
Maybe you’ll be interested in this: several Max objects for scrubbing, freezing and adding partials are presented there.
I would also be interested in something similar in SuperCollider, probably with pitch tracking of f0 using Tartini and resynthesis (maybe with SMS), adding different partials via pitch-shifting; I don’t know. Maybe somebody else has an idea to start from.

toneburst · June 23, 2021, 10:50am

I managed to add MIDI control to my PV scrubbing patch, and I’m working on the pitch-scaling now.

There are some basic FFT/PV_BinShift concepts I’m not clear on.

The bin-shift value: is this in Hz?

I’ve been assuming the ‘shift’ value basically shifts the frequencies of the sine-wave bank that resynthesises the FFT data up/down. Is this correct?

Or does it offset the bins, such that the FFT magnitudes/phases for e.g. bin 1 of the analysis data control partial 2 (etc.) of the IFFT?

If so, then I guess the shift value represents the offset in bins.
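The PV_BinShift help describes stretch as scaling the bin location and shift as an offset added to the bin position, which matches the second reading: shift is counted in bins, not Hz, and it moves each bin's magnitude/phase data to another bin. Each bin is SampleRate / fftSize Hz wide, so a shift given in Hz can be converted as in this sketch (~hzToBins is a hypothetical helper, and 44.1 kHz is an assumed default).

```supercollider
(
// one FFT bin is sampleRate / fftSize Hz wide,
// so an offset in Hz corresponds to hz * fftSize / sampleRate bins
~hzToBins = { |hz, fftSize = 1024, sampleRate = 44100| hz * fftSize / sampleRate };

~hzToBins.(100).postln;        // about 2.32 bins at 1024 points, 44.1 kHz
~hzToBins.(100, 8192).postln;  // about 18.58 bins at 8192 points

// the same conversion server-side: move every bin up by 100 Hz
// (a frequency shift, so harmonic ratios are not preserved; phases are
// not corrected either, so it may sound rough)
{
    var fftSize = 1024;
    var in = SinOsc.ar(400, 0, 0.1);
    var chain = FFT(LocalBuf(fftSize), in);
    chain = PV_BinShift(chain, stretch: 1, shift: 100 * fftSize / SampleRate.ir, interp: 1);
    IFFT(chain).dup
}.play;
)
```

To transpose by musical intervals rather than by a fixed number of Hz, leave shift at 0 and set stretch to the interval's ratio instead.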

toneburst · June 23, 2021, 2:22pm

I got some OK results using a time-domain pitch-shifting UGen to post-process the IFFT output.

// Thread here:
// https://scsynth.org/t/ifft-freeze-scrub/3802

(
var resultbuf, inbuf;
var fftSize = 1024;

p = Platform.resourceDir +/+ "sounds/a11wlk01.wav";
q = "~/pvtest.wav".standardizePath;

// get duration
f = SoundFile.openRead(p);
f.close;
f.duration;  // 4.2832879818594

z = Server(\nrt, NetAddr("127.0.0.1", 57110),
	ServerOptions.new
	.numOutputBusChannels_(2)
	.numInputBusChannels_(2)
	.sampleRate_(44100)
);

inbuf = Buffer(z, 65536, 1);
resultbuf = Buffer(z, f.duration.calcPVRecSize(fftSize, 0.5, z.options.sampleRate), 1);

x = Score([
	[0, inbuf.allocMsg],
	[0, resultbuf.allocMsg],
	[0, inbuf.readMsg(p, leaveOpen: true)],
	[0, [\d_recv, SynthDef(\pv_ana, {
		var sig = VDiskIn.ar(1, inbuf, f.sampleRate / SampleRate.ir);
		var fft = FFT(LocalBuf(fftSize, 1), sig);
		fft = PV_RecordBuf(fft, resultbuf, run: 1);
		Out.ar(0, sig);
	}).asBytes]],
	[0, Synth.basicNew(\pv_ana, z).newMsg],
	[f.duration + (fftSize / z.options.sampleRate),
		resultbuf.writeMsg(q, "wav", "float")
	]
]);

x.recordNRT(
	outputFilePath: if(thisProcess.platform.name == \windows) { "NUL" } { "/dev/null" },
	headerFormat: "wav", sampleRate: z.options.sampleRate,
	options: z.options,
	duration: f.duration + (fftSize / z.options.sampleRate),
	action: { "encoded file".postln }
);

z.remove;
)


s.boot;

(
// this is the pv_rec result
b = Buffer.read(s, q);

SynthDef(\pvmouse, { |out = 0, recBuf = 1, fftSize = 1024|
	var chain, bufnum, result;

	bufnum = LocalBuf.new(fftSize);
	// MouseX scrubs through the recorded PV frames (0 = start, 1 = end)
	chain = PV_BufRd(bufnum, recBuf, MouseX.kr(0, 1));

	// Render FFT
	result = IFFT(chain).dup;

	// Pitch-shift
	result = PitchShift.ar(
		result,                              // stereo audio input
		0.1,                                 // grain (window) size in seconds
		MouseY.kr(-12, 12).round.midiratio,  // MouseY sets the shift in whole semitones
		0,                                   // pitch dispersion
		0.004                                // time dispersion
	);

	Out.ar(out, result);
}).add;

)

a = Synth(\pvmouse, [recBuf: b]);

a.free;

I think I can live with the audio quality of this solution. Plus, I can have some fun messing with the parameters of PitchShift.ar()