FFT processing on Audio and Mic

Dear team,

I hope this message finds you well. SuperCollider is the first programming language I’m learning; I’m self-taught and picked up the basics from The SuperCollider Book, YouTube and the scsynth forum. I’m trying to do a Digital Signal Processing (DSP) project using the FFT (Fast Fourier Transform) UGens. I’ve looked at forums and the help files in SCIDE, and posted my own question (FFT processing using SoundIn and an audio track), but couldn’t get a complete understanding of what I need to do.

My idea → Take 2 inputs:

  1. Input from internal mic
  2. An audio track being played through a buffer.

Do real-time FFT processing with input mic and an audio track.

I found a flow diagram online (https://stash.reaper.fm/37301/STFT%20sketch%20small.png) and am thinking of applying the same concept.

So my idea is to play the audio track and affect its phase from the input mic to create an experimental audio output. Once I have a working template, I can try other processing by modifying the parameters or doing something different.

So far, I have figured out how to play with FFT bins (audio and mic separately). I’m stuck here, not knowing how to:

  1. Combine them
  2. Compare the audio and mic FFTs, using a loop or something similar
  3. Modify (process) one with the other

Could you please help me with this and point me in the right direction? I’m desperate to get a working model by the end of this year if possible, please :blush:

s.boot;

//Mic Input processing - without using synthdef
(
x= {
	var sig, chain, size=2048;
	sig=SoundIn.ar(0, 1);
	chain= FFT(LocalBuf(size), sig, 0.5, 0);
	chain = PV_BinScramble(chain, \wipe.kr(0), \width.kr(0), \trig.tr(0)); // Processing to be done
	sig=IFFT(chain)*0.1!2;
}.play;
)

x.set(\wipe, 0.8, \width, 0.2, \trig, 1);
x.free;


//Audio FFT processing using SynthDef
~buf = Buffer.read(s,"/Users/Kirthan/Desktop/real.wav"); //choose any audio file

(
SynthDef(\fftaudio, {
	arg loop=1, da=0;
	var sig, chain, size=2048;
	sig = PlayBuf.ar(1, \buf.kr(0), BufRateScale.ir(\buf.kr(0)), loop:loop, doneAction: da);
	chain = FFT(LocalBuf(size), sig);
	chain = PV_BinScramble(chain, \wipe.kr(0), \width.kr(0), \trig.tr(0)); //Processing to be done
	sig = IFFT(chain) * \amp.kr(0.5)!2;
	Out.ar(\out.kr(0), sig);
}).add;
)

a = Synth(\fftaudio, [\buf, ~buf]);
a.set(\loop, 0, \da, 2);
a.set(\wipe, 0.5, \width, 0.1, \trig, 1);
a.free;


//Mic FFT processing using SynthDef
(
SynthDef(\fftmic, {
	var sig, chain, size=2048;
	sig=SoundIn.ar(0, 1);
	chain= FFT(LocalBuf(size), sig, 0.5, 0);
	chain = PV_BinScramble(chain, \wipe.kr(0), \width.kr(0), \trig.tr(0)); //Processing to be done
	sig = IFFT(chain) * \amp.kr(1)!2; // Amplitude set to unity gain - Please reduce or use it with caution!!!
	Out.ar(\out.kr(0), sig);
}).add;
)

m = Synth(\fftmic);
m.set(\wipe, 0.5, \width, 0.1, \trig, 1);
m.free;

As a general approach, you could add a synth that streams the microphone input to one audio bus (instead of to the speakers), and a second synth that plays the loaded audiofile to a second audio bus (also instead of sending it to the speakers), then create a third synth that reads input from both audio busses and combines both inputs into something new, and then send the new thing to the speakers.
You will want to investigate using groups to ensure that the third synth is executed after the first two synths.
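A minimal sketch of that routing, for illustration only. The bus, group and SynthDef names are made up here, and PV_CopyPhase (track magnitudes combined with mic phases) is just one possible stand-in for "combines both inputs into something new". It assumes the server is booted and ~buf holds a mono file, as in the first post. Use headphones, since the mic is live.

```supercollider
(
// Private audio buses: nothing written here reaches the speakers directly
~micBus = Bus.audio(s, 1);
~trackBus = Bus.audio(s, 1);

// Source synths go in the first group, the combiner in a group after it
~srcGroup = Group.new;
~fxGroup = Group.after(~srcGroup);

SynthDef(\micSrc, {
	Out.ar(\out.kr(0), SoundIn.ar(0));
}).add;

SynthDef(\trackSrc, {
	var sig = PlayBuf.ar(1, \buf.kr(0), BufRateScale.kr(\buf.kr(0)), loop: 1);
	Out.ar(\out.kr(0), sig);
}).add;

SynthDef(\combine, {
	var mic = In.ar(\micIn.kr(0), 1);
	var track = In.ar(\trackIn.kr(0), 1);
	var chainTrack = FFT(LocalBuf(2048), track);
	var chainMic = FFT(LocalBuf(2048), mic);
	// Keep the track's magnitudes, take the phases from the mic
	var chain = PV_CopyPhase(chainTrack, chainMic);
	Out.ar(\out.kr(0), (IFFT(chain) * \amp.kr(0.2)) ! 2);
}).add;
)

// Run this block separately, after the SynthDefs above have been added
(
~mic = Synth(\micSrc, [\out, ~micBus], ~srcGroup);
~track = Synth(\trackSrc, [\out, ~trackBus, \buf, ~buf], ~srcGroup);
~fx = Synth(\combine, [\micIn, ~micBus, \trackIn, ~trackBus], ~fxGroup);
)
```

Because ~fxGroup sits after ~srcGroup, the combiner always runs after both sources within each control block, whatever order the synths are created in.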

Thanks a lot for your swift response @shiihs, will get cracking on with that!

A post was merged into an existing topic: FFT processing using SoundIn and an audio track

Oh, sorry about that, with two threads for basically the same question, I got confused about which post belongs where.

I’d suggest choosing one and following up only there. Splitting the conversation between two threads is probably not very effective as it dilutes attention.

hjh

Thanks for doing that. This forum UI is new to me and I’m not quite certain what I have done. I hope you’ve fixed it.

Actually I should move that post back here :pleading_face: … Since the other post refers to an attachment missing from this thread.

Bit messy…

hjh

Received an automated mail saying the .scd can’t be accepted. So have attached a screenshot of my code.

Thank You

Kirthan

Would you want me to do something? Remove something?

@richfield006 always post your code as text with code tags (paste the code, select it and then click the “</>” icon or type Ctrl+E):

a = b + c;
d = e + f;

Thank you. Updated my post!

A bit more information from a private message – pulling it public as it’s a more detailed problem spec, better to have more eyes on it than only mine.

“bins with lower energy need to be boosted” sounds like a PV_Compander with slopeBelow set to a number less than 1, and slopeAbove = 1 (i.e. the opposite of a normal compressor). If that doesn’t work, try a normal compressor (slopeBelow = 1, slopeAbove < 1) with make-up gain (with a PV_ math UGen).
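A rough sketch of the first suggestion, without any sidechain yet. PV_Compander comes from sc3-plugins; the threshold is in raw FFT magnitude units, and the values below (50 is the UGen's default threshold, 0.5 for slopeBelow) are placeholders, not tuned settings. ~buf is the buffer loaded in the first post.

```supercollider
(
x = {
	var sig = PlayBuf.ar(1, \buf.kr(0), BufRateScale.kr(\buf.kr(0)), loop: 1);
	var chain = FFT(LocalBuf(2048), sig);
	// slopeBelow < 1 raises bins below the threshold; slopeAbove = 1 leaves louder bins alone
	chain = PV_Compander(chain, \thresh.kr(50), \slopeBelow.kr(0.5), 1);
	(IFFT(chain) * 0.2) ! 2
}.play(args: [\buf, ~buf]);
)

x.set(\slopeBelow, 0.3); // smaller slope means more boost for quiet bins
x.free;
```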

The catch is the relationship between the two inputs, where I’m confused. Do you mean a typical sidechain, but spectrally? That is, boost quiet bins by an amount related to the corresponding sidechain input’s bin loudness above the threshold? (Hm, if that’s the case, then slopeBelow < 1 wouldn’t work because it would boost where the sidechain is quiet. So you might have to take the compressor → gain approach.)

Or is it compared to the total signal level from the mic?

Unfortunately PV_Compander doesn’t have a sidechain input, so it’s tricky. I’d like to have a totally clear understanding of exactly what you want to happen to one bin before putting a lot more time into it. I suspect it might be to subtract the sidechain (so that louder sidechain bins would make the source bin much quieter), then compand, but I’m not clear that I’m reading you correctly. A clear problem spec leads more quickly to a solution.

hjh

Thanks a lot for your time and detailed explanation. I guess you have understood my idea.
I’ll try to explain my idea a bit more clearly. Consider that you are listening to a track on a product with a speaker and a mic (smart speakers, phones, etc.) in a busy outdoor environment (streets, etc.). Since the external noise sometimes gets louder, the quieter parts of the track would become masked by the noise. I am proposing to boost the quieter bits (increase their energy) to overcome the noise energy, while leaving the louder parts of the track the same. It’s more like an expander. Hope this explains my idea.
So I’m taking the mic input (Input 1) as my side-chain signal. I’m interested in the FFT bins that have magnitudes above a certain threshold, so I would have to collect the bin number and magnitude from this input.
Input 2 is the audio track you’re listening to. Using the bin and magnitude info from Input 1, I’m going to increase the magnitudes of the corresponding bins of this track, triggered by the mic input.
Have attached a pic (a flow diagram):

Been thinking this over, though I didn’t have time to test anything.

Couple of ideas:

  1. With pvcalc2, you could use the magnitude of the sidechain bin to calculate a gain factor for the audio bin, using math operators. You couldn’t use “if” here, but for example a standard compressor gain curve (without knee), i.e. the amount of attenuation, not the final gain, would be something like max(0, noisemag.ampdb - threshold) * (1 - ratio) and you could apply this as a positive gain to the other signal (where a standard compressor applies this as a negative gain, more or less). For a large number of bins, this will make a huge SynthDef and it wouldn’t be CPU-cheap, but it would let you express the math as you like. (Btw if you have too many bins, there’s more time between FFT frames so it would react slower.)

  2. PV_ units would be faster, but I’m not sure you can do exactly the above with them. One idea I had was to start with the main audio magnitudes and subtract the noise magnitudes – so that louder noise bins would result in much quieter magnitudes. Then, PV_Compander with slopeBelow < 1. This would amplify the quieter bins (and since a louder noise bin produces a smaller magnitude, this would amplify more where the noise input is louder). Then add the noise magnitudes back in. The gain per bin, however, would depend on both input magnitudes instead of depending only on the noise mag, so this might not do exactly what you want.

A third approach would be to write your own PV_ unit in C++, but I guess you’d rather prototype in a SynthDef first.

I don’t promise that either of these would actually sound good.
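An untested sketch of idea 1, to show the shape of it. The threshold, ratio and boost ceiling are made-up placeholder values (raw FFT magnitudes are not normalized to 0..1, so the threshold needs tuning by ear), and only the first 64 bins are processed to keep the SynthDef size reasonable. ~buf is the buffer from the first post. Use headphones, since the mic is live.

```supercollider
(
x = {
	var size = 256;       // small FFT: fewer bins, faster reaction
	var threshDb = 0;     // placeholder threshold, in dB of raw FFT magnitude
	var ratio = 0.5;      // placeholder compressor-style slope
	var maxBoostDb = 12;  // safety ceiling on the boost
	var track = PlayBuf.ar(1, \buf.kr(0), BufRateScale.kr(\buf.kr(0)), loop: 1);
	var mic = SoundIn.ar(0);
	var chainTrack = FFT(LocalBuf(size), track);
	var chainMic = FFT(LocalBuf(size), mic);
	chainTrack = chainTrack.pvcalc2(chainMic, size, { |mags, phases, micMags, micPhases|
		// Per-bin gain derived only from the mic (sidechain) magnitude
		var boostDb = ((micMags.ampdb - threshDb).max(0) * (1 - ratio)).min(maxBoostDb);
		// Apply it as a positive gain to the track; phases are left untouched
		[mags * boostDb.dbamp, phases]
	}, frombin: 0, tobin: 63);
	(IFFT(chainTrack) * 0.2) ! 2
}.play(args: [\buf, ~buf]);
)

x.free;
```

Bins above tobin pass through unchanged, because zeroothers is left at its default of 0.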

hjh

Thanks a lot, will give those ideas a try today. As I’m currently trying hard to get a working model, I’m not much bothered about it sounding nice 😁. Since I installed the sc3-plugins, I have been trying some units that do FFT analysis, and came across FFTPeak. Is there a way I can send the freq and mag information of the mic input to a bus, then receive those numbers and use them to process the audio track?


s.boot;
(
SynthDef(\mic, {
	arg size=512; // Sent it to a different bus
	var sig, chain, freq, mag;
	sig = SoundIn.ar(0, 1);
	chain = FFT(LocalBuf(size), sig, 0.5, 1, winsize: 256);
	chain = PV_LocalMax(chain, 10); // Only send Bins above a threshold
	# freq, mag = FFTPeak.kr(chain, 20, 20000).poll; // I guess I can set the range of FFT bins I need and output the freq and mag.
    }).add;
)

Synth(\mic);


~voice = Buffer.read(s,"/Users/Kirthan/Desktop/real.wav"); // Load a MONO Audio file

(
SynthDef(\audio, {
	arg size=512;
    var sig, chain;
	sig = PlayBuf.ar(1, ~voice.bufnum, BufRateScale.ir(~voice.bufnum), loop: 1);
	chain = FFT(LocalBuf(size), sig, 0.5, 1, winsize: 256);
	chain = PV_MagBelow(chain, 0.01); //Only send FFT bins below a threshold
	//Or is there a way I can receive the Freq and Mags info from Mic play those in the audio file?
	
	
	sig = IFFT(chain) * 1 ! 2;
	Out.ar(0, sig); // a SynthDef needs an explicit Out to be heard
    }).add;
)

Synth(\audio);

s.quit;
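On the bus question itself: a control bus can carry the two numbers from one synth to another. The sketch below is only an illustration under several assumptions: the SynthDef and bus names are invented, the 6 dB peaking boost is arbitrary, and it passes only the single strongest peak that FFTPeak (sc3-plugins) reports, not per-bin data. ~voice is the mono buffer loaded above.

```supercollider
(
// Two-channel control bus: channel 0 = peak frequency, channel 1 = peak magnitude
~peakBus = Bus.control(s, 2);

SynthDef(\micPeak, {
	var chain, freq, mag;
	chain = FFT(LocalBuf(512), SoundIn.ar(0));
	#freq, mag = FFTPeak.kr(chain, 20, 20000);
	Out.kr(\peakOut.kr(0), [freq, mag]);
}).add;

SynthDef(\trackBoost, {
	var freq, mag, sig;
	// mag is read but unused here; it is available for scaling the boost
	#freq, mag = In.kr(\peakIn.kr(0), 2);
	sig = PlayBuf.ar(1, \buf.kr(0), BufRateScale.kr(\buf.kr(0)), loop: 1);
	// Boost the track around the mic's strongest frequency (smoothed and range-limited)
	sig = BPeakEQ.ar(sig, freq.clip(20, 20000).lag(0.2), 1, \boostDb.kr(6));
	Out.ar(\out.kr(0), (sig * \amp.kr(0.2)) ! 2);
}).add;
)

// Run separately, after the SynthDefs have been added
(
~peakSynth = Synth(\micPeak, [\peakOut, ~peakBus]);
~boostSynth = Synth.after(~peakSynth, \trackBoost, [\peakIn, ~peakBus, \buf, ~voice]);
)
```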

Sadly you’ve run into limitations with the FFT implementation in SuperCollider. If you want to push forward with this, you may end up having to write C++ to get what you want, and writing new UGens can be really tough for a new programmer.

But not all is lost! If you’re willing to abandon FFT entirely, you can use a combination of bandpass filters, amplitude followers, and peaking filters to implement the audio effect you’ve described. To some this may seem like a royal hack, but this kind of filter design has a long history in analog electronics and should not be discounted. An advantage you get for free is zero latency, if that matters to you, and a minor disadvantage is phase distortion.

Rough sketch of how this would look:

  • Divide the frequency spectrum into N bands, where N is a small number like 8 or 16.
  • Take the mic signal and run it through N parallel bandpass filters whose frequencies are centered on the bands
  • Run each of these bandpass filters through a slow Amplitude.ar to get the amplitude of each band
  • Map these amplitudes to dB gain factors
  • Take the input signal and run it through N series peaking filters whose frequencies are centered on the bands, and whose gain factors are taken from above

One might argue that this dramatically reduces frequency resolution compared to FFT, but I wonder if that property is strictly what you want. If the mic signal has a sine wave in it, then you’ll suddenly get a whistling resonance in your output signal because those frequencies are boosted. So with FFT you’d need frequency-domain smoothing anyway.

I can’t write code for this right now, but hopefully that will get you started. Happy to answer questions.
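For readers following along, the rough sketch above might translate into something like this. It is an untested illustration, not the solution worked out later in private messages: the band centres, threshold, boost ceiling and the SynthDef name are all assumptions. Use headphones, since the mic is live.

```supercollider
(
SynthDef(\noiseAdaptiveEq, {
	var numBands = 8;
	var freqs = Array.geom(numBands, 100, 2); // assumed centres: 100, 200, ... 12800 Hz
	var threshDb = \thresh.kr(-40);           // assumed noise threshold in dB
	var maxBoostDb = \maxBoost.kr(12);        // assumed ceiling on the boost
	var mic = SoundIn.ar(0);
	var sig = PlayBuf.ar(1, \buf.kr(0), BufRateScale.kr(\buf.kr(0)), loop: 1);
	freqs.do { |freq|
		// Mic side: isolate one band and follow its amplitude slowly
		var band = BPF.ar(mic, freq, 1);
		var amp = Amplitude.ar(band, 0.1, 1);
		// Map the band amplitude to a dB gain, zero below the threshold
		var gainDb = (amp.ampdb - threshDb).clip(0, maxBoostDb);
		// Track side: peaking filters applied in series, one per band
		sig = BPeakEQ.ar(sig, freq, 1, gainDb);
	};
	Out.ar(\out.kr(0), (sig * \amp.kr(0.2)) ! 2);
}).add;
)

// Run separately; ~buf is a mono buffer such as the one loaded in the first post
y = Synth(\noiseAdaptiveEq, [\buf, ~buf]);
y.set(\thresh, -50);
y.free;
```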

Sorry to be a pain. I have been trying to implement this for the past 4 hours but failed miserably, as I couldn’t completely understand how to do it. Could you please help me out? Thank you

I’m sorry, I have 6.5 hours of classes today, it won’t happen until tomorrow at the earliest.

Anyone else take a stab?

hjh

We are hashing out my solution in private messages right now.

Yes, many thanks to @nathan for his time and support helping me out making the model without the use of FFT. Will post the code once I get it working.
