Using FFT UGens to create a concatenative-synthesis thingy?

nammedit · March 31, 2020, 3:05pm

Hey!

I’m looking into the FFT UGens because I want to write code that analyses incoming sound and, at an interval of, for example, 1 second, finds the sample in a folder whose spectral centroid best matches that of the incoming sound.
I’m trying to use SpecCentroid.kr(), but I have no idea how to go about analysing the information, as I can’t get any numbers out of the UGen that I can work with.

(
~b1 = Buffer.alloc(s, 2048, bufnum:1);
~b2 = Buffer.read(s, "/Volumes/DATABSE M/codes/supercollider/disklavier/soundfiles/state3.wav", bufnum:3);
)

(
a = {
	var chain, sig, centroid;
	sig = PlayBuf.ar(1, bufnum:3, doneAction:2);
	chain = FFT(~b1, sig); // FFT into the 2048-frame buffer allocated above
	centroid = SpecCentroid.kr(chain);
	sig;
}.play;
)

Can anyone lead me in a good direction?

elgiano · March 31, 2020, 3:58pm

I have been also working on something like this!
SCMIR is still the resource I would use for this, especially if you want to analyze a corpus of buffers offline and then use that dataset to “reconstruct” another sound live.
IMO the pipeline would be more or less like this:
0) Analyze your buffers and store the data
1) Load the data and put it in something like a KDTree
2) Analyze a live source, send the data from server to client using SendReply/OSCFunc
3) Search for nearest neighbors in your dataset: get the buffer number and frame position
4) Create one or more synths to play the retrieved segment
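Step 0 would normally be done offline (this is what SCMIR is for). As a rough real-time stand-in, assuming the server `s` is booted and the file is mono, the sketch below plays one corpus buffer silently, sends its read position and spectral centroid to the language ten times per second, and collects `[bufnum, frame, centroid]` rows in `~dataset`. The names `~dataset`, `~corpus`, `\analyseCorpus`, `\collectAnal` and `/corpusAnal` are hypothetical; `a11wlk01.wav` is the example file that ships with SuperCollider.

```supercollider
// Hypothetical real-time stand-in for step 0 (normally offline analysis).
// Assumes the server s is booted and the corpus file is mono.
(
~dataset = List.new;   // rows of [bufnum, frame, centroid]
~fftBuf = Buffer.alloc(s, 2048);
~corpus = Buffer.read(s, Platform.resourceDir +/+ "sounds/a11wlk01.wav");

SynthDef(\analyseCorpus, { |buf, fftBuf, rate = 10|
	var phase, sig, chain;
	// read position in frames, advancing at the buffer's own sample rate
	phase = Phasor.ar(0, BufRateScale.kr(buf), 0, BufFrames.kr(buf));
	sig = BufRd.ar(1, buf, phase, loop: 0);
	chain = FFT(fftBuf, sig);
	// no Out: the buffer is analysed silently
	SendReply.kr(Impulse.kr(rate), '/corpusAnal',
		[buf, A2K.kr(phase), SpecCentroid.kr(chain)]);
	// free the synth after one pass through the buffer
	Line.kr(0, 1, BufDur.kr(buf), doneAction: 2);
}).add;

OSCdef(\collectAnal, { |msg|
	// msg is [path, nodeID, replyID, bufnum, frame, centroid]
	~dataset.add([msg[3].asInteger, msg[4].asInteger, msg[5]]);
}, '/corpusAnal');
)

// once the file has loaded:
Synth(\analyseCorpus, [\buf, ~corpus, \fftBuf, ~fftBuf]);
```

For a whole folder you would run one such synth per buffer; each pass takes as long as the file itself, which is the main argument for doing this step offline instead.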

It looks like you’re asking about point 2. Would this help you move one step forward?

(
a = { |pollRate = 10|
	var chain, sig, centroid;
	sig = PlayBuf.ar(1, bufnum: 3, doneAction: 2);
	chain = FFT(~b1, sig);
	centroid = SpecCentroid.kr(chain);
	// send centroid to the language pollRate times per second
	SendReply.kr(Impulse.kr(pollRate), "/anal", centroid);
	sig;
}.play;
)

(
// this function will receive centroids from the server
OSCdef(\recvAnal,
	{ |msg|
		// SendReply messages look like [path, nodeID, replyID, value]
		var centroid = msg[3];
		// do something with your received centroid
		// ....
	},
	path: "/anal",
	// only accept replies sent by synth a (msg[1] is its node ID)
	argTemplate: [a.nodeID]
);
)
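The original question asks for a match about once per second, while the synth above reports ten times per second. One possible way to fill in the "do something" part, sketched here under the assumption that synth `a` from the block above is still running, is to collect the incoming values and average them once per second on the language side. `~centroids` and `~averager` are hypothetical names.

```supercollider
// Hypothetical fill-in for the "do something" part of \recvAnal:
// collect centroids as they arrive and average them once per second.
// Assumes synth a from the block above is still running.
(
~centroids = List.new;

OSCdef(\recvAnal, { |msg|
	// msg is [path, nodeID, replyID, centroid]
	~centroids.add(msg[3]);
}, "/anal", argTemplate: [a.nodeID]);

~averager = Routine {
	loop {
		1.wait;   // the 1-second interval from the original question
		if(~centroids.notEmpty) {
			var mean = ~centroids.sum / ~centroids.size;
			"mean centroid over the last second: % Hz".format(mean.round(0.1)).postln;
			// here you would search your dataset for the closest match
			~centroids.clear;
		};
	}
}.play(SystemClock);
)

// stop averaging with: ~averager.stop;
```

Redefining an OSCdef with the same key replaces the earlier one, so this can be evaluated on top of the block above.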

nammedit · March 31, 2020, 4:42pm

Dude, I can’t thank you enough. I’ve been low-key looking for a way to get data from the server back to the language for ages, thank you so much!!

Question: what is a KDTree? Is it part of the SCMIR library? (which I haven’t downloaded yet)

elgiano · March 31, 2020, 7:58pm

KDTree is a Quark of its own. It gives you a data structure that knows how to find nearest neighbors in an n-dimensional space (nice if you plan to use more than one feature).

Quarks.install("KDTree")
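To see what the tree is speeding up, here is a brute-force nearest-neighbor search in plain sclang, a sketch that does not use the KDTree quark at all (check the quark's help file for its own constructor and search methods). It compares a query against every row, so it is fine for small datasets but scales linearly. `~nearest` and `~rows` are hypothetical, and the feature values are made up for illustration.

```supercollider
// Brute-force nearest neighbor over rows of [feature1, feature2, ..., label].
// Features on very different scales (Hz vs amplitude) should be normalized
// first, otherwise the largest one dominates the distance.
(
~nearest = { |rows, query|
	var best, bestDist = inf;
	rows.do { |row|
		var features, dist;
		features = row.drop(-1);                       // everything except the label
		dist = (features - query).squared.sum.sqrt;    // Euclidean distance
		if(dist < bestDist) { bestDist = dist; best = row };
	};
	[best, bestDist]
};

// hypothetical rows: [centroid (Hz), amplitude, label]
~rows = [
	[500, 0.2, \kick],
	[2500, 0.5, \snare],
	[6000, 0.1, \hat]
];

~nearest.(~rows, [2300, 0.4]).postln;   // nearest row is the \snare one
)
```

A KDTree answers the same question without visiting every row, which is what makes it worth installing once the corpus holds thousands of analysis frames.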

I would really like to hear what would be other people’s approaches to this task!

jordanjuras · April 1, 2020, 9:30am

@elgiano I can second this approach, and would especially emphasize the need for a KDTree for real-time searching.

In my experience with concatenative synthesis, adding more features will make this process more interesting. The spectral centroid (SpecCentroid) can be misleading at times if the intention is to match pitch. You could also try time-domain periodicity measures, which are computationally cheaper than spectral analysis requiring an FFT. I have found that the envelope also has a huge influence on ‘aesthetic’ matching.
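As a sketch of sending several features at once, assuming a live input on the first hardware input and a booted server, the synth below sends the spectral centroid, an autocorrelation pitch estimate (`Pitch.kr`), a crude zero-crossing frequency estimate (`ZeroCrossing.ar`) and an amplitude envelope (`Amplitude.kr`) in a single SendReply message. Which of these helps matching is a question for your own material; `~features`, `\recvFeatures` and `/features` are hypothetical names.

```supercollider
// Hypothetical multi-feature analysis of a live input, sent in one message.
// Assumes the server s is booted and input 0 carries the live source.
(
~features = { |pollRate = 10|
	var sig, chain, centroid, freq, hasFreq, zc, amp;
	sig = SoundIn.ar(0);
	chain = FFT(LocalBuf(2048), sig);
	centroid = SpecCentroid.kr(chain);
	# freq, hasFreq = Pitch.kr(sig);       // autocorrelation pitch follower
	zc = A2K.kr(ZeroCrossing.ar(sig));     // crude time-domain frequency estimate
	amp = Amplitude.kr(sig, 0.01, 0.1);    // envelope follower
	SendReply.kr(Impulse.kr(pollRate), '/features',
		[centroid, freq, hasFreq, zc, amp]);
	Silent.ar;   // output nothing, to avoid feeding the input back out
}.play;

OSCdef(\recvFeatures, { |msg|
	// msg is [path, nodeID, replyID, centroid, freq, hasFreq, zc, amp]
	var centroid, freq, hasFreq, zc, amp;
	# centroid, freq, hasFreq, zc, amp = msg[3..7];
	[centroid, freq, hasFreq, zc, amp].round(0.01).postln;
}, '/features');
)
```

Each message then gives one multi-dimensional point, which is exactly the kind of query a KDTree (or any nearest-neighbor search) expects.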

The preprocessing analysis step will take appreciable time if there are a lot of samples in the directory, but searching for nearest neighbors in the KDTree will be fast. The one real-time latency concern I would flag is returning a file path or file index from the KDTree search, then opening the file and streaming the audio. If you are using short audio samples, you could instead load the audio files into buffers in advance.
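A minimal sketch of the buffering option, assuming mono `.wav` files in one folder: every file is loaded into a Buffer up front, so a search result only needs to name a buffer and a start frame, and a small SynthDef plays that segment with short fades. The folder path is a placeholder, and `~corpusBufs` and `\segment` are hypothetical names.

```supercollider
// Hypothetical sketch: preload every .wav in a folder so nothing is opened
// from disk at search time. Assumes mono files and a booted server s.
(
~dir = "/path/to/samples";   // placeholder, set this to your own folder
~corpusBufs = PathName(~dir).files
	.select { |p| p.extension.toLower == "wav" }
	.collect { |p| Buffer.read(s, p.fullPath) };

SynthDef(\segment, { |out = 0, buf, start = 0, dur = 0.5, amp = 0.5|
	var sig, env;
	sig = PlayBuf.ar(1, buf, BufRateScale.kr(buf), startPos: start);
	// short fades avoid clicks; doneAction frees the synth at the end
	env = EnvGen.kr(Env.linen(0.01, dur - 0.02, 0.01), doneAction: 2);
	Out.ar(out, (sig * env * amp) ! 2);
}).add;
)

// e.g. play half a second of the first file, starting 1 second in:
Synth(\segment, [\buf, ~corpusBufs[0], \start, ~corpusBufs[0].sampleRate, \dur, 0.5]);
```

The trade-off is memory: this works well for short samples, while long files may still be better streamed (for example with DiskIn).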

Also worth considering is how the file lengths compare to the analysis window.
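To make that comparison concrete: with a 2048-point FFT and FFT's default hop of 0.5, a new analysis frame arrives every 1024 samples, and at 44.1 kHz one window lasts 2048 / 44100 ≈ 46.4 ms. The hypothetical helper below counts how many full analysis windows fit in a file of a given length and returns 0 for files shorter than one window, whose features will be unreliable.

```supercollider
// Hypothetical helper: number of full analysis windows in numFrames samples.
(
~windowSize = 2048;   // FFT size, as in the code above
~hop = 0.5;           // FFT's default hop, as a fraction of the window
~analysisFrames = { |numFrames|
	var hopFrames = ~windowSize * ~hop;
	if(numFrames < ~windowSize) {
		0   // shorter than one window
	} {
		((numFrames - ~windowSize) / hopFrames).floor.asInteger + 1
	}
};
)

~analysisFrames.(44100);   // one second at 44.1 kHz
// for a loaded buffer: ~analysisFrames.(someBuffer.numFrames)
```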

fmiramar · October 26, 2020, 12:44am

@elgiano do you have some basic code covering the whole process that you could share?

elgiano · October 26, 2020, 4:29pm

I have to look for it and tidy it up, but I can do it over the weekend! Please remind me if I forget.

cbe · May 10, 2021, 2:07pm

Super interesting! What about other indices/descriptors for this task? I’m still at a beginner level, so excuse my ignorance, but is the general way to bring such descriptors/indices into SuperCollider to use a library or method that someone has already created?
Take for example the bass ratio (proAV / data and information, lists, tables and links): in essence it’s a super simple analysis plus a bit of math. If there isn’t already an implementation, could this be realized in SuperCollider in a reasonable amount of time (and with the right knowledge, of course)?
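Note that in room acoustics the bass ratio is defined from reverberation times, (RT at 125 Hz + RT at 250 Hz) / (RT at 500 Hz + RT at 1 kHz), so it comes from a room measurement rather than from a running signal descriptor. The sketch below only illustrates the general pattern the question asks about, building a custom descriptor from existing UGens plus a little math: a hypothetical low-to-mid level ratio over the same octave bands. It is not the acoustic bass ratio, and `~bandRatio`, `\recvBandRatio` and `/bandRatio` are made-up names.

```supercollider
// Hypothetical custom descriptor: summed level of the 125 + 250 Hz octave
// bands divided by that of the 500 + 1000 Hz bands, on a live input.
// This is a signal-level analogue, NOT the reverberation-time bass ratio.
(
~bandRatio = { |pollRate = 10|
	var sig, band, low, mid, ratio;
	sig = SoundIn.ar(0);
	// octave-wide band-pass: rq = 1/Q, and Q of about 1.41 spans one octave
	band = { |freq| Amplitude.kr(BPF.ar(sig, freq, 0.707)) };
	low = band.(125) + band.(250);
	mid = band.(500) + band.(1000);
	ratio = low / max(mid, 1e-6);   // avoid dividing by zero in silence
	SendReply.kr(Impulse.kr(pollRate), '/bandRatio', ratio);
	Silent.ar;
}.play;

OSCdef(\recvBandRatio, { |msg|
	// msg is [path, nodeID, replyID, ratio]
	("band ratio: " ++ msg[3].round(0.01)).postln;
}, '/bandRatio');
)
```

Most simple descriptors follow this shape: filter or transform the signal with existing UGens, combine the results arithmetically, and send the value to the language with SendReply.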

Sam_Pluta · May 10, 2021, 2:22pm

Just released:

https://www.flucoma.org/download/

Lots of descriptors. Very well organized, and it works the same way in Max and Pd.