Ways to find periodicity in signals

Hi,

I need to find periodicity in sound coming from a microphone: I should be able to distinguish between periodic/pseudo-periodic sounds and all the rest, and do this analysis in real time, if at all possible. The idea is to check whether, in the whole incoming sound (a real outdoor soundscape), I'm acquiring something which is periodic, even pseudo-periodic, or periodic/pseudo-periodic just for a while. That way I can identify it and elaborate it further, differently from the noisier material.

Every answer is welcome!

Thank you

Leo

Hi!

It seems that you are looking for something like the Zero Crossing Rate (the ZeroCrossing UGen). When the rate is high, you tend to have noisy signals; when it is low and stable, you tend to have signals with a stable pitch. You can play with the threshold or add more statistical features to get some heuristics regarding your specific signals.
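For example, a minimal sketch of that heuristic (the lag time and the roughly one-semitone tolerance are arbitrary values to tune):

(
{
    var sig, zcr, smooth, stable;
    sig = SoundIn.ar(0);
    // ZeroCrossing outputs a frequency estimate from the zero-crossing rate
    zcr = ZeroCrossing.ar(sig);
    smooth = Lag.ar(zcr, 0.1);
    // 1 when the raw estimate stays close to its smoothed version, 0 otherwise
    stable = (zcr - smooth).abs < (smooth * 0.06);
    stable.poll(4, label: \stable);
    Silent.ar;
}.play;
)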

(As a suggestion for organizing the forum: I think that your question would be more appropriately categorized under “Questions” rather than “Music”. :grin:)

Do you just mean pitch (in which case you can use the second return value of Pitch.kr) or larger-scale repetition on the scale of seconds? The latter could be done with the autocorrelation on a set of suitable machine listening descriptors (e.g. MFCCs), but I doubt there’s a way to do that in SC without writing a custom UGen in C++.
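For the pitch case, a minimal sketch of reading that second return value:

(
{
    var sig, freq, hasFreq;
    sig = SoundIn.ar(0);
    // Pitch.kr returns [freq, hasFreq]; hasFreq is 1 when a fundamental is found
    # freq, hasFreq = Pitch.kr(sig);
    hasFreq.poll(4, label: \hasFreq);
    Silent.ar;
}.play;
)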

Hi, and thank you for answering.
I think I'm not getting the point: actually, the ZeroCrossing UGen doesn't have any threshold argument, and it actually reports the pitch of the analyzed sound if you poll the result. I mean, if I didn't get it wrong, it is basically a pitch tracker, just based on zero crossings. And even when mapping the result, one doesn't get any real measure of periodicity in a signal, but is simply mapping a signal. Am I wrong?
What do you suggest in order to use this UGen to see periodicity? If I analyze a SinOsc with a static frequency, I get its pitch, not, for example, a “1”, where 1 could mean totally periodic and 0 totally noisy, or the like…

Thank you! And yes, next time I'll post this kind of question in the appropriate section.

Leo

Hi and thank you for answering
I mean the second option you mention. Basically I have a mic acquiring an outdoor soundscape, and I'd like to see when, among other unpitched/less-pitched sounds and noisy material in general, I can isolate periodic sound events, like the sounds of musical instruments (they could easily be present in that soundscape) or any material which is periodic or pseudo-periodic for a while. Maybe a possible solution is to divide the spectrum into frequency bands and analyze them simply using Pitch or Tartini, and find a way so that when the reported pitch stays stable for a while, the sound is considered periodic (as it normally is, in fact!).

I'd rule out writing a custom UGen in C++, because I don't know how to program in that language.

Thanks!

Leo

Why would the Zero Crossing Rate be high on noisy signals? Wouldn't a high zero crossing rate just mean a high-pitched sound?

Isn't white noise the random occurrence of energy across the entire spectrum? And isn't finding the periodicity in audio an FFT?

If you are trying to measure when a musical instrument is playing in a field recording, assuming the instruments are close-ish to the mic, then you are probably better off using amplitude as your metric.
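For instance, something like this (the threshold is an arbitrary placeholder to tune by ear):

(
{
    var sig, amp, active;
    sig = SoundIn.ar(0);
    // amplitude follower with a slow release to bridge short gaps
    amp = Amplitude.kr(sig, attackTime: 0.01, releaseTime: 0.5);
    active = amp > 0.05;   // linear amplitude threshold
    active.poll(4, label: \loudEnough);
    Silent.ar;
}.play;
)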

There are many issues here which I don't think you have thought about.
Is the background sound stable? If so, you could subtract its FFT spectrum, cancelling it out. Or is it full of loud, irregular sounds?
What about unpitched percussion?
What do you mean by “other noisy material”? Most sounds you find in a field recording are only noisy because they are far away.

If you want to measure when a pitched instrument is playing, then using pitch tracking is a decent approach.

Another option is to look at MedianSeparation, which separates ‘harmonic’ content from ‘percussive’, but both of those terms are technically defined and might not meet your requirements.

Tartini’s second output may be fairly close to this, though I can’t promise it exactly matches your requirement.

"hasFreq
“confidence in the estimate- 0 = no fundamental found, 1= fully confident, then also values inbetween- will be above 0.9 if really finding a strong periodicity”

hjh


Hi,
I didn't explain myself well enough. My situation is the following: I have (let's stay mono for the sake of clarity) one cardioid condenser mic pointing into an outdoor environment, which has some surfaces, walls, columns… In this place there could be some musical instruments playing, as well as people talking, walking, etc., much or little human activity, it depends. By noisy material I mean sonic material with no defined pitch: something not definable as periodic, something which is background noise, something spectrally disordered, let's say.
And my need would be to isolate the periodicity in that mass of sound. I don't need precision; instead, I'd need my system to be able to recognize when there's some pitched, and therefore periodic or pseudo-periodic, sound present.
As you suggest, one good option would be to run an FFT continuously and continuously subtract the noise from the needed part. By the way, I want to preserve the noisy part too; I only need to have the two separated during analysis.
Maybe pitch tracking is enough if done after the FFT separation.



Thank you, and yes, maybe that would be a solution, and maybe the easiest.

ZCR can be a pitch tracker, but a really bad one. This is because our pitch perception is more related to mechanisms like autocorrelation than to ZCR. ZCR is a relatively “good” and cheap measure of the pitchiness/noisiness of a signal.

1 - Because noisy signals consist of tons of frequencies happening simultaneously, so you get lots of ZCs.
2 - A very high-pitched sound can be, for instance, a single sinusoid at 5 kHz (this is almost out of the range of pitch, because above ~5 kHz humans do not have a clear perception of pitch, tuning, harmony, etc.). If you compare this with white noise, the number of ZCs in the white noise would still be way higher, because it contains frequencies between 5 and 20 kHz. High pitch is different from high-frequency content.
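You can check this directly, e.g.:

(
{
    // a pitched tone near the top of the pitch range vs. white noise:
    // the noise's zero-crossing estimate is much higher and far less stable
    ZeroCrossing.ar(SinOsc.ar(5000)).poll(2, label: \sineZCR);
    ZeroCrossing.ar(WhiteNoise.ar).poll(2, label: \noiseZCR);
    Silent.ar;
}.play;
)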

ZCR can be a pitch tracker, but a really bad one. This is because our pitch perception is more related to mechanisms like autocorrelation than to ZCR. ZCR is a relatively “good” and cheap measure of the pitchiness/noisiness of a signal.

I agree, but I was referring to the UGen. As I said, it seems it doesn't provide a measure with the two poles
fully periodic ------------ fully noisy
but rather tracks the pitch (badly, in comparison to Pitch.kr or Tartini.kr). And yes, as a consequence, if the sound is periodic it should track more or less the same frequency, and vice versa with a PinkNoise, for example. So yes, if the frequency stays the same, the sound is periodic; otherwise it is not. As suggested, this would be enough to check for periodicity…
I must say I'm probably confused about the difference between extracting the periodic part from the entire soundscape vs. just checking for periodicity on the whole soundscape itself… I must reflect on that.

Right, it’s still not clear to me whether you’re talking about pitchiness/noisiness (which is a nontrivial but mostly solved problem in DSP for monophonic signals) or something else.

My recommendation when working with machine listening descriptors is to record a small dataset that’s representative of the real-world situation, and do some rough manual classification of the sounds to get an idea of what you want. To build a robust analysis system, it’s important to have good data to validate that the algorithms are working as intended. If you’re comfortable sharing a few audio snippets and your ideas of how they should be classified, I can help guide you towards finding the right algorithms as well.

There are many algorithms for pitch/periodicity detection that don’t explicitly use FFT, including autocorrelation, AMDF, and crude peak picking.
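As a non-real-time illustration of the autocorrelation idea (the function name and lag limits here are arbitrary), a language-side sketch:

(
~estimatePeriod = { |samples, minLag = 20, maxLag = 800|
    var bestLag = minLag, bestSum = -inf;
    // pick the lag whose autocorrelation is highest
    (minLag..maxLag).do { |lag|
        var sum = 0;
        (samples.size - lag).do { |i|
            sum = sum + (samples[i] * samples[i + lag]);
        };
        if(sum > bestSum) { bestSum = sum; bestLag = lag };
    };
    bestLag    // period in samples; sampleRate / bestLag gives Hz
};

// e.g. a 441 Hz sine at 44.1 kHz should yield a period of ~100 samples
~sig = Array.fill(2000, { |i| sin(2pi * 441 * i / 44100) });
~estimatePeriod.(~sig).postln;
)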

Yes, I think the best thing is to share some audio; here it is.

In the one I'm sharing now, if you do the following:

(
~b = Buffer.read(s, …);
)
(
{
    var freq, hasFreq, sig = PlayBuf.ar(1, ~b, loop: 1);
    # freq, hasFreq = Tartini.kr(sig, n: 2048, k: 1024, overlap: 1024);
    freq.poll(label: \freq);
    hasFreq.poll(label: \hasFreq);
    Out.ar(0, sig ! 2);
}.play;
)

you see that, since the sound of the trombone is higher in amplitude than the rest, Tartini is able to pitch-track it, and in fact the hasFreq output tends towards 1 (full confidence), while Pitch.kr seems to say only “yes, I found the fundamental: 1” or “no, I don't find any fundamental: 0”, with no values in between.

My need would be to track the whole soundscape in real time and, when I find some periodic/pseudo-periodic/almost-periodic content, save it and give it a dedicated elaboration, because the situation is actually that I'm building a sonic ecosystem, using the soundscape as the primary sound source. But the more I ask for help, the more I realize, as I already said, that there's something not working in the logic of this pitch-extraction process. In this case it is quite easy, as the trombone eats the rest of the scape, but what if it weren't that loud…?
I should be prepared to recognize the periodicity (pseudo/almost…) in case it happens. For example, if I run some FFT-based UGens to see how the energy is spectrally distributed, I could know if, when, and where there's more energy, and do my thing with that information. The point is exactly what you said: to find some descriptors that fit the entire soundscape and differentiate them, in order to apply further dedicated processing. So I guess one approach would simply be: be prepared to recognize periodic sounds if they appear, and they could appear, to the ears of the computer, when some conditions are present (they are simply louder, the mic's position…). Another approach could be: continuously analyze the spectrum, trying to clean it of anything that is not periodic (of as much of the noise as possible, building what we could call a noise mask), so that when a periodic sound appears it can be isolated through this analysis, the computer sees it, and I can elaborate it. I think we are talking about this second possibility here, as I'm already able to do the first one and refine it.
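For the first approach, something like this is what I have in mind (the 0.5 s lag and the 0.8 threshold are arbitrary values to tune):

(
{
    var sig, freq, hasFreq, sustained;
    sig = SoundIn.ar(0);
    # freq, hasFreq = Tartini.kr(sig);
    // a lagged hasFreq only exceeds the threshold once the confidence
    // has stayed high for a while
    sustained = Lag.kr(hasFreq, 0.5) > 0.8;
    // fires each time "sustained" goes from 0 to 1
    SendReply.kr(sustained, '/periodicEvent', freq);
    Silent.ar;
}.play;

OSCdef(\periodic, { |msg|
    ("periodic event around " ++ msg[3] ++ " Hz").postln;
}, '/periodicEvent');
)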
How do you see it?
Thanks for your patience

Leo

A few questions here.

The audio sample also has people talking. Are you hoping to detect the trombone but not the voices?

Do these elaborations sample and transform the input audio, or simply respond with their own material (e.g. George Lewis’ Voyager system)? Is the noise mask purely to enable more accurate analysis, or are you actually using the denoised signal in audio playback? If you want a denoised/separated signal clean enough to play back, you’re looking at a seriously difficult, research-level problem.

No,
I don't hope to exclude voices, and by the way, using those recordings is just a way to simulate the real-world situation I'll encounter when doing this live. The analysis should be done only in order to know when something periodic happens in the whole soundscape I'm acquiring, and to send that part to a dedicated elaboration, which will be part of the output. So no, the sound cleaned of noise won't be played back as it is… and the elaboration can carry delays, no problem there.
So the noise mask should exist in order to clean the sound of noise as much as possible, with the only purpose of eliminating what is for sure non-periodic and thus restricting the field of analysis.
I now know that unmixing noise from periodic material is mission impossible. I thought it could be done using FFT procedures such as pvcalc, pvcalc2, or pvcollect, or some other approach involving FFT and spectral-data manipulation.
I'd simply rely on the hasFreq output provided by Tartini.kr (taking only the high-confidence part, above 0.8, let's say…), run on the cleanest spectrum I can obtain. To clean it a bit there's this UGen, PV_Residual, which actually keeps the noisy part of a signal, and the approach suggested to me here in the forum, MedianSeparation, which as a consequence will give me the harmonic part, again deprived of the noisier events.
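As a crude alternative using only the standard PV_MagAbove (not real noise separation; the threshold value is arbitrary), one could strip low-level broadband bins before the analysis, something like:

(
{
    var sig, chain, cleaned, freq, hasFreq;
    sig = SoundIn.ar(0);
    chain = FFT(LocalBuf(2048), sig);
    chain = PV_MagAbove(chain, 10);   // zero all bins below this magnitude
    cleaned = IFFT(chain);
    # freq, hasFreq = Tartini.kr(cleaned);
    hasFreq.poll(4, label: \hasFreq);
    Silent.ar;
}.play;
)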

Let's see… another way is to use a denoiser plug-in inside SC and run the analysis on that cleaned sound…

While a bunch of the suggestions here might work in specific circumstances, dealing with these complex real-world situations and expecting a computer to listen and identify with the accuracy of a human is ultimately untenable (except maybe with some large AI project, but even then…).

Although this forum is primarily technical in nature, it might be worth questioning your aesthetics here. That is, it might be easier, more practical, and more interesting to consider the computer not as a tool, but as a performer in its own right; as such, ‘machine listening’ algorithms become akin to sensory organs rather than scientific instruments of precision. In other words, whatever method you use to measure sound becomes the computer's ears and defines how it perceives and exists in the world: by choosing algorithms you are building a body. Therefore, to be empathetic to this new body you have made, you would need to embrace and accept the differences between how you hear and how it hears, as opposed to labelling these as unwanted errors. Choosing an algorithm then becomes more like choosing whom you would like to collaborate with: do you like/respect the algorithm's world view?

That is really just a suggestion, but unless dealing with such technical nuances is important to the way you work, or want to work, it might be worth trying to reframe the issue to avoid such difficulties.
