Using samples with different rates: 48 kHz vs 44.1 kHz

I have a few sets of samples I need to use, but one set is 44.1 kHz while the other is 48 kHz. I’ve read that you need to convert one or the other to match your server’s sample rate so it plays back at the intended pitch.

Is there a built-in function, or does anyone have the math to properly convert 48 kHz down to 44.1 kHz (or vice versa), so I can do this without tweaking the rate by ear to match the original pitch? Sorry if this is a stupid question.

https://doc.sccode.org/Classes/BufRateScale.html

2 Likes

Thanks brother, much appreciated.

Using SoX for batch processing, for instance:

# resample every .wav in the current directory to 44.1 kHz
for file in *.wav; do sox "$file" "44100Hz_$file" rate 44100; done
2 Likes

Resampling quality can vary a lot between applications; there are different ways to do it. I believe SoX is among the good ones.

Also, this one is worth taking a look at.

There is a “mathematical” way (by observation, since the ideal method, windowed-sinc interpolation, is costly, with thousands of operations per sample) to check whether an algorithm is good enough. We know the sweet spot. If you’re curious, there are ‘proofs’ and analyses of how engineers figured this out for practical use:

4 Likes

Thanks for the help guys. I shamefully got distracted playing too much Call of Duty, and now I’m back on this project. I’ve noticed some strange behavior: I changed the server’s sample rate from the default 44.1 kHz to 48 kHz (using s.options.sampleRate), but when I then read the same test.wav file (one recorded at 44.1 kHz) using Buffer.read(s, "test.wav"), it sounds the same as when the server runs at the default 44.1 kHz. I did not have to convert anything using BufRateScale or SoX batch processing. Is there some kind of auto conversion going on? I rebooted the server after changing the options and verified with s.sampleRate:

Booting server ‘localhost’ on address 127.0.0.1:57110.

Device options:

  • MME : Microsoft Sound Mapper - Input (device #0 with 2 ins 0 outs)
  • MME : Microphone (Realtek(R) Audio) (device #1 with 2 ins 0 outs)
  • MME : Microsoft Sound Mapper - Output (device #2 with 0 ins 2 outs)
  • MME : Headphones (Realtek(R) Audio) (device #3 with 0 ins 2 outs)
  • MME : Speakers (Realtek(R) Audio) (device #4 with 0 ins 2 outs)
  • Windows DirectSound : Primary Sound Capture Driver (device #5 with 2 ins 0 outs)
  • Windows DirectSound : Microphone (Realtek(R) Audio) (device #6 with 2 ins 0 outs)
  • Windows DirectSound : Primary Sound Driver (device #7 with 0 ins 2 outs)
  • Windows DirectSound : Headphones (Realtek(R) Audio) (device #8 with 0 ins 2 outs)
  • Windows DirectSound : Speakers (Realtek(R) Audio) (device #9 with 0 ins 2 outs)
  • ASIO : Realtek ASIO (device #10 with 2 ins 2 outs)
  • Windows WASAPI : Speakers (Realtek(R) Audio) (device #11 with 0 ins 2 outs)
  • Windows WASAPI : Headphones (Realtek(R) Audio) (device #12 with 0 ins 2 outs)
  • Windows WASAPI : Microphone (Realtek(R) Audio) (device #13 with 2 ins 0 outs)
  • Windows WDM-KS : Speakers (HD Audio Speaker) (device #14 with 0 ins 2 outs)
  • Windows WDM-KS : Microphone (HD Audio Microphone) (device #15 with 2 ins 0 outs)
  • Windows WDM-KS : Headphones (HD Audio Headphone) (device #16 with 0 ins 2 outs)

Requested devices:
In:
  • (default)
Out:
  • (default)

Selecting default system input/output devices

Booting with:
In: MME : Microphone (Realtek(R) Audio)
Out: MME : Headphones (Realtek(R) Audio)
Sample rate: 48000.000
Latency (in/out): 0.013 / 0.093 sec
SC_AudioDriver: sample rate = 48000.000000, driver’s block size = 64
SuperCollider 3 server ready.
Requested notification messages from server ‘localhost’
localhost: server process’s maxLogins (1) matches with my options.
localhost: keeping clientID (0) as confirmed by server process.
Shared memory server interface initialized

@danknugz how are you playing the buffer?
Buffer.read will always read the same buffer data. It’s the playback that needs to be adjusted when trying to use a buffer with a different sample rate.

BTW, Buffer -play does use BufRateScale, so when you play a buffer with this method it will always play at the correct pitch, with rate scaling applied as needed.

I personally wouldn’t bother converting the samples; just use BufRateScale if you are playing the buffer using PlayBuf.
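
For example, a minimal sketch (the synth name and buffer variable here are just placeholders):

(
SynthDef(\bufplayWithRateScale, {
    arg buf = 0, rate = 1, amp = 1;
    var sig;
    // BufRateScale.kr(buf) = buffer's sample rate / server's sample rate,
    // so the buffer keeps its original pitch at any server rate
    sig = PlayBuf.ar(2, buf, rate * BufRateScale.kr(buf), doneAction: 2);
    Out.ar(0, sig * amp);
}).add;
)
Synth(\bufplayWithRateScale, [\buf, ~someBuffer.bufnum]);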

1 Like

Thanks, I was just using ~test.play; and that explains why. When I use the following, it then sounds pitched up, which makes sense.

(
SynthDef.new(\bufplayWithoutRateScale, {
    arg buf = 0, rate = 1, amp = 1;
    var sig;
    sig = PlayBuf.ar(2, buf, rate, doneAction: 2);
    sig = sig * amp;
    Out.ar(0, sig);
}).add;
)
Synth.new(\bufplayWithoutRateScale, [\buf, ~test.bufnum, \rate, 1]);

Would you know the implications of using BufRateScale all the time? Is the realtime cost enough to affect performance, i.e. is it worth re-recording the 44.1 kHz samples at 48 kHz so everything is native, with no rate scaling going on? I know there are certain things, like aliasing, that are reduced when using 48 kHz vs 44.1 kHz in some situations, so that’s why I’m trying to make everything native 48 kHz if it makes a difference.

I use BufRateScale all the time in RT. I don’t think it takes much CPU, but I haven’t tested that.

Most of the time when I don’t use it, it’s because I forgot, and then I realize later on that something plays at the wrong pitch…

I would only bother re-recording (if that’s even an option?) if you can actually hear any artifacts of resampling when using BufRateScale.

1 Like

A sampled signal is frequency band limited: there is an upper limit to the frequencies that can be represented at a given sampling rate. This is the Nyquist frequency = samplerate / 2. A 44.1 kHz file can represent frequencies up to 22,050 Hz. A 48 kHz file can represent frequencies up to 24,000 Hz.

Aliasing is what happens when you have a signal containing frequencies greater than half the sampling rate, and you try to sample that signal at this sampling rate.
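
You can hear aliasing directly in SC with a quick sketch (this assumes a 48 kHz server):

// A 40 kHz sine is above the 24 kHz Nyquist frequency of a 48 kHz server,
// so it folds back: you hear 48000 - 40000 = 8000 Hz, not 40 kHz.
{ SinOsc.ar(40000, 0, 0.1) }.play;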

If you’re resampling a 44.1 kHz file at 48 kHz, then the original signal can contain no energy above 22.05 kHz, and the target sampling rate is fine up to 24 kHz. So it’s completely impossible for the original file to contain any frequencies that will alias at the target sampling rate – hence, in the case you’re describing, there is no possibility of any aliasing in the resampled file whatsoever.

If you are downsampling, there’s a possibility of aliasing. Going 48 kHz → 44.1 kHz, aliasing will be limited to the very top frequencies (content above the new Nyquist folds down to 44100 − 24000 = 20,100 Hz and above), so you probably won’t notice. Going 96 kHz → 44.1 kHz, aliasing could be noticeable. In that case, it would be better to use SoX, which (should/will) filter out the too-high frequencies first.

But the case you’re describing is up-sampling.

When you PlayBuf with rate scaling, it’s using cubic interpolation to get the values between the samples. This is an approximation – there will be some error compared to the true continuous signal represented by the samples. “Error” = noise. But the noise level should be at least a hundred dB (just guessing) quieter than the signal in typical cases – which is likely quieter than the noise introduced by the DAC, the amplifier and the speakers.
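
If you want to check this yourself, here’s a rough sketch (assuming a 48 kHz server; the names and values are arbitrary): loop a clean 441 Hz sine and play it back rate-scaled, then look at the spectrum; anything besides the single peak is interpolation error.

(
// one second of a 441 Hz sine: 441 whole cycles, so it loops cleanly
~sine = Signal.fill(48000, { |i| sin(2pi * 441 * i / 48000) });
~testBuf = Buffer.loadCollection(s, ~sine, 1);
)
// play at the 44.1k -> 48k ratio; the tone lands at about 405 Hz
{ PlayBuf.ar(1, ~testBuf, 44100/48000, loop: 1) * 0.2 }.play;
s.freqscope; // the noise floor should sit far below the peak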

Which is a long way of saying… don’t worry about it. (If you’re really worried about it, you can go ahead and resample them and then sleep well. But, upsampling by a factor of 9%… just BufRateScale it and get on with your project.)

hjh

1 Like

Thanks for the detailed description of aliasing. I read somewhere that aliasing can also come up with certain audio processing, like timestretching, or possibly other effects, depending on how the signal is manipulated. That’s really what I was concerned about, more than the raw sample itself. Does that even make sense? Could the difference between the original sample’s rate and the server’s sample rate potentially cause aliasing when that sample’s audio is then processed further, by sending it through different buses/effects? It sounds like you shouldn’t have to worry about upsampling if the noise level is insignificant, but what if that upsampled signal is then processed further?

Again: If you are starting with 44.1 kHz files and playing them with rate scaling at 48 kHz, there is no possible way that this could introduce aliasing because the frequency band of the 48 kHz output fully covers the frequency band of the 44.1 kHz input.

If the amount of aliasing in the rate-scaled signal is 0, and the amount of aliasing in a non-rate-scaled rate-matched file would also be 0, then any further processing would in both cases be operating on a signal without aliasing. Therefore, with respect to aliasing, there could not be any difference.

Higher-rate input downsampled to a lower rate would introduce aliasing, but this would be noticeable before applying any other effects.

Effects may introduce aliasing on their own, but that’s independent of sample rate conversion. And in that case, you could filter out the high frequencies before the effect.
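
For instance, a sketch with placeholder values:

(
// band-limit before a nonlinear stage: waveshaping adds harmonics that
// can exceed Nyquist and fold back down
{
    var sig = Saw.ar(220, 0.3);
    sig = LPF.ar(sig, 12000); // tame the very top first...
    (sig * 8).tanh * 0.1;     // ...then distort
}.play;
)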

Noise from cubic interpolation, I’d expect, would be like: if your sound is playing at the level of a jet engine, then the noise would be on the order of butterfly wings. That’s a guess but the point is, this is not really anything to lose sleep over.

hjh

2 Likes

Cubic interpolation is good enough and quite high quality. That said, polyphase filters are also considered good and are more specific to audio resampling (a dedicated technique rather than a general-purpose one). But we’re talking about subtle things at this level; I wouldn’t worry much about them.

EDIT: Interesting visualization comparison (for fun):

https://src.infinitewave.ca/?Top=SoX144_HQ&Bot=AFsp&Spec=0100

1 Like

Ah right, that makes sense. It’s kind of like capturing 60 fps camera footage and then playing it at 120 fps by duplicating every frame: it’s still effectively 60 fps. But capturing at 60 fps and then playing at 30 fps would lose frames.

I noticed that the SuperCollider server defaults to 44.1 kHz though, so this might be something to consider if you were running that with the 48 kHz samples and downsampling? I have chosen to run the server at 48 kHz. Some of my samples are 44.1 kHz and others are 48 kHz, so I guess it makes more sense to set the server’s sampleRate to 48 kHz, avoiding downsampling and only introducing potential upsampling, instead of using 48 kHz samples while the server runs at 44.1 kHz.
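
For reference, this is all I did (assuming the default local server s):

s.options.sampleRate = 48000; // takes effect on the next boot
s.reboot;
s.sampleRate; // -> 48000.0 once the server is back up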

I also came across this when googling. Crazy how bad some of the older software is:

SRC Comparisons

If you check the ServerOptions code file, you’ll see that the default sample rate is nil, meaning that the SC server’s default behavior is to use the sample rate configured in the system.

ServerOptions.defaultValues[\sampleRate]
-> nil

s.options.sampleRate
-> nil

// (and in the code file:)
sampleRate: nil,

I think this is the correct default behavior – you don’t want SC to override the system’s sample rate unless the user specifically requests SC to do so.

Also note that if you set a sample rate for the server, the OS might take on that sample rate for the audio driver, or that request might fail. (And, in Linux, scsynth cannot override the JACK sample rate at all.)

hjh

FYI, this is technically correct, but due to backend specifics with PortAudio, it doesn’t actually work that way on Windows…
Practically speaking, ASIO devices will always boot at 44100 when SR is not set (nil).
MME devices I think also boot at 44100 by default, but I’m not certain.

A-ha. I didn’t know that.

So I should amend my position – since the behavior isn’t consistent across operating systems, and can’t be, then it isn’t meaningful for SC to specify a concrete default.

hjh