Some questions about bit-depth in theory and practice

Dear users,

According to digital audio theory, the dynamic range of 16-bit audio is about 96 dB, which follows from this formula:

20 * log10( (2 ** -16) / (2 **  0) ) // -96.329598612474 ≈ -96dB

However, this is the range of the converted amplitudes, which run from -1 to 1. Thus, I think the actual dynamic range meaningfully perceived by our ears is 48 dB, though I am not sure.
For example, the following two snippets are identical in their dynamics:

{!2 * 0.1 }.play  // Code 1a
{!2 * -0.1 }.play // Code 1b

In the following examples, we can hear the inverted phase because the two signals come from different channels, but we cannot hear any difference in dynamics:

{ * [0.1, -0.1] }.play // Code 2a
{ * [-0.1, 0.1] }.play // Code 2b

(Of course, one can hear that Code 1a and 1b are slightly louder than Code 2a and 2b when listening over loudspeakers, but there is no dynamic difference when listening on earphones.)

However, what about 32-bit float audio?
First, I quote the following two formulas from the web page in order to ask my questions:

  • dBmax = 20 x log (3.4 x 10 ** 38) = 770 dB
  • dBnoise = 20 x log (1.2 x 10 ** -38) = -758 dB

I would like to know what experts think about the following:

  1. The actual dynamic range of 16-bit audio meaningfully perceived by our ears is 48 dB, not 96 dB, according to the four code examples presented above. Is my reasoning correct?

  2. The actual dynamic range tracked by an excellent dynamic follower (if it can exist) is 758 dB since the value above 0dB is headroom.

  3. What do you think is the smallest meaningful amplitude in SC or any audio software supporting 32-bit float bit-depth audio? I don’t remember the smallest value I have used. It might be -96.dbamp in actual work or even -120.dbamp for tests, but probably not smaller.
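For a sense of scale, the two levels mentioned in point 3 can be converted to linear amplitude. This is only a sketch in Python rather than sclang; `db_to_amp` is a hypothetical helper that does the same arithmetic as `dbamp` (10 raised to dB/20), and the 16-bit step is included purely for comparison.

```python
def db_to_amp(db):
    """Convert a level in dB to a linear amplitude (same arithmetic as dbamp)."""
    return 10.0 ** (db / 20.0)

amp_96 = db_to_amp(-96.0)    # roughly 1.58e-5
amp_120 = db_to_amp(-120.0)  # 1e-6
step_16 = 2.0 ** -16         # the 2 ** -16 term from the first formula, roughly 1.53e-5

print(amp_96, amp_120, step_16)
```

So -96 dB sits just above the 2 ** -16 step, while -120 dB is well below it.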

sclang has no operator precedence: binary operators are evaluated strictly left to right.

(20 * log10 ( ( (3.4 * 10)**38 ) )) // what you have
20 * log10(3.4 * (10**38) )  // what you need
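To put numbers on the difference, here is a small sketch in Python, where the parentheses are written out explicitly so that both readings are unambiguous. The variable names are only illustrative.

```python
import math

# Left-to-right reading of 3.4 * 10 ** 38, i.e. (3.4 * 10) ** 38
left_to_right_db = 20 * math.log10((3.4 * 10) ** 38)

# Intended reading: 3.4 times (10 to the 38th power)
db_max = 20 * math.log10(3.4 * (10 ** 38))

# Same grouping for the small value from the quoted formula
db_min = 20 * math.log10(1.2 * (10 ** -38))

print(left_to_right_db)  # about 1164 dB, far from the quoted figure
print(db_max)            # about 770.6 dB
print(db_min)            # about -758.4 dB
```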

32-bit floating-point numbers have 24 bits of precision between -1 and 1. I think Wikipedia has a good explanation of this. That website you linked is correct only in the technical sense. If you are storing audio that would normally clip, floating-point files are preferable, but otherwise fixed point is always better.

The difference between the code examples you have is phase; I don’t think there would be any difference regardless of bit depth. Perhaps I am misunderstanding?
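A quick way to see that the sign only carries polarity: a level in dB is computed from the magnitude of the amplitude, so 0.1 and -0.1 come out at the same level. A minimal sketch in Python, with `amp_to_db` as a hypothetical helper:

```python
import math

def amp_to_db(amp):
    """Level in dB of a linear amplitude; the sign (polarity) is discarded."""
    return 20 * math.log10(abs(amp))

level_pos = amp_to_db(0.1)
level_neg = amp_to_db(-0.1)

print(level_pos, level_neg)  # both about -20 dB
```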

Thanks for your quick response!

Yes, I know. This was my mistake: I do not understand why I thought there was a pair of parentheses. Perhaps I mistyped while copying, pasting, and editing. Thanks for pointing it out! I will edit it. It is so basic that I am rather ashamed to have asked about it here.

I agree! Honestly, I have not yet experienced such a case in my work.

It is an example of phase inversion. Of course, it has nothing to do with bit depth as such, but I think it may have something to do with the range that a bit depth covers:

The dynamic range of bit-depth is the logarithmic ratio between the maximal and minimal values. For example, from the following formula (the first formula in my first post), the dynamic range is as follows:

20 * log10( (2 ** -16) / (2 **  0) )

Here, 2 ** -16 should be -1, and 2 ** 0 should be 1, if I understood correctly; the difference between -1 and 1 is then approximately 96 dB. As my phase inversion examples show, there is no perceivable dynamic difference between the amplitudes 0.1 and -0.1. The same holds for the amplitudes 1 and -1. Thus, I think the actual perceivable dynamic range of 16-bit audio is 48 dB, and my question No. 1 is whether this is correct. Generalised, it could be stated as follows: the perceivable* amplitude (or dynamic) difference in dB of digital audio is half the dynamic range of its bit-depth.

* Perhaps the term ‘perceivable’ is not appropriate here. The term ‘controllable’ may be better.

Ah, now I think I understand…
so the equation should be

20 * log10( 2**16 )

Not negative 16.

In the floating case…

The first value (3.4 * 10 ** 38) here is a large positive number, but the second (1.2 * 10 ** -38) is a very small value, close to zero. Neither value is negative; the negative range simply mirrors the positive one.

For this reason, I am actually confused as to why the 16-bit case is measured against 2 ** 16 and not 2 ** 15, which would give about 90 dB.
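For reference, the two candidates work out as follows. This Python sketch only puts numbers on them; it does not settle which convention is the right one.

```python
import math

# All 2 ** 16 steps of a 16-bit word
db_16 = 20 * math.log10(2 ** 16)

# Only the 2 ** 15 steps on one side of zero
db_15 = 20 * math.log10(2 ** 15)

print(db_16)          # about 96.33 dB
print(db_15)          # about 90.31 dB
print(db_16 - db_15)  # one bit, about 6.02 dB
```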

Also worth reading:

… which explains the conventional signal-to-noise ratio formula. (This document references a 1948 peer-reviewed paper deriving this formula – I’m going to take a wild guess that a professional engineer analyzing noise characteristics in digital signals will probably come up with a better answer than online forum speculation, especially if that answer hasn’t been challenged in 74 years.)

SNR = 6.02 * num_bits + 1.76

… where 6.02 ~= 20 * log10(2).

SNR can be understood as the level difference between the loudest sound that it is possible to represent (aka “0 dBFS”, decibels relative to full scale) and the noise floor resulting from quantization error.

  • 16 bits → 98.08 dB
  • 24 bits → 146.24 dB
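The two figures above can be reproduced with a few lines of Python. The second function is a sketch of the usual textbook reasoning behind the formula, under the assumption of a full-scale sine wave (peak of 2 ** (bits - 1) steps) and a quantization error spread uniformly over one step; the function names are only illustrative.

```python
import math

def snr_db(num_bits):
    """Conventional SNR formula quoted above."""
    return 6.02 * num_bits + 1.76

def snr_db_from_rms(num_bits):
    """Same quantity from RMS values, measured in quantization steps."""
    # Full-scale sine: peak = 2 ** (num_bits - 1) steps, RMS = peak / sqrt(2)
    signal_rms = 2 ** (num_bits - 1) / math.sqrt(2)
    # Error uniformly distributed over one step: RMS = 1 / sqrt(12)
    noise_rms = 1 / math.sqrt(12)
    return 20 * math.log10(signal_rms / noise_rms)

for bits in (16, 24):
    print(bits, snr_db(bits), snr_db_from_rms(bits))
```

Under these assumptions the signal peak is 2 ** (bits - 1), not 2 ** bits, and the extra 1.76 dB comes from the ratio of the two RMS factors.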

There’s a really nice video from the organization responsible for the Ogg Vorbis and Theora formats which, somewhere in the middle, observes that the noise floor of a cassette tape corresponds to about 5-6 bits’ resolution, so 98 dB is already very good.

32-bit floating point carries only 24 bits of resolution.

A single precision float is:

  • 1 sign bit
  • 8 exponent bits
  • 23 mantissa bits

From the 23 mantissa bits, you get 24 bits resolution, because floating point places the point such that there is exactly one nonzero digit to the left of the point: 5 = binary 101 = 1.01 * 2^2. In binary, there is only one possible nonzero digit = 1! So it can be assumed (not encoded): it’s always 1-point-something (except 0, which is treated as a special case).
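The three fields can be inspected directly. The following Python sketch unpacks the value 5 from the example above; `float32_fields` is a hypothetical helper, and the exponent bias of 127 is the standard one for single precision.

```python
import struct

def float32_fields(x):
    """Split a number, stored as a 32-bit float, into sign, exponent, mantissa."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23 bits, the part after the point
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(5.0)

# Put the implied leading 1 back in front of the stored 23 bits
significand = 1 + mantissa / 2 ** 23
value = (-1) ** sign * significand * 2 ** (exponent - 127)

print(sign, exponent - 127, significand, value)  # 0 2 1.25 5.0
```

Here 1.25 is binary 1.01, matching 5 = 1.01 * 2^2; only the ".01" part is actually stored.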

When a 32-bit floating-point signal exceeds 1.0, the 24 bits of precision shift upward, so the noise floor rises. The SNR will still be ~= 146 dB: the signal is louder than what is conventionally defined as 0 dBFS, but the noise floor is louder by the same amount.
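This scaling can be checked by measuring the gap between a 32-bit float and the next representable value above it. A minimal Python sketch, assuming positive inputs that are exactly representable in single precision (`float32_spacing` is a hypothetical helper):

```python
import struct

def float32_spacing(x):
    """Gap between x and the next larger 32-bit float (x positive, exact in float32)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (next_up,) = struct.unpack(">f", struct.pack(">I", bits + 1))
    return next_up - x

for x in (0.5, 1.0, 8.0):
    gap = float32_spacing(x)
    print(x, gap, x / gap)  # the ratio x / gap stays at 2 ** 23
```

The gap grows in proportion to the value, so the ratio between a signal and its rounding error stays the same above and below 1.0.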