As a total n00b to spatializing audio in general and ambisonics in particular, I’m trying to wrap my head around it and have run into a few questions:
First set of questions is about understanding the process of composing with ambisonics:
I think I understand that ambisonics manipulates the audio scene as a whole.
Therefore, if I wanted to have (say) two sound sources “move” independently, I’d need to encode, transform and decode them separately and then sum (or average) the decoded signals. Still correct so far?
If I then wanted to apply further transformations to the resulting scene as a whole, I think I’d need to re-encode the superimposed decoded signals. Assuming those signals were decoded to some 8-speaker configuration, I’d then have to specify these same speaker locations during this second encoding pass, perform the transformations, and decode again to the same speakers (or maybe to binaural for headphones). Is my understanding correct?
Second set of questions is about the “sweet spot”:
As far as I understand, ambisonics has a sweet spot, and using higher-order ambisonics can somewhat increase the size of this sweet spot.
So first question: if one is located outside the sweet spot, will the sound be totally “incomprehensible” or is there some kind of “graceful degradation”?
Second question: when working with a ring of speakers, will ambisonics sound (a lot) worse than simply using PanAz and SplayAz, particularly outside this sweet spot?
Third question: I think I saw somewhere that a good rule of thumb is number_of_speakers = (ambisonic_order * 2) + 2. Are there any advantages to using an order higher than 3 when working with 8 speakers?
No doubt these questions show my total lack of experience in the field, but I’m hoping to get some insights, thanks!
You can mix ambisonic signals - you only need to decode for playback!
So you create an ambisonic master and decode it for 8 speakers, headphones, etc. as needed.
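Roughly, in ATK terms, it can look like this (an untested first-order sketch; the 8-speaker ring decoder and the particular sources/transforms are just placeholders):

```
(
{
	var src1, src2, foa1, foa2, scene;
	var decoder = FoaDecoderMatrix.newPanto(8);  // regular 8-speaker ring

	src1 = PinkNoise.ar(0.1);
	src2 = SinOsc.ar(440, 0, 0.1);

	// each source is encoded (and could be transformed) independently
	foa1 = FoaPanB.ar(src1, LFSaw.kr(0.05, 0, pi));  // slowly circling
	foa2 = FoaPanB.ar(src2, -0.25pi);                // fixed, front-right

	// mixing encoded streams is just summing their B-format channels
	scene = foa1 + foa2;

	// a transform applied to the scene as a whole - no re-encoding needed
	scene = FoaTransform.ar(scene, 'rotate', MouseX.kr(-pi, pi));

	FoaDecode.ar(scene, decoder)  // decode once, only for playback
}.play;
)
```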
The sound outside the sweet spot is not nonsense - it’s just harder to perceive the intended locations. It can still create an interesting sense of space, just not the magical illusion you get in the sweet spot.
The localization you can get with ambisonic microphones is a lot of fun; the sense of your head turning when you rotate the scene is terrific.
These are all great questions! I’ve been working in ambisonics for a few years and have at some point or another wondered many of the same things myself! +1 to everything @semiquaver wrote, but I might just supplement their post with a few details from my experience (following the order of your points):
General:
•put simply, ambisonics is surround sound on a sphere, so individual sounds can be “panned” around the surface or the entire thing can be transformed/warped/manipulated at once
•as mentioned, a typical workflow is to have a decoder as a “master bus” effect for either monitoring or performance; pieces encoded to high orders can be decoded to various speaker arrays without having to re-encode the material
•I guess this depends on what kind of transformations you want to make - spatial transformations? Spectral? Conceptually, ambisonics is a bit like mid/side (but with added dimensions…) - you can apply transformations to both encoded and decoded sounds, but they will have different effects on the sonic outcome
Sweet spot:
•higher ambisonic orders == higher spatial resolution and a larger sweet spot
•moving outside the sweet spot distorts the spatial image (the same can happen on a stereo monitoring system), but this can sometimes be desirable; I often encode droney/pad material to low orders (i.e. “shrinking” the sweet spot) to give the material a healthy blur
•I guess it depends on your definition of worse? There are composers that combine ambisonics and discrete speaker playback in pieces - some material lends itself better to being played back as a mono signal from a single speaker (and letting the room contribute the spatial information) whereas some material really shines when presented with the spatial depth ambisonics provides
•I kinda hinted at it above, but working in high ambisonic orders allows for portability - a piece encoded to 7th order can be decoded to various speaker arrays without re-rendering or re-encoding. If I play my 7th-order material on an 8-channel speaker array, the higher-order material won’t be reproduced (i.e. an LPF in the spatial domain) but the lower-order material will still be present. The speaker rule you mentioned can be thought of as a spatial Nyquist theorem - you won’t hear more resolution than your speaker array can reproduce, but you’ll be giving your CPU a nice workout!
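To put numbers on that rule of thumb, here’s a quick sclang sketch (the 2N+1 and (N+1)² component counts are the standard 2D/3D formulas):

```
(
// channel counts and the speaker rule of thumb, per ambisonic order
(1..7).do { |order|
	var chans2D = (2 * order) + 1;     // 2D (ring) components
	var chans3D = (order + 1).squared; // 3D (full-sphere) components
	var minRing = (2 * order) + 2;     // speakers needed for a ring
	"order %: 2D chans %, 3D chans %, min ring speakers %"
		.format(order, chans2D, chans3D, minRing).postln;
};
)
```

At order 3 that works out to 8 ring speakers, which is why 8 speakers and 3rd order pair up so neatly.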
There are definitely other people on this forum with much more experience/knowledge than me, so hopefully I’ll be corrected anywhere I’ve misspoken! I highly recommend checking out the ambisonic toolkit quark - the documentation is incredibly thorough and definitely goes into more detail than I do here. And there are many examples!!!
I was thinking of spatial transformations indeed. Applying spectral transformations to encoded material will probably have hard-to-predict spatial consequences (which is not necessarily a bad thing, but not what I’m interested in at the moment).
Random thought: does this mean something like spatial aliasing exists, such that it might still be beneficial to work at a higher order?
Thanks for pointing me to the ATK documentation. I’ve been looking at it, but I found it a bit lacking in sketching the overall picture (at least I didn’t see much related to the very basic questions I had - perhaps it’s there and I just didn’t understand it). E.g. the fact that you can just sum two encoded streams came as a surprise. (Perhaps not very surprising anymore once you realize it’s all matrix multiplications and convolutions, yet I somehow failed to connect those dots.)
Sure – when I first heard of ambisonics, it baffled me that encoded sound fields could be mixed freely. Here’s how I came to understand it:
You can represent a stereo signal in terms of left and right channels, or equivalently in terms of mid and side.
L and R: “I have two numbers, 7 and 3.”
M and S: “I have two numbers. Their sum is 10, and the first one is 4 greater than the second.”
The information is equivalent. If you start with L and R (which is the case if you’re recording with an XY pair), then M = L+R and S = L-R. If you start with mid and side (which can be done with one omnidirectional mic, and one figure-8 pointed left), then L = (M+S)/2 e.g. (10+4)/2 = 7, and R = (M-S)/2 e.g. (10-4)/2 = 3.
If you have two L and R streams, you mix them by adding. If you have two M and S streams, you mix them by… adding.
So the mixed “mid” is equivalent to mixed-left + mixed-right, and the mixed side is equivalent to mixed-left - mixed-right.
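You can sanity-check this with a few lines of sclang, using the same toy numbers:

```
(
// mixing two stereo signals in L/R vs. in M/S gives the same result
var a = [7, 3], b = [2, 6];                // two [L, R] pairs
var toMS = { |lr| [lr[0] + lr[1], lr[0] - lr[1]] };
var mixedLR = [a[0] + b[0], a[1] + b[1]];  // mix in L/R, convert after
var mixedMS = toMS.(a) + toMS.(b);         // convert first, mix in M/S
[toMS.(mixedLR), mixedMS].postln;          // -> [[18, 0], [18, 0]]
)
```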
Mid-Side gives you one dimension (the left-right axis). Ambisonic B-format (first-order) just extends the mid-side concept to all three axes: you get W = mid (sum of everything), X = what’s different in front, Y = what’s different on the left, Z = what’s different above. That’s all… seems magical but it’s not. (I haven’t used HOA but I suppose it’s similar.)
Since the principle is the same for each axis, the mixing principle is also the same.
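To make that concrete, encoding a mono signal into first-order B-format is just four gains (a sketch with FuMa-style scaling, where W sits at -3 dB; not a full encoder):

```
(
// first-order (FuMa) panning gains for azimuth theta, elevation phi
var theta = 0.25pi, phi = 0;    // 45 degrees to the left, on the horizon
var w = 2.sqrt.reciprocal;      // W: the omni "mid" component
var x = cos(theta) * cos(phi);  // X: front/back difference
var y = sin(theta) * cos(phi);  // Y: left/right difference
var z = sin(phi);               // Z: up/down difference
[w, x, y, z].postln;            // multiply a mono signal by these
)
```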
“Higher order” somehow made me think that the signals were no longer “linear” combinations, but I understand now that’s not what it means. It’s probably closer to using more partials in a Fourier transform.
I think that’s a reasonable comparison - the image of spherical harmonics halfway down this page is perhaps helpful to visualize how higher order components add spatial resolution.
The short answer is that Ambisonics is essentially a set of ideas and techniques for working with (holophonic) soundfields, built around the Spherical Harmonic Transform (SHT). The SHT is the FT over the surface of the sphere. We can think of the SH coefficients (Ambisonics channels) as “spatial bins”, and we can think of Ambisonic order as being equivalent to FFT size.
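In symbols, this is the standard truncated SH expansion, where the coefficients a_lm are the Ambisonic channels and the truncation order L plays the role of FFT size:

```
f(\theta, \phi) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} a_{lm}\, Y_{lm}(\theta, \phi)
```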
The Ambisonic Enlightenment page offers a walkthrough of some important issues regarding the framing of Ambisonics. In particular, the question of sweet spot is reviewed here. Basic implications of Ambisonic order are reviewed here.
@shiihs, Ambisonics is both a deep and wide topic! Try (w/ the ATK):
As @jamshark70 points out, just like we can mix stereo sources together, we can mix Ambisonic encoded sources. These could be synthetic, or recorded from a SoundField microphone (for FOA), Eigenmike (for HOA). (Other hardware is available, too!!)
Once we have a mix, just like in stereo, any processing will affect the complete mix.* In principle, just like in stereo, if we want to process separate Ambisonic streams separately, we need to do so before we mix.
*Actually, we can do beamforming, to select a part of the mix, process, and then re-inject into the soundfield. Some examples can be found here.
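As a rough, untested sketch of that idea with the FOA tools (the exact FoaDecoderMatrix.newMono arguments are from memory, so do check them against the ATK docs):

```
(
{
	var scene, beam, processed;
	scene = FoaPanB.ar(PinkNoise.ar(0.1), LFSaw.kr(0.03, 0, pi));
	// a virtual cardioid aimed at the front pulls out a mono "beam"
	beam = FoaDecode.ar(scene, FoaDecoderMatrix.newMono(0, 0, 0.5));
	// process the beam, then encode it back in as a frontal source
	processed = FoaPanB.ar(BPF.ar(beam, 1000, 0.3), 0, 0);
	FoaDecode.ar(scene + processed, FoaDecoderMatrix.newPanto(8))
}.play;
)
```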
Questions about sweet spot and Ambisonics are often misunderstood, and the situation is sometimes substantially overstated (e.g. ‘there is no sweet spot’). In practice, I think performance at higher orders tends to be comparable to other approaches. (You can see this from the plots in the link above.) It’s important to realise that the precedence effect is not trumped by ambisonics, and very much applies. (Indeed it can be worse with non-‘in phase’ approaches, since localisation can be 180 degrees wrong, though thankfully this is not usually a problem with modern decoders.) In general the sweet spot (at least as ambisonics usually defines it) is small, and indeed much smaller than is relevant for concert presentation. (Probably much smaller than you think! Haha!)
Luckily, in practice this doesn’t matter so much, and a variety of factors affect listener experience. For example, while pairwise panning tends to test best for point-source localisation from an ideal listening position, ambisonics can have a ‘smoother’ sense of graduated space, since it should never have a source on only one speaker. This evens out localisation blur and makes you less likely to get an ‘in the box’ effect when panning to exact speaker directions. (As an aside, VBAP’s spread parameter exists exactly to overcome this issue with pairwise panning, even though it makes point-source localisation worse!)
This ‘wronger but nicer’ aspect can be very desirable, and is a principle worth considering in general with spatialisation systems. Qualitative aspects are under-discussed or ignored! How approaches ‘go wrong’ outside the ideal listening area is similarly very important, and ambisonics tends to do pretty well there (i.e. wrong, but not too wrong, with most common ambisonic decoding schemes). How ‘right’ your music needs to be is something else to keep in mind.
A final point is that results vary substantially by decoder type, especially with irregular speaker arrays (technically, anything that isn’t a Platonic solid for 3D arrays). Parametric decoders, many of which use VBAP-like approaches based on analysed sources or critical bands, can give very good point-source and transient localisation, but at the expense of some artifacts. YMMV!
The Ambisonic Enlightenment article (and all the other ATK docs) is among the best info on ambisonics you can find out there. I highly recommend everyone check it out.