Continuing the discussion from Proposal: SynthDefs could eliminate duplicate UGens:
I initially approached audio processing from a purely signal-based perspective, but investigating audio graphs more deeply led me to a different way of thinking.
Audio graphs, or graphs in general, didn’t originate with digital signal processing; category theorists and computer scientists have studied them for a long time, revealing a structure that appears in seemingly unrelated areas of mathematics and computation.
Recently, I was talking with a friend who hinted at the simplicial-like structure of cumulative effects in an audio graph. It’s a trippy idea, and it coincides with what I was trying to do with graph optimizations on a more sober level. (More on that idea later.)
Let’s start with the basics. The concept of audio unit composition dates back to the early days of modular synthesis. Someone smart long ago noticed that we could create complex sounds by combining oscillators, filters, and amplifiers instead of relying on more complicated, do-everything synthesizers.
A complex synth can always be broken down into simpler units, so these essential elements are more fundamental as units of composition in sound design.
Simple units work well when you combine them in higher-order structures, while complex synths don’t. You can take any three simple audio units, and there’s a unique way they can be connected (it may be trivial if they’re incompatible).
But complex synths will (I think almost always) have multiple possible connection schemes.
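To ground the “simple units” idea in SuperCollider terms, here’s a minimal sketch of the classic oscillator -> filter -> amplifier chain (the unit choices and parameter values are just illustrative):

```
(
// three simple units composed into a subtractive-synthesis chain
SynthDef(\simpleChain, { |out = 0, freq = 110, cutoff = 800, amp = 0.2|
    var osc = Saw.ar(freq);             // oscillator
    var filtered = LPF.ar(osc, cutoff); // filter
    Out.ar(out, (filtered * amp) ! 2);  // amplifier, sent to a stereo bus
}).add;
)
```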
Modern digital audio languages use audio graphs to build all sorts of complex sound designs. They’re the basis of most of what is done with digital audio, whether in programming languages or patch editors.
Audio graphs are just systems of audio units that share some of their inputs and outputs (SynthDefs, for example).
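Concretely, a single unit’s output can be shared as the input of several downstream units inside one SynthDef - a toy example:

```
(
SynthDef(\sharedSource, { |out = 0|
    var src = SinOsc.ar(220);     // one unit's output...
    var low = LPF.ar(src, 600);   // ...shared as the input
    var high = HPF.ar(src, 2000); // of two downstream units
    Out.ar(out, ((low + high) * 0.1) ! 2);
}).add;
)
```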
Bear with me for a minute now: simple audio units can approximate complex audio processes in n-dimensional parameter spaces. At least, that’s where the conversation with my friend about the “simplicial-like structure” of graphs led.
First, we could use audio units in spaces with more dimensions than just amplitude and time. This way, we could build a spatial audio processor in a 3D sound field without it conflicting with other processors.
We can also consider replacing simple audio units with higher-dimensional objects.
We could approximate complex audio effects by combining simpler ones. This technique is used in SynthDef design, where we often organize many simple processors in data structures called audio graphs.
However, just as single audio units don’t work well for complex sound design, simple combinations fall short for intricate, evolving sounds.
We can continue generalizing this construction to higher and higher dimensions of audio processing.
To form an n-dimensional audio processor, we can pick n+1 audio parameters. We can draw a connection between any two parameters, a triangular relationship between any three parameters, a tetrahedral relationship between any four parameters, and so on.
What do I want to say? That audio processors have a very regular recursive structure.
An n-dimensional processor has n+1 sub-processors, which are all (n-1)-dimensional processors. A multi-effect processor has four effect sub-processors, an effect processor has three parameter controls (one-dimensional processors), and a parameter control has two limit values.
Every higher-dimensional audio processor can be decomposed into lower-dimensional processors, and the process can be repeated until we reach individual parameter values.
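Here’s a small sclang sketch of that recursion, modeling an n-dimensional processor as nothing more than a list of its n+1 parameters (the parameter names are made up):

```
(
// the n+1 sub-processors of an n-dimensional processor:
// drop one parameter at a time
~subProcessors = { |params|
    params.size.collect { |i|
        var sub = params.copy;
        sub.removeAt(i);
        sub
    }
};

// a "tetrahedral" processor on four parameters...
~subProcessors.([\freq, \cutoff, \res, \amp]);
// -> its four "triangular" sub-processors:
// [[cutoff, res, amp], [freq, res, amp],
//  [freq, cutoff, amp], [freq, cutoff, res]]
)
```

Applying ~subProcessors repeatedly bottoms out at individual parameter values, which is exactly the decomposition described above.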
Another leap of abstraction: let’s abandon traditional signal paths. Can we still define audio processors, and if so, how would we use them?
Consider an audio graph built from simple units. It defines a particular sound processing chain. We can rearrange this chain any way we want, but we cannot change its fundamental structure as long as we don’t break connections or fuse units.
The information about the audio graph’s structure is encoded in connections.
The connections don’t depend on specific signal paths.
Two audio units are either connected or not.
Two effects either share a parameter or they don’t.
Two processors either share an effect or they don’t.
This is a grammar of precise information, much like pattern matching: each definition has to be exact and never ambiguous, and together they have to embrace the totality of the patterns.
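In code, that structural information can be as bare as a set of ordered pairs - a hypothetical toy model, not any actual SynthDef API:

```
(
// a graph's structure is just which units connect to which
~edges = [[\osc, \filter], [\filter, \amp]];
~connected = { |a, b| ~edges.includesEqual([a, b]) };
~connected.(\osc, \filter); // -> true
~connected.(\osc, \amp);    // -> false
)
```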
–
You will only know what the synthesizer will sound like if you have a patch. The patch sheet tells you how to arrange things: which effects form the sub-processors of which processors, and so on. In general, you want to know which lower-order processors are the “components” of higher-order processors. This can be determined by defining functions between the corresponding sets, which we’ll call component maps.
For instance, there should be two functions from the set of effects to the set of parameters, one assigning the input and the other the output to each effect.
There should be three functions from the set of multi-effects to the set of effects, and so on.
If the same parameter is the output of one effect and the input of another, the two effects are connected.
An effect may be shared between multiple multi-effects, a multi-effect may be shared between complex processors, and so on.
For instance, you can compose these functions to select a multi-effect parameter or a complex processor’s sub-effect.
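Here’s a toy model of those component maps in sclang, with made-up effect and parameter names; the connection rule above falls out as a one-liner:

```
(
// two component maps: each effect gets an input and an output parameter
~inputOf  = (fx1: \a, fx2: \b, fx3: \b);
~outputOf = (fx1: \b, fx2: \c, fx3: \d);

// two effects are connected when one's output parameter
// is the other's input parameter
~connected = { |e1, e2| ~outputOf[e1] == ~inputOf[e2] };
~connected.(\fx1, \fx2); // -> true: they share parameter b

// composing component maps: the input parameter of a
// multi-effect's second sub-effect
~subEffectsOf = (multi1: [\fx1, \fx2, \fx3]);
~inputOf[~subEffectsOf[\multi1][1]]; // -> b
)
```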
Composable functions suggest a category, in this case, a subcategory of audio processors.
Selecting a subcategory suggests a functor from some other, simpler category.
Our functor would map the audio processor category to corresponding sets of processors.
What determines the structure of this category is its morphisms. In particular, we need morphisms that are mapped, under our functor, to the functions that define the components of our processors - the component maps. This means, in particular, that for every n, we need n+1 distinct functions from the image of n to the image of n-1. These functions are themselves images of morphisms that go between n and n-1 in the “audio processor category.”
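For the curious, this is the coherence those morphisms satisfy in the simplicial picture: the classic face-map identity, sketched here with a simplex modeled as an ordered list of parameters:

```
(
// face map d_i: drop the i-th vertex of a simplex
~face = { |simplex, i|
    var s = simplex.copy;
    s.removeAt(i);
    s
};

// the simplicial identity d_i . d_j == d_(j-1) . d_i  for i < j,
// checked on one example (i = 1, j = 3)
~face.(~face.([\p0, \p1, \p2, \p3], 3), 1)
    == ~face.(~face.([\p0, \p1, \p2, \p3], 1), 2); // -> true
)
```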
The system/grammar/category we’ve discussed delivers a perfect framework for pattern matching in audio graphs.
By understanding the compositional nature of audio processors and their relationships, we can develop algorithms to identify common substructures, transformation patterns, and even potential optimizations in complex audio processing chains.
We could use the concept of homotopy to identify audio graphs that are “topologically equivalent” - that is, graphs that can be continuously deformed into one another without changing their fundamental audio processing characteristics.
This could be useful in audio design, where different implementations of the same effect could be recognized as equivalent.
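As a deliberately simple example, these two SynthDefs compute the same signal through differently shaped graphs (by distributivity); an equivalence-aware tool could recognize them as the same effect:

```
(
SynthDef(\mixThenScale, { |out = 0|
    var a = SinOsc.ar(220), b = SinOsc.ar(330);
    Out.ar(out, ((a + b) * 0.1) ! 2);
}).add;

SynthDef(\scaleThenMix, { |out = 0|
    var a = SinOsc.ar(220), b = SinOsc.ar(330);
    Out.ar(out, ((a * 0.1) + (b * 0.1)) ! 2);
}).add;
)
```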
These constructions provide a formal way to describe adding or removing audio units from a graph.
This could be used to develop intelligent audio routing systems that automatically simplify or expand audio graphs based on specific criteria, or even to inform new approaches and techniques, such as “AI” models and new algorithms.