Experiences with RAVE UGen?

The only real answer is for you to try the transfer learning for yourself and see what you think of it.

I don’t know about your other question; again, the community is on Discord, so that is where to ask.

Version 0.0.5-alpha ( :pancakes: :broccoli:) is out!

Now it supports batch processing multiple inputs!
Here is a minimal example, there are a few more in the README and HelpFiles.

// use a single model to resynthesize four inputs at the same time: 
{ NN(\pancakes, \forward).ar(SinOsc.ar([100,200,300,400])) }.play
// (note: output is 4 channels)

Can someone explain what this RAVE thing is? I’m a bit foggy on what it does.

Looks like it’s an AI sampler, in the sense that it is trained on a sound and, with enough training, becomes a new synth for that natural sound. Is that right? So instead of a static sample, it can be manipulated more?

Also, a more general question about AI and audio: what is really going on under the hood? Is it just FFT? Or something more exotic? Thanks

https://arxiv.org/pdf/2111.05011

@misc{caillon2021rave,
  title={RAVE: A variational autoencoder for fast and high-quality neural audio synthesis},
  author={Antoine Caillon and Philippe Esling},
  year={2021},
  eprint={2111.05011},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}


I can try to give a more functional explanation, as a not-that-AI-knowledgeable person :slight_smile:

RAVE allows you to train a model based on a corpus of similar sounds (e.g. saxophone sounds). For any given input sound (provided to it in raw samples), it will reduce that to a limited subset of time-varying latent parameters (e.g. ~16 audio-rate or near audio-rate streams, but it depends on the model) that describe the closest match to that sound in the saxophone model. Those parameters can then be used to reproduce a saxophone sound based on the model (i.e. the sounds it has been trained on).

So, in theory (with a well-trained model), if you give it a saxophone sound very close to what it has heard before, it will reproduce more or less that exact sound from the latent parameters - in a way, it’s a kind of very extreme compression. If you feed it a non-saxophone sound, it will reproduce a saxophone sound that is a sort of best-match for that sound (where “best match” is a verrrrrry ephemeral concept…). You can also modify the latent parameters, or generate them from scratch - in general (though it depends on how the model was trained, I think?), the latent parameters will produce smoothly varying changes in the output sound, so you can e.g. encode an input sound, then vary one or two of its latents slightly, and you’ll get a slightly changed (but hopefully coherent) output sound. Or you can vary the latent parameters in a completely random way, which will essentially give you a random walk around the space of your model, producing a continuous stream of saxophone-like sound (or whatever the model is trained to reproduce).
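To make the random-walk idea concrete, here is a toy NumPy sketch. This is not the RAVE or nn.ar API - the latent count, step size, and the stand-in decoder are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_latents = 16      # RAVE models often expose ~16 latent streams (model-dependent)
n_frames = 256      # latent frames; each would decode to a block of audio samples

# start from the latents of some encoded input (here: just zeros)
z = np.zeros(n_latents)
trajectory = np.empty((n_frames, n_latents))
for t in range(n_frames):
    z += 0.05 * rng.standard_normal(n_latents)  # small random step per frame
    trajectory[t] = z

# a real decoder would map each latent frame to a chunk of audio;
# this placeholder is NOT RAVE's decoder, it just produces one value per frame
def fake_decode(frame):
    return np.tanh(frame).mean()

audio_sketch = np.array([fake_decode(f) for f in trajectory])
```

In practice you would hand each latent frame to the trained decoder (e.g. via the NN UGen); the point is only that small random steps in latent space give a continuous trajectory rather than jumps, which is why the output stays coherent.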

As a practical synthesis technique, it works similarly to concatenative synthesis - e.g. find a sound from this corpus that is the best match for this other sound. But, of course, it produces novel sounds that are not in the original corpus, and can be much better at producing sound that is continuously varying and acoustically coherent. It’s not particularly good at precise control, and it’s not particularly good at extremely high-fidelity sound, but it can produce really surprising and strange results in an extremely intuitive way. In terms of fidelity, it feels close to AI image synthesis circa 1-2 years ago - once in a while you get an uncanny valley level of realism, more often you get something that reproduces the model with some glitches/fuzziness/weirdness that can be surprising or just frustrating depending on your taste / the project it’s being used for.

You could, I imagine, use it as a sampler - e.g. make / find a drum model, feed it one drum sound as a starting point, and then produce a hundred variations on that drum sound by tweaking the latents slightly. What the latent parameters control is always a bit mysterious, so you’ll never end up with e.g. an attack and decay knob, just some values that you can change to alter SOMETHING about the sound.

I think some techniques will work on sound in e.g. FFT form (or some other more perception-based encoding), and some work directly on raw sample data. I’m not sure how RAVE in particular is transforming its audio data.


It works directly on waveform data rather than using FFT. At most, it does a multiband decomposition with simple filters, like a vocoder. That’s precisely why it is possible to do it in real time. Even the encoder seems quite heavy already (neural network → 128-dimensional matrix, all extracted from the waveform).

It leverages the waveform approach to model and generate audio directly. That’s the innovation, or the “RAVE” part, that happens in real time.

"We define our encoder as the combination of a multiband decomposition followed by a
simple convolutional neural network, transforming the raw waveform into a 128-dimensional latent
representation. "

This is quite interesting. We tend to think computers can only analyze based on our perception, and it feels counterintuitive that the model is not based on FFT or other perception-based analyses. But the rules seem to change when the question is real-time safety, and we do whatever works.

That’s why the FFT was adopted in digital signal processing, while the Walsh transform, which would be much more efficient for digital computation, was largely forgotten by history.

The Walsh transform decomposes a signal into a set of square waveforms, the Walsh functions. These functions are piecewise constant and take values of +1 or -1, which perfectly matches how computers work at the binary level.
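Because it uses only additions and subtractions, a fast Walsh–Hadamard transform fits in a few lines. This is a standard textbook version in Python (natural/Hadamard ordering), just to show the idea:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (natural ordering), on a copy of x.
    Input length must be a power of two; only additions/subtractions used."""
    a = np.array(x, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

sig = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
coeffs = fwht(sig)          # -> [4, 4, 0, 0, 0, 0, 0, 0]
# the alternating pattern is captured by just two Walsh coefficients;
# applying the (unnormalized) transform twice scales by N, so dividing inverts it
recovered = fwht(coeffs) / len(sig)
```

Note how a square-ish signal concentrates into very few Walsh coefficients - exactly the kind of signal where an FFT would smear energy across many harmonics.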

(The lesson is: Math is your friend, as much as your ear. You can trust both.)

EDIT: Someone did a Walsh Transform in SuperCollider in a blog post. Some people know how to have fun)))))))

Is there a build of the RAVE UGen for M2 Macs?

Been playing with this and it’s sublime. The sounds on the horizon are going to be amazing:
https://ganharp.ctpt.co

This might get things going fast:
the VST;

and a way to train it with new sounds;


At the risk of pointing out the obvious: you can very likely use DDSP in SC with the VSTPlugin extension.


It looks like RAVE supports transforming audio with mel and PQMF representations as well as raw audio, but I don’t see a lot of evidence that either of these is used very much (there’s just one configuration that uses PQMF, afaict).


Hey! I don’t have a Mac, let alone an M2, so I haven’t tried personally, but have you tried the arm64 macOS release?
https://github.com/elgiano/nn.ar/releases/download/v0.0.5-alpha/nn.ar-macOS-arm64.zip

If you try it, please let me know whether it works. Thanks!


Tried building for M2

ss@sss-Laptop build % cmake .. -DSC_PATH=/Users/ss/supercollider -DCMAKE_PREFIX_PATH="path/to/libtorch;/usr/local" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/Users/ss/sc3-plugins
-- CMAKE_HOST_SYSTEM_NAME="Darwin"
-- Found SuperCollider: /Users/ss/supercollider
-- Building plugins for SuperCollider version: 3.14.0-dev
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Install directory set to: /Users/ss/sc3-plugins
-- Found ZLIB: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libz.tbd (found version "1.2.11")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Caffe2: Found protobuf with new-style protobuf targets.
-- Caffe2: Protobuf version 26.1.0
CMake Warning at /opt/homebrew/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /opt/homebrew/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:79 (find_package)
-- Found Torch: /opt/homebrew/lib/libtorch.dylib
CMake Error at CMakeLists.txt:87 (install):
  install FILES given directory "/opt/homebrew/lib/gio" to install.
-- Added server plugin target RAVE_scsynth
-- Added server plugin target RAVE_supernova
-- Generating plugin targets done
-- Configuring incomplete, errors occurred!
ss@sss-Laptop build %

It sounds like a problem with the libtorch installation. I don’t really know how it works on Mac; I can just see that it doesn’t find libkineto, but I know libkineto.a is included in the libtorch package for mac arm64 (https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.3.0.zip). Could it be an issue with Homebrew?

By the way, have you tried the pre-built binary I linked above? Sorry for the ignorance, but I don’t even know if something built for macOS arm64 would work on both M1 and M2…

Version 0.0.6-alpha ( :adhesive_bandage: :pancakes: :broccoli:) is out!

Fixed a bug with multi-channel processing; thanks to @scztt for finding it and pointing it out.


Hi, yes I did try the prebuilt binary, no luck. I’ll try posting the error later.

Version 0.0.6-updated is out, with a reworked build system: it uses the latest possible torch version for each architecture, and also adds a build for macOS 14 (macos-latest).

@sslew maybe you want to try that one? hope it works

P.S. So far it has been tried and works on machines running Linux, Windows, and macOS (M1/M2).


Works, thank you very much!
