Experiences with RAVE UGen?

The only real answer is for you to try the transfer learning for yourself and see what you think of it.

I don’t know about your other question; again, the community is on Discord, so that is where to ask.

Version 0.0.5-alpha ( :pancakes: :broccoli:) is out!

Now it supports batch processing multiple inputs!
Here is a minimal example, there are a few more in the README and HelpFiles.

// use a single model to resynthesize four inputs at the same time: 
{ NN(\pancakes, \forward).ar(SinOsc.ar([100,200,300,400])) }.play
// (note: output is 4 channels)

Can someone explain what this RAVE thing is? I’m a bit foggy on what it does.

Looks like it’s an AI sampler, in the sense that it is trained on a sound and, with enough training, becomes a new synth for that natural sound. Is that right? So instead of a static sample, it can be manipulated more?

Also, a more general question about AI and audio: what is really going on under the hood? Is it just FFT? Or something more exotic? Thanks

https://arxiv.org/pdf/2111.05011

@misc{caillon2021rave,
  title={RAVE: A variational autoencoder for fast and high-quality neural audio synthesis},
  author={Antoine Caillon and Philippe Esling},
  year={2021},
  eprint={2111.05011},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}


I can try to give a more functional explanation, as a not-that-AI-knowledgeable person :slight_smile:

RAVE allows you to train a model based on a corpus of similar sounds (e.g. saxophone sounds). For any given input sound (provided to it in raw samples), it will reduce that to a limited subset of time-varying latent parameters (e.g. ~16 audio-rate or near audio-rate streams, but it depends on the model) that describe the closest match to that sound in the saxophone model. Those parameters can then be used to reproduce a saxophone sound based on the model (i.e. the sounds it has been trained on).

So, in theory (with a well-trained model), if you give it a saxophone sound very close to what it has heard before, it will reproduce more or less that exact sound from the latent parameters - in a way, it’s a kind of very extreme compression. If you feed it a non-saxophone sound, it will reproduce a saxophone sound that is a sort of best-match for that sound (where “best match” is a verrrrrry ephemeral concept…). You can also modify the latent parameters, or generate them from scratch - in general (though it depends on how the model was trained, I think?), the latent parameters will produce smoothly varying changes in the output sound, so you can e.g. encode an input sound, then vary one or two of its latents slightly, and you’ll get a slightly changed (but hopefully coherent) output sound. Or you can vary the latent parameters in a completely random way, which will essentially give you a random walk around the space of your model, producing a continuous stream of saxophone-like sound (or whatever the model is trained to reproduce).
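To make the random-walk idea concrete, here is a toy NumPy sketch. This is not the RAVE or nn.ar API - the latent count, step size, and the stand-in decoder are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_latents = 16      # RAVE models often expose ~16 latent streams (model-dependent)
n_frames = 256      # latent frames; each would decode to a block of audio samples

# start from the latents of some encoded input (here: just zeros)
z = np.zeros(n_latents)
trajectory = np.empty((n_frames, n_latents))
for t in range(n_frames):
    z += 0.05 * rng.standard_normal(n_latents)  # small random step per frame
    trajectory[t] = z

# a real decoder would map each latent frame to a chunk of audio;
# this placeholder is NOT RAVE's decoder, it just produces one value per frame
def fake_decode(frame):
    return np.tanh(frame).mean()

audio_sketch = np.array([fake_decode(f) for f in trajectory])
```

In practice you would hand each latent frame to the trained decoder (e.g. via the NN UGen); the point is only that small random steps in latent space give a continuous trajectory rather than jumps, which is why the output stays coherent.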

As a practical synthesis technique, it works similarly to concatenative synthesis - e.g. find a sound from this corpus that is the best match for this other sound. But, of course, it produces novel sounds that are not in the original corpus, and can be much better at producing sound that is continuously varying and acoustically coherent. It’s not particularly good at precise control, and it’s not particularly good at extremely high-fidelity sound, but it can produce really surprising and strange results in an extremely intuitive way. In terms of fidelity, it feels close to AI image synthesis circa 1-2 years ago - once in a while you get an uncanny valley level of realism, more often you get something that reproduces the model with some glitches/fuzziness/weirdness that can be surprising or just frustrating depending on your taste / the project it’s being used for.

You could, I imagine, use it as a sampler - e.g. make / find a drum model, feed it one drum sound as a starting point, and then produce a hundred variations on that drum sound by tweaking the latents slightly. What the latent parameters control is always a bit mysterious, so you’ll never end up with e.g. an attack and decay knob, just some values that you can change to alter SOMETHING about the sound.

I think some techniques will work on sound in e.g. FFT form (or some other more perception-based encoding), and some work directly on raw sample data. I’m not sure how RAVE in particular is transforming its audio data.


It works directly on waveform data rather than using FFT. At most, it does a multiband decomposition with simple filters, like a vocoder. That’s precisely why it is possible to do it in real time. Even the encoder seems quite heavy already (neural network → 128-dimensional matrix, all extracted from the waveform).

It leverages the waveform approach to model and generate audio directly. That’s the innovation, or the “RAVE” part, that happens in real time.

"We define our encoder as the combination of a multiband decomposition followed by a
simple convolutional neural network, transforming the raw waveform into a 128-dimensional latent
representation. "

This is quite interesting. We tend to think computers can only analyze based on our perception, and it feels counterintuitive that the model is not based on FFT or other perception-based analyses. But the rules seem to change when the question is real-time safety, and we do whatever works.

That’s why the FFT was adopted in digital signal processing, while the Walsh transform, which would be much more efficient for digital computation, was largely forgotten by history.

The Walsh transform decomposes a signal into a set of square waveforms, the Walsh functions. These functions are piecewise constant and take values of +1 or -1, which perfectly matches how computers work at the binary level.
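Because it uses only additions and subtractions, a fast Walsh–Hadamard transform fits in a few lines. This is a standard textbook version in Python (natural/Hadamard ordering), just to show the idea:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (natural ordering), on a copy of x.
    Input length must be a power of two; only additions/subtractions used."""
    a = np.array(x, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

sig = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
coeffs = fwht(sig)          # -> [4, 4, 0, 0, 0, 0, 0, 0]
# the alternating pattern is captured by just two Walsh coefficients;
# applying the (unnormalized) transform twice scales by N, so dividing inverts it
recovered = fwht(coeffs) / len(sig)
```

Note how a square-ish signal concentrates into very few Walsh coefficients - exactly the kind of signal where an FFT would smear energy across many harmonics.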

(The lesson is: Math is your friend, as much as your ear. You can trust both.)

EDIT: Someone did a Walsh Transform in SuperCollider in a blog post. Some people know how to have fun)))))))

Is there a build of the RAVE UGen for M2 Macs?

Been playing with this and it’s sublime. The sounds on the horizon are going to be amazing:
https://ganharp.ctpt.co

This might get things going fast:
the VST;

and a way to train it with new sounds;


At the risk of pointing out the obvious: you can very likely use DDSP in SC with the VSTPlugin extension.


It looks like RAVE supports transforming audio with mel and PQMF representations as well as raw audio, but I don’t see a lot of evidence that either of these is used very much (there’s just one configuration that uses PQMF, afaict).


Hey! I don’t have a Mac, let alone an M2, so I haven’t tried personally, but have you tried the arm64 macOS release?
https://github.com/elgiano/nn.ar/releases/download/v0.0.5-alpha/nn.ar-macOS-arm64.zip

If you try it, please let me know whether it works. Thanks!


Tried building for M2

ss@sss-Laptop build % cmake .. -DSC_PATH=/Users/ss/supercollider -DCMAKE_PREFIX_PATH="path/to/libtorch;/usr/local" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/Users/ss/sc3-plugins
-- CMAKE_HOST_SYSTEM_NAME="Darwin"
-- Found SuperCollider: /Users/ss/supercollider
-- Building plugins for SuperCollider version: 3.14.0-dev
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Install directory set to: /Users/ss/sc3-plugins
-- Found ZLIB: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libz.tbd (found version "1.2.11")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Caffe2: Found protobuf with new-style protobuf targets.
-- Caffe2: Protobuf version 26.1.0
CMake Warning at /opt/homebrew/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /opt/homebrew/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:79 (find_package)
-- Found Torch: /opt/homebrew/lib/libtorch.dylib
CMake Error at CMakeLists.txt:87 (install):
  install FILES given directory "/opt/homebrew/lib/gio" to install.
-- Added server plugin target RAVE_scsynth
-- Added server plugin target RAVE_supernova
-- Generating plugin targets done
-- Configuring incomplete, errors occurred!
ss@sss-Laptop build %

It sounds like a problem with the libtorch installation. I don’t really know how it works on Mac; I can just see that it doesn’t find libkineto, but I know libkineto.a is included in the libtorch package for mac arm64 (https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.3.0.zip). Could it be an issue with Homebrew?

By the way, have you tried the pre-built binary I linked above? Sorry for the ignorance, but I don’t even know if something built for macOS arm64 would work on both M1 and M2…

Version 0.0.6-alpha ( :adhesive_bandage: :pancakes: :broccoli:) is out!

Fixed a bug with multi-channel processing; thanks to @scztt for finding it and pointing it out.


Hi, yes I did try the prebuilt binary, no luck. I’ll try posting the error later.

Version 0.0.6-updated is out, with a reworked build system: it uses the latest possible torch version for each architecture, and also adds a build for macOS 14 (macos-latest).

@sslew maybe you want to try that one? hope it works

P.S. So far it has been tried and works on machines running Linux, Windows, and macOS (M1/M2).


Works, thank you very much!
