RTNeural real-time neural inference UGen

Dear all,

I have made a new plugin which implements the RTNeural real-time neural inference engine:

A mac universal release build is here:

RTNeural is a SuperCollider UGen that uses the RTNeural inference engine to load and run RTNeural-format neural network models at audio and control rates. See the RTNeural GitHub page (github.com/jatinchowdhury18/RTNeural) for the list of supported neural network layers.

Models can be trained in PyTorch or TensorFlow. RTNeural was designed around TensorFlow/Keras-style models, but I give examples of how to save Linear, LSTM, and GRU based networks in the correct format.
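To make the "correct format" idea concrete, here is a minimal sketch of packing a trained layer's weights into a Keras-style JSON file. Note this is illustrative only: the canonical schema is defined by RTNeural's JSON parser and by the export scripts shipped with the plugin, so the field names below (`in_shape`, `layers`, `type`, `weights`, etc.) are assumptions to be checked against those.

```python
# Illustrative sketch only: the real schema is defined by RTNeural's JSON
# parser and the plugin's export scripts; field names here are assumptions.
import json

def dense_layer(weight_rows, bias, activation="tanh"):
    """Pack one Dense layer as a Keras-style JSON dict (assumed field names)."""
    return {
        "type": "dense",
        "activation": activation,
        "shape": [None, None, len(bias)],
        "weights": [weight_rows, bias],  # [W, b], W as nested lists
    }

# A toy 1-in / 2-out Dense layer: y = activation(W.T @ x + b)
model = {
    "in_shape": [None, None, 1],
    "layers": [dense_layer([[0.5, -0.25]], [0.0, 0.1])],
}

with open("model.json", "w") as f:
    json.dump(model, f, indent=2)

# Round-trip check: the file reloads into the same structure
with open("model.json") as f:
    loaded = json.load(f)
assert loaded["layers"][0]["type"] == "dense"
```

The point is just that the on-disk format is plain JSON, so any training framework that can dump nested lists of floats can produce it.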

-Linux users should be able to build with the build instructions.
-HELP WANTED - The README has instructions for building on Windows. I am able to build it under Parallels, but the resulting scx file is corrupted; I think this is a Windows ARM thing. If anyone can build it on a Windows PC and gets it working, please DM me.

Any feedback on this is welcome.

FYI - I have also made a UGen that does real-time inference with ONNX Runtime. I still need to polish it up. It runs essentially the same way as RTNeural, BUT it loads standard ONNX files (which PyTorch can save directly) and uses ONNX Runtime for inference. The disadvantage is that it is much less efficient (2-10x slower). I will polish it up and release it soon. My hunch is that there will be better and better inference engines (ONNX Runtime is already real-time safe) that can load the ONNX format, so going in that direction could have advantages.

Sam


I realize the question was erased, but two things:

  1. There is a huge help file for this plugin, plus a bunch of examples in Python as well as SuperCollider. Training for this plugin needs to happen in Python, and I have shown how to do some things.

  2. The plugin is capable of far more than I achieve in the examples. If you can train a network and save it in the right format, the UGen should be able to run inference on it. I honestly don’t know how this will be useful; probably someone will find an unexpected use for it. I enjoyed the process of making a general-purpose real-time audio plugin for neural networks, but as with many things AI, the research on how to train models correctly for audio tasks is at a nascent stage. There are some nice papers out there with some good ideas!

Sam


Thanks, @Sam_Pluta !!
Quick question, what is the difference between RTNeural and Proteus? Is RTNeural a more general implementation that integrates Proteus?

Best,

José

Yes, exactly. Proteus is an LSTM neural net with a hidden size of 40. All of the Proteus models were trained on 44.1 kHz, 16-bit audio. The inference is handled by the RTNeural library, and since it is a fixed graph, it is actually more efficient than the RTNeuralUGen.
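For a feel of the per-sample work an LSTM-based model like Proteus does at audio rate, here is a from-scratch sketch of a single LSTM time step in plain Python. The weights are toy values, not from any real model, and hidden size 2 stands in for Proteus's 40.

```python
# One LSTM time step from scratch, to illustrate the per-sample computation
# an LSTM-based model performs at audio rate. Toy weights, hidden size 2
# (Proteus uses 40). Gate order assumed: input, forget, cell, output.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """Advance hidden state h and cell state c by one input sample x."""
    n = len(h)
    # Pre-activations for all 4*n gates: b + W*x + U@h
    gates = []
    for j in range(4 * n):
        s = b[j] + W[j] * x
        for k in range(n):
            s += U[j][k] * h[k]
        gates.append(s)
    i = [sigmoid(v) for v in gates[0:n]]            # input gate
    f = [sigmoid(v) for v in gates[n:2*n]]          # forget gate
    g = [math.tanh(v) for v in gates[2*n:3*n]]      # candidate cell
    o = [sigmoid(v) for v in gates[3*n:4*n]]        # output gate
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new

n = 2
W = [0.1] * (4 * n)
U = [[0.05] * n for _ in range(4 * n)]
b = [0.0] * (4 * n)
h, c = [0.0] * n, [0.0] * n
h, c = lstm_step(0.5, h, c, W, U, b)  # one audio sample in, new state out
```

Running this once per sample, with hidden size 40, is roughly the inner loop that the fixed-graph Proteus implementation gets to optimize.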

RTNeural, the UGen, can load Proteus networks that have been saved in RTNeural format (I provide a script to convert a PyTorch JSON to an RTNeural/Keras-style JSON).

But it can also load any network saved in RTNeural format - it can use Dense, GRU, LSTM, Conv1D, Conv2D, BatchNorm1D, and BatchNorm2D layers and supports most activation functions. So if you train a GRU model with hidden_size 80 at 96 kHz on 24-bit files, RTNeural can load and run that model. Proteus cannot.

Another way to put it: tequila is a kind of mezcal made from the blue agave plant, but there are around 30 other kinds of mezcal made from different agaves, none of which are tequila. Proteus is tequila. RTNeural is mezcal.

BTW - I just published a bug fix to both RTNeural and Proteus having to do with SynthDef graph ordering problems. So if you want to use them, you should probably update.

Sam

Thanks for the explanation!
I really liked the comparison with tequila 🙂

I’m sure I’ll use it!
I’d like to know if it’s possible to get more timbre variation, for example by changing a model’s parameters. Or do you know if it’s possible to interpolate between models, for example moving from one to another to hear the intermediate states, or to create a dynamic change in the processing?
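To make the interpolation idea concrete, I imagine something like this done offline: linearly blending the weights of two models that share the same architecture, then saving a new model file per blend amount. This is just my sketch of the question, not something the UGen is known to support, and whether intermediate points sound meaningful presumably depends on the models.

```python
# Offline sketch of the interpolation idea: blend the weights of two models
# with identical architecture. Whether intermediate blends sound meaningful
# is an open question; the structures below are toy stand-ins for real models.
def lerp(a, b, t):
    """Elementwise linear interpolation over arbitrarily nested lists."""
    if isinstance(a, list):
        return [lerp(x, y, t) for x, y in zip(a, b)]
    return (1.0 - t) * a + t * b

model_a = {"weights": [[0.0, 1.0], [2.0, 3.0]]}
model_b = {"weights": [[4.0, 5.0], [6.0, 7.0]]}

# t = 0.5: halfway between the two models
halfway = {"weights": lerp(model_a["weights"], model_b["weights"], 0.5)}
# halfway["weights"] == [[2.0, 3.0], [4.0, 5.0]]
```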

Thanks again!

Best,

José