Using visuals as MIDI for SuperCollider

Hi there,
I am having difficulty finding information online about SC reacting to visuals on screen. The opposite of what I’m looking for (visuals created from audio input) is easy enough to find and understand.

However, I am looking to use SC, and whatever other programs are needed, to compose music from a short piece of film. For example, there are apps that can use a camera to track movement and translate it into MIDI data (essentially a theremin), but mine would take a film as input and output sounds based on it.

Admittedly this might be far too ambitious, but even having it read something simple from the video (such as colour or movement) and react accordingly (changing timbre or frequency) would be a start.

Any advice would be great! Apologies for such a long question, cheers!

You could find the average brightness of each frame using Processing, and then map those values to an amplitude envelope inside SuperCollider?
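For what it’s worth, the brightness-to-envelope idea can be sketched in a few lines of plain Python (Python only for illustration; the same logic ports directly to Processing’s `pixels[]` array). The luma weights are the standard BT.601 ones, and the “frames” here are toy pixel lists, not real video data:

```python
def average_brightness(frame):
    """Mean perceived brightness of a frame, normalised to 0.0-1.0.

    `frame` is a list of (r, g, b) tuples with 0-255 components.
    """
    total = 0.0
    for r, g, b in frame:
        # Standard luma weighting (ITU-R BT.601)
        total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / (len(frame) * 255.0)

def brightness_to_amp(brightness, floor=0.0, ceil=1.0):
    """Linearly map a 0-1 brightness value into an amplitude range."""
    return floor + brightness * (ceil - floor)

# One dark frame, one bright frame, standing in for real video
dark = [(10, 10, 10)] * 4
light = [(250, 250, 250)] * 4
envelope = [brightness_to_amp(average_brightness(f)) for f in (dark, light)]
```

Collect one value per frame and you have the breakpoints of an amplitude envelope to feed into SC (e.g. via an `Env` or a control bus).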

Your reply makes me think my issue might be better directed to a Processing forum. Have you any experience with the marriage of the two programs? Do you think Processing is probably the best software for the job?

I use both programs, but have never conjoined them in holy matrimony in this way. Would be an interesting experiment though.

It depends on what exactly you are trying to do. Processing is free, and easy to learn, which makes it an attractive option for me.

If you can save the frames as sequentially named .png or .jpg files in a folder, it wouldn’t be very hard to get Processing to find the average pixel brightness of each frame. Then it is just a matter of getting the values into SuperCollider and using them to make an envelope that controls something - again, not rocket science, so long as you have a feeling for what you might want it to do.
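As a tiny illustration of the “sequentially named files” step, in Python (the folder layout and `frame*.png` naming pattern are assumptions for the example; decoding the pixels themselves would happen in Processing or with an imaging library):

```python
import glob
import os

def frame_files(folder):
    """Return the frame files of `folder` in playback order.

    Plain lexicographic sorting works because names like frame0001.png
    are zero-padded to equal length.
    """
    return sorted(glob.glob(os.path.join(folder, "frame*.png")))
```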

imo, tracking movement would be prohibitively difficult in Processing.

Thank you for this great response. Agreed on Processing; I’ve not really used it, but it looks like a sibling to SuperCollider.
I’d never considered doing it frame by frame rather than treating the film as a colour/object-tracking problem (which I know Processing can do, but I can’t necessarily do in Processing, haha).

I’ll try and see how I do over the next few weeks and might even post some progress/ideas/questions here!

I have used Pure Data with GEM extensions for some (really basic, almost childishly rudimentary) video analysis, sending the results to SC using OSC messages.

GEM, btw, seemed to be a mostly dormant project for some years, but has recently had a new release that is much easier to install than older releases. It’s a reasonable way to go.


I think a typical approach would be to write the video analysis part in a language like Python or even C++, and then send OSC from that program to SuperCollider. Doing video analysis directly in SuperCollider would probably be very challenging, and I don’t expect it would be performant enough (depending on the complexity of what you’re trying to accomplish).
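As a sketch of the Python-to-SuperCollider leg, here is a single-float OSC message built by hand using only the standard library (in practice a library like python-osc is more convenient, but the wire format is simple). The address `/brightness` and port 57120 (sclang’s default) are assumptions for the example:

```python
import socket
import struct

def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC requires."""
    data += b"\x00"
    while len(data) % 4:
        data += b"\x00"
    return data

def osc_message(address: str, value: float) -> bytes:
    """Encode an OSC message carrying one big-endian float32 argument."""
    return (osc_pad(address.encode("ascii"))
            + osc_pad(b",f")          # type tag string: one float
            + struct.pack(">f", value))

def send_to_sc(value, host="127.0.0.1", port=57120):
    """Fire one UDP packet at sclang. Address and port are assumptions."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(osc_message("/brightness", value), (host, port))
    sock.close()
```

On the SC side you’d pick the values up with something like `OSCdef(\bright, { |msg| msg[1].postln }, '/brightness')`.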

Hey there,

I sat and tried to do this all morning. I’ve used PD but not the extended version (needed for GEM). Sadly, since I’m using a Windows 10 laptop and QuickTime is no longer supported, PD Extended won’t even let me open videos in any codec. Any ideas on how to rectify this?

This is similar to what I’m looking to do (although I’d have liked to push it further). Any ideas on how I could achieve this with the likes of Python?

Totally agree. Use processing for visual analysis if you are comfortable with it and then send whatever info you need mapped to audio via OSC.

As for “sensing” the image, there are many ways to do it. The easiest approach might be getting the pixel data (from a camera, video, or image) parsed and processed in Processing, and then sending the values you need via OSC to SC.

My method uses Jitter, but Processing would probably work as well. In addition to just fetching the average brightness, it’s good to break the screen up into a grid. It doesn’t have to be super fine, but the more squares in the grid, the more you can make use of the geometry of your input video. Then fetch RGB intensities for each square separately, and send them to SC via OSC messages. I’ve used that method for a bunch of experiments; it works very well.
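The grid idea might look like this in Python (illustration only; the nested-list “frame” stands in for real pixel data, and the row-major cell order is an arbitrary choice):

```python
def grid_rgb(frame, rows, cols):
    """Average (r, g, b) per grid cell, row-major.

    `frame` is a list of rows, each row a list of (r, g, b) tuples.
    Assumes the frame divides evenly into the grid, for simplicity.
    """
    h, w = len(frame), len(frame[0])
    cell_h, cell_w = h // rows, w // cols
    cells = []
    for gy in range(rows):
        for gx in range(cols):
            acc = [0, 0, 0]
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                for x in range(gx * cell_w, (gx + 1) * cell_w):
                    for i in range(3):
                        acc[i] += frame[y][x][i]
            n = cell_h * cell_w
            cells.append(tuple(c / n for c in acc))
    return cells
```

A coarse grid (say 4x4) already gives 48 control values per frame, which is plenty to map onto a bank of synth parameters over OSC.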

There are very good higher-level machine vision and tracking libraries for Processing, OpenFrameworks, Cinder, and Jitter. It’s really worth spending some time searching for these, lest you enter the endless rabbit hole of brewing your own video tracking (though this may be the goal too!)

You don’t need pd-extended. AFAICS, pd-extended hasn’t been supported in some time, so if you are using it, you are probably using an old version and are likely to hit problems.

I’d suggest using the latest available pd-vanilla[1] and adding the latest available GEM[2] to it.

[1] – currently 0.49-1 dated 10/2018

[2] currently

I could install the latest GEM easily on Windows 10. Not sure about video support, but I’d guess that the latest GEM (released 3/2019 – really, don’t use any older version than this!) has much better video support.


I did this by using Processing with a certain blob library (can’t look up the specifics now – not at home) and used OSC messages from Processing to SuperCollider to control the sound.

Yes, it is Ecco the Dolphin controlling SC :stuck_out_tongue:

This is insanely great, holy hell. I’d love to know more about how you did this, if possible!

I just came back from the Vatican.

But yes thank you for the enthusiasm :slight_smile:

Unfortunately, I don’t have the code anymore, as I made it around 10 years ago. It shouldn’t be that hard to implement, however. I’m pretty sure I used this Processing library for the blobs: And then sent the detected blob bounding boxes’ width/height and x/y data to SuperCollider with OSC to control synth parameters.
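The mapping step described here might look something like this in Python. Only the bounding-box-to-parameter idea comes from the post; the frame size, parameter names, and ranges are invented for the sketch:

```python
def blob_to_params(x, y, w, h, frame_w=640, frame_h=480):
    """Turn one blob bounding box into normalised synth parameters.

    Frame size, parameter names, and ranges are all assumptions.
    """
    return {
        # horizontal centre of the blob -> stereo pan in -1..1
        "pan": (x + w / 2) / frame_w * 2 - 1,
        # vertical centre -> frequency, higher on screen = higher pitch
        "freq": 1000 - (y + h / 2) / frame_h * 800,
        # blob area relative to the frame -> amplitude, clipped at 1
        "amp": min(1.0, (w * h) / (frame_w * frame_h) * 4),
    }
```

Each resulting dict could then be flattened into one OSC message per detected blob.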