This is just a question out of curiosity.
Given the relationship between sclang and the audio server (scsynth or supernova), one can imagine an analogous video server that handles animation by playing video files with effects or by applying various kinds of processing to images. (I am not sure whether non-animated still images should be handled by the video server.)
I think even the Pen class could be part of such a video server.
However, SuperCollider has no such separate server. To do this, users need a cross-application approach with Processing, madmax, etc.
Is there a reason why SuperCollider does not have a video server? Is it because it is focused on audio synthesis? There seem to be some quarks for that. Are they stable? (Some operations seem to be possible via WebView as well.)