Is there any support in SC for doing speech synthesis (creating voices for talking rather than singing)? With Text to Speech (TTS) applications getting better now, I am wondering if there are any UGens in SC related to this.
That reminds me of a question of mine from a year or two back (Speech synthesis). Basically, the short answer back then was:
a) maybe Pink Trombone, but there is no SC port available
b) the TTS advances are mostly “AI” generated, so they can’t be ported to SC as straightforwardly as conventional DSP, and will probably be tricky in real time. Maybe this has changed since then.
There are some low-latency open source text-to-speech tools that could maybe be used… but something like Pink Trombone seems super interesting… there is apparently a C++ port:
hmmm
Thanks all for the info, and it looks like @girthrub has already asked the question. Basically, I’d just like to be able to synthesize vocal sounds in a programming language context like SC, and write routines to manipulate vocal parameters (even if the vocal sounds are somewhat robot-like and not as polished as what AI TTS does). An advantage of using SC would be the ability to combine with other UGens for interesting effects. Otherwise, I guess I could look into C++, though I am not as fluent with that.
I hadn’t heard of Pink Trombone, but it seems there may be an SC port (though I’m not sure if it is working).
I also noticed there was a thread on a Vowel class, so maybe working with that and the Formant UGen might be something to look into.
Ah nice, I was somehow firmly convinced that there was no SC port around, no idea why.
I’ve used FormantTable from sc3-plugins along with PM synthesis to produce vowels that sound a lot better than they have any right to (since it’s “just” PM).
hjh
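For anyone curious what "formants plus PM" can look like, here is a minimal sketch of one common approach, not the patch described above. Each formant gets a sine carrier tuned to the harmonic of the fundamental nearest the formant centre, and every carrier is phase-modulated by a sine at the fundamental. It is written in Python only so it can be run and checked outside SC. The formant values are illustrative round numbers for an "ah"-like vowel, not data read from FormantTable, and `nearest_harmonic` and `pm_vowel` are made-up names.

```python
import math

SAMPLE_RATE = 44100

# Illustrative (centre frequency in Hz, linear amplitude) pairs for an
# "ah"-like vowel. Assumed round numbers, NOT taken from FormantTable.
FORMANTS_AH = [(600.0, 1.0), (1040.0, 0.45), (2250.0, 0.35)]


def nearest_harmonic(f0, target):
    """Return the harmonic of f0 closest to target (at least the 1st)."""
    return max(1, round(target / f0)) * f0


def pm_vowel(f0=110.0, formants=FORMANTS_AH, index=1.0, dur=0.5, sr=SAMPLE_RATE):
    """Sum of phase-modulated sine carriers, one per formant.

    Carriers sit on harmonics of f0, so the result stays harmonic;
    the modulation index controls how wide each formant bump is.
    """
    carriers = [(nearest_harmonic(f0, fc), amp) for fc, amp in formants]
    norm = sum(amp for _, amp in carriers)  # keeps output within [-1, 1]
    out = []
    for i in range(int(dur * sr)):
        t = i / sr
        mod = index * math.sin(2 * math.pi * f0 * t)  # shared modulator at f0
        s = sum(amp * math.sin(2 * math.pi * fc * t + mod) for fc, amp in carriers)
        out.append(s / norm)
    return out


if __name__ == "__main__":
    samples = pm_vowel()
    print(len(samples), "samples, peak", max(abs(s) for s in samples))
```

In SC the same idea should map onto phase-modulating `SinOsc` (or using `PMOsc`) with one oscillator per formant; raising `index` widens each formant region and makes the vowel brighter and buzzier.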
Yep, it works, though the repo owner calls it a work in progress and says they would like to expose more params to SC…
Thanks everyone for your help with some things to explore!
I’ll be watching this thread with interest, as speech synthesis is something I’ve long been fascinated with myself.
One thing that’s worth considering: if you need intelligible words, it’s not just a question of modelling or otherwise approximating and parameterising the vocal tract. You’ll also need some kind of text-to-phoneme-stream conversion, paired with preset parameter settings for vowels, consonants, etc.
This is probably not the kind of thing SuperCollider would be good at, but I’d be interested to see someone have a stab at it, to control, say, the PinkTrombone vocal synthesiser.
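To make the two stages concrete, here is a rough sketch of that pipeline: a dictionary lookup turns text into a phoneme stream, and a preset table turns each phoneme into timed synth parameters. Everything in it is an assumption for illustration. The tiny `LEXICON`, the ARPAbet-style symbols, the parameter names and all the numbers are placeholders, not measured data or PinkTrombone's actual controls.

```python
import re

# Hypothetical mini-lexicon: word -> phoneme symbols (ARPAbet-style).
LEXICON = {
    "hello": ["HH", "EH", "L", "OW"],
    "you": ["Y", "UW"],
}

# Hypothetical presets: rough placeholder values, for illustration only.
# f1/f2 are formant centres in Hz (None for noise or silence).
PRESETS = {
    "HH": {"voiced": False, "f1": None, "f2": None, "dur_ms": 60},
    "EH": {"voiced": True, "f1": 530, "f2": 1840, "dur_ms": 120},
    "L": {"voiced": True, "f1": 360, "f2": 1300, "dur_ms": 70},
    "OW": {"voiced": True, "f1": 570, "f2": 840, "dur_ms": 160},
    "Y": {"voiced": True, "f1": 280, "f2": 2200, "dur_ms": 60},
    "UW": {"voiced": True, "f1": 300, "f2": 870, "dur_ms": 160},
    "SIL": {"voiced": False, "f1": None, "f2": None, "dur_ms": 100},
}


def text_to_phonemes(text):
    """Look each word up in LEXICON, inserting SIL between words.

    Unknown words raise KeyError; a real system would need a full
    pronouncing dictionary or letter-to-sound rules here.
    """
    words = re.findall(r"[a-z']+", text.lower())
    phonemes = []
    for i, word in enumerate(words):
        if i > 0:
            phonemes.append("SIL")
        phonemes.extend(LEXICON[word])
    return phonemes


def phonemes_to_events(phonemes):
    """Attach preset parameters and a start time (ms) to each phoneme."""
    events = []
    t = 0
    for p in phonemes:
        preset = PRESETS[p]
        events.append({"phoneme": p, "start_ms": t, **preset})
        t += preset["dur_ms"]
    return events


if __name__ == "__main__":
    for event in phonemes_to_events(text_to_phonemes("Hello, you")):
        print(event)
```

A list of events like this could be sent to SC over OSC, or the two tables could be rebuilt as Dictionaries in sclang and stepped through in a Routine. The hard parts this sketch skips are the transitions between phonemes (coarticulation) and consonants, which need more than a pair of formant values.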