Is there any support in SC for speech synthesis (creating voices for talking rather than singing)? With text-to-speech (TTS) applications getting better now, I'm wondering if there are any UGens in SC related to this.
That reminds me of a question of mine from a year or two back (Speech synthesis). Basically, the short answer back then was:
a) maybe pinktrombone, but there is no sc port available
b) the TTS advances are mostly "AI" generated, so can't be ported to SC as straightforwardly as conventional DSP and will probably be tricky in real time? Maybe this has changed since then.
There are some low-latency open-source text-to-speech tools that could maybe be used... but something like Pink Trombone seems super interesting... there is apparently a cpp port:
hmmm
Thanks all for the info, and it looks like @girthrub has already asked the question. Basically, I'd just like to be able to synthesize vocal sounds in a programming language context like SC, and write routines to manipulate vocal parameters (even if the vocal sounds are somewhat robot-like and not as polished as what AI TTS does). An advantage of using SC would be the ability to combine with other UGens for interesting effects. Otherwise, I guess I could look into C++, though I'm not as fluent with that.
I hadn't heard of Pink Trombone, but it seems there may be an SC port (though not sure if it's working).
I also noticed there was a thread on a Vowel class, so maybe working with that and the Formant UGen might be something to look into.
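For anyone wanting to try the formant route with core UGens only, here is a minimal sketch, assuming the server is already booted. It sums one Formant oscillator per formant and crossfades between two vowels. The formant numbers are the widely reproduced tenor "a" and "i" values from the Csound manual's formant table, and the names `vowelA`, `vowelI` and `\morph` are made up for this example. It does not use the Vowel class.

```supercollider
(
// Illustrative formant data: [centre frequencies (Hz), levels (dB), bandwidths (Hz)].
// Tenor "a" and "i" as listed in the Csound manual's formant table.
var vowelA = [[650, 1080, 2650, 2900, 3250], [0, -6, -7, -8, -22], [80, 90, 120, 130, 140]];
var vowelI = [[290, 1870, 2800, 3250, 3540], [0, -15, -18, -20, -30], [40, 90, 100, 120, 120]];

Ndef(\vowel, {
    var f0 = \freq.kr(110);
    var morph = \morph.kr(0).lag(0.2); // 0 = "a", 1 = "i"
    // linear interpolation between the two vowel tables
    var interp = { |i| vowelA[i] + (morph * (vowelI[i] - vowelA[i])) };
    var freqs = interp.(0);
    var amps = interp.(1).dbamp;
    // Formant expects its bandwidth argument to be at least the fundamental
    var bws = interp.(2).max(f0);
    var sig = Formant.ar(f0, freqs, bws, amps).sum;
    (sig * \amp.kr(0.1)).dup
}).play;
)

// crossfade towards "i"
Ndef(\vowel).set(\morph, 1);
```

Interpolating the levels in dB and converting afterwards keeps the crossfade smoother than interpolating linear amplitudes, but either would do for experimenting.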
ah nice, I was somehow firmly convinced that there was no sc port around, no idea why
I've used FormantTable from sc3-plugins along with PM synthesis to produce vowels that sound a lot better than they have any right to (since it's "just" PM).
hjh
yep it works, though the repo owner calls it a work in progress and says they would like to expose more params to sc...
Thanks everyone for your help with some things to explore!
I'll be watching this thread with interest, as speech synthesis is something I've long been fascinated with myself.
One thing that's worth considering: if you need intelligible words, it's not just a question of modelling or otherwise approximating and parameterising the vocal tract. You'll also need some kind of text-to-phoneme-stream conversion, paired with preset parameter settings for vowels, consonants etc.
This is probably not the kind of thing SuperCollider would be good at, but I'd be interested to see someone have a stab at it, to control, say, the PinkTrombone vocal synthesiser.
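To make the idea concrete, here is a rough sketch of that pipeline in sclang, assuming a booted server. Everything in it is hypothetical: the preset table holds only approximate first and second formant values for five vowels, the "text to phoneme" step just keeps the letters that have a preset, and two band-pass filters on a sawtooth stand in for a real vocal tract model such as PinkTrombone.

```supercollider
(
// Hypothetical presets: phoneme symbol -> rough first and second formant (Hz).
// Vowels only; consonants would need noise bursts, gaps and so on.
~presets = (
    a: [700, 1200],
    e: [500, 1900],
    i: [300, 2300],
    o: [500, 900],
    u: [300, 800]
);

// Naive "text to phoneme" step: keep only the characters that have a preset.
~toPhonemes = { |text|
    var symbols = text.as(Array).collect { |ch| ch.toLower.asString.asSymbol };
    symbols.select { |ph| ~presets[ph].notNil }
};

// Two band-pass filters on a sawtooth stand in for a vocal tract model.
Ndef(\talk, {
    var src = Saw.ar(\freq.kr(110));
    var f1 = \f1.kr(700).lag(0.05);
    var f2 = \f2.kr(1200).lag(0.05);
    (BPF.ar(src, [f1, f2], 0.1).sum * \amp.kr(0.5)).dup
}).play;

// Step through the phoneme stream, applying one preset per step.
~say = { |text, dur = 0.25|
    Routine {
        ~toPhonemes.(text).do { |ph|
            var f = ~presets[ph];
            Ndef(\talk).set(\f1, f[0], \f2, f[1]);
            dur.wait;
        };
    }.play;
};
)

~say.("audio"); // steps through a, u, i, o (the "d" has no preset and is dropped)
```

A real version would swap the letter filter for a pronunciation dictionary or rule set and give each phoneme its own duration, but the shape (lookup table plus a Routine that sets synth parameters) would stay much the same.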
this seems somewhat doable - not sure how realistic the ultimate quality can get but worth a stab...
Regarding PinkTrombone: This person here extended the original version in various ways (including an interface for text input), but it's mostly written in JavaScript. The current version of the SC port works well, but I was unable to approximate the parameter combinations needed for certain phonemes. There is a text-to-speech demo video out there, but it's far from the quality of current neural text-to-speech systems/models, at least when it comes to intelligibility.
Back in the day there was Speech, which was rather limited and is also deprecated now. So if you're looking explicitly for text-to-(intelligible)-speech with modulatable parameters, there currently seems to be no option...
Interesting! My aim, personally, isn't to create super-realistic speech. I like obviously synthetic speech and vocal-like sounds. That said, some kind of text-to-phoneme script would be great.
I'll probably be writing some kind of script for the Norns audio computer, so if I come up with anything in SC, it will likely be the sound-generation part only, controlled by a Lua script for Norns.
I'm reasonably au fait with JavaScript, so I may be able to convert relevant control code from JS to Lua.
Incidentally, does anyone happen to know what the "VO-6" speech-synth engine in the Elektron Monomachine is derived from?
Claude identified some of the parameters not accessible in the port - one was "velar" IIRC - might be worth vibe-coding the port to add these... I'll try to remember to have a whack... these might be useful: Voximplant Docs
I have a bit of an obsession with Speak'n'Spell/Texas Instruments-style LPC speech. I wonder if I could use Teachable Machines to find parameter settings for standard English phonemes.
The nice thing about LPC speech resynthesis is you can really f*ck it up in interesting ways by feeding random values to the model.
Well, there is still Speech synthesis | Apple Developer Documentation - looks like a fun project/plugin to write, using e.g. plugin commands to send the string.
Please note that GitHub - v7b1/mi-UGens: some mutable instruments eurorack modules ported to SuperCollider · GitHub has a port of Plaits, which has:
A collection of speech synthesis algorithms (formant filter, SAM, LPC), with phoneme control and formant shifting. Several banks of phonemes or segments of words are available.
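(
// Requires MiPlaits from mi-UGens; engine 7 selects the speech engine.
Ndef(\x, {
    MiPlaits.ar(
        pitch: 45,
        engine: 7,
        // per the Plaits manual: harm moves between formant, SAM and LPC banks,
        // timbre shifts the formants, morph picks the phoneme or word segment
        harm: \harm.kr(0.1, spec: [0.0, 1.0]),
        timbre: \timbre.kr(0.5, spec: [0.0, 1.0]),
        morph: \morph.kr(0.5, spec: [0.0, 1.0]),
        trigger: Impulse.kr(\speed.kr(1.5)),
        fm_mod: \fmMod.kr(0.0, spec: [0.0, 1.0]),
        timb_mod: \timbMod.kr(0.0, spec: [0.0, 1.0]),
        morph_mod: \morphMod.kr(0.0, spec: [0.0, 1.0])
    ) * \amp.kr(0.2)
}).play.gui
)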
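As a possible next step, the parameters of `Ndef(\x)` above could be driven from a task instead of the GUI. This is only a sketch: the name `\babble` is made up, and it assumes, going by the Plaits manual, that `morph` picks the phoneme and `timbre` shifts the formants in the speech engine.

```supercollider
(
// Assumes Ndef(\x) from the example above is already playing.
Tdef(\babble, {
    loop {
        // jump to a random phoneme (morph) and formant shift (timbre)
        Ndef(\x).set(\morph, 1.0.rand, \timbre, 1.0.rand);
        [0.125, 0.25, 0.5].choose.wait;
    }
}).play;
)

// Tdef(\babble).stop;
```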
I'm a bit of an MI fanboy, going back to before they started making Eurorack modules, so I'm aware of Plaits.