Speech synthesis

girthrub · June 13, 2023, 2:46pm

Hi all,

I’ve been messing around with a project where among other things I’d ideally have a synthesized voice recite a poem (or some text, anyway), and am wondering what the best approach to this would be. I’ve looked into e-speak and mbrola but those are both getting old - this isn’t a problem as far as the sound quality is concerned but more in terms of getting it to compile and interface with SC (given my very limited skills with C++). I’ve also seen oddvoices but I’m not really that interested in singing voices (and I’m not sure if there is a way to emulate speaking voices by using glissandi within a very limited range - in midi, too)…

So I’m curious what else is out there that I might not be aware of?

Seems like there should be something, given how widespread the technology is already (e.g. the ghastly tiktok text to speech, or people doing deepfakes of Bod Bylan singing the navy seals copypasta, etc. etc.)

What I think I need:

Some way to fine-tune the result (fix some pronunciations, accent, timing) - this is why I think I won’t get away with using the built-in macOS text-to-speech functionality. Perfectly fine (or even preferable) on the other hand if the input is phonetic rather than text.
Some not overcomplicated way to interface with SC, or at least a well-documented API for the command line.
Maybe - but not as important - something like real-time performance? (Ideally, and probably unrealistically, I imagine something like pattern integration where an Event would consist of phonetic data, timing data and accent data)…
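
A rough sketch of what such pattern integration could look like, assuming espeak-ng is installed and on the PATH. The event type name \espeak and the keys \phonemes, \wpm and \voicePitch are made up for illustration. The -s (words per minute) and -p (pitch, 0-99) flags and the [[...]] phoneme input are espeak options; the phoneme strings are a best guess at espeak's mnemonics for "hello" and "world". Each event spawns a new process, so timing will be loose rather than sample-accurate.

```supercollider
(
// Hypothetical event type: every event shells out to espeak-ng once.
Event.addEventType(\espeak, {
    // [[...]] tells espeak to read the text as phoneme mnemonics;
    // shellQuote protects the stress marks (') from the shell.
    var phonemes = ("[[" ++ ~phonemes ++ "]]").shellQuote;
    "espeak-ng -s % -p % %".format(~wpm, ~voicePitch, phonemes).unixCmd;
});

Pbind(
    \type, \espeak,
    \phonemes, Pseq(["h@l'oU", "w'3:ld"]), // assumed mnemonics, one word per event
    \wpm, Pseq([160, 90]),                 // second word spoken more slowly
    \voicePitch, 50,
    \dur, 1.5
).play;
)
```

Accent could probably be handled the same way, by moving the stress marks inside the phoneme strings.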

What I don’t need is for the voice to sound realistic or non-glitchy.

Thanks for any suggestions!

tedthetrumpet · June 14, 2023, 4:33am

Here’s my overcomplicated way of doing tts in sc on the mac:

"say hello".unixCmd // test
"say -v '?' ".unixCmd // list voices
"say -v 'Xander' hoe gaat het met u ".unixCmd // voice
"say -r 200 hello and welcome".unixCmd // rate
"osascript -e ' say \"hello\" ' ".unixCmd; // another way
"osascript -e ' say \"algorave generation we love repetition\" using \"Samantha\" pitch 126 ' ".unixCmd; //
Pbindef(\tts, \foo, Pfunc({"osascript -e ' say \"algorave generation we love repetition\" using \"Zarvox\" pitch 26 ' ".unixCmd}), \dur, 4).play

girthrub · June 14, 2023, 9:35pm

Thanks! I’ll have another look at oddvoices but it seems that espeak might be my best bet…

Found a thread on the maxmsp forum where somebody says the following, which sort of summarizes the trap I’ve fallen into here (believing that this should be easily doable):

The prevalence of blackbox (closed, all-in-one, but nowhere open source, or customizable) text-to-speech synthesis solutions in a lot of modern devices may lead you to believe it’s easy to do, but it is not the case if you don’t use said blackboxes.

Although it seems that espeak is already technically capable of doing some of the things I want, the problem there is more at the interface level than the synthesis level (e.g., getting it to slow down for just that one word…) Oh well…
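
One possible way to slow down a single word is espeak's SSML support, enabled with the -m flag. This is a minimal sketch, assuming espeak-ng is installed and on the PATH; ~slowWord is a hypothetical helper, not part of any library, and how strongly the voice reacts to the prosody tag may vary.

```supercollider
(
// Hypothetical helper: wraps every occurrence of `word` in an SSML prosody tag.
~slowWord = { |text, word, rate = "x-slow"|
    text.replace(word, "<prosody rate=\"%\">%</prosody>".format(rate, word));
};

~line = ~slowWord.("shall I compare thee to a summer's day", "compare");

// -m makes espeak-ng interpret the SSML markup instead of reading it aloud.
("espeak-ng -m " ++ ~line.shellQuote).unixCmd;
)
```

The same approach should extend to other prosody attributes such as pitch, at the cost of building the markup by hand for each line.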

LuxEtObscuritas · June 18, 2023, 2:22pm

If you are looking for very precise ways of tuning several speech parameters, maybe articulatory synthesis is worth looking into…

If you need a TTS approach then gamaTTS is interesting:

I personally had a hard time installing the editor on macOS but maybe you have a better routine in such things than I do (if you are a mac user). Anyways, the results sound great but I guess there are no options for performing live, unless you just want abstract speech sounds… Have a look at the demo…

I also came across this SuperCollider implementation of another articulatory approach but I didn’t manage to make it work:

Have a good one =)

jordan · June 18, 2023, 2:26pm

If anyone didn’t know…
they can be found here: Models - Hugging Face
… and some of them are very excellent! Many let you choose different voices out of the box, but you can’t edit the timbre or sonic character beyond that. However, they are very fast!

scztt · June 18, 2023, 2:48pm

IIRC there were some bugs in the SuperCollider implementation of this, but even after fixing them it sounded to me like something was not working correctly? It was code written specifically for that paper, so possibly it isn’t really ready for more general consumption.

fmiramar · June 18, 2023, 6:02pm

Do you still have the fixed version and could you please share it? I tried to compile it, but there were lots of compilation errors that I could not manage…

scztt · June 19, 2023, 5:35pm

Sorry, I don’t think I do. I’ll look on an old computer when I have a chance… but honestly, I wouldn’t get your hopes up. I spent a good 8 hours on it, had it compiling and fully running and the sound was just not doing what it should have been doing, I was a bit out of ideas.

If it’s interesting, I’m midway through a port of Pink Trombone, which I believe is roughly the same sort of thing… I’ve probably got another few days of work before I’m ready to post a beta, but it’s working well and sounds incredible.

fmiramar · June 19, 2023, 7:04pm

Great news!! Pink Trombone sounds really realistic indeed! Looking forward to seeing it!

LuxEtObscuritas · June 20, 2023, 5:51pm

So much looking forward!

girthrub · October 4, 2023, 12:53am

Hey scztt, I just thought of this again, did that port of Pink Trombone come to fruition? Just curious really, I don’t really have time for anything except my dissertation at this point anyway, still it would be so cool…