Hi,
I'm a student using Kaldi for a course project. The ultimate goal is to build a TTS system based on LPC coefficients. I have been working from the Tedlium recipe, adapted to use LPC coefficients, and have trained an encoder-decoder network (the Tedlium configuration run forward for the encoder and reversed for the decoder). I am hoping to add a front-end that takes text in and feeds it into the decoder, and then attach a vocoder to turn the LPC coefficients into audio.
I'm now working on a front-end that takes text as input and outputs either raw features or posterior pdfs (ideally matching the output of the Tedlium configuration). Is this possible, and do you have any suggestions for how to approach it?
Is there a way to use the alignment information for this? I see some scripts that convert text to a phoneme sequence, but I'm not sure how to go from there.
Any insight or advice in general would be greatly appreciated.
Thanks so much,
Eliezer