Hi Eleanor,
Thank you very much for your response about the Common Voice models.
I've been looking at the exact phonemes used by, for example, the english_mfa acoustic model:
a aj aw aː b bʲ c cʰ d dʒ dʲ d̪ e ej f fʲ h i iː j k kʰ l m mʲ m̩ n n̩ o ow p pʰ pʲ s t tʃ tʰ tʲ t̪ u uː v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔ ɔj ə əw ɚ ɛ ɛː ɜ ɜː ɝ ɟ ɡ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ
While this set is IPA-based, there seem to be some differences between it and what Epitran produces for some simple English sentences (if I'm not mistaken, the IPA column here). I'm wondering whether these differences arise because some phonemes returned by Epitran are missing from some MFA acoustic models, or rather because of small notational differences. I remember reading somewhere that the MFA phoneme set is a variation of IPA, which would suggest the latter.
My goal is to find out whether Epitran can serve as a quick OOV fallback whose output can later be used with MFA acoustic models; for this to work, the phoneme sets must match. (I'm aware there's "mfa g2p" to handle OOVs, but for various reasons using Epitran would be more convenient in some of my use cases.) If I knew what the notational differences are (if there are any), it might be possible to build a mapping between the two, since both are based on IPA.
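To make the idea concrete, here is a minimal sketch of the kind of mapping I have in mind. The inventory is the english_mfa list above; the function name `to_mfa` and the entries in `EPITRAN_TO_MFA` are hypothetical placeholders that I have not verified against actual Epitran output. The input is assumed to be a list of phone segments that has already been produced (for example by Epitran), so the sketch does not call Epitran itself.

```python
import unicodedata

def _nfc(s):
    # Normalize so visually identical phones compare equal.
    return unicodedata.normalize("NFC", s)

# Phone inventory of english_mfa, copied from the list above.
MFA_PHONES = {
    _nfc(p)
    for p in (
        "a aj aw aː b bʲ c cʰ d dʒ dʲ d̪ e ej f fʲ h i iː j k kʰ l m mʲ m̩ "
        "n n̩ o ow p pʰ pʲ s t tʃ tʰ tʲ t̪ u uː v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː "
        "ɔ ɔj ə əw ɚ ɛ ɛː ɜ ɜː ɝ ɟ ɡ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ"
    ).split()
}

# HYPOTHETICAL notational mapping (placeholder entries, not verified).
EPITRAN_TO_MFA = {
    "g": "ɡ",  # ASCII g -> IPA script g (U+0261)
    "r": "ɹ",  # assumed broad "r" -> MFA "ɹ"
}

def to_mfa(phones, mapping=EPITRAN_TO_MFA, inventory=MFA_PHONES):
    """Map a list of phone segments onto the MFA inventory.

    Returns (mapped, unknown): the mapped phones, plus those that are
    still not in the inventory after mapping.
    """
    mapped, unknown = [], []
    for phone in phones:
        phone = _nfc(phone)
        target = mapping.get(phone, phone)  # pass through if unmapped
        mapped.append(target)
        if target not in inventory:
            unknown.append(target)
    return mapped, unknown

if __name__ == "__main__":
    mapped, unknown = to_mfa(["r", "ɛ", "d"])
    print(mapped, unknown)
```

Anything that ends up in `unknown` would be either a phoneme genuinely missing from the acoustic model or a notational difference that still needs an entry in the mapping, which is exactly the distinction I'm asking about.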
Another option would be to use the Common Voice (CV) models, since they use Epitran, but it seems there are no acoustic models for some of the languages I need.
Regards,
Leandro