MFA vs Epitran phoneme sets

Leandro Graciá Gil

Aug 11, 2023, 1:47:16 PM
to MFA Users
Hi,

I've noticed that some pretrained models come in an MFA version, a CV version, or both. The former uses the MFA phoneme set, while the latter seems to use the Epitran phoneme set. Yet both are supposed to be based on IPA.

What are the differences between these two phoneme sets? Is there any way to manually convert from one to the other, in particular from Epitran to the phonemes used by MFA acoustic models?

Thanks.

Eleanor Chodroff

Aug 14, 2023, 11:31:45 AM
to Leandro Graciá Gil, MFA Users
Hi Leandro,

The ones with the Epitran phoneme set (suffixed _cv) are typically just trained on the Common Voice corpus, and were developed by me and Emily Ahn for the project described here: https://aclanthology.org/2022.lrec-1.566/ (OSF repository with pronunciation lexicons). A few of those acoustic models might also reference the XPF corpus. You can find more about the rules for Epitran if you poke around the data/ folder there. 

It looks like Michael McAuliffe has been training acoustic models for several of these languages with even more resources (later releases of Common Voice + additional datasets). I'm not entirely sure how he developed the pronunciation dictionaries for those, but it's probably a safe bet that the corresponding pronunciation dictionary for each model is the one he used to train it.

Eleanor 


Leandro Graciá Gil

Aug 15, 2023, 4:19:31 AM
to MFA Users
Hi Eleanor,

Thank you very much for your response about the Common Voice models.

I've been looking at the exact phonemes used by, for example, the english_mfa acoustic model:

a aj aw aː b bʲ c cʰ d dʒ dʲ d̪ e ej f fʲ h i iː j k kʰ l m mʲ m̩ n n̩ o ow p pʰ pʲ s t tʃ tʰ tʲ t̪ u uː v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔ ɔj ə əw ɚ ɛ ɛː ɜ ɜː ɝ ɟ ɡ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ

While it is IPA-based, there seem to be some differences between these and the results produced by Epitran for some simple English sentences (if I'm not mistaken, the IPA column here). I'm wondering if these differences are simply because some phonemes returned by Epitran are missing in some MFA acoustic models, or rather because of small notational differences. I remember reading somewhere that the MFA phoneme set is a variation of IPA, which would suggest the latter.
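One way to tell the two cases apart might be to split an IPA string against the english_mfa inventory listed above and see which characters are left over. The following is only a rough sketch: `segment` is a hypothetical helper, the greedy longest-match strategy is an assumption about how the string should be split, and the sample input is made up rather than real Epitran output.

```python
import unicodedata

# Phone inventory of english_mfa, copied from the list in this message.
MFA_PHONES = (
    "a aj aw aː b bʲ c cʰ d dʒ dʲ d̪ e ej f fʲ h i iː j k kʰ l m mʲ m̩ n n̩ "
    "o ow p pʰ pʲ s t tʃ tʰ tʲ t̪ u uː v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔ ɔj ə əw "
    "ɚ ɛ ɛː ɜ ɜː ɝ ɟ ɡ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ"
).split()


def segment(ipa, inventory):
    """Greedy longest-match split of an IPA string.

    Returns (phones, unknown): the phones found in the inventory, and the
    characters that could not be matched to any phone.
    """
    # Normalize both sides so precomposed and decomposed forms compare equal.
    inv = {unicodedata.normalize("NFD", p) for p in inventory}
    text = unicodedata.normalize("NFD", ipa)
    longest = max(len(p) for p in inv)

    phones, unknown = [], []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, then shrink.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in inv:
                phones.append(candidate)
                i += size
                break
        else:
            # No phone starts here: record the character and move on.
            unknown.append(text[i])
            i += 1
    return phones, unknown


# Made-up input for illustration; "r" is not in the inventory above.
phones, unknown = segment("rɛd tʃɪp", MFA_PHONES)
print(phones, unknown)
```

Characters that show up in `unknown` across many sentences would be the candidates for either a notational difference or a phoneme that the acoustic model really lacks.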

My goal here is to find out whether Epitran can serve as a quick OOV fallback whose output can later be used with MFA acoustic models, but for this the phoneme sets must match. (I'm aware there's "mfa g2p" to handle OOVs, but for various reasons using Epitran would be more convenient in some of my use cases.) Since both are based on IPA, if I knew what the notational differences are (if there are any), it might be possible to build a mapping between the two.
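If the differences do turn out to be purely notational, I imagine the mapping could be as simple as a substitution table applied to the Epitran output. A minimal sketch, where `EXAMPLE_MAP` and `build_converter` are hypothetical names and the table entries are placeholders I made up to show the shape, not the actual Epitran-to-MFA differences:

```python
import re

# Placeholder entries only: these are NOT verified differences between
# Epitran output and the MFA phone set.
EXAMPLE_MAP = {
    "r": "ɹ",
    "g": "\u0261",  # ASCII g -> IPA script g
    "oʊ": "ow",
}


def build_converter(mapping):
    """Return a function that rewrites every key in `mapping` to its value."""
    # Longest keys first, so multi-character sequences win over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def convert(ipa):
        return pattern.sub(lambda m: mapping[m.group(0)], ipa)

    return convert


convert = build_converter(EXAMPLE_MAP)
print(convert("roʊ"))
```

This only works if each Epitran symbol maps to exactly one MFA phone regardless of context, which is an assumption I haven't been able to confirm.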

Another option would be to use CV models since they use Epitran, but it seems there are no acoustic models for some of the languages I need.

Regards,
Leandro
