Is MFA confused by a corpus of multiple speakers with a one-to-many speaker-to-file ratio?

Hovsep Deovletian

Oct 5, 2022, 2:26:35 AM
to MFA Users
Hello

I have a corpus of 10k wav+textgrid pairs. The 10k pairs are spoken by around 9 speakers, and the number of files differs from speaker to speaker. For example:

> s01_sentence1.wav, s01_sentence2.wav,  s01_sentence3.wav, ... s01_sentence999.wav,
> s02_sentence1.wav, s02_sentence2.wav,  s02_sentence3.wav, ... s02_sentence1100.wav,
> ...
> s10_sentence1.wav, s10_sentence2.wav,  s10_sentence3.wav, ... s10_sentence1000.wav,

But when I run MFA to train a model on this corpus, MFA assumes that all the files are spoken by the same speaker:

>  Found 1 speaker across 10909 files, average number of utterances per speaker: 10909.0

So it uses only 1 job instead of the 3 I specified:

> WARNING - Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs.

Is this warning an actual problem? And if so, is there a way I can tell MFA which subsets of files share the same speaker, so that it can do speaker adaptation and split the jobs correctly?

Thea Knowles

Aug 8, 2023, 4:12:01 PM
to MFA Users
I was running into speaker adaptation issues as well, and realized that the standard MFA align options were not correctly identifying the number of speakers from directories or file names (it would detect 1 or 2, but never the full set; I'm still not sure what information was being used).

For me, the solution was this: if you ensure that the tiers in each of your input TextGrids are named with the appropriate speaker ID, speaker adaptation seems to get toggled correctly. See my related issue here: https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/669#issuecomment-1668621720
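In case it helps anyone with file names like the ones in the original post, here is a rough sketch of how the tier renaming could be scripted. It is not from MFA itself and makes several assumptions: the TextGrids are in Praat's long text format, they are readable as UTF-8 (Praat sometimes writes UTF-16, hence the `encoding` parameter), the speaker ID is whatever precedes the first underscore in the file name, and every tier in a file belongs to that one speaker. The function names are made up for illustration. It rewrites files in place, so try it on a copy of the corpus first.

```python
import re
from pathlib import Path

# Matches the tier-name line of a long-format TextGrid, e.g.:  name = "words"
# Assumption: tier names contain no embedded double quotes.
TIER_NAME_RE = re.compile(r'^([ \t]*name[ \t]*=[ \t]*)"[^"]*"[ \t]*$', re.MULTILINE)


def speaker_from_filename(filename: str) -> str:
    """Return the part of the file name before the first underscore (assumed speaker ID)."""
    return Path(filename).stem.split("_", 1)[0]


def rename_tiers(textgrid_text: str, speaker: str) -> str:
    """Set every tier name in a long-format TextGrid to the given speaker ID."""
    # A function is used as the replacement so the speaker ID is inserted literally.
    return TIER_NAME_RE.sub(lambda m: f'{m.group(1)}"{speaker}"', textgrid_text)


def relabel_corpus(corpus_dir: str, encoding: str = "utf-8") -> int:
    """Rewrite all *.TextGrid files under corpus_dir in place; return how many changed."""
    changed = 0
    for path in sorted(Path(corpus_dir).rglob("*.TextGrid")):
        original = path.read_text(encoding=encoding)
        updated = rename_tiers(original, speaker_from_filename(path.name))
        if updated != original:
            path.write_text(updated, encoding=encoding)
            changed += 1
    return changed
```

Something like `relabel_corpus("/path/to/corpus_copy")` would then relabel every TextGrid under that folder. If your files carry more than one tier per speaker, or tiers for different speakers in the same file, this sketch is too blunt because it gives every tier the same name.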
