I have a corpus of ~10k wav+TextGrid pairs, spoken by about ten speakers (the number of files per speaker is not equal). For example:
> s01_sentence1.wav, s01_sentence2.wav, s01_sentence3.wav, ... s01_sentence999.wav,
> s02_sentence1.wav, s02_sentence2.wav, s02_sentence3.wav, ... s02_sentence1100.wav,
...
> s10_sentence1.wav, s10_sentence2.wav, s10_sentence3.wav, ... s10_sentence1000.wav,
But when I run MFA to train a model on this corpus, it assumes that all the files were spoken by the same speaker:
> Found 1 speaker across 10909 files, average number of utterances per speaker: 10909.0
So it uses only one job instead of the requested 3:
> WARNING - Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs.
Is this warning an actual problem? And if so, is there a way to tell MFA that certain subsets of files share a speaker, so that it can do speaker adaptation and parallelize jobs correctly?
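
If it matters, here is a minimal sketch of a workaround I'm considering, assuming MFA treats each subdirectory of the corpus as one speaker. The `corpus` path is hypothetical; the speaker ID is parsed from the filename prefix shown above:

```python
import re
import shutil
from pathlib import Path

# Hypothetical corpus location; adjust to the real path.
corpus = Path("corpus")

# Filenames look like "s01_sentence1.wav" / "s01_sentence1.TextGrid",
# so the speaker ID is the prefix before the first underscore.
for f in corpus.glob("*.*"):
    if f.suffix.lower() not in (".wav", ".textgrid"):
        continue
    match = re.match(r"(s\d+)_", f.name)
    if match is None:
        continue
    speaker_dir = corpus / match.group(1)  # e.g. corpus/s01/
    speaker_dir.mkdir(exist_ok=True)
    shutil.move(str(f), speaker_dir / f.name)
```

Alternatively, I've seen a `--speaker_characters` option mentioned for MFA commands; with this naming scheme, would `--speaker_characters 3` make MFA read the first three characters (`s01`, `s02`, ...) as the speaker ID, or is that not its intended use?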