accent identification


ras.m...@gmail.com

Jun 21, 2018, 4:39:51 PM
to kaldi-help
Hi,
I want to use Kaldi i-vectors for accent identification.
How can I implement this in Kaldi?
Which egs recipe is useful and free?

David Snyder

Jun 21, 2018, 4:50:05 PM
to kaldi-help
Look at the egs/lre07/v1 recipe for a traditional i-vector-based language recognition system. It's an older method that is no longer state-of-the-art, but it's easy to set this up, and should be a respectable starting point.

As far as I know, there aren't any free training resources for accent or language recognition. Usually the resources are purchased through the LDC, or acquired by participation in a NIST evaluation.

ras.m...@gmail.com

Jun 22, 2018, 3:39:23 PM
to kaldi-help
Thanks.
1- I want to use articulatory features (the output of a DBN) instead of MFCCs. How can I convert these features to Kaldi's ark format?
2- How does data preparation for the lre07 dataset differ from the standard Kaldi data preparation (kaldi-asr.org/doc/data_prep.html)? (I don't have the lre07 dataset, so I can't run the script.) Is it possible to share some of the prepared data files for this dataset? (I have run the FARSDAT egs in Kaldi.)
3- After preparing my own dataset, how can I run the lre07 scripts on it?

ras.m...@gmail.com

Jun 25, 2018, 7:02:32 AM
to kaldi-help
Hi,
No answer? No ideas?

allab...@gmail.com

Jul 10, 2018, 5:23:51 AM
to kaldi-help
Hey Dan,
Is there really no answer to these questions?
Are the questions weak, or is there no time, or ...?

David Snyder

Jul 10, 2018, 2:12:11 PM
to kaldi-help
I'm not Dan, but I'll comment on some of this. 

1- I want to use articulatory features (the output of a DBN) instead of MFCCs. How can I convert these features to Kaldi's ark format?

Yes, it's quite possible to transform arbitrary features from an external system into a format Kaldi can understand. You'll want to do something like this:

1. Look at the format of Kaldi features. For example, if you have MFCCs in data/train/feats.scp, you can run copy-feats scp:data/train/feats.scp ark,t:feats.txt to transform the features into a plain text format that you can read.
2. Once you understand the format the features need to be in, write some script that transforms your DBN features into this plain text format, e.g., save it to your_feats.txt. 
3. Now you'll want to transform your features into the ark/scp format that Kaldi typically uses. You can do something like copy-feats ark,t:your_feats.txt ark,scp:your_feats.ark,your_feats.scp to do that.
4. Create some data directory where your new feats will live. E.g., run utils/copy_data_dir.sh data/train data/train_your_feats. Then copy your_feats.scp into data/train_your_feats/feats.scp.
5. Now you should be able to use data/train_your_feats to train your UBM or i-vector extractor.
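
Steps 2 and 3 above could be sketched in Python roughly as follows. This is a minimal sketch and not part of the lre07 recipe: the function names, the utterance IDs, and the dictionary-of-lists input are hypothetical, and the layout is assumed to follow the text form that copy-feats prints with ark,t: (the utterance ID, then a bracketed matrix with one frame per line).

```python
from typing import Dict, Sequence


def format_kaldi_text_matrix(utt_id: str, frames: Sequence[Sequence[float]]) -> str:
    """Render one utterance as a text-archive entry: 'utt_id  [', rows, ' ]'."""
    if not frames:
        raise ValueError(f"no frames for utterance {utt_id!r}")
    dim = len(frames[0])
    if any(len(row) != dim for row in frames):
        raise ValueError(f"inconsistent feature dimension in {utt_id!r}")
    # One frame per line; the closing bracket follows the last frame.
    rows = ["  " + " ".join(f"{x:.6g}" for x in row) for row in frames]
    return f"{utt_id}  [\n" + "\n".join(rows) + " ]\n"


def render_kaldi_text_archive(feats: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Concatenate all utterances, sorted by utterance ID."""
    return "".join(format_kaldi_text_matrix(u, feats[u]) for u in sorted(feats))


def write_kaldi_text_archive(feats: Dict[str, Sequence[Sequence[float]]], path: str) -> None:
    """Write the text archive to disk, e.g. to your_feats.txt."""
    with open(path, "w") as f:
        f.write(render_kaldi_text_archive(feats))


if __name__ == "__main__":
    # Hypothetical DBN outputs: two utterances with 3-dimensional frames.
    demo = {
        "utt2": [[1.0, 2.0, 3.0]],
        "utt1": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    }
    write_kaldi_text_archive(demo, "your_feats.txt")
```

The resulting your_feats.txt can then be passed to copy-feats ark,t:your_feats.txt ark,scp:your_feats.ark,your_feats.scp as in step 3. The utterance IDs need to match the ones used in the rest of the data directory.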

2- How does data preparation for the lre07 dataset differ from the standard Kaldi data preparation (kaldi-asr.org/doc/data_prep.html)? (I don't have the lre07 dataset, so I can't run the script.) Is it possible to share some of the prepared data files for this dataset? (I have run the FARSDAT egs in Kaldi.)
3- After preparing my own dataset, how can I run the lre07 scripts on it?

Yes, you can adapt this recipe to your own data. One difference between this and other recipes is that you'll need an utt2lang file. This is like the utt2spk files, but provides a mapping from utterance ID to language ID. Other than that, the data prep should be pretty similar to any of the other recipes (simpler than any ASR recipe, since you don't need any transcribed data, or any segments file).
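
As a rough illustration of the utt2lang file, here is a minimal Python sketch. It assumes the file uses the same one-pair-per-line layout as utt2spk ("<utterance-id> <language-id>"), sorted by utterance ID; the function names and the utterance IDs are hypothetical, and the accent labels simply stand in for the language IDs.

```python
from typing import Dict


def render_utt2lang(utt_to_lang: Dict[str, str]) -> str:
    """Return the utt2lang contents: one '<utt-id> <lang-id>' pair per line."""
    lines = []
    for utt_id in sorted(utt_to_lang):
        lang = utt_to_lang[utt_id]
        # IDs are whitespace-separated fields, so they must not contain spaces.
        if not utt_id or not lang or any(c.isspace() for c in utt_id + lang):
            raise ValueError(f"invalid entry: {utt_id!r} -> {lang!r}")
        lines.append(f"{utt_id} {lang}\n")
    return "".join(lines)


def write_utt2lang(utt_to_lang: Dict[str, str], path: str) -> None:
    """Write the mapping to disk, e.g. to data/train/utt2lang."""
    with open(path, "w") as f:
        f.write(render_utt2lang(utt_to_lang))


if __name__ == "__main__":
    # Hypothetical utterance IDs mapped to accent labels.
    demo = {"spk02-utt001": "TORKI", "spk01-utt001": "TEHRANI"}
    print(render_utt2lang(demo), end="")
```

Sorting by utterance ID keeps the file in the same order as the other files in the data directory.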

allab...@gmail.com

Jul 12, 2018, 1:02:52 PM
to kaldi-help
Hi David,
Glad you answered.
First of all, we want to run with MFCCs, following the lre07 recipe.
We put the prepared files in data/tran, with data/lre07 as the test set.
Then the terminal suddenly closed.
The terminal output is:
lid/extract_ivectors.sh --cmd run.pl  --mem 3G --nj 50 exp/extractor_2048 data/train_lr exp/ivectors_train
lid/extract_ivectors.sh: extracting iVectors
lid/extract_ivectors.sh: combining iVectors across jobs
lid/extract_ivectors.sh: computing mean of iVectors for each speaker and length-normalizing
lid/extract_ivectors.sh --cmd run.pl  --mem 3G --nj 30 exp/extractor_2048 data/lre07 exp/ivectors_lre07
lid/extract_ivectors.sh: extracting iVectors
lid/extract_ivectors.sh: combining iVectors across jobs
lid/extract_ivectors.sh: computing mean of iVectors for each speaker and length-normalizing


What's wrong? How can I fix it?

David Snyder

Jul 12, 2018, 1:06:45 PM
to kaldi-help
There's nothing in this terminal output to indicate an error.

If an error occurred, there will be more details in the log files. Look in exp/ivectors_lre07/log/*
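
One way to go through those log files is a small scan like the following. This is only a sketch: the function name is hypothetical, it assumes the logs are *.log files directly inside the log directory, and it assumes that problems show up as lines containing the "ERROR" or "WARNING" prefixes that Kaldi programs print.

```python
import glob
import os
from typing import List, Sequence, Tuple


def find_problem_lines(
    log_dir: str, markers: Sequence[str] = ("ERROR", "WARNING")
) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every log line containing a marker."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in a log.
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_problem_lines("exp/ivectors_lre07/log"):
        print(f"{path}:{lineno}: {line}")
```

An empty result only means that none of the marker strings appeared; it does not rule out a job that was killed before it could write anything.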

allab...@gmail.com

Jul 12, 2018, 2:03:19 PM
to kaldi-help
Thanks.
There is nothing wrong in the logs!

Some more details:
My training set contains 448 wave files, each with about 10 utterances, for a total of 3944 utterances; each utterance is about 3 seconds long.
The test set contains 34 wave files, with 287 utterances in total.
There are 10 accents, and the counts for each are:
   2000 TEHRANI
    448 TORKI
    288 KHORASANI
    260 ESFAHANI
    251 SHOMALI
    212 JONUBI
    193 KORDI
    144 YAZDI
    144 LORI
     54 BALUCHI

Is it possible that there is a bug in the code that is triggered by my dataset?

Daniel Povey

Jul 12, 2018, 2:41:54 PM
to kaldi-help
If the terminal suddenly closed, maybe something went wrong at the system level, like it used up memory (but that stage doesn't use too much memory), or filled up your system disk (but that stage doesn't write too much data).

Or maybe you just had a bad connection.
