Auto-reply: [kaldi-help] Monophone Training with TED-LIUM Dataset in Kaldi

1060147127

Sep 27, 2024, 6:26:44 AM

to Jayenthiran Pukuraj

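This is an automatic vacation reply from QQ Mail.

Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends.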

Abraham Nyongesa

Oct 1, 2024, 2:26:54 AM

to kaldi...@googlegroups.com

Start with Mini LibriSpeech (the `egs/mini_librispeech` recipe). It is easier.

On Fri, 27 Sept 2024, 1:26 pm Jayenthiran Pukuraj, <jaya...@gmail.com> wrote:

I am a newbie.

I am currently working on a speech recognition project using the TED-LIUM dataset with the Kaldi toolkit.

However, I am encountering challenges while attempting to train a monophone system and would greatly appreciate any insights or assistance you could provide.

### Steps I've Taken So Far:

1. Data Preparation:
- Data Directory Structure:
  - `data/lang` contains the following files: `L.fst`, `oov.int`, `oov.txt`, `phones`, `phones.txt`, `words.txt`.
  - `data/train` includes: `cmvn.scp`, `data`, `frame_shift`, `log`, `segments`, `split1`, `stm`, `utt2dur`, `utt2spk`, `conf`, `feats.scp`, `glm`, `reco2file_and_channel`, `spk2utt`, `split4`, `text`, `utt2num_frames`, `wav.scp`.
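A minimal sketch for checking a lang directory against the files that monophone training is expected to read. The file list is an assumption based on the usual output of `utils/prepare_lang.sh` (it is not taken from this thread), and `report_missing` is a hypothetical helper name. The `data/lang` listing above does not show `topo`, so a check like this may be worth running before training.

```bash
# Print every expected file that is absent from a directory, one per line.
# Usage: report_missing DIR FILE...
report_missing() {
  dir=$1
  shift
  for f in "$@"; do
    [ -e "$dir/$f" ] || echo "$f"
  done
}

# Assumed list of lang files used by monophone training; check
# steps/train_mono.sh in your own checkout for the authoritative list.
# This call only runs when data/lang exists in the current directory.
if [ -d data/lang ]; then
  report_missing data/lang L.fst oov.int phones.txt words.txt topo phones/sets.int
fi
```

If anything is reported missing, regenerating the directory with `utils/prepare_lang.sh` and then checking it with `utils/validate_lang.pl data/lang` is the usual route.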

2. Feature Extraction:
- After preparing the data, I ran feature extraction and then decoding. Feature extraction succeeded, but decoding failed because the `final.mdl` file is missing.

LOG:

/kaldi/egs/tedlium/s5# ./testdecode.sh output1.wav

utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/my_audio
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 1 --cmd run.pl data/my_audio exp/mono/decode/log mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/my_audio
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for my_audio
steps/compute_cmvn_stats.sh data/my_audio exp/mono/decode/log mfcc
Succeeded creating CMVN stats for my_audio
steps/decode.sh --config conf/decode.config --nj 1 --cmd run.pl exp/mono/graph data/my_audio exp/mono/decode
steps/decode.sh: Error: no such file exp/mono/final.mdl
Transcription not found. Something went wrong during decoding.
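
The failure in this log comes from the decoding stage, not from feature extraction: `steps/decode.sh` looks for a trained model in the parent of the decode directory and for a compiled graph. A rough pre-check, assuming the `exp/mono` layout used in the log; `check_decode_inputs` is a hypothetical helper name, not a Kaldi script.

```bash
# Report which prerequisite of steps/decode.sh is missing, if any.
# Returns 0 when both inputs exist and 1 otherwise.
# Usage: check_decode_inputs EXP_DIR   (for example exp/mono)
check_decode_inputs() {
  expdir=$1
  rc=0
  if [ ! -f "$expdir/final.mdl" ]; then
    echo "missing $expdir/final.mdl: training has not finished"
    rc=1
  fi
  if [ ! -f "$expdir/graph/HCLG.fst" ]; then
    echo "missing $expdir/graph/HCLG.fst: build the graph with utils/mkgraph.sh"
    rc=1
  fi
  return "$rc"
}

# Example (not executed here): decode only when both inputs are present.
# check_decode_inputs exp/mono && \
#   steps/decode.sh --nj 1 --cmd run.pl exp/mono/graph data/my_audio exp/mono/decode
```

With a guard like this, the decoding script is never started before training and graph compilation have produced their outputs.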

3. Training Command:
- I then attempted to run the monophone training command:
```bash
steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
```
- The log indicates the process is "Initializing monophone system," but it does not proceed further, and no `final.mdl` file is created in `exp/mono`.

4. Validation Issues:

- I noticed warnings during data validation, particularly stating that I have only one speaker, which might not be optimal. Additionally, the `data/lang` directory lacks the `spk2utt` file.

~/kaldi/egs/tedlium/s5# utils/validate_data_dir.sh data/train
utils/validate_data_dir.sh: Successfully validated data-directory data/train

(base) root@ip-172:~/kaldi/egs/tedlium/s5# utils/validate_data_dir.sh data/lang
utils/validate_data_dir.sh: no such file spk2utt

(base) root@ip-10:~/kaldi/egs/tedlium/s5# steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono
steps/train_mono.sh: Initializing monophone system.
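
The `no such file spk2utt` message appears because `utils/validate_data_dir.sh` checks data directories such as `data/train`; a lang directory is not expected to contain `spk2utt`, and Kaldi provides `utils/validate_lang.pl` for it instead. A small sketch that picks the matching validator; the detection heuristic and the name `validator_for` are assumptions made for illustration.

```bash
# Print the Kaldi validation script that matches a directory's contents.
# Heuristic (an assumption of this sketch): a lang directory holds
# phones.txt and L.fst, while a data directory holds utt2spk.
# Usage: validator_for DIR
validator_for() {
  d=$1
  if [ -f "$d/phones.txt" ] && [ -f "$d/L.fst" ]; then
    echo "utils/validate_lang.pl"
  elif [ -f "$d/utt2spk" ]; then
    echo "utils/validate_data_dir.sh"
  else
    echo "unknown"
    return 1
  fi
}

# Example (not executed here), run from the recipe directory:
# "$(validator_for data/lang)" data/lang
# "$(validator_for data/train)" data/train
```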

5. Log Insights:

- I’ve checked the logs from `exp/mono/train_mono.log` and observed no significant errors, but the training process halts after initialization.
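
A sketch for looking at the per-stage logs rather than a single top-level log, assuming the standard layout in which `steps/train_mono.sh` writes files such as `init.log` into `exp/mono/log/`. The helper name `show_logs` and the search pattern are illustrative choices, not Kaldi conventions.

```bash
# Print the tail of every .log file in a directory and flag files that
# contain the words "error" or "failed" (case-insensitive).
# Usage: show_logs LOG_DIR [NUM_LINES]
show_logs() {
  logdir=$1
  lines=${2:-20}
  for log in "$logdir"/*.log; do
    [ -f "$log" ] || continue
    echo "== $log"
    tail -n "$lines" "$log"
    if grep -q -i -E 'error|failed' "$log"; then
      echo "!! possible error in $log"
    fi
  done
}

# This call only runs when exp/mono/log exists in the current directory.
if [ -d exp/mono/log ]; then
  show_logs exp/mono/log 20
fi
```

If the script stops right after "Initializing monophone system", the initialization log in that directory is the first place to look.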

6. Memory and Resources:
- My system has approximately 7.6 GiB of RAM, and I’ve ensured that all required files and configurations are in place.

Request for Help:
Could anyone provide guidance on how to resolve these issues? Is there a specific format or additional data I should be preparing? Also, are there any community resources or best practices for working with the TED-LIUM dataset in Kaldi?

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/73928a5d-44e1-46e2-9707-dd041ed61961n%40googlegroups.com.
