ASR Model Training for Air Traffic Control (ATC) Communication with Kaldi/Vosk

jass S

Jun 30, 2025, 3:35:31 PM

to kaldi-help

Hello,

I have recently started a project to train Automatic Speech Recognition (ASR) models for Air Traffic Control (ATC) communication, using Kaldi for domain-specific acoustic model development. Vosk will handle inference, but I believe training with Kaldi can produce an acoustic model better suited to this specialized domain.

Currently, I am considering two primary approaches for training:

Approach 1: Kaldi TDNN Fine-tuning/Augmentation
- Base Models: Utilize existing LibriSpeech and/or Mini-LibriSpeech trained TDNN models (after completing their base training, if necessary).
- Customization: Fine-tune or augment these models with my specific ATC corpora.
- My Data: This custom data consists of approximately 3-8 hours of WAV files and corresponding transcripts (.txt), recorded under non-ideal acoustic conditions.

Approach 2: Vosk Pre-trained Model Fine-tuning
- Base Models: Use the pre-trained Vosk models (e.g., vosk-model-en-us-0.22 and/or vosk-model-small-en-us-0.15).
- Customization: Fine-tune these Vosk models directly using my custom ATC data (3-8 hours of WAV files and transcripts).

In both approaches, I am also considering incorporating additional ATC-related data, such as ATCO2, for further fine-tuning.
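For context, either approach would start from a Kaldi-style data directory built from my WAV/transcript pairs. Below is a minimal sketch of how I expect to generate `wav.scp`, `text`, and `utt2spk`. It rests on several assumptions that may not hold for every corpus: each `<name>.wav` has a matching `<name>.txt` in the same folder, each file contains exactly one utterance, file names contain no spaces, and no speaker labels are available (so the utterance ID doubles as the speaker ID). The function name `build_kaldi_data_dir` and the directory layout are hypothetical.

```python
from pathlib import Path


def build_kaldi_data_dir(corpus_dir, out_dir):
    """Write wav.scp, text and utt2spk for paired <stem>.wav / <stem>.txt files.

    Assumes one utterance per WAV file and no speaker metadata.
    Returns the list of utterance IDs that were written.
    """
    corpus = Path(corpus_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    entries = []
    for wav in corpus.glob("*.wav"):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # skip audio that has no transcript
        # Collapse newlines and repeated spaces into single spaces
        transcript = " ".join(txt.read_text(encoding="utf-8").split())
        if not transcript:
            continue  # skip empty transcripts
        entries.append((wav.stem, str(wav.resolve()), transcript))

    # Kaldi expects these files sorted by utterance ID in byte order (LC_ALL=C)
    entries.sort(key=lambda e: e[0].encode("utf-8"))

    with open(out / "wav.scp", "w", encoding="utf-8") as f_wav, \
         open(out / "text", "w", encoding="utf-8") as f_txt, \
         open(out / "utt2spk", "w", encoding="utf-8") as f_spk:
        for utt, path, transcript in entries:
            f_wav.write(f"{utt} {path}\n")       # <utt-id> <audio path>
            f_txt.write(f"{utt} {transcript}\n")  # <utt-id> <transcript>
            f_spk.write(f"{utt} {utt}\n")        # <utt-id> <spk-id>, identical here

    return [e[0] for e in entries]
```

After this step I would still generate `spk2utt` with Kaldi's `utils/utt2spk_to_spk2utt.pl` and check the result with `utils/validate_data_dir.sh`. Transcript normalization (casing, call signs, numbers) is deliberately left out of the sketch, since it depends on the lexicon of the base model.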

I would greatly appreciate your expert input on the following:

- Are these proposed approaches feasible for achieving the desired outcome?
- What are the potential drawbacks or challenges associated with each approach?
- Do you recommend any alternative or complementary strategies for this task?

Thank you for your time and insights.
