ASR Model Training for Air Traffic Control (ATC) Communication with Kaldi/Vosk

jass S

Jun 30, 2025, 3:35:31 PM
to kaldi-help

Hello,

I have recently started a project to train Automatic Speech Recognition (ASR) models for Air Traffic Control (ATC) communication, using Kaldi to develop a domain-specific acoustic model. Vosk will handle inference, but I believe training in Kaldi will give me an acoustic model better suited to this specialized domain.

Currently, I am considering two primary approaches for training:

  1. Approach 1: Kaldi TDNN Fine-tuning/Augmentation

    • Base Models: Start from TDNN models trained with the LibriSpeech and/or Mini-LibriSpeech recipes (running the base training myself first, if necessary).

    • Customization: Fine-tune or augment these models with my specific ATC corpora.

    • My Data: This custom data consists of approximately 3-8 hours of WAV files and corresponding transcripts (.txt), recorded under non-ideal acoustic conditions.

  2. Approach 2: Vosk Pre-trained Model Fine-tuning

    • Base Models: Use the pre-trained Vosk models (e.g., vosk-model-en-us-0.22 and/or vosk-model-small-en-us-0.15).

    • Customization: Fine-tune these Vosk models directly using my custom ATC data (3-8 hours of WAV files and transcripts).

In both approaches, I am also considering incorporating additional ATC-related data, such as ATCO2, for further fine-tuning.
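
To make the data format concrete, here is a minimal sketch of how I assume the WAV/transcript pairs would be converted into a Kaldi data directory (wav.scp, text, utt2spk) for either approach. The function name build_kaldi_data_dir is hypothetical, and the sketch assumes that every name.wav has a matching name.txt holding a single utterance, that file names contain no spaces, and that no speaker labels are available, so each utterance ID doubles as its own speaker ID. It also reports the total audio duration in hours.

```python
import wave
from pathlib import Path


def build_kaldi_data_dir(corpus_dir, out_dir):
    """Write wav.scp, text and utt2spk for each WAV with a same-named .txt.

    Hypothetical helper: assumes one utterance per file and no speaker labels.
    Returns (number of utterances, total audio duration in hours).
    """
    corpus_dir, out_dir = Path(corpus_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    entries = []
    total_seconds = 0.0
    for wav_path in corpus_dir.glob("*.wav"):
        txt_path = wav_path.with_suffix(".txt")
        if not txt_path.exists():
            continue  # skip audio that has no transcript
        # Collapse all whitespace so the transcript fits on one line.
        words = txt_path.read_text(encoding="utf-8").split()
        if not words:
            continue  # skip empty transcripts
        with wave.open(str(wav_path), "rb") as w:
            total_seconds += w.getnframes() / w.getframerate()
        entries.append((wav_path.stem, wav_path.resolve(), " ".join(words)))

    # Kaldi expects these files sorted by utterance ID (C-locale order).
    entries.sort(key=lambda e: e[0])

    with open(out_dir / "wav.scp", "w", encoding="utf-8") as wav_scp, \
         open(out_dir / "text", "w", encoding="utf-8") as text, \
         open(out_dir / "utt2spk", "w", encoding="utf-8") as utt2spk:
        for utt_id, wav_abs, transcript in entries:
            wav_scp.write(f"{utt_id} {wav_abs}\n")
            text.write(f"{utt_id} {transcript}\n")
            # No speaker information: use the utterance ID as the speaker ID.
            utt2spk.write(f"{utt_id} {utt_id}\n")

    return len(entries), total_seconds / 3600.0
```

My understanding is that the resulting directory would then be passed through Kaldi's utils/utt2spk_to_spk2utt.pl and utils/validate_data_dir.sh before feature extraction. Transcript casing would also need to match whichever lexicon the base model uses, which this sketch does not handle.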

I would greatly appreciate your expert input on the following:

  • Are these proposed approaches feasible for achieving the desired outcome?

  • What are the potential drawbacks or challenges associated with each approach?

  • Do you recommend any alternative or complementary strategies for this task?

Thank you for your time and insights.
