End-of-utterance detection for Kaldi online2-tcp-nnet3-decode-faster


Narjes Bozorgi

Mar 30, 2021, 11:24:41 AM
to kaldi-developers
Hello,
I have a pre-trained nnet3 model that I am using with online2-tcp-nnet3-decode-faster. On the client side, I am using a Python client with the following structure:
TCP-client:
open(file)
socket.sendall(packets)
data=socket.recv(packets)
txt=data.decode("utf-8")

The problem is that the end of an utterance cannot be detected in the output produced at each refresh period.
For both the ASpIRE model and our own model, the output received through sox or the Python client program does not include the '\r' or '\n' characters that the documentation on kaldi-asr.org describes:

The TCP protocol simply takes RAW signal on input (16-bit signed integer encoding at chosen sampling frequency) and outputs simple text using the following logic:

  • each refresh period (output-freq argument) the current state of decoding is output
  • each line is terminated by '\r'
  • once an utterance boundary is detected due to endpointing a '\n' char is output
Is there any option or configuration that needs to be set to activate this protocol?
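
For reference, a minimal client-side sketch of the protocol quoted above, assuming the server emits '\r' after each partial result and '\n' once an endpoint is detected. The names `split_hypotheses` and `stream_raw_audio`, the chunk size, and the timeout are illustrative assumptions, not part of Kaldi; the port matches the one used in this thread. Printing the decoded text directly can hide the problem, because a terminal treats '\r' as "return to start of line", so inspecting the raw bytes is usually more reliable.

```python
import socket


def split_hypotheses(buffer: bytes):
    """Split buffered server output into (events, remainder).

    Each event is (kind, text): kind is "partial" for text ended by '\r'
    and "final" for text ended by '\n'. Bytes after the last terminator
    are returned as the remainder so they can be completed by later reads.
    """
    events = []
    start = 0
    for i, byte in enumerate(buffer):
        if byte in (0x0D, 0x0A):  # '\r' or '\n'
            text = buffer[start:i].decode("utf-8", errors="replace").strip()
            kind = "partial" if byte == 0x0D else "final"
            events.append((kind, text))
            start = i + 1
    return events, buffer[start:]


def stream_raw_audio(path, host="localhost", port=5050, chunk_bytes=4000):
    """Send headerless 16-bit PCM from `path` and yield (kind, text) events.

    Assumption: the file is already raw audio at the server's --samp-freq.
    """
    pending = b""
    sending = True
    with socket.create_connection((host, port)) as sock, open(path, "rb") as audio:
        sock.settimeout(0.1)  # short timeout so sending and reading interleave
        while True:
            if sending:
                chunk = audio.read(chunk_bytes)
                if chunk:
                    sock.sendall(chunk)
                else:
                    # No more audio: half-close so the server can finish decoding.
                    sending = False
                    sock.shutdown(socket.SHUT_WR)
                    sock.settimeout(None)
            try:
                data = sock.recv(4096)
            except socket.timeout:
                continue  # nothing to read yet, keep sending
            if not data:
                break  # server closed the connection
            pending += data
            events, pending = split_hypotheses(pending)
            for event in events:
                yield event
```

Iterating over `stream_raw_audio("utt.raw")` (a hypothetical file name) and printing `repr(text)` for each event shows whether any "final" events arrive at all; if only "partial" events appear, the endpointing rules are probably never firing on the server side.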

Hossein Hadian

Mar 31, 2021, 12:09:57 PM
to kaldi-de...@googlegroups.com
Have you set --endpoint.silence-phones correctly when running the server? 
It might help if you show the full command you use to run the server.

--Hossein


Narjes Bozorgi

Apr 2, 2021, 1:22:15 PM
to kaldi-developers
Thank you. Yes, I am setting --endpoint.silence-phones in the online config file. The server command is:

online2-tcp-nnet3-decode-faster --samp-freq=8000 --frames-per-chunk=20 --output-period=1 --read-timeout=1 --extra-left-initial=0 --frame-subsampling-factor=3 --min-active=200 --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --port-num=5050 --config=<pth>/online.conf final.mdl HCLG.fst words.txt