End-of-utterance detection for Kaldi online2-tcp-nnet3-decode-faster

Narjes Bozorgi

Mar 30, 2021, 11:24:41 AM

to kaldi-developers

Hello

I have a pre-trained nnet3 model that I am using with online2-tcp-nnet3-decode-faster. On the client side, I am using a Python client with the following structure:

TCP-client:

open(file)

socket.sendall(packets)

data=socket.recv(packets)

txt=data.decode("utf-8")

The problem is that I cannot detect the end of an utterance in the output produced at each refresh period.

For both the ASpIRE model and our own model, the output received through sox or the Python client does not include '\r' or '\n', as the documentation on kaldi-asr.org describes:

The TCP protocol simply takes RAW signal on input (16-bit signed integer encoding at chosen sampling frequency) and outputs simple text using the following logic:

each refresh period (output-freq argument) the current state of decoding is output
each line is terminated by '\r'
once an utterance boundary is detected due to endpointing a '\n' char is output
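
For reference, here is a minimal sketch of how a Python client could separate partial results from utterance boundaries, assuming the server really does terminate each refresh line with '\r' and emit '\n' at an endpoint, as the quoted documentation says. The function names split_events and recv_chunks are hypothetical helpers, not part of Kaldi. If '\n' arrives with no new text before it, the sketch assumes the last partial result is the final one.

```python
import codecs
import socket
from typing import Iterable, Iterator, Tuple


def split_events(chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    # Hypothetical helper: turns raw byte chunks from the server into
    # ("partial", text) and ("final", text) events.
    # The incremental decoder copes with UTF-8 characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    pending = ""       # text received since the last terminator
    last_partial = ""  # most recent carriage-return-terminated hypothesis
    for chunk in chunks:
        for ch in decoder.decode(chunk):
            if ch == "\r":
                # Assumed: carriage return ends one refresh-period line.
                last_partial = pending.strip()
                pending = ""
                yield ("partial", last_partial)
            elif ch == "\n":
                # Assumed: newline marks an utterance boundary (endpoint).
                final = pending.strip() or last_partial
                pending = ""
                last_partial = ""
                yield ("final", final)
            else:
                pending += ch


def recv_chunks(sock: socket.socket, bufsize: int = 4096) -> Iterator[bytes]:
    # Yield whatever the server sends until it closes the connection.
    while True:
        data = sock.recv(bufsize)
        if not data:
            return
        yield data
```

With an already connected socket named sock, the receiving side would then loop over split_events(recv_chunks(sock)) and treat each "final" event as the end of an utterance. Printing the received bytes with repr() is a quick way to check whether the terminators are actually present in the stream.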

Is there an option or configuration that needs to be set to enable this behavior?

Hossein Hadian

Mar 31, 2021, 12:09:57 PM

to kaldi-de...@googlegroups.com

Have you set --endpoint.silence-phones correctly when running the server?

It might help if you show the full command you use to run the server.

--Hossein


Narjes Bozorgi

Apr 2, 2021, 1:22:15 PM

to kaldi-developers

Thank you. Yes, I am setting --endpoint.silence-phones in the online config file. The server command is:

online2-tcp-nnet3-decode-faster --samp-freq=8000 --frames-per-chunk=20 --output-period=1 --read-timeout=1 --extra-left-initial=0 --frame-subsampling-factor=3 --min-active=200 --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --port-num=5050 --config=<pth>/online.conf final.mdl HCLG.fst words.txt
