Ideal wav length


Benny Chun

Dec 10, 2018, 10:50:57 PM
to Recognito
Hi,

I'm currently working on voice authentication and was wondering if anyone knows what the ideal WAV length is to get a good voice print.

I currently have about four clips of 2-5 seconds each for each user.

Thanks in advance!

rbe...@aucklanduni.ac.nz

Jan 6, 2019, 8:53:46 PM
to Recognito
If we are talking strictly about the authentication stage (where an initially unknown utterance is compared to the voice prints), 2-5 seconds should be acceptable, though preferably closer to 5.

However, duration alone doesn't capture everything that makes a good utterance. You should also consider how much of the recording the speaker spends not speaking (pauses between sentences, or how quickly one speaks) as well as the content (a proper sentence with a wide variety of phonemes is much better than saying the same word 3 times).
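To get a rough feel for how much of a clip is actually speech, something like the sketch below might help. It is only a crude energy threshold over fixed-size frames, not what Recognito does internally, and the function name `speech_seconds`, the 20 ms frame size, and the `rms_threshold` value are all assumptions of mine that you would need to tune for your own recordings.

```python
import math


def speech_seconds(samples, sample_rate, frame_ms=20, rms_threshold=500.0):
    """Estimate seconds of speech by counting frames whose RMS exceeds a threshold.

    samples: mono 16-bit PCM sample values as a list of ints.
    rms_threshold: hypothetical value; tune it for your microphone and noise floor.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")

    active_frames = 0
    # Only full frames are considered; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms > rms_threshold:
            active_frames += 1
    return active_frames * frame_len / sample_rate


# Synthetic check: one second of silence followed by one second of a 440 Hz tone.
rate = 8000
silence = [0] * rate
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
print(speech_seconds(silence + tone, rate))  # 1.0
```

A 2-second clip that comes out at 1 second of speech is effectively a 1-second utterance, which is why the raw file length can be misleading.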

If you are talking about the enrollment stage, you might need more data. You mention 4 recordings of 2-5 s for each user, so your total recording time could be anywhere between 8 and 20 seconds (and that is before you perform VAD, i.e. voice activity detection, which strips out the non-speech portions). I would recommend narrowing that range a bit, aiming for a total of at least 10-15 s perhaps.
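The arithmetic above is easy to check against your actual files. Here is a minimal sketch using Python's standard `wave` module; the helper names `wav_duration_seconds` and `enrollment_total` are hypothetical, and the 10 s default is just the lower end of my rough suggestion, not a Recognito requirement.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def enrollment_total(durations, minimum_seconds=10.0):
    """Sum clip durations and report whether the total reaches the minimum.

    minimum_seconds is an assumed target, taken from the rough
    10-15 s suggestion in this thread.
    """
    total = sum(durations)
    return total, total >= minimum_seconds


# Four clips at the shortest and longest lengths mentioned in the question.
print(enrollment_total([2.0] * 4))  # (8.0, False)
print(enrollment_total([5.0] * 4))  # (20.0, True)
```

For real data you would build the list with `wav_duration_seconds` for each of a user's clips, keeping in mind that these are durations before any silence is removed.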

I am relatively new to speaker recognition, so take my advice with a grain of salt. My estimates come primarily from experience using Recognito as well as other tools (such as ALIZE-LIA_RAL).
