configuration for 8K input file using PHN_EN_TIMIT_LCRC_N500

Ma Jambo

Jun 10, 2015, 8:16:06 AM
to phn...@googlegroups.com
Hi, Everyone,
I am using the phoneme recognizer (PHN_EN_TIMIT_LCRC_N500), and my input file has an 8 kHz sample rate. Does anyone know how to set the parameters for PHN_EN_TIMIT_LCRC_N500 in this case?
I have posted the parameters I set below, but I don't know whether they are right. If you know how to set them, please tell me. Thank you very much.
Regards,
Jambo

[source]
format=lin16
sample_freq=8000

[posteriors]
system=LCRC
length=31
add_c0=true
hamming=false
suffix=lop
bunch_size=5
softening_func=none 0 0 0

[params]
kind=fbanks
suffix=fea

[melbanks]
nbanks=23
lower_freq=64
higher_freq=4000
vector_size=200
vector_step=80
preem_coef=0.0


Matejka Pavel

Jun 10, 2015, 9:00:07 AM
to phn...@googlegroups.com
Hi

The description of the TIMIT system says that the model was trained on 16 kHz data, which means you need to pass a 16 kHz audio file to the system:

PHN_EN_TIMIT_LCRC_N500 - 16kHz, 2 block STC, trained on TIMIT, 15 banks, 31 points, the DCT is applied on each temporal vector to reduce its size to 11 values, 500 neurons in all nets

Generally, I would say do not touch the settings in the configuration file. The system was trained with specific parameters, and if you change them the features may become inconsistent with the model.

Run the audio file through the system, look at the labels, and listen to the audio. Is the output good?

Best regards
Pavel
-- 

 Ing. Pavel Matejka, PhD      E-mail: mate...@fit.vutbr.cz
 UPGM FIT VUT Brno, L226      Web:    http://www.fit.vutbr.cz/~matejkap
 Bozetechova 2, 612 66        Phone:  +420 54114-1283
 Brno, Czech Republic         Fax:    +420 54114-1290
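
A quick way to confirm that a recording matches the 16 kHz rate quoted above is to read the rate from the file header before running the recognizer. This is only a minimal sketch: it assumes the audio is stored in a WAV container (the recognizer itself may expect a different input format), and the function names are hypothetical helpers, not part of the recognizer.

```python
import wave

# Rate quoted in this thread for PHN_EN_TIMIT_LCRC_N500.
EXPECTED_RATE_HZ = 16000


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def matches_model_rate(path, expected_hz=EXPECTED_RATE_HZ):
    """Return True if the file's sample rate equals the expected rate."""
    return wav_sample_rate(path) == expected_hz


def write_silence(path, rate_hz, n_samples=160):
    """Write a short 16-bit mono WAV of silence (demo helper only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_samples)
```

For example, after `write_silence("demo8k.wav", 8000)`, calling `matches_model_rate("demo8k.wav")` returns `False`, which flags the file as unsuitable for a 16 kHz model.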

Ma Jambo

Jun 10, 2015, 7:53:40 PM
to phn...@googlegroups.com
Hi, Pavel,
Thank you for your help.
So you mean I had better not change the parameters, and the system was trained for files with a 16 kHz sample rate. If I want to use it for 8 kHz files, should I upsample them to 16 kHz and then use those as input?
Second, yes: I found that this system does better on 16 kHz files. At first I thought the input should be 8 kHz, so I downsampled the TIMIT data and got nonsense phonemes. With 16 kHz files, it gave me good results.
Thank you.
Best regards,
Jambo

On Wednesday, June 10, 2015 at 11:00:07 PM UTC+10, Pavel Matejka wrote:

Pavel Matejka

Jun 11, 2015, 12:13:22 AM
to phn...@googlegroups.com
If you have 8 kHz files you cannot use the TIMIT models, because they are 16 kHz models. Upsampling will not help, because the upper half of the spectrum will still be missing. You have to use other models that were trained on 8 kHz data.
Best regards
Pavel
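
The point about the missing upper half of the spectrum can be illustrated with a small synthetic experiment. The sketch below is an illustration under stated assumptions, not part of the recognizer: the "wideband" signal is two pure tones (1 kHz and 6 kHz) sampled at 16 kHz, the "narrowband" recording is assumed to keep only the 1 kHz tone (an 8 kHz recording cannot represent 6 kHz), and the upsampling is plain linear interpolation. All function names are hypothetical.

```python
import cmath
import math


def tone(freq_hz, rate_hz, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def upsample_2x_linear(x):
    """Double the sample rate by linear interpolation (wraps at the end)."""
    out = []
    for i, s in enumerate(x):
        nxt = x[(i + 1) % len(x)]
        out.extend([s, 0.5 * (s + nxt)])
    return out


def high_band_fraction(x, rate_hz, cutoff_hz=4000.0):
    """Fraction of spectral energy at or above cutoff_hz (naive DFT)."""
    n = len(x)
    low = high = 0.0
    for k in range(1, n // 2):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy = abs(coef) ** 2
        if k * rate_hz / n >= cutoff_hz:
            high += energy
        else:
            low += energy
    return high / (low + high)


if __name__ == "__main__":
    # Signal recorded directly at 16 kHz: content below and above 4 kHz.
    wide = [a + b for a, b in zip(tone(1000, 16000, 512), tone(6000, 16000, 512))]
    # Signal recorded at 8 kHz: only the 1 kHz component can be present.
    narrow_16k = upsample_2x_linear(tone(1000, 8000, 256))
    print("16 kHz recording, energy above 4 kHz:", high_band_fraction(wide, 16000))
    print("upsampled 8 kHz,  energy above 4 kHz:", high_band_fraction(narrow_16k, 16000))
```

In this toy setup the true 16 kHz signal has about half of its energy above 4 kHz, while the upsampled signal has only a small residue there left by the interpolation. A model trained on real 16 kHz speech would therefore see features it never encountered in training.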

Ma Jambo

Jun 11, 2015, 2:58:28 AM
to phn...@googlegroups.com
Hi, Pavel,
Thank you very much for your answers.
So you mean that if I have 8 kHz files of English utterances, I can use other models such as PHN_HU_SPDAT_LCRC_N1500. But I think that would only classify the frames into as many classes as the model has phonemes, and those classes would not correspond to English phonemes. Am I right?
I want to divide utterances spoken in English into different classes (phonemes are a good way to do that), so I think it would be better if I could use a model for English.
Thank you.
Best regards,
Jambo
On Thursday, June 11, 2015 at 2:13:22 PM UTC+10, Pavel Matejka wrote:

Matejka Pavel

Jun 11, 2015, 4:53:44 AM
to phn...@googlegroups.com


On 11.6.2015 08:58, Ma Jambo wrote:
> Hi, Pavel,
>
> Thank you very much for your answers.
> You mean that if I have 8 kHz files of English utterances, I can use other models such as PHN_HU_SPDAT_LCRC_N1500. But I think that would only classify the frames into as many classes as the model has phonemes, and those classes would not correspond to English phonemes. Am I right?

You are right.

> I want to divide utterances spoken in English into different classes (phonemes are a good way to do that), so I think it would be better if I could use a model for English.

What are the classes?
Any phoneme recognizer can be used as a tokenizer, with further classification into classes such as dialects, topics, and so on.
So in your case, use the Hungarian one you mention. I personally used it for language recognition.

Pavel
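
One common way to use a phoneme recognizer as a tokenizer is to turn its label sequence into n-gram statistics and feed those to a separate classifier. The sketch below only shows the counting step. It assumes the recognizer output has already been parsed into a list of label strings; the labels in the usage note are made up and are not the Hungarian model's actual phoneme set, and the function names are hypothetical.

```python
from collections import Counter


def phoneme_ngrams(labels, n=2):
    """Count n-grams over a sequence of phoneme labels."""
    return Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))


def relative_frequencies(counts):
    """Normalize n-gram counts so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}
```

For instance, `phoneme_ngrams(["a", "b", "a", "b", "c"])` counts the bigram `("a", "b")` twice and the bigrams `("b", "a")` and `("b", "c")` once each. The resulting frequency vectors do not need to match English phonemes to be useful as features, which is why a model for another language can still serve as a tokenizer.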

Ma Jambo

Jun 11, 2015, 7:58:11 PM
to phn...@googlegroups.com
Hi, Pavel,
I just want to divide an utterance into phoneme segments and analyze the importance of those phonemes, and I also want to match test and training utterances for speaker ID. For the first task, I think using a model for English is more reasonable. For the second one, using any model as a tokenizer, such as the Hungarian one you suggested, is OK.
Thank you for your answer.
Best regards,
Jambo

On Thursday, June 11, 2015 at 6:53:44 PM UTC+10, Pavel Matejka wrote:

Matejka Pavel

Jun 12, 2015, 4:05:37 AM
to phn...@googlegroups.com
I would suggest working with the Hungarian one, because it is a better model than the TIMIT one and you need a tokenizer. You might do a better job with an English phoneme recognizer, but not one trained on TIMIT; it would need to be trained on Switchboard or Fisher.
Pavel