Comparison of 2 'wav' files

HuaMin

Jun 16, 2009, 6:22:03 AM

Hi,
What's the right way to compare two 'wav' files, to check whether they
contain the same sound or not? Is it better to choose C++ instead of C# for this?

Bob Masta

Jun 16, 2009, 7:02:29 AM

On Tue, 16 Jun 2009 03:22:03 -0700,
HuaMin
<Hua...@discussions.microsoft.com> wrote:

It's not clear what you are looking for.

If you just want to see if the files are
identical, any byte-by-byte comparison scheme will
work perfectly.
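
For the "identical files" case, a byte-by-byte check can be sketched in a few lines of Python. This is an illustration, not a prescribed method: the function name files_identical and the 64 KiB chunk size are arbitrary choices. Python's standard filecmp.cmp(a, b, shallow=False) does essentially the same job.

```python
import os

def files_identical(path_a: str, path_b: str, chunk_size: int = 65536) -> bool:
    """Return True only if both files contain exactly the same bytes."""
    # Files of different sizes can never be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files ran out at the same point
                return True
```

Note that this says nothing about whether two *different* files sound alike; even re-saving the same audio with a different header makes it fail.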

If you want to know if two files are different
recordings of the same sound (different mic types
and/or positions, different start-stop times,
etc.) then the job is *much* harder. You would
probably want to do FFTs (say, 1024 samples each)
followed by some sort of feature extraction.
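
A rough sketch of the "FFTs of 1024 samples each, then feature extraction" step, using only Python's standard library. Assumptions not in the post: the input is 16-bit mono PCM, a hand-written radix-2 FFT stands in for a library FFT (numpy.fft.rfft would be the usual choice), frames do not overlap, and the energy in 16 equal-width frequency bands per frame is a deliberately crude stand-in for a real feature set such as MFCCs. The names read_wav_mono16 and band_energies are hypothetical.

```python
import array
import cmath
import math
import sys
import wave

FRAME = 1024  # samples per FFT frame, as suggested above

def read_wav_mono16(path):
    """Read a 16-bit PCM mono WAV file into floats in [-1, 1) plus its sample rate."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        raw = w.readframes(w.getnframes())
        rate = w.getframerate()
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples], rate

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def band_energies(samples, n_bands=16):
    """Cut samples into non-overlapping FRAME-sized frames and return, for each
    frame, the spectral energy summed into n_bands equal-width frequency bands."""
    features = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        spectrum = fft(samples[start:start + FRAME])
        power = [abs(c) ** 2 for c in spectrum[:FRAME // 2]]  # positive frequencies
        width = len(power) // n_bands
        features.append([sum(power[b * width:(b + 1) * width]) for b in range(n_bands)])
    return features
```

Two recordings would then be compared through their feature sequences rather than their raw samples.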

If you have reason to believe that the recordings
are identical except for a time shift, then
correlation techniques may be the best choice.
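
For the time-shift case, here is a brute-force sketch of normalized cross-correlation: one signal is slid against the other over a range of lags (max_lag is a hypothetical parameter you would choose), and the lag with the highest score is reported. The score is the cosine similarity of the overlapping parts (no mean removal), so it lies between -1 and 1. This runs in O(N x lags) time; real code would normally use an FFT-based correlation.

```python
import math

def best_lag(a, b, max_lag):
    """Return (lag, score) maximizing the normalized cross-correlation of a and b
    for lag in [-max_lag, max_lag]. A positive lag means the content of a
    appears `lag` samples later in b."""
    best = (0, -2.0)  # below any possible score
    for lag in range(-max_lag, max_lag + 1):
        # Overlapping region only: a[i] is paired with b[i + lag].
        pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
        if len(pairs) < 2:
            continue
        num = sum(x * y for x, y in pairs)
        den = math.sqrt(sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs))
        score = num / den if den > 0 else 0.0
        if score > best[1]:
            best = (lag, score)
    return best
```

A score near 1.0 at some lag suggests the same material shifted in time; how close to 1.0 counts as "the same" is exactly the threshold question raised next.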

You will not get a yes/no result from these FFT or
correlation methods. You will have to apply some
threshold based upon your requirements and
experience to estimate the likelihood of a match.
Definitely not a simple project!
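
One way to turn "requirements and experience" into a concrete threshold, sketched under stated assumptions: compute a similarity score for pairs you already know to be matches and non-matches, then pick the cutoff that classifies the most of them correctly. Here the score is the average per-frame cosine similarity of two feature sequences whose frames are assumed to be already aligned. The scoring rule and the helper names (sequence_similarity, pick_threshold) are illustrative choices, not a method given in this thread.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def sequence_similarity(feats_a, feats_b):
    """Average frame-by-frame cosine similarity over the shorter sequence."""
    n = min(len(feats_a), len(feats_b))
    if n == 0:
        return 0.0
    return sum(cosine(feats_a[i], feats_b[i]) for i in range(n)) / n

def pick_threshold(same_scores, diff_scores):
    """Return (threshold, accuracy) where 'score >= threshold' means 'match',
    choosing the candidate that gets the most labelled pairs right."""
    best_t, best_correct = None, -1
    for t in sorted(set(same_scores) | set(diff_scores)):
        correct = sum(s >= t for s in same_scores) + sum(s < t for s in diff_scores)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct / (len(same_scores) + len(diff_scores))
```

The result is still only an estimate of how likely a match is; a larger and more representative labelled set gives a more trustworthy threshold.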

Best regards,

Bob Masta

DAQARTA v4.51
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
FREE Signal Generator
Science with your sound card!

HuaMin

Jun 23, 2009, 1:12:01 AM

Many thanks, Bob, and sorry for my late reply. Is there any existing way (like
the correlation techniques) to compare sounds from different people, for
instance, different pronunciations from different people?

Bob Masta

Jun 23, 2009, 8:07:57 AM

On Mon, 22 Jun 2009 22:12:01 -0700,
HuaMin
<Hua...@discussions.microsoft.com> wrote:

>Many thanks, Bob, and sorry for my late reply. Is there any existing way (like
>the correlation techniques) to compare sounds from different people, for
>instance, different pronunciations from different people?
>

I haven't looked lately, but I'll bet you can find
a huge body of work by searching for "speech
recognition" plus "technique" or "algorithm", etc.

The last I checked, this was still regarded as a
hard problem to solve. I suspect the best speech
recognition software uses multiple techniques,
with plenty of "fudge factors" based upon test
results.

If you are doing basic research on pronunciation,
your job may be somewhat easier if you can have
each subject utter the same short phrase, word, or
even single syllable. Then you can align the
starts and use a series of short overlapping
spectra to watch the development of the sounds.

My Daqarta software can show color spectrograms of
real-time sounds. But to get highest time
resolution (high overlap) it's best to record the
sound first. See
<http://www.daqarta.com/dw_sgram.htm>
for speech examples.

It won't cost you a thing to try, and if you don't
need live input you can avoid the US$29 purchase
price altogether: After the 30-day/30-session
trial expires, the inputs stop working but you can
still analyze files.

Daqarta only shows one spectrogram at a time, not
side-by-side comparisons of different subjects.
(Though I suppose if you were motivated you could
splice two short utterances from different
subjects so they appeared sequentially in the same
file.) But even if this is not the solution to
your problem (which you still haven't explained),
it should give you plenty of insight on the issues
you will need to deal with. You can, for example,
change window functions, overlap, and dynamic
range to see how they affect the spectrogram.
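
The three spectrogram settings mentioned above can be illustrated with a small sketch. These are generic textbook definitions, not Daqarta's internals: a few common window functions (periodic forms), the hop size that corresponds to a given overlap percentage, and conversion of magnitudes to dB relative to the peak with a dynamic-range floor. The function names are hypothetical.

```python
import math

def make_window(name, n):
    """Common spectral windows of length n (periodic forms)."""
    if name == "rectangular":
        return [1.0] * n
    if name == "hann":
        return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]
    if name == "blackman":
        return [0.42 - 0.5 * math.cos(2 * math.pi * i / n)
                + 0.08 * math.cos(4 * math.pi * i / n) for i in range(n)]
    raise ValueError(f"unknown window: {name}")

def hop_for_overlap(frame, overlap_percent):
    """Samples between successive frames for a given percentage overlap."""
    return max(1, round(frame * (1 - overlap_percent / 100)))

def to_db(mags, dynamic_range_db=60.0):
    """Magnitudes in dB relative to the largest one; anything more than
    dynamic_range_db below the peak is clipped to the display floor."""
    peak = max(mags)
    if peak <= 0:
        return [-dynamic_range_db] * len(mags)
    out = []
    for m in mags:
        db = 20 * math.log10(m / peak) if m > 0 else -math.inf
        out.append(max(db, -dynamic_range_db))
    return out
```

More overlap (a smaller hop) gives finer time resolution at the cost of more computation, which is why recording first and analyzing offline helps.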

HuaMin

Jun 24, 2009, 2:17:01 AM

Any advice?

Chris P.

Jun 24, 2009, 10:09:17 AM

On Mon, 22 Jun 2009 22:12:01 -0700, HuaMin wrote:

> Many thanks, Bob, and sorry for my late reply. Is there any existing way (like
> the correlation techniques) to compare sounds from different people, for
> instance, different pronunciations from different people?

That would be called voice recognition. What are your exact requirements?
Do you have a commercial need or is this a research project?

--
http://www.chrisnet.net/code.htm
[MS MVP for DirectShow / MediaFoundation]

HuaMin

Jul 2, 2009, 11:25:01 PM

Thanks Chris. It's a commercial need.

HuaMin

Jul 6, 2009, 12:25:01 AM

Chris,
Do you have more advice for doing that?

Chris P.

Jul 6, 2009, 12:54:13 PM

On Sun, 5 Jul 2009 21:25:01 -0700, HuaMin wrote:

> Do you have more advice for doing that?

Nuance has an SDK that can be licensed from them. I believe it is called
"Nuance Verifier". You will have to contact Nuance sales, they don't sell
it on the web site.

HuaMin

Jul 10, 2009, 12:12:01 AM

Many thanks and good day Chris.

I've checked that; one of their products is 'Dragon NaturallySpeaking'. As
its technical support requires a valid registration key, I can't use that
service. Do you know which product is actually for comparing two sounds?

How about the idea of storing a sound as a sequence of some kind of data?

HuaMin

Jul 10, 2009, 3:41:01 AM

Chris,
I remember that there's a way to convert a 'wav' file into something that
can be stored on the PC. How about that approach?

"HuaMin" wrote:

> Many thanks and good day Chris.
>
> I've checked that; one of their products is 'Dragon NaturallySpeaking'. As
> its technical support requires a valid registration key, I can't use that
> service. Do you know which product is actually for comparing two sounds?
>
> How about the idea of storing a sound as a sequence of some kind of data?

Chris P.

Jul 10, 2009, 10:30:14 AM

On Thu, 9 Jul 2009 21:12:01 -0700, HuaMin wrote:

> I've checked that; one of their products is 'Dragon NaturallySpeaking'. As
> its technical support requires a valid registration key, I can't use that
> service. Do you know which product is actually for comparing two sounds?
>
> How about the idea of storing a sound as a sequence of some kind of data?

Are you comparing sounds or voices? Are you trying to authenticate the
voice or match it to a phrase?

If you are comparing speech phrases, then comparing sounds is not enough.
You have to break the speech down into phonemes and then compare this to
what you've stored.

If you are voice printing for speaker identification, then this is a whole
different challenge. This requires recognizing the high-frequency
variations in the signal.

Nuance has products that do both of these tasks; you will have to contact
their sales to get evaluation software.

HuaMin

Jul 15, 2009, 6:39:01 AM

Thanks Chris. I expect to store a sound file for each word of a phrase, and
then to detect/validate the words in the speech against the stored sound
files. Is there any example of this?
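
Storing one template per word is close to classic isolated-word template matching. A technique not mentioned in this thread, dynamic time warping (DTW), is a common way to compare a spoken word with stored templates when the speaking rate differs. The sketch below assumes each word has already been turned into a sequence of per-frame feature vectors (for example, spectral band energies); templates, dtw_distance, and recognize_word are hypothetical names. As the replies stress, this is nowhere near a robust recognizer, especially across different speakers.

```python
import math

def frame_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two feature sequences, divided by
    the combined length so long and short words are comparable."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame j reused (b stretched)
                                 cost[i][j - 1],      # seq_a frame i reused (a stretched)
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m] / (n + m)

def recognize_word(utterance, templates):
    """Return (word, cost) for the stored template closest to the utterance.
    `templates` maps a word label to its stored feature sequence."""
    return min(((word, dtw_distance(utterance, tmpl)) for word, tmpl in templates.items()),
               key=lambda pair: pair[1])
```

In practice you would also reject the best match when its cost is above some threshold, so that words not in the stored set are not forced onto the nearest template.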

HuaMin

Jul 16, 2009, 2:48:01 AM

Any advice?

Chris P.

Jul 16, 2009, 6:08:47 PM

On Wed, 15 Jul 2009 03:39:01 -0700, HuaMin wrote:

> Thanks Chris. I expect to store a sound file for each word of a phrase, and
> then to detect/validate the words in the speech against the stored sound
> files. Is there any example of this?

There are no examples of this. This is Ph.D.-level research, not something
you are going to find source code for lying around. Use one of the
available commercial products for phonetic comparison; it's really your
only choice.

HuaMin

Sep 21, 2009, 9:52:01 PM

Many thanks, Chris, and I do understand the difficulty of this. But I would
still like some hints on how to do it myself instead of using existing products.

HuaMin

Sep 28, 2009, 11:36:01 PM

Any advice?

"HuaMin" wrote:

> Many thanks, Chris, and I do understand the difficulty of this. But I would
> still like some hints on how to do it myself instead of using any existing products.