NEW PITCH DETERMINATION METHOD

Dmitry Terez

29 May 2002, 23:46:41

I am happy to announce a new robust method for pitch determination.

The method has been applied to speech signals with great success,
but it is general in nature and can be used to determine
fundamental frequency (period) of any (quasi-periodic) signal.

The new method was recently presented at ICASSP 2002 in Orlando, Florida
("Robust Pitch Determination Using Nonlinear State-Space Embedding",
Pitch Estimation Poster Session, Tuesday, May 14, 2002).

The ICASSP paper and the Matlab demo program are available from:

http://www.soundmathtech.com/pitch

This is a major paradigm shift from traditional linear signal processing
techniques towards using NONLINEAR methods for practical speech processing
tasks.

We have reason to believe that our new method has the potential to REPLACE
all other presently popular pitch determination methods (such as those
based on auto- or cross-correlation, spectrum, cepstrum, etc.).

Your constructive comments and questions are welcome!

Best Regards,

Dmitry Terez
SoundMath Technologies, LLC
Millville, New Jersey, USA

Steve Underwood

30 May 2002, 01:30:11

You covered all the non-essential stuff, like how well it works, how it
works, and some real working code. What about the important stuff? Is it
patented? :-)

Regards,
Steve

Stevie Wonder

30 May 2002, 05:46:31

Wow, finally something new, congratulations!!!!!!
There is a database from a French guy, Alain de Cheiveigne; if you calculate
your error rates on it, you can directly compare your method to others.

I checked your demo and found a VUV (voiced/unvoiced) error from 2.330s to 2.450s.
It would be nice if you added a button for writing the F0 data to an ASCII file,
so checking is easier!

Best,
Steve

"Dmitry Terez" <d...@soundmathtech.com> wrote in message
news:63322c10.02052...@posting.google.com...

spee...@yahoo.com

30 May 2002, 12:02:02

Could you give some info on the database you mentioned?

I guess that you used that to compare performance for VUV?

Tx

robert bristow-johnson

30 May 2002, 13:13:50

In article 63322c10.02052...@posting.google.com, Dmitry Terez wrote:

> Your constructive comments and questions are welcome!

de-constructive comments are unwelcome? :-)

i dunno how you might classify these comments:

1. it does not appear to be an absolutely different paradigm from the
method commonly called "Average Magnitude Difference Function" (AMDF) or
"Average Squared Difference Function" (ASDF). the latter is nearly
equivalent to an inverted auto-correlation (the ASDF will always
have a minimum where the auto-correlation has a maximum).

2. i had to look up the term "Heaviside function". it's been too long
since i heard the term used and forgot that all it means is the unit step
function.

3. there appears to be a slight inconsistency of notation. in Eq. (1),
your state-space vectors are denoted x(i), while in Eqs. (2) and (3) the
argument (i and i+k) is subscripted. but i presume they denote the
same thing. also, to be explicit, x(i) is also a function of d (approx. 12?)
and m (approx. 3?).

4. Eqs. (2) and (3) depict the core "correlation" function (if i may
use the term), which maps a value "k" to a measure of how good a choice k is
for your period. in those equations, the Euclidean distance
|x(i) - x(i+k)| is a crude ASDF ("crude" because only 3 terms are
in the ASDF summation). like the ASDF (or AMDF or even autocorrelation)
function, you are measuring how similar s(i) is to s(i+k) by
advancing s(i) by k samples, subtracting, squaring, and adding up the
squared difference for some selected samples. where that difference is
small, your measure of correlation is big, and that is why i would not call
it a "major paradigm shift" from ASDF or autocorrelation. it is essentially
separating out different terms of the ASDF summation into groups of 3,
running each group through a non-linear function, and adding the results back
together. like x(i), the hist(r, k) functions of Eqs. (2) and (3) are also a
function of d and m. the behavior of hist() will be affected by different
selections of values of d and m.

5. now that might be very useful in some cases (as shown by the
difference between Figs. 2(c) and 2(d)) in that the "true peaks" and "false
peaks" are differentiated more greatly in the histogram than in the straight
autocorrelation, but applying a step nonlinearity to information *does*
cause you to lose information. it throws away information. because of
that, i can (given a value of "r") create two inputs that differ in their
periodicity but come out looking identical in the histogram (whereas they
would not have identical autocorrelations). that's because all matched
distance measures closer than "r" come out equally good in Eqs. (2) and
(3). a perfect match (distance = 0) comes out the same as a less perfect
match where the distance is less than r.
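
The histogram measure discussed in points 4 and 5 can be sketched in a few lines of Python. This is only a reading of the description given in this post, not the code from the paper or the Matlab demo: the forward delay ordering of the embedding, the convention H(0) = 1, the defaults d = 12 and m = 3 (the approximate values guessed in point 3), and the names embed, heaviside and hist are all assumptions made for illustration.

```python
import math

def embed(s, i, d=12, m=3):
    """Delay vector x(i) = (s[i], s[i+d], ..., s[i+(m-1)d]) (assumed ordering)."""
    return [s[i + j * d] for j in range(m)]

def heaviside(v):
    """Unit step; H(0) = 1 is an assumed convention."""
    return 1 if v >= 0 else 0

def hist(s, r, k, d=12, m=3):
    """Count the delay vectors lying within distance r of their k-shifted copy."""
    n = len(s) - (m - 1) * d - k   # number of indices i with x(i) and x(i+k) in range
    return sum(
        heaviside(r - math.dist(embed(s, i, d, m), embed(s, i + k, d, m)))
        for i in range(n)
    )

# A sine with a period of exactly 80 samples, and a slightly perturbed copy.
clean = [math.sin(2 * math.pi * i / 80) for i in range(400)]
rough = [v + 0.01 * math.sin(2 * math.pi * i / 97) for i, v in enumerate(clean)]

for name, sig in (("clean", clean), ("rough", rough)):
    print(name, hist(sig, r=0.1, k=80), hist(sig, r=0.1, k=40))
```

Both signals give the same counts at k = 80 and at k = 40, even though only the first is exactly periodic at 80 samples. That is the loss of information described in point 5: any distance below r counts as a full match.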

finally, i am not saying that this method does not have value. it very well
may, particularly applied to speech (my bag is more music). i am just
saying it is a "twist" on correlation, is still time-domain (which is not a
bad attribute at all), and may suffer some of the same false results that
"conventional" PDAs (pitch detection algorithms) do, such as: how periodic
does it have to be? or, my favorite PDA problem: given a perfectly periodic
signal with frequency 100 Hz, i add to that another perfectly periodic signal
at 50 Hz with amplitude 80 dB less than the 100 Hz component. when that goes
into a PDA, what measured pitch should come out? 100 Hz? 50 Hz? (i guess it
depends on what you want to do.)
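
The 100 Hz plus 50 Hz case can be made concrete with a plain ASDF. In this sketch the 8 kHz sample rate, the use of sine waves for the two components, and the helper name asdf are assumptions chosen for illustration; the post does not specify them.

```python
import math

FS = 8000                      # assumed sample rate in Hz
AMP_50 = 10 ** (-80 / 20)      # 80 dB below the unit-amplitude 100 Hz component

def asdf(s, k, n):
    """Average squared difference between s[i] and s[i + k] over n samples."""
    return sum((s[i] - s[i + k]) ** 2 for i in range(n)) / n

s = [
    math.sin(2 * math.pi * 100 * i / FS) + AMP_50 * math.sin(2 * math.pi * 50 * i / FS)
    for i in range(400)
]

lag_100 = FS // 100            # 80 samples, one period of 100 Hz
lag_50 = FS // 50              # 160 samples, one period of 50 Hz
at_100 = asdf(s, lag_100, 160) # averaged over one full 50 Hz period
at_50 = asdf(s, lag_50, 160)
print(at_100, at_50)
```

The ASDF is tiny at both lags: at the 50 Hz period it vanishes up to rounding error, while at the 100 Hz period the only residual comes from the 50 Hz component and is on the order of 1e-8 against a signal power of about 0.5. Which pitch gets reported then depends entirely on the tolerance the detector applies to "how periodic does it have to be".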

--

r b-j

Wave Mechanics, Inc.
45 Kilburn St.
Burlington VT 05401-4750

tel: 802/951-9700 ext. 207 http://www.wavemechanics.com/
fax: 802/951-9799 rob...@wavemechanics.com

--

Clay S. Turner

30 May 2002, 13:50:30

robert bristow-johnson wrote:

>
> 2. i had to look up the term "Heaviside function". it's been too long
> since i heard the term used and forgot that all it means is the unit step
> function.

Hello Robert,
Besides giving us a named unit step function, Heaviside is the one who
first (IIRC) used complex numbers in A.C. circuit analysis. Plus the
method of solving differential equations using Fourier and Laplace
transforms is known as the Heaviside Calculus.

Two important (useful for DSPers) results are:

(1st one most here already know)

"Heaviside shifting theorem"

multiplying a Laplace-domain function f(s) by e^{-bs} time-shifts the
function: e^{-bs} f(s) -> F(t-b)

(2nd is the Heaviside expansion theorem)

given f(s) = g(s)/h(s)

then F(t) = sum over i of e^{t*s_i}*g(s_i)/h'(s_i)
where s_i are simple isolated zeros of h(s).

Think about the application in finding impulse responses of rational
functions of s without using partial fraction decomposition!
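
As a small worked check of the expansion theorem, the following Python sketch evaluates the sum for f(s) = 1/((s+1)(s+2)), whose inverse transform is known to be e^{-t} - e^{-2t}. The example function, the helper names, and the choice to pass the zeros of h(s) in by hand are illustrative assumptions; the sketch also assumes the degree of g is lower than the degree of h.

```python
import cmath
import math

def polyval(coeffs, s):
    """Evaluate a polynomial (highest power first) at s using Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = acc * s + c
    return acc

def polyder(coeffs):
    """Coefficients of the derivative polynomial (highest power first)."""
    n = len(coeffs) - 1
    return [c * (n - j) for j, c in enumerate(coeffs[:-1])]

def heaviside_expansion(g, h, zeros, t):
    """F(t) = sum over simple zeros s_i of h of e^{t s_i} g(s_i) / h'(s_i)."""
    dh = polyder(h)
    return sum(cmath.exp(t * s_i) * polyval(g, s_i) / polyval(dh, s_i) for s_i in zeros)

g = [1]            # g(s) = 1
h = [1, 3, 2]      # h(s) = s^2 + 3s + 2 = (s + 1)(s + 2)
zeros = [-1, -2]   # simple zeros of h(s), supplied by hand

t = 0.5
value = heaviside_expansion(g, h, zeros, t)
print(value.real, math.exp(-t) - math.exp(-2 * t))
```

Here h'(s) = 2s + 3, so the two terms are e^{-t}/1 and e^{-2t}/(-1), and no partial fraction decomposition is needed.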

For what it is worth.

Clay

p.s. The following link has a good terse biography of Oliver Heaviside.

http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Heaviside.html

--
Clay S. Turner
Wireless Systems Engineering, Inc. 770-641-8293
http://www.WirelessSystemsEngineering.Com (Work)
http://personal.atl.bellsouth.net/lig/p/h/physics/index.htm (Personal)

chang xiaoguang

30 May 2002, 14:07:21

I am going to study continuous speech recognition (Chinese) based on soft
computing, and I need some wave files. Can anyone tell me where I can get some
Chinese speech data?

Thank you

robert bristow-johnson

30 May 2002, 14:33:20

never said i didn't like Heaviside (nor that his name didn't ring a bell - i
have always associated his name with the partial fraction expansion theorem
that gives us the coefs really quickie-dickie). just that i was reading
this paper and wondering "what the hell is this 'H()' function??" and
literally looked it up in Google. at least the answer was a lot easier than
if it were the Jacobi elliptic function or something as nasty. also, once
i saw the definition, it *did* ring a bell of very low Q (i had probably seen
it called that in my 1st semester Linear Electric Circuits class).

also, i must confess that i don't remember ever associating the time-shifting
theorem (#1 below that "most here already know") with Heaviside.

In article 3CF66666...@WirelessSystemsEngineering.Com, Clay S. Turner
at CSTu...@WirelessSystemsEngineering.Com wrote on 05/30/2002 13:50:

> p.s. The following link has a good terse biography of Oliver Heaviside.
>
> http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Heaviside.html

i looked that up and it seems to me that Heaviside's "operational calculus"
is, essentially, the application of the Laplace xform onto diff eq. where
"p" is substituted for "s". who came first? Heaviside or Laplace? it also
is gratifying to read about some "uncredentialed outsider" stirring it up
for the academics. good for him.

BTW, i meant to include another deconstructive comment in the previous post:

6. H( r - |x(i)-x(i+k)| ) is the same as

H( r^2 - |x(i)-x(i+k)|^2 ) (i'm assuming that r > 0)

which means the sqrt() operation used in the Euclidean distance is
unnecessary. it also shows more explicitly how this histogram measure
shares a lot in common with ASDF (with some non-linearity operating on
separated groups of terms tossed into the ASDF summation).
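
The equivalence in point 6 is easy to check numerically. The sketch below compares the two forms on random 3-dimensional vectors; the function names and the test values are made up for illustration, and H(0) = 1 is an assumed convention.

```python
import math
import random

def H(v):
    """Unit step; H(0) = 1 is an assumed convention."""
    return 1 if v >= 0 else 0

def match_with_sqrt(x, y, r):
    """H(r - |x - y|), using the Euclidean distance directly."""
    return H(r - math.dist(x, y))

def match_without_sqrt(x, y, r):
    """H(r^2 - |x - y|^2), which never takes a square root."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return H(r * r - squared)

random.seed(0)
pairs = [
    ([random.random() for _ in range(3)], [random.random() for _ in range(3)])
    for _ in range(1000)
]
agree = all(
    match_with_sqrt(x, y, 0.5) == match_without_sqrt(x, y, 0.5) for x, y in pairs
)
matches = sum(match_without_sqrt(x, y, 0.5) for x, y in pairs)
print(agree, matches)
```

The squared distance in match_without_sqrt is itself a sum of squared sample differences, which is the three-term ASDF mentioned above. In floating point the two forms could in principle disagree for a distance within rounding error of r, which does not matter for a counting measure like this.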

also meant to cite explicitly that the author's paper is at
http://soundmathtech.com/pitch/download/terez_icassp02.pdf if people want
to see what it is that i'm talking about.

whatever,

r b-j

Clay S. Turner

30 May 2002, 15:32:47

robert bristow-johnson wrote:

other stuff snipped

> i looked that up and it seems to me that Heaviside's "operational calculus"
> is, essentially, the application of the Laplace xform onto diff eq. where
> "p" is substituted for "s". who came first? Heaviside or Laplace?

Robert,

Even though Laplace died in 1827 and Heaviside was born in 1850, the
question becomes how much of what we presently call the Laplace
transform was created by Laplace and how much is due to Heaviside. I'll
have to go and do some searching to find out. E.T. Bell's classic "Men
of Mathematics" lists many things attributed to Laplace, but an operator
type of solution (a method of reducing calculus to algebra) was not
mentioned. So I'll have to look further.

> it also
> is gratifying to read about some "uncredentialed outsider" stirring it up
> for the academics. good for him.

> r b-j

I agree. This kind of thing (being an uncredentialed outsider or
credentialed in a different field) can bring out some very good
discoveries, since one is not blindfolded by the standard paradigm of the
field of study. A famous example is Alfred Wegener, a meteorologist,
who put forth the theory of continental drift, an idea that the geologists
of his time could not accept but that is well accepted today.

robert bristow-johnson

30 May 2002, 16:14:57

>
> robert bristow-johnson wrote:
>
>> it also
>> is gratifying to read about some "uncredentialed outsider" stirring it up
>> for the academics. good for him.

In article 3CF67E5F...@WirelessSystemsEngineering.Com, Clay S. Turner
at CSTu...@WirelessSystemsEngineering.Com wrote on 05/30/2002 15:32:

> I agree. This kind of thing (being an uncredentialed outsider or
> credentialed in a different field) can bring out some very good
> discoveries, since one is not blindfolded by a standard, in the field of
> study, paradigm.

well, i ain't a PhD and i get stuck in certain standards (like "all decent
time-domain pitch detection algorithms eventually boil down to some sorta
AMDF or auto-correlation") and i know that there are academics and PhDs who
think outside the box (there's no stigma in that), but my problem with the
insiders is basically one of hubris and sometimes scholarly integrity. they
worked hard, spent a lot of time and possibly a lot of money, and kissed a
lot of fanny to get their PhD and they don't want to hear about or find out
that there are people who might not have gone through all that and gotten
the "union card" who can do the same scholarly stuff just as well and
sometimes better. this makes it cognitively incongruous for these
particular folk to put themselves in a frame of mind where they can learn
something or accept critique from those without the "union card" and, in the
case of academics, to even work alongside those without the "union card".

now i'm not generalizing this to all PhDs. i have a few friends in the AES
and related industry with PhDs and some are academics. i just am gratified
when someone points it out when the Emperor is naked.

r b-j

Aldebaro Klautau

30 May 2002, 15:30:38

People looking for source code of pitch estimation methods will find the
following message:

Currently, the demonstration software is provided in the form of several
Matlab P-files. (We are planning to release this demo as an open source
software in the near future).

Aldebaro

Stevie Wonder

31 May 2002, 04:23:31

Sorry, it's Alain de Cheveigne.
In his YIN paper he mentions four databases and shows that his method is
better than others:
http://www.ircam.fr/pcm/cheveign/ps/yin.pdf

I checked the VUV with a spectrogram.

Two databases:
Keele Pitch Database
Paul Bagshaw (p...@cstr.ed.ac.uk), with source code for calculating error
rates
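
One common way to turn such a comparison into a number is a gross error rate: the fraction of voiced frames where the estimate is off by more than some relative tolerance. The sketch below is not Bagshaw's code; the function name, the convention that 0 marks an unvoiced frame, and the 20% tolerance are assumptions for illustration.

```python
def gross_error_rate(ref_f0, est_f0, tol=0.2):
    """Fraction of frames, voiced in both tracks, where the estimate deviates
    from the reference by more than tol (relative to the reference)."""
    # Keep only frames that both tracks mark as voiced (F0 > 0 is assumed
    # to mean "voiced", 0 to mean "unvoiced").
    voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not voiced:
        return 0.0
    errors = sum(1 for r, e in voiced if abs(e - r) > tol * r)
    return errors / len(voiced)

# Hypothetical frame-by-frame F0 tracks in Hz (0 = unvoiced).
reference = [100, 100, 0, 200, 150]
estimate = [101, 50, 120, 205, 0]
print(gross_error_rate(reference, estimate))
```

In this made-up example three frames are voiced in both tracks and one of them is an octave error, so the rate is one third. Voiced/unvoiced decision errors, like the one noted at 2.330s, would need to be counted separately.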

stevie

<spee...@yahoo.com> wrote in message news:3CF64D25...@yahoo.com...

Dave DAVIES

1 June 2002, 22:46:46

Steve,

This is a very interesting feature you found at around 2.33-3.41sec. Have you
had a look at a time domain plot? The glottal pulses drop to about half the
frequency for about 5 cycles, but they are still strong glottal pulses -
stronger than the previous few, actually. This is an effect that I have noticed
before in some speakers and wanted to pursue - particularly in relation to how
source synchronous systems cope with it.

Glottal movement isn't simple. It has challenged researchers for decades. What
I think might be happening here is that only every second cycle actually closes
completely. The alternate cycles that don't quite close don't produce the burst
we usually see with complete closure. I would like to hear if anyone has
another explanation.

Dmitry, do you have phonemic label data for this file?

I too would be interested in seeing your F0 as an ASCII file. I can put my
version up on our ftp site if there is any interest, along with a plot of it and
the time domain section at 2.33s, so others can see what we are talking about.

Cheers,

dave

Stevie Wonder

2 June 2002, 09:56:21

Hi Dave,

Actually, what happens around 2.33-3.41sec is a phenomenon called creaky voice:
the pitch suddenly jumps down one octave and often jumps back up one octave again.

Dmitry's demo also has a very restricted frequency range (100-400 Hz), so I can't
really check the performance of the algorithm. The range should be at least
60-600 Hz. Being able to turn dynamic programming and VUV detection on and off is
also a necessary feature.
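
For a time-domain method, the F0 search range translates directly into a range of candidate lags. A minimal sketch, assuming a 16 kHz sample rate (an assumption here, not something stated in this post) and a hypothetical helper lag_range:

```python
import math

def lag_range(fs, f_min, f_max):
    """Smallest and largest whole-sample lags whose F0 = fs / lag lies in [f_min, f_max]."""
    return math.ceil(fs / f_max), math.floor(fs / f_min)

print(lag_range(16000, 100, 400))   # the demo's range
print(lag_range(16000, 60, 600))    # the wider range asked for above
```

Under that assumption, widening the range to 60-600 Hz means searching lags from 27 to 266 samples instead of 40 to 160, so each analysis frame has to be long enough to cover the longer lags.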

"Dave DAVIES" <da...@discus.anu.edu.au> wrote in message
news:3CF98716...@discus.anu.edu.au...

Dave Davies

3 June 2002, 06:16:57

Stevie Wonder wrote:

> Hi Dave,
>
> Actually around 2.33-3.41sec happens a phenomen called creaky voice, the
> pitch

OK Steve but what is happening physically here?

>
> suddenly jumps down one octave and often jumps back up one octave again.

Not always, or even usually, an octave I think. Using the Fundamental Harmonic
technique for detecting instantaneous pitch, or more precisely its inverse -
glottal period, the shift doesn't always seem to be an octave. You can see this
in a plot I have put at

ftp://discus.anu.edu.au/pub/daved/shehad87.GIF

Also there are some time domain plots (x1 and x2 time scales, vertical bars =
5ms) of this example.

Has anyone got another physical explanation for this and are there other types
of anomaly that get classed as creak? My understanding is that creak is a
pretty broad perceptual classification. Am I wrong? Is it just this phenomenon?

Is there label data for this file?

Cheers and TIA,

dave

Chip Wood

3 June 2002, 11:49:21

Hey, guys, search the literature, I did this stuff with Moore and Hollien
back in the 60's and the correct term is vocal fry, not creak. Well known
phenomenon and happens all the time, even in normal voices but especially
with males, especially at end of phrases when running out of breath support,
the sub-glottal pressure drops and the vocal fold musculature doesn't
compensate. Any pitch tracker has to account for it, it is not a problem
with your tracker, it really happens.

"Dave Davies" <da...@discus.anu.edu.au> wrote in message
news:3CFB4204...@discus.anu.edu.au...

DOA

3 June 2002, 13:20:04

For what it is worth, I have also noted the phenomenon below, in
recordings of myself. I was quite surprised, because I could not
imagine what was happening physically to cause it. The
microphone could also be the cause, but again I can't
imagine how.

>
> Dave DAVIES <da...@discus.anu.edu.au> wrote in message
> news:<3CF98716...@discus.anu.edu.au>...

Xuejing Sun

3 June 2002, 13:11:44

You may want to check the following references for some explanations on
alternate cycles from voice researchers.

[1] I. R. Titze, Workshop on Acoustic Voice Analysis- Summary Statement.
Denver: National Center for Voice and Speech, 1995.

[2] I. R. Titze, Principles of Voice Production. Prentice-Hall, Inc.,
Englewood Cliffs, NJ, 1994.

You can also check http://mel.speech.northwestern.edu/sunxj/pda.htm for a
PDA in this regard. It doesn't solve the problem completely, but it
may represent a good start. The source code and the evaluation databases are
available online. With the voicing detection module completely disabled, the
pitch determination results are pretty good.

"Dave DAVIES" <da...@discus.anu.edu.au> wrote in message
news:3CF98716...@discus.anu.edu.au...

Stevie Wonder

3 June 2002, 13:46:23

http://stud4.tuwien.ac.at/~e8827684/jaco/shehadyourdark.wav.jpg

There is also the term glottalization, but the pitch halving suggests that
every second cycle is stressed, so that you get half the pitch. That's my
physical explanation.

Cheers

"Dave Davies" <da...@discus.anu.edu.au> wrote in message
news:3CFB4204...@discus.anu.edu.au...

robert bristow-johnson

3 June 2002, 15:56:14

In article PXNK8.91783$305.1...@news.chello.at, Stevie Wonder at x@x.x
wrote on 06/03/2002 13:46:

> http://stud4.tuwien.ac.at/~e8827684/jaco/shehadyourdark.wav.jpg
>
> There is also the term glottalization, but the pitch halving suggests that
> every second cycle is stressed, so that you get half the pitch. That's my
> physical explanation.

i think i understand the physical explanation, and even though my
application is (usually) not speech (sometimes it *is* the singing voice), i
wonder what approach you comp.speech.research people take to avoid these
spurious "pitch halving" problems.

e.g. given a 100 Hz periodic waveform (which is detected as 100 Hz) and a 50
Hz waveform of *much* lower amplitude is added to it, at what relative
amplitude do you call the sum a 50 Hz waveform and at what lower relative
amplitude do you call the sum a 100 Hz waveform?

Dmitry Terez

5 June 2002, 00:32:15

It is encouraging to see so many people downloading the demo from our site!
(http://www.soundmathtech.com/pitch)

Perhaps, a few explanations and comments are needed at this point:

1. The demo program is provided for presenting a new NONLINEAR paradigm
to speech and signal processing research communities.
We DID NOT intend to achieve the best possible results with this demo.
(In fact, it does not even employ a dynamic programming post-processing
algorithm - just a simple forward-backward tracking procedure with
NO smoothing of the obtained pitch contour.)

2. The real value of this demo is in its implemented functionality to
examine each individual signal frame in detail.
For this, you need to click on the corresponding F0 mark after pitch
computation is finished for an entire utterance. (You may also want
to zoom in on some particular portion of a signal first.)
Once you click on any F0 mark, a new window pops up, which contains
a space-time separation plot, periodicity histogram, etc. for a particular
signal frame.
Then, you can play with the neighborhood radius and see how the histogram
changes, observe the state-space embedding, rotate 3-D trajectories, etc.

3. The speech sample "shehadyourdark.wav" provided on our www-site is just
a random sentence from TIMIT database (I lost the original file label
when truncating the signal and renaming the file).

4. We can provide an F0 "save" button in a future (open-source) release of this
demo program. We cannot tell at the present moment when we will be ready to
release it.

5. We do not intend to discuss any issues, other than research-related,
in these groups. You can contact us separately for any questions related to
commercial applications, intellectual property, licensing etc.

Dave Davies

5 June 2002, 05:18:53

Ok Chip, I had the terminology explained to me in the early days of my PhD but
I found the usage in the literature a bit confusing, or confused. I was
initially taking a DSP view in this thread and wanting to compare trackers, but
now that we are down this path: Sun uses 'creak' in the article Steve was
referring to:

http://mel.speech.northwestern.edu/sunxj/pda.htm

I've read a lot of the F0 tracking lit. and abnormal conditions don't get much
of a mention. It was interesting to see Sun's paper where there is a strong
focus on measuring the degree of these 'sub-harmonic' cycles. Pity it uses
fixed length framing though. Muddies the water.

Thanks for your comments on the contexts that vocal fry is found in. That might
explain why I haven't seen it much in the read speech I have been looking at.
The speakers are possibly more composed and don't run out of breath as readily
as they would in normal conversation. Yet another trap for those of us relying
heavily on read speech in our research and for the ASR systems that are trained
on it.

Cheers,

dave

Jerry Avins

5 June 2002, 12:07:22

Dave Davies wrote:
>
...

>
> Thanks for your comments on the contexts that vocal fry is found in. That might
> explain why I haven't seen it much in the read speech I have been looking at.
> The speakers are possibly more composed and don't run out of breath as readily
> as they would in normal conversation. Yet another trap for those of us relying
> heavily on read speech in our research and for the ASR systems that are trained
> on it.
>

...

You raise a very important and usually overlooked issue. The need to
standardize a test procedure often diminishes the value of a test by
precluding some cases.

Software produced in my shop had to pass a user-interface validation
suite developed for us by the CS guys. It was a good suite that I
couldn't have improved. My boss and I took it upon ourselves to break
software that passed it, and we usually succeeded. Our test procedures
weren't standardized.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Dave Davies

8 June 2002, 22:40:16

Thanks to everyone who replied here and by email on the F0 anomaly issue,
particularly Xuejing Sun for pointing to the summary statement of the 1994
Workshop on Acoustic Voice Analysis

http://www.ncvs.org/rescol/sumstat/sumstat.pdf

To summarise: Using the workshop's agreed terminology the vocal anomaly at 2.33s,
or 38000 samples in the plot at ftp://discus.anu.edu.au/pub/daved/shehad87.GIF,
may be seen as an example of pulsed phonation at a subharmonic associated with a
period doubling or bifurcation.

If you look carefully in the signal gifs in my ftp directory you can see what may
be traces of the intermediate pulses in some cases - I've seen more obvious
examples. My Fundamental Harmonic plots (email me if interested) suggest that
vocal fold activity continues at around the original F0 - ie the vocal folds
don't actually drop to a subharmonic, just the acoustic pulses generated by the
closure of alternate cycles. Alternatively it is just the IIR filter used to
extract the FH interpolating across the gap.

Dmitry Terez wrote:

> It is encouraging to see so many people downloading the demo from our site !
> (http://www.soundmathtech.com/pitch)
>

....snip

> 3. The speech sample "shehadyourdark.wav" provided on our www-site is just
> a random sentence from TIMIT database (I lost the original file label
> when truncating the signal and renaming the file).
>

OK, thanks. I'll look it up in our copy of TIMIT.

>
> 4. We can provide F0 "save" button in the future (open-source) release of this
> demo program. We cannot tell at the present moment when we will be ready to
> release it.

Looking forward to see how it compares with my implementation of the Fundamental
Harmonic method. It may turn out to be a matter of horses for courses and that a
really smart ASR system will pick the signal processing methods that suit the
context both acoustically and phonetically.

>

...snip

Cheers,

dave