Extracting formant values in voiced intervals ONLY; problem converting Praat script to Parselmouth/Python only

Eric Jackson

Mar 9, 2021, 10:40:42 PM
to parse...@googlegroups.com
Hello, Yannick (and other Parselmouth users). I'm familiar with using Praat by itself, and I'm somewhat familiar with Python, and libraries like NumPy and Pandas for working with large data tables. I've tried to work out a solution to the problem I'm facing by referring to your documentation, but I've been unable to find an appealing solution. Thanks for any suggestions you may have to get this working well.

My goal:
I have an audio file with natural speech. I would like to end up with an array (i.e., a Pandas DataFrame) of times and formant values for only the voiced intervals of the file.

Considerations:
I've seen Praat scripts online that attempt to do this by first removing all the unvoiced intervals from the Sound object, then finding the formants of the concatenated Sound object. However, I believe this creates problems whenever a formant-calculation window straddles a juncture where a voiceless interval was removed. Because of this, I think it's important to first calculate the formants for the entire Sound object, then remove the unvoiced intervals.

A Praat-ish solution:
I believe this Praat script implements this procedure.

# Sources:
#   voiced_extract_auto.txt
#     v 20020423 John Tøndering, modified quite a lot by Niels Reinholt Petersen
#     v 20200817 modified by Eric Jackson to run in Praat 6.0.04 (2015-11-01)
#
# CONSTANTS

# Formant analysis parameters
#   FOR A WOMAN:
# To Formant (burg)... 0.005 5.0 5500.0 0.025 50.0
#   FOR A MAN:
# To Formant (burg)... 0.005 5.0 5000.0 0.025 50.0

time_step = 0.005
max_formant_num = 5
max_formant_freq = 5500
window_length = 0.025
preemphasis = 50

# Pitch analysis parameters
pitch_time_step = 0.005
pitch_floor = 60
max_candidates = 15
very_accurate = 0
silence_thresh = 0.03
voicing_thresh = 0.7
octave_cost = 0.01
oct_jump_cost = 0.35
vuv_cost = 0.14
pitch_ceiling = 600.0
max_period = 0.02

# Other constants
tier = 1
outfile_vuv$ = "/home/emj/ActiveFiles/Personal Development/Personal projects/Vocoid heatmap/Q0 - Getting formant traces/voiced_intervals.csv"

# SCRIPT START

# 1. Check whether the result file exists:
if fileReadable (outfile_vuv$)
    pause The file 'outfile_vuv$' already exists! Do you want to overwrite it?
    filedelete 'outfile_vuv$'
endif

# Create a header row for the result file: (remember to edit this if you add or change the analyses!)
header$ = "interval'tab$'start'tab$'finish'newline$'"
fileappend "'outfile_vuv$'" 'header$'


# 2. Generate pitch track from sound, use that to find Voiced / Unvoiced intervals
name$ = selected$("Sound")

select Sound 'name$'
To Pitch (ac)... pitch_time_step pitch_floor max_candidates very_accurate silence_thresh voicing_thresh octave_cost oct_jump_cost vuv_cost pitch_ceiling

median_f0 = Get quantile... 0 0 0.5 Hertz
mean_period = 1/median_f0

select Sound 'name$'
plus Pitch 'name$'

To PointProcess (cc)
To TextGrid (vuv)... max_period mean_period

# 3. For each interval, output start and end time to file
numberOfIntervals = Get number of intervals... tier

# Pass through all intervals in the designated tier, and if they are voiced, find the start and end time
for interval to numberOfIntervals
    label$ = Get label of interval... tier interval
    if label$ == "V"
        interval$ = string$: interval
        start = Get starting point... tier interval
        start$ = string$: start
        end = Get end point... tier interval
        end$ = string$: end

        # Save result to text file:
        resultline$ = interval$ + tab$ + start$ + tab$ + end$ + newline$
        fileappend "'outfile_vuv$'" 'resultline$'

        # select the TextGrid so we can iterate to the next interval:
        select TextGrid 'name$'_'name$'
    endif
endfor

# 4. Calculate and write out formants for this Sound object
select Sound 'name$'
To Formant (burg)... time_step max_formant_num max_formant_freq window_length preemphasis

Down to Table... no yes 6 no 3 yes 3 no
Save as tab-separated file: name$ + "_formants.csv"

This Praat script writes the formant data to a CSV file, which I could then read in to Python as a Pandas dataframe, like this:

[Screenshot: the formant data loaded as a Pandas DataFrame]

The Praat script likewise writes the VUV intervals from Praat to a CSV, which can be read in as another DataFrame:

[Screenshot: the voiced intervals loaded as a Pandas DataFrame]

I could then use these intervals to filter the formant data (in Pandas) to get just the formant rows that occurred in a voiced interval.
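
The filtering step described here could be sketched roughly as follows. This is only an illustration under assumptions: the formant table is assumed to have a time column named `time(s)` (adjust if your Praat output names it differently), the interval table is assumed to have `start` and `finish` columns as in the header row written by the script, and `keep_voiced` is a made-up helper name.

```python
import pandas as pd


def keep_voiced(formants, voiced, time_col="time(s)"):
    """Return the formant rows whose time falls inside any voiced interval."""
    # Start with every row marked as not voiced.
    mask = pd.Series(False, index=formants.index)
    for start, finish in zip(voiced["start"], voiced["finish"]):
        # between() is inclusive at both ends by default.
        mask |= formants[time_col].between(start, finish)
    return formants[mask].reset_index(drop=True)
```

Because the formants are computed on the whole Sound first and only filtered afterwards, no analysis window is affected by removed audio.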

A Parselmouth/Pythonic solution?
Instead of working partly in a Praat script and partly in Python and Pandas, though, I'd like to implement this entirely in Python and Parselmouth. Here is what I have so far as a Python/Parselmouth implementation (apologies for duplicating the CONSTANTS section):

import parselmouth

# CONSTANTS

# Formant analysis parameters
#   FOR A WOMAN:
# To Formant (burg)... 0.005 5.0 5500.0 0.025 50.0
#   FOR A MAN:
# To Formant (burg)... 0.005 5.0 5000.0 0.025 50.0

time_step = 0.005
max_formant_num = 5
max_formant_freq = 5500
window_length = 0.025
preemphasis = 50

# Pitch analysis parameters
pitch_time_step = 0.005
pitch_floor = 60
max_candidates = 15
very_accurate = False
silence_thresh = 0.03
voicing_thresh = 0.7
octave_cost = 0.01
oct_jump_cost = 0.35
vuv_cost = 0.14
pitch_ceiling = 600.0
max_period = 0.02

# Other constants
tier = 1
outfile = "voiced_intervals.csv"
path = "test1.wav"

# 1. Calculate formants for the Sound object
sound = parselmouth.Sound(path)
formants = sound.to_formant_burg(time_step,
                                 max_formant_num,
                                 max_formant_freq,
                                 window_length,
                                 preemphasis)

data_table = parselmouth.praat.call(formants,
                                    "Down to Table...",
                                    False, True, 6,
                                    False, 3, True, 3, False)

# 2. Generate pitch track from sound, use that to find Voiced / Unvoiced intervals
pitch = sound.to_pitch_ac(pitch_time_step,
                          pitch_floor,
                          max_candidates,
                          very_accurate,
                          silence_thresh,
                          voicing_thresh,
                          octave_cost,
                          oct_jump_cost,
                          vuv_cost,
                          pitch_ceiling)

mean_period = 1/parselmouth.praat.call(pitch, "Get quantile", 0.0, 0.0, 0.5, "Hertz")
pulses = parselmouth.praat.call([sound, pitch], "To PointProcess (cc)")
tgrid = parselmouth.praat.call(pulses, "To TextGrid (vuv)", max_period, mean_period)

By the end of the Parselmouth/Python script, I have the formants in data_table and the V-UV intervals as a TextGrid in tgrid, but I'm not sure how to get those into a form that I can easily manipulate entirely in Python. I should be able to perform this time-filter without having to call Praat again, but I can't find in the documentation how to get the intervals out of the TextGrid, or how to get the formant values out of the Formants object or out of the Table object. If you can point me to documentation that explains this, that may be enough to help me! If you find it more helpful, my version of this Praat script and the Python code I'm trying are also on Github.

Thanks for your help,
Eric


yannick...@gmail.com

Mar 10, 2021, 12:41:14 PM
to Parselmouth
Hi Eric

Thanks for the extended overview of your problem and what you have already done! This makes it a lot easier to provide a relevant answer :-)

First of all about the TextGrid: Parselmouth doesn't have a full TextGrid interface (yet), but since version 0.4.0, there is the integration with TextGridTools (`TextGrid.to_tgt()`, I believe). This other Python library provides a nice small Python interface to access the information in TextGrids: https://textgridtools.readthedocs.io/en/stable/api.html; it even has convenience functions like `Tier.get_annotations_by_time()`, which should probably give you an easy way of checking whether a time stamp of a formant estimate has label "U" or "V". The other option is a bunch more calls to `parselmouth.praat.call`, more or less directly translating the Praat script from above (but personally, I think I'd go for TextGridTools).
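
A rough sketch of the TextGridTools route, under some assumptions: it relies on `TextGrid.to_tgt()` from Parselmouth 0.4.0 or later plus the separately installed `tgt` package, it assumes the voiced/unvoiced tier is the first tier of the TextGrid, and the two helper names are invented for illustration.

```python
def voiced_spans(tier, voiced_label="V"):
    """Collect (start, end) pairs for the intervals labelled as voiced."""
    return [
        (interval.start_time, interval.end_time)
        for interval in tier.intervals
        if interval.text == voiced_label
    ]


def voiced_spans_from_textgrid(tgrid, tier_index=0):
    """Convert a parselmouth.TextGrid via TextGridTools, then read one tier."""
    # to_tgt() needs Parselmouth >= 0.4.0 and the `tgt` package installed.
    tgt_grid = tgrid.to_tgt()
    return voiced_spans(tgt_grid.tiers[tier_index])
```

With the `tgrid` object from the script above, `voiced_spans_from_textgrid(tgrid)` should then give a plain list of tuples that is easy to turn into a DataFrame.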

Formant values, you should be able to extract with `formants.get_value_at_time` or `formants.get_value_at_sample` (and maybe use `formants.xs()` or `formants.ts()` to get the time stamps).
Querying a Praat `Table`, I haven't really looked into, yet. Again, `parselmouth.praat.call` is your friend to do the exact same thing as Praat would do. If you want a quick, hybrid version, I think this should also work, basically writing the Table object to a CSV (well, TSV), but not writing it to file: `pd.read_csv(io.StringIO(parselmouth.praat.call(data_table, "List", True)), sep='\t')`.
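
Spelled out a little more, the hybrid version might look like the sketch below. The helper name `praat_table_text_to_df` is made up, and mapping `--undefined--` to NaN assumes that this is how Praat prints missing values in the listed table.

```python
import io

import pandas as pd


def praat_table_text_to_df(text):
    """Parse the tab-separated text of a Praat Table into a DataFrame."""
    # Assumption: Praat prints missing values as "--undefined--".
    return pd.read_csv(io.StringIO(text), sep="\t", na_values=["--undefined--"])


# With Parselmouth, the text would come from the Table object, for example:
# df = praat_table_text_to_df(parselmouth.praat.call(data_table, "List", True))
```

Nothing is written to disk here; the Table is only serialized to a string and parsed in memory.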

And yes, the documentation is severely lacking, indeed. Hopefully this will get better, bit by little bit. So anyway, I don't mind answering a question here, then; you didn't miss anything in the docs.
If I misunderstood or missed some of your questions, or you have follow-up questions, please don't hesitate to get back to me!

Kind regards
Yannick

yannick...@gmail.com

Mar 10, 2021, 12:43:12 PM
to Parselmouth
Oh, by the way. I just saw through that GitHub link that you have a Praat script already. If you prefer, making minor adaptations and running that script is of course also possible, through `parselmouth.praat.run` or `parselmouth.praat.run_file`.
There is an example in the Journal of Phonetics article; happy to send that supplementary material if this approach would be of interest.

Eric Jackson

Mar 10, 2021, 9:50:51 PM
to Parselmouth
Hello, Yannick. Thank you for this very quick and helpful response!

> First of all about the TextGrid: Parselmouth doesn't have a full TextGrid interface (yet), but since version 0.4.0, there is the integration with TextGridTools (`TextGrid.to_tgt()`, I believe).

This sounds promising. I'll give this a try. If I run into too much trouble, I think I could also try the method that you suggest below, where I call Praat to "output" a CSV, but read the output directly into a Pandas DF without writing to disk.
 
> Formant values, you should be able to extract with `formants.get_value_at_time` or `formants.get_value_at_sample` (and maybe use `formants.xs()` or `formants.ts()` to get the time stamps).

Thank you for making this suggestion! I had been so focused on getting Praat to run multiple formant measurements through the Sound object, but if I'm understanding your meaning correctly, I could also do the time-step calculation on the Python side, and just do individual formant measurements at each of those times. I haven't looked at how those formant values get passed back, but I assume they'll be in a simpler data structure, and if that were the case, I might be able to save them one row at a time directly into a DF.

As a Plan B, I can use the `pd.read_csv(io.StringIO(parselmouth.praat.call(data_table, "List", True)), sep='\t')` trick.
 
> And yes, the documentation is severely lacking, indeed. Hopefully this will get better, bit by little bit. So anyway, I don't mind answering a question here, then; you didn't miss anything in the docs.
> If I misunderstood or missed some of your questions, or you have follow-up questions, please don't hesitate to get back to me!

No worries on documentation! Maintaining a codebase is time-consuming for one person; documenting it properly is of course great, but is even more to keep up with. Being available for questions helped in my case!

Thanks again,
Eric
 

yannick...@gmail.com

Mar 11, 2021, 11:48:59 AM
to Parselmouth
Hi Eric

Great to hear that that was useful :-)

Still, a quick answer on this:

>> Formant values, you should be able to extract with `formants.get_value_at_time` or `formants.get_value_at_sample` (and maybe use `formants.xs()` or `formants.ts()` to get the time stamps).

> Thank you for making this suggestion! I had been so focused on getting Praat to run multiple formant measurements through the Sound object, but if I'm understanding your meaning correctly, I could also do the time-step calculation on the Python side, and just do individual formant measurements at each of those times. I haven't looked at how those formant values get passed back, but I assume they'll be in a simpler data structure, and if that were the case, I might be able to save them one row at a time directly into a DF.
 
So, the actual formant measurements happen in `Sound.to_formant_burg` (just like in Praat). `get_value_at_time` just gets the values (and does some interpolation if there's not a formant measurement at the requested time).
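
To make that concrete, here is a small sketch of reading the already-computed values back out, one row per analysis frame. The helper name `formant_rows` and the choice of three formants are only for illustration; it assumes `formants` is the object returned by the formant analysis and that `xs()` gives the frame times.

```python
def formant_rows(formants, n_formants=3):
    """Read already-computed formant values at each analysis frame time."""
    rows = []
    for t in formants.xs():
        row = {"time": float(t)}
        for number in range(1, n_formants + 1):
            # No new analysis happens here; this only looks up stored values.
            row[f"F{number}"] = formants.get_value_at_time(number, float(t))
        rows.append(row)
    return rows
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)`, so there is no need to append to a DataFrame one row at a time.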

Thanks for your enthusiasm!
Yannick

Dayana Ribas

Apr 21, 2021, 9:16:57 AM
to Parselmouth
Hi! 
You can also see some very useful examples here: https://github.com/drfeinberg/PraatScripts



Good luck!
Dayana