Memory Error with Large Datasets

Abraham Harris

Mar 1, 2023, 5:46:33 PM

to Parselmouth

Hello everyone,

I am new to Praat but have used Python for a little while now. I am trying to run a Praat script through Parselmouth that analyzes features of audio files in batches. I have tested it on some small batches, and it works great. However, I need to analyze about 6,000 audio files that are each 2-3 minutes long for a machine learning project. When I try to do this, I get this error message:

"Out of memory: there is not enough room for (some large number) more elements whose sizes are 8 bytes each. Matrix of elements not created. Sound Not Created . . ."

I suspect that the operating system does not like allocating so much memory to one application, resulting in the crash. Has anyone else experienced something like this when trying to analyze large batches? Is there a way to work around it by deleting Praat Sound objects as I go, or is the only solution to manually break it into small batches?

Any advice would be very appreciated. I will attach the relevant piece of my code below if it is helpful.

Thanks,

Abe

Code for batch analysis:

import os

import pandas as pd
import parselmouth
from parselmouth.praat import run_file

# set up path for Praat script and audio folder
script_path = "example_path/syllablenucleiv3.praat"
audios = []
audio = "example_path/audio_folder"

# find audio files within audio folder and create sound objects
for file in os.listdir(audio):
    f = os.path.join(audio, file)
    if os.path.isfile(f):
        sndObj = parselmouth.Sound(f)
        sndObj.name = file
        audios.append(sndObj)

# run Praat script on the list of Praat sound objects
run_file(audios,
         script_path,
         sndObj.name,
         "None",  # Pre processing: "None", "Band pass (300..3330 Hz)", "Reduce Noise"
         -25,  # Silence threshold (dB): (default -25)
         2,  # Minimum dip near peak (dB): (default 2)
         0.3,  # Minimum pause duration (s): (default 0.3)
         False,  # Detect Filled Pauses (bool)
         "English",  # Language (for Filled Pauses): "English"/"Dutch"
         1,  # Filled Pause threshold: (default 1.00)
         "Save as text file",  # Data: "TextGrid(s) only", "Praat Info window", "Save as text file", "Table"
         "OverWriteData",  # DataCollectionType: "OverWriteData"/"AppendData"
         False)  # Keep Objects (when processing files) (bool)

# store the results from SyllableNuclei.txt as pandas DataFrame
results = pd.read_csv(r'example_path/SyllableNuclei.txt')

Abraham Harris

Mar 1, 2023, 6:09:23 PM

to Parselmouth

This error happens after I have added about 700 Parselmouth Sound objects to my list.

yannick...@gmail.com

Mar 1, 2023, 6:27:20 PM

to Parselmouth

Hi Abe

This might be an error coming from Praat, when it doesn't manage to allocate more memory.

A couple of things:
- Could you try opening the process manager/task manager/..., to keep an eye on the live amount of memory you're using at the point of the crash?

- Is it easy to try just running Praat (the GUI) and opening these same files? If so, do you get the same error?

Running the following on my own Linux machine, I just run out of memory and Python gets killed by the OS. But I can imagine that this is dependent on the OS; which one are you running?
$ python
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import parselmouth
>>> parselmouth.__file__
'/home/yannick/.local/lib/python3.10/site-packages/parselmouth.cpython-310-x86_64-linux-gnu.so'
>>> parselmouth.__version__
'0.4.1'
>>> [parselmouth.Sound("the_north_wind_and_the_sun.wav") for _ in range(1000000)]
Killed
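
As a rough back-of-the-envelope check on why memory runs out: the error message quoted earlier refers to elements of 8 bytes each, so each sample of a loaded Sound presumably costs 8 bytes in memory, whatever the file's size on disk. The sketch below estimates the total for 700 files of about 2.5 minutes each. The sample rates and the mono assumption are guesses, since the thread does not state them, and `estimated_sound_bytes` is a hypothetical helper, not part of Parselmouth.

```python
def estimated_sound_bytes(n_files, seconds_per_file, sample_rate_hz,
                          channels=1, bytes_per_sample=8):
    """Estimate the memory needed to hold all samples of n_files sounds at once."""
    return n_files * seconds_per_file * sample_rate_hz * channels * bytes_per_sample


GIB = 1024 ** 3

# 700 files of ~150 s each, at two assumed sample rates (not stated in the thread)
for rate in (16_000, 44_100):
    total = estimated_sound_bytes(700, 150, rate)
    print(f"{rate} Hz: about {total / GIB:.1f} GiB")
```

Under these assumptions the total lands somewhere in the tens of gigabytes, which would be consistent with memory filling up after a few hundred files.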

Moreover, when Parselmouth objects get garbage collected, the underlying Praat objects should be deleted and their memory freed. I also quickly tested that, and it would seem as if this really is the case. So as a workaround, you should be able to run in batches, like this:

all_audio = os.listdir(audio)

for i in range(0, len(all_audio), 100):
    audios = []

    for file in os.listdir(audio):
        f = os.path.join(audio, file)
        if os.path.isfile(f):
            sndObj = parselmouth.Sound(f)
            sndObj.name = file
            audios.append(sndObj)

    run_file(...)
    del audios
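
To illustrate the garbage-collection point made above, here is a minimal sketch that uses only the standard library. `FakeSound` is a hypothetical stand-in for `parselmouth.Sound` (it just holds a buffer), and `weakref.finalize` records when each object is destroyed. It shows the general Python mechanism, not Parselmouth's internals: once the list holding the objects is deleted, nothing references them any more and they can be freed.

```python
import gc
import weakref


class FakeSound:
    """Hypothetical stand-in for parselmouth.Sound: just holds a sample buffer."""

    def __init__(self, n_samples):
        self.samples = bytearray(8 * n_samples)  # 8 bytes per sample


freed = []  # names of the stand-in objects that have been finalized


def load_batch(names):
    """Create one FakeSound per name and register a finalizer for each."""
    batch = []
    for name in names:
        snd = FakeSound(1000)
        weakref.finalize(snd, freed.append, name)  # called when snd is destroyed
        batch.append(snd)
    return batch


audios = load_batch(["a.wav", "b.wav", "c.wav"])
assert freed == []  # the list still references every object

del audios    # drop the only remaining references
gc.collect()  # not needed in CPython here, but harmless on other interpreters
print(sorted(freed))
```

Note that an object is only freed when no references to it remain, so a variable such as `sndObj` that still points at the last loaded sound keeps that one sound alive until it is reassigned.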

I hope this helps you debug and work around the problem. Happy to look further into this, if you can get me more information on what's happening with your memory or how Praat handles this!

Kind regards

Yannick

Abraham Harris

Mar 1, 2023, 6:48:18 PM

to Parselmouth

Yannick,

I think you are correct about the error coming from Praat. I accidentally left it out earlier, but the beginning of my error message said, "Praat Error." My memory reached 100% usage, and I am using the Windows OS (although I know it is inferior to Linux, haha).

However, I tried your idea with running in batches, and that fixed my problem! I was afraid of manually copying and pasting audio files into different folders, but the code you provided worked wonderfully.

Thank you so much for the help! Your speedy response and helpful ideas really made my day. The world is a better place with people like yourself.

Best,

Abe

yannick...@gmail.com

Mar 1, 2023, 7:14:31 PM

to Parselmouth

Great to hear! And thanks for the update and confirmation of what was going wrong :-)

Oh, and for future reference: you obviously figured it out, but I think my code was a bit too hastily written. This should be closer, for anyone looking up this issue in the future:

all_audio = os.listdir(audio)

# process the files 100 at a time, so only one batch is held in memory
for i in range(0, len(all_audio), 100):
    audios = []

    for file in all_audio[i:i+100]:
        f = os.path.join(audio, file)
        if os.path.isfile(f):
            sndObj = parselmouth.Sound(f)
            sndObj.name = file
            audios.append(sndObj)

    run_file(...)
    del audios
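
The same batching pattern can be factored into a small reusable helper. This is only a sketch under stated assumptions: `iter_batches` and `process_in_batches` are hypothetical names, and `load` and `analyze` are placeholders supplied by the caller. In real use, `load` could be `parselmouth.Sound` and `analyze` a function wrapping the `run_file(...)` call; the demo at the bottom uses trivial stand-ins so it runs without any audio files.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(paths, batch_size, load, analyze):
    """Load and analyze paths one batch at a time, keeping only the results."""
    results = []
    for batch_paths in iter_batches(paths, batch_size):
        sounds = [load(path) for path in batch_paths]  # only this batch is in memory
        results.append(analyze(sounds))
        del sounds  # release the batch before loading the next one
    return results


# Demo with stand-ins: "loading" upper-cases the name, "analysis" counts the batch
paths = [f"file_{n}.wav" for n in range(250)]
batch_sizes = process_in_batches(paths, 100, load=str.upper, analyze=len)
print(batch_sizes)
```

Keeping only the per-batch results, rather than the Sound objects themselves, is what bounds peak memory to roughly one batch.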

Kind regards

Yannick
