Memory Error with Large Datasets

Abraham Harris

Mar 1, 2023, 5:46:33 PM
to Parselmouth
Hello everyone,

I am new to Praat but have used Python for a little while now. I am trying to run a Praat script through Parselmouth that analyzes features of audio files in batches. I have tested it on some small batches, and it works great. However, I need to analyze about 6,000 audio files that are each 2-3 minutes long for a machine learning project. When I try to do this, I get this error message:

"Out of memory: there is not enough room for (some large number) more elements whose sizes are 8 bytes each. Matrix of elements not created. Sound Not Created . . ."

I suspect that the operating system does not like allocating so much memory to one application, resulting in the crash. Has anyone else experienced something like this when trying to analyze large batches? Is there a way to work around it by deleting Praat Sound objects as I go, or is the only solution to manually break it into small batches?

Any advice would be very appreciated. I will attach the relevant piece of my code below if it is helpful.

Thanks,

Abe



Code for batch analysis:

import os

import pandas as pd
import parselmouth
from parselmouth.praat import run_file

# set up path for Praat script and audio folder
script_path = "example_path/syllablenucleiv3.praat"
audios = []
audio = "example_path/audio_folder"

# find audio files within audio folder and create sound objects
# (every Sound stays in memory because the list keeps a reference to it)
for file in os.listdir(audio):
    f = os.path.join(audio, file)
    if os.path.isfile(f):
        sndObj = parselmouth.Sound(f)
        sndObj.name = file
        audios.append(sndObj)

# run Praat script on the list of Praat sound objects
run_file(audios,
         script_path,
         sndObj.name,
         "None", # Pre processing: "None", "Band pass (300..3330 Hz)", "Reduce Noise"
         -25, # Silence threshold (dB): (default -25)
         2, # Minimum dip near peak (dB): (default 2)
         0.3, # Minimum pause duration (s): (default 0.3)
         False, # Detect Filled Pauses (bool)
         "English", # Language (for Filled Pauses): "English"/"Dutch"
         1, # Filled Pause threshold: (default 1.00)
         "Save as text file", # Data: "TextGrid(s) only", "Praat Info window", "Save as text file", "Table"
         "OverWriteData", # DataCollectionType: "OverWriteData"/"AppendData"
         False) # Keep Objects (when processing files) (bool)


# store the results from SyllableNuclei.txt as pandas DataFrame
results = pd.read_csv(r'example_path/SyllableNuclei.txt')

Abraham Harris

Mar 1, 2023, 6:09:23 PM
to Parselmouth
This error happens after I have added about 700 Parselmouth Sound objects to my list.

yannick...@gmail.com

Mar 1, 2023, 6:27:20 PM
to Parselmouth
Hi Abe

This might be an error coming from Praat, when it doesn't manage to allocate more memory.
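
For a rough sense of scale: the "8 bytes each" in the error message suggests that every audio sample is held as a 64-bit value. The sketch below is only back-of-the-envelope arithmetic under stated assumptions; the 150-second duration, the 16 kHz sample rate, and mono audio are guesses rather than facts about the actual files, so substitute your own values.

```python
def estimate_sound_memory_bytes(n_files, seconds_per_file, sample_rate_hz,
                                channels=1, bytes_per_sample=8):
    """Estimate the bytes needed to hold all samples of n_files in memory.

    bytes_per_sample=8 mirrors the "8 bytes each" wording of the error
    message; every other argument is an assumption about the audio files.
    """
    samples = n_files * seconds_per_file * sample_rate_hz * channels
    return int(samples * bytes_per_sample)


GIB = 1024 ** 3

# Assumed values: 150 s per file (2.5 minutes), 16 kHz, mono.
for n_files in (100, 700, 6000):
    estimate = estimate_sound_memory_bytes(n_files, 150, 16_000)
    print(f"{n_files:>5} files: about {estimate / GIB:.1f} GiB")
```

If the estimate for the whole dataset is well above the machine's RAM, loading everything into one list cannot work regardless of the OS, and only the number of Sound objects alive at the same time matters.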

A couple of things:
- Could you try opening the process manager/task manager/..., to keep an eye on the live amount of memory you're using at the point of the crash?
- Is it easy to try just running Praat (the GUI) and opening these same files? If so, do you get the same error?

Running the following on my own Linux machine, I just run out of memory and Python gets killed by the OS. But I can imagine that this is dependent on the OS; which one are you running?
$ python
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import parselmouth
>>> parselmouth.__file__
'/home/yannick/.local/lib/python3.10/site-packages/parselmouth.cpython-310-x86_64-linux-gnu.so'
>>> parselmouth.__version__
'0.4.1'
>>> [parselmouth.Sound("the_north_wind_and_the_sun.wav") for _ in range(1000000)]
Killed

Moreover, when Parselmouth objects get garbage collected, the underlying Praat objects should be deleted and their memory freed. I also quickly tested that, and it would seem as if this really is the case. So as a workaround, you should be able to run in batches, like this:

all_audio = os.listdir(audio)
for i in range(0, len(all_audio), 100):
    audios = []
    for file in os.listdir(audio):
        f = os.path.join(audio, file)
        if os.path.isfile(f):
            sndObj = parselmouth.Sound(f)
            sndObj.name = file
            audios.append(sndObj)
    run_file(...)
    del audios
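
To illustrate the garbage-collection point in plain Python, here is a minimal sketch. `FakeSound` is a hypothetical stand-in for `parselmouth.Sound`, so this only shows Python's reference behaviour and does not prove anything about how Praat itself releases memory. It also shows one detail that is easy to miss: the loop variable still refers to the last object after `del audios`.

```python
import gc
import weakref


class FakeSound:
    """Hypothetical stand-in for parselmouth.Sound holding a sample buffer."""

    def __init__(self, n_samples):
        self.samples = bytearray(8 * n_samples)  # 8 bytes per sample


def demo_release():
    audios = []
    for _ in range(3):
        snd = FakeSound(1000)
        audios.append(snd)

    # Weak references let us observe objects without keeping them alive.
    first = weakref.ref(audios[0])
    last = weakref.ref(audios[-1])

    del audios          # drops the list and its references
    gc.collect()
    first_freed = first() is None
    last_alive = last() is not None   # `snd` still points at the last object

    del snd             # drop the lingering loop variable as well
    gc.collect()
    last_freed = last() is None
    return first_freed, last_alive, last_freed


print(demo_release())
```

In the batch loop, the lingering loop variable keeps at most one Sound alive, and it is rebound in the next batch, so it is harmless there.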


I hope this helps you debug and work around the problem. Happy to look further into this, if you can get me more information on what's happening with your memory or how Praat handles this!

Kind regards
Yannick

Abraham Harris

Mar 1, 2023, 6:48:18 PM
to Parselmouth
Yannick,

I think you are correct about the error coming from Praat. I accidentally left it out earlier, but the beginning of my error message said, "Praat Error." My memory reached 100% usage, and I am using the Windows OS (although I know it is inferior to Linux, haha). 

However, I tried your idea with running in batches, and that fixed my problem! I was afraid of manually copying and pasting audio files into different folders, but the code you provided worked wonderfully. 

Thank you so much for the help! Your speedy response and helpful ideas really made my day. The world is a better place with people like yourself.

Best,

Abe

yannick...@gmail.com

Mar 1, 2023, 7:14:31 PM
to Parselmouth
Great to hear! And thanks for the update and confirmation of what was going wrong :-)

Oh, and for the record: you obviously figured it out, but I think my code was a bit too hastily written. This should be closer, for anyone looking up this issue in the future:

all_audio = os.listdir(audio)
# step through the file list 100 entries at a time
for i in range(0, len(all_audio), 100):
    audios = []
    for file in all_audio[i:i+100]:
        f = os.path.join(audio, file)
        if os.path.isfile(f):
            sndObj = parselmouth.Sound(f)
            sndObj.name = file
            audios.append(sndObj)
    run_file(...)
    # drop the references so this batch's Sound objects can be freed
    del audios
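
The same batching idea can also be written as a small reusable helper. This is only a sketch: `list_audio_files`, `batched`, and `process_in_batches` are hypothetical names, not part of Parselmouth, and the loading and analysis steps are passed in as functions so that the batching logic can be tried without any audio files.

```python
import os


def list_audio_files(folder):
    """Return sorted full paths of the regular files in `folder`."""
    paths = (os.path.join(folder, name) for name in os.listdir(folder))
    return sorted(p for p in paths if os.path.isfile(p))


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(paths, load, analyze, batch_size=100):
    """Load and analyze `paths` one batch at a time; return the batch count."""
    n_batches = 0
    for batch_paths in batched(paths, batch_size):
        sounds = [load(path) for path in batch_paths]
        analyze(sounds)
        del sounds  # release this batch before loading the next one
        n_batches += 1
    return n_batches


# Hypothetical usage with Parselmouth (not executed here):
#   import parselmouth
#   from parselmouth.praat import run_file
#   process_in_batches(list_audio_files("example_path/audio_folder"),
#                      parselmouth.Sound,
#                      lambda sounds: run_file(sounds, script_path, ...))
```

Filtering out non-files before slicing keeps every batch at the full size. Whether the output of successive batches accumulates or is overwritten depends on the script's own settings (such as the DataCollectionType option in the original post), so that is worth checking before running all 6,000 files.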

Kind regards
Yannick