closure phase bias - memory leak and fails to complete on large dataset

sar

Aug 29, 2022, 11:32:39 PM
to MintPy
I'm using the new closure_phase_bias.py script and have resolved the issues I had in running it (i.e. installing the isce module). I've run the test dataset and it runs to completion for each of the 3 options, but I get a memory leak warning for options 2 and 3:

/xxx/.conda/envs/mintpy/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py:702: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(

The commands I used were:

Quick estimate command:
python3 closure_phase_bias.py -i inputs/ifgramStack.h5 --bw 5 --nl 20 -o . -a quick_estimate --ram 250 --num-worker 5 -c local

Full estimate command:
python3 closure_phase_bias.py -i inputs/ifgramStack.h5 --bw 5 --nl 20 -o . -a estimate --ram 250 --num-worker 5 -c local

I'm using an HPC environment, but with a single node for all my jobs:
OS: linux-CentOS
MintPy version: 1.4.0 (from conda)

I'm not sure if this warning is causing any issues with the actual processing, or if it's to do with the HPC configuration, rather than the script.

However, when I try to run the same command but using my dataset instead, it won't run to completion. This is despite allocating 24hrs walltime with 24 cpus and 500GB ram on a single node. The command being used is:

python3 closure_phase_bias.py -i inputs/ifgramStack.h5 --bw 5 --nl 10 -o . -a estimate --ram 500 --num-worker 24 -c local

The test dataset and my dataset commands are submitted the same way; only the resources and the --nl value differ.

I've checked the job while running and the cpus are being used, so it's not frozen. However it only gets as far as working on the 'conn2' directory but fails to generate the cumSeqClosurePhase.h5 and maskConnComp.h5 files. Only the files associated with the sequential closure phase stack are generated in this directory. It also doesn't appear to initiate dask (unlike the tests above).

My interferogram network is sequential and is modified to have the same max number of connections/neighbors per acquisition to match the parameters above:

python3 modify_network.py inputs/ifgramStack.h5 --max-conn-num 5
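
For reference, the number of interferograms left in such a network can be counted directly. This is a minimal sketch, assuming a complete sequential network with no missing pairs; the function name is illustrative and is not part of MintPy:

```python
def count_sequential_pairs(num_dates, max_conn):
    """Count pairs when each acquisition is connected to at most
    max_conn following acquisitions (assumes no pairs are missing)."""
    # A network of num_dates acquisitions supports at most num_dates - 1 connection levels.
    conn = min(max_conn, num_dates - 1)
    # Connection level j contributes (num_dates - j) interferograms.
    return sum(num_dates - j for j in range(1, conn + 1))


# Hypothetical example: 10 acquisitions, up to 5 connections each.
print(count_sequential_pairs(10, 5))
```

The count grows almost linearly with the number of acquisitions, so the stack size scales with both the length of the time series and the connection number.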

Has anyone run this on a large dataset without any problems? 

Before I seek help from my HPC support area, I'd like to check that there aren't any issues with the actual code first.

Thanks

Yujie Zheng

Sep 7, 2022, 8:29:35 PM
to MintPy

Thank you for posting. This script is the first working version, and we welcome feedback to allow us to make the script more robust for various scenarios.

In this case, I believe the failure to complete with your dataset is indeed a code design issue. When computing the closure phases, we create a matrix (https://github.com/insarlab/MintPy/blob/d9879502265e53daafc9aa2ccf5dcf4c5fa1cb21/mintpy/closure_phase_bias.py#L376) that is similar in size to the input ifgramStack.h5, which in your case is probably close to or larger than the requested memory (500 GB).
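
A rough way to check this before submitting a job is to estimate the in-memory size of the stack from its dimensions. The sketch below assumes 4-byte (float32) samples, and the dimensions in the example are made up; the function name is illustrative and is not part of MintPy:

```python
def estimate_stack_bytes(num_ifgram, length, width, bytes_per_sample=4):
    """Approximate size of a 3D stack (num_ifgram x length x width).

    bytes_per_sample=4 assumes float32 data; adjust if the stack differs.
    """
    return num_ifgram * length * width * bytes_per_sample


# Hypothetical dimensions, for illustration only.
size_bytes = estimate_stack_bytes(num_ifgram=1500, length=6000, width=12000)
print(f"approx. {size_bytes / 2**30:.1f} GiB")
```

If the printed value is close to the memory requested for the job, a matrix of that size is unlikely to fit alongside the other working arrays.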

Until we change the script, my suggestion for processing large datasets is to multilook further so that the input stack ifgramStack.h5 is significantly smaller than the allocated memory, by at least a factor of two.
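
As a rough guide to how much extra multilooking that implies: taking n looks in both directions reduces the pixel count, and so the stack size, by about n squared. The sketch below is an assumption-laden estimate, not a MintPy utility; the margin of 2 is one reading of the factor-of-two suggestion above, and the function name is hypothetical:

```python
import math


def min_extra_looks(stack_gb, ram_gb, margin=2.0):
    """Smallest integer n such that a stack multilooked by n x n
    is at least `margin` times smaller than the allocated memory."""
    if stack_gb <= 0 or ram_gb <= 0 or margin <= 0:
        raise ValueError("sizes and margin must be positive")
    target_gb = ram_gb / margin
    if stack_gb <= target_gb:
        return 1  # already small enough, no extra looks needed
    # n x n looks shrink the stack by roughly n**2.
    return math.ceil(math.sqrt(stack_gb / target_gb))


# Hypothetical example: a 600 GB stack on a node with 500 GB of RAM.
print(min_extra_looks(stack_gb=600, ram_gb=500))
```

This only accounts for the stack itself; other arrays held during processing may call for a larger margin.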

We will redesign this part of the code so that no large matrices are held in memory during the computation.

As for the memory leak warning, I believe it is not directly related to the dataset size. It occurred before during our testing (confirmed by Yunjun Zhang), but so far it does not seem to affect the outputs.

 

Best,

Yujie Zheng

sar

Sep 12, 2022, 7:50:31 PM
to MintPy
Thanks Yujie, much appreciated!! 

I've got the scripts working by multi-looking by 2, but will keep in mind the file size while processing my larger datasets.