Stereo fails after pre-processing


Tricia Nelsen

Jan 26, 2022, 12:18:19 AM
to Ames Stereo Pipeline Support
Hello again, 

I'm sorry for posting so frequently; hopefully one day I will be able to run my full process without issues. After finally getting WSL to work with the compiled binaries, I now can't get my parallel_stereo call to work. It gets through pre-processing, though with many GdalIO errors that didn't stop it from running (those logs are attached), but then gives the following error when it starts low-res correlation:

Traceback (most recent call last):
  File "/home/tnels16/StereoPipeline-3.0.0-2022-01-20-x86_64-Linux/libexec/parallel_stereo", line 862, in <module>
    calc_lowres_disp(args, opt, sep)
  File "/home/tnels16/StereoPipeline-3.0.0-2022-01-20-x86_64-Linux/libexec/stereo_utils.py", line 267, in calc_lowres_disp
    stereo_run('stereo_corr', local_args, opt, msg='')
  File "/home/tnels16/StereoPipeline-3.0.0-2022-01-20-x86_64-Linux/libexec/stereo_utils.py", line 205, in stereo_run
    raise Exception('Stereo step ' + kw['msg'] + ' failed')
Exception: Stereo step  failed

I'm guessing this is probably an issue with my machine/WSL, but do you have any ideas what might be going on? If not, could you help me understand what the bug with mapproject is in the conda distribution of ASP 3.0.0, and how I can either avoid or rectify it? I'm guessing it may be necessary to go back to that conda distribution to create my DEMs, but I am not sure how to monitor whatever error the bug creates. For reference, I am using the Stereo Pipeline to process DigitalGlobe stereo pairs and have been using the process Bundle adjust -> Map project -> Stereo -> point2dem.

Thank you,
Tricia Nelsen
output-log-stereo_pprc-01-25-0942-19382.txt
output-log-stereo_corr-01-25-0958-4203.txt

Alexandrov, Oleg (ARC-TI)[KBR Wyle Services, LLC]

Jan 26, 2022, 12:38:45 AM
to Tricia Nelsen, Ames Stereo Pipeline Support
Dear Tricia, 

No need to apologize for posting frequently. Here we survive by feeding off user bugs, so to speak. :)

The crash at "No IP file found, computing IP now." is a bug. It was reported today by a different user and I think I fixed it. You can try getting build 2022-01-26 when it shows up at https://github.com/NeoGeographyToolkit/StereoPipeline/releases, which will be some time in the next 5 hours if it runs smoothly, or the next 10 hours if I have to help it along.
The other issue:
Error: GdalIO: LZWDecode:Corrupted LZW table at scanline 20736 (code = 1)
looks like data corruption to me. I don't know what to say. You can try cropping just a small piece of the DEM you use for mapprojection, mapproject your input images onto it, then run stereo with only those image clips and the piece of the DEM. (One can use gdal_translate -srcwin or -projwin to crop data.)
If that works, you can try making the clip bigger, and at some point figure out what is going on.
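
A minimal sketch of building such a crop command, assuming gdal_translate is on the PATH. The helper name, file names, and window values are hypothetical placeholders; only the -srcwin option (pixel offsets and sizes) comes from the advice above.

```python
import shlex


def crop_command(src, dst, x_off, y_off, width, height):
    """Build a gdal_translate call that crops a pixel window from src into dst."""
    window = [str(v) for v in (x_off, y_off, width, height)]
    return ["gdal_translate", "-srcwin", *window, src, dst]


# Hypothetical file names and window, for illustration only.
cmd = crop_command("dem.tif", "dem_clip.tif", 0, 0, 4096, 4096)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run; growing the clip is then just a matter of changing the width and height.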
Such corruption may also happen if two different processes write to the same file. 
I wonder what the sizes of your raw images are before mapprojection, and what they become after mapprojection. The command gdalinfo (included in ASP's bin directory) can print image dimensions.
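
For comparing sizes before and after mapprojection, here is a small sketch that pulls the dimensions out of gdalinfo output. It assumes gdalinfo is on the PATH and relies on the "Size is W, H" line that gdalinfo prints; the function names are hypothetical.

```python
import re
import subprocess


def parse_size(gdalinfo_text):
    """Return (width, height) from the 'Size is W, H' line of gdalinfo output."""
    match = re.search(r"^Size is (\d+),\s*(\d+)", gdalinfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Size is' line found in gdalinfo output")
    return int(match.group(1)), int(match.group(2))


def image_size(path):
    """Run gdalinfo on path and return the image dimensions in pixels."""
    result = subprocess.run(
        ["gdalinfo", path], capture_output=True, text=True, check=True
    )
    return parse_size(result.stdout)
```

Calling image_size on a raw image and on its mapprojected counterpart gives the two pairs of dimensions to compare.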
Oleg





Tricia Nelsen

Jan 29, 2022, 10:13:24 AM
to Ames Stereo Pipeline Support
Thanks, Oleg. I was able to run it with the new build, and it ran for the past 3 days without problems, but it looks like it just failed again during the refinement stage. It seems it went through all my tiles and couldn't open them because of "GDALIO: too many files open". I've attached a log file here; it looks like this happened to every tile across the image.

Then this is the error message that popped up when it decided to quit:
Traceback (most recent call last):
  File "/home/tnels16/StereoPipeline-3.0.0-2022-01-26-x86_64-Linux/libexec/parallel_stereo", line 913, in <module>
    spawn_to_nodes(step, settings, self_args)
  File "/home/tnels16/StereoPipeline-3.0.0-2022-01-26-x86_64-Linux/libexec/parallel_stereo", line 463, in spawn_to_nodes
    asp_system_utils.generic_run(cmd, opt.verbose)
  File "/home/tnels16/StereoPipeline-3.0.0-2022-01-26-x86_64-Linux/libexec/asp_system_utils.py", line 426, in generic_run
    raise Exception('Failed to run: ' + cmd_str)
Exception: Failed to run: parallel --will-cite --env ASP_DEPS_DIR --env PATH --env LD_LIBRARY_PATH --env PYTHONHOME -u -P 12 -a /home/tnels16/pipeline2/tmp9h2raat_ "/home/tnels16/StereoPipeline-3.0.0-2022-01-26-x86_64-Linux/bin/python /home/tnels16/StereoPipeline-3.0.0-2022-01-26-x86_64-Linux/libexec/parallel_stereo -t dgmaprpc --stereo-algorithm asp_mgm --subpixel-mode 2 --alignment-method none --bundle-adjust-prefix run_ba/ba image1_mapped_again.tif image2_mapped_again.tif cam1.XML cam2.XML ba_mapped_stereo_build0126_mgm/output arcticdem_studyextent.tif --subpix-from-blend --processes 12 --threads-multiprocess 8 --entry-point 3 --stop-point 4 --work-dir /home/tnels16/pipeline2 --isisroot /home/tnels16/StereoPipeline-3.0.0-2022-01-26-x86_64-Linux --tile-id {}"

Is there something I can change so it doesn't keep so many files open? Also, is there a way to pick up at the refinement stage on the next run so I don't have to wait another 3 days for the correlation to be run again?

Side note: I re-mapprojected the images and that solved my corrupted files problem.

Thanks,
Tricia
52224_56320_407_1018-log-stereo_rfne-01-29-0202-1132.txt

Alexandrov, Oleg (ARC-TI)[KBR Wyle Services, LLC]

Jan 29, 2022, 2:01:06 PM
to Tricia Nelsen, Ames Stereo Pipeline Support
We got another error report about "GDALIO: too many files open" recently as well.

The problem is not in our code. Since your images are so big, too many tiles got created, and it looks like GDAL can't merge them. It looks from your log file that the problem is with the intermediate file B.tif in your output dir, which is a giant VRT (virtual collection of many files).

You have two options at this stage, and regretfully I don't think you can easily reuse your work. The first is to split the left mapprojected image in two, with some overlap, say 1024 pixels (likely a top half and a bottom half will work, as I recall these images being very tall), then run parallel_stereo on both, create DEMs, and merge them with point2dem.
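
A rough sketch of computing the two crop windows for such a split, in the (xoff, yoff, xsize, ysize) order that gdal_translate -srcwin expects. The function name and the example dimensions are hypothetical; the 1024-pixel overlap is the value suggested above.

```python
def split_with_overlap(width, height, overlap=1024):
    """Return -srcwin style windows for the top and bottom halves of an image.

    Each window is (x_off, y_off, x_size, y_size); the two windows share
    `overlap` rows centered on the middle of the image.
    """
    mid = height // 2
    half = overlap // 2
    top_end = min(height, mid + half)
    bottom_start = max(0, mid - half)
    top = (0, 0, width, top_end)
    bottom = (0, bottom_start, width, height - bottom_start)
    return top, bottom


# Hypothetical dimensions for a tall mapprojected image.
top, bottom = split_with_overlap(40000, 160000)
print("top window:", top)
print("bottom window:", bottom)
```

Only the left image is split this way; each half is then paired with the full right image in its own parallel_stereo run.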

The second option is to run parallel_stereo with bigger --job-size-h and --job-size-w, such as with a value of 3072 instead of the default 2048. There's a chance SGM may run out of memory with bigger tiles, in which case fewer processes may be needed.
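
To get a feel for how much a bigger job size reduces the number of tiles, here is a back-of-the-envelope sketch. It assumes the image is simply cut into a grid of job-size tiles, which may differ in detail from how parallel_stereo actually tiles; the function name and the image size are hypothetical.

```python
def tile_count(width, height, job_size_w=2048, job_size_h=2048):
    """Estimate the number of tiles for an image cut into a regular grid."""
    cols = -(-width // job_size_w)   # ceiling division
    rows = -(-height // job_size_h)
    return cols * rows


# Hypothetical image size, for illustration only.
width, height = 150000, 150000
print("job size 2048:", tile_count(width, height))
print("job size 3072:", tile_count(width, height, 3072, 3072))
```

Under this assumption, going from 2048 to 3072 cuts the tile count to a bit under half, since the count scales roughly with the inverse square of the job size.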

It should be possible to reuse your work, but it would be some pain. The offending VRT file, which is your ba_mapped_stereo_build0126_mgm/output-B.tif file, is just a list of files and coordinates describing how those files should be combined into one big file. There are too many files and GDAL complains.

It should be possible to edit this file and break it into two or four smaller VRTs, each still a text file, and merge each of those individually with gdal_translate. That gives a small number of very big files. You would then create a VRT listing those, with syntax very similar to your original VRT (just file names and coordinates), and merge them with gdal_translate. It looks doable, but would need some attention to detail.
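
A sketch of the first half of that procedure, splitting one VRT into horizontal strips with Python's standard XML module. It is untested against real ASP output and assumes the usual GDAL VRT layout: a VRTDataset root with a rasterYSize attribute, an optional GeoTransform, and VRTRasterBand elements whose sources each carry a DstRect. It also assumes the tiles do not overlap vertically. The function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def _tiles(band):
    # Any child carrying a DstRect (SimpleSource, ComplexSource, ...) is one tile.
    return [el for el in band if el.find("DstRect") is not None]


def _y_off(tile):
    return int(float(tile.find("DstRect").get("yOff")))


def _y_size(tile):
    return int(float(tile.find("DstRect").get("ySize")))


def split_vrt_rows(vrt_text, num_strips):
    """Split a VRT mosaic into horizontal strips, returned as (y_off, xml) pairs."""
    root = ET.fromstring(vrt_text)
    rows = sorted({_y_off(t) for t in _tiles(root.find("VRTRasterBand"))})
    if not rows:
        raise ValueError("no tiles found in the first band")
    per_strip = -(-len(rows) // num_strips)  # ceiling division
    strips = []
    for i in range(0, len(rows), per_strip):
        wanted = set(rows[i:i + per_strip])
        y0 = y1 = rows[i]
        sub = copy.deepcopy(root)
        for band in sub.findall("VRTRasterBand"):
            for tile in _tiles(band):
                if _y_off(tile) not in wanted:
                    band.remove(tile)
                    continue
                y1 = max(y1, _y_off(tile) + _y_size(tile))
                # Shift the tile so that the strip starts at row 0.
                tile.find("DstRect").set("yOff", str(_y_off(tile) - y0))
        sub.set("rasterYSize", str(y1 - y0))
        gt_el = sub.find("GeoTransform")
        if gt_el is not None:
            # Move the georeferenced origin down to the first row of the strip.
            gt = [float(v) for v in gt_el.text.split(",")]
            gt[0] += y0 * gt[2]
            gt[3] += y0 * gt[5]
            gt_el.text = ", ".join(repr(v) for v in gt)
        strips.append((y0, ET.tostring(sub, encoding="unicode")))
    return strips
```

Each returned XML string would be written next to the original VRT (so relative tile paths still resolve) and merged with gdal_translate. If the strips carry georeferencing, a tool such as gdalbuildvrt could probably assemble the merged strips into the final small VRT, though that step is not shown here.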

Sorry. You are hitting the limits of the tools.

Also, your choice to use --subpixel-mode 2 will result in the whole thing running for a lot more days after that step. I would suggest --subpixel-mode 3, which is faster and almost as good.

Lastly, a beefy machine with many nodes is suggested for this kind of work. Waiting 3 days for something to run is a lot.




 

 

Tricia Nelsen

Feb 1, 2022, 2:11:05 AM
to Alexandrov, Oleg (ARC-TI)[KBR Wyle Services, LLC], Ames Stereo Pipeline Support
Thanks, Oleg, this is very helpful. Funny enough, this stereo pair was supposed to be a smaller image that I was trying to use to validate my ASP method with some ground-truthed snow measurements, but I didn't realize that the entire image had downloaded instead of just my AOI. Oops. I clipped the mapprojected images with gdal_translate to about 1/4 of the size of the total image and am re-running that, and hopefully will run into no issues. Quick question about clipping the image size while trying to figure out the best settings to run: what size window do you recommend? I figure this might vary based on the situation. I'm currently using ASP to try to map snowpack in a far North Alaskan village, where it is very flat. The validation snow depths I'm using are just in a 100m x 100m grid, but I assume that would be too flat in the middle of the tundra to detect much difference. I expanded the clip to include the nearest road and a couple of buildings; does this seem like it would be enough?

Do you have any idea what size is too big for GDAL to merge the tiles, so I can avoid this problem when I do return to bigger images?

Hoping to get access to some supercomputing resources soon so my laptop can take a break from the 4 day long processing times...

Thanks, as always, for all your help.
Tricia

Oleg Alexandrov

Feb 1, 2022, 12:09:32 PM
to Tricia Nelsen, Alexandrov, Oleg (ARC-TI)[KBR Wyle Services, LLC], Ames Stereo Pipeline Support
Tricia,

First, the bug you found, even though not strictly in our tools, is high on my list to fix, once I finish with commitments for a different project, hopefully in a few weeks. (It will take time to figure out a solution, or else I would have gotten to it sooner.)

The fact that your area to image is flat is not a problem. ASP does have problems though if the area has no texture. Fresh snow, with no detail, would give it a really hard time. 

You can choose clips as small as you like, as long as the left clip and right clip overlap. In fact, you can do a handful of very small experiments, which should take just minutes (or tens of minutes, depending on the region), if you use stereo_gui instead of parallel_stereo, while also specifying the cameras and output prefix, then selecting some clips with Control-Mouse-Drag, and running stereo on those from the Run menu. (The stereo_gui manual has more detail.)

Oh, and even if you include the nearest road in your image, etc., if your snow is really fresh and has no details, those extra landmarks won't help. You can zoom in the GUI and see if anything is discernible in the images. You can also use --corr-max-levels 1 or 2 or so, to help it not get confused. (https://stereopipeline.readthedocs.io/en/latest/stereodefault.html#correlation) But again, this can be tough, depending on exactly what your images contain.

> Do you have any idea at what size is too big for GDAL to merge the tiles so I can avoid this problem when I do return to bigger images?

We got this error just very recently. I assume the images must be really huge if it did not come up before. Likely 150,000 pixels or more on the side. 

Oleg


Oleg Alexandrov

Feb 8, 2022, 8:30:20 PM
to Tricia Nelsen, Alexandrov, Oleg (ARC-TI)[KBR Wyle Services, LLC], Ames Stereo Pipeline Support
I put in a fix for the "too many open files" bug. The fix will be in tomorrow morning's build. (The bug was encountered by a customer funding a project; that tends to have an amazing effect on one's motivation and sense of urgency.)