Hello,
I am fairly new to running ASP on a cluster, and I want to create a DEM from two WorldView-2 images.
I ran bundle_adjust first, then mapproject, and finally got an error when generating the DEM.
I submit the job on the cluster with:
sbatch -p fat job8.sh
job8.sh:
------------------------
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 40
parallel_stereo --alignment-method none \
--stereo-algorithm 1 \
--sgm-collar-size 512 \
--corr-tile-size 1024 \
--corr-memory-limit-mb 1300 \
--threads-multiprocess 40 \
--threads-singleprocess 20 \
--cost-mode 4 --corr-kernel 7 7 \
--subpixel-mode 7 \
p1/left_mapped_ba.tif p1/right_mapped_ba.tif \
--bundle-adjust-prefix p1/run_ba/run \
p1/run_pc/out dem-adj.tif
---------------------------
ASP fails to run, and the job produces a slurm-65608.out file.
The main errors in that file are:
Warning! Your current config file enables debug logging. This will be slow.
………………..
Error: /usr/bin/time -f "stereo_corr: elapsed=%E ([hours:]minutes:seconds), memory=%M (kb)" /public/home/102014/.conda/envs/asp/bin/stereo_corr --alignment-method none --stereo-algorithm 1 --sgm-collar-size 0 --corr-tile-size 2048 --corr-memory-limit-mb 1300 --cost-mode 4 --corr-kernel 7 7 --subpixel-mode 7 p1/left_mapped_ba.tif p1/right_mapped_ba.tif --bundle-adjust-prefix p1/run_ba/run p1/run_pc/out-9216_0_1024_1024/9216_0_1024_1024 dem-adj.tif --skip-low-res-disparity-comp --corr-seed-mode 1 --stereo-file ./stereo.default --threads 40 --trans-crop-win 8704 0 2048 1536: [Errno 2] No such file or directory: '/usr/bin/time': '/usr/bin/time'
………………..
Traceback (most recent call last):
File "/public/home/102014/.conda/envs/asp/bin/parallel_stereo", line 1017, in <module>
spawn_to_nodes(step, settings, parallel_args)
File "/public/home/102014/.conda/envs/asp/bin/parallel_stereo", line 526, in spawn_to_nodes
asp_system_utils.generic_run(cmd, opt.verbose)
File "/public/home/102014/.conda/envs/asp/libexec/asp_system_utils.py", line 486, in generic_run
raise Exception('Failed to run: ' + cmd_str)
Exception: Failed to run: parallel --will-cite --env ASP_DEPS_DIR --env PATH --env LD_LIBRARY_PATH --env ASP_LIBRARY_PATH --env PYTHONHOME -u -P 80 -a /public/home/102014/wv/tmpikxhuoax "/public/home/102014/.conda/envs/asp/bin/python /public/home/102014/.conda/envs/asp/bin/parallel_stereo --alignment-method none --stereo-algorithm 1 --sgm-collar-size 512 --corr-tile-size 1024 --corr-memory-limit-mb 1300 --threads-singleprocess 20 --cost-mode 4 --corr-kernel 7 7 --subpixel-mode 7 p1/left_mapped_ba.tif p1/right_mapped_ba.tif --bundle-adjust-prefix p1/run_ba/run p1/run_pc/out dem-adj.tif --skip-low-res-disparity-comp --processes 80 --threads-multiprocess 40 --entry-point 1 --stop-point 2 --work-dir /public/home/102014/wv --tile-id {}"
--------------------------------------------
How can I fix this error, and how should I configure the job to take advantage of the computing cluster?
I have uploaded the four files I set up: .vwrc, stereo.default, job8.sh, slurm-65608.out
If anyone has insight into this issue or suggestions for improving my overall workflow, I would be very thankful.
John
--------------------------------------------
Notes:
Fat node configuration:
Machine model: H3C UniServer R6700 G3
4* C6248R (3.0GHz/24 core/35.75MB/205W) CPU processor;
48* 32GB 2Rx4 DDR4-2933P-R memory module (FIO);
2* 1.92TB 6G SATA 2.5in RI 5300PRO SSD Universal Hard Disk Module (CMCTO);
OS: CentOS 7.6
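On the /usr/bin/time error in the log above: the message indicates that each stereo_corr call is wrapped in /usr/bin/time, and that this binary is not present where the job runs. A minimal pre-flight sketch for the top of job8.sh, assuming a POSIX-style shell; `require_exec` is a hypothetical helper name, not part of ASP:

```shell
#!/bin/bash
# Hypothetical helper: report whether a required executable exists.
require_exec() {
    local path="$1"
    if [ -x "$path" ]; then
        echo "found: $path"
        return 0
    fi
    # Print the problem to stderr so it shows up in the slurm-*.out file.
    echo "missing: $path" >&2
    return 1
}
```

Calling `require_exec /usr/bin/time || exit 1` before parallel_stereo would make the job stop early with a clear message instead of failing inside a tile. On CentOS the binary usually comes from the `time` package, though whether it can be installed on the compute nodes is something the cluster administrators would have to confirm.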
Oleg,
Hi, thank you for your suggestion. The previous problems may have been caused by my using the wrong Slurm command. There are no errors now.
parallel_stereo now runs and produces the out-PC.tif file.
However, point2dem then quits without printing any message.
The main commands I use are as follows (ignoring the Slurm commands):
1. parallel_stereo 42233.tif 42129.tif 42233.xml 42129.xml dg3/out --threads-multiprocess 8 --threads-singleprocess 8 --session-type rpc
2. point2dem out-PC.tif -o dem/out --errorimage --tr 1.0
I checked the series of folders generated by the parallel_stereo command, and each folder seems to have the correct *pc files inside.
I also found that "out-log-stereo_corr-08-03-2220-104744.txt" contains the error "[ fileio ] : Error: GdalIO: dg3/out-D_sub.tif: No such file or directory (code = 4)". This is odd, because I checked and the file does exist at that path.
The relevant documents are attached.
Thank you for your patience and any suggestions are welcome.
Best
John
I upgraded the software with the command "conda install stereo-pipeline==3.1.0", but the upgraded version still reports "ASP 3.0.1-alpha". Is that expected?
After rerunning a few times, both the "Error: /usr/bin/time -f" and the "Error: GdalIO: dg3/out-D_sub.tif" errors disappeared, so this fix works well.
I ran "parallel_stereo" and the whole process generated the "out-PC.tif" file without any error or warning messages. I only use one node of the cluster, so I can avoid using the "--nodata-value" parameter.
Next, I ran "point2dem" and got the "out-DEM.tif" file, but it is only 1 KB, which cannot be a valid result. There were no error or warning messages during the run.
I checked the log file generated by "point2dem" and it shows no errors either (see attachment).
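Since point2dem reports nothing wrong, a size check after the run is one way to catch the 1 KB result automatically. A rough sketch; `check_output_size` is a hypothetical helper, and the 10240-byte default threshold is an arbitrary assumption rather than anything defined by ASP:

```shell
#!/bin/bash
# Hypothetical helper: flag an output file that is missing or suspiciously small.
check_output_size() {
    local file="$1"
    local min_bytes="${2:-10240}"   # assumed threshold, adjust as needed
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    local size
    # wc -c gives the size in bytes; tr strips padding some systems add.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -lt "$min_bytes" ]; then
        echo "too small: $file ($size bytes)"
        return 1
    fi
    echo "ok: $file ($size bytes)"
}
```

For example, `check_output_size dg2/dem/out-DEM.tif` could be run right after point2dem. Running gdalinfo on the file would be a further check that it actually holds a raster.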
I repeated the experiment several times and got the same result. While "point2dem" runs, the steps "Statistics", "Bounding box and triangulation error range estimation", and "QuadTree" all look normal. Only the last step, "Writing: dg2/dem/out-DEM.tif" and "Writing: dg2/dem/out-IntersectionErr.tif", takes a very long time, and something there seems to make the program get stuck.
My guess is that this happens because the cluster uses the Slurm scheduler while ASP uses PBS-style parameters. The two express the same settings with different parameters, as shown in the figure below.
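On the Slurm versus PBS point: as far as I understand, parallel_stereo only needs a plain list of host names passed with --nodes-list, and the scheduler matters only for how that list is produced. A sketch, assuming a Slurm job where SLURM_JOB_NODELIST is set and scontrol is available; `write_nodes_list` is a hypothetical helper, and the fallback to the local host name is my own assumption for runs outside Slurm:

```shell
#!/bin/bash
# Hypothetical helper: write one host name per line to the given file.
write_nodes_list() {
    local out_file="$1"
    if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
        # Expand Slurm's compact node list to one host per line.
        scontrol show hostnames "$SLURM_JOB_NODELIST" > "$out_file"
    else
        # Assumed fallback: outside Slurm, list only the current machine.
        uname -n > "$out_file"
    fi
}
```

The resulting file could then be passed to parallel_stereo as `--nodes-list nodes.txt`. With a single node this should not be needed at all.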
By the way, the images I use are a WorldView-2 stereo pair, and each image is about 1.3 GB. After running parallel_stereo to get the out-PC.tif file, the whole output folder grows very large, to 17 GB. I think this may be because the intermediate *.TIF files are stored as "Float32", which takes up disk space and slows the programs down. For satellite images, "Int16" or "Int8" would actually be enough, but as far as I can tell "parallel_stereo" does not provide an option to select the bit depth.
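To see which intermediate files actually account for the 17 GB, something like the following could be run on the output folder. It is a rough sketch; `largest_files` is a hypothetical helper, and it assumes file names without newlines:

```shell
#!/bin/bash
# Hypothetical helper: list the N largest files under a directory,
# as "size-in-bytes<TAB>path", largest first.
largest_files() {
    local dir="$1"
    local count="${2:-10}"
    find "$dir" -type f | while IFS= read -r f; do
        # wc -c gives the apparent size in bytes; tr strips padding some systems add.
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -rn | head -n "$count"
}
```

For example, `largest_files dg3 20` would show the twenty largest files under the dg3 output folder, which should make it clear whether the disparity maps, the point cloud, or something else dominates.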
I used "dem_mosaic --hole-fill-length 300" to fill holes, but it has almost no effect: the picture on the left is the filled result, and the one on the right is the original. Maybe I need to adjust the stereo matching method, or some earlier step?