Possible memory leak in lasduplicate

Michael Perdue

May 23, 2024, 4:38:19 AM
to LAStools - efficient command line tools for LIDAR processing

I'm not sure, but I think I might have come across a memory leak in lasduplicate. When I run the command (in linux):
lasduplicate -nearby 0.03 -i raw_tiles/5094_51874.laz -olaz -odir thinned

Memory consumption soars. The file has 42146479 records and a record length of 42 bytes. Loading the entire file into RAM should consume ~1.6GB of ram. But when I run lasduplicate on the file I can watch memory for that process climb in the monitor to 25.4GB before the task finally completes.
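The back-of-envelope figure above can be checked directly (a quick sketch using the record count and record length quoted in the post):

```python
# Expected in-RAM size of the raw point records, using the numbers above.
num_points = 42146479
record_len = 42                       # bytes per point record

total_bytes = num_points * record_len
print(f"{total_bytes / 2**30:.2f} GiB")   # ~1.65 GiB
```

So the raw records alone account for roughly 1.6 GiB, an order of magnitude less than the 25.4 GB observed.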

Maybe this is legit, but it seems excessive if true.

I will send the file as a sample offline.



Jochen Rapidlasso

May 23, 2024, 4:52:40 AM
to LAStools - efficient tools for LiDAR processing
Hi Mike,
high memory consumption is not a memory leak :)
A memory leak means a program keeps occupying memory and NEVER gives it back.
A program like lasduplicate working on *one* file cannot leak memory beyond its runtime, by operating system mechanics: when the program ends, all the memory it occupied is given back.
So your concern seems to be just about the memory consumption.
Please keep in mind that we have to measure the distance of each point against every other point, so the structure we have to build grows quadratically with the number of points.
This is why we always tell the boring story about tiling...
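To illustrate why tiling helps so much: the all-pairs comparison count is n*(n-1)/2, so it grows quadratically with n (a quick illustration, not lasduplicate's actual internals):

```python
# The number of pairwise distances n*(n-1)/2 grows quadratically with n,
# so cutting a file into tiles (smaller n per run) pays off dramatically.
for n in (1_000, 10_000, 100_000):
    pairs = n * (n - 1) // 2
    print(f"{n:>7} points -> {pairs:>13,} pairwise distances")
```

Going from 1,000 to 100,000 points multiplies the point count by 100 but the pair count by roughly 10,000.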

The file you supplied did well in around 6 minutes:

LAStools lasduplicate (by in...@rapidlasso.de) version 240522
reading 42146479 points of type 6 from '5094_51874.laz' and writing to 'thinned\5094_51874.laz'.
number of xy-duplicates 40532093
found 1330464 duplicates in '5094_51874.laz'. took 376.538 sec.

If you run out of memory on files even bigger than 42 million points, we recommend
- tiling the data first
- optionally thinning the file first using lasthin; it does not have to compute all pairwise distances and can operate more "on the fly", which saves memory



Terje Mathisen

May 23, 2024, 5:56:53 AM
to last...@googlegroups.com, Jochen Rapidlasso
lasduplicate could probably run in much less memory in nearby mode if each point were indexed into 4 or 8 grid cells, with the cell size based on the -nearby parameter. Each cell (stored in a hash table, since the grid can be quite sparse) would hold the indices of the points that are sufficiently close, so when a new point is read you can immediately see the list of potential candidates, and that list will be very short.

Memory use for n points would then be O(n), with no quadratic growth.
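A minimal Python sketch of this grid-hash idea (a simplified variant that buckets each point into one cell and probes the surrounding 3x3 block; the function name and the xy-only setup are assumptions for illustration, not lasduplicate's actual code):

```python
def find_nearby_duplicates(points, radius):
    """Return points that lie within `radius` of an earlier kept point.

    Hypothetical sketch: each kept point is bucketed into a grid cell of
    side `radius`. Any candidate within `radius` of a new point can only
    sit in the 3x3 block of cells around it, so memory stays O(n) and
    each lookup inspects only a handful of candidates.
    """
    cells = {}          # (ix, iy) -> list of kept points in that cell
    duplicates = []
    for x, y in points:
        ix, iy = int(x // radius), int(y // radius)
        near = any(
            (x - px) ** 2 + (y - py) ** 2 <= radius ** 2
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for px, py in cells.get((ix + dx, iy + dy), ())
        )
        if near:
            duplicates.append((x, y))
        else:
            cells.setdefault((ix, iy), []).append((x, y))
    return duplicates


pts = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0)]
print(find_nearby_duplicates(pts, 0.03))   # [(0.01, 0.0)]
```

Using a hash table keyed on cell coordinates, as suggested above, means only occupied cells cost anything, which matters because a LiDAR tile's bounding box contains mostly empty cells at a 3 cm cell size.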


- <Terje.M...@tmsw.no>
"almost all programming can be viewed as an exercise in caching"