LASzip - integer overflow when writing a lot of points

Martin Graner

Sep 13, 2023, 10:29:34 AM
to LAStools - efficient tools for LiDAR processing
Hey,

I am not sure if this group is the right place to post this question; if not, please tell me and I will ask somewhere else.

Basically, we are using the LASzip implementation to read and write point clouds, and we are now approaching the uint32 maximum number of points (2^32 - 1, ~4.2 billion), which results in an integer overflow in the field number_of_point_records.

When writing, there are multiple things we can adapt, such as the LAS/LAZ version, the point data format, and so on.
In addition, there are extended fields which should lift the uint32 limit.
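
If I read laszip_api.h correctly, the header carries both the legacy 32-bit counters and the new 64-bit extended ones (sketched from memory, so treat the exact layout as an assumption):

  laszip_U32 number_of_point_records;                  // legacy counter, caps out at 2^32-1
  laszip_U32 number_of_points_by_return[5];            // legacy counters
  ...
  laszip_U64 extended_number_of_point_records;         // LAS 1.4: the real 64-bit count
  laszip_U64 extended_number_of_points_by_return[15];  // LAS 1.4: 64-bit per-return counts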

Currently we are writing LAS version 1.4 with point data format 3 (we tried 6 as well), with compatibility mode both enabled and disabled.
Our code looks similar to EXAMPLE_ELEVEN and uses laszip_update_inventory (since we don't know beforehand how many points we are going to write); see the sketch below.
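
Roughly, the loop looks like this (modeled on EXAMPLE_ELEVEN; have_more_points() and next_point() are hypothetical placeholders for our data source, and error checks are omitted):

laszip_POINTER writer;
laszip_create(&writer);
// ... fill in the header via laszip_get_header_pointer() ...
laszip_open_writer(writer, "out.laz", 1);    // 1 = compress to LAZ

laszip_point* point;
laszip_get_point_pointer(writer, &point);

while (have_more_points())                   // placeholder for our data source
{
  laszip_F64 coords[3];
  next_point(coords);                        // placeholder: produces the next x/y/z
  laszip_set_coordinates(writer, coords);    // applies the header's scale/offset
  laszip_write_point(writer);
  laszip_update_inventory(writer);           // tracks point counts and min/max for the header
}

laszip_close_writer(writer);
laszip_destroy(writer);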

It would be great to know whether it is possible to write more than 2^32 points with a correct extended_number_of_point_records (the points themselves do end up in the file), and if so, whether somebody can point me to an example or tell me which LAS version and point data format to use, and which functions won't work.

Thank you very much.
Martin

Jochen Rapidlasso

Sep 13, 2023, 4:36:48 PM
to LAStools - efficient tools for LiDAR processing
Hi Martin,
there have been some requests about very large files, and what we usually say is: avoid them!
I cannot imagine a case where it makes sense to deal with a single 20 GB file.
Nevertheless, I made a few attempts today to see what the problems with huge files are.
The biggest file I found has 2,922,714,287 points and a LAZ size of almost 16 GB.

To store point clouds with more than 2^32 points, you have to use point data format >= 6.
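
With the DLL API that means setting up the header along these lines before opening the writer (a sketch, not something I tested here):

laszip_POINTER writer;
laszip_create(&writer);
laszip_header* header;
laszip_get_header_pointer(writer, &header);
header->version_major = 1;
header->version_minor = 4;                     // LAS 1.4 is required for the extended counters
header->point_data_format = 6;                 // >= 6 for native 64-bit point counts
header->point_data_record_length = 30;
header->number_of_point_records = 0;           // legacy U32 field stays 0
header->extended_number_of_point_records = 0;  // filled in while writing, e.g. via the inventory

For my big file, lasinfo reports exactly that layout: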

lasinfo -i big.laz

...
  file signature:             'LASF'
...
  version major.minor:        1.4
...
  point data format:          6
  point data record length:   30
  number of point records:    0
  number of points by return: 0 0 0 0 0
...
  extended number of point records: 2922714287

First I decompressed this file to an 85 GB monster:

laszip -i big.laz -o big.las -v
1114.76 secs to write 87681429509 bytes for 'big.las' with 2922714287 points of type 6

Then I created a slightly different copy of the file:
las2las -i big.las -translate_raw_xy_at_random 100 100 -o big2.las -v

Now I compressed these two monsters into a single LAZ file:

laszip -i big.las -i big2.las -merged -o tmp.laz -v
1961.9 secs to write 29377949441 bytes for 'tmp.laz' with 5845428574 points of type 6
needed 1961.9 sec for 2 files

lasinfo -i tmp.laz

lasinfo (230821) report for 'tmp.laz'

reporting all LAS header entries:
...
  version major.minor:        1.4
...
  point data format:          6
  point data record length:   30
  number of point records:    0
...
  extended number of point records: 5845428574
  extended number of points by return: 4311329972 1209518008 263458672 51621652 8271580 1104812 118234 5644 0 0 0 0 0 0 0
...
LASzip compression (version 3.4r3 c3 50000): POINT14 3
...

All of this went without any problems.

Still, I think such huge files do not make sense - but so far everything works well.
If you have problems writing big files, tell us exactly
- what you did,
- what commands you used, and
- what output the program gives you.
Maybe I can help then.

Best regards,

Jochen @rapidlasso

Martin Graner

Sep 26, 2023, 11:16:59 AM
to LAStools - efficient tools for LiDAR processing
Hi Jochen,

thanks for the explanation and the hints.
We went down this rabbit hole a little deeper.

So one issue we found is that laszip_update_inventory() does not work, because internally it uses U32 fields for the number of point records (see lines 75-76,
U32 number_of_point_records;
U32 number_of_points_by_return[16];
in
https://github.com/LASzip/LASzip/blob/master/src/laszip_dll.cpp )
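
Our patch amounts to nothing more than widening those two fields (the surrounding min/max bookkeeping in the inventory class is untouched):

// in class laszip_dll_inventory (laszip_dll.cpp)
U64 number_of_point_records;           // was: U32
U64 number_of_points_by_return[16];    // was: U32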

After changing these to U64 and recompiling (we did the classic: forgot to copy the DLL the first time ...), we are now able to write a LAZ file with ~5 billion points.

When we import this back using LASzip and print the header of the LAZ file, we get:
Header ................
SourceID       :  1
Global Encoding:  0
GUID data1     :  0
GUID data2     :  0
GUID data3     :  0
Proj_ID_GUID   :
LAS Version    :  1 . 4
Ident          :  LAS writer
Software       :  LASzip DLL 3.4 r3 (191111)
Day            :  268
Year           :  2023
HeadSize       :  375
OffsetPData    :  375
NumVarLR       :  0
PointDataFormat:  7
PointDataFormat:  Points + Reflectivity + color
PointDRL       :  39
NumPoiRecords  :  0
NumPoiRecordsR :  0
EXTNumPoiRecords  :  4995044410
EXTNumPoiRecordsR :  0
Scales         :  0.0005   0.0005   0.0005
Offset         :  0   0   0
Min            :  50.505   0.01   0.1
Max            :  151.495   201.99   100.99
UD in H size   :  0
UD a H size    :  0

After importing (and exporting an E57 to be triple sure), we found that we wrote nearly 5 billion points and were able to read them back as well.

However, when testing this in CloudCompare, ReCap and LAStools, they always displayed only the integer-overflowed count of ~705 million points (the true count modulo 2^32).

A fully working example is attached - perhaps you can give it a whirl.
It should produce around 2 GB of LAZ file (great compression ;) )


Output of the test code on my machine:
Open Writer....
Write all points....
0% points to write: 5000000000
99% written points: 4994549361
how much data is left:5000000000-4995044410=4955590
Close writer....


What we are now wondering is: are we doing something wrong, given that it does not work in LAStools, where I'd expect it to work?
ReCap uses LASzip as far as I know, so I could write to them and ask.
I think CloudCompare's LAS support is based on PDAL, but I have no idea whether they use LASzip or LAZperf.

Thanks a lot
Martin

PS: Are you coming to Intergeo in Berlin in October?
main.cpp

Jochen Rapidlasso

Sep 26, 2023, 5:14:11 PM
to LAStools - efficient tools for LiDAR processing
Hi Martin,
the 32-bit to 64-bit port isn't that easy - but we are working on it.
As long as you do a linear read, you should be able to process files with more than 4,294,967,295 points.
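
As a sketch of what I mean by a linear read with the DLL API (untested here; error handling omitted):

laszip_POINTER reader;
laszip_create(&reader);
laszip_BOOL is_compressed;
laszip_open_reader(reader, "big.laz", &is_compressed);

laszip_header* header;
laszip_get_header_pointer(reader, &header);
laszip_point* point;
laszip_get_point_pointer(reader, &point);

// use the 64-bit counter, not the legacy U32 number_of_point_records
laszip_U64 count = header->extended_number_of_point_records;
for (laszip_U64 i = 0; i < count; i++)
{
  laszip_read_point(reader);   // sequential reads only; avoid laszip_seek_point()
  // ... process *point ...
}

laszip_close_reader(reader);
laszip_destroy(reader);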
We will be at Intergeo. Please visit our booth C1.048 in hall 1.2. We look forward to meeting you.
Best regards,

Jochen @rapidlasso