LASzip reads points as integers???


Chris Volpe

May 24, 2015, 12:40:08 PM
to last...@googlegroups.com
Hi. I'm using laszip to read LAS files and noticed something strange when comparing a C++ version of my application with a Matlab equivalent that reads the files with a Matlab-based reader: the Matlab reader yields points with greater precision. I tracked this down to the definition of the struct laszip_point in laszip_dll.h. The X, Y, and Z fields are defined as type laszip_I32. Why is that? The fields are stored as floating point in the files, so I don't understand the rationale for it. The difference was subtle in this case because the point values are fairly large, so most of the mantissa is in the integer portion of the value, and there's a 0.01 scale factor associated with it, so I get two decimal digits of precision after applying the scale factor. But I'm concerned that in other data sets this won't be the case. And even if it is the case, it makes it difficult to regression-test C++-based algorithms against the Matlab prototype when the C-based reader is needlessly truncating floats as integers. So, my questions are:

1. Why?!?!?
2. Is there a work-around? An alternate API? A newer version?
3. Is there another 3rd-party LAS file reader that doesn't do this, which I should use instead of laszip?

Thanks,
Chris

Martin Isenburg

May 24, 2015, 1:50:16 PM
to LAStools - efficient command line tools for LIDAR processing

Hello,

There is an "infamous" discussion on this topic on the LiDAR CLICK bulletin board that the USGS decided to shut down over the usual "security concerns" of a paranoid government. (-: However, this made-up "threat" lives on as an interesting "thread" in this Web archive:

http://web.archive.org/web/20111018220017/https://lidarbb.cr.usgs.gov/index.php?showtopic=538

Executive summary: Do not use a floating-point format for storing linear entities that are sampled with uniform precision such as x, y, z, GPS time, R, G, B, intensity, water depth, normalized reflectance, echo width, height above ground, ... (-: Use a fixed-point format (i.e. scaled and offset integers), and if you want a larger range, use more bits for your integers (e.g. I64 instead of I32), but do not store coordinates in an IEEE floating-point format (unless you subtract any large offset first).
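
In case it helps to see that idea spelled out, here is a tiny sketch in C++ of turning a coordinate into a scaled and offset integer and back. This is just the concept, not the actual LASzip code, and the scale, offset, and easting values are made up for illustration:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // encode a coordinate as a scaled and offset 32-bit integer (the idea
    // behind the LAS X/Y/Z fields), then decode it again without loss
    std::int32_t encode(double coordinate, double scale, double offset) {
        return static_cast<std::int32_t>(std::round((coordinate - offset) / scale));
    }

    double decode(std::int32_t stored, double scale, double offset) {
        return stored * scale + offset;
    }

    int main() {
        const double scale = 0.01;        // centimeter resolution
        const double offset = 500000.0;   // a large offset keeps the integers small
        const double easting = 512345.67; // a typical UTM easting

        std::int32_t X = encode(easting, scale, offset);
        std::printf("stored integer: %d   decoded: %.2f\n", X, decode(X, scale, offset));
        // even without an offset, an I32 at a 0.01 scale still covers +/- 21,474,836.47 units
        return 0;
    }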

I suggest you also read the official LAS specification that defines those integers and the corresponding offsets and scale factors used by LASzip as well as by LASlib to losslessly write the LAS and LAZ points as fixed-point numbers ...
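
For completeness, reading a file with the laszip DLL and putting the double-precision coordinates back together looks roughly like the sketch below. I am typing this from memory, so check laszip_dll.h for the exact names and return codes; "points.laz" is just a placeholder file name and all error checking is omitted:

    #include <cstdio>
    #include "laszip_dll.h"

    int main() {
        laszip_POINTER reader;
        laszip_create(&reader);

        laszip_BOOL is_compressed;
        laszip_open_reader(reader, "points.laz", &is_compressed);

        laszip_header* header;
        laszip_get_header_pointer(reader, &header);

        laszip_point* point;
        laszip_get_point_pointer(reader, &point);

        for (laszip_U32 i = 0; i < header->number_of_point_records; i++) {
            laszip_read_point(reader);
            // the stored integers plus the header's scale and offset give back
            // the full double-precision coordinate - nothing gets truncated
            double x = point->X * header->x_scale_factor + header->x_offset;
            double y = point->Y * header->y_scale_factor + header->y_offset;
            double z = point->Z * header->z_scale_factor + header->z_offset;
            std::printf("%.2f %.2f %.2f\n", x, y, z);
        }

        laszip_close_reader(reader);
        laszip_destroy(reader);
        return 0;
    }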

Regards,

Martin @LAStools

Chris Volpe

May 25, 2015, 1:24:52 PM
to last...@googlegroups.com
Hi Martin-

Thanks so much for your quick reply and for all the work you've done for this community. I agree completely with all your points about the reasons why fixed-point representations are better for storing this type of data. My question is now moot because it was based on a misunderstanding on my part: I thought that the LAS file format stored these values as floating point, and that the reader was reading them in and truncating them. That's not the case. So, my question was based on a false premise, and I apologize for wasting the group's time,

-Chris

Martin Isenburg

Jun 4, 2015, 3:57:57 PM
to LAStools - efficient command line tools for LIDAR processing
Hello Chris,

No one's time was wasted. The great design decision by those early LiDAR pioneers to store fixed-point numbers instead of floating-point numbers in LAS/LAZ is one that we cannot repeat often enough. Floating-point numbers are great for computations across high dynamic ranges and close to zero. The resolution at which they store numbers varies widely depending on the magnitude of the number. Close to zero the resolution is incredibly high. But far from zero - like in the hundreds of thousands and in the millions - it is rather poor ... and that is exactly where our UTM coordinates typically lie.
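
If you want to see this for yourself, here is a little C++ sketch (my own illustration, nothing LAS-specific) that prints the gap between one 32-bit float and the next representable one at a few magnitudes:

    #include <cmath>
    #include <cstdio>

    int main() {
        // the gap to the next representable 32-bit float grows with the magnitude
        float values[] = { 1.0f, 1000.0f, 500000.0f, 4000000.0f };
        for (float v : values) {
            float gap = std::nextafter(v, 2.0f * v) - v;
            std::printf("near %10.1f the next float is %.7f away\n", v, gap);
        }
        return 0;
    }

Near 1.0 the gap is about 0.0000001, near a typical UTM easting of 500,000 it is already about 3 centimeters, and near a northing in the millions it grows to a quarter of a meter.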

The correct way to store a spatial coordinate in the fields of airborne / mobile / terrestrial LiDAR is with a user-defined resolution of either centimeter (0.01), millimeter (0.001), or maybe 0.1 millimeter (0.0001) for projected coordinates, and 100 nanodegree (1e-7) or 10 nanodegree (1e-8) resolution for geographic coordinates.
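
To attach a rough number to the geographic case: one degree of latitude is about 111 km, so a resolution of 1e-7 degree corresponds to roughly 111,000 m * 1e-7 = 1.1 cm on the ground - in other words, it matches the centimeter case for projected coordinates.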

The default export resolution of Cyclone into the PTS ASCII format seems to be six decimal digits. That always makes me laugh because it means these scans are stored with micrometer resolution. A human hair has a diameter of 40 to 120 micrometers. Hence - in theory - Cyclone exports sufficient resolution to calculate the exact volume of each hair of a person standing in the scanned area. Or - in theory - enough resolution to create a highly accurate finite element mesh from each individual hair and then run a simulation to measure the wind resistance of your hairstyle (assuming your brand of hair gel is known too). Hey ... the amount of applied gel might even be measurable at that resolution. But in practice you are just storing an enormous amount of noise that increases your file size, slows your parsing, and lowers your compression.

You see, the topic of "resolution fluff" is dear to my heart - thanks for raising it again ... (-:

Regards,

Martin @rapidlasso

--
http://rapidlasso.com - fast tools for less fluffy LiDARs

 