finding and fixing crappy LAS/LAZ files

Martin Isenburg

Feb 13, 2013, 6:12:01 AM
to LAStools - efficient tools for LiDAR processing
Hello folks,

the problem with the "crappy" LAS files I am addressing in this
message is that they (a) do not compress well to LAZ files, (b)
sometimes cause troubles in LAStools, (c) make you look a bit
unprofessional when you produce and distribute such files, and (d) -
most importantly - do not please my aesthetic needs. (-:

Below are the lasinfo outputs of two of these files that caused
troubles recently. You notice them due to their unnecessarily precise
scale factors that are fine enough to store the scan of a human hair.

scale factor x y z: 4.26875e-007 5.23e-007 2.54059e-008
scale factor x y z: 1.0055e-006 9.355e-007 4.73613e-008

As a general rule these scaling factors should be no smaller than 0.01
for aerial surveys, no smaller than 0.001 for mobile surveys, and no
smaller than 0.0001 for the highly-precise terrestrial scans. These
numbers are already very very conservative and usually can be
increased. One exception: if the Easting and the Northing are
expressed in Longitude and Latitude then a scaling factor of 1e-7 for
the x coordinate and the y coordinate is required.
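To see what a given scale factor means in practice, here is a minimal Python sketch of the scaled-integer storage that LAS uses (stored integer = (coordinate - offset) / scale, rounded). The function names and the sample easting are made up for illustration and are not part of LAStools.

```python
# Sketch of LAS scaled-integer storage. Names and sample values are
# illustrative assumptions, not LAStools code.

def to_stored(coord, scale, offset):
    """Quantize a real-world coordinate to the integer kept in the file."""
    return round((coord - offset) / scale)

def from_stored(stored, scale, offset):
    """Recover the real-world coordinate from the stored integer."""
    return stored * scale + offset

x = 604321.123456789   # metres, made-up easting
offset = 600000.0

for scale in (0.01, 0.001, 0.0001):
    stored = to_stored(x, scale, offset)
    back = from_stored(stored, scale, offset)
    # A round trip moves the coordinate by at most half a scale step.
    assert abs(back - x) <= scale / 2 + 1e-9
    print(scale, stored, back)
```

With a scale factor of 0.01 the coordinate changes by at most 5 mm, and a signed 32-bit integer still covers more than 21,000 km on either side of the offset.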

Another thing that I often see is very ugly offsets. Take a look
below. The offset is added back to every point when the file is read,
and its many decimal digits mean that all points are being slightly
translated by this meaningless fractional offset. The offset is
supposed to be some large round number that is subtracted from all
points before they are stored. This moves the origin closer to the
stored points to make the scaled-integer storage used by the LAS
format more robust (i.e. it avoids integer overflows).

offset x y z: 604107.62500244146 4209893 1332.2413330841064
offset x y z: 381322.40624984744 4338247.5 2907.6025392913816

So the following offsets would do the job just as well

offset x y z: 600000 4210000 0
offset x y z: 381000 4338000 3000
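One possible way to arrive at such round offsets is sketched below in Python. This is only an illustration: the helper names and the step of 10000 are assumptions, and this is not claimed to be what the -auto_reoffset option does internally. Any round number near the data works, as long as the scaled integers still fit into 32 bits.

```python
# Hypothetical helpers for picking a "nice" offset. Not the algorithm
# used by las2las; the step size is an arbitrary choice.

def nice_offset(min_val, max_val, step=10000.0):
    """Round the midpoint of a coordinate range to a multiple of step."""
    mid = (min_val + max_val) / 2.0
    return round(mid / step) * step

def fits_int32(min_val, max_val, offset, scale):
    """Check that both ends of the range fit a signed 32-bit integer."""
    lo = round((min_val - offset) / scale)
    hi = round((max_val - offset) / scale)
    return -2**31 <= lo and hi <= 2**31 - 1

# Bounding box of the first file from its lasinfo header (x, y, z).
ranges = [
    (604107.62500244146, 604534.50000053411),
    (4209893.0, 4210416.0),
    (1332.2413330841064, 1357.6472168731689),
]

for lo, hi in ranges:
    off = nice_offset(lo, hi)
    print(off, fits_int32(lo, hi, off, 0.01))
```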

Other things that are bad? The return counts and numbers are not
populated for "proctor_2012.laz" and "Pine_Ridge_2012.laz" and these
files use an unnecessarily bloated point format 3 although they have
no GPS time stamp so could get away with the 8 bytes smaller point
format 2! How to fix this? EasyPeasy LazyToolsy:

las2las -i proctor_2012.laz ^
-rescale 0.01 0.01 0.01 ^
-auto_reoffset ^
-repair_zero_returns ^
-set_point_data_format 2 ^
-o proctor_2012_fixed.laz

or

las2las -i Pine_Ridge_2012.laz ^
-rescale 0.01 0.01 0.01 ^
-auto_reoffset ^
-repair_zero_returns ^
-set_point_data_format 2 ^
-o Pine_Ridge_2012_fixed.laz

or

las2las -i input\*.laz ^
-rescale 0.01 0.01 0.01 ^
-auto_reoffset ^
-repair_zero_returns ^
-set_point_data_format 2 ^
-odir output\ -olaz

Now the files are nice (see lasinfo before and after) and the
compression gains are substantial (see below).

8,676,598 proctor_2012.laz
4,248,552 proctor_2012_fixed.laz

1,173,141 Pine_Ridge_2012.laz
609,704 Pine_Ridge_2012_fixed.laz

Cheers,

Martin @lastools

--
http://rapidlasso.com - easypeasy tools to fix LiDARs

-----------------------------------------------------------------------
-----------------------------------------------------------------------
-----------------------------------------------------------------------

C:\lastools\bin>lasinfo proctor_2012.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 0
global_encoding: 0
project ID GUID data 1-4: 0 0 0 ''
version major.minor: 1.2
system identifier: 'libLAS'
generating software: 'libLAS 1.7.0'
file creation day/year: 23/2013
header size: 227
offset to point data: 227
number var. length records: 0
point data format: 3
point data record length: 34
number of point records: 1064634
number of points by return: 0 0 0 0 0
scale factor x y z: 4.26875e-007 5.23e-007 2.54059e-008
offset x y z: 604107.62500244146 4209893 1332.2413330841064
min x y z: 604107.62500244146 4209893 1332.2413330841064
max x y z: 604534.50000053411 4210416 1357.6472168731689
LASzip compression (version 2.1r0 c2 50000): POINT10 2 GPSTIME11 2 RGB12 2
reporting minimum and maximum for all LAS point record entries ...
X 0 999999996
Y 0 1000000000
Z 0 1000000000
intensity 0 0
edge_of_flight_line 0 0
scan_direction_flag 0 0
number_of_returns_of_given_pulse 0 0
return_number 0 0
classification 0 0
scan_angle_rank 0 0
user_data 0 0
point_source_ID 0 0
gps_time 0.000000 0.000000
Color R 2304 64512
G 0 65280
B 6144 65280
WARNING: 1 points outside of header bounding box
WARNING: there are 1064634 points with return number 0
WARNING: there are 1064634 points with a number of returns of given pulse of 0
histogram of classification of points:
1064634 Created, never classified (0)
real max x larger than header max x by 0.000000

-----------------------------------------------------------------------

C:\lastools\bin>lasinfo proctor_2012_fixed.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 0
global_encoding: 0
project ID GUID data 1-4: 0 0 0 ''
version major.minor: 1.2
system identifier: 'LAStools (c) by Martin Isenburg'
generating software: 'las2las (version 130213)'
file creation day/year: 23/2013
header size: 227
offset to point data: 227
number var. length records: 0
point data format: 2
point data record length: 26
number of point records: 1064634
number of points by return: 1064634 0 0 0 0
scale factor x y z: 0.01 0.01 0.01
offset x y z: 600000 4200000 0
min x y z: 604107.63 4209893.00 1332.24
max x y z: 604534.50 4210416.00 1357.65
LASzip compression (version 2.1r0 c2 50000): POINT10 2 RGB12 2
reporting minimum and maximum for all LAS point record entries ...
X 410763 453450
Y 989300 1041600
Z 133224 135765
intensity 0 0
edge_of_flight_line 0 0
scan_direction_flag 0 0
number_of_returns_of_given_pulse 1 1
return_number 1 1
classification 0 0
scan_angle_rank 0 0
user_data 0 0
point_source_ID 0 0
Color R 2304 64512
G 0 65280
B 6144 65280
overview over number of returns of given pulse: 1064634 0 0 0 0 0 0
histogram of classification of points:
1064634 Created, never classified (0)

-----------------------------------------------------------------------
-----------------------------------------------------------------------
-----------------------------------------------------------------------

C:\lastools\bin>lasinfo Pine_Ridge_2012.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 0
global_encoding: 0
project ID GUID data 1-4: 0 0 0 ''
version major.minor: 1.2
system identifier: 'libLAS'
generating software: 'libLAS 1.7.0'
file creation day/year: 24/2013
header size: 227
offset to point data: 227
number var. length records: 0
point data format: 3
point data record length: 34
number of point records: 100572
number of points by return: 0 0 0 0 0
scale factor x y z: 1.0055e-006 9.355e-007 4.73613e-008
offset x y z: 381322.40624984744 4338247.5 2907.6025392913816
min x y z: 381322.40624984744 4338247.5 2907.6025392913816
max x y z: 382327.90622314456 4339183 2954.9638674163816
LASzip compression (version 2.1r0 c2 50000): POINT10 2 GPSTIME11 2 RGB12 2
reporting minimum and maximum for all LAS point record entries ...
X 0 999999973
Y 0 1000000000
Z 0 1000000000
intensity 0 0
edge_of_flight_line 0 0
scan_direction_flag 0 0
number_of_returns_of_given_pulse 0 0
return_number 0 0
classification 0 0
scan_angle_rank 0 0
user_data 0 0
point_source_ID 0 0
gps_time 0.000000 0.000000
Color R 256 62464
G 256 57600
B 0 64512
WARNING: there are 100572 points with return number 0
WARNING: there are 100572 points with a number of returns of given pulse of 0
histogram of classification of points:
100572 Created, never classified (0)

-----------------------------------------------------------------------

C:\lastools\bin>lasinfo Pine_Ridge_2012_fixed.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 0
global_encoding: 0
project ID GUID data 1-4: 0 0 0 ''
version major.minor: 1.2
system identifier: 'LAStools (c) by Martin Isenburg'
generating software: 'las2las (version 130213)'
file creation day/year: 24/2013
header size: 227
offset to point data: 227
number var. length records: 0
point data format: 2
point data record length: 26
number of point records: 100572
number of points by return: 100572 0 0 0 0
scale factor x y z: 0.01 0.01 0.01
offset x y z: 300000 4300000 0
min x y z: 381322.41 4338247.50 2907.60
max x y z: 382327.91 4339183.00 2954.96
LASzip compression (version 2.1r0 c2 50000): POINT10 2 RGB12 2
reporting minimum and maximum for all LAS point record entries ...
X 8132241 8232791
Y 3824750 3918300
Z 290760 295496
intensity 0 0
edge_of_flight_line 0 0
scan_direction_flag 0 0
number_of_returns_of_given_pulse 1 1
return_number 1 1
classification 0 0
scan_angle_rank 0 0
user_data 0 0
point_source_ID 0 0
Color R 256 62464
G 256 57600
B 0 64512
overview over number of returns of given pulse: 100572 0 0 0 0 0 0
histogram of classification of points:
100572 Created, never classified (0)

Martin Isenburg

Feb 22, 2013, 12:19:35 AM
to LAStools - efficient tools for LiDAR processing
Hello,

I just received another "crappy" LAS file. Does anyone know the
technical folks at Agisoft PhotoScan? I really would like to contact
them and fix their LAS exporter before they produce more of these
dense-matching, near-LiDAR-style point clouds with awful scaling
factors and awful offsets (see below). Their exporter essentially
squeezes the points into a cube that makes full use of the 31 positive
bits of a signed integer by translating the near lower left corner to
(0/0/0) via the offset and by stretching each coordinate range to
extend across the maximal positive integer extent possible via the
scaling factors. Hence the x, y, and z coordinates each range from 0
to 2,147,483,647. Beats me why they were not fully consistent in
getting the "maximum possible resolution" by moving the near lower left
corner to (-2,147,483,648/-2,147,483,648/-2,147,483,648) and then
using the full 32 bits going from -2,147,483,648 to 2,147,483,647 for
the coordinates ... (-:
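The stretching described here can be checked against the header of "small.las": if each coordinate range is spread over 0 to 2,147,483,647, then dividing the bounding box extent by 2,147,483,647 should give the reported scale factors. A small Python sketch, with the min and max values copied from the lasinfo output of the original file (the variable names are only illustrative):

```python
# Recompute the scale factors from the bounding box, assuming each axis
# is stretched over the 31 positive bits of a signed 32-bit integer.

INT32_MAX = 2_147_483_647

# min / max x y z as reported by lasinfo for small.las
bbox_min = (583747.62815681368, 1243626.6946355633, -1126.5122044803559)
bbox_max = (584180.02082825685, 1243871.9264777289, 88.362037669460065)

scales = [(hi - lo) / INT32_MAX for lo, hi in zip(bbox_min, bbox_max)]

for axis, s in zip("xyz", scales):
    # Compare with the header line "scale factor x y z".
    print(axis, f"{s:.6g}")
```

The results agree with the header to the six significant digits that lasinfo prints, which means the grid spacing is a fraction of a micrometre on a point cloud that is a few hundred metres across.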

Below is the command to fix it followed by the lasinfo outputs for the
awful original LAS and full of awe repaired LAZ ... (-;

Regards,

Martin @rapidlasso

--
http://rapidlasso.com - fast tools for full of awe LiDARs

las2las -i small.las ^
-rescale 0.01 0.01 0.01 ^
-auto_reoffset ^
-o small_fixed.laz

+++++++++++++++++++++++++++++++++++++++++++++++++++
+++ original file
+++++++++++++++++++++++++++++++++++++++++++++++++++

C:\lastools\bin>lasinfo small.las
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 1
global_encoding: 0
project ID GUID data 1-4: 00000000-0000-0000-0000-000000000000
version major.minor: 1.2
system identifier: 'Agisoft PhotoScan'
generating software: 'Agisoft PhotoScan'
file creation day/year: 52/2013
header size: 227
offset to point data: 321
number var. length records: 1
point data format: 2
point data record length: 26
number of point records: 51842
number of points by return: 51842 0 0 0 0
scale factor x y z: 2.01349e-007 1.14195e-007 5.6572e-007
offset x y z: 583747.62815681368 1243626.6946355633 -1126.5122044803559
min x y z: 583747.62815681368 1243626.6946355633 -1126.5122044803559
max x y z: 584180.02082825685 1243871.9264777289 88.362037669460065
variable length header record 1 of 1:
reserved 0
user ID 'LASF_Projection'
record ID 34735
length after header 40
description ''
GeoKeyDirectoryTag version 1.1.0 number of keys 4
key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
key 1025 tiff_tag_location 0 count 1 value_offset 1 - GTRasterTypeGeoKey: RasterPixelIsArea
key 3072 tiff_tag_location 0 count 1 value_offset 32618 - ProjectedCSTypeGeoKey: PCS_WGS84_UTM_zone_18N
key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
reporting minimum and maximum for all LAS point record entries ...
X 0 2147483647
Y 0 2147483647
Z 0 2147483647
intensity 0 0
edge_of_flight_line 0 0
scan_direction_flag 1 1
number_of_returns_of_given_pulse 1 1
return_number 1 1
classification 0 0
scan_angle_rank 0 0
user_data 0 0
point_source_ID 1 1
Color R 3584 65280
G 4096 65024
B 5376 65024
overview over number of returns of given pulse: 51842 0 0 0 0 0 0
histogram of classification of points:
51842 Created, never classified (0)

+++++++++++++++++++++++++++++++++++++++++++++++++++
+++ fixed file
+++++++++++++++++++++++++++++++++++++++++++++++++++

C:\lastools\bin>lasinfo small_fixed.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 1
global_encoding: 0
project ID GUID data 1-4: 00000000-0000-0000-0000-000000000000
version major.minor: 1.2
system identifier: 'LAStools (c) by Martin Isenburg'
generating software: 'las2las (version 130221)'
file creation day/year: 52/2013
header size: 227
offset to point data: 321
number var. length records: 1
point data format: 2
point data record length: 26
number of point records: 51842
number of points by return: 51842 0 0 0 0
scale factor x y z: 0.01 0.01 0.01
offset x y z: 500000 1200000 0
min x y z: 583747.63 1243626.69 -1126.51
max x y z: 584180.02 1243871.93 88.36
variable length header record 1 of 1:
reserved 0
user ID 'LASF_Projection'
record ID 34735
length after header 40
description ''
GeoKeyDirectoryTag version 1.1.0 number of keys 4
key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
key 1025 tiff_tag_location 0 count 1 value_offset 1 - GTRasterTypeGeoKey: RasterPixelIsArea
key 3072 tiff_tag_location 0 count 1 value_offset 32618 - ProjectedCSTypeGeoKey: PCS_WGS84_UTM_zone_18N
key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
LASzip compression (version 2.1r0 c2 50000): POINT10 2 RGB12 2
reporting minimum and maximum for all LAS point record entries ...
X 8374763 8418002
Y 4362669 4387193
Z -112651 8836
intensity 0 0
edge_of_flight_line 0 0
scan_direction_flag 1 1
number_of_returns_of_given_pulse 1 1
return_number 1 1
classification 0 0
scan_angle_rank 0 0
user_data 0 0
point_source_ID 1 1
Color R 3584 65280
G 4096 65024
B 5376 65024
overview over number of returns of given pulse: 51842 0 0 0 0 0 0
histogram of classification of points:
51842 Created, never classified (0)

Martin Isenburg

Feb 22, 2013, 6:54:59 PM
to LAStools - efficient tools for LiDAR processing
Hello,

Not sure which LAS file you are referring to, but "small.las" was -
according to its author - created like this:

Here's what I did. Hope you're familiar with ArcGIS.

1) Exported my Agisoft points in .las format and in UTM 18N
projection.
2) Downloaded your software and extracted the contents.
3) Opened ArcGIS 10.0 and added the LAStools to the tools in ArcGIS.
4) Used lasground tool.
5) As input file I selected the .las I exported from Agisoft
6) The file is in meters so I left the feet boxes unchecked. I finish
the process and the file is created although it has exactly the same
size as the one before.
7) After that I tried to create a DEM using blast2dem to check if the
vegetation was actually removed.
...

Cheers,

Martin
On Feb 22, 10:57 pm, Tom Noble <macsurv...@gmail.com> wrote:
> Hello,
>
> Just to clarify. The LAS files that Martin is referring to, and masterfully
> showed how to fix, did not come directly from Agisoft Photoscan. A binary
> PLY file was exported from Photoscan and then converted to LAS using the
> wonderful software CloudCompare.
>
> Regards,
>
> Tom

Martin Isenburg

Mar 1, 2013, 2:57:45 PM
to LAStools - efficient tools for LiDAR processing
Hello,

In the meantime I have been exchanging emails with Dmitry Semyonov from Agisoft, who have a very nice user forum at http://www.agisoft.ru/forum/ where you can ask in the future about any LAS file issues due to scale factors and offsets ... (-;

It seems that all reported issues have been fixed, except that the (what I consider "crappy") scaling and offsetting will continue to be a feature of the LAS files generated by Agisoft. I include the reasoning they gave me below. Needless to say, I do not agree with their argument. It should be possible to calculate the resolution of the point coordinates that their software produces by rigorously quantifying the error in the correlation method and then setting the scale factor accordingly.

The "not worth of any (even minor) loss in accuracy" argument seems to be a curse that has followed me from academia, over committee work, up to my work today. (-; There really is no loss in accuracy. Their way of scaling to the bounding box extent merely adds many random low-order bits to the coordinates of each point by shifting them to arbitrary positions on an ultra-fine grid that is so precise that it could be used to model the surface and diameter of each individual hair of every person within the survey area. (-:

I also disagree with the assessment that human-readable offset values are not needed. Often the points and their bounding box are converted to ASCII or need to be summarized in a quality check report or the metadata. Having an extra 10 random fractional digits translates all data points by a random fraction. Due to the lack of proper scaling factors there is no guidance on how many decimal digits are required to store the numbers in ASCII. This results in grossly verbose ASCII files and hard-to-read summaries.
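To illustrate the point about ASCII output: with a sensible scale factor, the number of decimal digits to print follows directly from the scale factor itself. The Python sketch below uses hypothetical helper names and is not how any particular converter is implemented; it only shows the arithmetic.

```python
import math

def decimals_for_scale(scale):
    """Decimal digits needed so that printed values resolve one scale step.

    The small epsilon guards against floating-point noise in log10.
    """
    return max(0, math.ceil(-math.log10(scale) - 1e-9))

def format_coord(value, scale):
    """Print a coordinate with just enough digits for its scale factor."""
    return f"{value:.{decimals_for_scale(scale)}f}"

# A scale factor of 0.01 gives two decimals ...
print(format_coord(583747.63, 0.01))
# ... while the x scale factor of small.las asks for seven.
print(format_coord(583747.62815681368, 2.01349e-07))
```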

Below are Dmitry's two responses ...

Regards,

Martin @rapidlasso

------------

Thank you for contacting us and for detailed description of the issue. I will try to elaborate the choices we made for our current implementation.

1. Offset and scaling

I perfectly agree that discarding least significant bits will improve compression of the resulting LAS files. Nevertheless, for us it looks very important to keep the full precision of point coordinates in the exported cloud. In this case utilization of the full data range supported by LAS specification seems reasonable. Please note that PhotoScan supports processing of different kinds of images, from high altitude aerial photography to images of tiny objects captured by a microscope.

I believe that this choice complies with LAS specification, and as LAS files are binary anyway, human friendly offsets should not be required. Does it introduce any compatibility problems?

2. The return counts and numbers are not populated

Thanks for pointing, we will fix that.

3. files use an unnecessarily bloated point format 3 although they have no GPS time stamp

This was made intentionally to provide compatibility with Pointools View software. For some reason the version we tested failed to display point colors for LAS files saved in format 2, while format 3 files displayed fine. We will check if the problem is fixed in the latest Pointools View version, and if Pointools View works with format 2 files now, we will modify our exporter as well. Please let us know if there are any other issues with PhotoScan LAS files.

With best regards,

Dmitry Semyonov

AgiSoft LLC

------------ (and a follow-up a few hours later) -------------

Hello Martin,

We have checked the most recent Pointools version, and it seems to open LAS files of format 2 properly now (the colors are correctly displayed). So we have modified our LAS exporter to write format 2 LAS files. Also we have verified that return counts and numbers are written properly by PhotoScan (1 of 1) for all points. This has been the behavior from the very beginning, so the LAS files with 0 returns are generated by some other software, not PhotoScan.

As for scaling and offsets, we still believe that 2x gain in compression is not worth of any (even minor) loss in accuracy. If our current implementation using the 0 – 2147483647 data range causes any compatibility problems, we will be happy to review it and make any required modifications, but we would like to use as large a data range as possible for PhotoScan exports. Please note that PhotoScan has used the 0 – 2147483647 data range from the very beginning, so LAS files with a 0 – 1000000000 data range are generated by some other software.

With best regards,

Dmitry Semyonov

AgiSoft LLC

Mike Childs

Mar 1, 2013, 3:03:47 PM
to Martin Isenburg
Hello Martin,

One thought for compressing those: assuming they multiply all of the values by some fixed scaling factor to fill out the full 32-bit coordinate space, could you check, when compressing to LAZ, for a greatest common divisor of all the X, Y, or Z values in a compression segment? Assuming there is one, you could store it once for the whole packet and then compress the values divided by it with no loss at all. When you uncompress, you re-apply that greatest common divisor. All of this could be done seamlessly within the LAZ compression/decompression library to achieve almost exactly the same compression as for the unscaled data; you just need a few extra bytes per compression stream for the multiplier.
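The idea can be sketched in a few lines of Python. This is only an illustration of the proposal with made-up function names and sample values; it is not part of LASzip, and it only helps if the stored integers in a chunk really are exact multiples of a common factor.

```python
from functools import reduce
from math import gcd

def factor_out_gcd(values):
    """Return (g, reduced) where every value equals g * reduced value.

    Falls back to g = 1 when there is no common divisor worth storing.
    """
    g = reduce(gcd, values, 0)
    if g <= 1:
        return 1, list(values)
    return g, [v // g for v in values]

def restore(g, reduced):
    """Undo factor_out_gcd without any loss."""
    return [v * g for v in reduced]

# Made-up chunk of stored X values that share the factor 4696.
chunk = [0, 4696, 9392, 14088]
g, reduced = factor_out_gcd(chunk)
print(g, reduced)
assert restore(g, reduced) == chunk
```

The reduced values are much smaller and would be cheaper to encode, and the only extra cost is one multiplier per chunk and coordinate.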

Thanks,

Mike Childs
Global Mapper Guru
Blue Marble Geographics
Parker, CO USA
 