USGS Earth Explorer: LiDAR Point Order matters biggly for LASzip compression

96 views
Skip to first unread message

Martin Isenburg

unread,
Aug 4, 2017, 12:02:43 PM8/4/17
to LAStools - efficient command line tools for LIDAR processing
Hello,

I watched this nice video on how to download LiDAR from the USGS Earth Explorer. 


But then I saw the moment that the files were downloaded and I was initially shocked to see that the LAZ file was hardly smaller than the winzipped LAS file. So I decided to investigate by following the steps in the video to download the exact same tile (see attached picture series). 

Turns our the points are in a horrible order for compression. They are ordered by elevation from top to bottom. Does anyone know who is behind these LiDAR deliveries? Or who is in charge for the LiDAR holdings on the USGS Earth Explorer This is from quite a while ago but it would still make sense to improve the files as the download sizes shrinks by over 60 percent. Maybe it helps to note that the files name "Aero-Metric LASLib" as their generating software. 

A simple sort by GPS time and return number reduces the compressed LAZ file size from 56 MB to 22 MB. The reason for this incredible gain is that a sort in z order pretty much destroys all predictive rules used by LASzip:


lassort -i CO_ArkansasValley_2010_000536.laz^
            -gps_time -return_number ^
            -odix _sorted -olaz

Here the file sizes for comparison:

  85,144,477 CO_ArkansasValley_2010_000536.ZIP
162,272,731 CO_ArkansasValley_2010_000536.las
  56,629,291 CO_ArkansasValley_2010_000536.laz
  22,292,134 CO_ArkansasValley_2010_000536_sorted.laz

Regards,

Martin @rapidlasso

-------------

E:\LAStools\bin>lasinfo CO_ArkansasValley_2010_000536.laz
lasinfo (170802) report for CO_ArkansasValley_2010_000536.laz
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            1
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          'MODIFICATION                    '
  generating software:        'Aero-Metric LASLib              '
  file creation day/year:     41/2011
  header size:                227
  offset to point data:       915
  number var. length records: 5
  point data format:          1
  point data record length:   28
  number of point records:    5795422
  number of points by return: 3740148 1266858 588933 199483 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  394500.00 4353000.00 3137.81
  max x y z:                  395999.98 4354499.98 3460.21
variable length header record 1 of 5:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34735
  length after header  200
  description          'GeoTiff Projection Keys'
    GeoKeyDirectoryTag version 1.1.0 number of keys 24
      key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
      key 2048 tiff_tag_location 0 count 1 value_offset 32767 - GeographicTypeGeoKey: user-defined
      key 2049 tiff_tag_location 34737 count 46 value_offset 53 - GeogCitationGeoKey: NAD83(HARN)|datum: D_North_American_1983_HARN
      key 2050 tiff_tag_location 0 count 1 value_offset 32767 - GeogGeodeticDatumGeoKey: user-defined
      key 2051 tiff_tag_location 0 count 1 value_offset 8901 - GeogPrimeMeridianGeoKey: PM_Greenwich
      key 2054 tiff_tag_location 0 count 1 value_offset 9102 - GeogAngularUnitsGeoKey: Angular_Degree
      key 2055 tiff_tag_location 34736 count 1 value_offset 9 - GeogAngularUnitSizeGeoKey: 0.01745329252
      key 2056 tiff_tag_location 0 count 1 value_offset 7019 - GeogEllipsoidGeoKey: Ellipse_GRS_1980
      key 2057 tiff_tag_location 34736 count 1 value_offset 6 - GeogSemiMajorAxisGeoKey: 6378137
      key 2059 tiff_tag_location 34736 count 1 value_offset 7 - GeogInvFlatteningGeoKey: 298.2572221
      key 2061 tiff_tag_location 34736 count 1 value_offset 8 - GeogPrimeMeridianLongGeoKey: 0
      key 3072 tiff_tag_location 0 count 1 value_offset 32767 - ProjectedCSTypeGeoKey: user-defined
      key 3073 tiff_tag_location 34737 count 27 value_offset 0 - PCSCitationGeoKey: NAD83(HARN) / UTM zone 13N
      key 3074 tiff_tag_location 0 count 1 value_offset 32767 - ProjectionGeoKey: user-defined
      key 3075 tiff_tag_location 0 count 1 value_offset 1 - ProjCoordTransGeoKey: CT_TransverseMercator
      key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
      key 3077 tiff_tag_location 34736 count 1 value_offset 5 - ProjLinearUnitSizeGeoKey: 1
      key 3081 tiff_tag_location 34736 count 1 value_offset 4 - ProjNatOriginLatGeoKey: 0
      key 3082 tiff_tag_location 34736 count 1 value_offset 0 - ProjFalseEastingGeoKey: 500000
      key 3083 tiff_tag_location 34736 count 1 value_offset 1 - ProjFalseNorthingGeoKey: 0
      key 3088 tiff_tag_location 34736 count 1 value_offset 2 - ProjCenterLongGeoKey: -105
      key 3092 tiff_tag_location 34736 count 1 value_offset 3 - ProjScaleAtNatOriginGeoKey: 0.9996
      key 4097 tiff_tag_location 34737 count 26 value_offset 27 - VerticalCitationGeoKey: NAVD88 - Geoid09 (Meters)
      key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
variable length header record 2 of 5:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34736
  length after header  80
  description          'GeoTiff double parameters'
    GeoDoubleParamsTag (number of doubles 10)
      500000 0 -105 0.9996 0 1 6.37814e+006 298.257 0 0.0174533
variable length header record 3 of 5:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34737
  length after header  100
  description          'GeoTiff ASCII parameters'
    GeoAsciiParamsTag (number of characters 100)
      NAD83(HARN) / UTM zone 13N|NAVD88 - Geoid09 (Meters)|NAD83(HARN)|datum: D_North_American_1983_HARN|
variable length header record 4 of 5:
  reserved             43707
  user ID              'NIIRS10'
  record ID            4
  length after header  10
  description          'NIIRS10 Timestamp'
variable length header record 5 of 5:
  reserved             43707
  user ID              'NIIRS10'
  record ID            1
  length after header  26
  description          'NIIRS10 Tile Index'
the header is followed by 2 user-defined bytes
LASzip compression (version 2.1r0 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
  X            39450000   39599998
  Y           435300000  435449998
  Z              313781     346021
  intensity           1       4095
  return_number       1          4
  number_of_returns   1          4
  edge_of_flight_line 0          1
  scan_direction_flag 0          1
  classification      1          7
  scan_angle_rank   -22         22
  user_data          32         32
  point_source_ID   135        139
  gps_time -32257557.295246 -32255969.434287
number of first returns:        3740148
number of intermediate returns: 788412
number of last returns:         3740285
number of single returns:       2473423
overview over number of returns of given pulse: 2473423 1355713 1168420 797866 0 0 0
histogram of classification of points:
         4267975  unclassified (1)
         1527336  ground (2)
             111  noise (7)
usgs_explorer_01.jpg
usgs_explorer_02.jpg
usgs_explorer_03.jpg
usgs_explorer_04.jpg

Martin Isenburg

unread,
Aug 18, 2017, 7:51:39 AM8/18/17
to LAStools - efficient command line tools for LIDAR processing
Hello,

to make it easier for LiDAR portals to not only compress their LiDAR data prior to distribution but also to improve the file for better compression and coherency I have added a new tool in the latest version (170818) called "lasoptimize" that is available for download now:


Here a small sample run on the same file mentioned in the earlier post:

E:\LAStools\bin>lasoptimize -i CO_ArkansasValley_2010_000536.laz -odix _opt
17.087 secs to write 22686697 bytes for 'CO_ArkansasValley_2010_000536_opt.laz' with 5795422 points of type 1

E:\LAStools\bin>dir CO_Ar*
08/04/2017 05:22 PM 56,629,291 CO_ArkansasValley_2010_000536.laz
08/18/2017 01:47 PM 22,686,697 CO_ArkansasValley_2010_000536_opt.laz

In the default setting the tool will do the following:

  (1) remove fluff in coordinate resolution (i.e. when all X, Y, or Z coordinates are multiples of 10, 100, or 1000).
  (2) remove any additional padding in the LAS header or before the point block
  (3) set nicely rounded offsets in the LAS header
  (4) zero the contents of the user data field
  (5) turn (sufficiently small) EVLRs into VLRs (LAS 1.4 only)
  (6) rearrange points for better compression and spatial indexing

The GUI or command switches allow turning off options (1) through (5).

For multi-beam systems (RIEGL Q-1560, Leica ALS80, Optech Pegasus) that produce returns with (near-)identical GPS times with laser beams that point into different directions, it is important to tell lasoptimize how the beam ID that is called 'scanner channel' in the new LAS 1.4 point types is coded. When older point types are used or when the 'scanner channel' is mapped into the flight line numbering or stored in the user data field it's wise to tell lasoptimize that either '-scanner_channel_in_point_source_ID' or that '-scanner_channel_in_user_data' as compression will also suffer biggly when beam switches are ignored. 

Regards,

Martin

David Friedrich

unread,
Aug 18, 2017, 12:04:44 PM8/18/17
to last...@googlegroups.com
Hi Martin,

does LASZip re-arrange the point order automatically before writing the file? And if not, wouldn't it be better to enforce a disk space efficient ordering during LASZip export instead of having a second tool?
If somebody needs the original point order you could provide an option like -keep_porder for laszip.

Just an idea of mine.

Cheers from Chile,
David

Martin Isenburg

unread,
Aug 18, 2017, 12:14:38 PM8/18/17
to LAStools - efficient command line tools for LIDAR processing
Hello David,

I disagree. I think the laszip tool is the wrong place to add such an optimizer. The laszip tool is a 100% lossless compressor. Being completely lossless is what has given the people confidence to use the laszip tool. The file is exactly the same after a LAS -> LAZ -> LAS round-trip. After all, the point order may also contain information. For example, you would not expect Winzip to rearrange the letters in your document into a more compressible order because the run-length coding really works a lot better on a text that goes like "aaaaaaaaaaabbbbcccccccddddd...." (-:

The laszip tool compresses without any loss whichever point cloud they gave it. It is in general not okay to simply change the point order without the user requesting it explicitly. There could be a hidden message in the order of points. There could be a clever pyramiding in the point order. There could be a spatial indexing based upon the point order. There could be an incredibly important external reference that assumes a particular point to be at position 23,221,022. The particular point order that started this thread was very hard-to-compress (like any axis sort) but the original files were faithfully preserved.

As n points can be arranged in n! ways the amount of information needed to encode the point order increases super-linear with the number of points. You can even hide a message (steganography) just by reordering the points in respect to a canonical reference order. The number of bits you can hide is nlogn and therefore grows super-linear with the number of points in the file.

But this is probably more than you wanted to know ... (-:

By keeping simple compression and optimization for distribution in separate tools the choice is clearly left to the user so that an important point order or an important header padding is not accidentally lost.

Martin

David Friedrich

unread,
Aug 18, 2017, 1:09:57 PM8/18/17
to last...@googlegroups.com
Hi Martin,

thank you for the explanation, now i understand your point of view.
In this case, why not just add all the options to laszip so the user can opt-in if he wants to.

Nevertheless at the end it doesn't matter, because its more like a personal preference to have everything under one single hood.
It is more important to have the option than where to find it.

Cheers,

Evon Silvia

unread,
Aug 18, 2017, 1:42:42 PM8/18/17
to last...@googlegroups.com
Hi Martin,

For what it's worth I'll back up your assertion that laszip should by default maintain the point order, for all the reasons that you stated. Out of curiosity, is lasorganize part of the open-source LAStools suite (and therefore free to redistribute unaltered), or is it considered closed-source?

Regarding David's sentiment that it's nice to have everything under one roof, I'll just note that LAZ is a viable output option for lasorganize, so everything can be done in one step there instead of using LASzip. :-)

Thanks again Martin. Clever idea, and thanks for providing toggles for all the optimizations! Small nitpick: I foresee a lot of potential issues from people using option #4 with LAS 1.2 without realizing that they're wiping out the channel information often stored there. If it was me, I'd probably default that option OFF because it represents potential data loss, but you understand your userbase better than I.

Evon
--
Evon Silvia PLS
QSI Solutions Developer
ASPRS LAS Working Group Chair

Quantum Spatial
517 SW 2nd Street, Suite 400, Corvallis, OR 97333



Martin Isenburg

unread,
Aug 18, 2017, 1:58:24 PM8/18/17
to LAStools - efficient command line tools for LIDAR processing
Hello Evon,

Out of curiosity, is lasorganize part of the open-source LAStools suite (and therefore free to redistribute unaltered), or is it considered closed-source?

Interesting name. Maybe a future tool? But this one is called 'lasoptimize' which is free to use and distribute (including commercial use) but the source code is not open. See http://lastools.org/LICENSE.txt for detail.

Regarding David's sentiment that it's nice to have everything under one roof, I'll just note that LAZ is a viable output option for lasorganize, so everything can be done in one step there instead of using LASzip. :-)

Compressed output is the default output option of 'lasoptimize'.
 
Small nitpick: I foresee a lot of potential issues from people using option #4 with LAS 1.2 without realizing that they're wiping out the channel information often stored there. If it was me, I'd probably default that option OFF because it represents potential data loss, but you understand your user base better than I.

Is that so? Well ... this can be changed in the future. I plan for a few other improvements but let's see how this is received. Note that using lasheight with the default options will also "wipe-out" information in the user data field as the heigh above ground in decimeter will be stored there. It's called *user* data for a reason ... (-:

I usually advise people to encode this in the flight line number that is stored in the 16 bit "point source ID". A simple option is to use the lowest digit for the scanner channel / beam ID and the higher digits for the flight line number. Here an example:

flightline 22 channel 1 = point source ID 221
flightline 22 channel 2 = point source ID 222
[...]
flightline 37 channel 1 = point source ID 371
flightline 37 channel 2 = point source ID 372

Regards,

Martin @rapidlasso

Evon Silvia

unread,
Aug 18, 2017, 2:32:56 PM8/18/17
to last...@googlegroups.com
My mistake on lasorganize vs lasoptimize. Happy Friday? 

Useful to know your convention re: PtSourceID. Not sure we can adopt it since massive projects occasionally venture into the tens of thousands of flightlines, but I like knowing what others folks do. I understand you can't always count on the User Data field to be preserved since it's "use"ful for so many other uses. :) 

Evon

Olsen, Richard (RC) (CIV)

unread,
Aug 18, 2017, 3:30:44 PM8/18/17
to last...@googlegroups.com

Martin,

I believe there are sort tools as part of lastools, that you give a variety of options  (I should check with Jeremy, but he is off today).

As such, what would be the command line option that would, for example, sort on time, then pipe to laz ?   

 

rco

 

RC Olsen

Naval Postgraduate School

Evon Silvia

unread,
Aug 22, 2017, 2:55:05 PM8/22/17
to last...@googlegroups.com
Hello Martin,

Just downloaded the 170818 version of LAStools from your website. When I attempt to use lasoptimize while disabling the "zero user data" option, I get the error "cannot understand argument '-do_not_zero_user_data'". Is it possible the args weren't added?

Evon

Martin Isenburg

unread,
Aug 23, 2017, 5:13:19 PM8/23/17
to LAStools - efficient command line tools for LIDAR processing
Hello Evon,

Sorry about that. The functionality is in fact there but I forgot to add the command line parsing of that particular option. This will be fixed in the next release. Note that indicating to "lasoptimize" that the scanner channel is coded into the user data attribute with "-scanner_channel_in_user_data" will also protect the field from being zeroed in this next release.

Regards,

Martin @rapidlasso
Reply all
Reply to author
Forward
0 new messages