Maximizing LAZ (LASzip) Compression

Evon Silvia

Jun 27, 2017, 1:06:03 PM
to last...@googlegroups.com
Hello,

Based on my experience with LAZ I've found that my LAS files can compress somewhere around 60-90% using the LAZ format. I'd like to maximize this compression for archiving and distribution purposes, and I know that the sorting method has an impact on the compression ratio.

Does anyone (Martin?) have any sorting strategies that they've played with that can maximize compression? Possible options...
  1. Sort by time
  2. Simple XYZ sort
  3. Spatial sort by small tiles (e.g., spatially coherent clusters of roughly 50k points)
  4. What about RGB/NIR bands?
  5. Some combination?
I don't care if a tool exists (I can write my own)... I just want to know which sorting schemes will maximize the compression ratio. Any input is welcome.

Thanks,
Evon
--
Evon Silvia PLS
QSI Solutions Developer
ASPRS LAS Working Group Chair

Quantum Spatial
517 SW 2nd Street, Suite 400, Corvallis, OR 97333



Kirk Waters - NOAA Federal

Jun 27, 2017, 1:19:11 PM
to LAStools - efficient command line tools for LIDAR processing
Evon,
I've tried the xyz sort before and it doesn't compress very well. The sort by time works far better. If I understand how the compression works, you want to arrange the points such that you minimize the number of bits that change as you go from point to point. It seems to me like time would be a good choice for that, but maybe one of the others would too.
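
As a rough illustration of the "small change from point to point" idea, here is a minimal Python sketch. It is a toy proxy only and not how LASzip actually predicts or encodes points: it just sums the absolute coordinate differences between consecutive points. The grid, its jitter, and the name `delta_cost` are all made up for the example.

```python
def delta_cost(points):
    """Sum of |dx| + |dy| + |dz| between consecutive (x, y, z) points."""
    return sum(
        abs(b[0] - a[0]) + abs(b[1] - a[1]) + abs(b[2] - a[2])
        for a, b in zip(points, points[1:])
    )

# Hypothetical scan: 10 scan lines of 5 points each, listed in acquisition
# order (line by line). Each line gets a small x offset so that every x value
# is distinct, as is typical for real coordinates stored at fine resolution.
acquisition_order = [
    (col * 10 + (line * 3) % 10, line * 10, 100)
    for line in range(10)
    for col in range(5)
]

# The same points after a plain X > Y > Z sort. Neighbours in this order have
# almost the same x but come from different scan lines, so y jumps around.
xyz_order = sorted(acquisition_order)

print(delta_cost(acquisition_order), delta_cost(xyz_order))  # prints: 843 2119
```

On this synthetic grid the acquisition order gives the smaller total, which is in line with the observation above, but the numbers say nothing about real compression ratios.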

Kirk Waters, PhD                     | NOAA Office for Coastal Management
Applied Sciences Program      | 2234 South Hobson Ave
843-740-1227                          | Charleston, SC 29405    


Evon Silvia

Jun 27, 2017, 1:36:10 PM
to last...@googlegroups.com
Kirk et al,

Thanks for the input. I can confirm your suggestion. Here are my findings so far for one large LAS 1.2 (0.5GB) example file...
  1. Sort by Flightline > Channel > Timestamp > Return # – 80%
  2. Sort by Timestamp > Return # > Channel > Flightline – 73%
  3. Sort by X > Y > Z > Return # – 61%
And for the same file in LAS 1.4...
  1. Sort by Flightline > Channel > Timestamp > Return # – 79%
  2. Sort by Timestamp > Return # > Channel > Flightline – 73%
  3. Sort by X > Y > Z > Return # – 59%
These results were achieved using the LASzip DLL version v2.4.r1.b150923, which still uses the LAS 1.4 compatibility mode. A little old, but still reliable.
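
For anyone who wants to reproduce the three orderings, they can be written as plain sort keys. This is only a sketch in Python: the `Point` record and its field names are assumptions for illustration (a real LAS point carries more attributes, and how "flightline" is stored depends on the data), and it does not read or write LAS/LAZ files.

```python
from typing import NamedTuple

class Point(NamedTuple):
    # Hypothetical minimal record, not a real LAS point layout.
    flightline: int
    channel: int
    gps_time: float
    return_number: int
    x: int
    y: int
    z: int

def acquisition_key(p):
    """Ordering 1: Flightline > Channel > Timestamp > Return #."""
    return (p.flightline, p.channel, p.gps_time, p.return_number)

def time_first_key(p):
    """Ordering 2: Timestamp > Return # > Channel > Flightline."""
    return (p.gps_time, p.return_number, p.channel, p.flightline)

def xyz_key(p):
    """Ordering 3: X > Y > Z > Return #."""
    return (p.x, p.y, p.z, p.return_number)

# Three made-up points from two flightlines.
points = [
    Point(2, 0, 10.0, 1, 500, 100, 30),
    Point(1, 0, 12.0, 1, 100, 900, 10),
    Point(1, 0, 11.0, 2, 300, 200, 20),
]

by_acquisition = sorted(points, key=acquisition_key)
by_time = sorted(points, key=time_first_key)
by_xyz = sorted(points, key=xyz_key)
```

Each sorted list would then be written out and compressed separately to compare the resulting file sizes.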

I think time works out well because points that are adjacent in time also have very similar scan angles and are somewhat spatially coherent. Sorting simply by XYZ without tiling removes these guarantees. Perhaps grouping into small tiles would help the spatial sort?

Any other ideas? 

Evon

Kirk Waters - NOAA Federal

Jun 27, 2017, 1:51:06 PM
to LAStools - efficient command line tools for LIDAR processing
Evon,
I think the newer LASzip with native 1.4 support splits the points by scanner channel and compresses each channel separately. With that, I would expect your options 1 and 2 to give the same result.

Kirk Waters, PhD                     | NOAA Office for Coastal Management
Applied Sciences Program      | 2234 South Hobson Ave
843-740-1227                          | Charleston, SC 29405    


Martin Isenburg

Jun 27, 2017, 1:51:09 PM
to LAStools - efficient command line tools for LIDAR processing
Hello Evon,

the compression scheme was optimized for points in acquisition order by one laser beam and your findings are consistent with this:

Sort by Flightline > Channel > Timestamp (ascending/descending) > Return # (ascending)

however, it is sufficient that points are *mainly* in this acquisition order. After you put them in this order you may then arrange them along a space-filling curve in xy, for example with 50 meter by 50 meter or 100 meter by 100 meter tiles, to get not only good compression but also good spatial access coherence when using the LAZ files together with LAX files.
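
One possible way to do this two-step ordering is sketched below in Python. The choice of a Morton (Z-order) curve, the function names, and the parameters `tile_size`, `min_x` and `min_y` are assumptions made for the example and are not taken from LAStools. It relies on Python's `sorted()` being stable, so points that land in the same tile keep the acquisition order they arrived in.

```python
def interleave_bits(col, row, bits=16):
    """Morton (Z-order) index built from two non-negative tile indices."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)      # even bits come from col
        code |= ((row >> i) & 1) << (2 * i + 1)  # odd bits come from row
    return code

def tile_sort(points, tile_size=50.0, min_x=0.0, min_y=0.0):
    """Reorder points tile by tile along a Morton curve.

    `points` holds tuples that start with (x, y) and is assumed to be in
    acquisition order already, with x >= min_x and y >= min_y. The sort is
    stable, so the order inside each tile is left untouched.
    """
    def tile_key(p):
        col = int((p[0] - min_x) // tile_size)
        row = int((p[1] - min_y) // tile_size)
        return interleave_bits(col, row)
    return sorted(points, key=tile_key)

# Made-up points in acquisition order; the label is the third element.
acquired = [(10, 10, "a"), (60, 10, "b"), (20, 20, "c"), (10, 60, "d"), (70, 20, "e")]
tiled = tile_sort(acquired, tile_size=50.0)
print([p[2] for p in tiled])  # prints: ['a', 'c', 'b', 'e', 'd']
```

Points "a" and "c" share a tile and stay in their original relative order, as do "b" and "e".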

Note that the "native LAS 1.4 extension" of LASzip takes the scanner channel into account when compressing the new point types 6, 7, 8, 9, or 10: it automatically switches context based on the scanner channel, so you do not need to sort by scanner channel for compression efficiency (though you may still want to, in order to keep the number of context switches small).

There is an ASPRS paper linked on http://laszip.org that describes the dependencies of LASzip for point types 0 through 5.

Regards from Montpellier,

Martin

Martin Isenburg

Jun 28, 2017, 5:54:00 AM
to LAStools - efficient command line tools for LIDAR processing
Hello,

Indeed. That said, I would like to ask folks again to help verify that the new "native LAS 1.4 extension" for LASzip is bug free so we can release it for good. The compressor has been fully integrated into LAStools and the LASzip DLL but is still considered to be in beta. To activate it you will have to add the keyword '-native' to the command line. Here is an example:

E:\LAStools\bin>las2las -version
LAStools (by mar...@rapidlasso.com) version 170625

E:\LAStools\bin>las2las -i ..\data\fusa.laz ^
                                       -set_version 1.4 ^
                                       -set_point_type 6 ^
                                       -o fusa14p6.las

E:\LAStools\bin>las2las -i fusa14p6.las -o fusa14p6.laz
ERROR: point type 6 requires using "native LAS 1.4 extension" of LASzip
ERROR: cannot open laswriterlas with file name 'fusa14p6.laz'
ERROR: could not open laswriter

E:\LAStools\bin>las2las -i fusa14p6.las -o fusa14p6.laz -native

E:\LAStools\bin>lasinfo -i fusa14p6.laz
lasinfo (170625) report for fusa14p6.laz
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.4
  system identifier:          'LAStools (c) by rapidlasso GmbH'
  generating software:        'las2las (version 170625)'
  file creation day/year:     40/2010
  header size:                375
  offset to point data:       469
  number var. length records: 1
  point data format:          6
  point data record length:   30
  number of point records:    0
  number of points by return: 0 0 0 0 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  277750.00 6122250.00 42.21
  max x y z:                  277999.99 6122499.99 64.35
  start of waveform data packet record: 0
  start of first extended variable length record: 0
  number of extended_variable length records: 0
  extended number of point records: 277573
  extended number of points by return: 263413 13879 281 0 0 0 0 0 0 0 0 0 0 0 0
variable length header record 1 of 1:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34735
  length after header  40
  description          'by LAStools of Martin Isenburg'
    GeoKeyDirectoryTag version 1.1.0 number of keys 4
      key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
      key 3072 tiff_tag_location 0 count 1 value_offset 32754 - ProjectedCSTypeGeoKey: WGS 84 / UTM 54S
      key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
      key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
LASzip compression (version 3.0r1 c3 50000): POINT14 3
reporting minimum and maximum for all LAS point record entries ...
  X            27775000   27799999
  Y           612225000  612249999
  Z                4221       6435
  intensity          10      62293
  return_number       1          3
  number_of_returns   1          3
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      1          6
  scan_angle_rank    79        103
  user_data           0        197
  point_source_ID     1          1
  gps_time 5880.963028 5886.739738
  extended_return_number          1      3
  extended_number_of_returns      1      3
  extended_classification         1      6
  extended_scan_angle         13167  17167
  extended_scanner_channel        0      0
number of first returns:        263413
number of intermediate returns: 283
number of last returns:         263370
number of single returns:       249493
overview over extended number of returns of given pulse: 249493 27232 848 0 0 0 0 0 0 0 0 0 0 0 0
histogram of classification of points:
           17553  unclassified (1)
          180868  ground (2)
           37030  high vegetation (5)
           42122  building (6)