without scanner channel lasoptimize will *unoptimize* multi-beam LAS/LAZ files

Martin Isenburg

Aug 29, 2017, 12:52:41 PM
to LAStools - efficient command line tools for LIDAR processing, The LAS room - a friendly place to discuss specifications of the LAS format
Hello,

the guys from UNAVCO / OpenTopography sent me an interesting scenario in which lasoptimize fails to optimize the file.


They ran the lasoptimize tool on a number of LAZ files just to see their file sizes grow substantially:

lasoptimize -i C440000_5003000.laz ^
                   -odix _opt

lasoptimize -i C441000_5005000.laz ^
                   -odix _opt

25,853,381 C440000_5003000.laz
35,182,061 C440000_5003000_opt.laz
32,038,249 C441000_5005000.laz
50,864,395 C441000_5005000_opt.laz

That seemed bizarre at first, but my initial suspicion that this was data from a multi-beam system was quickly confirmed. I looked at the files with lasview and ran a few lasinfo reports (see the end of this email) with histograms on user_data and point_source_ID, and it quickly became clear that this must be Optech Titan data. I think that is currently the (only?) integrated system available that produces airborne LiDAR with three different beams.

As the files are LAS 1.2 with point type 1, there is no "scanner channel" field. But the scanner channel is often stored in the user data field or coded into the point source ID, as already mentioned in this earlier discussion:


Clearly in this data set both were done (see lasinfo reports). So we can use either of these two fields to "help" lasoptimize properly arrange the points by scanner channel as follows:

lasoptimize -i C440000_5003000.laz ^
                   -scanner_channel_in_user_data ^
                   -odix _user_data

lasoptimize -i C440000_5003000.laz ^
                   -scanner_channel_in_point_source_ID ^
                   -odix _point_source

lasoptimize -i C441000_5005000.laz ^
                   -scanner_channel_in_user_data ^
                   -odix _user_data

lasoptimize -i C441000_5005000.laz ^
                   -scanner_channel_in_point_source_ID ^
                   -odix _point_source

Because the scanner channel is coded into the point source ID as well as stored in the user data field (see lasinfo reports below), either of the commands above gives better compression than the plain lasoptimize run (but still not as good as the original files).

25,853,381 C440000_5003000.laz
35,182,061 C440000_5003000_opt.laz
27,061,144 C440000_5003000_point_source.laz
27,098,862 C440000_5003000_user_data.laz

32,038,249 C441000_5005000.laz
50,864,395 C441000_5005000_opt.laz
34,941,362 C441000_5005000_point_source.laz
34,863,047 C441000_5005000_user_data.laz

It's not as good as before because in this case the points already have *optimal* compression order. What lasoptimize adds on top is a better order for indexing via a spatial sort ... but I could maybe fine-tune the granularity to make the compression a little higher and the spatial coherence a little coarser.
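
To put numbers on the listings above, here is a small sketch that computes how much each output grew relative to its original file. The file sizes are copied from this post; the dictionary layout and the helper name growth_percent are only illustrative and not part of LAStools.

```python
# File sizes in bytes, copied from the listings in this post.
sizes = {
    "C440000_5003000": {
        "original": 25_853_381,
        "opt": 35_182_061,           # plain lasoptimize, no channel flag
        "point_source": 27_061_144,  # -scanner_channel_in_point_source_ID
        "user_data": 27_098_862,     # -scanner_channel_in_user_data
    },
    "C441000_5005000": {
        "original": 32_038_249,
        "opt": 50_864_395,
        "point_source": 34_941_362,
        "user_data": 34_863_047,
    },
}


def growth_percent(original: int, new: int) -> float:
    """Size change of `new` relative to `original`, in percent."""
    return 100.0 * (new - original) / original


for tile, variants in sizes.items():
    base = variants["original"]
    for name, size in variants.items():
        if name == "original":
            continue
        print(f"{tile} {name:>12}: {growth_percent(base, size):+.1f}%")
```

Without the channel hint the two tiles grow by roughly 36% and 59%; with either flag the growth drops to roughly 5% and 9%.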

Regards,

Martin @rapidlasso

---------------------

lasinfo -i C440000_5003000.laz -histo user_data 1 -histo point_source 1
lasinfo (170828) report for C440000_5003000.laz
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          ''
  generating software:        'TerraScan'
  file creation day/year:     322/2016
  header size:                227
  offset to point data:       229
  number var. length records: 0
  point data format:          1
  point data record length:   28
  number of point records:    8602840
  number of points by return: 6926844 1059960 486580 129456 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  440000.00 5003000.00 987.01
  max x y z:                  440999.99 5003999.99 1425.87
the header is followed by 2 user-defined bytes
LASzip compression (version 3.0r4 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
  X            44000000   44099999
  Y           500300000  500399999
  Z               98701     142587
  intensity           1       1135
  return_number       1          4
  number_of_returns   1          4
  edge_of_flight_line 0          1
  scan_direction_flag 0          1
  classification      1          9
  scan_angle_rank   -49         42
  user_data           1          3
  point_source_ID   111        613
  gps_time 494243.381029 497635.631495
number of first returns:        6926844
number of intermediate returns: 616956
number of last returns:         6924397
number of single returns:       5865357
overview over number of returns of given pulse: 5865357 1146547 1072249 518687 0 0 0
histogram of classification of points:
         2213703  unclassified (1)
         6189309  ground (2)
              11  noise (7)
          199817  water (9)
user data histogram with bin size 1
  bin 1 has 2650469
  bin 2 has 3014674
  bin 3 has 2937697
  average user data 2.03339 for 8602840 element(s)
point source id histogram with bin size 1
  bin 111 has 39566
  bin 112 has 22303
  bin 113 has 61277
  bin 211 has 636347
  bin 212 has 727856
  bin 213 has 665358
  bin 311 has 1008129
  bin 312 has 1092211
  bin 313 has 1209907
  bin 411 has 731705
  bin 412 has 916404
  bin 413 has 756100
  bin 611 has 234722
  bin 612 has 255900
  bin 613 has 245055
  average point source id 339.18 for 8602840 element(s)
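
The two histograms in the report above can be cross-checked with a short sketch. It assumes that the scanner channel is the last decimal digit of the point source ID; that is only a reading of the bin values (111, 112, 113, ...) and not a documented description of how lasoptimize decodes the field. Matching totals are consistent with that assumption but do not prove that every single point agrees.

```python
from collections import Counter

# Histograms copied from the lasinfo report for C440000_5003000.laz.
user_data_hist = {1: 2650469, 2: 3014674, 3: 2937697}
point_source_hist = {
    111: 39566, 112: 22303, 113: 61277,
    211: 636347, 212: 727856, 213: 665358,
    311: 1008129, 312: 1092211, 313: 1209907,
    411: 731705, 412: 916404, 413: 756100,
    611: 234722, 612: 255900, 613: 245055,
}

# ASSUMPTION: channel = last decimal digit of the point source ID.
by_last_digit = Counter()
for psid, count in point_source_hist.items():
    by_last_digit[psid % 10] += count

# Compare the per-channel totals from both fields.
for channel in sorted(user_data_hist):
    same = by_last_digit[channel] == user_data_hist[channel]
    print(channel, by_last_digit[channel], user_data_hist[channel], same)
```

For this tile the per-channel totals from both fields agree, and they add up to the 8602840 point records listed in the header.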

lasoptimize -i C441000_5005000.laz  -scanner_channel_in_user_data -odix _user_data

lasinfo -i C441000_5005000.laz -histo user_data 1 -histo point_source 1
lasinfo (170828) report for C441000_5005000.laz
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          ''
  generating software:        'TerraScan'
  file creation day/year:     322/2016
  header size:                227
  offset to point data:       229
  number var. length records: 0
  point data format:          1
  point data record length:   28
  number of point records:    14327226
  number of points by return: 13907006 284918 107195 28107 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  441000.00 5005000.00 981.77
  max x y z:                  441999.99 5005999.99 1433.28
the header is followed by 2 user-defined bytes
LASzip compression (version 3.0r4 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
  X            44100000   44199999
  Y           500500000  500599999
  Z               98177     143328
  intensity           1       1096
  return_number       1          4
  number_of_returns   1          4
  edge_of_flight_line 0          1
  scan_direction_flag 0          1
  classification      1          9
  scan_angle_rank   -31         32
  user_data           1          3
  point_source_ID   311        713
  gps_time 495534.566481 498747.354700
number of first returns:        13907006
number of intermediate returns: 135686
number of last returns:         13905935
number of single returns:       13621401
overview over number of returns of given pulse: 13621401 355416 237589 112820 0 0 0
histogram of classification of points:
         2043461  unclassified (1)
        12228647  ground (2)
              54  noise (7)
           55064  water (9)
user data histogram with bin size 1
  bin 1 has 4712828
  bin 2 has 4807827
  bin 3 has 4806571
  average user data 2.00654 for 14327226 element(s)
point source id histogram with bin size 1
  bin 311 has 80976
  bin 312 has 83738
  bin 313 has 84726
  bin 411 has 539771
  bin 412 has 552933
  bin 413 has 579636
  bin 511 has 1204463
  bin 512 has 1227738
  bin 513 has 1221525
  bin 611 has 1662247
  bin 612 has 1707777
  bin 613 has 1684591
  bin 711 has 1225371
  bin 712 has 1235641
  bin 713 has 1236093
  average point source id 583.741 for 14327226 element(s)

Martin Isenburg

Aug 30, 2017, 12:23:46 PM
to LAStools - efficient command line tools for LIDAR processing, The LAS room - a friendly place to discuss specifications of the LAS format
Hello,

thanks, Benjamin, for sharing these timing results. Note that indexed LAS will always win over indexed LAZ when running on one core and with a fast and available file system (i.e. when *not* being I/O bound). However, indexed LAZ starts winning when you use more than one core and access the file system with many simultaneous read and/or write requests. As soon as your LAS-based pipeline becomes I/O bound (either because you use many cores or because the file system is slow or otherwise busy), processing starts stalling as the CPUs just wait for data from the file system.

Regards,

Martin @rapidlasso

On Wed, Aug 30, 2017 at 5:13 PM, Benjamin Gross <mbg...@unavco.org> wrote:
Hi Martin,
Thanks again for the explanation and diagnostics. For reference, here are our unscientific test results for our example dataset running las2las with an -inside query:

LAS v1.2 (187,303,668 points), indexed

LAS: 5.26 GB; 45.19 s
LAZ: 482 MB; 112.91 s
LAZ (lasoptimize'd, scanner channel field set incorrectly): 722 MB; 126.26 s
LAZ (lasoptimize'd, scanner channel field set correctly): 516 MB; 111.50 s


So it's definitely important to pay attention to where the channel info is with LAS v1.2 files and set the correct flag on lasoptimize. Doing this incorrectly has consequences for both compression and spatial queries on indexed datasets.
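
As a rough illustration of those consequences, the sketch below expresses the two lasoptimize'd results relative to the plain LAZ run. The sizes and timings are copied from the results above; the labels and the helper name relative_percent are only illustrative, and since the timings are described as unscientific, small differences should not be over-interpreted.

```python
# (size in MB, query time in seconds), copied from the results above.
results = {
    "LAZ": (482, 112.91),
    "LAZ optimized, channel flag wrong": (722, 126.26),
    "LAZ optimized, channel flag right": (516, 111.50),
}


def relative_percent(base: float, value: float) -> float:
    """Change of `value` relative to `base`, in percent."""
    return 100.0 * (value - base) / base


base_size, base_time = results["LAZ"]
for label, (size_mb, seconds) in results.items():
    if label == "LAZ":
        continue
    print(
        f"{label}: size {relative_percent(base_size, size_mb):+.1f}%, "
        f"time {relative_percent(base_time, seconds):+.1f}%"
    )
```

With the wrong flag the file is roughly 50% larger and the query roughly 12% slower than plain LAZ; with the right flag the file is roughly 7% larger and the query time is about the same.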
