The guys from UNAVCO / OpenTopography sent me an interesting case where lasoptimize fails to optimize the files.
They ran the lasoptimize tool on a number of LAZ files, only to see the file sizes grow substantially:
lasoptimize -i C440000_5003000.laz ^
-odix _opt
lasoptimize -i C441000_5005000.laz ^
-odix _opt
25,853,381 C440000_5003000.laz
35,182,061 C440000_5003000_opt.laz
32,038,249 C441000_5005000.laz
50,864,395 C441000_5005000_opt.laz
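To put that growth in numbers, here is a small Python sketch that computes the relative size change from the byte counts listed above. It uses only those numbers and does not read the files.

```python
# File sizes in bytes, copied from the listing above: (original, lasoptimize output).
SIZES = {
    "C440000_5003000": (25_853_381, 35_182_061),
    "C441000_5005000": (32_038_249, 50_864_395),
}


def growth_percent(before: int, after: int) -> float:
    """Relative size change in percent (positive means the file grew)."""
    return 100.0 * (after - before) / before


for tile, (original, optimized) in SIZES.items():
    print(f"{tile}: {growth_percent(original, optimized):+.1f}%")
```

This prints roughly +36.1% for the first tile and +58.8% for the second.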
That seemed bizarre at first, but my initial suspicion was quickly confirmed: this is probably data from a multi-beam system. I looked at the files with lasview and ran a few lasinfo reports with histograms on user_data and point_source_ID (see the end of this email), and it quickly became clear that this must be Optech Titan data. I think that is currently the (only?) integrated system available that produces airborne LiDAR with three different beams.
As the files are LAS 1.2 with point type 1, there is no "scanner channel" field (that field only exists in the LAS 1.4 point types 6 to 10). However, the scanner channel is often stored in the user data field or coded into the point source ID, as already mentioned in this earlier discussion:
Clearly, both were done in this data set (see the lasinfo reports). So we can use either of these two fields to "help" lasoptimize arrange the points properly by scanner channel, as follows:
lasoptimize -i C440000_5003000.laz ^
-scanner_channel_in_user_data ^
-odix _user_data
lasoptimize -i C440000_5003000.laz ^
-scanner_channel_in_point_source_ID ^
-odix _point_source
lasoptimize -i C441000_5005000.laz ^
-scanner_channel_in_user_data ^
-odix _user_data
lasoptimize -i C441000_5005000.laz ^
-scanner_channel_in_point_source_ID ^
-odix _point_source
Because the scanner channel is coded into the point source ID as well as stored in the user data field (see the lasinfo reports below), either of the commands above gives much better compression than the plain lasoptimize run, but still not quite as good as that of the original files.
25,853,381 C440000_5003000.laz
35,182,061 C440000_5003000_opt.laz
27,061,144 C440000_5003000_point_source.laz
27,098,862 C440000_5003000_user_data.laz
32,038,249 C441000_5005000.laz
50,864,395 C441000_5005000_opt.laz
34,941,362 C441000_5005000_point_source.laz
34,863,047 C441000_5005000_user_data.laz
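The remaining overhead of the channel-aware runs relative to the original files can be computed the same way from the byte counts above (a small Python sketch; nothing is read from disk):

```python
# Byte counts copied from the listing above.
ORIGINAL = {"C440000_5003000": 25_853_381, "C441000_5005000": 32_038_249}
CHANNEL_AWARE = {
    "C440000_5003000": {"_point_source": 27_061_144, "_user_data": 27_098_862},
    "C441000_5005000": {"_point_source": 34_941_362, "_user_data": 34_863_047},
}


def overhead_percent(original: int, optimized: int) -> float:
    """Size increase of the optimized file relative to the original, in percent."""
    return 100.0 * (optimized - original) / original


for tile, outputs in CHANNEL_AWARE.items():
    for suffix, size in outputs.items():
        print(f"{tile}{suffix}.laz: {overhead_percent(ORIGINAL[tile], size):+.1f}%")
```

This gives roughly +4.7% and +4.8% for the first tile and +9.1% and +8.8% for the second, compared with about +36% and +59% for the plain lasoptimize runs.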
It's not as good as the original because in this case the points were already in an *optimal* order for compression. What lasoptimize adds on top is an order that is better for indexing, produced by a spatial sort ... but maybe I could fine-tune the granularity of that sort to make the compression a little higher at the price of a slightly coarser spatial coherence.
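The effect behind these numbers can be illustrated with a toy model. LASzip encodes many point attributes relative to the preceding point, so it does best when consecutive points are alike. The Python sketch below uses hypothetical data (three beams covering the same ground, each with its own scan-angle offset); it is not lasoptimize's or LASzip's actual code. It only counts how often consecutive points switch beams, and how much the scan angle jumps, when the points are kept beam by beam versus sorted purely spatially.

```python
import random


def make_toy_points(points_per_channel=1000, seed=0):
    """Hypothetical three-beam data: each beam has its own scan-angle offset."""
    rng = random.Random(seed)
    points = []
    for channel, angle_offset in ((1, -20.0), (2, 0.0), (3, 20.0)):
        for i in range(points_per_channel):
            points.append({
                "channel": channel,
                "x": i * 0.5 + rng.uniform(-0.1, 0.1),  # beams cover the same ground
                "scan_angle": angle_offset + (i % 50) * 0.1,
            })
    return points  # stored beam by beam


def channel_switches(points):
    """Number of consecutive pairs that come from different beams."""
    return sum(a["channel"] != b["channel"] for a, b in zip(points, points[1:]))


def mean_abs_angle_delta(points):
    """Average absolute scan-angle difference between consecutive points."""
    deltas = [abs(a["scan_angle"] - b["scan_angle"]) for a, b in zip(points, points[1:])]
    return sum(deltas) / len(deltas)


grouped = make_toy_points()
spatial = sorted(grouped, key=lambda p: p["x"])  # purely spatial order interleaves beams

for label, order in (("grouped by beam", grouped), ("spatially sorted", spatial)):
    print(f"{label}: {channel_switches(order)} beam switches, "
          f"mean |scan angle delta| = {mean_abs_angle_delta(order):.2f}")
```

In the grouped order there are only two beam switches and the scan angle changes in small steps; in the spatial order almost every consecutive pair switches beams, which is the kind of jump a previous-point predictor handles poorly.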
Regards,
Martin @rapidasso
lasinfo -i C440000_5003000.laz -histo user_data 1 -histo point_source 1
lasinfo (170828) report for C440000_5003000.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 0
global_encoding: 0
project ID GUID data 1-4: 00000000-0000-0000-0000-000000000000
version major.minor: 1.2
system identifier: ''
generating software: 'TerraScan'
file creation day/year: 322/2016
header size: 227
offset to point data: 229
number var. length records: 0
point data format: 1
point data record length: 28
number of point records: 8602840
number of points by return: 6926844 1059960 486580 129456 0
scale factor x y z: 0.01 0.01 0.01
offset x y z: 0 0 0
min x y z: 440000.00 5003000.00 987.01
max x y z: 440999.99 5003999.99 1425.87
the header is followed by 2 user-defined bytes
LASzip compression (version 3.0r4 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
X 44000000 44099999
Y 500300000 500399999
Z 98701 142587
intensity 1 1135
return_number 1 4
number_of_returns 1 4
edge_of_flight_line 0 1
scan_direction_flag 0 1
classification 1 9
scan_angle_rank -49 42
user_data 1 3
point_source_ID 111 613
gps_time 494243.381029 497635.631495
number of first returns: 6926844
number of intermediate returns: 616956
number of last returns: 6924397
number of single returns: 5865357
overview over number of returns of given pulse: 5865357 1146547 1072249 518687 0 0 0
histogram of classification of points:
2213703 unclassified (1)
6189309 ground (2)
11 noise (7)
199817 water (9)
user data histogram with bin size 1
bin 1 has 2650469
bin 2 has 3014674
bin 3 has 2937697
average user data 2.03339 for 8602840 element(s)
point source id histogram with bin size 1
bin 111 has 39566
bin 112 has 22303
bin 113 has 61277
bin 211 has 636347
bin 212 has 727856
bin 213 has 665358
bin 311 has 1008129
bin 312 has 1092211
bin 313 has 1209907
bin 411 has 731705
bin 412 has 916404
bin 413 has 756100
bin 611 has 234722
bin 612 has 255900
bin 613 has 245055
average point source id 339.18 for 8602840 element(s)
lasoptimize -i C441000_5005000.laz -scanner_channel_in_user_data -odix _user_data
lasinfo -i C441000_5005000.laz -histo user_data 1 -histo point_source 1
lasinfo (170828) report for C441000_5005000.laz
reporting all LAS header entries:
file signature: 'LASF'
file source ID: 0
global_encoding: 0
project ID GUID data 1-4: 00000000-0000-0000-0000-000000000000
version major.minor: 1.2
system identifier: ''
generating software: 'TerraScan'
file creation day/year: 322/2016
header size: 227
offset to point data: 229
number var. length records: 0
point data format: 1
point data record length: 28
number of point records: 14327226
number of points by return: 13907006 284918 107195 28107 0
scale factor x y z: 0.01 0.01 0.01
offset x y z: 0 0 0
min x y z: 441000.00 5005000.00 981.77
max x y z: 441999.99 5005999.99 1433.28
the header is followed by 2 user-defined bytes
LASzip compression (version 3.0r4 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
X 44100000 44199999
Y 500500000 500599999
Z 98177 143328
intensity 1 1096
return_number 1 4
number_of_returns 1 4
edge_of_flight_line 0 1
scan_direction_flag 0 1
classification 1 9
scan_angle_rank -31 32
user_data 1 3
point_source_ID 311 713
gps_time 495534.566481 498747.354700
number of first returns: 13907006
number of intermediate returns: 135686
number of last returns: 13905935
number of single returns: 13621401
overview over number of returns of given pulse: 13621401 355416 237589 112820 0 0 0
histogram of classification of points:
2043461 unclassified (1)
12228647 ground (2)
54 noise (7)
55064 water (9)
user data histogram with bin size 1
bin 1 has 4712828
bin 2 has 4807827
bin 3 has 4806571
average user data 2.00654 for 14327226 element(s)
point source id histogram with bin size 1
bin 311 has 80976
bin 312 has 83738
bin 313 has 84726
bin 411 has 539771
bin 412 has 552933
bin 413 has 579636
bin 511 has 1204463
bin 512 has 1227738
bin 513 has 1221525
bin 611 has 1662247
bin 612 has 1707777
bin 613 has 1684591
bin 711 has 1225371
bin 712 has 1235641
bin 713 has 1236093
average point source id 583.741 for 14327226 element(s)
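A quick way to check that the point source ID and the user data field carry the same scanner channel is to fold the point source ID histograms of both reports above into per-channel totals and compare them with the user data histograms. This Python sketch assumes the channel is the last decimal digit of the point source ID (111/112/113, 211/212/213, ...), which is what the histograms suggest; it is not necessarily how lasoptimize decodes -scanner_channel_in_point_source_ID.

```python
from collections import Counter

# Histograms copied from the two lasinfo reports above.
REPORTS = {
    "C440000_5003000.laz": {
        "point_source_ID": {
            111: 39566, 112: 22303, 113: 61277,
            211: 636347, 212: 727856, 213: 665358,
            311: 1008129, 312: 1092211, 313: 1209907,
            411: 731705, 412: 916404, 413: 756100,
            611: 234722, 612: 255900, 613: 245055,
        },
        "user_data": {1: 2650469, 2: 3014674, 3: 2937697},
    },
    "C441000_5005000.laz": {
        "point_source_ID": {
            311: 80976, 312: 83738, 313: 84726,
            411: 539771, 412: 552933, 413: 579636,
            511: 1204463, 512: 1227738, 513: 1221525,
            611: 1662247, 612: 1707777, 613: 1684591,
            711: 1225371, 712: 1235641, 713: 1236093,
        },
        "user_data": {1: 4712828, 2: 4807827, 3: 4806571},
    },
}


def channel_totals(point_source_histogram):
    """Sum point counts per channel, assuming channel = last digit of the ID."""
    totals = Counter()
    for point_source_id, count in point_source_histogram.items():
        totals[point_source_id % 10] += count
    return dict(totals)


for name, report in REPORTS.items():
    derived = channel_totals(report["point_source_ID"])
    print(name, derived, "matches user_data:", derived == report["user_data"])
```

For both tiles the per-channel totals match the user data histograms exactly, which is consistent with the same scanner channel being stored in both fields.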