laszip use of both -stdin and -stdout


Stefano Polloni

Aug 29, 2018, 2:50:09 PM
to LAStools - efficient tools for LiDAR processing
Hi there!

I am trying to incorporate laszip in a Python workflow without having to create intermediate files. My goal is to compress a .las file by piping bytes in and out of laszip in the following fashion:

laszip -stdin -stdout -olaz

In Python, this procedure looks like:

import subprocess

# las_file holds the raw bytes of the input .las file
laszip_cmd = "laszip -stdin -stdout -olaz"
laz_file = subprocess.check_output(
    laszip_cmd,
    shell=True,
    input=las_file
)

I noticed that doing this produces a corrupt .laz file. lasinfo returns "Failed to open LASzip stream: init() of LASreadPoint failed (LASzip v2.2r0)" when trying to read the file. Compare this with the output of the more traditional (and successful) command:
laszip -stdin -o compressed.laz

I notice that the file sizes are extremely similar, yet a couple of bytes are either missing or added. Are there any obvious reasons why using both -stdin and -stdout would corrupt files?

Many thanks!
Stefano

 


Martin Isenburg

Aug 30, 2018, 4:52:05 AM
to LAStools - efficient command line tools for LIDAR processing
Hello Stefano,

I cannot replicate your issue. Maybe you can provide a small example? Below is my example and it works.

E:\LAStools\bin>las2las -i ..\data\lake.laz -olas -stdout | laszip -stdin -o lake.laz
WARNING: stream not seekable. cannot update header.

E:\LAStools\bin>lasview -i lake.laz

E:\LAStools\bin>lasdiff -i  ..\data\lake.laz -i lake.laz
checking '..\data\lake.laz' against 'lake.laz'
  different system_identifier: 'LAStools (c) by Martin Isenburg' 'LAStools (c) by rapidlasso GmbH'
  different generating_software: 'las2las (version 120505)' 'las2las (version 180814)'
headers have 2 differences.
raw points are identical.
both have 102622 points. took 0.093 secs.

E:\LAStools\bin>las2las -i ..\data\lake.laz -olas -stdout | laszip -stdin -olaz -stdout > lake1.laz
WARNING: stream not seekable. cannot update header.

E:\LAStools\bin>lasview -i lake1.laz

E:\LAStools\bin>lasdiff -i  ..\data\lake.laz -i lake1.laz
checking '..\data\lake.laz' against 'lake1.laz'
  different system_identifier: 'LAStools (c) by Martin Isenburg' 'LAStools (c) by rapidlasso GmbH'
  different generating_software: 'las2las (version 120505)' 'las2las (version 180814)'
headers have 2 differences.
raw points are identical.
both have 102622 points. took 0.087 secs.

E:\LAStools\bin>lasdiff -i lake.laz -i lake1.laz
checking 'lake.laz' against 'lake1.laz'
headers are identical.
raw points are identical.
files are identical. both have 102622 points. took 0.093 secs.

Regards,

Martin @rapidlasso


Stefano Polloni

Aug 30, 2018, 1:25:28 PM
to LAStools - efficient tools for LiDAR processing
Martin, 

Many thanks for your response. I was able to replicate your example perfectly, so I suspect the issue is somehow related to Python's handling of the las and laz bytes. I also want to make an important clarification to my previous post: I was using libLAS's lasinfo rather than the executable from LAStools. The LAStools lasinfo will open the problematic file, but with a corruption warning, as shown in this example:

I first produce a .las file from the same lake.laz file used in your example:
las2las -i /LAStools/data/lake.laz -olas -o lake.las

I then compress the resulting las with laszip, in a conventional manner:
laszip -i lake.las -o lake.laz

I produce an alternative .laz with the piping method I am trying to implement. In Python:
import subprocess

# load las bytes in memory
with open("lake.las", "rb") as f:
    las_file = f.read()

# pipe bytes through laszip: las-->laz
laszip_cmd = "laszip -stdin -stdout -olaz"
laz_file = subprocess.check_output(
    laszip_cmd,
    shell=True,
    input=las_file
)

# save laz bytes to disk
with open("lake_piped.laz", "wb") as f:
    f.write(laz_file)



Now the lasdiff comparisons:

lasdiff -i /LAStools/data/lake.laz -i lake.laz

Output:
checking '/LAStools/data/lake.laz' against 'lake.laz'
  different system_identifier: 'LAStools (c) by Martin Isenburg' 'LAStools (c) by rapidlasso GmbH'
  different generating_software: 'las2las (version 120505)' 'las2las (version 180812)'
headers have 2 differences.
raw points are identical.
both have 102622 points. took 0.124072 secs.

lasdiff -i /LAStools/data/lake.laz -i lake_piped.laz

Output:
checking '/LAStools/data/lake.laz' against 'lake_piped.laz'
  different system_identifier: 'LAStools (c) by Martin Isenburg' 'LAStools (c) by rapidlasso GmbH'
  different generating_software: 'las2las (version 120505)' 'las2las (version 180812)'
headers have 2 differences.
WARNING: 'corrupt chunk table'
raw points are identical.
both have 102622 points. took 0.133385 secs.

lasdiff -i lake.laz -i lake_piped.laz

Output:
checking 'lake.laz' against 'lake_piped.laz'
headers are identical.
WARNING: 'corrupt chunk table'
raw points are identical.
files are identical. both have 102622 points. took 0.122818 secs.

Stefano Polloni

Sep 1, 2018, 9:10:47 PM
to LAStools - efficient tools for LiDAR processing
I am specifically curious about the meaning of a 'corrupt chunk table' and why this would occur when streaming the file as described.

Martin Isenburg

Sep 2, 2018, 4:33:52 PM
to LAStools - efficient command line tools for LIDAR processing
Hi,

Are you using the pre-compiled Windows executable laszip.exe or did you compile laszip on another operating system? Maybe it's an endian issue. Can you send me the file that produces the "corrupt chunk table" warning?

Regards from Costa Rica,

Martin @rapidlasso

Stefano Polloni

Sep 5, 2018, 4:40:33 PM
to LAStools - efficient tools for LiDAR processing
Martin, 

Thanks again for your reply. I am using a compiled version of laszip on macOS. I don't know much about endianness, but please find attached the file producing the warning.
Thank you for looking into this, much appreciated.

Stefano
lake_piped.laz

Martin Isenburg

Sep 9, 2018, 10:12:30 AM
to LAStools - efficient command line tools for LIDAR processing
Hello,

it seems that your piped writer behaves differently. When the output is a pipe then the start position of the "chunk table" in the file cannot be updated at the end of writing, and so the process writes the position of the chunk table as the last 8 bytes. Usually it writes the position here:


but it writes a value of -1 when it is a pipe (i.e. not seekable). Then, at the very end of writing the points, the process gets the file / pipe position in this line, which is where the "chunk table" will start:


But for a non-seekable pipe it does not attempt to write it in its usual place. Instead it writes this position at the very end: 


but - and I checked in the debugger - in your case it writes another value of -1 instead of the correct position. So I think the call to

I64 position = outstream->tell();

in this line


fails on macOS to produce a correct number. Could you insert a fprintf in your code, recompile, and see what is going on? The LASwriter is opened with the FILE* set to stdout and creates a ByteStreamOutFile with the correct endianness in this code segment:

ByteStreamOut* out;
if (IS_LITTLE_ENDIAN())
  out = new ByteStreamOutFileLE(file);
else
  out = new ByteStreamOutFileBE(file);

that you can find here:

 
It seems your ByteStreamOutFileBE() does not return the correct position when "outstream->tell()" is called if opened with a stdout instead of a regular file-on-disk-based FILE*. Could you insert a fprintf as shown below at that place in your code, recompile, and confirm that the value is not correct?

I64 position = outstream->tell();
fprintf(stderr, "position of stream after compressing all points: %u\n",  (U32) position);
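
The two 8-byte fields described above can also be inspected without recompiling. The following is only a sketch, assuming a LAS 1.x header (where the 4-byte "offset to point data" sits at byte 96) and little-endian fields; the function name chunk_table_pointers is made up for illustration and is not part of LAStools.

```python
import struct


def chunk_table_pointers(data: bytes):
    """Return (value at start of point data, value in the last 8 bytes)."""
    # LAS 1.x header: "offset to point data" is a little-endian U32 at byte 96.
    (offset_to_points,) = struct.unpack_from("<I", data, 96)
    # LASzip stores the chunk table start position as an I64 at that offset;
    # per the description above it is -1 when the output was not seekable.
    (at_start,) = struct.unpack_from("<q", data, offset_to_points)
    # In the non-seekable case the position is appended as the last 8 bytes.
    (at_end,) = struct.unpack_from("<q", data, len(data) - 8)
    return at_start, at_end
```

Calling this on the bytes of a file such as lake_piped.laz and getting -1 for both values would be consistent with what the debugger showed.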

Regards,

Martin @rapidlasso
 

Stefano Polloni

Sep 10, 2018, 11:13:06 AM
to LAStools - efficient tools for LiDAR processing
Martin,

Thanks so much for looking into this, much appreciated. I inserted an fprintf statement as requested, and get the following output after running the piped writer:

position of stream after compressing all points: 4294967295 
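
For reference, 4294967295 is 2^32 - 1, which is what a value of -1 becomes once the (U32) cast in the fprintf truncates it to an unsigned 32-bit integer. A small Python check of that arithmetic (a sketch only, using ctypes to mimic the C cast):

```python
import ctypes

position = -1  # the value tell() would return on failure
as_u32 = ctypes.c_uint32(position).value  # mimics the (U32) cast in the fprintf
print(as_u32)
```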

Martin Isenburg

Sep 10, 2018, 11:24:48 AM
to LAStools - efficient command line tools for LIDAR processing
Hello,

That confirms my suspicion. Instead of the real position your code returns -1, which is what gives the value 4294967295 after being cast to an unsigned int. The true value would be a little smaller than the number of bytes in the written file. Apparently your code behaves differently when file is set to stdout in this code segment:

inline I64 ByteStreamOutFile::tell() const
{
#if defined _WIN32 && ! defined (__MINGW32__)
  return _ftelli64(file);
#elif defined (__MINGW32__)
  return (I64)ftello64(file);
#else
  return (I64)ftello(file);
#endif
}


Maybe there is another xxx_ftell__xxx() command that should be used on your macOS compiler / platform that also returns the correct value when the file argument equals stdout? This is something you will need to figure out. Alternatively, you can make your ByteStreamOut class maintain its own counter, but that would replicate code and be less clean.
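
The counter idea can be illustrated independently of the LASzip sources. The sketch below is in Python rather than C++ and is not the actual ByteStreamOut class; CountingWriter is a hypothetical name. It only shows the principle: count bytes on every write so that tell() never has to ask the operating system for a position.

```python
import io


class CountingWriter:
    """Wrap a binary stream and track how many bytes have been written."""

    def __init__(self, raw):
        self._raw = raw
        self._count = 0

    def write(self, data: bytes) -> int:
        # Assumes the underlying stream accepts all bytes in one call,
        # as buffered binary streams do.
        self._raw.write(data)
        self._count += len(data)
        return len(data)

    def tell(self) -> int:
        # Works for pipes too, because no seek or tell call is made
        # on the underlying stream.
        return self._count


buf = io.BytesIO()
writer = CountingWriter(buf)
writer.write(b"LASF")
writer.write(b"\x00" * 8)
print(writer.tell())
```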

Regards,

Martin


Stefano Polloni

Sep 10, 2018, 1:42:00 PM
to LAStools - efficient tools for LiDAR processing
Martin, 

Thanks for investigating this. I am now curious as to why this happens only when -stdout is used AND I am also piping the output with Python.

Specifically, why would
return (I64)ftello(file);

return the appropriate position when I am doing this instead?
las2las -i ..\data\lake.laz -olas -stdout | laszip -stdin -olaz -stdout > lake1.laz

Is the FILE* in the above case the same as when I am using the Python piped writer? Does this answer offer the right explanation?
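
One way to probe this question is to compare what the operating system reports for a regular file and for a pipe. With "> lake1.laz" the shell hands laszip a regular file as stdout, whereas subprocess.check_output hands it a pipe. The sketch below assumes a POSIX system (Linux or macOS) and uses os.lseek as a stand-in for ftello; current_offset and demo are illustrative names only.

```python
import errno
import os
import tempfile


def current_offset(fd: int) -> int:
    """Return the descriptor's current offset, or -1 if it is not seekable."""
    try:
        return os.lseek(fd, 0, os.SEEK_CUR)
    except OSError as exc:
        if exc.errno == errno.ESPIPE:  # "Illegal seek": pipes have no position
            return -1
        raise


def demo():
    payload = b"x" * 1000

    # Case 1: a regular file, as with shell redirection to a file
    with tempfile.TemporaryFile() as f:
        os.write(f.fileno(), payload)
        file_pos = current_offset(f.fileno())

    # Case 2: a pipe, as created by subprocess for stdout=PIPE
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, payload)
        pipe_pos = current_offset(write_end)
    finally:
        os.close(read_end)
        os.close(write_end)

    return file_pos, pipe_pos
```

On Linux or macOS, demo() is expected to return (1000, -1): the regular file reports the number of bytes written, while the pipe reports no position at all.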

thanks again

Martin Isenburg

Sep 10, 2018, 2:01:35 PM
to LAStools - efficient command line tools for LIDAR processing
Hello,

I see. It seems that the Windows implementations for the MSVC6.0 32-bit compiler and the MSVC2017 64-bit compiler return the number of bytes written to stdout when ftell() is called, but the macOS compiler version you use does not. The only solution I can see here is for you to put a counter into the ByteStreamOut class that explicitly keeps track of the number of bytes written and then use this value at the point where you placed the fprintf statement. The ftell on the pipe also works fine on my new 64-bit compile (see below) and on my Linux compile (see further below). Strange ...

C:\software\LAStools\bin>las2las64 -i ..\data\lake.laz -olas -stdout | laszip64 -stdin -olaz -stdout > lake_piped64.laz
WARNING: stream not seekable. cannot update header.

C:\software\LAStools\bin>lasinfo lake_piped64.laz
lasinfo (180907) report for 'lake_piped64.laz'
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          'LAStools (c) by rapidlasso GmbH'
  generating software:        'las2las (version 180907)'
  file creation day/year:     55/2012
  header size:                227
  offset to point data:       229
  number var. length records: 0
  point data format:          1
  point data record length:   28
  number of point records:    102622
  number of points by return: 93604 9018 0 0 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  476941.35 4366469.50 2725.29
  max x y z:                  477208.56 4366726.49 2768.74
the header is followed by 2 user-defined bytes
LASzip compression (version 3.2r4 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
  X            47694135   47720856
  Y           436646950  436672649
  Z              272529     276874
  intensity           9        839
  return_number       1          2
  number_of_returns   1          3
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      1          9
  scan_angle_rank     0          0
  user_data           0        255
  point_source_ID    40         45
  gps_time 70291.064400 71058.522000
number of first returns:        93604
number of intermediate returns: 638
number of last returns:         93513
number of single returns:       85133
overview over number of returns of given pulse: 85133 16851 638 0 0 0 0
histogram of classification of points:
           37375  unclassified (1)
           27929  ground (2)
            2690  low vegetation (3)
            3772  medium vegetation (4)
           26934  high vegetation (5)
            3922  water (9)

================================================================

[bluetang: LAStools/bin] {61} % uname -a
Linux bluetang.cs.unc.edu 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[bluetang: LAStools/bin] {62} % las2las -i ../data/lake.laz -olas -stdout | laszip -stdin -olaz -stdout > lake_pipeLinux.laz
WARNING: stream not seekable. cannot update header.
[bluetang: LAStools/bin] {63} % lasinfo -i lake_pipeLinux.laz
lasinfo (180907) report for 'lake_pipeLinux.laz'
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          'LAStools (c) by rapidlasso GmbH'
  generating software:        'las2las (version 180907)'
  file creation day/year:     55/2012
  header size:                227
  offset to point data:       229
  number var. length records: 0
  point data format:          1
  point data record length:   28
  number of point records:    102622
  number of points by return: 93604 9018 0 0 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               -0 -0 -0
  min x y z:                  476941.35 4366469.50 2725.29
  max x y z:                  477208.56 4366726.49 2768.74
the header is followed by 2 user-defined bytes
LASzip compression (version 3.2r4 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
  X            47694135   47720856
  Y           436646950  436672649
  Z              272529     276874
  intensity           9        839
  return_number       1          2
  number_of_returns   1          3
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      1          9
  scan_angle_rank     0          0
  user_data           0        255
  point_source_ID    40         45
  gps_time 70291.064400 71058.522000
number of first returns:        93604
number of intermediate returns: 638
number of last returns:         93513
number of single returns:       85133
overview over number of returns of given pulse: 85133 16851 638 0 0 0 0
histogram of classification of points:
           37375  unclassified (1)
           27929  ground (2)
            2690  low vegetation (3)
            3772  medium vegetation (4)
           26934  high vegetation (5)
            3922  water (9)
 


Stefano Polloni

Sep 10, 2018, 7:32:20 PM
to LAStools - efficient tools for LiDAR processing
Oddly, I am experiencing the same 'corrupt chunk table' problem when running the Python script inside a Docker container built on Linux with:

root@7544fecd7ff9:/# uname -a
Linux 7544fecd7ff9 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 GNU/Linux

I am inclined to say that it may not be related to the compiler but rather to the nature of the FILE* object when using Python's pipe specifically. Is that possible?
I am attaching the simple Python (>=3.4) script if you wish to test on Windows. Otherwise I think I will put the matter to rest and write the output to temporary files instead. Unfortunately, I am not comfortable enough with C++ to quickly write a counter for ByteStreamOut.
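
A middle ground between piping and managing intermediate files by hand might be to give laszip a temporary file as its stdout, so that the stream is seekable, and read the bytes back afterwards. This is only a sketch under the assumption that the non-seekable pipe is what triggers the warning; run_with_file_stdout is a hypothetical helper, and it has not been tested against laszip itself.

```python
import subprocess
import tempfile


def run_with_file_stdout(cmd, data: bytes) -> bytes:
    """Run cmd, feeding data on stdin and collecting stdout via a temp file."""
    with tempfile.TemporaryFile() as out:
        # stdout is a regular (seekable) file here, not a pipe
        subprocess.run(cmd, input=data, stdout=out, check=True)
        # The child advanced the shared file offset, so rewind before reading
        out.seek(0)
        return out.read()


# Intended use, with the flags taken from this thread:
# laz_bytes = run_with_file_stdout(["laszip", "-stdin", "-stdout", "-olaz"], las_bytes)
```

The temporary file is removed automatically when the with block exits, so no cleanup is needed in the calling code.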

Many thanks for all your time looking into this!

Best,
Stefano
piped_laszip.py