las and laz files have different header information when accessed using Python

Phil Wilkes

Oct 3, 2013, 6:42:17 AM
to last...@googlegroups.com
Hello

I have been using Python to read the header information of a las file. After compressing the file with lastools, I noticed that the headers of the two files differ, although the lasinfo command reports no difference!

I have been using the las2las tool as below to compress my las file:

las2las -i infile.las -o outfile.laz

Not all fields are different, just some important ones, such as pointformat (1 in las, 129 in laz) and the number of vlr records (1 in las, 2 in laz).

Any help greatly appreciated!

Many thanks, Phil

The python script I have been using is as follows (have also attached):

import struct

headerstruct = ( ('filesig', 4,'c',4) ,
                   ('filesourceid' , 2,'H',1) ,
                   ('reserved'     , 2,'H',1) ,
                   ('guid1'        , 4,'L',1) ,
                   ('guid2'        , 2,'H',1) ,
                   ('guid3'        , 2,'H',1) ,
                   ('guid4'        , 8,'B',8) ,
                   ('vermajor'     , 1,'B',1) ,
                   ('verminor'     , 1,'B',1) ,
                   ('sysid'        , 32,'c',32) ,
                   ('gensoftware'  , 32,'c',32) ,
                   ('fileday'      , 2,'H',1) ,
                   ('fileyear'     , 2,'H',1) ,
                   ('headersize'   , 2,'H',1) ,
                   ('offset'       , 4,'L',1) ,
                   ('numvlrecords' , 4,'L',1) ,
                   ('pointformat'  , 1,'B',1) ,
                   ('pointreclen'  , 2,'H',1) ,
                   ('numptrecords' , 4,'L',1) ,
                   ('numptbyreturn', 20,'L',5) ,
                   ('xscale'       , 8,'d',1) ,
                   ('yscale'       , 8,'d',1) ,
                   ('zscale'       , 8,'d',1) ,
                   ('xoffset'      , 8,'d',1) ,
                   ('yoffset'      , 8,'d',1) ,
                   ('zoffset'      , 8,'d',1) ,
                   ('xmax'         , 8,'d',1) ,
                   ('xmin'         , 8,'d',1) ,
                   ('ymax'         , 8,'d',1) ,
                   ('ymin'         , 8,'d',1) ,
                   ('zmax'         , 8,'d',1) ,
                   ('zmin'         , 8,'d',1) )
                   
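def parseHeader(filename, verbose=True):
    """Read the LAS public header block and return its fields as a dict."""
    header = {'infile': filename}

    with open(filename, 'rb') as fh:
        for name, nbytes, fmt, count in headerstruct:
            raw = fh.read(nbytes)
            if fmt == 'c':
                # character fields are kept as the raw bytes read from the file
                value = raw
            elif count > 1:
                # multi-element fields come back as a tuple
                # LAS is little-endian, so use '<' rather than native order '='
                value = struct.unpack('<' + str(count) + fmt, raw)
            else:
                value = struct.unpack('<' + fmt, raw)[0]
            if verbose:
                print('%s\t%s\t%s' % (name, fmt, value))

            header[name] = value

    return header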

Attachment: parseHeader.py

Martin Isenburg

Oct 3, 2013, 6:55:24 AM
to last...@googlegroups.com
Hello Phil,

Your observations are correct. If you look into the open source code, you will see exactly those items being added to the LASheader when writing LAZ and removed from the LASheader when reading LAZ, whether through the DLL, the LASlib API, the LASzip API, or the integration with PDAL/libLAS.

The point data type, an eight-bit number, gets its highest bit turned on, hence plus 128 (so format 1 becomes 129). This prevents any other LAS reader from accidentally trying to parse the compressed points (e.g. if anyone were to rename a *.laz file to *.las manually). The extra VLR, which has the user ID "laszip encoded", stores the compressor configuration. Hence you can recognize a LAZ file by its point data type of 128 or higher and by a VLR with the user ID "laszip encoded".
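
The two recognition criteria above can be sketched in Python roughly as follows. This is only an illustration, not code from LASzip: the function names are made up for the example, and the VLR scan assumes the standard 54-byte VLR header (2-byte reserved field, 16-byte user ID, 2-byte record ID, 2-byte record length, 32-byte description) starting directly after the public header.

```python
import struct

LASZIP_USER_ID = b"laszip encoded"
# reserved(2) + user ID(16) + record ID(2) + length(2) + description(32)
VLR_HEADER_SIZE = 54


def split_point_format(raw_format):
    """Return (is_compressed, base_format) for the raw point format byte."""
    is_compressed = bool(raw_format & 0x80)  # highest of the eight bits
    base_format = raw_format & 0x7F          # the remaining seven bits
    return is_compressed, base_format


def has_laszip_vlr(data, header_size, num_vlrs):
    """Scan the VLRs that follow the public header for the LASzip user ID.

    `data` holds the raw bytes of the file (hypothetical helper; the
    header size and VLR count come from the parsed header).
    """
    pos = header_size
    for _ in range(num_vlrs):
        _reserved, user_id, _record_id, length = struct.unpack_from(
            "<H16sHH", data, pos)
        # the user ID is a null-padded character field
        if user_id.split(b"\x00", 1)[0] == LASZIP_USER_ID:
            return True
        pos += VLR_HEADER_SIZE + length  # skip to the next VLR header
    return False
```

With the values from the original post, `split_point_format(129)` gives `(True, 1)`, while `split_point_format(1)` gives `(False, 1)`.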

lasinfo does not report those because they are "stripped" on-the-fly by the decompressor so that to all LAStools and to all LASlib API or LASzip DLL users a LAZ file appears identical to a LAS file.
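
To compare headers read with a script like the one in the first post, the same "stripping" can be imitated on the parsed dictionary. The sketch below is an assumption-laden illustration rather than what the decompressor actually does: the function name is hypothetical, it assumes exactly one "laszip encoded" VLR was added, and it only touches the two fields reported as different (`pointformat` and `numvlrecords`), leaving everything else, including the offset to the point data, as read from the file.

```python
def as_seen_by_las_reader(header):
    """Return a copy of a raw header dict with the LAZ markers removed."""
    view = dict(header)
    if view['pointformat'] & 0x80:    # high bit set: compressed points
        view['pointformat'] &= 0x7F   # clear the flag, i.e. minus 128
        view['numvlrecords'] -= 1     # assume one extra LASzip VLR
    return view
```

For an uncompressed las header the function returns the values unchanged.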

Hope that explains things.

Jill Kelly

Oct 12, 2014, 1:02:07 PM
to last...@googlegroups.com
Thanks for this code, Phil.  You just saved me a lot of time!

Howard Butler

Oct 13, 2014, 8:09:03 PM
to last...@googlegroups.com

On Oct 12, 2014, at 12:02 PM, Jill Kelly <gorgonzo...@gmail.com> wrote:

> Thanks for this code, Phil. You just saved me a lot of time!

laspy is a feature-complete LAS 1.4 Python implementation. It of course doesn't handle LAZ data, and it isn't as fast as a C/C++ implementation like LASlib, but for easy manipulation of LAS data, it's tough to beat.

https://github.com/grantbrown/laspy

Howard