Detecting Corrupt LAZ files


Martin Isenburg

Nov 8, 2014, 3:11:42 PM
to LAStools - efficient command line tools for LIDAR processing, The LAS room - a friendly place to discuss specifications of the LAS format

Dear friends of LAZ,

Once, last year, I used a USB stick to copy some LAZ files from one computer to another, and somehow this corrupted one or more bits in the LAZ file, which resulted - after decompression - in a LAS file that was both shortened and contained a number of invalid points at the end. This was easily noticed with lasinfo and lasvalidate ...

This seems a rather rare occurrence, as it has not happened since (and I am constantly copying LAZ files across a smorgasbord of computers). But in theory this could happen even if the file on disk was correct, due to a load (aka bus) error ...

However ... it would be possible to add a sanity check to LASzip that notices such corruptions and warns about them. So I am wondering ... is this an issue in practice, and if so, how should such an error be treated and what sort of validation tools would be useful to you?

Regards,

Martin @rapidlasso

PS: Check out (or even better add your thoughts to) the comment section of http://rapidlasso.com/2014/11/06/keeping-esri-honest/ ... (-;

ken yates.googlemail

Nov 9, 2014, 3:28:06 AM
to last...@googlegroups.com

We contract with LIDAR suppliers.  I had proposed that we use overnight FTP transfers of compressed LAS with file checksum like RAR.  The vendors resisted the RAR step before FTP as the compression task was not in original contract writing.  We have gotten some rare errors during these FTP uploads.  Our vendors squeal a bit when asked to resend.  I am rewriting our delivery spec now and it is time for us to smarten up and use a double-duty compression like LAZ with this proposed built-in file sanity checker... (If it is very fast.) A simple checksum-style failure message would be adequate for us.  Our work flow currently begins with lasboundary.

Terje Mathisen

Nov 9, 2014, 3:28:31 AM
to last...@googlegroups.com
Martin Isenburg wrote:
>
> However ... it would be possible to add a sanity check to LASzip that
> notices such corruptions and warns about them. So I am wondering ...
> is this an issue in practice, and if so, how should such an error be
> treated and what sort of validation tools would be useful to you?
>

To correspond to ZIP behavior you should indeed add a CRC or
similar checksum to LAZ files: it is a big feature of sending zipped
documents that the unpacking will verify that all bits are still OK. (In
fact, I thought LAZ already had such a check...)

Terje
--
- <Terje.M...@tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Dwight Crouse

Nov 9, 2014, 2:22:50 PM
to last...@googlegroups.com, The LAS room - a friendly place to discuss specifications of the LAS format
Hi Martin,

It would be great if this feature could be added. We are increasingly using LAZ as the deliverable to clients, as well as requesting data from providers in LAZ format, and having this check would mitigate or at least detect any potential issues.

Dwight 
DWIGHT CROUSE  |  Analysis Manager  |  Cochrane, AB, Canada
T: 403.241.9020  |  C: 403.836.9429  |  TFN: 1.866.698.8789 ext. 4

          


Martin Isenburg

Nov 10, 2014, 7:21:46 AM
to The LAS room - a friendly place to discuss specifications of the LAS format, LAStools - efficient command line tools for LIDAR processing
Hello,

thank you for your encouraging comments. (-:

But several questions went unanswered. To be more specific:

(a) How should the DLL react when it notices a bit corruption in a LAZ file?
(b) How should laszip react when encountering a (partially) bit-corrupted LAZ file?
(c) How should a verification tool operate on and/or report bit-corrupted LAZ files?

We could just decide all those things to the best of our knowledge, but - as always - we would like to give everybody an opportunity for input before making any such design decision (unlike a certain company East of LA ... (-;).


Regards,

Martin @rapidlasso

--
http://rapidlasso.com - transparent and open lazzing your LiDARs

On Mon, Nov 10, 2014 at 12:22 AM, Howard Butler <how...@hobu.co> wrote:

> On Nov 8, 2014, at 2:10 PM, Martin Isenburg <martin....@gmail.com> wrote:
>
> However ... it would be possible to add a sanity check to LASzip that notices such corruptions and warns about them. So I am wondering ... is this an issue in practice, and if so, how should such an error be treated and what sort of validation tools would be useful to you?

In my opinion, this feature would be desirable in a future LAZ revision as long as its implementation doesn't end up being too complicated. It would be especially helpful for those of us doing network'y LAZ stuff to have a standard way to verify things rather than crafting up our own thing.

Howard

--
--
You are subscribed to "The LAS room - a friendly place to discuss the LAS or LAZ formats" for those who want to see LAS or LAZ succeed as open standards. Go on record with bug reports, suggestions, and concerns about current and proposed specifications.

Visit this group at http://groups.google.com/group/lasroom
Post to this group with an email to las...@googlegroups.com
Unsubscribe by email to lasroom+u...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "The LAS room - a friendly place to discuss the LAS and LAZ formats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lasroom+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thomas Knudsen

Nov 10, 2014, 9:40:16 AM
to last...@googlegroups.com

In my opinion, a bare minimum implementation would be to give laszip the equivalent of the "-t" (test) command line flag of the InfoZip unzip tool, for checking the file integrity.

Even if that was the only thing implemented, it would be very useful.

At the API side, a bare-minimum, and (probably) POSIXly correct, implementation would be to let lasreader->read_point() return 0 on CRC mismatch, and set errno to something sensible - e.g. by overloading (in a mildly entertaining way) the meaning of EILSEQ.
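A minimal sketch of that convention, with a hypothetical reader struct standing in for LASlib's lasreader (the names and fields are illustrative, not the real API):

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a LASlib-style point reader. On a CRC
// mismatch the read returns 0 and sets errno to EILSEQ, mimicking
// the POSIX "return 0 / set errno" convention suggested above.
struct Reader {
    int points_left;
    bool crc_ok;

    // Returns 1 if a point was read, 0 on end-of-file or corruption.
    int read_point() {
        if (points_left <= 0) return 0;            // normal end of data
        if (!crc_ok) { errno = EILSEQ; return 0; } // corruption detected
        --points_left;
        return 1;
    }
};

// Drains the reader; the caller distinguishes a clean end-of-file from
// a corrupted chunk by checking errno after the loop exits.
int read_all(Reader& r) {
    int n = 0;
    errno = 0;
    while (r.read_point()) ++n;
    return n;
}
```

The appeal of this scheme is that a C caller needs no new header or error type: the existing read loop keeps working, and only callers that care about corruption inspect errno.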

/thomas


Angenent, Arnout [FGBV]

Nov 10, 2014, 10:52:31 AM
to last...@googlegroups.com

Hi,

We send around data in LAZ format frequently as well, and as the data is often transferred by FTP-like uploads, we do get the odd error every now and then where an uploaded file is incomplete. It would be of great help if an option were added to skip extracting a LAZ file that is noticed to be corrupt. A warning message usually doesn't work for us: we transfer many files at the same time (tiled data, thousands of LAZ files), so warning messages are simply missed when the files are extracted in batch mode.

A completely missing LAS file is easier to detect than a LAS file that is present but has points missing. I am not sure whether laszip currently skips corrupt files, as I haven't come across a corrupt LAZ file for a while now, so this request might be outdated already.

Kind regards,
Fugro Geospatial B.V.

Arnout Angenent
Processing Supervisor

T +31 (0)70 31 70780 | M +31 (0)6 29 53 75 04 | F +31 (0)70 31 70750
a.ang...@fugro.com | www.fugrogeospatial.com
Dillenburgsingel 69, 2263 HW Leidschendam | P.O. Box 3000, 2260 DA Leidschendam, The Netherlands
Trade Register nr 27152156 | VAT nr NL005621409B29


Evon Silvia

Nov 10, 2014, 12:10:25 PM
to last...@googlegroups.com
I agree with most of what has been said thus far, and would add that, if you do go this route, it would be incredibly helpful to publicly publish some sort of VLR standard so that other developers can take advantage of this concept. I mean, if you go through all this work, why limit it to just LAZ when it applies just as easily to LAS? That would also enable other people's software to run the same check without being restricted to your API.

Evon

Andrew Bell

Nov 10, 2014, 12:10:50 PM
to last...@googlegroups.com
Why is this a LAZ issue?  Wouldn't it make more sense to do something in LAS?  That would solve the issue for LAZ, too, wouldn't it?  I would expect a corrupted LAZ file to yield a corrupted LAS file.

In the example case, you detected the error with a check as simple as a point count.  Why is more necessary?


Thomas Knudsen

Nov 10, 2014, 12:20:30 PM
to last...@googlegroups.com
To answer two of your rhetorical questions in interleaved mode:

1. A flipped bit may mean a wrong height (or position) in a LAS file, but a totally corrupted LAZ file.

2. A flipped bit in LAS will (usually) not imply a wrong point count. In LAZ it may or may not.

Andrew Bell

Nov 10, 2014, 12:29:37 PM
to last...@googlegroups.com
On Mon, Nov 10, 2014 at 11:15 AM, Thomas Knudsen <knudsen...@gmail.com> wrote:
To answer two of your rhetorical questions in interleaved mode:

1. A flipped bit may mean a wrong height (or position) in a LAS file, but a totally corrupted LAZ file.

You can't know the nature of the corruption in a file.  A wrong height or position may be a big deal.  A bad offset/size in a LAS file can also render the file unusable.
 
2. A flipped bit in LAS will (usually) not imply a wrong point count. In LAZ it may or may not.

Yes, I understand this.  The case that Martin brought up, corruption via USB issues, is in my experience pretty regularly a truncation issue.  But you're correct, something more than a point count may be needed.


Terje Mathisen

Nov 10, 2014, 2:00:14 PM
to last...@googlegroups.com
Martin Isenburg wrote:
> Hello,
>
> thank you for your encouraging comments. (-:
>
> But several questions went unanswered. To be more specific:
>
> (a) How should the DLL react when it notices a bit corruption in a LAZ file?

Probably write an error message to the console, plus a best-effort
attempt to get as many plausible records as possible.
> (b) how should laszip react when encountering a (partially)
> bit-corrupted LAZ file?

Error message and stop, or, when restarted with a -recover option, do the
same as above?
> (c) how should a verification tool operate and / or report
> bit-corrupted LAZ files?

Report where the file stops making sense?

Terje

Jonah Sullivan

Nov 10, 2014, 4:28:51 PM
to last...@googlegroups.com, las...@googlegroups.com
I would prefer the file extraction to fail (LAZ --> LAS = no output file) with an error message printed to the console.
I would like a verification tool that can output the filenames of the corrupted files.

Martin Isenburg

Nov 12, 2014, 6:49:11 AM
to LAStools - efficient command line tools for LIDAR processing, The LAS room - a friendly place to discuss specifications of the LAS format
Hello,
 
thank you for the useful feedback. Everybody seems to want some form of sanity checking. Let me summarize the most common feedback.
 
(1) Everybody wants a tool that can validate the integrity of a LAZ file.
 
Here is my proposed solution: we can monitor the internal state of the LASzip decompressor and judge whether all assertions are met. This will not always immediately catch an error during decompression (e.g. we may decompress another 10000 - 50000 corrupt points before noticing the invalid decompressor state) but it will always be able to tell whether a file contains corrupted areas or not. This can simply be implemented with a '-test' option for laszip, and the same check can also be added to the outputs of lasvalidate and lasinfo (only in case of failure).
 
This is *not* an additional CRC check but makes - for the most part - use of the existing code and requires only a few additional if statements that either return a bad code or throw an exception. Which is better in terms of run-time efficiency? Are exceptions that are thrown in the innermost loop and caught whenever reading (or seeking to) a point a performance issue?
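The two styles in question might look roughly like this (entirely schematic, not actual LASzip internals): the return-code variant costs one predictable branch per point at the call site, while the exception variant keeps the hot loop free of explicit checks and pays its cost only on the (rare) corruption path.

```cpp
#include <cassert>
#include <stdexcept>

// Schematic decompressor state check, not actual LASzip code.
enum Status { OK, CORRUPT };

// Style 1: return a bad code; the caller tests it on every point.
Status decompress_point_rc(bool state_valid) {
    return state_valid ? OK : CORRUPT;
}

// Style 2: throw; the per-point call site has no explicit check.
void decompress_point_ex(bool state_valid) {
    if (!state_valid) throw std::runtime_error("invalid decompressor state");
}

// Both drivers read n points and return how many decompressed cleanly
// before the (simulated) corruption at index corrupt_at.
int read_with_codes(int n, int corrupt_at) {
    for (int i = 0; i < n; ++i)
        if (decompress_point_rc(i != corrupt_at) == CORRUPT) return i;
    return n;
}

int read_with_exceptions(int n, int corrupt_at) {
    int i = 0;
    try {
        for (; i < n; ++i) decompress_point_ex(i != corrupt_at);
    } catch (const std::runtime_error&) {
        // exception cost is paid only when corruption actually occurs
    }
    return i;
}
```

Either way the observable behavior is the same; the difference is where the per-point cost lands, which is exactly the run-time question posed above.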
 
The advantage here is minimal changes, and all existing LAZ content can immediately be checked without recompressing or otherwise touching existing content. It will be as if verifiability had always been part of LAZ (which it kind of has ...).
 
(2) It was also suggested to add this check to both LAS and LAZ (via a VLR)
 
This would (a) mean an incompatible addition that will not cover existing content and (b) mean an addendum to the LAS format, which would need to go through the LAS Working Group (LWG) of the ASPRS. I have heard negative opinions about the strong CRC checking that is an integral part of the E57 format from those trying to implement it, as it seems to have a big effect on performance and may not be suited for massive data formats that are also used as working formats (such as LAS).
 
(3) How should partial LAZ file corruptions be handled?
 
Most commonly a single bit flip will destroy at most 50000 points (unless it happens at the very beginning, in some particular bit in the header, or at the end, in some particular bit in the chunk table). Maybe a '-recover' option for LASzip that outputs as many recovered point blocks as possible?
 
How about the API / DLL? We cannot do an "unread" of already-read points from a block in which we later discover a bit corruption. We could - once the issue is discovered - skip to the next valid part of the file, but we cannot undo the unknown number of, say, 5000 to 45000 corrupt points that were already handed over to the user, who is typically parsing the file with calls to the lasreader->read_point() function and may directly process each point as it is read.
 
(4) How should LAZ file truncations be handled?
 
At the end of the LAZ file is the table that allows spatial indexing and seeking in the file. If it is truncated off, then indexed reads and seeks are slower, and the integrity checks for earlier content will be a bit weaker. Currently the decompressor tries to repair truncated files and decompress as much as possible from the file (with the last few thousand points being likely corrupt). For the latter, a simple clip against the bounding box will usually filter those out.
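Such a clip could be as simple as dropping points that fall outside the header's bounding box; corrupt trailing points from a truncated chunk usually land far outside it. (Illustrative sketch only; the struct names do not match the real LAS header layout.)

```cpp
#include <cassert>
#include <vector>

struct Point { double x, y, z; };
struct BBox  { double min_x, min_y, min_z, max_x, max_y, max_z; };

// Keep only points inside the header bounding box. Points decoded from
// a corrupted tail of the file tend to have wildly implausible
// coordinates, so this filter discards most of them.
std::vector<Point> clip_to_bbox(const std::vector<Point>& pts, const BBox& b) {
    std::vector<Point> kept;
    for (const Point& p : pts)
        if (p.x >= b.min_x && p.x <= b.max_x &&
            p.y >= b.min_y && p.y <= b.max_y &&
            p.z >= b.min_z && p.z <= b.max_z)
            kept.push_back(p);
    return kept;
}
```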
 
(a) How should LAZ file truncations be handled by LASlib and the LASzip library?
(b) How should LAZ file truncations be handled by LASzip DLL?
(c) How should LAZ file truncations be handled by the laszip.exe tool ?
 
(5) How should we deal with fatal bit flips in the header?
 
I suggest not at all. After all, the ASC, the SHP, and the BIL format (and maybe many other raster formats) do not do any such error checking either.
 
Regards,
 
Martin @rapidlasso
 
--
http://rapidlasso.com - open and transparent compression of LiDARs
 

Thomas Knudsen

Nov 12, 2014, 7:31:35 AM
to last...@googlegroups.com
Given that failing immediately is obviously not possible (unless we introduce additional latency through a look-ahead buffer that decompresses the first 50000-100000 points before handing any of them to the application level), I believe that the library should stick to the philosophy of "fail early, fail noisily" - no reason to try to repair anything, since the shit has already hit the fan at the application level.

But what "fail early, fail noisily" should be taken to mean in the LASlib case is probably highly ambiguous in the C++ world.

From a C/POSIX point of view, I think my earlier suggestion of "return 0 and set errno" is both simple and idiomatic. It would also be perfectly useful (but probably not be squarely idiomatic) in the C++ world.

Also, the return 0/set errno route would not make it even harder to interface LASlib and C.

/thomas

Terje Mathisen

Nov 12, 2014, 11:13:41 AM
to last...@googlegroups.com
Martin Isenburg wrote:
> Hello,
> thank you for the useful feedback. Everybody seems to want some form
> of sanity checking. Let me summarize the most common feedbacks.
> (1) Everybody wants a tool that can validate the integrity of a LAZ file.
> Here my proposed solution: We can monitor the internal state of the
> LASzip compressor and judge whether all assertions are met. This will
> not always immediately catch an error during decompression (e.g. we
> may decompress another 10000 - 50000 corrupt points before noticing
> the invalid decompressor state) but it will always be able to tell
> whether a file contains corrupted areas or not. This can simply be
> implemented with a '-test' option for laszip and the same check can
> also be added to the outputs of lasvalidate and lasinfo (only in case
> of failures).
> This is *not* an additional CRC check but makes - for the most part -
> use of the existing code and requires only a few additional if
> statements that either return a bad code or throw an exception. What
> is better in terms of run-time efficiency? Are exceptions thrown in
> the innermost loop and caught whenever reading (or seeking to) a point
> a performance issue?
> The advantage here is minimal changes and all existing LAZ content can
> immediately be checked without recompressing or otherwise touching
> existing content. It will be as if validity verifiability had always
> been part of LAZ (which it kind of has ...).

From what you write later it seems like 50K points is the block size;
this corresponds to (usually) 100-300KB of LAZ data, right?

If it is less than a couple of MB when decoded, then I would suggest you
simply default to decoding a full block at once, and immediately signal
if any errors are found.
> (2) It was also suggested to add this check to both LAS and LAZ (via a
> VLR)
> This would (a) mean an incompatible addition that will not cover
> existing content and would (b) mean an addendum to the LAS format.
> This will need to go through the LAS Working Group (LWG) of the ASPRS.
> I have heard negative opinions about the strong CRC checking that is
> integral part of the E57 format from those trying to implement it as
> they seem to have a big affect on performance and may not be suited
> for massive data formats that are also used as working formats (such
> as LAS).
> (3) How should partial LAZ file corruptions be handled?
> Most commonly a single bit flip will destroy maximally 50000 points
> (unless it happens at the very beginning in some particular bit in
> the header or at the end in some particular bit in the chunk table).
> Maybe a '-recover' option for LASzip that outputs as many recovered
> point blocks as possible?

Yes!

Using a separate las2las -recover run to fix any broken file seems
reasonable.
> How about for the API / DLL? We cannot do an "unread" of already read
> points from a block for which we later discover a bit corruption. We
> could - once the issue is discovered - skip to the next valid part of
> the file but we cannot undo the unknown number of, say 5000 to 45000
> corrupt points that were already handed over to the user who is
> typically parsing the file with a call to the lasreader->read_points()
> functions and may each tme directly process these points already.
> (4) How should LAZ file truncations be handled?

I would require a separate "las2las -recover" run for any LAZ collection
with one or more partially broken files.

Terje

Evon Silvia

Nov 12, 2014, 1:02:18 PM
to last...@googlegroups.com
On Wed, Nov 12, 2014 at 5:22 AM, Terje Mathisen <terje.m...@tmsw.no> wrote:
From what you write later it seems like 50K points is the block size, this corresponds to (usually) 100-300KB of laz data, right?

If it is less than a couple of MB when decoded, then I would suggest you simply default to decoding a full block at once, and immediately signal if any errors are found

This is the first thing that came to mind for me as well. It's usually more efficient in the first place to read and buffer a hundreds-of-KB chunk of data from disk once, then retrieve it from memory one point at a time, so you might even see a performance boost as a side effect. Most users of LASlib are probably reading the points sequentially anyway (repeated calls to lasreader->read_point()), at least to a certain extent.

Since Martin has already built similar assumptions into LAX and LAZ, it seems to follow naturally to read and validate the entire 50k-point chunk whenever the point reader has "used up" (so to speak) the previously read chunk.

The implementation, code-wise, would be a matter of adding another layer of code between lasreader->read_point() and the actual disk read call. This extra layer would be a simple logic switch that either retrieves a point from the buffer of points already read (if the next point requested is the next one on disk) or retrieves another chunk from disk (if the buffer is empty or the user seeked to a different position). If I'm reading Martin's email correctly, he has already decided that the size of this chunk/buffer should be around 50000 points, which sounds like a reasonable starting point to me.
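That logic switch might be sketched like this (a toy model with a hypothetical chunk loader standing in for "decompress and validate one 50k-point chunk from disk"; none of the names are LASlib's):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy chunked reader: points are pulled from an in-memory buffer that
// is refilled (and could be CRC-checked) one whole chunk at a time.
class ChunkedReader {
public:
    ChunkedReader(std::size_t total, std::size_t chunk_size)
        : total_(total), chunk_(chunk_size) {}

    // Returns false at end of data, true if *out receives the next point id.
    bool read_point(std::size_t* out) {
        if (pos_ == buffer_.size()) {          // buffer empty or used up
            if (!load_next_chunk()) return false;
        }
        *out = buffer_[pos_++];
        return true;
    }

private:
    // Stand-in for decompressing and validating the next chunk; this is
    // where a per-chunk integrity check would reject corrupt data.
    bool load_next_chunk() {
        if (next_ >= total_) return false;
        buffer_.clear();
        pos_ = 0;
        for (std::size_t i = 0; i < chunk_ && next_ < total_; ++i)
            buffer_.push_back(next_++);
        return true;
    }

    std::size_t total_, chunk_;
    std::size_t next_ = 0, pos_ = 0;
    std::vector<std::size_t> buffer_;
};
```

The caller still sees a plain point-at-a-time interface; the chunk boundary, and thus the natural place to validate, is hidden inside the refill step.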

Evon


Andrew Bell

Nov 12, 2014, 1:30:00 PM
to last...@googlegroups.com
On Wed, Nov 12, 2014 at 5:46 AM, Martin Isenburg <martin....@gmail.com> wrote:
Hello,
 
thank you for the useful feedback. Everybody seems to want some form of sanity checking. Let me summarize the most common feedbacks.
 
(1) Everybody wants a tool that can validate the integrity of a LAZ file.
 
Here my proposed solution: We can monitor the internal state of the LASzip compressor and judge whether all assertions are met. This will not always immediately catch an error during decompression (e.g. we may decompress another 10000 - 50000 corrupt points before noticing the invalid decompressor state) but it will always be able to tell whether a file contains corrupted areas or not. This can simply be implemented with a '-test' option for laszip and the same check can also be added to the outputs of lasvalidate and lasinfo (only in case of failures).
 
So you're essentially talking about turning assertions into release mode constructs?  This seems fine, but no discussion need take place at all to do this -- simply throw some exceptions on test failures.  Why would this even be an option?  It should just happen.
