Native JavaScript LAZ support


Howard Butler

Jul 7, 2014, 12:30:51 PM
to las...@googlegroups.com
All,

One of the challenges we all struggle with as point cloud practitioners is data volume. LAZ has been an excellent solution to that challenge, but as-is it has a couple of limitations. For http://plas.io, Uday Verma and I wanted to be able to natively decompress LAZ data in a browser, with no plugins, no special sidecar software, and across all WebGL-capable browsers. I want to give you an update on this effort and highlight some of the success that we've achieved.

One of the most significant challenges is that there is no specification for writing LAZ data -- the specification is effectively the code at laszip.org. LASzip's choice of the LGPL license is a recognition of this limitation. The LGPL prevents a proprietary developer from taking the LASzip codebase and using it to write insignificantly-different variants of LAZ data. It does not prevent a clean-room reimplementation, but the cost of one is high and hard to justify when you can already use the LASzip code for free by participating under the rules of the LGPL.

In light of this fact, a from-scratch port of LASzip to another language like JavaScript is a daunting task. It will be perpetually out of sync with the main LASzip codebase (which, again, is effectively the specification of how to write LAZ). It will potentially write incompatible data by missing small intricacies of the format, and there is no LAZ verification mechanism to catch that. It is more fruitful to simply compile LASzip into another language and let the compiler handle the porting, at the cost of some performance.

Emscripten [1] is an LLVM-to-JavaScript compiler that can take C/C++ and compile it into JavaScript. It was developed by Mozilla for Firefox. Wishing to have JavaScript-native LAZ support, Uday and I embarked on using Emscripten to compile the LASzip codebase into JavaScript. The LASzip codebase didn't afford us much satisfaction, however, due to some design choices that end up being poorly emulated in JavaScript.

We ended up refactoring [2] LASzip to better enable Emscripten to compile the C++ into JavaScript. The result of that effort is that Plasio can now decompress LAZ using JavaScript, and it works on all WebGL-capable browsers -- not just Chrome.
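To give a flavor of the approach, here is a highly simplified sketch of how a C++ decoder class can be exposed to JavaScript with Emscripten's embind. To be clear, this is illustration only: the class, the method, and the build line are invented for this email and are not our actual code or the laz-perf API.

// Hypothetical embind glue -- a sketch, not laz-perf.
// Build (with the Emscripten SDK on your PATH):
//   em++ -O2 --bind glue.cpp -o laz.js

#include <emscripten/bind.h>
#include <string>
#include <vector>

class PointDecompressor {
public:
    // JavaScript hands us the raw LAZ bytes as a byte string.
    explicit PointDecompressor(const std::string& bytes)
        : data_(bytes.begin(), bytes.end()), offset_(0) {}

    // Decode one point record; the JavaScript side views the
    // returned string as a typed array. The real arithmetic
    // decoding would happen where the comment sits.
    std::string readPoint() {
        std::string record(34, '\0'); // e.g. LAS point format 3 size
        // ... arithmetic decoding of the next record goes here ...
        offset_ += record.size();
        return record;
    }

private:
    std::vector<unsigned char> data_;
    std::size_t offset_;
};

EMSCRIPTEN_BINDINGS(laz_module) {
    emscripten::class_<PointDecompressor>("PointDecompressor")
        .constructor<const std::string&>()
        .function("readPoint", &PointDecompressor::readPoint);
}

From the JavaScript side, new Module.PointDecompressor(bytes) and readPoint() then behave like an ordinary object and method, which is what lets the same decoder run in any WebGL-capable browser without plugins.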

Please visit http://plas.io/jslaz/ to test opening one of your LAZ files (or use the drop-down to fetch an example). We have tested on Chrome 35, FF 30, and IE11, and it seems to work on all three. Performance isn't blistering, but in total it is much faster than downloading the data uncompressed. We would be interested to hear about any hiccups or trouble you run into. Tickets for such issues at https://github.com/verma/plasio/ would be appreciated.

Howard
http://pdal.io


[1] https://github.com/kripken/emscripten
[2] https://github.com/verma/laz-perf

Thomas Knudsen

Jul 8, 2014, 8:40:55 AM
to las...@googlegroups.com
The lack of a proper spec is certainly the most annoying aspect of LAZ. I work for a national mapping agency, and one of our roles is that of "geodata custodian", i.e., ensuring the long-term (multi-century) accessibility of geodata. Using open and well-specified file formats is a crucial element of fulfilling this role, which makes LAZ a tough sell here.

I have mentioned the problem to Martin, but I think he is too busy with more productive things than writing specifications for compressed file formats (although somewhat relevant inspiration can be found in IETF RFCs 1950-1952 [1-3], should anyone have the guts :-)).

I have not taken a look at your refactored code yet, but I really hope you and Martin will collaborate to merge your changes into the official release: enabling in-browser LAZ viewing is a wonderful way of making LiDAR observations available to the general public. Having the emscripten/HoBu version drift out of sync with the rapidlasso version would be a sad and bad thing.

Thomas


Martin Isenburg

Oct 19, 2014, 5:50:39 AM
to The LAS room - a friendly place to discuss specifications of the LAS format
Hello Thomas,

maybe having more than one LAZ implementation will actually "strengthen" the LAZ format and help it ultimately become an officially adopted standard (ASPRS? ISPRS? OGC?), because it would then no longer be seen solely as "Martin's format". I share your worry that having the emscripten/HoBu laz-perf version drift out of sync with the rapidlasso LASzip version would be a horrible thing. But the widespread support that LAZ enjoys today has a lot to do with Howard's efforts to promote it via libLAS and to secure funding for the first open source release via the Army Corps of Engineers (see "Gold Sponsors" on http://laszip.org). The fact that the two teams working on independent LAZ libraries are exactly those who have invested many years of effort into seeing LAZ succeed as a standard should ease your worries somewhat.

What is more important? A well-written specification, or one (or now two) reference implementations? The standard LAS format, for example, has a specification but still no reference implementation. The specification language for the full waveform extension introduced with LAS 1.3 has deficiencies, and the inaccuracies in its description were not found until the leading vendors (Riegl/Optech/Trimble) had released many terabytes of non-spec-conforming full waveform LiDAR as LAS 1.3 files using point types 4 or 5. The wording of the specification has still not been fixed. However, more people are now requesting full waveform deliveries in PulseWaves instead of LAS FWF (my strong recommendation, given the incomplete waveform information in a LAS FWF file), so this is becoming less and less important to fix. More on that here:


This is just an example that the LAS specification, which you (I assume) consider an open and well-specified file format, is not perfect either. It seems the E57 folks have done a better job at creating both a specification and an implementation, because they followed the protocol of a standardization organization from the start and had the benefit of the LAS format as an example (but they still do not have compression). The LAS format has a very different history, and (despite minor hiccups) the LiDAR community can really be grateful to those visionaries who started this format so early on. In contrast, the folks working with MBES data still have no standard exchange format. Their workflows seem entirely dominated by (and locked into) whatever vendor they are tied to (CARIS, HYPACK, QPS, ...) ... (-:

Martin @rapidlasso


Howard Butler

Oct 20, 2014, 1:53:13 AM
to las...@googlegroups.com

On Oct 19, 2014, at 4:49 AM, Martin Isenburg <martin....@gmail.com> wrote:

> Hello Thomas,
>
> maybe having more than one LAZ implementation will actually "strengthen" the LAZ format and help it ultimately become an officially adopted standard (ASPRS? ISPRS? OGC?), because it would then no longer be seen solely as "Martin's format". I share your worry that having the emscripten/HoBu laz-perf version drift out of sync with the rapidlasso LASzip version would be a horrible thing. But the widespread support that LAZ enjoys today has a lot to do with Howard's efforts to promote it via libLAS and to secure funding for the first open source release via the Army Corps of Engineers (see "Gold Sponsors" on http://laszip.org). The fact that the two teams working on independent LAZ libraries are exactly those who have invested many years of effort into seeing LAZ succeed as a standard should ease your worries somewhat.

This may be true, but it doesn't prevent another party from coming in without the same intention and desire to get along and go with the flow. You are correct that I am very interested in not disrupting LAZ with our efforts to get it working in JavaScript, but at the same time, I'm disappointed by your disinterest in carrying our efforts forward as LASzip software. Our effort is better engineering (it works on ARM and in JavaScript, benchmarks faster, is more flexible, and is modern C++), and it is byte-compatible with current LASzip. We put a lot into it. It is frustrating to have to choose between the burden of maintaining a separate fork and trying to fit our square peg (JavaScript support) into a round hole (the current LASzip codebase); we tried the latter and failed.

Another frustrating aspect of being the second-banana LASzip implementer is that you've been able to change the wire format at your leisure. That is just fine when you're the only one who has to worry about it, but now that there is at least one other implementation, what are we supposed to do about things like your 1.4 compatibility mode, or potentially baking spatial sorting into the format proper? Again, we've chosen to follow along, but this posture is a hindrance. Do we get a say in how such things are to be implemented? In the design principles that should be followed? Such are the travails of software formats and their implementers...

Government agencies like the ones Thomas represents get very uneasy with the idea of community-based formats, because it is easy for such a community to be disrupted in intentional and unintentional ways. True standards, housed in a true standards body with real release, development, and conflict-resolution procedures, make those agencies feel more comfortable about mandating formats in contracting language. Because these organizations and their derivatives fund the collection of the majority of LiDAR data at this time, they have an outsized impact on the formats of the entire ecosystem.

In that regard, I think LAZ has wildly outperformed any expected market adoption because it achieves its stated task so well. It works great and it is free, which makes it very difficult for a commercial vendor to come along and try to sell a better mousetrap. A government agency, however, can feel comfortable mandating a closed format because it has only one body to deal with to achieve its goals. An open standard (not simply a specification) has the same property, a single responsible entity, and provides the same comfort. A community standard, with slightly incompatible implementations and no body to evolve the format beyond the needs of the commercial interests who are its gatekeepers, causes anxiety even if, as you've described above, there is no real reason to be nervous.

This is the sociological aspect of zLAS that LAZ hasn't addressed. It isn't enough to tackle every real and imagined technological challenge that ESRI's format introduces. It's also that USGS or OS (UK) or CountryGovernmentDataAgency shouldn't have to herd us cats to get the format to respond to the data archival needs they turn into contracting language. Now that we have at least two different-but-derived implementations, we get to test whether the LAZ ecosystem preserves its past in compatible ways. LASzip has done a perfect job thus far, but that's only one piece of software with one set of commercial priorities tugging on it. Now there's at least one more...

I'll note that I have plenty of experience with another community-based format: I am an author of the GeoJSON specification. We have achieved great success with that format by being supremely stubborn about never changing it. It has not evolved in the six years since the first document was released. It has plenty of warts; it is terribly inefficient, redundant, and plodding. We've lost count of the ways we've been told we're idiots. Indeed, many have built much better mousetraps, but GeoJSON is the one with the widest penetration, because it followed prior art, it is easy to implement by following simple examples, and it side-steps the cat herding by always saying no to evolving its scope beyond what it was six years ago. LAZ has some of these same properties (it follows LAS, it has evolved slowly, and it has been stable for a long time), and we should be very careful about tinkering with it.

> What is more important? A well-written specification or one - or now two - reference implementations?

Ideally, I want a standard first, then one or two reference implementations, in a body that has a history of standards development, ratification, and promotion. Hobu's current efforts are focused on reading LAZ data, and while the laz-perf software can write byte-compatible LAZ, we aren't really doing so beyond some simple tests. There are no validation tests of LAZ beyond making sure LASzip can read and write it. There is no conformance, validation, or regression suite to protect against inadvertently screwing things up; the only protection we've had thus far is taking great care. The advantage is that both our implementations are open source, and anyone can submit a patch to fix, change, or improve them. That is also the disadvantage, though, with ample possibility of the implementations drifting out of sync.
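To make that concrete, the check I have in mind is nothing fancier than decompressing the same file with both implementations and comparing the outputs byte for byte. A sketch follows; the laszip command line is typical LAStools usage, and the laz-perf invocation is a placeholder rather than the name of a real tool:

// lazcheck.cpp -- compare two decompressed .las files byte for byte.
// Intended use (the second command is a placeholder, not a real tool):
//   laszip -i input.laz -o a.las
//   <laz-perf-decompressor> input.laz b.las
//   ./lazcheck a.las b.las

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Read an entire file into memory.
static std::vector<char> slurp(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

int main(int argc, char** argv) {
    if (argc != 3) {
        std::cerr << "usage: lazcheck a.las b.las\n";
        return 2;
    }
    std::vector<char> a = slurp(argv[1]);
    std::vector<char> b = slurp(argv[2]);
    if (a.empty() || a != b) {
        std::cerr << "MISMATCH: decompressed outputs differ\n";
        return 1;
    }
    std::cout << "OK: byte-identical output\n";
    return 0;
}

A real suite would of course go further (per-point comparison, header normalization, a corpus of tricky files), but even this would catch an implementation drifting out of sync.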

I think it would be wonderful for a CountryGovernmentDataAgency to fund you to write out a full document for how to make LAZ. This document would be a great starting point for standardization in the organization of your and CountryGovernmentDataAgency's choosing. In my opinion, it would be a worthwhile effort.

Howard



Michael Gerlek

Oct 20, 2014, 11:01:38 AM
to las...@googlegroups.com
Howard is wise and well-spoken.

Two thoughts I had on the way in this morning:

(1) The world does not need two implementations of a single spec UNLESS the two implementations exist for very different reasons. For example, it might make sense to have one implementation that is GPU-enabled and one that is tight scalar code, or one that is designed for stability and robustness and one that is designed for experimentation and research. Sometimes such different objectives will indeed justify different codebases. In the case of Martin's codebase, I suspect his interest is in his own future extension work; others like Howard and myself are more interested in stability and performance.

(2) There are several of us in this community who have considerable experience implementing file formats, writing specs, and shipping production code. I'd submit that, along the lines of how some other open source projects are run [hi, GeoTIFF!], a small set of people might want to step up, draft a spec for the codebase on a wiki, and set up a formal release process.

Let me be crystal clear: I share Howard’s sentiments and by no means wish to denigrate Martin’s work or cut him out of the loop. However, if the community of production-oriented users has different goals than Martin does, it might well make sense to fork.

-mpg

Michael Rosen

Oct 20, 2014, 12:57:42 PM
to las...@googlegroups.com
+1.

I've always thought of the article published in Feb 2013, http://digital.ipcprintservices.com/publication/?i=144145, as the LAZ specification. Not as concise as some might like, but certainly more accessible than some of its more verbose cousins. Here's the article as a PDF: https://www.cs.unc.edu/~isenburg/lastools/download/laszip.pdf

msr

Michael Gerlek

Oct 20, 2014, 3:12:50 PM
to las...@googlegroups.com
+1. Yes, the article was very good and I would expect it to serve as the start of the spec.


And for the record, by “spec” I don’t mean something as verbose as the usual OGC or JP2 docs... I think one could do a “good enough” job at 80% of what is needed to define the encoder and byte format in 20% of the page count!

Maybe we should do a Kickstarter campaign for this :-)

-mpg
