indexing directories of many LAZ files

hank wilde

Jun 11, 2015, 10:56:11 AM
to last...@googlegroups.com
Hello, has anyone developed a tool or system for indexing many LAZ files (each containing many LAS tiles)? I have written some small scripts that walk a directory of LAS files, pull the min/max from each header, and can then quickly tell which LAS file contains any given GPS point. The process gets more complicated when huge LAZ files have to be uncompressed on the fly. I am thinking of an indexing script that leaves small txt files behind as an index. I suspect other projects have run into the same issue. Let me know how you handle it and how you organize all the data.
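Roughly, what I have now looks something like this (a simplified sketch, not my actual script; it assumes laspy with a LAZ backend such as lazrs is installed, and the file and function names are just for illustration):

import csv
import glob
import os

import laspy  # reads LAS headers; needs lazrs or laszip installed to open LAZ


def build_index(directory, index_path="tile_index.csv"):
    # one CSV row per LAS/LAZ file with its header bounding box
    with open(index_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "min_x", "min_y", "max_x", "max_y"])
        for path in glob.glob(os.path.join(directory, "*.la[sz]")):
            with laspy.open(path) as f:  # reads only the header, not the points
                h = f.header
                writer.writerow([path, h.mins[0], h.mins[1], h.maxs[0], h.maxs[1]])


def files_containing(index_path, x, y):
    # return the files whose header bounding box contains the point (x, y)
    with open(index_path) as inp:
        return [row["file"] for row in csv.DictReader(inp)
                if float(row["min_x"]) <= x <= float(row["max_x"])
                and float(row["min_y"]) <= y <= float(row["max_y"])]

That works fine as long as each tile is its own file, which is exactly where my problem with the big merged LAZ files starts.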
Thanks

Hank

mc...@u.washington.edu

Jun 11, 2015, 11:07:16 AM
to last...@googlegroups.com
The FUSION package has a Catalog utility that creates an HTML report and a CSV file with the header info. It can optionally produce pulse- and return-density rasters and images as well as intensity images, and it provides a validation check to identify common problems with LAS/LAZ tiles. FUSION supports LAZ via Martin's LASzip DLL, so you need to add a copy of the DLL to FUSION's install folder to read/write LAZ files.

You can download FUSION from here:
http://forsys.cfr.washington.edu/fusion/fusionlatest.html

Bob

==========================================================
Bob McGaughey USDA Forest Service
(206) 543-4713 University of Washington
FAX (206) 685-0790 Bloedel 386
PO Box 352100
bmcga...@fs.fed.us Seattle, WA 98195-2100
==========================================================

Evon Silvia

Jun 11, 2015, 1:33:11 PM
to last...@googlegroups.com
The header of a LAZ file isn't substantially different from the header of the LAS file it was compressed from, including the min/max values. Any well-written tool that indexes LAS headers should also work on LAZ headers with little to no modification. The only field that changes (off the top of my head) is the value of the point data format.
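For example, something along these lines reads the bounding box the same way for either extension (a sketch, assuming laspy with a LAZ backend such as lazrs is installed):

import laspy

# the LAS header is stored uncompressed at the front of a LAZ file,
# so reading min/max is just as cheap as for an uncompressed LAS file
with laspy.open("tile.laz") as f:  # identical for "tile.las"
    print(f.header.mins, f.header.maxs)  # [min_x, min_y, min_z] / [max_x, max_y, max_z]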

Evon

hank wilde

Jun 11, 2015, 1:49:59 PM
to last...@googlegroups.com
OK, thanks Bob, I will check out FUSION.

To be more clear for Evon: I have trajectory files with GPS lat/lon points, and I need a way to quickly find the subset of LAS tiles that covers a given trajectory. Simply looking at the LAZ header is an OK start, but each LAZ file may contain hundreds of LAS tiles. So I need to uncompress the LAZ, examine each LAS header, and find the subset of LAS files whose bounding boxes contain the GPS lat/lon points.

I need to do this repeatedly for many different trajectories, and it is not feasible to uncompress all the data each time. I am thinking of having the script uncompress the big multi-tile LAZ files once and then compress each LAS tile individually to LAZ. With that setup I can read the headers straight from the LAZ files, since LAZ and LAS would correspond 1:1.
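For the re-compression step I'm picturing something like this (just a sketch; it assumes the laszip command-line tool is on the PATH and that the extracted LAS tiles sit in a tiles/ folder):

import glob
import subprocess

# compress each extracted LAS tile into its own LAZ file (1:1)
for las in glob.glob("tiles/*.las"):
    subprocess.run(["laszip", "-i", las, "-o", las[:-4] + ".laz"], check=True)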

So maybe I should ask a different question: is it possible to go from one giant LAZ file with hundreds of LAS tiles inside to individual LAZ files (one per LAS tile)? Then I could just use the headers directly.

Is there any space saving in compressing 100 LAS files into a single LAZ compared to compressing each LAS file into its own LAZ?

Hank

Martin Isenburg

Jun 11, 2015, 2:13:05 PM
to LAStools - efficient command line tools for LIDAR processing
Hello,

for about three years now I have been waiting for the OpenTopography team to provide me with a write-up, to be featured on the rapidlasso blog, that describes how they use laszip, lasindex, and lasmerge to serve billions of compressed and indexed LiDAR points in LAZ format via the http://opentopography.org portal. Hopefully sometime in the future I will be able to point you to such an article (hint hint ... (-;).

However, there is a great little project by Hugo Ledoux of TU Delft:

http://3dsm.bk.tudelft.nl/matahn

Hugo uses OpenLayers, a Python-based server (Flask), and PostGIS to store the tiles' boundaries and all the metadata. With such a setup you can let PostGIS intersect the tile boundaries with the area-of-interest query and hand las2las only a '-lof list_of_files.txt' input that lists the LiDAR files that actually overlap the query area. He provides open-source code on GitHub that may be an inspiration to you.
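For example, once PostGIS has written the names of the overlapping tiles into 'list_of_files.txt', the LAStools call could look roughly like this (an illustrative sketch, not taken from Hugo's code; the bounding box numbers and output name are placeholders):

import subprocess

# clip the listed tiles to the query bounding box and merge the result into one LAZ
subprocess.run(["las2las", "-lof", "list_of_files.txt", "-merged",
                "-inside", "85000", "446000", "86000", "447000",
                "-o", "query_result.laz"], check=True)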

You may also find this discussion worthwhile; it goes a little deeper into the work of Oscar Martinez Rubi of TU Delft, who compares a LAStools read-only database of LAZ + LAX files with Oracle and PostGIS databases.

I have plans to add a "super-index" file to the LAX functionality. I was hoping to build upon the LAS Dataset work of ESRI rather than baking my own cake, but they have not yet released a description of the LAS Dataset format.

But given Jack Dangermond's reassurance that ESRI will work with the LiDAR community in response to the "LAZ clone" controversy, maybe we'll be seeing a few more open-format contributions from them.

Otherwise I'll probably end up using a simple shapefile of bounding boxes as this "super-index" file.

Regards,

Martin

Oscar Martinez Rubi

Jun 12, 2015, 6:37:09 AM
to last...@googlegroups.com
Hi,

I will extend Martin's response a bit, at least the part about our work.

The script at
https://github.com/NLeSC/pointcloud-benchmark/blob/master/python/pointcloud/lidaroverview.py
creates a PostgreSQL database with the extents of all the LAS/LAZ files in a folder, and it can do so in parallel (Python multiprocessing).

Once the DB has been created and filled, you need to run a small file selector before using the LAStools. For example, in Python:

import os

# placeholder for the remaining lasclip arguments (e.g. the output file); adjust to your query
zquery = '-o query_output.laz'

# Get the list of files that overlap the query region
query = 'SELECT filepath FROM lasextent, query_table WHERE ST_Intersects(query_table.geom, lasextent.geom) AND query_table.id = 1'
precommand1 = 'psql mydb -t -A -c "' + query + '" > list1'
os.system(precommand1)

# For lasclip we need a shapefile, so we convert the PostGIS geometry into a shapefile with pgsql2shp (provided by PostGIS)
query = "SELECT ST_SetSRID(geom, 28992) FROM query_table WHERE id = 1;"
precommand2 = 'pgsql2shp -f 1.shp mydb "' + query + '"'
os.system(precommand2)

# Finally run lasclip with the list of files and the shapefile
command = 'lasclip.exe -lof list1 -poly 1.shp ' + zquery
os.system(command)

BTW, note that in this case I have my query regions stored in a table (query_table).

BTW2, it is highly recommended to run lassort and lasindex on the files before you do any queries.
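For example (a sketch assuming the LAStools binaries are on the PATH; the folder names are placeholders and -olaz keeps the sorted output compressed):

import os

# spatially sort the points of each tile, then create the .lax spatial index files
os.system('lassort -i tiles/*.laz -olaz -odir sorted')
os.system('lasindex -i sorted/*.laz')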

Regards,

O.

Andrew Bell

Jun 12, 2015, 9:48:26 AM
to last...@googlegroups.com

We've recently added a file-indexing capability to PDAL. It works much like GDAL's gdaltindex command and creates an index, including a boundary, that can be written in any format OGR supports. You can then use the index in "merge" mode to quickly extract the points from the indexed files that fall within a specified region. See http://www.pdal.io/apps.html#id13 or write if you have questions. Also, pdal tindex is not limited to LAS/LAZ; it works on any point cloud format supported by PDAL.
