conversion BIL+HDR 2 CSV

Luca Bardone

Jul 15, 2015, 5:09:59 PM

to geotrel...@googlegroups.com

Hi all, I'm new to GeoTrellis and Scala-related stuff, so forgive my naivety.

I have to convert BIL files to CSV; at the moment I'm using a Python + GDAL script.

I'm wondering whether such a conversion is supported by GeoTrellis (+ Apache Spark).

If so, could anyone point me to a code example?

Thanks a lot

Luca

Rob Emanuele

Jul 15, 2015, 8:43:37 PM

to geotrel...@googlegroups.com

Hi Luca,

There's no example that exists, but I can help you try to work through it.

Here's the approach I'm thinking:

1. Have source rasters listed in the local file system or HDFS.

2. You'll have to use the Hadoop InputFormat that we wrapped GDAL in, which reads with GDAL through the HDFS library and works for either the local file system or HDFS. Using the GDAL bindings has some implications for what needs to be installed on a system for it to run, but there's a system for setting up worker nodes on a cluster with what they need.

3. With that you'll have an RDD of tiles. Perhaps you'll want to chunk them up to increase parallelism; whether that helps depends on the resolution and size of your input data.

4. Map the RDD to CSV record rows. If you want each source file to map to its own CSV file, then do a foreach and write the CSV files. If you want one big CSV, then you can convert the CSV rows to strings and save them off through the Spark Hadoop API.
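To make steps 3 and 4 concrete, here is a minimal sketch in plain Python rather than Scala. It is not GeoTrellis or Spark code: a raster is assumed to be a list of rows, and the helper names (`split_into_tiles`, `tile_to_csv_rows`, `rows_to_csv_text`) are hypothetical. It only illustrates the shape of the work: chunk the raster into tiles, then map each tile to (row, col, value) records that keep their position in the full raster.

```python
import csv
import io


def split_into_tiles(grid, tile_rows, tile_cols):
    """Yield (row_offset, col_offset, tile) chunks of a 2-D list of values."""
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            tile = [row[c0:c0 + tile_cols] for row in grid[r0:r0 + tile_rows]]
            yield r0, c0, tile


def tile_to_csv_rows(row_offset, col_offset, tile):
    """Map one tile to (row, col, value) records in full-raster coordinates."""
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            yield (row_offset + r, col_offset + c, value)


def rows_to_csv_text(rows):
    """Serialize records to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["row", "col", "value"])
    writer.writerows(rows)
    return buf.getvalue()
```

In a Spark job the per-tile mapping would typically run as a flat map over the RDD of tiles, with the offsets carried in each tile's key so the records can be written in any order.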

Does this sound like a good approach? Let me know if so, and I can start thinking about what the code will look like.

Cheers,

Rob


Luca Bardone

Jul 16, 2015, 3:14:52 AM

to geotrel...@googlegroups.com

Hi Rob,

Thanks so much for your reply!

I agree with the approach you propose, and I'll add some details about my BIL files.

I am prototyping a solution for doing geographic calculations on an Apache Spark architecture, and GeoTrellis seems to be the right library for that purpose.

I receive a BIL file (+ HDR) of about 130 MB every quarter of an hour, and I would like to process each one within that time.

The full process means loading the file as an RDD (possibly) and joining it with other RDDs to extract analytics.

At the moment I am focused on the first step: loading a BIL file (from the local file system) and converting it to CSV so that it fits my current analysis chain. The next step will be to set up the full process in Apache Spark and HDFS.
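For reference, that first step can be sketched without GDAL or Spark. This is a minimal Python sketch under stated assumptions, not my actual script: the header is assumed to be an ESRI-style `.hdr` with `NROWS`, `NCOLS`, `NBANDS`, `NBITS`, `PIXELTYPE` and `BYTEORDER` keywords, pixels are assumed to be 8-, 16- or 32-bit integers, `SKIPBYTES` and the georeferencing keywords are ignored, a missing `BYTEORDER` is treated as little-endian, and the function names are hypothetical.

```python
import csv
import struct


def read_hdr(path):
    """Parse 'KEYWORD value' lines of an ESRI-style .hdr file into a dict."""
    header = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                header[parts[0].upper()] = parts[1]
    return header


def bil_to_csv(bil_path, hdr_path, csv_path):
    """Write one CSV record per pixel: row, col, then one column per band."""
    hdr = read_hdr(hdr_path)
    n_rows = int(hdr["NROWS"])
    n_cols = int(hdr["NCOLS"])
    n_bands = int(hdr.get("NBANDS", 1))
    n_bits = int(hdr.get("NBITS", 8))
    signed = hdr.get("PIXELTYPE", "UNSIGNEDINT").upper() == "SIGNEDINT"
    # BYTEORDER I = Intel (little-endian), M = Motorola (big-endian).
    order = "<" if hdr.get("BYTEORDER", "I").upper() == "I" else ">"
    code = {8: "b", 16: "h", 32: "i"}[n_bits]  # integer pixel types only
    if not signed:
        code = code.upper()
    line_fmt = f"{order}{n_cols}{code}"
    line_size = struct.calcsize(line_fmt)

    with open(bil_path, "rb") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["row", "col"] + [f"band{b + 1}" for b in range(n_bands)])
        for r in range(n_rows):
            # Band interleaved by line: each image row stores one full
            # line of pixels per band, band after band.
            bands = [struct.unpack(line_fmt, src.read(line_size))
                     for _ in range(n_bands)]
            for c in range(n_cols):
                writer.writerow([r, c] + [band[c] for band in bands])
```

Reading one image row at a time keeps memory use flat, which matters for files of this size.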

I look forward to getting deeper into your proposal.

Bye

Luca
