Efficient "streaming" of pixels from large GeoTiff _without_ intermediate placeholder


Mansour Raad

Aug 20, 2016, 9:10:02 AM
to geotrellis-user
Please advise on the most effective way to stream the (lat, lon, double-value) triples of a GeoTiff to, say, a CSV file without first holding all the pixels in an array buffer and flat-mapping it.

This is what I'm doing now; there has to be a better way for large TIFFs. Thanks.

sc.hadoopGeoTiffRDD(path)
  .flatMap { case (extent, tile) =>
    val rasterExtent = RasterExtent(extent.extent, tile.cols, tile.rows)
    val rows = new ArrayBuffer[(Double, Double, Double)](tile.cols * tile.rows)
    tile.foreachDouble { (col, row, z) =>
      val (lng, lat) = rasterExtent.gridToMap(col, row)
      rows.append((lng, lat, z))
    }
    rows
  }
  .map(...)
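One way to avoid the intermediate buffer is to emit the triples lazily: Spark's flatMap accepts an Iterator, which keeps only one triple live at a time instead of cols * rows of them. A minimal, self-contained sketch of the pattern; `LazyPixels`, `cellValue`, and the affine `gridToMap` below are hypothetical stand-ins for the tile and RasterExtent above, not GeoTrellis APIs:

```scala
// Sketch: enumerate (lng, lat, z) triples lazily instead of buffering them all.
object LazyPixels {
  val cols = 4
  val rows = 3

  // Hypothetical cell values standing in for tile.getDouble(col, row).
  def cellValue(col: Int, row: Int): Double = (col + row * cols).toDouble

  // Stand-in for rasterExtent.gridToMap: a simple affine mapping.
  def gridToMap(col: Int, row: Int): (Double, Double) =
    (col * 0.5 + 10.0, row * -0.5 + 45.0)

  // An Iterator means flatMap never materializes the whole tile's triples.
  def triples: Iterator[(Double, Double, Double)] =
    for {
      row <- Iterator.range(0, rows)
      col <- Iterator.range(0, cols)
    } yield {
      val (lng, lat) = gridToMap(col, row)
      (lng, lat, cellValue(col, row))
    }
}
```

The whole grid is still scanned, but at any moment only one triple (plus whatever Spark is serializing) is held in memory.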

Eugene Cheipesh

Aug 22, 2016, 12:44:22 PM
to geotrel...@googlegroups.com
Hello Mansour,

To clarify for you and the list: reading a GeoTiff fully into memory before decoding is problematic past a certain size, as it is very easy to exceed the Java heap.

We are working on a feature to extend the GeoTiff reader to do windowed and streaming GeoTiff reading.

Currently the core logic for a streaming read of a GeoTiff is worked out; the work to do streaming reads from S3 and HDFS is in progress.

What we’re aiming for is to be able to split a large GeoTiff into a set of windows on read; the windows themselves can be of a reasonable tile size (e.g. 512x512). You’d still need to flatMap over the resulting window tiles, but they will be GCed as you finish with them.
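The windowed read described here can be pictured as enumerating grid windows of a bounded size. A self-contained sketch under that assumption; `GridBounds` and `windows` are illustrative names, not the GeoTrellis API, and no actual GeoTiff decoding happens:

```scala
// Inclusive pixel bounds of one read window within a larger raster grid.
case class GridBounds(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int) {
  def width: Int  = colMax - colMin + 1
  def height: Int = rowMax - rowMin + 1
}

// Cover a cols x rows grid with windows of at most size x size pixels;
// windows on the right/bottom edge may be smaller.
def windows(cols: Int, rows: Int, size: Int = 512): Seq[GridBounds] =
  for {
    rowMin <- 0 until rows by size
    colMin <- 0 until cols by size
  } yield GridBounds(colMin, rowMin,
                     math.min(colMin + size, cols) - 1,
                     math.min(rowMin + size, rows) - 1)
```

A streaming reader would then decode one such window at a time, so peak memory is bounded by the window size rather than the full raster.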

I expect this feature will be fully merged by mid September.

Thank you,
-- 
Eugene Cheipesh
--
You received this message because you are subscribed to the Google Groups "geotrellis-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geotrellis-us...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rob Emanuele

Aug 22, 2016, 1:48:59 PM
to geotrel...@googlegroups.com

What you can do in the meantime is use the "split" method to split large GeoTiffs up into smaller tiles, and then repartition the RDD so that you do not run into out-of-memory errors (which I assume is the motivation for the question).

An example is here https://github.com/lossyrob/geotrellis-ned-example/blob/master/src/main/scala/elevation/Main.scala#L89
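The split-then-repartition idea can be sketched in miniature. The `Tile` class below is a toy stand-in, not the GeoTrellis type; its `split` chops the grid into blockSize x blockSize blocks (edge blocks may be smaller), after which each small block can be processed and garbage-collected independently:

```scala
// Toy model of splitting one large tile into smaller tiles before
// repartitioning. Row-major Array[Double] backs the cells.
case class Tile(cols: Int, rows: Int, data: Array[Double]) {
  def get(col: Int, row: Int): Double = data(row * cols + col)

  // Split into blockSize x blockSize sub-tiles, row-major block order.
  def split(blockSize: Int): Seq[Tile] =
    for {
      r0 <- 0 until rows by blockSize
      c0 <- 0 until cols by blockSize
    } yield {
      val h = math.min(blockSize, rows - r0)
      val w = math.min(blockSize, cols - c0)
      // Copy this block's cells into a fresh, independently-collectible array.
      val sub = Array.tabulate(w * h)(i => get(c0 + i % w, r0 + i / w))
      Tile(w, h, sub)
    }
}
```

In the Spark version, the split would be followed by something like `.repartition(n)` so the small tiles spread across executors instead of piling up on the partitions that held the original large tiles.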




Mansour Raad

Aug 22, 2016, 1:56:30 PM
to geotrellis-user
Thanks. Crazy idea: how about an off-heap solution using the Unsafe API? Though I'm not sure how to flatMap that array back into the Spark flow. Ideas?