Multiple GeoTiffs output from Spark job

Mansour Raad

Jul 14, 2015, 2:12:33 PM
to geotrel...@googlegroups.com
Given an RDD of (somekey -> (x, y, value)),
I would like to group by key
and, for each (somekey, iterator((x, y, value))), generate a GeoTiff whose content is derived from the iterator and whose file name is based on somekey.

Thanks
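The group-then-fill step described above can be sketched on a plain Scala collection. In Spark the equivalent grouping would be `rdd.groupByKey()`, which yields one `(key, Iterable[value])` pair per distinct key. This sketch assumes x and y are already integer column and row indices and leaves cells with no value as NaN; `GroupToGrids`, `grids`, `fileName` and the `<key>.tif` naming scheme are hypothetical.

```scala
object GroupToGrids {
  // One observation: grid column, grid row, and the value to store there.
  type Cell = (Int, Int, Double)

  // Groups (key, cell) pairs by key and fills one row-major cols x rows grid per key.
  // Cells with no observation stay NaN, a common no-data value for Double rasters.
  def grids[K](records: Seq[(K, Cell)], cols: Int, rows: Int): Map[K, Array[Double]] =
    records.groupBy(_._1).map { case (key, pairs) =>
      val grid = Array.fill(cols * rows)(Double.NaN)
      for ((_, (col, row, value)) <- pairs) grid(row * cols + col) = value
      key -> grid
    }

  // Hypothetical naming scheme: one .tif file per key.
  def fileName[K](key: K): String = s"$key.tif"
}
```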

Rob Emanuele

Jul 14, 2015, 2:31:24 PM
to geotrel...@googlegroups.com
Hey Mansour,


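import geotrellis.proj4._
import geotrellis.vector._
import geotrellis.raster._
import geotrellis.raster.io.geotiff._

import org.apache.spark.rdd._

// Extends Serializable because the closure passed to rdd.foreach below uses
// members of this trait (crs, cols, rows, extentFromKey, filePathFromKey),
// so Spark has to serialize the enclosing instance and ship it to the workers.
trait Example extends Serializable {
  // Some information you'll need to supply...
  // The CRS of your data, e.g. CRS.fromName("EPSG:4326").
  val crs: CRS = ???

  // Depends on your key type
  type KeyType

  // The dimensions of each GeoTiff. You could derive this from each of
  // the groupings if need be.
  val (cols, rows): (Int, Int) = ???

  // We need to get the bounding box for the GeoTiff. I'm assuming this changes
  // per key, but perhaps it's the same extent for every key? If so this can be a
  // val instead of a def.
  def extentFromKey(k: KeyType): Extent = ???

  def filePathFromKey(k: KeyType): String = ???

  // For example, the result of input.groupByKey(), which yields an Iterable
  // (not an Iterator) per key. The Ints are assumed to be (col, row) indices.
  val rdd: RDD[(KeyType, Iterable[(Int, Int, Double)])] = ???

  // foreach runs on the workers, so each file is written to the local disk
  // of whichever worker processes that key.
  rdd
    .foreach { case (key, cells) =>
      val tile = DoubleArrayTile.empty(cols, rows)
      for ((col, row, value) <- cells) {
        tile.setDouble(col, row, value)
      }
      val extent = extentFromKey(key)
      val filePath = filePathFromKey(key)

      SingleBandGeoTiff(tile, extent, crs).write(filePath)
    }
}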
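The code above treats the two Ints as column and row indices. If the x and y in the original question are map coordinates in the GeoTiff's CRS instead, each point first has to be mapped to a cell. A minimal sketch, assuming a north-up grid whose row 0 is the top (north) edge of the extent; `GridIndex` and `toColRow` are hypothetical names, and the extent is passed as plain Doubles to keep the example dependency-free.

```scala
object GridIndex {
  // Maps a map coordinate (x, y) inside the extent (xmin, ymin, xmax, ymax)
  // to a (col, row) cell index on a cols x rows grid. Row 0 is the north edge.
  def toColRow(x: Double, y: Double,
               xmin: Double, ymin: Double, xmax: Double, ymax: Double,
               cols: Int, rows: Int): (Int, Int) = {
    val cellWidth = (xmax - xmin) / cols
    val cellHeight = (ymax - ymin) / rows
    val col = math.floor((x - xmin) / cellWidth).toInt
    val row = math.floor((ymax - y) / cellHeight).toInt
    // A point exactly on the east or south edge would land one past the last
    // cell, so clamp the upper bound. Points outside the extent are NOT handled.
    (math.min(col, cols - 1), math.min(row, rows - 1))
  }
}
```

Points that fall outside the extent still produce out-of-range indices, so they would need to be filtered out before calling setDouble on the tile.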

--
You received this message because you are subscribed to the Google Groups "geotrellis-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geotrellis-us...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Rob Emanuele, Tech Lead, GeoTrellis

Azavea |  340 N 12th St, Ste 402, Philadelphia, PA
rema...@azavea.com  | T 215.701.7692  | F 215.925.2663
Web azavea.com  |  Blog azavea.com/blogs  | Twitter @azavea

Mansour Raad

Jul 14, 2015, 3:15:03 PM
to geotrel...@googlegroups.com
Will give it a shot. Thanks!

Mansour Raad

Jul 14, 2015, 3:15:53 PM
to geotrel...@googlegroups.com
BTW, can this work on HDFS if I set the path to hdfs:///…?
Thanks

On Jul 14, 2015, at 11:31 AM, Rob Emanuele <rema...@azavea.com> wrote:

Rob Emanuele

Jul 14, 2015, 3:25:34 PM
to geotrel...@googlegroups.com
No, you'd have to move the files to HDFS afterwards. There's some code that I've been meaning to write to allow us to write a GeoTiff to a byte array instead of a path; you'd be able to use that to write an HDFS file just like you'd write any sort of byte array to HDFS. Let me see how quickly that can be added.

Mansour Raad

Jul 14, 2015, 3:31:54 PM
to geotrel...@googlegroups.com
So... does it all have to be done in a local[*] context? No spark:// master?

Rob Emanuele

Jul 14, 2015, 4:26:10 PM
to geotrel...@googlegroups.com
No, but the files would be saved to the local disks of the workers. You could use RDD.pipe to pipe the file names to a script that moves the files to HDFS.
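RDD.pipe writes each element of a partition to an external command's stdin, one element per line, and turns the command's stdout lines into an RDD[String]. The sketch below mimics that contract for a single partition with scala.sys.process, which may help when trying out a move script locally before handing it to rdd.pipe. `PipeSketch` and `pipeLines` are hypothetical names, and `cat` stands in for the real script in the check.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.sys.process._

object PipeSketch {
  // Mimics RDD.pipe for one partition: feed each element to the command's
  // stdin as its own line, then return the command's stdout split into lines.
  // `!!` throws a RuntimeException if the command exits with a non-zero status.
  def pipeLines(command: Seq[String], elements: Seq[String]): List[String] = {
    val stdin = elements.map(_ + "\n").mkString
    val input = new ByteArrayInputStream(stdin.getBytes(StandardCharsets.UTF_8))
    val output = (Process(command) #< input).!!
    output.split("\n").toList.filter(_.nonEmpty)
  }
}
```

On the cluster, one option (an assumption, not something confirmed in this thread) is to write each file in a map that returns its local path and chain pipe directly onto it, e.g. `rdd.map(writeAndReturnPath).pipe("/path/to/move-to-hdfs.sh").count()`, where `writeAndReturnPath` and the script path are hypothetical. Because pipe is a narrow transformation, it runs in the same task as the write and so sees the same worker's local disk; the script has to exist on every worker, and an action such as count() is needed to trigger the job.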

Rob Emanuele

Jul 14, 2015, 5:03:01 PM
to geotrel...@googlegroups.com
This PR allows you to write a GeoTiff to a byte array: https://github.com/geotrellis/geotrellis/pull/1137

From there, you have the bytes of the file, so it's a matter of saving it to HDFS, either directly in the foreach loop, or by using something like BytesWritable.
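Once the GeoTiff is available as bytes, saving it is an ordinary stream write. A minimal sketch that keeps the destination pluggable, so the same code can write to a local path or, on the cluster, to HDFS; `ByteSink`, `writeBytes` and `localOpener` are hypothetical names, and the GeoTiff-to-bytes call itself is left out because its exact name isn't given in this thread.

```scala
import java.io.OutputStream
import java.nio.file.{Files, Paths}

object ByteSink {
  // Writes all of `bytes` to the stream that `open` returns for `path`,
  // closing the stream even if the write throws.
  def writeBytes(path: String, bytes: Array[Byte], open: String => OutputStream): Unit = {
    val out = open(path)
    try out.write(bytes)
    finally out.close()
  }

  // Opener for the local filesystem (creates or truncates the file).
  val localOpener: String => OutputStream = p => Files.newOutputStream(Paths.get(p))
}
```

For HDFS, the opener could wrap Hadoop's FileSystem API, for example `p => FileSystem.get(new URI(p), new Configuration()).create(new Path(p))` using org.apache.hadoop.fs.{FileSystem, Path} and org.apache.hadoop.conf.Configuration. Inside a Spark job it is probably cheaper to obtain the FileSystem once per partition (e.g. in foreachPartition) than once per file.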