Combine Multiple GeoTiff files into one catalog?


James Vannordstrand

Nov 25, 2016, 10:31:58 AM11/25/16
to geotrellis-user

I have a directory full of single-band GeoTiff files. Three of the files correspond to one location in the United States (Iowa), and the other three correspond to a completely separate area (Florida). I have a small Scala app that takes the files for Iowa and produces a multispectral (multiband) GeoTiff, then takes the files for Florida and produces another multispectral GeoTiff. The code looks like this:

 

package tutorial

import geotrellis.proj4.WebMercator
import geotrellis.raster._
import geotrellis.raster.io.geotiff._

object MaskBandsRandGandNIR {

  val directories = List("Iowa", "Florida")

  // Constants to differentiate which bands to use.
  val R_BAND = 0
  val G_BAND = 1
  val NIR_BAND = 2

  // Path to our Landsat band GeoTiffs.
  def bandPath(directory: String) = s"data/landsat/${directory}"

  def main(args: Array[String]): Unit =
    directories.foreach(directory => generateMultibandGeoTiffFile(directory))

  def generateMultibandGeoTiffFile(directory: String) = {
    // List every single-band GeoTiff in this location's directory.
    val tiffFiles = new java.io.File(bandPath(directory)).listFiles.map(_.toString)

    // Read each file in as a SinglebandGeoTiff.
    val singleBandGeoTiffArray = tiffFiles.map(SinglebandGeoTiff(_))

    // Stack the single-band tiles into one multiband tile.
    val tileArray = ArrayMultibandTile(singleBandGeoTiffArray.map(_.tile))

    println(s"Writing out $directory multispectral tif")
    MultibandGeoTiff(tileArray, singleBandGeoTiffArray(0).extent, singleBandGeoTiffArray(0).crs)
      .write(s"data/$directory.tif")
  }
}
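One caveat worth noting about the code above (not from the original post): java.io.File.listFiles makes no guarantee about ordering, so the R_BAND/G_BAND/NIR_BAND constants only line up with the stacked bands if the file list is sorted deterministically first. A minimal pure-Scala sketch of that sort step, with hypothetical file names:

```scala
// Sketch (not from the original post): listFiles returns paths in no guaranteed
// order, so sort the per-band paths before stacking them into a multiband tile.
// The file names below are made up for illustration.
def orderedBands(paths: Seq[String]): Seq[String] = paths.sorted

val listed = Seq(
  "data/landsat/Iowa/B3_NIR.tif",
  "data/landsat/Iowa/B1_R.tif",
  "data/landsat/Iowa/B2_G.tif")

// After sorting, index 0 (R_BAND) is B1_R, index 1 (G_BAND) is B2_G,
// and index 2 (NIR_BAND) is B3_NIR.
println(orderedBands(listed).mkString(", "))
```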

 

 

Now what I need to do is take those multispectral GeoTiff files that were generated and combine them somehow.

 

// Read the geotiffs in as a single image RDD,
// using a method implicitly added to SparkContext by
// an implicit class available via the
// "import geotrellis.spark.io.hadoop._" statement.
val floridaRdd: RDD[(ProjectedExtent, MultibandTile)] = sc.hadoopMultibandGeoTiffRDD(floridaPath)
val iowaRdd: RDD[(ProjectedExtent, MultibandTile)] = sc.hadoopMultibandGeoTiffRDD(iowaPath)

val inputRdd = floridaRdd ++ iowaRdd

// Use the "TileLayerMetadata.fromRdd" call to find the zoom
// level that is the closest match to the resolution of our source images,
// and derive information such as the full bounding box and data type.
val (_, rasterMetaData) =
  TileLayerMetadata.fromRdd(inputRdd, FloatingLayoutScheme(512))
// inputRdd.collectMetadata(FloatingLayoutScheme(512))

// Use the Tiler to cut our source into tiles indexed to a floating layout scheme.
// We'll repartition it so that there are more partitions to work with, since Spark
// likes to work with more, smaller partitions (to a point) over fewer, larger ones.
val tiled: RDD[(SpatialKey, MultibandTile)] =
  inputRdd
    .tileToLayout(rasterMetaData.cellType, rasterMetaData.layout, Bilinear)
    .repartition(100)

// We'll be tiling the images using a zoomed layout scheme
// in the web mercator format (which fits the slippy map tile specification).
// We'll be creating 256 x 256 tiles.
val layoutScheme = ZoomedLayoutScheme(WebMercator, tileSize = 256)

// We need to reproject the tiles to WebMercator.
val (zoom, reprojected): (Int, RDD[(SpatialKey, MultibandTile)] with Metadata[TileLayerMetadata[SpatialKey]]) =
  MultibandTileLayerRDD(tiled, rasterMetaData)
    .reproject(WebMercator, layoutScheme, Bilinear)

// Create the attribute store that will tell us information about our catalog.
val attributeStore = FileAttributeStore(outputPath)

// Create the writer that we will use to store the tiles in the local catalog.
val writer = FileLayerWriter(attributeStore)

// Pyramiding up the zoom levels, write our tiles out to the local file system.
Pyramid.upLevels(reprojected, layoutScheme, zoom, Bilinear) { (rdd, z) =>
  val layerId = LayerId("landsat", z)
  // If the layer exists already, delete it before writing.
  if (attributeStore.layerExists(layerId)) {
    new FileLayerManager(attributeStore).delete(layerId)
  }
  writer.write(layerId, rdd, ZCurveKeyIndexMethod)
}

 

In the code above I combine iowaRdd and floridaRdd using the '++' operator. Now I'm getting an error complaining that the input has more than one CRS. I'm very new to geospatial analysis of any sort. Am I even on the right path?

Eugene Cheipesh

Nov 25, 2016, 11:25:31 AM11/25/16
to geotrel...@googlegroups.com
Hello James,

I’m guessing you’re getting a runtime exception on val (_, rasterMetaData) =   TileLayerMetadata.fromRdd(inputRdd, FloatingLayoutScheme(512)) that’s thrown from here.

What must be happening in that case is that, erroneously or not, floridaRdd and iowaRdd do not share the same projection. When you call tileToLayout you're essentially creating a regularly tiled raster that happens to be distributed, and implicitly every tile in that raster must share the same projection. You can just println(floridaRdd.first._1.crs) to double-check this quickly.

If that's what's happening, you should be able to solve this by calling reproject(WebMercator, Bilinear) on floridaRdd and iowaRdd before you attempt to collect metadata and tile them (I think that extension method is available, riffing from memory).
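Not from the original reply, but the pre-flight check described above can be sketched in plain Scala: collect the distinct CRSs of the inputs and refuse to combine them unless there is exactly one. The GeoTrellis types are left out so the guard stays self-contained; here CRSs are just strings for illustration.

```scala
// Hypothetical guard: before combining inputs with `++`, verify they all share a
// single CRS. In the real app you'd pass the CRS of each (ProjectedExtent, tile)
// pair rather than these illustrative strings.
def singleCrsOrFail(crsNames: Seq[String]): String = {
  val distinct = crsNames.distinct
  require(distinct.size == 1,
    s"Inputs must share one CRS before `++`, found: ${distinct.mkString(", ")}")
  distinct.head
}

// Same projection: fine. Mixed projections (e.g. two different UTM zones,
// as with the Iowa and Florida scenes) would throw instead.
println(singleCrsOrFail(Seq("EPSG:3857", "EPSG:3857")))
```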

The spark-etl subproject attempts to deal with similar issues, and you can do a little code dive here to check out the different strategies for re-projecting input rasters.

Let me know if this gets you on the right track!

--
You received this message because you are subscribed to the Google Groups "geotrellis-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geotrellis-us...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James Vannordstrand

Nov 25, 2016, 11:44:53 AM11/25/16
to geotrellis-user
Eugene, I cannot express how much I love you right now! I reprojected them before using '++' and it worked like a charm! Thank you so much; I was grinding forever trying to figure out what I was doing wrong.

Eugene Cheipesh

Nov 25, 2016, 11:55:30 AM11/25/16
to geotrellis-user
Glad that did the trick :) Happy tilin’ !

James Vannordstrand

Nov 25, 2016, 3:10:45 PM11/25/16
to geotrel...@googlegroups.com
One more quick question. I have .tiff files that are around 40MB in size, and I am storing these images in an S3 bucket. I have noticed that when I use sparkContext.hadoopMultibandGeoTiffRDD("Path-To-S3-Bucket") it takes forever to load one image file: about 20 minutes to run my application locally when pulling only a single .tiff file from S3. I feel I must be doing something wrong to get that kind of execution time. If I run the application and pull the image file directly from my local file system, it takes around 1 minute.

When I run the application and it pulls the image file from S3 my console output looks like this: 

16/11/25 14:07:54 INFO NativeS3FileSystem: Opening key 'data/Florida.tif' for reading at position '0'
16/11/25 14:07:55 INFO NativeS3FileSystem: Opening key 'data/Florida.tif' for reading at position '287880'
16/11/25 14:07:55 INFO NativeS3FileSystem: Opening key 'data/Florida.tif' for reading at position '0'
16/11/25 14:07:55 INFO NativeS3FileSystem: Opening key 'data/Florida.tif' for reading at position '304264'
16/11/25 14:07:56 INFO NativeS3FileSystem: Opening key 'data/Florida.tif' for reading at position '0'
16/11/25 14:07:56 INFO NativeS3FileSystem: Opening key 'data/Florida.tif' for reading at position '312842'

It does this until it has loaded the whole image. Are my image files too big at 40MB, or am I missing something? I have tried zipping the files and using the zipped file, but hadoopMultibandGeoTiffRDD doesn't recognize that.
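A rough way to see why this access pattern hurts (an illustration, not GeoTrellis code): the log shows the reader reopening the key at position 0 before each new offset, so if every seek restarts the stream, the forward skip from byte 0 to the target offset is paid again on each read.

```scala
// Illustration only: if each ranged read restarts the stream at byte 0, the bytes
// between 0 and the target offset are skipped (and effectively re-fetched) every
// time. The offsets below are taken from the log lines above.
def wastedSkipBytes(offsets: Seq[Long]): Long = offsets.sum

val offsets = Seq(287880L, 304264L, 312842L)
println(s"Bytes skipped across ${offsets.size} reads: ${wastedSkipBytes(offsets)}")
```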


Rob Emanuele

Nov 26, 2016, 12:13:52 PM11/26/16
to geotrel...@googlegroups.com
Hi James,

20 minutes is way too long. I suspect it's because you are using the S3 filesystem of the Hadoop API to load them, instead of using our S3 input formats. We found the Hadoop S3 methods to be insufficient for reading rasters off of S3 efficiently, so we created our own methods that use the aws-java-sdk client directly.

What version of GeoTrellis are you using?

If you are using 0.10, these code snippets should provide some context on how to use our S3 input formats to load rasters:
That code loads up single-band rasters, but you should use MultibandGeoTiffS3InputFormat, a la https://github.com/locationtech/geotrellis/blob/_old/0.10/spark-etl/src/main/scala/geotrellis/spark/etl/s3/MultibandGeoTiffS3Input.scala#L13-L15

If you are using 1.0.0-RC1 (which I recommend!), it should be slightly easier, with S3GeoTiffRDD:

Hope this helps, feel free to keep the questions coming! However, we're trying to shift from this mailing list to our LocationTech mailing list, as part of the project move to LocationTech: https://locationtech.org/mailman/listinfo/geotrellis-user, so if we could continue that conversation there, that would be appreciated.

Cheers,
Rob

Robert Emanuele, Tech Lead
Azavea |  990 Spring Garden Street, 5th Floor, Philadelphia, PA
remanuele@azavea.com  | T 215.701.7502  | Web azavea.com  |  @azavea