Hi Luca,
There's no existing example, but I can help you work through it.
Here's the approach I'm thinking:
1. Have source rasters listed in the local file system or HDFS.
2. You'll have to use the Hadoop InputFormat that we wrapped around GDAL, which reads with GDAL through the HDFS library from either the local file system or HDFS. Using the GDAL bindings has implications for what needs to be installed on each machine, but there's a system for setting up worker nodes on a cluster with what they need.
3. With that you'll have an RDD of tiles. You may want to chunk them up to increase parallelism, depending on the resolution and size of your input data.
4. Map the RDD to CSV record rows. If you want one CSV file per input file, do a foreach and write each CSV out; if you want one big CSV, convert the rows to strings and save them off through the Spark Hadoop API (e.g. saveAsTextFile).
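To make step 4 concrete, here's a minimal sketch of the per-tile mapping in plain Python (not the Spark pipeline itself). The tile layout (a 2D array of cell values) and the 6-element GDAL-style geotransform are assumptions, since I don't know your input format yet:

```python
# Hypothetical sketch: convert one raster tile (a 2D list of cell values)
# into CSV rows of "x,y,value". The geotransform follows GDAL's 6-element
# convention (originX, pixelWidth, 0, originY, 0, pixelHeight); the tile
# structure here is an assumption for illustration.

def tile_to_csv_rows(tile, geotransform):
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    rows = []
    for row_idx, row in enumerate(tile):
        for col_idx, value in enumerate(row):
            # Coordinate of the cell center
            x = origin_x + (col_idx + 0.5) * pixel_w
            y = origin_y + (row_idx + 0.5) * pixel_h
            rows.append(f"{x},{y},{value}")
    return rows
```

In the actual pipeline this logic would run inside an `rdd.flatMap(...)`, and the resulting string rows could then be written out with `saveAsTextFile` for the single-big-CSV case.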
Does this sound like a good approach? Let me know if so, and I can start thinking about what the code will look like.
Cheers,
Rob