Batch processing


nevermind

Jul 6, 2022, 1:06:50 PM
to pytroll
Hi everybody,

I am trying to batch process multiple MODIS scenes (MOD06 cloud product). As a first test, the goal is to read the data and export it as GeoTIFFs. Afterwards I want to add the calculation of new rasters, georeferencing, and cropping to a new extent, but I should get this first step working before I start on the more complicated parts. My code looks like this:

=========================
import glob

import numpy as np
from satpy import Scene

# open all MODIS HDF-files:
folder = "E:/Jasper/Studium/BA_Thesis/MODIS_data/MODIS_2021_data/2021_06"
list_of_paths = glob.glob(folder + '/*.hdf', recursive=True)

# load MODIS level 2 data:
for i in range(0, np.size(list_of_paths)):
    file = [list_of_paths[i]]
    data = {"modis_l2": file}
    modis = Scene(filenames=data)
    modis.load(["cloud_effective_radius"])

for i in range(0, np.size(list_of_paths)):
    modis.save_datasets(writer="geotiff",
                        dtype=np.float32,
                        enhance=False,
                        datasets=["cloud_effective_radius"],
                        filename="{name}_{start_time:%Y%m%d_%H%M%S}.tif",
                        base_dir="E:/Jasper/Studium/BA_Thesis/MODIS_data/MODIS_2021_data/2021_06/batch")
=========================

The code runs without errors, but in the end I only get one TIFF, for the last HDF file in the data folder; the other files do not seem to have been exported. I do get a NotGeoreferencedWarning for every file, though, so it seems that modis.save_datasets is processing all the files but not saving them properly?

Any help is greatly appreciated!
  - Jasper

David Hoese

Jul 6, 2022, 1:22:21 PM
to pyt...@googlegroups.com
Jasper,

Take a look at where you are creating your "modis" Scene object and when
you use it. You have two `for` loops, one where you create it and
overwrite it each time (by doing `modis = `), and then in the second
loop you actually use the `modis` variable. If you remove the second for
loop it should "just work".
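
A minimal sketch of the single-loop version, assuming the same reader, dataset name, and writer options as in the question. The function name `export_granules` and its `scene_cls` parameter are hypothetical; the parameter only exists so the loop can be tried out with a stand-in class when satpy is not installed.

```python
import numpy as np


def export_granules(paths, out_dir, scene_cls=None):
    """Load one granule and save it before moving on to the next one."""
    if scene_cls is None:
        # Lazy import: only needed when no stand-in class is supplied.
        from satpy import Scene as scene_cls
    for one_file in paths:
        # A fresh Scene per file, used within the same iteration,
        # so no Scene is overwritten before it has been saved.
        scn = scene_cls(filenames={"modis_l2": [one_file]})
        scn.load(["cloud_effective_radius"])
        scn.save_datasets(
            writer="geotiff",
            dtype=np.float32,
            enhance=False,
            datasets=["cloud_effective_radius"],
            filename="{name}_{start_time:%Y%m%d_%H%M%S}.tif",
            base_dir=out_dir,
        )

# Possible usage with the paths from the question:
#   import glob
#   export_granules(glob.glob(folder + "/*.hdf"), folder + "/batch")
```

Because each Scene is saved before the next one is created, every input file should produce its own TIFF, as long as the start times in the file names differ.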

MOD06 files are per-granule, right? Do you need a geotiff for each
granule or would one geotiff for a group of granules be OK? If one
geotiff per group is OK, you can provide all of the files in the group
to the Scene object at once. The Scene will concatenate the granules
together and make them available as a single long DataArray object.
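
A rough sketch of the grouped variant, assuming the files passed in are consecutive granules that belong together. The function name `export_group` and the `scene_cls` parameter are hypothetical; the parameter is only there so the logic can be checked with a stand-in class.

```python
import numpy as np


def export_group(paths, out_dir, scene_cls=None):
    """Give all granules to one Scene and write a single GeoTIFF."""
    if scene_cls is None:
        # Lazy import: only needed when no stand-in class is supplied.
        from satpy import Scene as scene_cls
    # Sorting keeps the granules in file-name order; this assumes the
    # names sort chronologically, as MODIS granule names normally do.
    scn = scene_cls(filenames={"modis_l2": sorted(paths)})
    scn.load(["cloud_effective_radius"])
    scn.save_datasets(
        writer="geotiff",
        dtype=np.float32,
        enhance=False,
        datasets=["cloud_effective_radius"],
        filename="{name}_{start_time:%Y%m%d_%H%M%S}.tif",
        base_dir=out_dir,
    )
    return scn
```

Passing a whole month of files this way would produce one very large array, so it probably makes sense to group the paths first (for example per overpass) and call the function once per group.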

The rest of your code looks reasonable and like it should perform as
well as it can. Note that in your for loops you can replace the `np.size`
with `len` by doing `range(len(list_of_paths))`. However, there is an
even cleaner way of doing these for loops: `for one_file in list_of_paths:`.

Note that I am actively working on optimizing MODIS processing for my
own work. If you plan on resampling the MODIS data I would highly
recommend using the "ewa" resampler. I plan on releasing a new version
of pyresample today that has some major performance improvements for
this algorithm.
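
A small sketch of what resampling with "ewa" could look like, assuming `scn` is a Scene that already has "cloud_effective_radius" loaded and `target_area` is an area definition you have built separately for your region of interest. The function name `resample_and_save` is hypothetical.

```python
import numpy as np


def resample_and_save(scn, target_area, out_dir):
    """Resample a loaded Scene with the "ewa" resampler, then save it."""
    # Scene.resample returns a new Scene on the target area;
    # the original Scene is left unchanged.
    resampled = scn.resample(target_area, resampler="ewa")
    resampled.save_datasets(
        writer="geotiff",
        dtype=np.float32,
        enhance=False,
        filename="{name}_{start_time:%Y%m%d_%H%M%S}.tif",
        base_dir=out_dir,
    )
    return resampled
```

Saving the resampled Scene instead of the swath data should also take care of the georeferencing and of cropping to the extent of `target_area`.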

I also hope to release a new version of python-geotiepoints in the next
week, which is what satpy uses to interpolate MODIS geolocation from the
lower 5km/1km resolution to the higher resolutions. This new version
will be much faster and more memory efficient, so keep an eye out for it.

Dave