NERSC cutout service?


Andrew Engel

May 1, 2025, 9:11:14 AM
to DECam Legacy Survey
Howdy,

This summer I'm going to create a cutout dataset of all spectroscopically confirmed galaxies in the ls-optical catalog. I've done this a few times in the past using the cutout service, but I recently got access to NERSC and recall there being a cutout service local to NERSC. Instead of using the HTTP client, a more effective plan would be to use that NERSC-local cutout service, stream the image data into large assembled chunks (~50 GB or so in size), then use Globus or some similarly robust service to download the chunks to my institution's cluster.

The problem is, I can't find the documentation for running the NERSC cutout service. If anybody would be willing to point me in the right direction, I would very much appreciate it!

Thanks,
--Andrew 

Dustin Lang

May 1, 2025, 11:00:35 AM
to Andrew Engel, DECam Legacy Survey
Hi Andrew,

Great, you should find much better performance running directly at NERSC.

An example cutout command is

shifter --image docker:dstndstn/cutouts:latest cutout --output=cutout.jpg --ra=180. --dec=10. --size=227 --layer ls-dr9

Help text is available via
shifter --image docker:dstndstn/cutouts:latest cutout -h

If the output filename ends in .fits, FITS output will be produced.

cheers,
dustin


Andrew Engel

May 21, 2025, 9:03:03 AM
to DECam Legacy Survey
Thanks Dustin,

Finally getting some progress on evaluating this-- wanted to reach back out to quickly update you and see if I can get more advice.

As a bit of a change in strategy, it works out better for access if I simply copy the LS cutout data over from NERSC using Globus and then stand up your container on my local server. I'm now at the stage where I can run many-cutouts.py on a table of objects and stream files to a scratch space (cool!). Since many-cutouts.py is a different command than the one you suggested above, I wanted to check with the author to make sure it is the recommended script for something like building a cutout database for ~16 million galaxies.

I wanted to know if there are any performance considerations in that script I should enable. Reading through it, there is some caching of files that occurs. My plan was to take my all-sky table of spectroscopically confirmed galaxy locations, mask down to objects with RAs that should all fall in the 000 brick directory, then sort my table by declination so that hopefully I'm getting overlaps on bricks and taking advantage of the cache.

Typing it all out now, I think a better strategy would be to find a way to mask and sort my table so that all objects falling on a given brick are processed together, which should ensure cache hits.
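A minimal sketch of that brick-grouped ordering (the row layout and column names here are assumptions for illustration; adapt to your actual table):

```python
# Sketch: order an object table so that all objects on the same brick are
# processed consecutively, maximizing reuse of many-cutouts.py's file cache.
# Rows are illustrative dicts with 'brickname', 'ra', 'dec' keys.

def order_for_cache(rows):
    # Sort primarily by brickname so same-brick objects are contiguous;
    # secondarily by dec so neighboring bricks tend to land adjacent too.
    return sorted(rows, key=lambda r: (r["brickname"], r["dec"]))

rows = [
    {"brickname": "0001m002", "ra": 0.10, "dec": -0.20},
    {"brickname": "0003p000", "ra": 0.30, "dec": 0.00},
    {"brickname": "0001m002", "ra": 0.15, "dec": -0.25},
]
ordered = order_for_cache(rows)
```

After sorting, both "0001m002" objects sit next to each other, so the second one hits whatever files the first one pulled in.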

Regardless, I wanted to reach back out and ask if there are other performance/stability considerations. I found that I could generate about 300 cutouts in roughly 30 seconds locally.

Andrew Engel

May 21, 2025, 9:07:09 AM
to DECam Legacy Survey
Sorry for the multiple messages-- to clarify, I was masking down to all objects in the 000 brick directory as part of my initial test. But regarding downloading data from LS using Globus, I did want to ask what the critical files are that the cutout service actually uses. From my read, it only uses the image and mask files?

Thanks again
--Andrew

Dustin Lang

May 21, 2025, 9:36:55 AM
to Andrew Engel, DECam Legacy Survey
Hi Andrew,

I had forgotten entirely about many-cutouts.py.  Do notice that it looks like it defaults to ls-dr8 rather than ls-dr9 or ls-dr10.  It's a very simple script - it's just looping over entries in a table that you give it, running the cutouts one at a time.  Depending on your computing setup, you may find some speedups by parallelizing it - maybe just by cutting your input table into pieces and running each one in a separate process.  But it's very likely that your Globus transfer is going to be by far the most time-consuming part.
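Dustin's suggestion of cutting the input table into pieces and running each in its own process could be sketched like this (the launch command in the comment is illustrative, not the script's real interface -- check its own help text):

```python
# Sketch: split an input table into N roughly equal contiguous pieces and
# run one many-cutouts.py process per piece (e.g. via subprocess.Popen).
# Contiguous chunks preserve any brick-sorted order within each piece,
# so every worker still benefits from the file cache.

def split_table(rows, n_chunks):
    size, rem = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rem` chunks get one extra row each.
        end = start + size + (1 if i < rem else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

chunks = split_table(list(range(10)), 3)
# for i, chunk in enumerate(chunks):
#     write chunk to a per-piece table file, then launch a worker on it
#     (illustrative; see many-cutouts.py's help for its real arguments)
```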

Yeah, like you said, it looks like only the images are required.  Just test and find out.

If your objects are near a brick boundary, they may also pull in data from a neighboring brick.  So in general, if you're pulling in one-degree RA strips at a time, you'll want to leave yourself a margin -- pull in 359, 0 and 1 before starting on 0, then pull in 2 before starting on 1, etc.
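The strip schedule described above could be sketched as follows (one-degree RA strips, wrapping at 0/360):

```python
# Sketch of the margin rule: before processing RA strip r, stage strips
# r-1, r, and r+1 (mod 360) locally, so objects near a brick boundary
# can pull data from the neighboring strip.

def strips_needed(strip):
    return [(strip - 1) % 360, strip, (strip + 1) % 360]

# e.g. strip 0 needs 359, 0, and 1 staged before it starts:
plan = {r: strips_needed(r) for r in (0, 1, 359)}
```

Since each new strip shares two members with the previous one, only one fresh strip needs transferring per step once the pipeline is rolling.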

I just spot-checked and the images take about 15% of the space in the "coadd" directory, so for the whole DR9 southern footprint you're looking at about 10 TB.

cheers,
dustin

John Moustakas

May 21, 2025, 9:50:23 AM
to Dustin Lang, Andrew Engel, DECam Legacy Survey
In case it's helpful, this script is MPI+multiprocessing parallelized
over Dustin's cutout script--
https://github.com/desihub/fastspecfit/blob/main/bin/get-cutouts

Please feel free to take whatever is useful (or not!).

Cheers,
-JM

PS. I'm definitely interested in the database of images (+ whatever
other data products you're making) once you have them!

Andrew Engel

May 23, 2025, 6:06:33 PM
to John Moustakas, Dustin Lang, DECam Legacy Survey
Thanks John + Dustin--

To circle back on this thread for anybody who might find it useful in the future: I was able to get throughputs of thousands of images per second simply by using a Slurm array where each member of the array received a chunk of the table that is input to imagine/many-cutouts.py. I had merged my location table with the survey table of brick IDs, then sorted my tables on brick ID so that, as much as possible, each process would be working its own bricks and accessing objects on the same brick sequentially. I modified a local version of many-cutouts.py to look for command-line arguments of rank and world size to perform this chunking, and overlaid my local version on top of the path where the true many-cutouts.py exists in the image.
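The rank/world-size chunking could be sketched like this (the function and variable names are illustrative, not the actual modification; the commented environment variables are the standard Slurm array ones):

```python
# Sketch: give each Slurm array member a contiguous chunk of the
# brick-sorted input table, so cache locality survives the split.

def my_chunk(rows, rank, world_size):
    size, rem = divmod(len(rows), world_size)
    # The first `rem` ranks each take one extra row.
    start = rank * size + min(rank, rem)
    end = start + size + (1 if rank < rem else 0)
    return rows[start:end]

# Under a Slurm array, rank and world size would come from the environment:
#   rank = int(os.environ["SLURM_ARRAY_TASK_ID"])
#   world_size = int(os.environ["SLURM_ARRAY_TASK_COUNT"])
mine = my_chunk(list(range(10)), rank=1, world_size=3)
```

The three ranks' chunks partition the table with no overlap and no gaps, so the workers never duplicate a cutout.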

I run 32 processes in parallel on a server with 92 cores, each process on a single core. Memory per core will depend on the local server, so mileage may vary.

Re. making this available: happy to-- my goal is to create an updated image dataset that is cross-matched on spectroscopic features to submit to the MMU: https://github.com/MultimodalUniverse/MultimodalUniverse. I'll come back around and share once it's uploaded to Hugging Face datasets.
