Limit on Fido data retrievals? 413 Request Entity too large


Tom Bridgman

Jan 13, 2022, 7:51:04 AM
to SunPy
Doing heliophysics visualization support, I often need to generate movies from large runs of data, such as from SDO.  One of the longest runs is 10 years of AIA 171 Å data sampled every hour: https://svs.gsfc.nasa.gov/4776
There are also various artistic groups interested in generating products that require a large amount of SDO data.

I'm trying to port my movie-generation pipelines from SolarSoft to SunPy and have a similar test case.  When trying to retrieve a 7-day run of data at 2-minute sampling (5040 files), Fido.search returned a list of found files, but Fido.fetch failed with a '413 Request Entity Too Large' message.

I finally got Fido to accept the request when I reduced it to 4 days.
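Presumably I could split the interval into smaller windows myself; a rough sketch of what I have in mind (the attrs, dates, and path here are illustrative):

import astropy.units as u
from sunpy.net import Fido, attrs as a
from sunpy.time import TimeRange

# Split the 7-day interval into one-day windows so each request stays
# under whatever size limit the server enforces.
full_range = TimeRange("2022-01-01", "2022-01-08")
for window in full_range.split(7):
    result = Fido.search(
        a.Time(window.start, window.end),
        a.Instrument("AIA"),
        a.Wavelength(171 * u.angstrom),
        a.Sample(2 * u.minute),
    )
    Fido.fetch(result, path="./data/{file}", max_conn=1)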

Is there a limit on these data requests?

Is there a community-friendly way to make such large data requests?

Thanks,
Tom

Nabil Freij

Jan 13, 2022, 1:43:25 PM
to SunPy
Hello,

There is no limit within the sunpy code that I am aware of, so any error like that would have come from the server supplying the data.
It is possible that the parallel downloader is asking too much of the server; can you try passing max_conn=1 as a keyword to fetch?  It will slow the download, but it might complete.

I assume this is a VSO request; could you provide the snippet of code doing the search?

If a download fails, you should be able to call Fido.fetch again and it will skip over any files that it has already acquired (it only does a filename check).
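For example, a rough sketch of that retry loop (search_result would come from your earlier Fido.search; the path is illustrative):

from sunpy.net import Fido

# search_result comes from an earlier Fido.search (assumed here).
files = Fido.fetch(search_result, path="./data/{file}", max_conn=1)
for _ in range(3):  # cap the retries so one persistently bad file cannot loop forever
    if not files.errors:
        break
    # Passing the previous Results object back retries only the failures.
    files = Fido.fetch(files, path="./data/{file}", max_conn=1)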

If you need that much data, I wonder if using JSOC and doing a tar export (https://docs.sunpy.org/en/stable/api/sunpy.net.jsoc.JSOCClient.html) is the best way, as they will archive all the data into one bulk download instead of serving thousands of individual files.
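For example, a rough sketch using the drms package (which JSOCClient wraps underneath); the series, cadence, email, and download path are illustrative, and JSOC requires a registered notify address:

import drms

# Illustrative only: the email must be registered with JSOC beforehand.
client = drms.Client(email="you@example.com")

# 7 days of AIA 171 A at 2-minute cadence, staged server-side as one tar archive.
request = client.export(
    "aia.lev1_euv_12s[2022.01.01_TAI/7d@120s][171]",
    method="url-tar",
    protocol="fits",
)
request.wait()                      # block until JSOC finishes packaging the export
request.download("/path/to/data")   # a single tar file instead of ~5000 FITS files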

Cheers,
Nabil

Tom Bridgman

Jan 14, 2022, 7:26:56 AM
to SunPy
Some earlier testing was getting 7+ days in a request.  I wonder if someone didn't like me beating so consistently on their server... :^)
I already use max_conn=1 to limit that load.

I may explore the JSOC option.  I think I last used it under SolarSoft many years ago...

Thanks,
Tom

import glob
import os

from sunpy.net import Fido

# timeRange, instrument, wavelengthWindow, timeSample, fitsL1DataBasePath,
# and options are defined earlier in the full script.
remoteFilesFound = Fido.search(timeRange, instrument, wavelengthWindow, timeSample)
print(remoteFilesFound)
print(len(glob.glob(os.path.join(fitsL1DataBasePath, '*.fits'))), ' files in ' + fitsL1DataBasePath)
input("Ready for data retrieval. Press [Enter] to continue, [ctrl-c] to exit...")

# Download data
if os.path.exists(fitsL1DataBasePath):
    # Retrieve files, skipping those already on disk.
    print("Retrieving datasets...")
    resultRecord = Fido.fetch(remoteFilesFound, path=os.path.join(fitsL1DataBasePath, '{file}'),
                              max_conn=1, overwrite=False)

    if options.retry:
        # Retry based on the error list from the first pass.
        if len(resultRecord.errors) > 0:
            # Take a second crack at data retrieval.
            print("%d retrieval errors. Retry retrieving datasets..." % len(resultRecord.errors))
            resultRecord = Fido.fetch(resultRecord, path=os.path.join(fitsL1DataBasePath, '{file}'),
                                      max_conn=1, overwrite=True)

