Comparing Fido.search results with existing downloaded files?


Tom Bridgman

Oct 25, 2021, 3:55:58 PM
to SunPy
I need to download a large number of SDO files from the VSO.  I already have an existing local archive that contains a significant fraction of the files I want to process.  Is there a way to compare the structure returned from Fido.search with the FITS file names in a directory?

remoteFilesFound = Fido.search(timeRange, instrument, wavelengthWindow, timeSample)

The return structure of Fido.search is a collection of tables that I can examine.  I can get the metadata of one file in the first table, say
print(remoteFilesFound.tables[0][0])
and I can get a file identifier from that:
print(remoteFilesFound.tables[0][0]['fileid'])
aia__lev1:171:1350432047

But is there a way to map this back to the file NAME to determine if I've already downloaded it?

I'm looking for something more flexible than the error record returned by Fido.fetch.

SolarSoft had a way of skipping files that already existed in the output directory.  I'm looking for a way to avoid beating on the archives too heavily.

Hopefully I'm missing something trivial...

Thanks,
Tom

DavidPS

Oct 26, 2021, 6:51:28 AM
to SunPy
Maybe the "easiest" way is to use the database package; its documentation includes instructions for adding a full directory of FITS files.
Comparing the search result from Fido with the database content should be possible. The database should work with VSO directly, but I don't know how well it currently works with Fido. Anyway, both return Tables, so in principle you should be able to do a setdiff on the tables.
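
A minimal sketch of the set-difference idea, using plain Python sets rather than the database package. It assumes every row of a search table exposes a 'fileid' column (as the VSO rows in the first post do) and that you keep your own record of the fileids you have already fetched. The manifest file and all helper names are hypothetical, not part of SunPy.

```python
import json
from pathlib import Path


def collect_fileids(tables):
    """Gather the 'fileid' value from every row of every result table.

    `tables` is any iterable of row sequences, e.g. a Fido search result,
    assuming each row supports row['fileid'].
    """
    return {str(row["fileid"]) for table in tables for row in table}


def load_manifest(manifest_path):
    """Return the set of fileids recorded as already downloaded."""
    path = Path(manifest_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_manifest(manifest_path, fileids):
    """Write the downloaded fileids to disk as a sorted JSON list."""
    Path(manifest_path).write_text(json.dumps(sorted(fileids), indent=2))


def missing_fileids(tables, manifest_path):
    """Fileids present in the search result but absent from the manifest."""
    return collect_fileids(tables) - load_manifest(manifest_path)
```

A manifest like this only knows about files fetched through the same workflow. It does not by itself recognize files that were already sitting in an older archive.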

I think this would be a great example in our gallery!!

David

Tom Bridgman

Oct 26, 2021, 8:14:46 AM
to SunPy
Thanks.  I've experimented with the other JSOC interface (I don't really understand the difference between them).  I'm used to checking the examples but forgot to check the User Guide.

I am trying to write something that might be incorporated into SunPy so that may happen.

Tom

Stuart Mumford

Oct 27, 2021, 6:26:05 AM
to su...@googlegroups.com
Hi Tom,

Doing this manually is a little tricky because of the way VSO works: you don't find out the filename until you request the files for download, which happens inside Fido.fetch.

Fido.fetch by default will not transfer a file if the path it is trying to download to already exists. So if you construct the correct path= argument to Fido.fetch it should "automatically" skip the files which already exist.
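
A minimal sketch of that approach, assuming the existing archive keeps every file in a single directory under the same names the server supplies to SunPy. The helper names and the archive location are hypothetical; the `path=` argument with a `{file}` placeholder and the `overwrite` flag are existing Fido.fetch options.

```python
from pathlib import Path


def archive_path_template(archive_dir):
    """Build a Fido.fetch path template pointing into the local archive.

    The '{file}' placeholder is filled in by Fido with the name the
    server supplies for each file.
    """
    return str(Path(archive_dir).expanduser() / "{file}")


def fetch_into_archive(search_results, archive_dir):
    """Fetch search results into archive_dir, leaving existing files alone."""
    # Imported lazily so the template helper above has no SunPy dependency.
    from sunpy.net import Fido

    return Fido.fetch(
        search_results,
        path=archive_path_template(archive_dir),
        overwrite=False,  # the default: an existing target path is not transferred again
    )
```

Because the filename is only known once the download request is made, the server is still contacted for every record; what is avoided is transferring data for files that are already in place.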

Hope that helps,
Stuart

Tom Bridgman

Oct 27, 2021, 8:11:31 AM
to SunPy
Stuart,
Interesting.  I don't recall knowing that it would skip a file that was already downloaded.
I'll have to look more closely at the Fido.fetch return record to examine how it records skipped files, and perhaps files that might be partially downloaded.
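
A small sketch for inspecting what Fido.fetch hands back. It assumes the return value behaves like a list of local file paths with an `errors` attribute holding one entry per failed download, each with `url` and `exception` fields, which is how parfive's Results object is documented. The helper name and the zero-byte check are hypothetical choices, and a size test will not catch every partial download.

```python
from pathlib import Path


def summarize_fetch(results):
    """Split a fetch result into usable files, suspicious files, and failures."""
    ok, suspect = [], []
    for name in results:
        path = Path(name)
        # A missing or empty file is treated as a likely partial download.
        if path.is_file() and path.stat().st_size > 0:
            ok.append(path)
        else:
            suspect.append(path)
    # Each error entry is assumed to carry the URL and the exception raised.
    failed = [(err.url, repr(err.exception)) for err in getattr(results, "errors", [])]
    return ok, suspect, failed
```

If some downloads failed, passing the returned object back to Fido.fetch retries just those transfers.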

Thanks,
Tom

William Bridgman

Oct 27, 2021, 10:24:25 AM
to su...@googlegroups.com
Okay, I've found the overwrite flag in Fido.fetch. And I was able to
burrow down into the levels of abstractions to find some of the
variables I sought in the fetch return records.

The bottom line is there appears to be no *reliable* way to compare
the actual files sitting in a local directory with the records
returned by Fido.search(). The only real option for gap-filling
appears to be repeatedly beating on the remote server. Seems rather
uncivilized considering I'm often trying to retrieve 10,000+ files for
any given project. :^)

I can imagine constructing a hypothesized file name from the metadata
returned in the search table to do the comparison, but I do know that
a VSO search by IDL vs SunPy returns different file names for the same
data, so there is no guarantee that will work reliably for any given
server.
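
As a rough sketch of that comparison, the snippet below reduces file names to a case- and punctuation-insensitive key before matching. It assumes the IDL and SunPy names for the same record differ only in case and in the separator characters used; that is an assumption, not something this thread confirms. The helper names and the example names in the docstring are made up for illustration.

```python
import re
from pathlib import Path


def name_key(filename):
    """Reduce a file name to lowercase alphanumeric tokens joined by '_'."""
    name = Path(filename).name.lower()
    return "_".join(re.findall(r"[a-z0-9]+", name))


def split_by_presence(candidate_names, archive_dir, pattern="*.fits"):
    """Partition candidate names into (already_local, not_found).

    Matching uses name_key(), so 'AIA.2012-10-16.fits' and
    'aia_2012_10_16.fits' count as the same file.
    """
    local_keys = {name_key(p.name) for p in Path(archive_dir).rglob(pattern)}
    present = [n for n in candidate_names if name_key(n) in local_keys]
    absent = [n for n in candidate_names if name_key(n) not in local_keys]
    return present, absent
```

This only helps once candidate names are available from somewhere, for example a listing produced by an earlier fetch; the search table alone does not supply them.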

Thanks for your assistance. At least I'm in a little better place
than I was before!

Dr. William T. "Tom" Bridgman
Scientific Visualization Studio
NASA/Goddard Space Flight Center
Greenbelt, MD 20771
william.t...@nasa.gov
http://svs.gsfc.nasa.gov
Alternate: wtb...@gmail.com