VSO Client skip already downloaded data

Joe Llama

Mar 25, 2016, 8:19:48 AM

to SunPy

Hi all,

I'm writing code that uses about 5 hours' worth of AIA data. I want to be able to change my 5-hour window, re-query the VSO, and then download only the data for that time frame that I haven't already downloaded. Is there a flag I can set in the call `res = client.get(qr)` to skip re-downloading data? At the moment it seems to just overwrite the files. I can't figure out how to get the filenames from the `qr`; if I could, I would just search for existing files and exclude them.

Cheers,

Joe

Joe Llama

Jan 8, 2017, 6:08:31 PM

to SunPy

Just wondering if anyone could help me out with this; I never managed to find a solution.

Thanks!

Stuart Mumford

Jan 9, 2017, 8:15:52 AM

to su...@googlegroups.com

Hi Joe,

We have been working on this over the summer. It's not finished yet but you could test it out for us!

The approach we have taken is to maintain a local database of all the files (and records) downloaded from the VSO, and then compare the results of future searches to the contents of that database. This means you use the `sunpy.database` module to search the VSO rather than the usual client.

The caching code is up for review here: https://github.com/sunpy/sunpy/pull/1785
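
To make the idea above concrete, here is a minimal, self-contained sketch of a local record cache. It is not the `sunpy.database` API and not the code in the pull request; the table layout, the function names (`open_cache`, `split_new_and_cached`, `mark_downloaded`), and the use of a stable per-record identifier string are all assumptions made for illustration.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local table of records already downloaded.

    Hypothetical helper: pass a file path to keep the cache between sessions.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded ("
        "record_id TEXT PRIMARY KEY, local_path TEXT NOT NULL)"
    )
    return conn


def split_new_and_cached(conn, record_ids):
    """Split search results into records to fetch and records already held."""
    new, cached = [], []
    for record_id in record_ids:
        row = conn.execute(
            "SELECT local_path FROM downloaded WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        if row is None:
            new.append(record_id)
        else:
            cached.append((record_id, row[0]))
    return new, cached


def mark_downloaded(conn, record_id, local_path):
    """Record a finished download so that later searches can skip it."""
    conn.execute(
        "INSERT OR REPLACE INTO downloaded VALUES (?, ?)",
        (record_id, local_path),
    )
    conn.commit()
```

With a cache like this, a shifted time window only triggers downloads for the records returned by `split_new_and_cached` as new; everything else is read from the stored local path. Like the approach described above, the sketch treats a record as unchanged once it has been downloaded.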

Let me know if you want to test it and need a hand with installing SunPy from git.

Stuart

Joe Hourcle

Jan 9, 2017, 9:18:28 PM

to su...@googlegroups.com, Joe Llama

Blah. I had the wrong ‘From’ address on this, so Google Groups bounced it.

-Joe

> Begin forwarded message:
>
> From: Joe Hourcle <one...@dcr.net>
> Subject: Re: {SunPy} Re: VSO Client skip already downloaded data
> Date: January 9, 2017 at 9:07:25 AM EST
> To: su...@googlegroups.com

>
>
>> On Jan 8, 2017, at 6:08 PM, Joe Llama <joe....@lowell.edu> wrote:
>>
>> Just wondering if anyone could help me out with this; I never managed to find a solution.
>

> I’m not sure how SunPy handles downloading from the VSO, but on the VSO side, it’s really, really messy.
>
> The archive at Stanford stores the data without scientific headers, so a process has to be run to add them. That process also sets the timestamp for the image, which affects what the file gets named.
>
> So you can’t do the normal HTTP thing of passing back the Last-Modified header and asking only for updates since that time. That’s not only because you don’t know the filename of what you’re about to request, but also because the process that we pass off to isn’t smart enough to check that (it wasn’t written to be a CGI).
>
> The only way to get around it is what the IDL client is doing — send a HEAD first, which will get you the file size and filename. (it will *not* have an accurate Last-Modified time, however) … and then compare those.
>
> Of course, if only the header values are modified, or you’re asking for uncompressed data and the data has changed … the file size stays the same.
>
> (and the web server actually has to run the full processing to determine file size, so it can beat down the server if you’re asking for tarballs).
>
> Hopefully in the next month or two, this won’t be a problem for *most* of the AIA data. The SDAC has gotten some additional storage, and I’ve begun generating static files for AIA. (I just haven’t yet set up something to ‘watch’ for changes, and copying from Stanford is not that fast, so it’s going to take me a while to get back to the beginning of the mission).
>
> -Joe
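
The HEAD-then-compare approach described in the message above can be sketched in Python as follows. This is a minimal illustration, not what the IDL client or SunPy actually does: the function names are hypothetical, and it assumes the server reports the filename in a `Content-Disposition` header and the size in `Content-Length`.

```python
import os
import urllib.request
from email.message import Message


def parse_head(headers: Message):
    """Extract (filename, size) from response headers; either may be None."""
    filename = headers.get_filename()  # parsed from Content-Disposition
    length = headers.get("Content-Length")
    return filename, (int(length) if length is not None else None)


def head_info(url, timeout=30):
    """Send a HEAD request and return (filename, size) as reported."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_head(response.headers)


def needs_download(filename, size, directory):
    """Return True unless a local file with this name and size exists."""
    if filename is None or size is None:
        return True  # not enough information to prove we already have it
    local = os.path.join(directory, os.path.basename(filename))
    return not (os.path.isfile(local) and os.path.getsize(local) == size)
```

A caller would run `head_info(url)` for each result and fetch only when `needs_download(...)` is true. As the message points out, a matching size does not prove the contents are identical, and each HEAD request still costs the server a full processing run.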

Stuart Mumford

Jan 10, 2017, 7:39:52 AM

to su...@googlegroups.com

Hi Joe(2),

I should point out that the mechanism I was talking about assumes the data in a VSO record never changes. As and when the VSO gets the ability to tell us if something changes, we should add it into this feature. Perhaps you can get me on IRC sometime and we can talk about what would be needed to implement it?

Stuart