JSOC IRIS LEVEL1 query


jakub....@cfa.harvard.edu

Feb 13, 2017, 11:25:47 AM
to SunPy
When I query iris.lev1 data on JSOC using sunpy (query below), sunpy does not return any FITS files. The response contains the expected file list, so the query knows the files exist. However, when I try to download the IRIS files, no files are returned and the emailed URL contains no data. The same JSOC query returns the proper files when I use hmi or aia series. I would appreciate it if anyone could shed some light on why I am encountering this error. I am using sunpy 0.7.6.



>>> from sunpy.net import jsoc
>>> client = jsoc.JSOCClient()
>>> response = client.query(jsoc.Time('2017-02-09T00:00:00', '2017-02-09T01:00:00'),jsoc.Series('iris.lev1'), jsoc.Notify("em...@email.org")) #I use my real address
>>> print response

DATE TELESCOP INSTRUME T_OBS WAVELNTH WAVEUNIT
-------------------- -------- -------- ----------------------- --------------- ---------------
2017-02-12T00:00:39Z IRIS FUV 2017-02-08T23:59:40.49Z Invalid KeyLink Invalid KeyLink
2017-02-12T00:00:41Z IRIS SJI 2017-02-08T23:59:41.79Z Invalid KeyLink Invalid KeyLink
2017-02-12T00:00:43Z IRIS NUV 2017-02-08T23:59:42.84Z Invalid KeyLink Invalid KeyLink
2017-02-12T00:00:44Z IRIS FUV 2017-02-08T23:59:43.76Z Invalid KeyLink Invalid KeyLink
2017-02-12T00:00:46Z IRIS SJI 2017-02-08T23:59:45.76Z Invalid KeyLink Invalid KeyLink
2017-02-12T00:00:48Z IRIS NUV 2017-02-08T23:59:47.88Z Invalid KeyLink Invalid KeyLink
2017-02-12T00:00:50Z IRIS FUV 2017-02-08T23:59:49.95Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:40Z IRIS SJI 2017-02-09T00:00:21.13Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:43Z IRIS NUV 2017-02-09T00:00:52.20Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:45Z IRIS FUV 2017-02-09T00:01:23.31Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:47Z IRIS SJI 2017-02-09T00:01:24.59Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:49Z IRIS NUV 2017-02-09T00:01:25.86Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:51Z IRIS FUV 2017-02-09T00:01:26.98Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:52Z IRIS SJI 2017-02-09T00:01:28.74Z Invalid KeyLink Invalid KeyLink
2017-02-12T14:04:54Z IRIS NUV 2017-02-09T00:01:30.67Z Invalid KeyLink Invalid KeyLink

>>> res = client.get(response)
Request JSOC_20170213_366_X_IN was exported at 2017.02.13_15:44:21_UT and is ready to download.
1 URLs found for download. Totalling 0MB


Jakub Prchlik

Joe Hourcle

Feb 13, 2017, 12:35:22 PM
to su...@googlegroups.com, jakub....@cfa.harvard.edu

> On Feb 13, 2017, at 11:25 AM, jakub....@cfa.harvard.edu wrote:
>
> When I query iris.lev1 data on JSOC using sunpy (query below), sunpy does not return any FITS files. The response contains the expected file list, so the query knows the files exist. However, when I try to download the IRIS files, no files are returned and the emailed URL contains no data. The same JSOC query returns the proper files when I use hmi or aia series. I would appreciate it if anyone could shed some light on why I am encountering this error. I am using sunpy 0.7.6.


I was under the impression that the JSOC was being used for the long-term archiving of IRIS lev0 data, with the lev1 data being indexed in the HEK and served directly from LMSAL. (This is how the VSO handles it.)

Unfortunately, I only have access to search the ‘DRMS’ database at the JSOC, not the ‘SUMS’ database, so I can’t look to see if the JSOC is archiving that series or not.

(DRMS -> tracks the info to make the FITS headers; SUMS -> tracks where the data is actually stored: online, nearline (tape), offline (tapes not in the robot), or not tracked at all (i.e., not stored).)
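
(For illustration, a minimal sketch of asking the DRMS side what it knows about a series, via JSOC’s public jsoc_info HTTP interface; op=series_struct is one of its operations. This reflects DRMS metadata only and says nothing about what SUMS actually holds.)

  # Sketch: ask jsoc_info for the DRMS-side description of a series.
  # A status of 0 means DRMS knows the series; this cannot tell you
  # whether SUMS still stores the underlying files.
  import json
  import urllib.request

  url = ('http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info'
         '?op=series_struct&ds=iris.lev1')
  with urllib.request.urlopen(url) as resp:
      info = json.load(resp)

  print(info['status'])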

-Joe

Kolja Glogowski

Feb 13, 2017, 1:07:09 PM
to SunPy
Hi Jakub,

you could try to use the drms Python module:

https://pypi.python.org/pypi/drms

After installing the package using pip, you should be able to download the files using the following commands:

  >>> import drms
  >>> c = drms.Client(email='na...@example.com', verbose=True)
  >>> r = c.export('iris.lev1[2017-02-09T00:00:00Z-2017-02-09T01:00:00Z]')
  >>> r.download('.')
  Downloading file 1 of 240...
      record: iris.lev1[2017-02-09T00:00:21.13Z][29247043]{image_lev1}
    filename: image_lev1.fits
    -> "iris.lev1.20170209T00002113Z.29247043.image_lev1.fits"
  [...]

For other data series, you might need to add protocol='fits' to the export command in order to get FITS files with header keywords. For IRIS data this does not seem to be necessary: the FITS files apparently already contain all the header entries when exported using the 'as-is' protocol (which is the default for drms.Client.export()).
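
For example, a 'fits' export looks like this (a sketch along the lines of the commands above, using the same placeholder address):

  # Sketch: request server-side FITS files with current header
  # keywords. protocol='fits' implies method='url'.
  import drms

  c = drms.Client(email='na...@example.com', verbose=True)
  r = c.export('iris.lev1[2017-02-09T00:00:00Z-2017-02-09T01:00:00Z]',
               method='url', protocol='fits')
  r.wait()          # block until JSOC has finished staging the files
  r.download('.')   # download into the current directory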

NB: When using the 'url_quick'/'as-is' method/protocol (the default), no export URL is created. If you need this URL, you can specify method='url' when calling drms.Client.export(). The protocol='fits' option implies method='url', but it creates a server-side copy of all the requested files that includes all the FITS header keywords, which is not necessary for IRIS data.

So if you just want a request URL, you can use the following command:

  >>> r = c.export('iris.lev1[2017-02-09T00:00:00Z-2017-02-09T01:00:00Z]',
  ...              method='url')
  >>> print(r.request_url)
  Export request pending. [id="JSOC_20170213_416_X_IN", status=2]
  Waiting for 0 seconds...


If you want to learn more about the drms Python module, you can check out the tutorial:


and have a look at the following HMI Science Nugget:



Cheers,
Kolja

Joe Hourcle

Feb 13, 2017, 1:16:06 PM
to su...@googlegroups.com, Kolja Glogowski

> On Feb 13, 2017, at 1:07 PM, Kolja Glogowski <kol...@gmail.com> wrote:
>
> Hi Jakub,
>
> you could try to use the drms Python module:
>
> https://pypi.python.org/pypi/drms

...

> For other data series, you might need to add protocol='fits' to the export command in order to get FITS files with header keywords. For IRIS data this does not seem to be necessary: the FITS files apparently already contain all the header entries when exported using the 'as-is' protocol (which is the default for drms.Client.export()).


I would still recommend using the ‘fits’ protocol.

LMSAL generates the files with the scientific headers at the time of processing. If the headers later change but the data do not, the stored files will *not* be updated.

I haven’t done a review of the IRIS data, but for AIA, there’s a 100% chance of the headers having changed for the first year or two of data, tapering off to roughly a 1% chance near the present. (I admit, I ran this analysis last year, so it’s possible it’s changed.)

If you generate files ‘as-is’, you won’t get the updated headers, which could possibly lead to mistakes in the analysis.

-Joe

Kolja Glogowski

Feb 15, 2017, 12:29:08 AM
to SunPy, kol...@gmail.com
Hi Joe,

thanks for your comment concerning the FITS headers.

I actually compared the original ('as-is') header from one of the files in Jakub's example with the header of the corresponding file that was obtained using the protocol='fits' option. Both headers were identical in this case, except for CHECKSUM, which probably differed because the date in the comment belonging to the DATASUM keyword had changed during the export process.
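
(A minimal sketch of this kind of comparison using astropy; the file names are illustrative:)

  # Sketch: diff an 'as-is' header against a protocol='fits' header.
  from astropy.io import fits

  hdr_asis = fits.getheader('iris_asis.fits')       # hypothetical names
  hdr_fits = fits.getheader('iris_exported.fits')

  diff = fits.HeaderDiff(hdr_asis, hdr_fits)
  print(diff.identical)    # False if anything differs
  print(diff.report())     # lists the differing keywords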

You are right that the keyword entries in the JSOC database can change, and that header entries in the original FITS files might be outdated. But the underlying problem (the metadata can be updated independently from the FITS files) is not resolved by simply using the protocol='fits' export option. For example, the FITS headers of files that I download today could already be out of date by tomorrow, and AIA files that were downloaded a few years ago most certainly contain obsolete metadata.

I think the best way to handle this is to just ignore the metadata from the FITS headers and instead query/store the metadata separately. At KIS I usually don't have to deal with this, because we have our own DRMS server and all the metadata for data series from JSOC are replicated automatically. But there are exceptions, where I need only a few files from JSOC. In this case I just download the 'as-is' files and use a separate DRMS query for the same dataset to obtain the corresponding metadata, which I then usually store as an HDF5 file. I also make sure to store the record number and storage unit number for each file (they can be obtained using the special keywords *recnum* and *sunum*). This way I can always check if the metadata was updated (recnum changed) or the actual file was replaced by a new one (sunum changed).
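
A rough sketch of that workflow (the keyword list and output file name are just examples):

  # Sketch: download 'as-is' files, query the current metadata
  # separately, and store it (with *recnum* and *sunum*) as HDF5.
  import drms

  ds = 'iris.lev1[2017-02-09T00:00:00Z-2017-02-09T01:00:00Z]'
  c = drms.Client(email='na...@example.com')

  r = c.export(ds)    # defaults: method='url_quick', protocol='as-is'
  r.download('.')

  keys = c.query(ds, key='T_OBS, INSTRUME, *recnum*, *sunum*')
  keys.to_hdf('iris_lev1_meta.h5', key='metadata')   # needs PyTables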

Cheers,
Kolja


PS.: The following is just a short explanation on how the JSOC export system works. I find it quite hard to get information about it, so maybe this is helpful to somebody.

FITS files from the HMI instrument are usually stored at JSOC without any header keywords (metadata), while FITS files from other instruments (like AIA or IRIS) may have metadata included in the FITS files from the time when they were imported into the JSOC system. After importing the files, the corresponding metadata are managed by the DRMS and stored in a dedicated database. If only the metadata of a record need to be changed, the DRMS database entries get updated, while the FITS headers of the files stored at JSOC will not be altered.

When an export request is submitted using the protocol='fits' option, the system at JSOC creates new files for this particular request, by using the (image) data from the original FITS files and generating a new FITS header from the current metadata in the DRMS database. This can take some time and might put the JSOC servers under considerable load for large export requests.

The server load (and the wait time) can be significantly reduced using protocol='as-is'. In this case the system does not create new files, but instead returns links to the original (unaltered) FITS files which have no (scientific) metadata stored in their headers (HMI), or may contain possibly outdated metadata (AIA or IRIS). The most recent metadata can be obtained independently, by directly querying the DRMS (for example by calling drms.Client.query() from the drms Python module or by directly using the HTTP/JSON interface from JSOC). If the method='url' option is used, the system also creates a plain-text file containing a table of the most recent metadata for each requested record, which can be easily parsed (for example using pandas.read_table()), and used instead of the (possibly obsolete) FITS header entries.
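
To make the recnum/sunum check concrete, here is a small sketch that continues the hypothetical HDF5 file from the example above:

  # Sketch: re-query the same records later and compare them with the
  # stored metadata. A changed recnum means the metadata was updated;
  # a changed sunum means the stored file itself was replaced.
  import pandas as pd
  import drms

  ds = 'iris.lev1[2017-02-09T00:00:00Z-2017-02-09T01:00:00Z]'
  c = drms.Client(email='na...@example.com')

  old = pd.read_hdf('iris_lev1_meta.h5', key='metadata')
  new = c.query(ds, key='T_OBS, INSTRUME, *recnum*, *sunum*')

  if old.equals(new):
      print('Records unchanged since the metadata was stored.')
  else:
      print('Metadata and/or files changed; inspect recnum/sunum.')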

Joe Hourcle

Feb 15, 2017, 10:58:26 AM
to su...@googlegroups.com

blah … forgot to change my ‘from’ address when sending.



> Begin forwarded message:
>
> From: Joe Hourcle <one...@dcr.net>
> Subject: Re: {SunPy} Re: JSOC IRIS LEVEL1 query
> Date: February 15, 2017 at 10:57:41 AM EST
> To: su...@googlegroups.com
> Cc: Kolja Glogowski <kol...@gmail.com>
>
>
> I recommend that people who haven’t had to manage a DRMS/SUMS instance not read this. We’ve had a rather high rate of burnout from people who have tried to do it. If you *really* want to go this route, let me know and I’ll see if I can get Niles to set you up with an account on the NSO jabber server … we have a support group on there.
>
>
>> On Feb 15, 2017, at 12:29 AM, Kolja Glogowski <kol...@gmail.com> wrote:
>>
>> Hi Joe,
>>
>> thanks for your comment concerning the FITS headers.
>>
>> I actually compared the original ('as-is') header from one of the files in Jakub's example with the header of the corresponding file that was obtained using the protocol='fits' option. Both headers were identical in this case, except for CHECKSUM, which probably differed because the date in the comment belonging to the DATASUM keyword had changed during the export process.
>
> Yeah … I’m not sure if that’s the best way to do things. (I’m also of the opinion that there should be some information about the version of DRMS that generated the file, so that if we discover in the future that NetDRMS 7 was generating improper FITS files, there’d be a way to know that a given file was generated with it and needed to be re-generated.)
>
> (I think it was 2.7 that fixed a problem with something in HMI … but 7 didn’t insert the DATASUM and CHECKSUM values back into the headers when exporting … which you’ll see if you download data from SDAC, as the attempt to replace it has dragged on for about 2.5 years or so.)
>
>
>> You are right that the keyword entries in the JSOC database can change, and that header entries in the original FITS files might be outdated. But the underlying problem (the metadata can be updated independently from the FITS files) is not resolved by simply using the protocol='fits' export option. For example, the FITS headers of files that I download today could already be out of date by tomorrow, and AIA files that were downloaded a few years ago most certainly contain obsolete metadata.
>>
>> I think the best way to handle this is to just ignore the metadata from the FITS headers and instead query/store the metadata separately. At KIS I usually don't have to deal with this, because we have our own DRMS server and all the metadata for data series from JSOC are replicated automatically. But there are exceptions, where I need only a few files from JSOC. In this case I just download the 'as-is' files and use a separate DRMS query for the same dataset to obtain the corresponding metadata, which I then usually store as an HDF5 file. I also make sure to store the record number and storage unit number for each file (they can be obtained using the special keywords *recnum* and *sunum*). This way I can always check if the metadata was updated (recnum changed) or the actual file was replaced by a new one (sunum changed).
>
> sunum will give you something that you can check if the data’s changed … but for HMI data (except hmi.S_720s), if you want an actual index to the data you’ll also need ‘slotnum’, which will let you find the underlying file stored in SUMS.
>
> recnum is the prime key for the table of metadata in DRMS, which stores the headers. So if that’s changed, the metadata has changed.
>
> What I had asked for before launch was for the system to insert identifiers for different concepts, so that a researcher had a way to ask DRMS if there had been any updates. The JSOC refused, and said their system was done and they’re not going to make any changes. (Then, a couple of months later, they changed how the replication was going to be handled, and brought in a contractor to re-write the whole thing, taking out all of the security provisions that I had requested, representing the SDAC & VSO.)
>
> For the last few years, I’ve been trying to get people interested in two things :
>
> 1. A way to pass FITS headers with a link to the data rather than the actual data attached, so we could pass updates to the FITS files cleanly without people having to re-download the file. (As best I know, to export from DRMS to get the updated headers, the data must be in the local SUMS.) VOTable allows for this, but FITS does not. When I asked how to handle this on the FITSBITS mailing list, they dismissed it, saying it should never be done.**
>
> 2. A system that would allow for researchers to easily ask if data had been updated … so as you’re about to submit a paper, you can check to make sure nothing’s been changed at the archive that would affect your conclusion. I got some interest from the SolarNet folks, but not too many other people. As we haven’t been able to set a standard for identifiers within the FITS files (and trying to get PIs to change their pipeline is near impossible … even when they’re processing for the ‘Final Archive’ to be sent to NASA, before actually checking with the SDAC to see what we want … or even when it’s still before launch), I think the best approach is to write a program that researchers could run against a directory or list of FITS files, and it would analyze them to figure out what dataset they are, then use knowledge of the datasets to determine unique identifiers and edition, and send off a query to the appropriate place to check if it’s up-to-date. ***
>
>
>
> ** After that incident, I did some testing and discovered that some of the IDL routines would read a file truncated after the header if you didn’t tell it to read the data. I haven’t tested it in PDL, PyFITS or IRAF. And I realized that it wouldn’t be the cleanest way to handle things, but I could claim a new compression scheme (“URL”?) so there was notice about what I did … but then you have to deal with all of the re-mapped ‘Z’ headers.
>
> *** Mind you, this still requires having a way to search for the records … which we can do for many of the sets served by VSO … but all we can do is tell the person to download the file again; we can’t just give them the updates.
>
>
>> PS.: The following is just a short explanation on how the JSOC export system works. I find it quite hard to get information about it, so maybe this is helpful to somebody.
>
> I’m serious. People should stop reading now.
>
>
>> FITS files from the HMI instrument are usually stored at JSOC without any header keywords (metadata), while FITS files from other instruments (like AIA or IRIS) may have metadata included in the FITS files from the time when they were imported into the JSOC system. After importing the files, the corresponding metadata are managed by the DRMS and stored in a dedicated database. If only the metadata of a record need to be changed, the DRMS database entries get updated, while the FITS headers of the files stored at JSOC will not be altered.
>>
>> When an export request is submitted using the protocol='fits' option, the system at JSOC creates new files for this particular request, by using the (image) data from the original FITS files and generating a new FITS header from the current metadata in the DRMS database. This can take some time and might put the JSOC servers under considerable load for large export requests.
>
> This is why the VSO has the network of caching servers. Unless otherwise specified with the ‘site’ keyword when retrieving data, HMI data requests go through NSO, AIA requests go through SDAC. There used to be additional public sites at SAO/CfA, UCLan, ROB and a few others.
>
> IRIS is served from LMSAL if you go through the VSO, not the copy in DRMS.
>
> The problem comes when you start requesting HMI data that’s not at the site you’re downloading from. The issue is that HMI data is stored as multiple files per storage unit (sunum). This is why you need the ‘slotnum’ to determine the sub-directory. The JSOC insists that the sunum is atomic, and said that if we forked SUMS, they wouldn’t support any of the data that was downloaded through our sites.
>
> So, if we don’t have the file you wanted, we have to download a larger block of data. I believe it’s either 16 or 32 files to download from the JSOC even when we only want one image. (Like for the person who just went and downloaded a few different HMI series at a 12hr cadence for multiple years.)
>
> We *did* manage to get the AIA data broken down into individual observations per storage unit, because they had planned on packing 8 different wavelengths per unit. (Which unfortunately may have resulted in some of the problems w/ DRMS & SUMS, as I don’t believe they were ever tested at the size of tables that this would end up generating.)
>
>
>> The server load (and the wait time) can be significantly reduced using protocol='as-is'. In this case the system does not create new files, but instead returns links to the original (unaltered) FITS files which have no (scientific) metadata stored in their headers (HMI), or may contain possibly outdated metadata (AIA or IRIS). The most recent metadata can be obtained independently, by directly querying the DRMS (for example by calling drms.Client.query() from the drms Python module or by directly using the HTTP/JSON interface from JSOC). If the method='url' option is used, the system also creates a plain-text file containing a table of the most recent metadata for each requested record, which can be easily parsed (for example using pandas.read_table()), and used instead of the (possibly obsolete) FITS header entries.
>
> hmm … interesting … I hadn’t thought of that. You’re likely piggybacking off one of the APIs that the JSOC ‘lookdata’ webpage uses. I’d still want to get it into a more universal form so that people could use the typical tools on it.
>
> I was actually hoping to make an OO version of the IDL VSO client … so you’d get a list of objects as the results from vso_search, and you could call general ‘get’, ‘read’, ‘prep’, ‘plot’, etc. on them, rather than having to use different functions depending on which instrument that data came from. … yet another thing that never got done.
>
> -Joe
>
>
>
>

jakub....@cfa.harvard.edu

Feb 17, 2017, 12:11:51 PM
to SunPy
Kolja,

Thank you for recommending the drms module. It works perfectly for what I need.

Jakub Prchlik