JSOC HMI SHARP query

Jens Pomoell

Apr 27, 2015, 1:38:05 PM
to su...@googlegroups.com
Hi,

  I'm trying to query HMI SHARP data (e.g. the hmi.sharp_720s series) using net.jsoc.JSOCClient(), but I have not found a way to accomplish this. The provided example for hmi.m_45s

   client = jsoc.JSOCClient()
   response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.m_45s'))
   print response

   seems to return a set of records as expected (although I don't quite understand the times of the records that I am getting; they seem to be consistently different from what I requested):

        DATE         TELESCOP  INSTRUME  ... WAVELNTH     WAVEUNIT   
-------------------- -------- ---------- ... -------- ---------------
2014-01-05T17:44:53Z  SDO/HMI HMI_FRONT2 ...   6173.0 Invalid KeyLink
2014-01-05T17:46:02Z  SDO/HMI HMI_FRONT2 ...   6173.0 Invalid KeyLink


    For SHARP data, however, one needs to specify the HARP number when using the JSOC web export. It is not clear to me how to accomplish this with JSOCClient. For instance running

   response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.sharp_720s'))

   produces an error:

     JSONDecodeError: Extra data: line 2 column 1 - line 5 column 1 (char 53 - 471)


   How should one request SHARP data? I'm using version 0.6.dev5642

   Best regards,

      Jens

DVD PS

Apr 27, 2015, 2:55:08 PM
to sunpy
Hi Jens,


 
  I'm trying to query HMI SHARP data (e.g. the hmi.sharp_720s series) using net.jsoc.JSOCClient(), but I have not found a way to accomplish this. The provided example for hmi.m_45s

   client = jsoc.JSOCClient()
   response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.m_45s'))
   print response

   seems to return a set of records as expected (although I don't quite understand the times of the records that I am getting; they seem to be consistently different from what I requested):

        DATE         TELESCOP  INSTRUME  ... WAVELNTH     WAVEUNIT   
-------------------- -------- ---------- ... -------- ---------------
2014-01-05T17:44:53Z  SDO/HMI HMI_FRONT2 ...   6173.0 Invalid KeyLink
2014-01-05T17:46:02Z  SDO/HMI HMI_FRONT2 ...   6173.0 Invalid KeyLink

The results are right; however, you don't want to look at the 'DATE' field (I don't know what it means) but at the T_OBS one.
If you do:
print(response.table['T_OBS'])
         T_OBS        
-----------------------
2013.12.31_23:59:52_TAI
2014.01.01_00:00:37_TAI
2014.01.01_00:01:22_TAI
....

shows what you expect for that query. We should probably show T_OBS first in SunPy instead of DATE...
However, be aware of this:
    DATE__OBS     (time)    [DATE-OBS] DATE_OBS = T_OBS - EXPTIME/2.0
    T_OBS         (time)    nominal time

 
    For SHARP data, however, one needs to specify the HARP number when using the JSOC web export. It is not clear to me how to accomplish this with JSOCClient. For instance running

   response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.sharp_720s'))
 
   How should one request SHARP data? I'm using version 0.6.dev5642

I don't think we can actually do it at the moment... it shouldn't be too difficult to implement, but how do we know such a number beforehand?

David

DVD PS

Apr 27, 2015, 3:49:25 PM
to sunpy
    For SHARP data, however, one needs to specify the HARP number when using the JSOC web export. It is not clear to me how to accomplish this with JSOCClient. For instance running

   response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.sharp_720s'))
 

   How should one request SHARP data? I'm using version 0.6.dev5642

I don't think we can actually do it at the moment... it shouldn't be too difficult to implement, but how do we know such a number beforehand?

Ok, I've managed to make the right query with requests and asking for a particular NOAA AR:

import requests
JSOC_INFO_URL = 'http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info'
r = requests.get(JSOC_INFO_URL, params={'seg': '**NONE**', 'link': '**NONE**', 'ds': 'hmi.sharp_720s[][2014.01.01_00:00:00_TAI-2014.01.01_01:00:00_TAI][? NOAA_ARS ~ "11936" ?]', 'key': 'DATE,TELESCOP,INSTRUME,T_OBS,WAVELNTH,WAVEUNIT', 'op': 'rs_list'})
r.json()

and you get the records for that active region. Now, because JSOC changes the format of the ds field depending on the series you are requesting, we need to adapt the way it's built based on what you query...
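
To make the pieces of that request easier to vary, here is a small sketch that only assembles the parameters and prints the resulting URL, without contacting JSOC. The function name rs_list_params is hypothetical; the parameter names and values are the ones used in the request above.

```python
from urllib.parse import urlencode

JSOC_INFO_URL = "http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info"


def rs_list_params(record_set, keys):
    """Build the query parameters for a jsoc_info 'rs_list' request."""
    return {
        "op": "rs_list",
        "ds": record_set,          # series name plus its bracketed clauses
        "key": ",".join(keys),     # comma-separated keyword names
        "seg": "**NONE**",
        "link": "**NONE**",
    }


params = rs_list_params(
    'hmi.sharp_720s[][2014.01.01_00:00:00_TAI-2014.01.01_01:00:00_TAI]'
    '[? NOAA_ARS ~ "11936" ?]',
    ["DATE", "T_OBS"],
)
# Shows the URL that would be requested; no network access happens here
print(JSOC_INFO_URL + "?" + urlencode(params))
```

The dictionary can be passed as params= to requests.get exactly as in the snippet above.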

Anyone up for deciphering the JSOC formats for all the data products?

David

Jens Pomoell

Apr 27, 2015, 4:26:16 PM
to su...@googlegroups.com
Hi David,




The results are right; however, you don't want to look at the 'DATE' field (I don't know what it means) but at the T_OBS one.


Thanks, I had the feeling I was missing something. 

Jens Pomoell

Apr 27, 2015, 4:27:21 PM
to su...@googlegroups.com
That's great, thanks!

 

Philip Scherrer

Apr 28, 2015, 1:52:35 PM
to su...@googlegroups.com
First, note that there is no WAVEUNIT keyword in the SHARP series; that is the reason for the "Invalid KeyLink" message.
The times will always be about 36 seconds off from what you ask for if you specify times in UTC. All HMI data is observed and computed in
TAI time slots, so just append "_TAI" to the time you request and the result will match. But that does not explain being several hours off, of course.
If you send me the requestid I can track down the strange response.
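
A minimal sketch of the "_TAI" suggestion, assuming the dotted time format used in the jsoc_info request earlier in this thread. The helper names are hypothetical, and the code only labels the given wall-clock values as TAI; it does not convert from UTC.

```python
from datetime import datetime


def as_jsoc_tai(when):
    """Format a datetime as a JSOC time string carrying the _TAI suffix."""
    return when.strftime("%Y.%m.%d_%H:%M:%S") + "_TAI"


def tai_time_clause(start, end):
    """Build a bracketed time-range clause from two datetimes."""
    return "[{0}-{1}]".format(as_jsoc_tai(start), as_jsoc_tai(end))


print(tai_time_clause(datetime(2014, 1, 1, 0, 0, 0), datetime(2014, 1, 1, 1, 0, 0)))
```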

The HARP number is a more difficult problem. The _make_query_payload function concatenates the series name and a prime key clause configured as
a time range but does not allow other prime key or query clauses to be included. But it may work to append the other clauses as needed to
the seriesname. I have not looked deep enough to see if the seriesname is used in contexts other than building the query.
If it is used only for the query, then note that a '[...]' prime key clause can have the form [<keyname>=<value>]. If the keyname is omitted then the
order of the clauses is used to determine which keyname is implied. Check out lookdata.html and note that on the tab after the seriesname is given
it will show the default order of prime key clauses. The sharp series has HARPNUM as the first prime key, so the query must be like:
hmi.sharp_720s[12345][2014-01-01T00:00:00_TAI-2014-01-01T01:00:00_TAI]
But _make_query_payload only knows how to put the time clause in the query.  So I suggest trying
  series = "hmi.sharp_720s[12345]" and time range as is normal, that "should" result in the proper query to the JSOC system.
If you need other clauses you should also be able to append them to the seriesname variable, e.g.
  series = "hmi.sharp_720s[? NOAA_AR=11949 ?][]" to get the data for NOAA region 11949, which was on the disk in Jan 2014.
The empty clause '[]' is a place marker for the HARPNUM clause since sunpy will append the time clause after the series string.

Play with lookdata.html to make queries and compare the result with the sunpy result. Any keyword can be used in a query clause.
If the keyword is not a prime key, then its name needs to be present in a '[?...?]' clause. The name can be present in a prime key clause too.

The funny times may be due to the time clause being placed in the HARPNUM place.
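
The effect of the suggestions above can be sketched with a stand-in for the concatenation step. This is an assumption about how the record-set string ends up being assembled, based only on the description in this message; append_time_clause is a hypothetical helper, not sunpy code.

```python
def append_time_clause(series, start, end):
    """Mimic 'series string + one time-range clause' as described above."""
    return "{0}[{1}_TAI-{2}_TAI]".format(series, start, end)


start, end = "2014-01-01T00:00:00", "2014-01-01T01:00:00"

# HARP number given: the time clause lands in the second (time) slot
print(append_time_clause("hmi.sharp_720s[12345]", start, end))

# Keyword query plus an empty placeholder for the HARPNUM slot
print(append_time_clause("hmi.sharp_720s[? NOAA_AR=11949 ?][]", start, end))

# Bare series name: the time clause ends up in the HARPNUM slot
print(append_time_clause("hmi.sharp_720s", start, end))
```

The last call shows the case that goes wrong for SHARP data: with no clause in front, the time range occupies the first prime key position.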

Philip Scherrer

Apr 28, 2015, 1:54:38 PM
to su...@googlegroups.com
The DATE keyword is, as per FITS standard, the time the file is made, not the time the data is observed.
So it represents the processing time.


On Monday, April 27, 2015 at 10:38:05 AM UTC-7, Jens Pomoell wrote:

DVD PS

Apr 28, 2015, 3:13:02 PM
to sunpy
Thanks Philip!! that's very helpful!!!  - and really smart!
 
The HARP number is a more difficult problem. The _make_query_payload function concatenates the series name and a prime key clause configured as
a time range but does not allow other prime key or query clauses to be included. But it may work to append the other clauses as needed to
the seriesname. I have not looked deep enough to see if the seriesname is used in contexts other than building the query.
If it is used only for the query, then note that a '[...]' prime key clause can have the form [<keyname>=<value>]. If the keyname is omitted then the
order of the clauses is used to determine which keyname is implied. Check out lookdata.html and note that on the tab after the seriesname is given
it will show the default order of prime key clauses. 

Good to know we can use key-value pairs for the query. But if they are used, then all of them have to be provided. What I mean is,
I cannot do:
hmi.sharp_720s[2014.01.01_00:00:00_TAI-2014.01.01_01:00:00_TAI][? NOAA_ARS ~ "11936" ?][HARPNUM=3535]

but I can do:
hmi.sharp_720s[T_REC=2014.01.01_00:00:00_TAI-2014.01.01_01:00:00_TAI][? NOAA_ARS ~ "11936" ?][HARPNUM=3535]
 
The sharp series has HARPNUM as the first primekey so the query must be like:
hmi.sharp_720s[12345][2014-01-01T00:00:00_TAI-2014-01-01T01:00:00_TAI]
But _make_query_payload only knows how to put the time clause in the query.  So I suggest trying
  series = "hmi.sharp_720s[12345]" and time range as is normal, that "should" result in the proper query to the JSOC system.
If you need other clauses you should also be able to append them to the seriesname variable, e.g.
  series = "hmi.sharp_720s[? NOAA_AR=11949 ?][]" to get the data for NOAA region 11949, which was on the disk in Jan 2014.
The empty clause '[]' is a place marker for the HARPNUM clause since sunpy will append the time clause after the series string.

Yes, I can confirm both of these examples work (I've just used specific numbers).
So, if you know the HARP number:
response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.sharp_720s[3535]'))

or with the AR as in your second example:

response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.sharp_720s[? NOAA_AR=11936 ?][]'))


or if you want all in that time range:

response = client.query(jsoc.Time('2014-01-01T00:00:00', '2014-01-01T01:00:00'), jsoc.Series('hmi.sharp_720s[]'))


 

Play with lookdata.html to make queries and compare the result with the sunpy result.  Any keyword can be used in a query clause.

If the keyword is not a prime key, then its name needs to be present in a '[?...?]' clause. The name can be present in a prime key clause too.

I will think about how to make a better API for use in SunPy, so that we don't need to place empty brackets or use other confusing constructs; otherwise we should improve the documentation on our side to explain these cases.
 
The funny times may be due to the time clause being placed in the HARPNUM place.

Yes, as you wrote in the next mail the date was not the observing date.

Thanks a lot!!!

David


Phil Scherrer

Apr 28, 2015, 3:58:09 PM
to su...@googlegroups.com
An almost accurate description of DRMS query formats is in http://hmi.stanford.edu/doc/JSOC/DRMS_dataset_names.pdf
The "Catalog" concept was not implemented as described, and the DSDS discussion is no longer needed since all
of the old DSDS records have been imported into the JSOC with DRMS style names.

But it is still the only nearly complete discussion.

As with any document, having a description written by a hard-earned experienced non-developer user would help.

It could be that automatically generating a template for the prime keys and allowing an appended extra
set of clauses would allow the generality and power of more complex queries to be made.
Note that lookdata.html does this when a checkbox is filled. So the info needed is in the
jsoc_info op=series_struct json response.
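
The template idea can be sketched as follows: one bracket per prime key in its default order, empty where no value is given, with any extra clauses appended at the end. The function name is hypothetical, and the prime key list is passed in by hand here; in practice it would have to come from the series_struct response, whose layout is not assumed in this sketch.

```python
def fill_prime_key_template(series, prime_keys, values=None, extra_clauses=()):
    """Return a record-set string with one positional clause per prime key."""
    values = values or {}
    unknown = set(values) - set(prime_keys)
    if unknown:
        raise ValueError("not prime keys of {0}: {1}".format(series, sorted(unknown)))
    # '[]' is emitted for every prime key that has no value supplied
    slots = "".join("[{0}]".format(values.get(name, "")) for name in prime_keys)
    return series + slots + "".join(extra_clauses)


record_set = fill_prime_key_template(
    "hmi.sharp_720s",
    ["HARPNUM", "T_REC"],  # order as described earlier in the thread
    {"T_REC": "2014.01.01_00:00:00_TAI-2014.01.01_01:00:00_TAI"},
    ['[? NOAA_ARS ~ "11936" ?]'],
)
print(record_set)
```

With these inputs the result is the same ds string that was used in the requests example earlier in the thread.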

Phil

DVD PS

Apr 28, 2015, 6:06:09 PM
to sunpy
Jens, I've made a pull request that goes some way towards solving the problem:
https://github.com/sunpy/sunpy/pull/1400
I still need to write some tests and add more meaningful errors for when something is not allowed. It may be good to go soon. Further in the future, if I come up with some enlightening idea, we could support the SQL statements too...

Phil,

An almost accurate description of DRMS query formats is in http://hmi.stanford.edu/doc/JSOC/DRMS_dataset_names.pdf
The "Catalog" concept was not implemented as described, and the DSDS discussion is now not needed since all
of the old DSDS records have been imported into the JSOC with DRMS style names.

but it is still the only nearly complete discussion.

That's a nice document, thanks! It helps me understand what's actually happening...
 
As with any document, having a description written by a hard-earned experienced non-developer user would help.

agree :)
 
It could be that automatically generating a template for the prime keys and allowing an appended extra
set of clauses would allow the generality and power of more complex queries to be made.
Note that lookdata.html does this when a checkbox is filled.  So the info needed is in the
jsoc_info op=series_struct json response.

Yes, that would be really useful... I've put it on the PR as an item to do. It would also be useful to have a single function which finds and lists all the possible parameters.

Thanks Phil!
David




Phil Scherrer

Apr 28, 2015, 6:48:00 PM
to su...@googlegroups.com
David,
When I started on a JSOC Python interface last fall, before I knew that SunPy had one,
I made code that acts like the front end of lookdata. It ended up with a
table with all the keywords, types, etc., including paths sufficient to fetch
the FITS files directly. There is no need to export them since the metadata comes from
the database via jsoc_info anyway. So one can bypass the whole FITS header part
and bypass jsoc_fetch. I wanted to clean it up and send it in prior to yesterday's
meeting, but proposal writing got in the way.
I was going to use the astropy Table format and have a special type for the segment information.
So far it works to e.g. get the sound travel time through the Sun by doing autocorrelation
of power spectrum of p-modes as an example of what can be done with just the metadata.
I am happy to send it along if there is interest in having a lookdata clone as well as
an exportdata clone in sunpy.
Phil

Stuart Mumford

Apr 29, 2015, 9:25:53 AM
to su...@googlegroups.com
Hi,

Sounds interesting, Phil. The version of the JSOC client in the 0.6 dev
branch does talk to lookdata to preview the records an export request
will generate, but that is about as far as I got.

Stuart

Erkka Lumme

May 19, 2015, 1:46:18 PM
to su...@googlegroups.com
Hi,

I'm quite new to SunPy (and to Python in general), so please filter my comments accordingly. I have also wrestled with the JSOCClient and HARP numbers, and I've found a few issues which have not been mentioned here.

1) Adding the HARP number to the end of the series variable, like series = "hmi.sharp_720s[12345]", results in the correct response from "_query", but fails in the later phase of the data retrieval when "_get" or "_request_data" + "_get_data" are called. The FITS files are not downloaded because the filenames are incorrect. When "_make_query_payload" writes the names of the FITS files to the payload (lines 510-519 in "jsoc.py") (sorry about the messy code format, code highlighting didn't work for some reason):

# Build full POST payload
payload = {'ds': dataset,
           'format': 'json',
           'method': 'url',
           'notify': notify,
           'op': 'exp_request',
           'process': 'n=0|no_op',
           'protocol': jprotocol,
           'requestor': 'none',
           'filenamefmt': '{0}.{{T_REC:A}}.{{CAMERA}}.{{segment}}'.format(series)}

the "series" part of the "filenamefmt" also contains the HARP number ("hmi.sharp_720s[12345]"). When this is fixed so that only "hmi.sharp_720s" is used, the download is successful.
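
One way to express the fix described in 1) is to strip any bracketed clauses before formatting the filename template. This is only a sketch with hypothetical helper names, not the change that was made in sunpy; the template string is the one from the payload excerpt above.

```python
def base_series_name(series):
    """Drop any '[...]' clauses so that only the bare series name remains."""
    return series.split("[", 1)[0].strip()


def filename_format(series):
    """Build the 'filenamefmt' value from the bare series name only."""
    return "{0}.{{T_REC:A}}.{{CAMERA}}.{{segment}}".format(base_series_name(series))


print(filename_format("hmi.sharp_720s[12345]"))
```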

2) The update provided by David (https://github.com/sunpy/sunpy/pull/1400) solves the problem of 1) quite elegantly and is very useful altogether.

However, there seems to be another bug there also. The problem arises when "_query" for some data is run two times. Example:

response1 = client.query(jsoc.Time('2010-05-01T00:20:00', '2010-05-01T00:38:00'), jsoc.Series('hmi.sharp_cea_720s'), jsoc.PrimaryKey('HARPNUM','1'))
response2 = client.query(jsoc.Time('2011-03-10T01:20:00', '2011-03-10T01:28:00'), jsoc.Series('hmi.sharp_cea_720s'), jsoc.PrimaryKey('HARPNUM','401'))

On the first run there's no problem, but the second ends in an error. The reason is the wrong name of the PrimaryKey object. The name is formed in "attrs.py" (lines 86-93):

class PrimaryKey(_VSOSimpleAttr):
    """
    Extra key,values pair produced by the user to query jsoc
    """
    def __init__(self, key, value):
        Attr.__init__(self)
        self.__class__.__name__ += '_{}'.format(key)
        self.value = value

On the first run the key (e.g. "HARPNUM") is added to the class name, resulting in the name "PrimaryKey_HARPNUM". On the second run this results in "PrimaryKey_HARPNUM_HARPNUM", which then causes the error. I don't understand why the old name ("PrimaryKey_HARPNUM") is carried over to the second run. For my own use I applied a workaround:

self.__class__.__name__ = 'PrimaryKey_{}'.format(key)

which fixes the bug. This might be quite a trivial bug, particularly considering that the update is not finished, but I still decided to mention it here.
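
The name is carried over because self.__class__ refers to the class object itself, which all instances share, so "+=" keeps extending the same string on every instantiation. The stand-alone sketch below reproduces this with two stand-in classes (BuggyKey and FixedKey are illustrative names, not the sunpy classes).

```python
class BuggyKey(object):
    def __init__(self, key, value):
        # Mutates the shared class object: the suffix accumulates on every call
        self.__class__.__name__ += "_{}".format(key)
        self.value = value


class FixedKey(object):
    def __init__(self, key, value):
        # Assigns the whole name, so repeating the same key changes nothing
        self.__class__.__name__ = "PrimaryKey_{}".format(key)
        self.value = value


BuggyKey("HARPNUM", "1")
BuggyKey("HARPNUM", "401")
buggy_name = BuggyKey.__name__

FixedKey("HARPNUM", "1")
FixedKey("HARPNUM", "401")
fixed_name = FixedKey.__name__

print(buggy_name)
print(fixed_name)
```

Note that the workaround still stores the name on the class, so instances created with different keys would all report the most recently used key.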

Anyway, thanks a lot to David for his update and to other contributors here for clarifying multiple things for me.

Best regards,
Erkka Lumme