HASH subscripting


KH

Jun 21, 2021, 1:52:49 PM
to idl-pvwave
Hello,

I am trying to learn more about how to work with HASH (and ORDEREDHASH) data, and I was wondering if there is a good resource or two (beyond the IDL help) that could be useful.  One aspect I am trying to figure out is subscripting, both within the data and by key, so that I can update or add more data to the HASH.

I work with data that are produced daily (near-real time - NRT) and then the NRT data are updated ~2-3 weeks later.  So, I need to be able to 1) add new NRT data to the data array stored in a HASH and 2) replace the NRT data with the updated data.  In order to do this I need to be able to subscript the data in the HASH.  Most of the data are in arrays, but there is also a structure containing the database information.  I have done some searching, but have not found a good resource that shows how to modify different types of data in a HASH.

I have had some success modifying arrays, but I am struggling with structures in the HASH. For example, how can I modify the HASH data that is stored in a structure?
IDL> H = HASH('str',CREATE_STRUCT('a',1,'b',2))
IDL> PRINT, H['str'].a
           1
IDL> H['str'].a=0
% Attempt to store into an expression: Structure reference.

Regarding "key" subscripts, my plan is to create the HASH dynamically and so some key names may or may not be included in a given HASH depending on the data product.  For example, one statistical HASH may have MIN, MAX and MEAN variables, but a different one may have MIN, MAX and MEDIAN.  What I would like to do is search for the key (e.g. MEAN) and if it exists, update the data.  I could do it for all possible key names, but the code would be more streamlined if I could find out "where" the data are and use the subscript instead of the key name to update the data.  I know how to do it with a structure, but don't know how to do something similar to the code below for a HASH.

OK = WHERE(TAG_NAMES(STRUCT) EQ 'MEAN',COUNT)
IF COUNT GT 0 THEN STRUCT.(OK[0]) = 1

KEYS = HASH.KEYS()
OK = WHERE(KEYS EQ 'MEAN',COUNT)
??? How do I use the subscript OK to indicate which data in the HASH to edit???

I would greatly appreciate any advice or resources on working with HASH data.
Thanks,
KH

John Correira

Jun 21, 2021, 2:20:28 PM
to idl-pvwave
Are you constrained to using a structure? Seems like a nested HASH would be more flexible:

IDL> H = HASH('str',CREATE_STRUCT('a',1,'b',2), /extract, /fold_case)
IDL> PRINT, H['str','a']
           1
IDL> H['str','a']=0
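
If the value has to stay a structure, a common workaround is to copy it out of the hash, change the copy, and store it back. The direct assignment appears to fail because H['str'] is evaluated as a temporary expression rather than a named variable. A minimal sketch (the variable name s is only for illustration):

```idl
; Keep the value as a structure (no /EXTRACT)
H = HASH('str', CREATE_STRUCT('a', 1, 'b', 2))
s = H['str']      ; copy the structure out of the hash
s.a = 0           ; modify the copy
H['str'] = s      ; store the modified copy back under the same key
PRINT, H['str'].a
```

The cost is one extra copy of the structure per update, which should be negligible for a small database record.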

The DICTIONARY data type might be worth a look as well.

For the second question I would use the HasKey method and do something like

if hash.HasKey('mean') then hash['mean'] = 1

John

KH

Jun 21, 2021, 3:26:07 PM
to idl-pvwave
Thank you John,

The nested HASH could work for my structure needs.  I am still learning HASHes, so it didn't occur to me to nest it.

Concerning the subscripts for keys, I would like to avoid a large number of IF THEN (or CASE) statements to cover all of the possible key names.  For example, if there are 30 possible key names, I would prefer to loop on the keys versus writing out 30 different IF THEN statements.  It also reduces potential future issues if new keys are added to the HASH that may not have been listed in the IF THEN statements.  Are subscripts not an option with HASH keys?

Thanks again
KH

John Correira

Jun 21, 2021, 3:38:43 PM
to idl-pvwave
Hmm, hard to say for sure without knowing more. What about adding the keys to a list (only one place to edit in the future) and then looping over that?

keys = ['a', 'b', 'c', 'd']
foreach key, keys do if hash.HasKey(key) then hash[key] = 1

You can use a number as a key, but you can't subscript a hash the same way you would an array.
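
To avoid listing key names at all, the loop can also run over the keys that the incoming data actually carries. The sketch below assumes each hash value is a 2D field (or a stack of such fields) of the same size; appendLayers and replaceLayer are hypothetical helper names, not part of IDL.

```idl
; Append each statistic in newData to the matching stack in stats
pro appendLayers, stats, newData
  ; Loop over whatever keys the new data carries; no names are hard-coded
  foreach key, newData.Keys() do begin
    new = newData[key]
    if stats.HasKey(key) then begin
      old = stats[key]
      ; Concatenate along the third dimension to add one more layer
      stats[key] = [[[old]], [[new]]]
    endif else begin
      ; First time this statistic appears: add it as a new key
      stats[key] = new
    endelse
  endforeach
end

; Overwrite one layer of an existing stack (e.g. NRT replaced by final data)
pro replaceLayer, stats, key, index, newLayer
  if ~stats.HasKey(key) then return
  arr = stats[key]              ; copy the stack out of the hash
  arr[*, *, index] = newLayer   ; replace the chosen layer
  stats[key] = arr              ; store the updated stack back
end
```

For example, starting from stats = HASH('MEAN', FLTARR(4,3)), a call such as appendLayers, stats, HASH('MEAN', FLTARR(4,3)+1, 'MEDIAN', FLTARR(4,3)) should leave stats['MEAN'] as a 4 x 3 x 2 array and add a new 'MEDIAN' key.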

John

KH

Jun 21, 2021, 4:07:28 PM
to idl-pvwave
Thanks John,

My plan is to read monthly data files that have 2D arrays of multiple statistical products (e.g. MEAN, MIN, MAX).  Depending on the product (temperature, salinity, chlorophyll, etc.) there may be different stat products in the individual files.  What I want to create are stacked data arrays of each stat product (i.e. the data from all 12 months in a year) in a single file, so the file would have 1) a database with the file information, 2) a stacked array of the MEAN, 3) a stacked array of the MIN, 4) a stacked array of the MAX and 5) any other statistical variables called for.  I'm trying to make the code dynamic and flexible without hard coding all of the various stat types.  So, for example, if I decide to add the MEDIAN to my individual stat files, the code would see that new stat product and add it to the HASH as another key, but I don't want to have to manually add lines of code on how to modify the 'MEDIAN' variable when new data become available.  I have figured out how to dynamically create the key names based on my stat files, and now I am trying to figure out how to modify the various stacked arrays based on the stat variable.

All that being said, I think I found the answer to my question regarding key subscripts:
H = HASH('A',1,'B',2,/FOLD_CASE)
KEYS = H.KEYS()
SUB = KEYS[WHERE(KEYS EQ 'A')]
PRINT, H[SUB]

I was not aware of the DICTIONARY function.  I see that it can be more easily manipulated than a structure, but how does it differ from a HASH?  Which would be more efficient for the description above?

Thank you again for your assistance,
Kim

markus.sc...@gmail.com

Jun 23, 2021, 7:58:51 AM
to idl-pvwave
A DICTIONARY is like a HASH, but the keys have to be valid IDL variable names, and its values can be accessed either like a HASH or with the "dot" notation.
If you don't want to hard code the key names, you can't use the "dot" notation, and then I don't see an advantage to using a DICTIONARY.
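
A short illustration of the two access styles (the key name and values are made up):

```idl
d = DICTIONARY('mean', 1.5)
PRINT, d.mean        ; "dot" access: the key is fixed when the code is written
key = 'mean'
PRINT, d[key]        ; bracket access: the key is held in a variable
d[key] = 2.5         ; the same bracket syntax works for assignment
```

Only the bracket form works when the key name is known only at run time, and that form is identical for a HASH.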

John is right about using double hashes to collect data from the monthly files. To stack the data into arrays you would then invert the hash and finally concatenate them to arrays. Here are some simple functions that might help:

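function file2hash, file
  ; Open the SAVE file and restore all of its variables into this function's scope
  Osav=obj_new('IDL_savefile',file)
  Names=Osav.Names()
  Osav.restore,Names
  ; Copy each restored variable into a hash keyed by its variable name
  out=hash('filename',file)
  foreach n,names do out[n]=scope_varfetch(n)
  return,out
end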

pro readUpdate, mainHash, file_list, key_list
  if n_elements(mainHash) eq 0 then mainHash=hash()
  foreach file,file_list,i do mainHash[key_list[i]]=file2hash(file)
end

function getValIfEx, ha, key
  if ha.hasKey(key) eq 0 then return, !null
  return,ha[key]
end

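function invertDoubleHash, ha
  ; Collect the keys of every inner hash into one list, then keep the unique ones
  allKeys=(ha.reduce(lambda('y,x:y+x.keys()'),value=list())).toArray()
  allKeys=allKeys[uniq(allKeys,sort(allKeys))]
  out=hash()
  foreach key,allKeys do begin
    ; For each outer key, fetch the inner value stored under this key (!null if absent)
    oneVal=ha.map('getValIfEx',key)
    ; Drop the outer keys whose inner hash did not contain this key
    out[key]=oneVal.filter(lambda('x:x ne !null'))
  endforeach
  return, out
end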

For the concatenation you then probably do another REDUCE.
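
As an alternative to REDUCE, the concatenation can be written as an explicit loop. The sketch below assumes that every value in the hash is a 2D array of the same size and that the caller supplies the keys in the wanted order, since a plain HASH does not guarantee any key order. stackLayers is a hypothetical helper name.

```idl
; ha   : hash mapping a period key (e.g. a month) to a 2D array
; keys : array of keys in the order the layers should be stacked
function stackLayers, ha, keys
  foreach k, keys, i do begin
    layer = ha[k]
    ; Start the stack with the first layer, then append along dimension 3
    if i eq 0 then stack = layer else stack = [[[stack]], [[layer]]]
  endforeach
  return, stack
end
```

With the inverted hash from above, one statistic could then be stacked by taking oneStat = inv['MEAN'], sorting its keys with k = (oneStat.Keys()).ToArray() and k = k[sort(k)], and calling stackLayers(oneStat, k). Here inv is assumed to be the result of invertDoubleHash.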

I hope this helps, Markus