Amazon Web Services interface


Brian McNoldy

Mar 1, 2018, 8:12:57 AM
to idl-pvwave
Has anyone used IDL to access data hosted by Amazon Web Services (AWS)?  Their buckets and keys aren't quite as straightforward to grab as traditional directories and subdirectories.  It *seems* like this needs a specific interface, and I was not able to find one in the documentation.  I'm still using v8.4, but my question could apply to any version.

Thanks!
Brian

Jim Pendleton

Mar 1, 2018, 10:14:06 AM
to idl-pvwave
Is there a specific bucket setup that's introducing an issue?

There's example syntax for accessing Landsat 8 TIFF data on an S3 server in the docs for the new IDLAsyncBridgeJob class in IDL 8.7.

Jim P.

Brian McNoldy

Mar 1, 2018, 10:27:29 AM
to idl-pvwave
Well, I suppose there are multiple issues... I only have v8.4, I am brand new to the syntax and methods of using AWS (even the example was confusing), and I don't know the filename I want to get ahead of time (I only want the most recent file, which is stored in year/date directories and timestamped).

But, for the sake of others who may be curious, I am looking into real-time Level 2 NEXRAD radar data.  There is some help and description available at https://aws.amazon.com/public-datasets/nexrad/.

If I understand correctly, AWS is not accessible using IDL without this new class?

Thank you,
Brian

markus.sc...@gmail.com

Mar 1, 2018, 10:42:17 AM
to idl-pvwave
The only thing the new class adds is the ability to download the data asynchronously; serial downloads can be done with older IDL versions just as well.

Figuring out which files to download is a separate problem, but I don't know anything about AWS so I can't help there.

Markus

helder.m...@gmail.com

Mar 1, 2018, 10:47:36 AM
to idl-pvwave
Hi Brian,
I just had a quick look at Jim's suggestion. The class described there only runs the download in a separate IDL job (multitasking).
I think you should try the sequence of commands from the bridgeCmd string:

u = IDLnetURL()
!null = u.get(URL=s3Url, FILE=localFile)
obj_destroy, u

Before doing that, you should set s3Url and localFile to the variables given in the loop. Try the first value with suf = suffixes[0] and see how it goes.

Then what you really need to look into is IDLnetURL and see if you can get the directory structure... I don't have time for that now, but can try that tomorrow.
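To get a directory-like view, it helps to know that the S3 listing is just an HTTP GET returning ListBucketResult XML, and that adding `delimiter=/` to the query string makes S3 group keys into `<CommonPrefixes>` elements that behave like subdirectories. Here is a sketch in Python for illustration (the same string-parsing logic carries over to IDL); the sample XML is hypothetical but follows the real ListBucketResult shape:

```python
# Sketch: parse the "subdirectories" (CommonPrefixes) out of a delimiter=/ listing.
# SAMPLE is a hypothetical but correctly shaped S3 ListBucketResult response.
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>noaa-nexrad-level2</Name>
  <Prefix></Prefix>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <CommonPrefixes><Prefix>1991/</Prefix></CommonPrefixes>
  <CommonPrefixes><Prefix>1992/</Prefix></CommonPrefixes>
</ListBucketResult>"""

# S3 responses live in this XML namespace, so findall needs it spelled out.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

def list_prefixes(xml_text):
    """Return the 'subdirectory' names from a delimiter=/ bucket listing."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", NS)]

print(list_prefixes(SAMPLE))  # → ['1991/', '1992/']
```

In a real call you would fetch the XML with `?delimiter=/` (and optionally `?prefix=1991/`) appended to the bucket URL, exactly as IDLnetURL's get method would.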

Cheers,
Helder

Jim Pendleton

Mar 1, 2018, 10:56:15 AM
to idl-pvwave
You guys are correct; I was just using it as an example of the syntax.

You don't need to use the IDLAsyncBridge object.  The core of the request is the IDLnetURL call, which is available in IDL 8.4.

You'll need to parse some XML from the NEXRAD data to get to the data you want.

IDL> url = 'https://noaa-nexrad-level2.s3.amazonaws.com'
IDL> u = idlneturl()
% Loaded DLM: URL.
IDL> out = u.get(url = url, /buffer)
IDL> help, out
OUT             BYTE      = Array[375285]
IDL> print, string(out)

(a lot of XML here)
You may want to use one of the XML parsing objects to get the paths to the files.  Or it may be simple enough to parse "by hand".
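As one way to parse it with a proper XML parser rather than "by hand", here is a sketch in Python (the same extraction could be done in IDL with IDLffXMLDOMDocument or regexes); the sample response is hypothetical but mirrors the real ListBucketResult structure, including the `<IsTruncated>` flag that matters later:

```python
# Sketch: pull the object keys and the truncation flag out of a listing response.
import xml.etree.ElementTree as ET

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <IsTruncated>true</IsTruncated>
  <Contents><Key>1991/06/05/KTLX/KTLX19910605_162126.gz</Key></Contents>
  <Contents><Key>1991/06/05/KTLX/KTLX19910605_205753.gz</Key></Contents>
</ListBucketResult>"""

def parse_listing(xml_text):
    """Return (keys, truncated) from one page of a bucket listing."""
    root = ET.fromstring(xml_text)
    keys = [k.text for k in root.findall("s3:Contents/s3:Key", NS)]
    truncated = root.findtext("s3:IsTruncated", default="false",
                              namespaces=NS) == "true"
    return keys, truncated
```

Each key is a path fragment you append to the bucket's base URL to download that file.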

Then you download the image of interest.

Looking through the XML, I ran across this string "1991/06/05/KTLX/KTLX19910605_205753.gz".  I tacked it onto the base URL.

IDL> url = url + '/1991/06/05/KTLX/KTLX19910605_205753.gz'
IDL> gz = u.get(url = url, /buffer)
IDL> help, gz
GZ              BYTE      = Array[1898985]
IDL> gz = u.get(url = url, file='c:\temp\test.gz')
IDL> file_gunzip, 'c:\temp\test.gz', buffer = b

Now what you do with the data at this point is up to you...
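For what the last two IDL lines do (download the .gz, then decompress it into a memory buffer), here is the equivalent step sketched in Python; since there is no network here, a stand-in byte payload is compressed first to simulate the downloaded file:

```python
# Sketch: in-memory gunzip, the Python analogue of
#   file_gunzip, 'c:\temp\test.gz', buffer = b
import gzip

# Simulate a downloaded .gz payload (stand-in bytes, not real radar data).
payload = b"Level 2 radar volume bytes would go here"
downloaded = gzip.compress(payload)

# Decompress straight into memory; no temporary file needed.
data = gzip.decompress(downloaded)
assert data == payload
```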



Brian McNoldy

Mar 1, 2018, 1:12:41 PM
to idl-pvwave
Thank you all for your input!  Playing around with it, I see that the request brings in only the first 1000 "keys"... and there are MANY times that number.  So I can't parse the result because it doesn't even include what I need.  To get to the file of interest, it seems like one needs to know the key name, which is date and time specific.  I don't think I have time to dig deeper into this now, but it will be brewing in the back of my brain for a while!

Cheers,
Brian

helder.m...@gmail.com

Mar 1, 2018, 2:21:52 PM
to idl-pvwave
Hi Brian,
I tried the same as Jim showed, but instead saved the data to a file:

IDL> u = idlneturl()
% Loaded DLM: URL.
IDL> out = u.get(url = url, filename='C:\Users\Helder\Desktop\testFile.txt')

I looked at the page at https://noaa-nexrad-level2.s3.amazonaws.com in a browser, and it is identical to the file I downloaded.

I hope it helps.

Cheers,
Helder

Paulo Penteado

Mar 16, 2018, 11:11:01 AM
to idl-pvwave
Hello,

It sounds like you are hitting the 1000 key limit on AWS:

IDL> url='https://noaa-nexrad-level2.s3.amazonaws.com'                      
IDL> u=idlneturl()                                                          
IDL> out=u.get(url=url,/buffer,/string)                                   
IDL> print,stregex(out,'<IsTruncated>[[:alpha:]]+</IsTruncated>',/extract)
<IsTruncated>true</IsTruncated>

The list is separated into pages, each with a maximum of 1000 keys. So, one approach is to use the marker parameter to get the next page of up to 1000 keys, and repeat until you get to the end (when IsTruncated becomes false):


This is not the neatest code, but it does the job of retrieving all keys:

u=idlneturl()
baseurl='https://noaa-nexrad-level2.s3.amazonaws.com'
page=0L
lastkeys=list()
firstkeys=list()
lastkey=''
repeat begin
  url=baseurl+'/?marker='+lastkey
  print,'retrieving page ',strtrim(page,2)
  out=u.get(url=url,/buffer,/string)
  istruncated=(stregex(out[1],'<IsTruncated>([[:alpha:]]+)</IsTruncated>',/subexpr,/extract))[-1]
  outs='<'+strsplit(out[1],'<',/extract)
  keys=outs[where(stregex(outs,'<Key>.+',/boolean))]
  keys=reform((stregex(keys,'<Key>(.*)',/extract,/subexpr))[1,*])
  lastkey=keys[-1]
  page+=1
  lastkeys.add,lastkey
  firstkeys.add,keys[0]
  print,'first key: ',keys[0]
  print,'last key: ',lastkey
endrep until istruncated eq 'false'
print,'Found ',strtrim(page,2),' pages'
print,'Last key: ',lastkeys[-1]


I just ran it, and for that particular list it is not really practical to get to the end. It starts with 1991/06/05/KTLX/KTLX19910605_162126.gz, and, after 1000 pages, only gets to 1995/04/09/KMLB/KMLB19950409_230634.gz. It would take a long time (days, I guess) to reach the 2018 data.

Since you only want the most recent file, you can save time by using the prefix parameter to filter the list to keys starting with that prefix. If I set a prefix of 2018/03/, I get 470 pages just for the first 16 days of March 2018. You could set a prefix of yesterday's date and look for the last key (yesterday instead of today, to avoid problems with timezones). A prefix can be added by changing the url, as in

url=baseurl+'/?prefix=2018/03/&marker='+lastkey
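One more observation worth making explicit: because the NEXRAD keys embed a zero-padded YYYYMMDD_HHMMSS timestamp, lexicographic order within one station prefix equals chronological order, so "most recent" is just the maximum key. A sketch in Python (the key list below is hypothetical; in practice it comes from the paged listing above):

```python
# Sketch: pick the most recent key for one station from a list of keys.
# Zero-padded timestamps mean lexicographic max == chronological max.
def latest_key(keys, station="KTLX"):
    """Return the most recent key for `station`, or None if there is none."""
    mine = [k for k in keys if f"/{station}/" in k]
    return max(mine) if mine else None

keys = [
    "2018/03/15/KTLX/KTLX20180315_230634_V06",
    "2018/03/15/KTLX/KTLX20180315_235959_V06",
    "2018/03/15/KMLB/KMLB20180315_234500_V06",
]
print(latest_key(keys))  # → 2018/03/15/KTLX/KTLX20180315_235959_V06
```

The same one-liner works in IDL: take the last element of the sorted key array for the prefix of interest.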


Paulo