wget 3hrly data


Rebecca Ross

Nov 4, 2016, 12:22:09 PM
to HYCOM.org Forum
Hi again,

Thank you so much for sharing wget scripts in the past; they have helped a lot with gathering large volumes of data. I was wondering if you could help me modify one of these scripts to download the 3hrly reanalysis data.

Here is a previous script I had working last year (only slightly modified from one you shared in this forum previously):

#!/bin/bash

WGET='/usr/bin/wget'

# Start date and the range of day offsets to loop over
YEAR='2007'
MONTH='01'
DAY='04'
StartSeq='0'
EndSeq='465'

NCSS='http://ncss.hycom.org/thredds/ncss/grid'
MODEL='GLBu0.08'
EXPT='expt_19.1'

# NCSS query: variables and bounding box
VARS='var=water_u,water_v'
SPATIAL='spatial=bb'
NORTH='north=70'
SOUTH='south=40'
EAST='east=0'
WEST='west=-30'

for PlusDay in $(seq "$StartSeq" "$EndSeq"); do

  MyTime=$(date -d "$YEAR-$MONTH-$DAY +$PlusDay days" +%Y-%m-%dT%H:%M:%SZ)
  TimeStart="time_start=$MyTime"
  TimeEnd="time_end=$MyTime"

  # Name the output file after the date portion of the timestamp
  OutFile="${MODEL}_${EXPT}_${MyTime%%T*}T00Z.nc"

  URL="$NCSS/$MODEL/$EXPT?$VARS&$SPATIAL&$NORTH&$SOUTH&$EAST&$WEST&$TimeStart&$TimeEnd"

  # Skip files that already exist and are non-empty
  if [ -s "$OutFile" ]; then
      echo "[warning] File $OutFile exists (skipping)"
  else
      "$WGET" -O "$OutFile" "$URL"
  fi
done
 
I am not very familiar with bash, but I can see that EXPT could simply be changed to 'expt_19.1/3hrly' to start with (or a new variable added for the '3hrly' part and the URL assembly modified). The main issue I can see is with the PlusDay loop: we need to iterate through 3-hourly chunks within each day before incrementing PlusDay, so presumably the loops will need to be nested?

I did wonder if I could fudge it by leaving the PlusDay loop as it is and adding a way to change the hour, so that the script just has to be re-run 8 times with a different hour each time. But I have not yet figured out how to specify the hour in MyTime as a variable.

Can you help?

All the best,
Bex

Michael McDonald

Nov 4, 2016, 1:41:37 PM
to HYCOM.org Forum
Step 1: go to the catalog.

Step 2: look at the "access" methods and click the one you want.

It is basically the same base URL as the t000-only data, but now with "/3hrly" appended.

So you could append this to the "EXPT" variable,

EXPT='expt_19.1/3hrly'

or make a new variable for the "/3hrly" part (note a bash variable name cannot start with a digit, so it would need a name like HRLY) and then inject it in the for loop where the wget URL is constructed.

URL="..."

The trick is getting the for loop sequence to skip 3 hours at a time within each day, so this will be a nested for loop.
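A minimal sketch of that nested loop (untested against the server; it only prints the timestamps it would request, using GNU date as in the original script):

```shell
#!/bin/bash
# Sketch: outer loop over day offsets, inner loop over 3-hourly offsets.
YEAR='2007'; MONTH='01'; DAY='04'

for PlusDay in 0 1; do
  for Hour in $(seq 0 3 21); do
    MyTime=$(date -u -d "$YEAR-$MONTH-$DAY +$PlusDay days +$Hour hours" +%Y-%m-%dT%H:%M:%SZ)
    echo "$MyTime"
  done
done
```

Each printed timestamp (00Z, 03Z, ... 21Z for each day) would feed the existing TimeStart/TimeEnd and URL construction, giving one file per snapshot.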

Michael McDonald

Nov 4, 2016, 1:42:56 PM
to HYCOM.org Forum

Rebecca Ross

Nov 8, 2016, 4:37:35 AM
to HYCOM.org Forum
Hi Michael and others,

Thank you for taking the time to confirm my logic about what needs to change and for providing an example URL. It is the new scripting for the 3hrly nested loop that creates those URLs that will take me a while to work out, so I was hoping for some hints there. I can appreciate that you are busy and would prefer that I work it out myself; I would prefer this too, but I am also busy, so I was hoping for a bit more guidance from someone who is a dab hand at bash.

I'll keep trying to crack it, see where I get, and try to be more specific with my question if I need to come back here.

Meanwhile, if any good Samaritan has a script already written for this task, I would much appreciate a look to put me out of my misery, and I'm sure many others would appreciate an example to work from too.

If I do manage to work it out myself by some miracle, I'll post it here for others to use in the future.

All the best,
Bex

P.S. Americans, good luck today! We are watching with bated breath from across the pond! Please, please remember to use your vote!

Michael McDonald

Nov 8, 2016, 9:47:35 AM
to Rebecca Ross, HYCOM.org Forum
Bex,

Do you mind if all the 3-hrly data for a 24-hour period are grouped into a single file?

If not, then this becomes trivial (in hindsight) and does not require a nested loop (for the 3-hrly skipping).

NCSS can already serve back a range (time_duration) of dates/times in a single NCSS query.

http://www.unidata.ucar.edu/software/thredds/current/tds/reference/NetcdfSubsetServiceReference.html#Temporal

In this part of the code,

MyTime=`date -d "$YEAR-$MONTH-$DAY +$PlusDay days" +%Y-%m-%dT%H:%M:%SZ`
TimeStart="time_start=$MyTime"
TimeEnd="time_end=$MyTime"

TimeStart equals TimeEnd, i.e., a single snapshot at 00Z.

If you instead set TimeEnd to a duration,

TimeEnd="time_duration=P1D"

then all the 3hrly data for a given t000/00Z starting point (within 1 day & excluding the next day at 00Z) will be provided. This is also "safer", as there are missing 3hrly values
(https://hycom.org/faqs/451-did-you-know-there-are-missing-days-in-the-global-reanalysis),
so NCSS will automatically skip these rather than return an error.
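Applied to the earlier script, the change amounts to swapping TimeEnd for a time_duration parameter. A sketch of the resulting URL construction (the URL is only echoed here, not fetched, and the date is hard-coded for illustration):

```shell
#!/bin/bash
# Sketch: one request per day, letting NCSS return all 3-hourly
# records via time_duration (URL is echoed rather than downloaded).
NCSS='http://ncss.hycom.org/thredds/ncss/grid'
MODEL='GLBu0.08'
EXPT='expt_19.1/3hrly'
VARS='var=water_u,water_v'
SPATIAL='spatial=bb'
NORTH='north=70'; SOUTH='south=40'; EAST='east=0'; WEST='west=-30'

MyTime=$(date -u -d "2007-01-04" +%Y-%m-%dT%H:%M:%SZ)
TimeStart="time_start=$MyTime"
TimeEnd="time_duration=P1D"     # a one-day range instead of a single snapshot

URL="$NCSS/$MODEL/$EXPT?$VARS&$SPATIAL&$NORTH&$SOUTH&$EAST&$WEST&$TimeStart&$TimeEnd"
echo "$URL"
```

In the full script, MyTime would of course still come from the PlusDay loop, so a single (non-nested) loop produces one daily file containing all eight 3-hourly records.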

https://www.dropbox.com/s/def6e5tmc1nf31v/Screenshot%202016-11-08%2009.02.53.png?dl=0
ncss-3hrly-example-for-Bex.sh

Rebecca Ross

Nov 8, 2016, 10:14:34 AM
to HYCOM.org Forum, bexe...@gmail.com
Hi Michael,

Ultimately I will need the data as individual time snapshots, but if I have daily files with multiple snapshots in each, I can probably figure something out with NCO later to split them back up again.
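For the record, splitting a daily file back into single snapshots with NCO might look something like the sketch below (assuming 8 records per file and a made-up input file name; the loop only prints the ncks commands it would run, so it can be checked before ncks is actually invoked):

```shell
#!/bin/bash
# Sketch: print one ncks command per 3-hourly record in a daily file
# (drop the leading "echo" to actually run ncks).
InFile='GLBu0.08_expt_19.1_2007-01-04T00Z.nc'
for Rec in $(seq 0 7); do
  Hour=$(printf '%02d' $((Rec * 3)))
  OutFile="${InFile%.nc}_t${Hour}.nc"
  echo ncks -O -d time,$Rec,$Rec "$InFile" "$OutFile"
done
```

Here -d time,$Rec,$Rec extracts a single record along the time dimension into its own file, named by the hour of the snapshot.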

Thank you very much for this!

I have spent hours on this today (it is 3 pm here in the UK): as with any language, it is one thing to understand what is written, and quite another to write something new with correct syntax and vocabulary!

I'll do a trial run and see if I can ncks the files apart once I have them. If that works, I'll post the ncks code here in case others need it, rejoice at a problem solved with your help (!), and then move on to getting my project going.

Huge thanks for your time and help,

All the best,
Bex