1) None of the sample datasets download data to make a local copy. They all re-serve remote data on-the-fly.
You can tell (reasonably well) by looking at each dataset's "type", which is right after each "<dataset " in datasets.xml. Some types get data from local files (the type ends in "Files", e.g., EDDGridFromNcFiles) or a local database or Cassandra ("EDDTableFromDatabase", "EDDTableFromCassandra"). But other types get data from a remote server (e.g., "...FromDap", "...FromERDDAP", "EDDTableFromSOS"). If you aren't sure, you can look up the type of dataset at
https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#datasetTypes and read the details.
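To sketch the distinction in datasets.xml (hypothetical datasetIDs, URLs, and paths; details elided):

```xml
<!-- Gets data from local files: -->
<dataset type="EDDGridFromNcFiles" datasetID="myLocalGrid" active="true">
  <fileDir>/data/myLocalGrid/</fileDir>
  ...
</dataset>

<!-- Re-serves data from a remote DAP server on-the-fly: -->
<dataset type="EDDGridFromDap" datasetID="myRemoteGrid" active="true">
  <sourceUrl>https://someRemoteServer/thredds/dodsC/someDataset</sourceUrl>
  ...
</dataset>
```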
That said, an "EDD...From...Files" dataset can be based on remote files yet be set up to make a partial or full local copy of those files. But none of the sample datasets do that. It would just cause trouble, because it would presume that you had a ton of disk space available. Also, downloading an entire large dataset would take a long time. I wouldn't inflict that on newbie ERDDAP admins.
2) Nodes are not expected to hold local copies of data from other ERDDAP nodes. It would take a ton of disk space to replicate many of these datasets. In most cases, there is no need for the other nodes to maintain copies of the data. That's an intentional design feature of federations of ERDDAPs. See
https://coastwatch.pfeg.noaa.gov/erddap/download/grids.html (which is geared to one institution, but the idea is the same).
The exception would be if you know that users of your node will be making very high use of a given dataset. Then, if you maintain a local copy of the data, it takes a lot of burden off the source ERDDAP and also makes access faster for your users (since the data is closer). But I'm not aware of anyone actually doing that, or needing to. If you want to make a partial or full local copy, see
https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#cacheFromUrl
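To sketch what that looks like (hypothetical names and paths; see the link above for the real options, including limiting the cache size to make a partial copy):

```xml
<!-- A files-based dataset that downloads and caches the remote files locally: -->
<dataset type="EDDTableFromNcFiles" datasetID="myLocalCopy" active="true">
  <cacheFromUrl>https://someRemoteServer/erddap/files/someDataset/</cacheFromUrl>
  <fileDir>/data/myLocalCopy/</fileDir>
  ...
</dataset>
```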
3) I'll let other people reply to #3.
I will say: you can do spatial queries to find datasets (in a simplistic way) via the advanced search option in my ERDDAP, which searches about 40 other ERDDAPs for matching datasets.
It isn't a great interface (you have to type in the lat and lon values), but it works.
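If I recall the parameter names from the advanced search form correctly, the bounds can also go straight into the URL (the search term here is just an example):

```
https://coastwatch.pfeg.noaa.gov/erddap/search/advanced.html?searchFor=sst&minLon=-130&maxLon=-120&minLat=30&maxLat=40
```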
4) If you're talking about automating the Java + Tomcat + ERDDAP installation, I think the problem is that every setup is different (different OS, different needs/goals). A partial answer to that is Docker, but Docker is yet another tool to learn and obscures some details that may be important. If you are already familiar with Docker and/or have a need for multiple ERDDAPs, Docker is probably the way to go.
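For example, a Docker setup can be as simple as this sketch (the image name and the container paths are assumptions; check the image's own documentation for the exact mount points for your version):

```shell
# Sketch only: run ERDDAP via the community axiom/docker-erddap image.
# The mount point below is an assumption -- verify it against the image docs.
docker run -d --name erddap -p 8080:8080 \
  -v /local/path/datasets.xml:/usr/local/tomcat/content/erddap/datasets.xml \
  axiom/docker-erddap
# Then browse to http://localhost:8080/erddap/
```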
If you're talking about setup.xml and datasets.xml, well, I've tried to make that easier with recent changes/reorganizations (at the expense of current admins (sorry)). The trade-off is always: more features lead to more effort (which can be minimized, but still ...). I continue to put tons of effort into GenerateDatasetsXml (which has a ton of heuristics) and DasDds, but setting up datasets will always involve some effort. It takes time and effort to learn to use tools well. I think people forget how long it took to learn to drive, type, use Matlab or R, learn their profession, etc. This effort can be minimized, but not eliminated.
Interestingly, THREDDS was originally designed to be a pass-through system (point it at a bunch of files (almost no effort) and it will serve those files, as is, as separate datasets). ERDDAP took a different approach (always encouraging aggregation and improvement of the metadata). ERDDAP's more active approach will always require more effort, but I think it is worth it. Until the singularity arrives, there are things that a human can do better than a computer.
I hope that helps.
Best wishes.