Best practices to version control datasets.xml

Kim Hyde - NOAA Federal

Apr 26, 2022, 12:40:27 PM

to ERDDAP

Hello,

We are in the process of setting up a new ERDDAP server and multiple people will have access to add their specific datasets. As we develop our SOP, we want to include steps that will version control the important files such as datasets.xml. Our plan is to use git/github, but before we get started I wanted to see what others do for version control. I'm new to both ERDDAP and GitHub so any and all information would be greatly appreciated.

Thank you,

Kim

Bob Simons

Apr 26, 2022, 3:00:26 PM

to ERDDAP

GitHub is always an excellent solution for version control when there are multiple people involved.

Roy Mendelssohn - NOAA Federal

Apr 26, 2022, 4:02:57 PM

to Kim Hyde - NOAA Federal, ERDDAP

Hi Kim:

I am not certain what you are envisioning, but for security as well as practical reasons you want as few people as possible to have access to $TOMCAT_HOME/content/erddap/datasets.xml. The way I set up my personal ERDDAP (and I believe it is how Bob does our main ERDDAP) relies on the fact that datasets.xml has a bunch of "header" stuff before any datasets are defined, plus a one-line ender. I put each of these into a separate file. Then for each grouping of datasets (however you want to define the grouping) I have a separate file. Then I use the "cat" command to join the files into one. This makes it much easier to edit and test.
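For anyone who would rather script this step than type the "cat" command by hand, here is a minimal Python sketch of the same idea. It assumes the pieces are a header file, one or more dataset-group files, and an ender file (the one-line `</erddapDatasets>`); the function names and file names are hypothetical. Like "cat", it joins the raw bytes in the order given and does no validation, so each file's encoding passes through unchanged.

```python
from __future__ import annotations

from pathlib import Path


def build_datasets_xml(header: bytes, groups: list[bytes], ender: bytes) -> bytes:
    """Join header + each dataset group + ender, in the order given.

    Plain concatenation, like `cat`: nothing is parsed or checked.
    """
    return header + b"".join(groups) + ender


def build_from_files(header: Path, groups: list[Path], ender: Path, output: Path) -> None:
    """Read the pieces from disk and write the combined datasets.xml."""
    combined = build_datasets_xml(
        header.read_bytes(),
        [g.read_bytes() for g in groups],
        ender.read_bytes(),
    )
    output.write_bytes(combined)
```

For example, `build_from_files(Path("header.xml"), [Path("gridded.xml"), Path("tabular.xml")], Path("ender.xml"), Path("datasets.xml"))` is roughly `cat header.xml gridded.xml tabular.xml ender.xml > datasets.xml`. Writing to a scratch location first and testing there fits the "easier to edit and test" point.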

Just remember if you use github and a public account, that setup.xml may contain sensitive information.

HTH,

-Roy

> --
> You received this message because you are subscribed to the Google Groups "ERDDAP" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to erddap+un...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/erddap/d3fc97e7-242d-4837-90be-0e1f2eb46be0n%40googlegroups.com.

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Men...@noaa.gov www: https://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

Nate Rosenstock

Apr 26, 2022, 5:27:14 PM

to Kim Hyde - NOAA Federal, ERDDAP

Hi Kim,

At Hakai we use GitHub to version our datasets.xml snippets. We use one XML file per dataset, and the files get combined on the server when there are changes. Since some of our datasets come from databases, we hide the database connection information behind the placeholder text "hakai_erddap_sourceUrl", which gets replaced on the server with the real connection string. This method has been working well for us, and we use GitHub issues and pull requests to discuss new or changing datasets.

See https://github.com/HakaiInstitute/hakai-datasets
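A minimal Python sketch of the placeholder step described above, assuming the real connection string is kept in an environment variable on the server. The variable name `ERDDAP_SOURCE_URL` and the helper names are hypothetical and not taken from the Hakai repository. The XML escaping is an extra precaution added here: connection strings often contain `&`, which cannot appear bare in XML text.

```python
import os
from xml.sax.saxutils import escape

# Placeholder text as described in the post; everything else is hypothetical.
PLACEHOLDER = "hakai_erddap_sourceUrl"


def fill_placeholder(snippet: str, secret: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace every occurrence of the placeholder with the XML-escaped secret."""
    if not secret:
        raise ValueError("refusing to substitute an empty connection string")
    return snippet.replace(placeholder, escape(secret))


def secret_from_env(var: str = "ERDDAP_SOURCE_URL") -> str:
    """Read the real connection string from a (hypothetical) environment variable."""
    try:
        return os.environ[var]
    except KeyError:
        raise RuntimeError(f"set {var} on the server before building datasets.xml") from None
```

Keeping the secret only in the server's environment means the version-controlled snippets never contain it, which is the point of the placeholder.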

Nate

--

Nate Rosenstock

Software Developer

CIOOS Pacific / Hakai Institute / Tula Foundation

Tylar Murray

Mar 20, 2023, 11:30:03 PM

to ERDDAP

I am testing a system that separates chunks of datasets.xml into separate files for each dataset, then merges them together using a github action.

I am liking this system so far and am curious if anyone else wants to give it a shot. If interested please reply here or email me directly (murray...@gmail.com) so we can set a meeting.

Or: if you want to try it yourself from the documentation:

1) Create a new GitHub repo by clicking the "Use this template" button at https://github.com/7yl4r/erddap-config-template

2) Modify the repo according to the instructions in the README

3) Pull and deploy the configuration on your ERDDAP server (ideally using Docker Compose)

bobsimons2.00

Mar 21, 2023, 8:41:04 AM

to ERDDAP

What are the advantages of this over Roy's simple but effective system of creating and maintaining separate files and then using cat to combine them into one datasets.xml file?

Does your system have a way to specify the order of datasets in datasets.xml, which is often useful?

Wilcox, Kyle

Mar 21, 2023, 10:12:51 AM

to bobsimons2.00, ERDDAP

If you are using the Docker container from https://github.com/axiom-data-science/docker-erddap there is a datasets.d mode that allows the ordering and concatting of many XML files into a single datasets.xml file. See the documentation here: https://github.com/axiom-data-science/docker-erddap#datasetsd-mode---experimental.


Tylar Murray

Mar 21, 2023, 10:23:39 PM

to ERDDAP

## pros/cons vs using `cat`:
This is functionally the same as using the `cat` program to concatenate the files in bash. I added a layer of python so I could more easily add fancy logging, a CLI, and future features. The python could be cut out of this system by putting the cat command into entrypoint.sh instead of calling the python script.

What this offers above just smashing the files together is the ability to keep dataset documentation in git alongside the XML for that dataset and to have researchers edit it using GitHub's web GUI. The changes are then automatically built into a new datasets.xml, which the server pulls. This does give my users the power to break the ERDDAP server, but when that becomes a problem I would like to add a "linter" to the `erddap-datasetsxml-builder`; I have some code to read and validate XML in Python lying around here somewhere...
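A minimal sketch of the kind of linter described above, using only Python's standard library. It assumes each file holds one or more `<dataset>` elements and no XML declaration. It checks only that the XML is well-formed, that each `<dataset>` has a `datasetID`, and that IDs are unique across files. `lint_snippets` and the `lintRoot` wrapper element are hypothetical names, and this does not replace testing the datasets with ERDDAP's own tools (e.g., DasDds).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def lint_snippets(snippets: dict[str, str]) -> list[str]:
    """Return a list of problems found in per-dataset XML snippets.

    `snippets` maps a file name to its text. Each snippet is wrapped in a
    dummy root element so files holding several <dataset> elements parse.
    """
    problems: list[str] = []
    seen: dict[str, str] = {}  # datasetID -> file that first defined it
    for name, text in snippets.items():
        try:
            root = ET.fromstring(f"<lintRoot>{text}</lintRoot>")
        except ET.ParseError as err:
            problems.append(f"{name}: not well-formed XML ({err})")
            continue
        datasets = root.findall("dataset")
        if not datasets:
            problems.append(f"{name}: no <dataset> element found")
        for ds in datasets:
            ds_id = ds.get("datasetID")
            if not ds_id:
                problems.append(f"{name}: <dataset> without a datasetID")
            elif ds_id in seen:
                problems.append(f"{name}: duplicate datasetID {ds_id!r} (also in {seen[ds_id]})")
            else:
                seen[ds_id] = name
    return problems
```

Run in the build step before merging, a non-empty result could fail the build so a broken edit never reaches the server.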

## putting datasets into order:

I was planning to order things alphabetically, but a variable that puts the files into a custom order can be added with a couple hours of work.
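One way the custom-order variable could work, sketched in Python under the assumption that it is a list of file names: files named in the list come first, in that order, and everything else follows alphabetically. `order_files` and its `preferred` parameter are hypothetical names.

```python
from __future__ import annotations


def order_files(names: list[str], preferred: list[str] | None = None) -> list[str]:
    """Order snippet files: names in `preferred` first (in that order),
    then all remaining names alphabetically."""
    rank = {name: i for i, name in enumerate(preferred or [])}
    unlisted = len(rank)  # every unlisted file shares this rank
    # Sort key: position in the preferred list, then the name itself,
    # so unlisted files fall back to plain alphabetical order.
    return sorted(names, key=lambda n: (rank.get(n, unlisted), n))
```

Keeping the alphabetical fallback means a newly added dataset file still ends up in datasets.xml even if nobody adds it to the list.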

## RE datasets.d

I had not seen this. We should combine efforts.

The [datasets.d.sh](https://github.com/axiom-data-science/docker-erddap/blob/main/datasets.d.sh) file that does this is pretty simple. I would like to add functionality to it, but that would mean relying on several Python dependencies. Other than that concern, the `build_datasetsxml.py` CLI can be a drop-in replacement for the bash script.

Alternatively, maybe the github template repo can be set up to work with datasets.d.

bobsimons2.00

Mar 31, 2023, 4:41:33 PM

to ERDDAP

@Tylar, Ah. I didn't know/think about using GitHub's editing capabilities. I see now how having researchers edit the files in GitHub's GUI editor, plus your GitHub Action, makes for a very efficient system: researchers edit the files, the changes are saved via GitHub, and they are pulled to the ERDDAP installation. That's very nice.