Hi,
Thanks for the talks today. I couldn't work out a coherent comment/question during the meeting, so I’ll post now.
Responding to the speakers’ queries today: I would certainly appreciate resources (examples/tutorials/templates/schemas) on how vocabularies should be implemented in various formats (e.g. GeoTIFF, GeoJSON, NetCDF, Zarr, Parquet), for myself but more so that I could direct researchers to them. Ideally the implementations would be interoperable with standards, libraries (Python, web), etc., which presumably would also make the data accessible to AI bots...
At a grassroots level, if a scientist’s dataset is simply published to a repository (as opposed to being aggregated into a collection or federated via some web platform/service that does some of the uplift to translate terms to standard vocab elements), what is the best way to annotate, format and structure it so that it has a chance of being compatible with things that know about vocabs?
If we can demonstrate what wins researchers get from that, and if they can build it into how they capture and work with data, it has a better chance of happening. We get data published in whatever structure/format the individual was using to do their work. It is fairly common that a researcher, however willing, does not have time to re-format their data for publication.
Doesn’t need an answer. Just letting you know what I would certainly appreciate as an implementer. :)
In case it is of interest, an example of how we get scientists to embed BODC vocabularies in their model outputs can be seen with this collection: https://thredds.nci.org.au/thredds/catalog/catalogs/fx3/catalog.html?dataset=GBR4_H4p0_ABARRAr2_OBRAN2020_FG2Gv3_B4p2_Cq5b_Dhnd.
The OPeNDAP response lists all the dataset variables and their attributes ( https://thredds.nci.org.au/thredds/dodsC/fx3/GBR4_H4p0_ABARRAr2_OBRAN2020_FG2Gv3_B4p2_Cq5b_Dhnd.ncml.html ).
Many have units and parameters from the BODC PUV ontology vocabularies.
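For anyone wanting to try the general idea, here is a minimal sketch (not the actual eReefs implementation) of attaching vocabulary term URIs to a variable as attributes, written out as NcML using only the Python standard library. The attribute names parameter_vocab_uri and unit_vocab_uri, the variable name temp, and the build_ncml helper are hypothetical and chosen for illustration only. The P01/P06 term codes are my best understanding of the NERC Vocabulary Server entries for water temperature and degrees Celsius, and should be checked against the vocabulary before use.

```python
import xml.etree.ElementTree as ET

# NcML namespace used by THREDDS / netCDF-Java.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
# Base URI of the NERC Vocabulary Server, which hosts the BODC vocabularies.
NVS = "http://vocab.nerc.ac.uk/collection"

# Hypothetical mapping: variable name -> attributes to embed.
# The attribute names are illustrative assumptions, not those used in the
# collection linked above; verify the term codes against the vocabulary.
VOCAB_TERMS = {
    "temp": {
        "units": "degrees C",
        "parameter_vocab_uri": f"{NVS}/P01/current/TEMPPR01/",
        "unit_vocab_uri": f"{NVS}/P06/current/UPAA/",
    },
}


def build_ncml(terms):
    """Return an NcML string that adds the given attributes to each variable."""
    # Serialise the NcML namespace as the default (unprefixed) namespace.
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    for var_name, attrs in terms.items():
        var = ET.SubElement(root, f"{{{NCML_NS}}}variable", name=var_name)
        for attr_name, value in attrs.items():
            ET.SubElement(
                var, f"{{{NCML_NS}}}attribute", name=attr_name, value=value
            )
    return ET.tostring(root, encoding="unicode")


ncml_text = build_ncml(VOCAB_TERMS)
print(ncml_text)
```

The same dictionary of attributes could instead be written straight into the file when the model output is created, which is the approach described in this email; NcML is used here only because it needs no extra libraries to demonstrate.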
The datasets are used on a web mapping platform (https://portal.ereefs.info/map) and by other tools and libraries, but having the terms embedded in the data at creation means they will always be there, regardless of how the data is curated in the future.
Thanks,
Erin.
Erin Kenna
Coastal data operations
Environment
CSIRO
Office days: Monday to Thursday
P +61 7 3833 5712 M +61 4 1085 0448
Street Address: EcoScience Precinct, 41 Boggo Road, Dutton Park, QLD 4102
Mail Address: GPO Box 2583, Brisbane, QLD 4001