Hi Edoardo, sorry for the late reply; we had a busy day here.
Really excited to hear about your work! I'm also very happy you are learning, step by step, how to manage an Archipelago.
It is alright to do some trial and error while you figure things out. AMI is extremely powerful: it gives you complete control over how your FLAT (CSV) data is mapped, transformed and reshaped to match the RAW JSON you want to ingest (basically, giving it the same structure it would have if you had generated it via one of the webforms). The flip side is that it requires diving a bit deeper into the documentation.
One thing is certain: unless you are going for really generic ADOs, you will need to create a NEW Metadata Display Entity, a Twig template similar to http://localhost:8001/metadatadisplay/11 (named "AMI Ingest JSON Template"), that takes the CSV as input (one ROW at a time) and, during the ingest step (process), outputs correct JSON to feed your new ADOs. If you open that Twig template you will notice a very simple pattern that you can extend while working on something specific to your use case.
Key concepts:
First: it outputs JSON. You can preview the output by pressing "edit", selecting an AMI set (you can try with your own, the one that failed to ingest) and then a ROW number (start with ROW 2; ROW 1 is the header). Then iterate and test what you learn there.
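To make the ROW numbering concrete, here is a minimal Python sketch (just an illustration, not how AMI works internally) that reads a CSV and returns a given spreadsheet-style ROW as a dictionary keyed by the header. The `get_row` function and the sample columns/values are hypothetical.

```python
import csv
import io

def get_row(csv_text: str, row_number: int) -> dict:
    """Return spreadsheet-style ROW `row_number` as a dict keyed by the header.

    ROW 1 is the header itself, so ROW 2 is the first data row.
    """
    if row_number < 2:
        raise ValueError("ROW 1 is the header; start previewing at ROW 2")
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # DictReader consumes the header line, so ROW 2 maps to list index 0.
    return rows[row_number - 2]

# Hypothetical two-row sample CSV.
sample = "label,type\nAlstroemeria aurea,Photograph\nAlstroemeria ligtu,Photograph\n"
print(get_row(sample, 2))  # {'label': 'Alstroemeria aurea', 'type': 'Photograph'}
```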
Second: it takes a "data" structure (during ingest, data contains a single complete ROW) and gives you access to each cell as a property, using the column header as the key. So if you have a "label" column, {{ data.label|json_encode|raw }} will output the content of that cell encoded as a JSON string (wrapped in double quotes, with special characters escaped), ready to be used. Twig is a templating language that gives you many options for modifying, mangling, iterating over and validating your data. All of that is explained in detail (with examples) here
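Why `|json_encode|raw` instead of just writing "{{ data.label }}" between quotes? A cell can contain characters (double quotes, backslashes, line breaks) that break hand-quoted JSON. As a rough Python analogy, here is a comparison of the two approaches on a hypothetical cell value. It assumes Python's `json.dumps` as a stand-in for Twig's `json_encode`, and their escaping of slashes and non-ASCII characters can differ.

```python
import json

# Hypothetical cell value containing quotes and a line break.
data = {"label": 'Alstroemeria "Peruvian lily"\nfield note'}

# Naive approach: wrap the raw cell in quotes (like "{{ data.label }}").
naive = '{"label": "' + data["label"] + '"}'

# Encoded approach: the JSON encoder adds the quotes and escapes the value
# (like {{ data.label|json_encode|raw }}).
encoded = '{"label": ' + json.dumps(data["label"]) + '}'

def is_valid_json(text: str) -> bool:
    """True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(naive))    # False
print(is_valid_json(encoded))  # True
```

The `|raw` part in Twig marks the already-encoded value as safe, so that autoescaping (when enabled) does not HTML-escape its quotes.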
So, going back to your taxa needs. Archipelago lets you create your own schema/organization of metadata, but sometimes there is no need to reinvent the wheel. I personally like Darwin Core a lot (once you have the data in, you can recreate it and let people download it via another template that does the inverse: it takes your RAW JSON and generates XML, JSON-LD, JSON, etc.). But as an ingest structure for taxonomic info, the GBIF API responses are very, very good and probably easier to manage as RAW JSON (basically what you will store), e.g.
Or directly
This is the response. It is already JSON, which can be cleaned, stripped down or adapted (*or serve as inspiration*) for your Archipelago needs:
[
  {
    "key": 2753786,
    "nameKey": 479993,
    "kingdom": "Plantae",
    "phylum": "Tracheophyta",
    "order": "Liliales",
    "family": "Alstroemeriaceae",
    "genus": "Alstroemeria",
    "species": "Alstroemeria aurea",
    "kingdomKey": 6,
    "phylumKey": 7707728,
    "classKey": 196,
    "orderKey": 1172,
    "familyKey": 7695,
    "genusKey": 2753642,
    "speciesKey": 2753786,
    "parent": "Alstroemeria",
    "parentKey": 2753642,
    "nubKey": 2753786,
    "scientificName": "Alstroemeria aurea Graham",
    "canonicalName": "Alstroemeria aurea",
    "rank": "SPECIES",
    "status": "ACCEPTED",
    "higherClassificationMap": {
      "6": "Plantae",
      "196": "Liliopsida",
      "1172": "Liliales",
      "7695": "Alstroemeriaceae",
      "2753642": "Alstroemeria",
      "7707728": "Tracheophyta"
    },
    "synonym": false,
    "class": "Liliopsida"
  }
]
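As an example of stripping that down, here is a minimal Python sketch that keeps only the rank names and a few identifiers from the GBIF result shown above. The choice of fields (the `KEEP` list) and the `slim_taxon` name are assumptions for illustration, not a recommendation from GBIF or Archipelago.

```python
import json

# Trimmed copy of the GBIF result shown above (only the fields used here).
gbif_result = json.loads("""
{
  "key": 2753786,
  "kingdom": "Plantae",
  "phylum": "Tracheophyta",
  "class": "Liliopsida",
  "order": "Liliales",
  "family": "Alstroemeriaceae",
  "genus": "Alstroemeria",
  "species": "Alstroemeria aurea",
  "scientificName": "Alstroemeria aurea Graham",
  "rank": "SPECIES",
  "status": "ACCEPTED",
  "synonym": false
}
""")

# Hypothetical selection of fields to keep, in hierarchy order.
KEEP = ["kingdom", "phylum", "class", "order", "family", "genus", "species",
        "scientificName", "rank", "status", "key"]

def slim_taxon(result: dict) -> dict:
    """Keep only the KEEP fields present in the result, in KEEP order."""
    return {field: result[field] for field in KEEP if field in result}

print(json.dumps(slim_taxon(gbif_result), indent=2))
```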
So, to turn this into a CSV-to-ADO-JSON workflow, you could either:
- Have one column/header for each hierarchy/tree entry ("scientificName", "phylum", etc.), with the values under them just as strings.
- Or have a single column (key) named "taxonomy" whose cell directly contains the complete (ready) JSON.
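If you pick the second option, the cell must hold valid JSON. Because JSON is full of double quotes and commas, the CSV writer has to quote that cell and double the inner quotes. Here is a minimal Python sketch of that round trip with a hypothetical "taxonomy" column. It only shows what the cell looks like in the CSV and how it parses back, not how AMI reads it. In the template you would then want to emit that cell as JSON rather than re-encode it as a string, and how best to do that is worth checking in the AMI documentation.

```python
import csv
import io
import json

# Hypothetical taxonomy object to store in a single CSV cell.
taxonomy = {"kingdom": "Plantae", "genus": "Alstroemeria",
            "scientificName": "Alstroemeria aurea Graham"}

# Write a CSV with a plain "label" column and a "taxonomy" column holding JSON.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["label", "taxonomy"])
writer.writeheader()
writer.writerow({"label": "Alstroemeria aurea", "taxonomy": json.dumps(taxonomy)})
csv_text = buffer.getvalue()
# The taxonomy cell is written quoted, with inner quotes doubled:
# "{""kingdom"": ""Plantae"", ...}"
print(csv_text)

# Reading it back: the cell is a string that must be parsed as JSON again.
row = next(csv.DictReader(io.StringIO(csv_text)))
assert json.loads(row["taxonomy"]) == taxonomy
```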
If you go the first way, your template would contain the same basics as before plus a few extra lines, for example:
{
  "label": {{ data.label|json_encode|raw }},
  "type": {{ data.type|json_encode|raw }},
  "standard_taxonomy": {
    "scientificName": {{ data.scientificName|json_encode|raw }},
    "phylum": {{ data.phylum|json_encode|raw }}
    {# ..... etc etc: add the remaining rank columns the same way, keeping commas between entries and none after the last one #}
  }
}
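To check what that template should produce for a given ROW, here is a minimal Python mirror of the same mapping. It assumes the CSV has `label`, `type`, `scientificName` and `phylum` columns. `RANK_COLUMNS` and `row_to_ado_json` are hypothetical names, and this is not how AMI renders Twig.

```python
import csv
import io
import json

# Hypothetical rank columns copied into the "standard_taxonomy" sub-key.
RANK_COLUMNS = ["scientificName", "phylum"]

def row_to_ado_json(data: dict) -> str:
    """Build the same structure as the Twig template from one CSV row."""
    ado = {
        "label": data.get("label"),
        "type": data.get("type"),
        "standard_taxonomy": {col: data.get(col) for col in RANK_COLUMNS},
    }
    return json.dumps(ado, indent=2)

# Hypothetical one-row CSV.
csv_text = (
    "label,type,scientificName,phylum\n"
    "Alstroemeria aurea,Photograph,Alstroemeria aurea Graham,Tracheophyta\n"
)
for data in csv.DictReader(io.StringIO(csv_text)):
    print(row_to_ado_json(data))
```

Adding a rank then means adding one header to the CSV and one name to `RANK_COLUMNS`, mirroring one extra line in the template.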
This example puts all taxa under a sub-key instead of at the top level of the document. Why? Purely my personal choice: maybe you want to keep parallel classifications side by side (the traditional taxonomy in one sub-key, and maybe a modern phylogenetic/cladistic one in another), and maybe you will also need to update something in the future because of
https://pr2-database.org/documentation/pr2-taxonomy-9-levels/ (taxonomy is wild!)
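Here is a minimal Python sketch of why the sub-key helps. Two classifications live side by side, and one can be replaced later without touching the rest of the document. The `phylogenetic_taxonomy` key, its content, and `replace_classification` are hypothetical.

```python
import copy
import json

ado = {
    "label": "Alstroemeria aurea",
    "type": "Photograph",
    "standard_taxonomy": {"phylum": "Tracheophyta", "class": "Liliopsida",
                          "order": "Liliales"},
    # Hypothetical second classification kept side by side.
    "phylogenetic_taxonomy": {"clade": ["Angiosperms", "Monocots"],
                              "order": "Liliales"},
}

def replace_classification(record: dict, key: str, new_value: dict) -> dict:
    """Return a copy of the record with one classification sub-key replaced."""
    updated = copy.deepcopy(record)
    updated[key] = new_value
    return updated

revised = replace_classification(
    ado, "standard_taxonomy",
    {"phylum": "Tracheophyta", "class": "Liliopsida",
     "order": "Liliales", "family": "Alstroemeriaceae"},
)
print(json.dumps(revised, indent=2))
```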
This is a lot for a single post. If you ever want or need to have a call, please send us an email and we can arrange a discussion session. Also (and more practical), you can join our Slack, where discussions happen more fluidly and sharing files is easier. You will also be happy to know there are Archipelago users (and even a vendor) in Italy!
Hugs
Diego Pino