Problems with AMI sets


Edoardo Di Russo

Sep 2, 2024, 4:10:07 AM
to archipelago commons
Dear all,
I'm having problems publishing AMI sets. I have already created a Collection and published two ADOs in it, so the Archipelago installation seems to be working. I downloaded the example CSV to load a series of ADOs and replaced the data in the cells with information from my samples, but the set isn't being processed. Could someone help me?
Thank you
Edoardo

dp...@metro.org

Sep 2, 2024, 10:31:49 AM
to archipelago commons
Good day Edoardo,

Thanks for reaching out. Happy to read that your local deployment is working and that you could ingest ADOs directly using the individual/Webform approach. You have probably seen this already, but just to be sure I will share the AMI documentation with you.
AMI is a very powerful module, but because of the extra flexibility it provides there are many ways of doing a batch ingest, and thus many ways something can be set up incorrectly and end up failing. If you have not yet read the docs in their entirety, I would encourage you to do so.
The first step in debugging a failed AMI set is to review both the AMI set's own Reports tab and the Drupal Reports/logs. The Reports tab can be found on each AMI set and will produce a long report (a CSV made of JSON entries) that might shed some light on what failed during processing. Sometimes it is a missing parent ADO (in the ismemberof/ispartof columns), sometimes a referenced File that could not be fetched, and sometimes a mismatch between the CSV data and the lack/presence/choice of a Twig Template (Metadata Display entity) used to transform each Row's (per-column) data into the proper JSON structure of an ADO during ingest. Metadata Display Entities also provide a way of previewing/testing an AMI set against them. Because the actual Setup choices of your AMI set define a lot of the processing mechanics, the documentation itself might give you a better understanding of how to make the right choices based on your CSV, your templates, and the source of the Files attached to it.

When you say you downloaded the example CSV, which CSV are you referencing? Could you share the link to that CSV? 1.4.0 (and previous releases) ship with an initial AMI set (a Create/Direct ingest that does not use Templates to create the ADOs) that provides an example set of ADOs. Did that one process correctly?

It would be great if you could check the Reports tab of your AMI set and the Drupal logs and share with us any error/warning entries there, so we can guide you toward a successful ingest. Also, to aid in debugging/replicating the issue, you can share the AMI set configuration itself (you can see the RAW JSON each AMI set stores in the "View" tab; that JSON stores all the choices you made during the setup steps). With that JSON and your CSV we can further debug/replicate your issue and hopefully help you get it working.

Thanks!

All the best

Diego Pino

dp...@metro.org

Sep 4, 2024, 2:39:55 PM
to archipelago commons
Dear Edoardo,

Thanks for providing extra information. I'm adding my reply back to the group so others can benefit from it.

The error message(s) you shared are telling you that cell values under at least one column header marked as a source for an attached file could not produce a File, and because the processing option "Skip ADO processing on missing File" (which is a good and safe option) is enabled, any ADO matching that criterion will not be ingested, to save you from having to delete/update it manually afterwards.

It is always safe and OK to leave a column that is mapped to contain files (e.g. "documents" ... this is a setup step and depends on your choices, of course; you define which columns will hold references to files) with empty values (none at all). But any text entry under any of those columns will be evaluated, and an attempt will be made to either connect a file from a local/remote storage source or an attached ZIP file, or fetch it (in the case of a remote URL). If that is not possible (the file does not exist in any of those locations), then the row will be skipped and that error will be logged.
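As a rough sketch only (the "label", "type" and "documents" headers here are hypothetical and depend entirely on your own setup and column mapping), a file column can mix empty cells, ZIP-relative paths and remote URLs; only the non-empty cells are evaluated:

label,type,documents
Annual report 2023,Document,report_2023.pdf
Annual report 2022,Document,
Annual report 2021,Document,https://example.org/files/report_2021.pdf

Here the second row has no file at all (safe), while the first and third would each be resolved against the configured file source or fetched from the URL.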

This is all documented in the first link I shared, but the specific section on how to reference files is here

If the files for your ADOs are inside a ZIP file attached to the AMI set, you can use a direct path without a "/" at the beginning. E.g. if the ZIP contains

image1.jpeg
folder/image2.jpeg

then under the "images" column you would add image1.jpeg;folder/image2.jpeg
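As a rough sketch of how that looks in the CSV itself (the "label" and "type" headers are placeholders; the "images" column and file names match the example above), multiple files for one ADO go in a single cell separated by semicolons:

label,type,images
Specimen 42,Photograph,image1.jpeg;folder/image2.jpeg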

Hope this helps and you manage to get your set ingested.

Please let us know if you have any other questions and need further help with this. And please feel free to share your use cases and what you are working on anytime.

All the best

Diego

Hi Diego,
Thanks for your reply. I downloaded this CSV (https://github.com/esmero/archipelago-deployment/blob/1.0.0/d8content/ami_set_entity_01.csv) and replaced the fields with the information from my ADOs, leaving the columns for which I have no data blank. The example AMI set provided with 1.3.0 was processed correctly and I can see all the Digital Objects in My Content. When I process my CSV I get the same message for all my records: Skipping ADO with UUID:8e50f2cd-95a9-4520-a4fa-cf8a48a78d31 because one or more files could not be processed and Skip ADO processing on missing File is enabled.
Thank you
Edoardo

Edoardo Di Russo

Sep 9, 2024, 3:18:02 AM
to archipelago commons
Dear Diego,
I am working with natural science collections held at the Institute of Marine Sciences in Venice. I'm learning the basics of using Docker and Drupal, and your list of steps was very helpful.
Unfortunately I'm proceeding by trial and error, and now I suspect that I have not set up correctly (or do not have at all) a Twig template, which should map the fields of my CSV to the metadata schema accepted by Archipelago. I also need to work out how to add metadata not currently provided by the demo template, so I can include all the information about my records, for example the entire taxonomic hierarchy from phylum to species.
Thank you
Edoardo

dp...@metro.org

Sep 10, 2024, 6:51:05 PM
to archipelago commons
Hi Edoardo, sorry for the late reply, we had a busy day here.

Really excited to hear about your work! And also very happy you are learning, step by step, how to manage an Archipelago.

It is alright to do some trial and error while you figure things out. AMI is extremely powerful, in the sense that it gives you complete control over how your FLAT (CSV) data is mapped/transformed and reshaped to match the desired RAW JSON that will be ingested (basically to make it have the same structure as if you had generated it via one of the webforms), but it also requires diving a bit deeper into the documentation. One thing is true: unless you are going for really generic ADOs, you will need to generate a NEW Metadata Display Entity (a Twig template similar to http://localhost:8001/metadatadisplay/11, named "AMI Ingest JSON Template") that takes each CSV ROW as input and, during the ingest step (process), outputs correct JSON to feed your new ADOs. If you open that Twig template you will notice a very simple pattern that you can extend into one specific to your use case.

Key concepts: 
First: it outputs JSON. You can preview the output by pressing "edit", selecting an AMI set (you can try with your own, the failed one) and then a ROW number (start with ROW 2; ROW 1 is the header). Then iterate and test your learnings there.
Second: it takes a "data" structure (data contains a single complete ROW during ingest) and gives you access, via properties, to each cell, using the header as the access point. So if you have a "label" column, {{ data.label|json_encode|raw }} will output the content of that cell encoded for JSON between " ", ready to be used. Twig is a templating language that gives you many options for modifying/mangling/iterating over/validating your data. All of that is explained in detail (with examples) here
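As a small sketch of what Twig lets you do with that "data" structure (not taken from the shipped template; the "documents" and "subjects" column names are just hypothetical examples), you can guard against empty cells and split a multi-valued cell into a JSON array yourself:

{
  "label": {{ data.label|json_encode|raw }},
  {# only emit the key when the (hypothetical) "documents" cell has a value #}
  {% if data.documents is not empty %}
  "documents": {{ data.documents|json_encode|raw }},
  {% endif %}
  {# split a semicolon-separated cell into a JSON array #}
  "subjects": {{ data.subjects|split(';')|json_encode|raw }}
}

Previewing this against a ROW of your own set (as described in the first key concept) is the quickest way to check the resulting JSON is what you expect.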

So, going back to your Taxa needs. Archipelago gives you the option of creating your own schema/organization of metadata, but sometimes re-inventing the wheel is not needed. I personally like Darwin Core a lot (once you have the data in, you can recreate it and allow people to download it via another Template that does the inverse: takes your RAW JSON and generates XML, JSON-LD, JSON, etc.), but as an ingest structure for taxonomic info the GBIF API responses are very, very good and probably easier to manage as RAW JSON (basically what you will store), e.g.

Or directly

This is the response. Already JSON! It can be cleaned/stripped down/adapted *or serve as inspiration* for your Archipelago needs:

[ { "key": 2753786, "nameKey": 479993, "kingdom": "Plantae", "phylum": "Tracheophyta", "order": "Liliales", "family": "Alstroemeriaceae", "genus": "Alstroemeria", "species": "Alstroemeria aurea", "kingdomKey": 6, "phylumKey": 7707728, "classKey": 196, "orderKey": 1172, "familyKey": 7695, "genusKey": 2753642, "speciesKey": 2753786, "parent": "Alstroemeria", "parentKey": 2753642, "nubKey": 2753786, "scientificName": "Alstroemeria aurea Graham", "canonicalName": "Alstroemeria aurea", "rank": "SPECIES", "status": "ACCEPTED", "higherClassificationMap": { "6": "Plantae", "196": "Liliopsida", "1172": "Liliales", "7695": "Alstroemeriaceae", "2753642": "Alstroemeria", "7707728": "Tracheophyta" }, "synonym": false, "class": "Liliopsida" },
 
So, to make this into a CSV-to-ADO-JSON workflow, you could either:
- have one column/header for each hierarchy/tree entry ("scientificName", etc.) with the values under them as plain strings, or
- have a single column named "taxonomy" with complete (ready) JSON in each cell.

If you go the first way, your template would keep the same basics as before, plus a few lines like these:
{
  "label": {{ data.label|json_encode|raw }},
  "type": {{ data.type|json_encode|raw }},
  "standard_taxonomy": {
    "scientificName": {{ data.scientificName|json_encode|raw }},
    "phylum": {{ data.phylum|json_encode|raw }},
    ..... etc etc
  }
}
This example brings all taxa into a sub-key instead of ingesting them into the top-level document. Why? My personal choice only: maybe you want to keep parallel classifications, the traditional taxonomic one and perhaps a modern phylogenetic/cladistic one too (side by side), and maybe you will need to update something in the future because of https://pr2-database.org/documentation/pr2-taxonomy-9-levels/ (taxonomy is wild!).
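If you went the second way instead (a sketch only, and it assumes the "taxonomy" cell of your CSV already holds well-formed, cleaned-up JSON, e.g. trimmed from a GBIF response), the template could drop the cell content straight into the output:

{# sketch: the "taxonomy" cell is expected to contain valid JSON already #}
{
  "label": {{ data.label|json_encode|raw }},
  "type": {{ data.type|json_encode|raw }},
  "standard_taxonomy": {{ data.taxonomy|raw }}
}

Keep in mind that if a cell under that column is empty or not valid JSON, the generated ADO JSON would itself be invalid, so the per-column approach above is the more forgiving one.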

This is a lot for a single post. If you ever want/need to have a call, please send us an email and we can arrange a discussion session, but also (more practical) you can join our Slack, where discussions happen more fluidly and sharing files is easier. You will also be happy to know there are Archipelago users (and even a vendor) in Italy!


Hugs

Diego Pino 
