Populate search index with data from DIP upload


Apostolof

Mar 26, 2020, 6:12:18 AM
to AtoM Users
Hello everyone,

I have set up both Archivematica and AtoM, configured the integration between them, and I am uploading DIPs from AM to AtoM.

I would expect that when a DIP is uploaded to AtoM:
    1) the files that are generated by different services would also be visible inside AtoM or at least indexed for search
    2) the METS content would also be indexed for searching

To further clarify:
1) When the transfer contains TIF file(s) AM produces an OCR transcript. I want the transcript to be available in AtoM (perhaps as metadata of the original transfer) and indexed so that I can search for the archival description using a word or phrase that appears inside the TIF.

2) I have set up a few custom commands and created rules for them to run on different types of files. The new commands are executed during Archivematica's Characterization step and the results are XML documents stored in the METS file. Since the METS file is included both in the AIP and in the DIP uploaded to AtoM, I thought that AtoM would index that information for searching.

Currently, in the archival description, I can only see the files originally included in the transfer (see screenshot attached). Information included in the METS file is not used in the search. Nevertheless, the uploaded DIP contains all the files mentioned:

vagrant@ubuntu-xenial:~$ tree /tmp/meta_test_6-98b46a9d-9c07-49d6-9a78-ad179c04e8f0/
/tmp/meta_test_6-98b46a9d-9c07-49d6-9a78-ad179c04e8f0/
├── METS.98b46a9d-9c07-49d6-9a78-ad179c04e8f0.xml
├── objects
│   └── ffb3d724-99c3-41fd-98e5-16120290eae0-tif_test_file.jpg
├── OCRfiles
│   └── ffb3d724-99c3-41fd-98e5-16120290eae0-tif_test_file-64b17003-a6cb-4bce-b9e0-0ab08b1592c7.txt
├── processingMCP.xml
└── thumbnails
    └── ffb3d724-99c3-41fd-98e5-16120290eae0.jpg

3 directories, 5 files

I am using the latest versions of both tools, AM 1.11.0 and AtoM 2.5.
Any ideas on how I can achieve this?

Thanks,
Apostolof
Screenshot_2020-03-26 tif_test_file tif - TEST(1).png

apos...@datascouting.com

Mar 27, 2020, 12:07:57 PM
to AtoM Users
After a lot of searching I stumbled on this post:
https://groups.google.com/forum/#!searchin/ica-atom-users/$20image$20transcript$20view|sort:date/ica-atom-users/_hWwxJUpAzo/mxRUNOgAAwAJ

It seems to me that using multiple levels of description is the only way to do what I want. I am not sure if (and how) it can be done in Archivematica, but I guess that is a question for the AM group.

Dan Gillean

Mar 27, 2020, 4:28:33 PM
to ICA-AtoM Users
Hi there Apostolof, 

Unfortunately, I suspect it will require new development to be able to implement the functionality that you hope to see. 

There are many places we would like to improve and enhance the integration between Archivematica and AtoM. One of the disadvantages of our business model (in which we give everything away under open licences, and rely on paid support services to maintain the company and paid or community-submitted development to enhance our core projects) has been that, while WE have many great ideas about how we could improve the integration between the two applications, it often depends on the priorities of our clients who are willing to sponsor work. 

For a long time, institutions that were using AtoM weren't using Archivematica, and vice versa - meaning that the incentive to pay for integration enhancement development was limited or nonexistent for these clients. That's changing now more and more - and additionally, Artefactual is restructuring internally to try to better integrate our teams and provide more room for internal development - so we hope to see changes in the future. 

For now however, AtoM only extracts some of the relevant technical metadata from the METS file. 

The Archivematica team could confirm this better than myself, but I believe that currently, the one exception to this is when a user adds DC metadata via the web form in Archivematica, during Transfer. In this case, Archivematica can pass this metadata to a new parent level of description, beneath which all DIP objects will be attached. As such, you can only add aggregate metadata about a transfer and have it show up in AtoM - not individual object metadata. See: 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



apos...@datascouting.com

May 5, 2020, 4:34:56 AM
to AtoM Users
Hello Dan,

Thank you for your response and for trying to help. I finally got a working solution to my problem, and I'll describe it here in case it helps anyone.

I created new tools, commands and rules that I attached to Characterization (running during Transfer). The commands generate new metadata and programmatically add it to a metadata.csv in the appropriate folder of the SIP being created. Later, Archivematica parses the CSV and my metadata is indexed and uploaded to AtoM, just like any metadata added using the Dublin Core metadata upload feature of Archivematica.
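
For anyone trying the same approach, here is a rough sketch of the "add a row to metadata.csv" step. It is not the actual command from this thread. The function name `append_dc_row` is hypothetical, and the `metadata/` folder location and the `filename`, `dc.title` and `dc.subject` column names are assumptions based on my understanding of Archivematica's metadata.csv convention, so check them against the documentation for your version.

```python
import csv
from pathlib import Path


def append_dc_row(sip_dir, object_path, dc_fields):
    """Append one row of Dublin Core metadata to <sip_dir>/metadata/metadata.csv.

    Hypothetical helper: the folder layout and column names are assumptions.
    """
    csv_path = Path(sip_dir) / "metadata" / "metadata.csv"
    csv_path.parent.mkdir(parents=True, exist_ok=True)

    header = ["filename"]
    rows = []
    if csv_path.exists():
        # Keep whatever rows and columns earlier commands already wrote.
        with csv_path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            header = list(reader.fieldnames or ["filename"])
            rows = list(reader)

    # Add any column this row needs that the file does not have yet.
    for name in dc_fields:
        if name not in header:
            header.append(name)

    rows.append({"filename": object_path, **dc_fields})

    # Rewrite the whole file so every row has the same set of columns.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return csv_path
```

A custom command could call it as `append_dc_row(sip_dir, "objects/tif_test_file.tif", {"dc.subject": "some extracted term"})`. Note that this sketch does not handle repeated columns (for example several `dc.subject` columns for multiple values), and it is not safe if several commands write to the file at the same time.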

The main problem with this, apart from being a bit of a hack and difficult to maintain, is that you are confined to the DC tags supported by the integration. So, for example, I can create and upload subject access points to AtoM, but they all go into the general subject taxonomy. I cannot specify that a subject is a place, genre, or name.

Anyway, thank you again for the support. I hope both AM and AtoM continue to grow and add new features.