Subject terms do not appear in subject taxonomy


jthi...@bethelks.edu

Nov 11, 2019, 3:26:02 PM
to AtoM Users
Using Atom 2.2.0

I imported several xml EAD files. The descriptions imported OK but the subject terms that show up in the archival description do not show up in Subject Browse. And if I try to edit a subject from the archival description, I get a completely different set of fields than if I edit a subject directly from the subject browse list. What gives?

Dan Gillean

Nov 12, 2019, 5:59:47 PM
to ICA-AtoM Users
Hi John, 

There are a couple of reasons why subjects in an EAD XML file import might not show up in AtoM. The first might simply be an indexing issue - I would suggest you try reindexing your site and clearing the application cache. From AtoM's root installation directory (generally /usr/share/nginx/atom if you followed our recommended installation instructions), run: 
  • php symfony cc
  • sudo service php5-fpm restart
  • php symfony search:populate
Don't forget that your browser also has a cache, so before checking, either clear your browser cache or else test in an incognito/private browser window, where the cache is generally disabled by default. 

If that doesn't fix the issue, then it may depend on how your EAD XML is structured. EAD 2002 is a very flexible standard, meaning there are often several different ways to represent the same metadata - all of which are valid via the EAD 2002 standard, but not all of which are necessarily supported by AtoM. 

The best way to see what AtoM expects on import for EAD mappings is to create a dummy record in your AtoM instance, filling in every field, and then export it. I sometimes do this by adding the standard rule name and number to each field, to make the crosswalking easier when looking at the EAD output. 

We have added information on the mappings here as well: 
So, for subject access points in an EAD 2002 XML file, AtoM expects to find them like so: 

<controlaccess>
   <subject>subject term goes here</subject>
   <geogname>place term would go here</geogname>
</controlaccess>

If your subject terms are structured differently in the EAD file, they may not have imported as expected. 
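
One quick way to see how the access points in an exported file are tagged is to list everything inside <controlaccess>. The following is only a rough sketch using Python's standard library; the sample XML and the function names are made up for illustration and are not part of AtoM.

```python
import xml.etree.ElementTree as ET

# Invented sample; a real file would be read from disk instead.
SAMPLE = """<ead><archdesc><controlaccess>
  <subject>subject term goes here</subject>
  <geogname>place term would go here</geogname>
  <corpname>X Mennonite Church</corpname>
</controlaccess></archdesc></ead>"""


def local_name(tag):
    """Drop a namespace prefix such as {urn:isbn:1-931666-22-9} from a tag."""
    return tag.rsplit("}", 1)[-1]


def list_access_points(xml_text):
    """Return (element name, text) pairs for every child of <controlaccess>."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        if local_name(element.tag) != "controlaccess":
            continue
        for child in element:
            # Nested <controlaccess> groups are visited on their own.
            if local_name(child.tag) != "controlaccess":
                found.append((local_name(child.tag), (child.text or "").strip()))
    return found


for name, text in list_access_points(SAMPLE):
    print(name + ": " + text)
```

Anything listed as subject should end up in the Subjects taxonomy; other element names are handled differently on import.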

As for why you have different fields when navigating to Subjects and entering edit mode, as opposed to clicking edit on an archival description: 

Terms and Information objects (more commonly known as Descriptions) are two of many different entity types in AtoM. So a term is a different entity in AtoM than a description, and has different fields. You can learn more about common entity types in AtoM here: 


The fields in the archival description are drawn from the relevant standard - the default template is ISAD(G), so unless you've changed the default, that's where most of the fields in the description edit page are taken from. Subjects (and all other terms) are modeled after the fields found in the SKOS standard. 

When you enter an access point on a description, you are either linking to an existing term in the Subjects taxonomy, or (if no match is found) creating a new stub term on the fly, which is linked to the description as an access point. By "stub" I mean that in the description template (or in a CSV template, or an EAD file for that matter), the only field you get is the authorized form of name field for the term - but if you navigate to the entity itself (in this case, the term view page in a particular taxonomy), you get the full field list. 

The same is true for creators and other entities - a creator name (or a name access point) in an EAD file will either link to an existing authority record, or it will create a new stub record, with only the authorized form of name populated, so you can supplement it later (actually with an EAD import, you might also get entity type and history populated, depending on your EAD). You can read more about EAD imports and authority records here: 
Hope that helps! 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/bf5f1ae0-b10c-456a-a389-2ae5c0af2719%40googlegroups.com.

John Thiesen

Nov 14, 2019, 5:10:58 PM
to ica-ato...@googlegroups.com
I'll have to study this some more. I think some of the "subject" terms that I was looking at were actually in the Authority listing because they were organizations. So, a personal papers collection contains papers about X Mennonite Church, so X Mennonite Church is a subject heading for that collection. But X Mennonite Church is an organization, so it doesn't show up in the subject list but in the Authority list. Does that make sense?

But I should also do the re-indexing.


John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS



Dan Gillean

Nov 14, 2019, 5:32:09 PM
to ICA-AtoM Users
Hi John, 

Now that I've seen your response in your other thread and I understand that you're importing EAD files generated by Archon - then yes, this is another one of those particularities. 

In AtoM, authority records are separate entities from descriptions, intended to be reused. So a name access point in AtoM (i.e. when a name is linked as a subject, rather than as a creator) is still a link to an authority record - so users can learn more about that person, family, or organization, and see other records where they are linked as either a subject or creator. 

With Archon, there appears to be no relation between creator records and name access points - names are treated as subjects. However, in the EAD export, they are added to the <controlaccess> element using EAD name tags, such as <persname>, <famname>, <corpname>, or just <name>. 

On import, AtoM will see the name tags and will look for a matching authority record - if one is not found, it will create a new stub authority record, which can then be supplemented later. You can read more about AtoM 2.2's default behavior with names and EAD imports here: 
During a migration, the difference between Archon and AtoM's treatment is compounded, because Archon users rarely use the exact same form of name for their creator records and their name (subject) access points. Consequently, you can end up with duplicate authority records in AtoM - for example, a "Smith, John" authority record from the creator name, and a second "John Smith" from the name access point. 

Ideally, you would at least normalize the names across your Archon subjects and creators prior to migration, to reduce this duplication. Some institutions have chosen to suppress the import of name access points during migrations, choosing to review and manually re-add them later, once they have cleaned up their authority records. Others have chosen to change all those name elements in the <controlaccess> section of the EAD record into <subject> elements, so they import directly into the Subjects taxonomy (which at least avoids authority record duplication). 
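
For the last option, a conversion along these lines could be scripted. This is a minimal sketch, not an Artefactual tool: it assumes the file has no default namespace on the <ead> element (otherwise the tag names will not match), and the function name and sample are invented for illustration.

```python
import xml.etree.ElementTree as ET

# EAD name elements mentioned in this thread.
NAME_TAGS = {"persname", "famname", "corpname", "name"}


def names_to_subjects(xml_text):
    """Rename name elements inside <controlaccess> to <subject>.

    Returns the modified XML as a string plus the number of elements renamed.
    Any attributes on the renamed elements are left as they were.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for controlaccess in root.iter("controlaccess"):
        for child in controlaccess:
            if child.tag in NAME_TAGS:
                child.tag = "subject"
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


sample = ("<ead><controlaccess><persname>Smith, John</persname>"
          "<subject>Farming</subject></controlaccess></ead>")
converted, count = names_to_subjects(sample)
print(count, "element(s) renamed")
print(converted)
```

Names outside <controlaccess>, such as creators in <origination>, are deliberately left alone here.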

Hopefully with this information, you can find your data, and be better equipped for a migration to AtoM, should you choose to go that way! 

In the meantime, I recommend that you take a moment to make yourself a crosswalk. Fill out every field in AtoM with the field name and related standard number, and export. Do the same thing in Archon. Then you have 2 docs you can compare to see how each application maps its data in EAD, and where you might need to make changes prior to import. EAD 2002 is a very flexible standard, so there's more than 1 way to represent data that is valid against the spec - but both Archon and AtoM have chosen one specific way to implement the mappings. Being aware of the differences will save you a lot of headaches later! 

Good luck,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Nov 25, 2019, 2:43:18 PM
to ica-ato...@googlegroups.com
I've been studying this some more and I think I more or less understand what I was seeing in my original question, about terms not showing up in the subject taxonomy.

But I also have a different problem related to subject terms. I uploaded 10 small EAD files as a batch through the command line. For each of those, the subject entries cumulated. EAD1 uploaded with a couple of subjects, EAD2 uploaded and shows its own subjects plus those of EAD1. And so on, so that EAD10 shows the subjects for all of EAD1-9.

I've cleared the cache and run search:populate.

Any suggestions about this?


John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Nov 27, 2019, 11:26:22 AM
to ICA-AtoM Users
Hi John, 

I have searched through our issue tracker (and my memory), and I couldn't find or recall any bug reports related to this specific issue. That said, we have fixed a lot of bugs and added a lot of enhancements since the 2.2 release, so if you are able to upgrade, that would be my first suggestion. 

 Any further details on the exact steps you have followed to reproduce this behavior would be helpful.  For example, did you import each individually, or did you point the command-line task at a directory that had all 10 files in it? Did you use any options with the task? Did the console provide any warnings or error messages during the import?

Have you looked in the EAD files themselves? I am aware of an Archon bug that, when the Administrator's bulk export option is used, every subject is applied to all exports. So, there's a chance that the problem is the EAD files themselves, although this sounds a bit different.

One more task you could try running is the build nested sets task. It's possible that the nested set (used to maintain hierarchical order and relations) became corrupted at some point, which is affecting the display. Try: 
I would recommend you run the other tasks (clear cache; restart PHP-FPM; repopulate the search index) again after, just to be sure. 

Finally, don't forget to clear your browser cache as well! It's possible that this is affecting what you're seeing. You can always test in a new Private / Incognito browser window, where the browser cache is usually disabled by default. 

If that still doesn't help, then I wonder if I might take a look at the EAD files themselves, to see if there's anything specific in them causing the behavior. If yes, feel free to message me with some samples off-list, and I can try out a local test to see if I can reproduce what you're seeing. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Nov 27, 2019, 3:46:04 PM
to ica-ato...@googlegroups.com
Yes, I discovered a short while ago that it's the Archon bug you mention here. Do you know if there's any way around that?

 In my case, it didn't apply every subject to all exports, it just cumulated them. But the problem does show up in the xml files upon output from Archon using the administrator bulk export.
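
A check like the following could flag cumulated subjects in a batch of exports before importing them. It is only a sketch under assumptions: the subject lists are taken as already extracted from each file, in export order, and the function name is invented. A file can legitimately share subjects with the one before it, so a hit is a hint to look at the file, not proof of the bug.

```python
def find_cumulated(subjects_by_file):
    """Flag files whose subjects include every subject of the preceding file.

    subjects_by_file: list of (filename, list of subject strings) in export order.
    """
    suspects = []
    pairs = zip(subjects_by_file, subjects_by_file[1:])
    for (_, previous), (name, current) in pairs:
        if previous and set(previous) <= set(current):
            suspects.append(name)
    return suspects


exports = [
    ("collection1.xml", ["A", "B"]),
    ("collection2.xml", ["A", "B", "C"]),
    ("collection3.xml", ["A", "B", "C", "D"]),
]
print(find_cumulated(exports))
```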

John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Nov 27, 2019, 4:02:15 PM
to ICA-AtoM Users
Hi John, 

Apparently some of the other export options, such as using the command-line, or else individually exporting via the UI, do not have this same bug, so you might try those. 

One of the clients we did a migration for created their own export scripts. If nothing else works for you, let me know - I can try reaching out to them and seeing if they are willing to share the script for your use. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Nov 27, 2019, 4:07:30 PM
to ica-ato...@googlegroups.com
Yes I've tried the individual UI export and already know that it works cleanly. I have potentially hundreds of collections to export, so I'll need some kind of bulk export. I'll look into command line options. But I would be happy to see somebody's export scripts, too, for potential ideas. Thanks for your help.

John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

John Thiesen

Dec 3, 2019, 5:45:30 PM
to ica-ato...@googlegroups.com
I couldn't find a command line EAD export from Archon. But the individual EAD export works, so I wrote myself a Python program that calls that individual export script over and over to bulk export EADs. I tried a batch of 100 and it seemed to work, as far as I can tell so far. You might want to recommend this route to anyone you run into in the future who wants to migrate from Archon.


John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Dec 4, 2019, 11:02:46 AM
to ICA-AtoM Users
Hi John, 

That's great! I'm still coordinating the hand-off of the script I was talking about but it was very similar - it was a script written in Go that would essentially repeatedly run a single EAD export for each collection, though I think it did so via the user interface if I recall correctly. 

Through the migrations we have carried out for Archon users recently, Artefactual has been assembling a series of scripts to transform the exported EAD XML to better conform to AtoM's expectations. We are still doing some clean up, but hope to make this publicly available soon, via our Artefactual Labs GitHub account. We're calling it the AtoM Archon Toolkit. 

My hope is to add the Go script there, once we've gotten a copy and our devs have had a chance to review it and add some simple instructions. However, if you would like to share your Python script with us for possible inclusion in the Toolkit as well, that would be very welcome! You could also put it up somewhere (in one of your own code repositories, or as a gist) and we could link out to it, if preferred. 

Either way, I'm glad to hear you've been making progress. I will update this thread when I have more news on the availability of the Toolkit. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Dec 6, 2019, 11:38:37 AM
to ica-ato...@googlegroups.com
Here's the Python script. You can place it wherever is easiest to find. Are there any notes somewhere about what aspects of Archon EAD don't transfer well to Atom?


# Try to automate getting EAD in bulk out of Archon
import requests

# Loop through the collection id numbers, count them,
# and stop once 10 bad ones have accumulated
coll_count = 1
bad_count = 0
write_count = 0
# This does 100 collection ids; change the limit to do more or fewer
while coll_count < 101 and bad_count < 10:
    url = ("http://mac.libraryhost.com/?p=collections/ead&id=" + str(coll_count)
           + "&templateset=ead&disabletheme=1&output=collection" + str(coll_count))
    response = requests.get(url)
    if response.status_code == requests.codes.ok:
        with open("collection" + str(coll_count) + ".xml", "wb") as f:
            f.write(response.content)
        write_count += 1
        # print(response.content)
    else:
        # status_code is an int, so convert it before concatenating
        print("Collection " + str(coll_count) + " returns status code "
              + str(response.status_code) + "\n")
        bad_count += 1
    coll_count += 1
print(str(coll_count - 1) + " collections processed\n")
print(str(write_count) + " xml files written\n")
print(str(bad_count) + " bad id numbers\n")

John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Dec 6, 2019, 4:45:17 PM
to ICA-AtoM Users
Hi John, 

Thanks so much for sharing this! I'll have our developers take a look. I'm not a programmer myself, but one change I can see us easily making is adding a variable for the URL, which the user can enter at the top of the script. Otherwise, this looks very helpful. 
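
The URL change could look something like this. It is a sketch only: ARCHON_BASE_URL is a placeholder to be replaced with the address of your own Archon site, the function name is invented, and the query parameters are simply the ones used in the script shared in this thread.

```python
from urllib.parse import urlencode

# Placeholder; set this to the address of your own Archon installation.
ARCHON_BASE_URL = "http://archon.example.org/"


def build_export_url(base_url, collection_id):
    """Build the single-collection EAD export URL for one collection id."""
    query = urlencode(
        {
            "p": "collections/ead",
            "id": collection_id,
            "templateset": "ead",
            "disabletheme": 1,
            "output": "collection" + str(collection_id),
        },
        safe="/",  # keep the slash in collections/ead unescaped
    )
    return base_url.rstrip("/") + "/?" + query


print(build_export_url(ARCHON_BASE_URL, 7))
```

The download loop can then call build_export_url(ARCHON_BASE_URL, coll_count) instead of assembling the string by hand.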

We don't have a compiled list of known issues for Archon to AtoM EAD migrations, but I will try to summarize some of the ones I can remember, that you should watch out for. I've already outlined one important difference in this thread (around how name access points are handled in Archon vs AtoM), and I believe in another thread with you I've shared some links to previous posts about Archon in the forum that relate to physical storage data - check those out if you haven't already. In general I'd suggest you  try searching "Archon" in our forum and reading some of the threads from the last couple of years - anything older than that is probably out of date at this point. 

Some other issues of note: 

Differences in EAD header information that can prevent import

AtoM uses the DTD maintained by the Library of Congress for EAD validation, while Archon appears to use the XSD, stored locally in Archon. This can sometimes cause AtoM to fail importing an Archon record without editing the EAD header. 

An Archon export example: 

<ead audience="external" 
     xmlns="urn:isbn:1-931666-22-9"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://some-archon-installation.example.com/guides/packages/collections/lib/

AtoM's expected EAD header: 

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd">
<ead>


The most important part is stripping out the attributes from the opening <ead> element. AtoM keeps a copy of the EAD 2002 DTD stored locally, so even without the opening xml element and doctype declaration, the file should import. 
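
Stripping the attributes can be done with a simple text substitution. The sketch below is one possible approach, not the method used in any Artefactual script; the function name is invented. It works on the raw text rather than parsing the XML, and only touches the first opening <ead ...> tag.

```python
import re

# Matches the opening <ead ...> tag but not <eadheader> or <eadid>.
EAD_OPEN_TAG = re.compile(r"<ead\b[^>]*>")


def strip_ead_attributes(xml_text):
    """Replace the first opening <ead ...> tag with a bare <ead>."""
    return EAD_OPEN_TAG.sub("<ead>", xml_text, count=1)


sample = ('<ead audience="external" xmlns="urn:isbn:1-931666-22-9">'
          '<eadheader><eadid identifier="x">1</eadid></eadheader></ead>')
print(strip_ead_attributes(sample))
```

One thing to check afterwards: if the file uses xlink: prefixed attributes further down, removing the xmlns:xlink declaration leaves that prefix undeclared, so those attributes may need attention as well.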

<eadid> @id value importing as the description identifier in AtoM

I think this one might be something we should change in AtoM's import mappings, but it is currently an unresolved issue, so worth noting. Archon puts the collection identifier into the EAD header in the <eadid> element using the @identifier attribute. On import, AtoM adds this in the Control area as the description identifier. In some cases this was an Archon internal ID, so if you don't want this added to your descriptions, you might want to strip it out of the header. 

Archon internal database IDs exporting as lower-level identifiers

It seems that many Archon users don't add identifiers to their item and file level descriptions. However, on export, Archon includes the internal database ID - and it codes these in the EAD XML as <unitid> values, which AtoM interprets as identifiers, meaning they are imported and shown to end users. 
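
If you would rather not import these, they could be removed before import with something like the sketch below. It assumes the internal IDs are marked with label="ArchonID", as in the generalized example later in this post, and that the <ead> element has no default namespace; check both against your own export. The function name is invented.

```python
import xml.etree.ElementTree as ET


def drop_internal_unitids(xml_text, label="ArchonID"):
    """Remove <unitid> elements whose label attribute matches `label`.

    Returns the modified XML as a string plus the number of elements removed.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Snapshot the <did> elements so the tree can be edited while looping.
    for did in list(root.iter("did")):
        for unitid in did.findall("unitid"):
            if unitid.get("label") == label:
                did.remove(unitid)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


sample = ('<c02 level="Item"><did>'
          '<unitid label="ArchonID" audience="internal">id12345</unitid>'
          '<unittitle>Some photo title here</unittitle>'
          '<unitid label="UnitID">ID001</unitid>'
          '</did></c02>')
cleaned, count = drop_internal_unitids(sample)
print(count, "internal id(s) removed")
print(cleaned)
```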

Blank intermediary levels of description

Some Archon exports we've seen (but not all - it may depend on how Archon is used internally) will organize all item level records under a blank <c> level node with nothing but a title or identifier value beneath it, which can lead to extra and unnecessary levels of description in AtoM. A generalized example: 

 <dsc type="combined">
   <head>Detailed List of Contents</head>
   <c01 level="Item list">
     <did>
       <unitid label="ArchonID" audience="internal">id12344</unitid>
       <unittitle>ID001</unittitle>
     </did>
     <c02 level="Item">
       <did>
         <unitid label="ArchonID" audience="internal">id12345</unitid>
         <container type="Item">ID001</container>
         <unittitle>Some photo title here</unittitle>
         <unitdate>ca. 1931</unitdate>
         <unitid label="UnitID">ID001</unitid>
         <physdesc label="Physical Description">Photographic print : b&amp;w ; 21 x 26 cm.</physdesc>
       </did>
     </c02>
   </c01>
 </dsc>


These would be fine to leave in, but can be confusing for end users when they show up in search / browse results.

No controlled date values at lower levels

Take a look at the <unitdate> in the example above. AtoM imports this as a free-text date - but AtoM also has 2 controlled fields for start and end date, which expect ISO 8601 formatted values (i.e. YYYY-MM-DD, YYYY-MM, or YYYY), and which AtoM uses to support date range searching. Archon does not seem to include these - AtoM looks for them in the @normal attribute of the <unitdate> element, like so: 

       <unitdate normal="1931-01-01/1931-12-31" encodinganalog="3.1.3">ca. 1931</unitdate>


Again, not having these won't cause an import to fail, but it will reduce the utility of searching in AtoM - you can't do date range searches if there are no controlled values added. 
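
For simple cases, the @normal attribute could be filled in by script. The sketch below only handles free-text dates that are a single four-digit year, optionally preceded by "ca.", and expands them to the full year as in the example above; anything else is left untouched for manual review. The function name is invented, and it assumes no default namespace on the <ead> element.

```python
import re
import xml.etree.ElementTree as ET

# A lone year such as "1931" or "ca. 1931"; ranges and decades do not match.
YEAR_ONLY = re.compile(r"^(?:ca\.\s*)?(\d{4})$")


def add_normal_dates(xml_text):
    """Add normal="YYYY-01-01/YYYY-12-31" to single-year <unitdate> elements."""
    root = ET.fromstring(xml_text)
    updated = 0
    for unitdate in root.iter("unitdate"):
        if unitdate.get("normal"):
            continue  # already has a controlled value
        match = YEAR_ONLY.match((unitdate.text or "").strip())
        if match:
            year = match.group(1)
            unitdate.set("normal", year + "-01-01/" + year + "-12-31")
            updated += 1
    return ET.tostring(root, encoding="unicode"), updated


sample = "<did><unitdate>ca. 1931</unitdate><unitdate>1920s</unitdate></did>"
result, count = add_normal_dates(sample)
print(count, "date(s) normalized")
print(result)
```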

Collection level dates nested in the title

At the collection level, Archon seems to nest the <unitdate> element inside of the collection <unittitle>. AtoM will import the whole string, but is not expecting to find the dates in the title, so essentially the dates are appended as string data to part of the title, and no collection date is added. You need to move the <unitdate> element outside of the title (or copy it if you like having the dates of creation as part of the title) for the collection-level dates to import properly. 
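
Moving the date out of the title could be scripted roughly as follows. This is a sketch under assumptions: the sample title is invented (I have not reproduced a real Archon export here), the <ead> element is assumed to have no default namespace, and trimming the trailing comma left behind in the title is my own choice rather than anything AtoM requires.

```python
import xml.etree.ElementTree as ET


def lift_dates_from_titles(xml_text):
    """Move <unitdate> elements nested in <unittitle> up into the parent <did>."""
    root = ET.fromstring(xml_text)
    moved = 0
    for did in list(root.iter("did")):
        title = did.find("unittitle")
        if title is None:
            continue
        for unitdate in title.findall("unitdate"):
            children = list(title)
            index = children.index(unitdate)
            # Keep any title text that followed the date.
            trailing = unitdate.tail or ""
            if index == 0:
                title.text = (title.text or "") + trailing
            else:
                previous = children[index - 1]
                previous.tail = (previous.tail or "") + trailing
            title.remove(unitdate)
            unitdate.tail = None
            # Place the date directly after the title inside <did>.
            did.insert(list(did).index(title) + 1, unitdate)
            moved += 1
        if len(title) == 0 and title.text:
            title.text = title.text.rstrip(" ,")
    return ET.tostring(root, encoding="unicode"), moved


sample = ("<did><unittitle>Smith Family Papers, "
          "<unitdate>1900-1950</unitdate></unittitle></did>")
result, count = lift_dates_from_titles(sample)
print(result)
```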

Series level descriptions sometimes not included in the Archon EAD export

We've seen some cases where metadata that appears in the Archon user interface for intermediary levels, such as a scope and content statement for a series, do not appear in the exported EAD at all. Make sure you check some samples of your records for this! 

No preferred citation mapping

Archon adds citation data to <prefercite> - but because AtoM was originally based around ISAD(G) and ISAD doesn't have a specific field for this, there's no import mapping for this in AtoM and it fails to import. During migrations we typically concatenate this into another field, such as <userestrict> (which maps to DACS  4.4.5 Conditions governing reproduction and use), with a line break and a label preceding it such as: "Preferred citation: "
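
As a rough illustration of that concatenation, the sketch below moves the text of each <prefercite> into a new paragraph in a sibling <userestrict>, creating the <userestrict> if there is none. A new <p> stands in for the line break. The function name and sample are invented, the result is not validated against the DTD, and it assumes no default namespace on the <ead> element.

```python
import xml.etree.ElementTree as ET


def fold_citation_into_userestrict(xml_text, label="Preferred citation: "):
    """Move <prefercite> text into a labeled <p> inside <userestrict>."""
    root = ET.fromstring(xml_text)
    moved = 0
    for parent in list(root.iter()):
        for cite in parent.findall("prefercite"):
            # Use the paragraph text and skip any <head> label.
            paragraphs = ["".join(p.itertext()).strip() for p in cite.iter("p")]
            text = " ".join(part for part in paragraphs if part)
            if not text:
                text = "".join(cite.itertext()).strip()
            parent.remove(cite)
            restrict = parent.find("userestrict")
            if restrict is None:
                restrict = ET.SubElement(parent, "userestrict")
            ET.SubElement(restrict, "p").text = label + text
            moved += 1
    return ET.tostring(root, encoding="unicode"), moved


sample = ("<archdesc><userestrict><p>No restrictions.</p></userestrict>"
          "<prefercite><head>Preferred Citation</head>"
          "<p>Smith Papers, Example Archives.</p></prefercite></archdesc>")
result, count = fold_citation_into_userestrict(sample)
print(result)
```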

There are likely more, but those are some of the key ones I can think of! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Dec 10, 2019, 7:18:42 PM
to ica-ato...@googlegroups.com
Today all of a sudden I'm getting the following when I click on almost all links: Any advice as to what might be wrong? I haven't changed anything that I can tell. It was working fine a couple of days ago and today when I went to do some more work studying uploads of EAD, I got these messages immediately when I clicked on archival descriptions, Authority records, archival institutions, etc. Since this is just a test instance, I'm the only one using it and like I said, it worked fine the last time I accessed it and today no longer works.


 Oops! An Error Occurred

Sorry, something went wrong.
The server returned a 500 Internal Server Error.

Try again a little later or ask in the discussion group.
Back to previous page.




John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Dec 11, 2019, 10:22:48 AM
to ICA-AtoM Users
Hi John, 

Any time you encounter a 500 error, the first thing we recommend doing is checking the webserver error logs for more information. If you've followed our recommended installation instructions  and used Nginx, you can do so with the following command: 
Feel free to share any relevant error message you find here in the forum - hopefully it will provide us with a bit more insight as to what's happening. Any other information you can provide (remind me of your installation environment details; any actions you were taking before the errors started, etc) would help as well! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Dec 11, 2019, 10:37:09 AM
to ica-ato...@googlegroups.com
Here's what it shows:

2019/12/11 14:41:19 [error] 1159#0: *851438 FastCGI sent in stderr: "PHP message: Couldn't connect to host, Elasticsearch down?" while reading response header from upstream, client: 66.249.79.156, server: _, request: "GET /index.php/mennonites-ukraine?sf_culture=en&listPage=2&languages=en&onlyDirect=1&limit=10&sort=alphabetic&listLimit=10 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.atom.sock:", host: "52.33.10.2"
2019/12/11 15:00:15 [error] 1159#0: *851440 FastCGI sent in stderr: "PHP message: Couldn't connect to host, Elasticsearch down?" while reading response header from upstream, client: 66.249.79.157, server: _, request: "GET /index.php/mennonites-canada?listPage=2&sort=lastUpdated&sf_culture=en&onlyDirect=1&limit=10&listLimit=10 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.atom.sock:", host: "52.33.10.2"
2019/12/11 15:09:43 [error] 1159#0: *851442 FastCGI sent in stderr: "PHP message: Couldn't connect to host, Elasticsearch down?" while reading response header from upstream, client: 66.249.79.155, server: _, request: "GET /index.php/mennonites-louisiana?sort=lastUpdated&listPage=2&sf_culture=pt&limit=10&listLimit=10 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.atom.sock:", host: "52.33.10.2"
2019/12/11 15:53:50 [error] 1159#0: *851464 FastCGI sent in stderr: "PHP message: Couldn't connect to host, Elasticsearch down?" while reading response header from upstream, client: 70.183.128.246, server: _, request: "GET /index.php/informationobject/browse HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.atom.sock:", host: "52.33.10.2", referrer: "http://52.33.10.2/"



John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Dec 11, 2019, 11:43:02 AM
to ICA-AtoM Users
Hi John, 

I'd guess that Elasticsearch has crashed. Below I'll share some suggestions for checking ES status, and potential troubleshooting - I've tried to gather up all the info we've previously shared about Elasticsearch troubleshooting into one reply for you. Let's start by gathering some information. 

You can check if it's running with the following, as well as stop and restart it, in Ubuntu 16.04 and 18.04 installations: 
  • sudo systemctl stop elasticsearch
  • sudo systemctl enable elasticsearch
  • sudo systemctl start elasticsearch
  • sudo systemctl status elasticsearch
  • sudo systemctl restart elasticsearch
I would suggest first running the status command to see if ES is running or not. You could also try stopping and restarting ES to see if that helps. 
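
If you want to script a check rather than run it by hand, something like this could work. It is a sketch that assumes Elasticsearch is listening on the default port 9200 on the same server, and it uses the cluster health endpoint; the function names are invented for illustration.

```python
import json
import urllib.request


def describe_health(health):
    """Turn a parsed cluster health response (a dict) into a short note."""
    status = health.get("status")
    if status == "green":
        return "green: all shards allocated"
    if status == "yellow":
        return "yellow: primary shards allocated, replicas not (common on a single node)"
    if status == "red":
        return "red: at least one primary shard is not allocated"
    return "unknown status: " + repr(status)


def fetch_health(base_url="http://localhost:9200"):
    """Ask Elasticsearch for cluster health; raises an OSError subclass if unreachable."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling print(describe_health(fetch_health())) on the server prints the status; a connection error at that point suggests the service is down.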

If you are running AtoM 2.5 or later with Elasticsearch 5.6, you can also run the following command from AtoM's root installation directory (/usr/share/nginx/atom if you have followed our recommended installation instructions) to get more information about your ES instance. Please share the task output if running 2.5:
  • php symfony search:status
The following commands can also be used to check the version and status of Elasticsearch directly.  Please share the output of the following commands - these commands use cURL (so you can run sudo apt-get install curl to install it if it's not already installed).  Run from the server where ES is installed and using the default port, you could try: 
Note that AtoM installations will generally show a yellow (warning) status for shard health. For now, I wouldn't worry too much about the yellow health status - I think it's primarily because we're only using one node, and ES is warning you that this could involve data loss if the node goes down. According to the ES docs, "yellow means that the primary shard is allocated but replicas are not." Since ES is not our primary data store and it's easy to repopulate, I don't think this is critical.  See: 
We might as well also check to make sure we have the correct ES and Java versions installed. With AtoM 2.5 on Ubuntu 16.04, this should be: 
  • Elasticsearch 5.6
  • openjdk version "1.8.0_212"
You can check what version of Java you have installed with the following: 
  • java -version
When I run this in my Vagrant box, here is what I get in return:

openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)


If you don't have the right Java version, you may need to look into how to fully remove the old version, but here is how you can install the Java 8 version we need: 
  • sudo add-apt-repository ppa:openjdk-r/ppa
  • sudo apt-get update
  • sudo apt install openjdk-8-jre-headless software-properties-common
Note that Java version 1.8.0_131 or later is recommended for ES 5.6, as per this documentation

Finally, if none of the above has helped resolve the issue, the next thing you could try is to see if the ES logs have any more information than what you've found in your stack trace. In our recommended Ubuntu installation instructions, the Elasticsearch (ES) log is normally located in /var/log/elasticsearch/elasticsearch.log and you could check there to see if there's more information available. Try doing some web searches with the error message to see if you can find further suggestions online -  and remember to make sure they are for the correct version of ES! 

Now, the main issue:

In most cases when we see issues with Elasticsearch crashing, it is due to a lack of available server memory. Specifically, ES starts with a default Java heap size value of 2GB, and this value should never be more than half of your available system memory - so your AtoM server should have at least 4GB of memory available. Our requirements page in the documentation includes some notes on what we use when deploying to hosted instances (such as via OVH or DigitalOcean, two external hosting providers we have used in the past for AtoM). There, we list the following recommendations:
  • Processor: 2 vCPUs @ 2.3GHz
  • Memory: 7GB
  • Disk space (processing): 50GB at a minimum for AtoM’s core stack plus more storage would be required for supporting any substantial number of digital objects.
If you can allocate more RAM to your server, this might help solve the issue. In some cases, we've still seen people need to adjust the default heap size - but again, remember, this shouldn't be set higher than half of the total available RAM on the server. If you do want to try to adjust this, I've found the following links to help guide you with tuning ES:
There is also this StackOverflow thread on adjusting heap size: 
Note that when I asked one of our developers about the best way to make JVM heapsize changes, they told me the following: 

"You can try setting the JVM heap size, but we have never had any luck with actually reducing the memory requirements of ES. Increasing the memory available to ES to at least 2GB or more has always been the better solution in our experience."  

As I've previously said, I think this should be a minimum of 4GB total memory, since the default JVM heap size in ES is 2GB. 
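
The rule of thumb is simple enough to check with a few lines. This is just a sketch of the arithmetic described above (heap no more than half of total memory, default heap of 2GB); the function names are invented, and reading /proc/meminfo only works on Linux.

```python
def heap_fits(total_ram_gb, heap_gb=2.0):
    """True if the Java heap is no more than half of total system memory."""
    return heap_gb <= total_ram_gb / 2


def read_total_ram_gb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (reported in kB) from a Linux /proc/meminfo file."""
    with open(meminfo_path) as handle:
        for line in handle:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found in " + meminfo_path)


for ram in (2, 4, 7):
    verdict = "ok" if heap_fits(ram) else "too small"
    print(str(ram) + "GB RAM with a 2GB heap: " + verdict)
```

On the server itself, heap_fits(read_total_ram_gb()) gives the answer for the installed memory.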

Some other forum threads that might be handy: 

If you need to remove and reinstall ES (for example, if you have the wrong version, or multiple versions running): 
In this thread, a user shared how they adjusted the heap size: 

Finally, once you've gathered info, restarted the ES service, and added more RAM if needed, you may want to clear the application cache and repopulate the search index. You can do so with the following, run from AtoM's root installation directory: 
If it is ever needed (for example, if the search:populate command doesn't work for some reason), you can also manually delete AtoM's search index with the following (assuming you have named your ES index atom as we recommend in our installation docs): 
  • curl -XDELETE 'localhost:9200/atom'
You could then try clearing the cache and running the search:populate task again. If it still doesn't work, it may be because ES is down - and again, that is most likely to be a memory issue, so look into that first if you've tried restarting the service and it won't stay up!

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Thiesen

Dec 12, 2019, 4:19:43 PM
to ica-ato...@googlegroups.com
It was the case that Elasticsearch was stopped and it was because of not enough RAM. I restarted it and got it to stop again by trying to delete a really large finding aid I had uploaded from EAD. I decided not to try to re-size my Amazon EC2 instance at this point because I'm not sure if that would cause me to have to reinstall Atom. (It appears that in some cases re-sizing just leaves you with a new blank Ubuntu instance.) I'll wait until I'm done figuring out the various quirks in the EAD upload before I try to move to a new instance. Thanks for your help.

John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

John Thiesen

Dec 12, 2019, 4:54:37 PM
to ica-ato...@googlegroups.com
Thanks for sending this list. It's helpful to look over. I've noted several of these already, so this gives me some confidence that maybe there aren't other significant quirks in the Archon-EAD-Atom path that I need to compensate for yet. 

The most noticeable one in my opinion is the Archon internal database ID showing up in <unitid>. I'm working on a Python program to fix that and maybe I'll throw in a few of these others also, such as the collection-level unitdate issue.


John D. Thiesen

Archivist, Co-director of Libraries

Mennonite Library and Archives

Bethel College

North Newton, KS

Dan Gillean

Dec 13, 2019, 11:07:41 AM
to ICA-AtoM Users
Hi John, 

It's a little late for your needs, but I did manage to get access to the client-created script I mention - he's helpfully put it up with documentation in the following repository: 
Hopefully this will be useful to others - we'll be referencing it in our Archon Toolkit when we are able to make that publicly available. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
