Re-installing the Storage Service with existing AIPs

Andrew Berger

Aug 20, 2014, 4:58:34 PM

to archiv...@googlegroups.com

Hi all,

As part of my testing of various failure scenarios, I am trying to reinstall Archivematica with a set of existing AIPs. The idea here is to see if we could get it up and running again if we lost everything from an existing installation except for the files in the AIP store. I've successfully reinstalled both Archivematica and the Storage Service and copied the AIPs into the new AIP store. I've also been able to rebuild the AIP and transfer indexes for the Archival Storage tab within Archivematica.

However, the AIPs don't appear in the list of packages in the Storage Service and I can't download them or request deletion of them in Archivematica. Is there a script that will (re)generate the AIP information in the Storage Service? Or are there alternative steps I should run for a reinstall? I could re-ingest each package individually, but I'd like to avoid doing this.

Apologies if this has come up before but I've checked the wiki and google group archives and haven't seen anything about restoring an existing AIP store in 1.1.

Thanks,

Andrew

Anthony Cocciolo

Aug 25, 2014, 4:33:49 PM

to archiv...@googlegroups.com

This is something I would be interested in as well if anyone has rebuilt their installation from scratch.

All the best,

Anthony Cocciolo

--
Anthony Cocciolo, Ed.D.
Assistant Professor
Pratt Institute, School of Information and Library Science
144 West 14th Street, Room 604D
New York, NY, 10011-7301
+1 212-647-7702
acoc...@pratt.edu
http://www.thinkingprojects.org

Justin Simpson

Aug 27, 2014, 1:15:45 AM

to archiv...@googlegroups.com

Hi Andrew,

There isn't a script in the Archivematica codebase yet to do what you have described. I can suggest a method to use, if you already have or can make a backup before re-installing the Storage Service.

The Storage Service keeps a SQLite database with all the packages listed. This is stored by default at /var/archivematica/storage-service/storage.db. Take a backup of this file before doing a re-install. Also back up the pointer files stored by the Storage Service. The default location is /var/archivematica/storage_service/. You will find a set of nested directories, named after the UUIDs of the AIPs; in those directories there is one pointer.xml file per AIP.
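To check that a backup actually captured the pointer files, the nested directories can be walked and the pointer files counted. This is a minimal sketch, not part of Archivematica: the function name `find_pointer_files` and the default pattern `pointer*.xml` are assumptions chosen to match the pointer.xml name mentioned above.

```python
import fnmatch
import os


def find_pointer_files(root, pattern="pointer*.xml"):
    """Return a sorted list of pointer files found anywhere under root.

    The default pattern is an assumption; adjust it if your pointer
    files are named differently.
    """
    found = []
    # os.walk silently yields nothing if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling `find_pointer_files("/var/archivematica/storage_service")` on the original machine and on the backup copy should give lists of the same length, one entry per AIP.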

Assuming you have those two things backed up, and assuming you are re-installing the same version of the storage service (0.3.0), once it is installed, stop the storage service and restore the backup db and backup pointer files and restart.

To make the original backup, something like:

rsync -av /var/archivematica/storage_service/ /backup/location/storage_service/
rsync -av /var/archivematica/storage-service/storage.db /backup/location/storage.db

Then, after you have a new storage service instance:

sudo /etc/init.d/uwsgi stop
sudo /etc/init.d/nginx stop
rsync -av /backup/location/storage.db /var/archivematica/storage-service/storage.db
rsync -av /backup/location/storage_service/ /var/archivematica/storage_service/
sudo /etc/init.d/nginx start
sudo /etc/init.d/uwsgi start
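The backup half of this procedure could also be scripted in Python, for example as part of a scheduled job. This is a minimal sketch under stated assumptions: `backup_storage_service` is a hypothetical helper, the paths are passed in by the caller, and it is best run while the Storage Service is stopped, since copying a SQLite file that is being written to can produce an inconsistent copy.

```python
import os
import shutil


def backup_storage_service(db_path, pointer_dir, backup_dir):
    """Copy the SQLite database and the pointer-file tree into backup_dir.

    Returns the paths of the two copies. Raises an error if the pointer
    tree already exists in backup_dir, so that an earlier backup is not
    silently mixed with a newer one.
    """
    os.makedirs(backup_dir, exist_ok=True)

    # Copy the database file, preserving timestamps and permissions.
    db_copy = os.path.join(backup_dir, os.path.basename(db_path))
    shutil.copy2(db_path, db_copy)

    # Copy the whole nested pointer directory under its original name.
    tree_name = os.path.basename(os.path.normpath(pointer_dir))
    tree_copy = os.path.join(backup_dir, tree_name)
    shutil.copytree(pointer_dir, tree_copy)

    return db_copy, tree_copy
```

With the default locations from this thread, the call would be `backup_storage_service("/var/archivematica/storage-service/storage.db", "/var/archivematica/storage_service", "/backup/location")`.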

Now when you go to the storage service, you will have the exact same internal database that your original installation had. This means that the pipeline(s) listed there will have the original pipeline UUIDs. If you have reinstalled the Archivematica pipeline (dashboard, MCPServer, MCPClient), you will need to update your new storage service with the correct UUID for the pipeline. Go to the Archivematica dashboard, under Administration, look at the General section and copy the pipeline UUID. Go to the Storage Service, find the pipeline and update the UUID.

This procedure is not exactly the same thing as recovering from a scenario where all you have is the AIP store. You should be backing up your pointer files as well as the AIP store. If you did not have a backup of the storage.db file, it is possible to script an update of the storage service, using the locations.models module (https://github.com/artefactual/archivematica-storage-service/blob/stable/0.3.x/storage_service/locations/models.py). It would also be possible to use SQL commands to pull data from a backed-up storage.db and insert it into a new storage.db.
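The SQL route mentioned above can be sketched with Python's sqlite3 module and SQLite's ATTACH DATABASE. This is only an illustration: `copy_table` is a hypothetical helper, it assumes both databases were created by the same Storage Service version and so share a schema, and the table name must be looked up in your own storage.db (for example with `.tables` in the sqlite3 shell). Tables that other tables refer to would need to be copied first.

```python
import re
import sqlite3


def copy_table(backup_db, new_db, table):
    """Copy rows of `table` from backup_db into the same table in new_db.

    Rows whose primary key already exists in new_db are skipped.
    Returns the number of rows inserted.
    """
    # Table names cannot be bound as SQL parameters, so validate the name
    # before formatting it into the statement.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name: %r" % table)

    conn = sqlite3.connect(new_db)
    try:
        conn.execute("ATTACH DATABASE ? AS old_db", (backup_db,))
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT OR IGNORE INTO main.{0} "
                "SELECT * FROM old_db.{0}".format(table)
            )
            copied = cur.rowcount
        conn.execute("DETACH DATABASE old_db")
        return copied
    finally:
        conn.close()
```

Stop the Storage Service and take a copy of the new storage.db before trying this, so that a failed attempt can be undone.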

The first approach (creating a script to rebuild a storage service db using the locations.models module) is something we would like to see happen at some point. It is not funded work at the moment, so no one at Artefactual is actively working on it; we are relying on backups of the storage service for disaster recovery.

Justin Simpson
Director of Archivematica Technical Services
www.artefactual.com
604-527-2056

--

Anthony Cocciolo

Sep 2, 2014, 1:35:44 PM

to archiv...@googlegroups.com

Hi Justin,

Thank you for this; this is helpful. I am attempting to put together some documentation on restoring Archivematica should a failed upgrade happen. The one problem that I have is that using the instructions below on some test VMs, I end up with the "Archival Storage" tab saying "Archival Storage is empty." The archival packages show up in the Storage Service, just not in the Archivematica dashboard, and the pipelines are the same in the Storage Service and in the dashboard. Maybe there is something obvious that I have overlooked.

Thanks,

Anthony

Procedures

1. Export MySQL MCP database

mysqldump --user=root --password=password MCP > MCP.sql

2. Backup the Storage service database and pointers (/var/archivematica directory).

sudo rsync -av /var/archivematica/ /backup/location/archivematica/

3. Install a fresh copy of Archivematica using the instructions available from their website. Verify that the fresh installation is working.

4. Restore Archivematica files. For example:

sudo rsync -rtv /backup/location/archivematica/ /var/archivematica/

Ensure that the file owner and group owner are both archivematica. Use chown -R or chgrp -R if this is not the case. Make sure read/write permissions are as they were before, and use chmod if not.
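The ownership check can be done before reaching for chown or chgrp. This is a minimal sketch: `find_wrong_owner` is a hypothetical helper that reports every file or directory below a root whose owner or group differs from the expected numeric IDs. It does not check the root directory itself and does not follow symbolic links.

```python
import os


def find_wrong_owner(root, uid, gid):
    """Return sorted paths under root not owned by the given uid and gid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            st = os.lstat(path)
            if st.st_uid != uid or st.st_gid != gid:
                mismatched.append(path)
    return sorted(mismatched)
```

On a Unix system the numeric IDs can be looked up with `pwd.getpwnam("archivematica").pw_uid` and `grp.getgrnam("archivematica").gr_gid`, assuming the user and group are both named archivematica. An empty result means no ownership changes are needed.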

5. Restore MySQL database. For example:

mysql --user=root --password=password MCP < MCP.sql

--

Anthony Cocciolo, Ed.D.
Associate Professor

Justin Simpson

Sep 2, 2014, 2:11:39 PM

to archiv...@googlegroups.com

Hi Anthony,

There is a step that Andrew Berger mentioned in passing, in the original email that started this thread, that is required to repopulate the ElasticSearch indexes used by the Archivematica dashboard.

On a machine that has Archivematica 1.1.0 installed, you will find this script installed:

/usr/lib/archivematica/archivematicaCommon/utilities/rebuild-elasticsearch-aip-index-from-files

Run this script and pass it the path to your aip storage location (optionally you can pass the path to a single aip). This will rebuild the ElasticSearch index called aips. The script will fetch the aip, extract the mets file from the aip, and read the mets file, and populate your ES index.

At Artefactual, we are working on publishing the Archivematica documentation as a new git repository. It is not on github yet, but when it is, if you (or anyone else) has documentation like this, that you are working on, it would be great to get a Pull Request back, so we can include it in the public docs.

One minor caution: the location of the rebuild-elasticsearch* Python scripts will change with the 1.2.0 release. We have moved those scripts and a few other useful scripts and tools into a new git repo called archivematica-devtools: https://github.com/artefactual/archivematica-devtools . We will include a packaged version of this repo with the 1.2.0 release. You can also clone this repo right now and follow the instructions in that repo's README file for installing and using the tools. That is the better long-term solution.

Justin Simpson
Director of Archivematica Technical Services
www.artefactual.com
604-527-2056

Andrew Berger

Sep 2, 2014, 10:16:41 PM

to archiv...@googlegroups.com

Hi Justin,

I've finally been able to get back to this today and I've been running into problems with the pipeline UUIDs. I did the following:

1. Back up storage service and AIP store:

rsync -av /var/archivematica/storage_service /backup/location/storage_service

rsync -av /var/archivematica/storage-service/storage.db /backup/location/storage.db

rsync -av /var/archivematica/sharedDirectory/www/AIPsStore /backup/location/AIPsStore

2. New installation of Archivematica and Storage Service on different virtual machine. This included creating a new default pipeline with a new UUID.

3. Copy Storage Service and AIPstore files from backup to new installation. This included starting and stopping uwsgi and nginx.

rsync -av /backup/location/storage_service/ /var/archivematica/storage_service/

rsync -av /backup/location/storage.db /var/archivematica/storage-service/storage.db

rsync -av /backup/location/AIPsStore /var/archivematica/sharedDirectory/www/AIPsStore

4. Change permissions with chown so that everything is owned by archivematica.

5. Rebuild elasticsearch indexes.

At this point the Storage Service showed the original packages along with the original pipeline UUID. I only ran into a problem when I tried to update the pipeline UUID with the UUID of the new Archivematica instance. The steps I followed here were:

1. Copy the UUID from Archivematica /administration/general/
2. Edit the Pipeline UUID to replace the original UUID with the new one from Archivematica.

After making this edit, the Packages tab in the storage service stopped working and all I got was a page that said "Whoops!" I tried stopping and restarting uwsgi and nginx again but that didn't make a difference. I also re-edited the Pipeline UUID back to the original value but that didn't fix the error. Almost everything seems to work fine in Archivematica with the new UUID, though: I can download the original packages from archival storage and even ingest new packages. Only the deletion requests don't work, but that makes sense since they connect to the Storage Service.

With Debug mode turned on the Storage Service packages page reports NoReverseMatch at /packages/ . Here is the traceback:

Environment:

Request Method: GET
Request URL: http://localhost:8000/packages/

Django Version: 1.5.4
Python Version: 2.7.3
Installed Applications:
('django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.sites',
'django.contrib.messages',
'django.contrib.staticfiles',
'django.contrib.admin',
'south',
'tastypie',
'administration',
'common',
'locations')
Installed Middleware:
('django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'common.middleware.LoginRequiredMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware')

Template error:
In template /usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/storage_service/templates/snippets/packages_table.html, error at line 16
   Reverse for 'pipeline_detail' with arguments '('',)' and keyword arguments '{}' not found.
   6 :         <th>Current Location</th>
   7 :         <th>Size</th>
   8 :         <th>Type</th>
   9 :         <th>Status</th>
   10 :       </tr>
   11 :     </thead>
   12 :     <tbody>
   13 :     {% for package in packages %}
   14 :       <tr>
   15 :         <td>{{ package.uuid }}</td>
   16 :         <td><a href=" {% url 'pipeline_detail' package.origin_pipeline.uuid %} ">{{ package.origin_pipeline }}</a></td>
   17 :         <td>{{ package.full_path }}</td>
   18 :         <td>{{ package.size|filesizeformat }}</td>
   19 :         <td>{{ package.get_package_type_display }}</td>
   20 :         <td>{{ package.get_status_display }}</td>
   21 :       </tr>
   22 :     {% endfor %}
   23 :     </tbody>
   24 :   </table>
   25 :

Traceback:
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
115.                         response = callback(request, *callback_args, **callback_kwargs)
File "./locations/views.py" in package_list
45.     return render(request, 'locations/package_list.html', locals())
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/shortcuts/__init__.py" in render
53.     return HttpResponse(loader.render_to_string(*args, **kwargs),
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/loader.py" in render_to_string
177.         return t.render(context_instance)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
140.             return self._render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in _render
134.         return self.nodelist.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
830.                 bit = self.render_node(node, context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/debug.py" in render_node
74.             return node.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/loader_tags.py" in render
124.         return compiled_parent._render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in _render
134.         return self.nodelist.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
830.                 bit = self.render_node(node, context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/debug.py" in render_node
74.             return node.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/loader_tags.py" in render
63.             result = block.nodelist.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
830.                 bit = self.render_node(node, context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/debug.py" in render_node
74.             return node.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/defaulttags.py" in render
285.                 return nodelist.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
830.                 bit = self.render_node(node, context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/debug.py" in render_node
74.             return node.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/loader_tags.py" in render
156.         return self.render_template(self.template, context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/loader_tags.py" in render_template
138.         output = template.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
140.             return self._render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in _render
134.         return self.nodelist.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/base.py" in render
830.                 bit = self.render_node(node, context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/debug.py" in render_node
74.             return node.render(context)
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/defaulttags.py" in render
189.                         nodelist.append(node.render(context))
File "/usr/share/python/archivematica-storage-service/lib/python2.7/site-packages/django/template/defaulttags.py" in render
426.                         raise e

Exception Type: NoReverseMatch at /packages/
Exception Value: Reverse for 'pipeline_detail' with arguments '('',)' and keyword arguments '{}' not found.
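
The empty argument in the exception value means that `package.origin_pipeline.uuid` rendered as an empty string for at least one package. One possible cause, not confirmed in this thread, is that some packages still refer to a pipeline UUID that no longer matches any pipeline record. A minimal sketch for checking this directly in a copy of storage.db follows; the function name is hypothetical, and the table and column names are assumptions based on Django's default naming, so verify them against your own database first.

```python
import re
import sqlite3


def find_orphaned_packages(db_path,
                           package_table="locations_package",
                           pipeline_table="locations_pipeline",
                           fk_column="origin_pipeline_id"):
    """Return (package uuid, pipeline reference) pairs with no matching pipeline.

    All table and column names are assumptions; pass your own if they differ.
    """
    for ident in (package_table, pipeline_table, fk_column):
        # Identifiers cannot be bound as parameters, so validate them.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", ident):
            raise ValueError("unexpected identifier: %r" % ident)

    query = (
        "SELECT p.uuid, p.{fk} FROM {pkg} AS p "
        "LEFT JOIN {pipe} AS pl ON p.{fk} = pl.uuid "
        "WHERE pl.uuid IS NULL ORDER BY p.uuid"
    ).format(fk=fk_column, pkg=package_table, pipe=pipeline_table)

    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Run this against a copy of the database rather than the live file. Any rows returned are packages whose pipeline reference points at nothing.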

Should I be creating a new pipeline for the new Archivematica UUID instead of trying to update the original one? I went through these same steps a second time with a different installation but in this case I waited until after I restored the Storage Service and AIP store to set up the new default Archivematica pipeline. In that case I ended up with two pipeline UUIDs in the Storage Service - the original one for the original files and then a new one for new packages - but everything seems to work. I could see a preservation case for keeping the original files associated with the original pipeline UUID (even if it's no longer in use), since disaster recovery is the kind of event you would want to have a record of and that's one way of keeping track of which packages came from an earlier system. But of course it's possible to record that information outside of the system as well.

Alternatively, should we be doing what Anthony is doing and restoring from a backup of everything, including the original MCP database? In a real world case we are likely to have a full backup since we won't be deliberately excluding the MCP database or anything in /var/archivematica from backup; I've just been testing scenarios where that hasn't happened.

Thanks,
Andrew

Anthony Cocciolo

Sep 5, 2014, 11:10:45 AM

to archiv...@googlegroups.com

Hi Justin,

Thank you, that is very helpful.

All the best,

Anthony

Sarah Romkey

Sep 5, 2014, 1:01:03 PM

to archiv...@googlegroups.com

Hello Andrew and Anthony,

This issue sparked some internal discussion about how we think the Storage Service should behave in these situations.

We would not consider it good practice to apply a new UUID to an existing pipeline in the storage service. The METS file currently does not store the UUID of the pipeline that an AIP was processed through, which we could consider for future development. In the meantime, it seems like it would be best practice to maintain "old" pipelines even when they are no longer in use, and then use a new pipeline with the corresponding new UUID from Archivematica to continue processing. Our reasons for this recommendation are:

1. There will be scenarios where institutions have multiple pipelines going at one time to accommodate different processing configurations, and if they start to get renamed it may be hard to know what configuration was used for which AIPs. Further, it would be good practice for the institution to maintain a record outside of Archivematica of what pipeline UUIDs are used for what kind of processing.

2. It wouldn't be great for users to head in to the storage service and change the UUID of a pipeline for no particular reason. This is a policy issue for institutions to manage but we feel it's opening the doors to potential mishandling.

We're going to give some thought to this at Artefactual for future releases of the storage service, and of course would be interested to hear thoughts from community members.

Cheers,

Sarah Romkey, MAS,MLIS

Systems Archivist
Artefactual Systems
604-527-2056

@ArchivesSarah

Andrew Berger

Sep 15, 2014, 3:25:00 PM

to archiv...@googlegroups.com

Hi Sarah,

(Apologies for the delay in my response.) Thank you all for taking the time to consider the possible approaches institutions could take with the storage service here. I agree that it makes the most sense to reserve the "old" pipeline UUID for the recovered AIPs and then use a new pipeline UUID to resume processing, for the reasons you've given.