Dear Dan,
I am glad to report that we have been able to integrate AtoM and Archivematica in Docker!
I'll document the steps we've taken here. There's a lot of room for improvement, but I feel I should document the procedure as-is nonetheless, since it was our first successful attempt after many, many failures.
First, a side note regarding Karl's reply: we are not using Alpine Linux as the host system; we are using Ubuntu 16. When we mentioned Alpine Linux before, we were referring to the image on which AtoM's Dockerfile is based, as stated here.
The steps below should be performed immediately before creating the containers (before "docker-compose up -d"), that is, after cloning the repositories and their submodules.
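For reference, the cloning step looks roughly like this (the repository URLs are assumed to be the usual Artefactual ones; adjust them if you work from forks):

git clone --recurse-submodules https://github.com/artefactual-labs/am.git
git clone https://github.com/artefactual/atom.git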
1. Create a bridged network to connect AtoM and Archivematica:
docker network create -d bridge atom-am
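If you want to double-check that the network was created with the right driver:

docker network inspect atom-am --format '{{.Name}}: {{.Driver}}'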
2. Create a volume to send DIP packages from Archivematica to AtoM:
docker volume create dip-uploads
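Likewise, to confirm the volume exists and see where Docker keeps it on the host:

docker volume inspect dip-uploads --format '{{.Name}}: {{.Mountpoint}}'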
3. Change file am/src/archivematica/MCPServer.Dockerfile, adding this to the end, before "USER archivematica":
RUN set -ex \
    && mkdir -p /var/dip-uploads \
    && chown -R archivematica:archivematica /var/dip-uploads
Note: this is necessary because we'll mount the "dip-uploads" volume on this folder, and if this step is skipped, Docker mounts the volume with root as the owner. Since Archivematica runs as the unprivileged "archivematica" user, in that case it is not able to write to the mounted volume (even though the volume is mounted in "read-write" mode).
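You can see the root-ownership behavior for yourself with a throwaway container (a quick check, not part of the procedure; run it before the chown fix is in place, with a freshly created volume):

docker run --rm -v dip-uploads:/var/dip-uploads alpine ls -ld /var/dip-uploads
# Prints something like: drwxr-xr-x ... root root ... /var/dip-uploads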
4. Change file am/compose/docker-compose.yml, adding the external network and volume. The complete docker-compose.yml file is below, with changes highlighted (please see the notes after it regarding the changes):
---
version: "2.1"

volumes:

  # Internal named volumes.
  # These are not accessible outside of the docker host and are maintained by
  # Docker.
  mysql_data:
  elasticsearch_data:
  archivematica_storage_service_staging_data:

  # External named volumes.
  # These are intended to be accessible beyond the docker host (e.g. via NFS).
  # They use bind mounts to mount a specific "local" directory on the docker
  # host - the expectation being that these directories are actually mounted
  # filesystems from elsewhere.
  archivematica_pipeline_data:
    external:
      name: "am-pipeline-data"
  archivematica_storage_service_location_data:
    external:
      name: "ss-location-data"
  dip_uploads:
    external:
      name: "dip-uploads"

networks:
  common:
  atom_am:
    external:
      name: "atom-am"

services:

  mysql:
    image: "percona:5.6"
    user: "mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "12345"
    volumes:
      - "mysql_data:/var/lib/mysql"
    ports:
      - "127.0.0.1:62001:3306"
    networks:
      - "common"

  elasticsearch:
    environment:
      - network.host=0.0.0.0
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - "elasticsearch_data:/usr/share/elasticsearch/data"
    ports:
      - "127.0.0.1:62002:9200"
    networks:
      - "common"

  redis:
    image: "redis:3.2-alpine"
    command: '--save "" --appendonly no'  # Persistency disabled
    user: "redis"
    ports:
      - "127.0.0.1:62003:6379"
    networks:
      - "common"

  gearmand:
    image: "artefactual/gearmand:1.1.17-alpine"
    command: "--queue-type=redis --redis-server=redis --redis-port=6379"
    user: "gearman"
    ports:
      - "127.0.0.1:62004:4730"
    links:
      - "redis"
    networks:
      - "common"

  fits:
    image: "artefactual/fits-ngserver:0.8.4"
    ports:
      - "127.0.0.1:62005:2113"
    volumes:
      - "archivematica_pipeline_data:/var/archivematica/sharedDirectory:rw"  # Read and write needed!
    networks:
      - "common"

  clamavd:
    image: "artefactual/clamav:latest"
    environment:
      CLAMAV_MAX_FILE_SIZE: "${CLAMAV_MAX_FILE_SIZE}"
      CLAMAV_MAX_SCAN_SIZE: "${CLAMAV_MAX_SCAN_SIZE}"
      CLAMAV_MAX_STREAM_LENGTH: "${CLAMAV_MAX_STREAM_LENGTH}"
    ports:
      - "127.0.0.1:62006:3310"
    volumes:
      - "archivematica_pipeline_data:/var/archivematica/sharedDirectory:ro"
    networks:
      - "common"

  nginx:
    image: "nginx:stable-alpine"
    volumes:
      - "./etc/nginx/nginx.conf:/etc/nginx/nginx.conf:ro"
      - "./etc/nginx/conf.d/archivematica.conf:/etc/nginx/conf.d/archivematica.conf:ro"
      - "./etc/nginx/conf.d/default.conf:/etc/nginx/conf.d/default.conf:ro"
    ports:
      - "62080:80"
      - "62081:8000"
    networks:
      - "common"

  archivematica-mcp-server:
    build:
      context: "../src/archivematica/src"
      dockerfile: "MCPServer.Dockerfile"
    environment:
      DJANGO_SECRET_KEY: "12345"
      DJANGO_SETTINGS_MODULE: "settings.common"
      ARCHIVEMATICA_MCPSERVER_CLIENT_USER: "archivematica"
      ARCHIVEMATICA_MCPSERVER_CLIENT_PASSWORD: "demo"
      ARCHIVEMATICA_MCPSERVER_CLIENT_HOST: "mysql"
      ARCHIVEMATICA_MCPSERVER_CLIENT_DATABASE: "MCP"
      ARCHIVEMATICA_MCPSERVER_MCPSERVER_MCPARCHIVEMATICASERVER: "gearmand:4730"
      ARCHIVEMATICA_MCPSERVER_SEARCH_ENABLED: "${AM_SEARCH_ENABLED:-true}"
    volumes:
      - "../src/archivematica/src/archivematicaCommon/:/src/archivematicaCommon/"
      - "../src/archivematica/src/dashboard/:/src/dashboard/"
      - "../src/archivematica/src/MCPServer/:/src/MCPServer/"
      - "archivematica_pipeline_data:/var/archivematica/sharedDirectory:rw"
      - "dip_uploads:/var/dip-uploads:rw"
    links:
      - "mysql"
      - "gearmand"
    networks:
      - "common"
      - "atom_am"

  archivematica-mcp-client:
    build:
      context: "../src/archivematica/src"
      dockerfile: "MCPClient.Dockerfile"
    environment:
      DJANGO_SECRET_KEY: "12345"
      DJANGO_SETTINGS_MODULE: "settings.common"
      NAILGUN_SERVER: "fits"
      NAILGUN_PORT: "2113"
      ARCHIVEMATICA_MCPCLIENT_CLIENT_USER: "archivematica"
      ARCHIVEMATICA_MCPCLIENT_CLIENT_PASSWORD: "demo"
      ARCHIVEMATICA_MCPCLIENT_CLIENT_HOST: "mysql"
      ARCHIVEMATICA_MCPCLIENT_CLIENT_DATABASE: "MCP"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_ELASTICSEARCHSERVER: "elasticsearch:9200"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_MCPARCHIVEMATICASERVER: "gearmand:4730"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_SEARCH_ENABLED: "${AM_SEARCH_ENABLED:-true}"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CAPTURE_CLIENT_SCRIPT_OUTPUT: "${AM_CAPTURE_CLIENT_SCRIPT_OUTPUT:-true}"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_SERVER: "clamavd:3310"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_CLIENT_MAX_FILE_SIZE: "${CLAMAV_MAX_FILE_SIZE}"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_CLIENT_MAX_SCAN_SIZE: "${CLAMAV_MAX_SCAN_SIZE}"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_CLIENT_MAX_STREAM_LENGTH: "${CLAMAV_MAX_STREAM_LENGTH}"
      ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_CLIENT_BACKEND: "clamdscanner"  # Option: clamdscanner or clamscan;
    volumes:
      - "../src/archivematica/src/archivematicaCommon/:/src/archivematicaCommon/"
      - "../src/archivematica/src/dashboard/:/src/dashboard/"
      - "../src/archivematica/src/MCPClient/:/src/MCPClient/"
      - "archivematica_pipeline_data:/var/archivematica/sharedDirectory:rw"
      - "dip_uploads:/var/dip-uploads:rw"
    links:
      - "fits"
      - "clamavd"
      - "mysql"
      - "gearmand"
      - "elasticsearch"
      - "archivematica-storage-service"
    networks:
      - "common"
      - "atom_am"

  archivematica-dashboard:
    build:
      context: "../src/archivematica/src"
      dockerfile: "dashboard.Dockerfile"
    environment:
      FORWARDED_ALLOW_IPS: "*"
      AM_GUNICORN_ACCESSLOG: "/dev/null"
      AM_GUNICORN_RELOAD: "true"
      AM_GUNICORN_RELOAD_ENGINE: "auto"
      DJANGO_SETTINGS_MODULE: "settings.local"
      ARCHIVEMATICA_DASHBOARD_DASHBOARD_GEARMAN_SERVER: "gearmand:4730"
      ARCHIVEMATICA_DASHBOARD_DASHBOARD_ELASTICSEARCH_SERVER: "elasticsearch:9200"
      ARCHIVEMATICA_DASHBOARD_CLIENT_USER: "archivematica"
      ARCHIVEMATICA_DASHBOARD_CLIENT_PASSWORD: "demo"
      ARCHIVEMATICA_DASHBOARD_CLIENT_HOST: "mysql"
      ARCHIVEMATICA_DASHBOARD_CLIENT_DATABASE: "MCP"
      ARCHIVEMATICA_DASHBOARD_SEARCH_ENABLED: "${AM_SEARCH_ENABLED:-true}"
    volumes:
      - "../src/archivematica/src/archivematicaCommon/:/src/archivematicaCommon/"
      - "../src/archivematica/src/dashboard/:/src/dashboard/"
      - "archivematica_pipeline_data:/var/archivematica/sharedDirectory:rw"
      - "dip_uploads:/var/dip-uploads:rw"
    links:
      - "mysql"
      - "gearmand"
      - "elasticsearch"
      - "archivematica-storage-service"
    networks:
      - "common"
      - "atom_am"

  archivematica-storage-service:
    build:
      context: "../src/archivematica-storage-service"
    environment:
      FORWARDED_ALLOW_IPS: "*"
      SS_GUNICORN_ACCESSLOG: "/dev/null"
      SS_GUNICORN_RELOAD: "true"
      SS_GUNICORN_RELOAD_ENGINE: "auto"
      DJANGO_SETTINGS_MODULE: "storage_service.settings.local"
      SS_DB_URL: "mysql://archivematica:demo@mysql/SS"
      SS_GNUPG_HOME_PATH: "/var/archivematica/storage_service/.gnupg"
    volumes:
      - "../src/archivematica-storage-service/:/src/"
      - "../src/archivematica-sampledata/:/home/archivematica/archivematica-sampledata/:ro"
      - "archivematica_pipeline_data:/var/archivematica/sharedDirectory:rw"
      - "archivematica_storage_service_staging_data:/var/archivematica/storage_service:rw"
      - "archivematica_storage_service_location_data:/home:rw"
      - "dip_uploads:/var/dip-uploads:rw"
    links:
      - "mysql"
    networks:
      - "common"
      - "atom_am"
Notes:
- Apparently, if the "networks" section is omitted altogether, as in the original docker-compose.yml file, all containers are placed on the same implicit network. Because of this, we had to change the docker-compose.yml file much more than we'd like, adding a clause to each service to put it on a "common" network (otherwise the containers stopped communicating after we added our external network). Perhaps there's another way to declare the external network that doesn't cause this, maybe inside the "links" section? (See also the sketch after these notes.)
- Since we weren't sure which of the containers -- archivematica-dashboard, archivematica-mcp-server or archivematica-mcp-client -- actually call AtoM's API, we connected the external volume and network to all 3 containers. We plan to pinpoint which containers actually participate in the communication, so as to reduce the necessary changes to the docker-compose.yml file.
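Regarding the first note: a possible alternative we have not tried is to leave docker-compose.yml alone and attach the running containers to the external network from the host (the container name below follows our compose project name, so treat it as an example):

docker network connect atom-am compose_archivematica-mcp-server_1

The downside is that this would have to be repeated whenever the container is recreated, which is why we went with declaring the network in the compose file instead.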
5. Change file atom/docker/docker-compose.dev.yml, adding the external volume and network. Complete file below, changes highlighted (far fewer changes in this file, thankfully):
---
version: "2"

volumes:
  elasticsearch_data:
    driver: "local"
  percona_data:
    driver: "local"
  dip_uploads:
    external:
      name: "dip-uploads"

networks:
  net_cache:
  net_db:
  net_jobs:
  net_http:
  net_search:
  atom_am:
    external:
      name: "atom-am"

services:

  elasticsearch:
    image: "elasticsearch:1.7"
    # chown only seems to be solving a problem only happening with osx+boot2docker and vboxsf
    command: "bash -c 'chown -R elasticsearch:elasticsearch /elasticsearch-data && elasticsearch -Des.network.host=0.0.0.0 -Des.path.data=/elasticsearch-data'"
    volumes:
      - "elasticsearch_data:/elasticsearch-data:rw"
    ports:
      - "63002:9200"
    expose:
      - "9300"
    networks:
      - "net_search"

  percona:
    image: "percona:5.6"
    environment:
      - "MYSQL_ROOT_PASSWORD=my-secret-pw"
      - "MYSQL_DATABASE=atom"
      - "MYSQL_USER=atom"
      - "MYSQL_PASSWORD=atom_12345"
    volumes:
      - "percona_data:/var/lib/mysql:rw"
      - "./etc/mysql/conf.d/:/etc/mysql/conf.d:ro"
    expose:
      - "3306"
    networks:
      - "net_db"

  memcached:
    image: "memcached"
    command: "-p 11211 -m 128 -u memcache"
    expose:
      - "11211"
    networks:
      - "net_cache"
      - "net_jobs"

  gearmand:
    image: "artefactual/gearmand"
    expose:
      - "4730"
    networks:
      - "net_cache"
      - "net_jobs"

  atom:
    build:
      context: "../"
      dockerfile: "./docker/Dockerfile"
    command: "fpm"
    volumes:
      - "../:/atom/src:rw"
      - "dip_uploads:/var/dip-uploads:rw"
    networks:
      - "net_cache"
      - "net_db"
      - "net_http"
      - "net_jobs"
      - "net_search"
    environment:
      - "ATOM_DEVELOPMENT_MODE=on"
      - "ATOM_ELASTICSEARCH_HOST=elasticsearch"
      - "ATOM_MEMCACHED_HOST=memcached"
      - "ATOM_GEARMAND_HOST=gearmand"
      - "ATOM_MYSQL_DSN=mysql:host=percona;port=3306;dbname=atom;charset=utf8"
      - "ATOM_MYSQL_USERNAME=atom"
      - "ATOM_MYSQL_PASSWORD=atom_12345"
      - "ATOM_DEBUG_IP=172.22.0.1"
    ports:
      - "63022:22"

  atom_worker:
    build:
      context: "../"
      dockerfile: "./docker/Dockerfile"
    command: "worker"
    volumes:
      - "../:/atom/src:rw"
      - "dip_uploads:/var/dip-uploads:rw"
    networks:
      - "net_cache"
      - "net_db"
      - "net_jobs"
      - "net_search"
    environment:
      - "ATOM_DEVELOPMENT_MODE=on"
      - "ATOM_ELASTICSEARCH_HOST=elasticsearch"
      - "ATOM_MEMCACHED_HOST=memcached"
      - "ATOM_GEARMAND_HOST=gearmand"
      - "ATOM_MYSQL_DSN=mysql:host=percona;port=3306;dbname=atom;charset=utf8"
      - "ATOM_MYSQL_USERNAME=atom"
      - "ATOM_MYSQL_PASSWORD=atom_12345"

  nginx:
    image: "nginx:latest"
    ports:
      - "63001:80"
    volumes:
      - "../:/atom/src:ro"
      - "./etc/nginx/prod.conf:/etc/nginx/nginx.conf:ro"
    networks:
      - "net_http"
      - "atom_am"
    depends_on:
      - "atom"
Notes:
- Since the API calls will be directed to the nginx container, we only needed to connect that container to the external network (see the check after these notes)
- We guessed it would be enough to connect only the atom_worker container to the external volume, but we ended up connecting the atom container to it as well
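To confirm which containers actually ended up attached to the shared network, this check works from the host once both stacks are up (the AtoM nginx container and the three Archivematica containers should be listed):

docker network inspect atom-am --format '{{range .Containers}}{{.Name}} {{end}}'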
6. Proceed with the instructions for creating the Archivematica and AtoM containers ("docker-compose up -d" and the rest of the instructions for each).
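In our case that boiled down to roughly the following (the exact bootstrap steps come from each project's own README, so take this as an outline rather than a recipe; note that the directory names also become the compose project names, which is where the container names used below come from):

cd am/compose
docker-compose up -d --build
cd ../../atom/docker
docker-compose -f docker-compose.dev.yml up -d --build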
7. Test HTTP communication from Archivematica's MCP Server container to AtoM's nginx container:
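For example (assuming curl is available in the MCP Server image -- wget works too -- and that the container names follow the compose project names, as in step 8 below):

docker exec compose_archivematica-mcp-server_1 curl -s http://docker_nginx_1/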
Expected output: AtoM's index HTML page, showing that from Archivematica we can reach AtoM by referring only to its container name (no local ephemeral IP addresses involved!).
8. Test access to the shared volume:
docker exec -u archivematica compose_archivematica-mcp-server_1 touch /var/dip-uploads/foo
docker exec docker_atom_worker_1 rm /var/dip-uploads/foo
Expected output: none (no errors means the user "archivematica" was able to create a file in the /var/dip-uploads directory from the MCP Server container, and that from the atom worker container we can read and remove said file).
9. Workaround for a problem in the atom worker with the qtSwordPlugin plugin (this was the really challenging part).
Check your atom_worker's log:
docker-compose logs atom_worker
Look for this warning message:
Ability not defined: qtSwordPluginWorker. Please ensure the job is in the lib/task/job directory or that the plugin is enabled.
If you don't see the message above but instead see "New ability: qtSwordPluginWorker" (which is unlikely at this point), you can skip the rest of this step.
If you do see the warning message above, you must solve it first: the integration will not work until this issue is addressed. If you continue anyway, you'll later encounter an error when Archivematica calls AtoM, stating that no worker is able to process the request (I can't find the exact message, but it's something along those lines).
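A quick way to check for either message without scrolling through the whole log:

docker-compose logs atom_worker | grep -i qtSwordPluginWorker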
Searching the web for the warning message above, we found it in several log outputs posted by users. Stranger still, we noticed that:
- Restarting the container (docker-compose restart atom_worker) does not solve the issue
- If we repeat the entrypoint command while the container is running, the warning comes up again: docker exec docker_atom_worker_1 php symfony jobs:worker
- If we first access the plugins configuration page on AtoM's dashboard and then repeat the command above, the warning goes away
To repeat: merely accessing the plugins configuration page causes the qtSwordPluginWorker to be loaded correctly! We noticed this while accessing the page to ensure the plugin was enabled (per Dan's recommendation in several related threads). The plugin is enabled by default in the Docker image, but merely visiting the page is enough to get qtSwordPlugin loaded (clicking the "Save" button is not required).
This, of course, does not solve the problem for our container, since we need the worker started by the initial entrypoint command to succeed, not an additional worker executed "on the side".
By looking at the source code of sfPluginAdminPlugin, which is responsible for showing the plugin configuration page, we guessed that the method sfPluginAdminPluginConfiguration.initialize(), executed when the plugin configuration page is first accessed, was making the difference -- especially this part. For some reason, it seems qtSwordPluginWorker is not loaded if only php symfony jobs:worker is executed; however, some actions during the normal operation of AtoM's dashboard cause it to be loaded. This might explain why so many log outputs seem to contain that warning, and why a simple restart of the worker usually solves the issue.
Anyway, we painstakingly came up with a workaround for this issue. I must say it is quite ugly, and we do not expect it to be a permanent solution at all!
The fix consists of manually changing the file ~/atom/lib/task/jobs/jobWorkerTask.class.php on the host machine, adding the highlighted lines below:
<?php

// Part of the workaround for the qtSwordPlugin autoload problem
require_once "/atom/src/plugins/qtSwordPlugin/config/qtSwordPluginConfiguration.class.php";

(...)

  protected function execute($arguments = array(), $options = array())
  {
    $configuration = ProjectConfiguration::getApplicationConfiguration($options['application'], $options['env'], false);
    $context = sfContext::createInstance($configuration);

    // Part of the workaround for the qtSwordPlugin autoload problem
    $pluginConfig = new qtSwordPluginConfiguration($configuration, "/atom/src/plugins/qtSwordPlugin", "qtSwordPlugin");
    $pluginConfig->initializeAutoload();
    $pluginConfig->initialize();

    // Using the current context, get the event dispatcher and subscribe an event in it
    $context->getEventDispatcher()->connect('gearman.worker.log', array($this, 'gearmanWorkerLogger'));

(...)
PS: Did I mention it was ugly?
After saving that file, the warning is gone:
docker-compose restart atom_worker
docker-compose logs atom_worker
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arFindingAidJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arInheritRightsJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arObjectMoveJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arInformationObjectCsvExportJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: qtSwordPluginWorker
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arUpdatePublicationStatusJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arFileImportJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arInformationObjectXmlExportJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arXmlExportSingleFileJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arGenerateReportJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arActorCsvExportJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arActorXmlExportJob
atom_worker_1 | 2019-02-18 19:52:37 > New ability: arRepositoryCsvExportJob
atom_worker_1 | 2019-02-18 19:52:37 > Running worker...
atom_worker_1 | 2019-02-18 19:52:37 > PID 12
10. Configure the DIP upload on Archivematica with the following settings:
- Rsync target: /var/dip-uploads
- Rsync command: (empty)
The rest of the settings (login email, password, REST API key) should be filled in as usual with credentials for AtoM.
11. Configure AtoM
By default, AtoM looks for DIP uploads in the "/tmp" directory. We must change that under Admin > Settings > Global > SWORD deposit directory: set it to "/var/dip-uploads".
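To sanity-check that the directory AtoM will read from is the same one Archivematica writes to, list it from the atom container (container name again assuming the "docker" project name):

docker exec docker_atom_1 ls -ld /var/dip-uploads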
12. Restart atom_worker
docker-compose restart atom_worker
Note: the atom worker reads the setting above when it starts, so we must restart it once after changing the setting.
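Tailing the worker log is handy here, both to confirm the restart picked up the new setting and, later, to watch DIP uploads being processed:

docker-compose logs -f atom_worker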
That's it, now the DIP upload should work. Phew!
Like I said, this is very much a work in progress. IMO the main points that need improving are:
- Reduce the "customization footprint" on Archivematica's docker-compose.yml file by better understanding the whole "implicit networking" configuration, which seems to be related to the use of "links" instead of "networks"
- Fix the mentioned issue with the loading of qtSwordPlugin on atom worker (Dan, do you think we should go ahead and report this as a bug?)
- Find a way to avoid needing to change the file MCPServer.Dockerfile (perhaps if we mount the volume to /archivematica/dip-uploads it will be created having archivematica as owner?)
That's it for now. Please let me know if any of this can be of use for you at Artefactual, and if you need anything on my part.
Cheers,
Tatiana Canelhas