More questions about docker


Tatiana Canelhas

Apr 4, 2019, 7:48:52 AM
to AtoM Users
Hi, group, everything good? 

The company I work for wants to install AtoM and Archivematica on Docker. They are aware that the current instructions are not production-ready, but they still wish to try to make it work, identifying production-related issues and researching solutions (and sharing them with the community, as we recently did when integrating AtoM and Archivematica on Docker).

The following issue was thus raised: if you lose one entire instance (all containers), how do you recover? How would we back up the relevant data? Moreover, can the Docker installations be used in a failover-ready scenario, in which you have one instance ready to be deployed if the other fails, with all relevant data synchronized?

I believe these issues come down to how one would keep the data that needs to be persisted out of the Docker containers themselves, or at least how one would go about backing that data up. Which volumes in each docker-compose file need to be persisted across instances, or backed up? Does anyone foresee any other items that need to be persisted, which would be required by the new or failover instance? If some unforeseen data should have been backed up but wasn’t (for instance, plugin installation files), would AtoM and Archivematica indicate this in any way when the new instances are installed with the (partially) restored data?

Also, is this a problem we’d also encounter on conventional VM installations, or is this specific to Docker installations?  

Cheers, and thank you. 

Tatiana Canelhas

sebast...@gmail.com

Apr 16, 2019, 4:54:05 AM
to AtoM Users
Hello everyone,

I am also currently in the process of evaluating running AtoM with Docker in a production environment and arrived at similar questions.

The documentation states that "There are a few directories under AtoM that must be writable by the web server", which seems to imply that I would have to configure several Docker volumes mounted at several destinations inside the AtoM container. It is quite obvious that the uploads folder needs to be persisted (in addition to the Percona and Elasticsearch data directories). However, I am afraid that other data written by AtoM might get lost when reinstantiating the image.

Could someone point me to the directories that need to be persisted?


Thank you,
Sebastian Cuy
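
For readers who want a starting point for the question above, here is a minimal Python sketch that lists which mounts each service in a Compose file declares, so that anything written outside those mounts can be spotted. It is only an illustration: the EXAMPLE dictionary is hypothetical (the volume names and target paths are assumptions, not copied from the AtoM repository), and the sketch only handles the short "source:target" volume syntax with Unix-style paths.

```python
def classify_mount(spec):
    """Classify a short-syntax volume entry as 'anonymous', 'bind', or 'named'."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Only a container path was given: Docker creates an anonymous volume.
        return ("anonymous", None, parts[0])
    source, target = parts[0], parts[1]
    # A source that looks like a path is a bind mount; otherwise a named volume.
    kind = "bind" if source.startswith((".", "/", "~")) else "named"
    return (kind, source, target)


def persistent_mounts(compose):
    """Map each service name to its list of (kind, source, target) mounts."""
    report = {}
    for name, service in compose.get("services", {}).items():
        report[name] = [classify_mount(v) for v in service.get("volumes", [])]
    return report


# Hypothetical, already-parsed Compose file (names and paths are assumptions).
EXAMPLE = {
    "services": {
        "percona": {"volumes": ["percona_data:/var/lib/mysql"]},
        "elasticsearch": {
            "volumes": ["elasticsearch_data:/usr/share/elasticsearch/data"]
        },
        "atom": {"volumes": ["atom_src:/atom/src"]},
        "memcached": {},
    }
}

for service, mounts in persistent_mounts(EXAMPLE).items():
    print(service, mounts or "no volumes declared")
```

A service that declares no volumes keeps everything it writes in the container's own writable layer, which disappears when the container is removed. Those are the services to check first when deciding what must be persisted or backed up.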

José Raddaoui

Apr 22, 2019, 1:41:50 PM
to AtoM Users
Hi Tatiana and Sebastian,

Those are great questions! I'm afraid I don't have the answer for most of them but I hope that someone in the community can help you further.

In terms of AtoM and what needs to be persisted, as Sebastian mentions, the Percona and Elasticsearch data volumes should be maintained. It's less important to keep the Elasticsearch data, as the index can be recreated from the database using a Symfony task. The directories that need to be accessed by the web server inside the AtoM folder are mostly the "uploads" and "downloads" directories. These are already persisted on the Docker host as part of the entire AtoM folder, which is a named volume in the Docker Compose file for development. However, you should take a look at the bootstrap script executed in the entry point of the "atom" and "atom_worker" containers, as it may modify some of the files inside that volume.

I hope that helps a bit. Best regards.
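
To make the items listed above concrete, here is a minimal Python sketch that only builds and prints the shell commands one might run for a backup: a database dump, an archive of the "uploads" and "downloads" directories, and the Symfony task that repopulates the search index after a restore. Several things here are assumptions rather than facts from this thread: the Compose service names ("percona", "atom"), the database name, the /atom/src root path, and the presence of a MYSQL_ROOT_PASSWORD variable inside the database container. Please adjust them to your own setup, and treat this as a sketch, not a tested backup procedure.

```python
import shlex

# Assumptions, not confirmed in this thread: service names, database name,
# and the AtoM root path inside the container.
DB_SERVICE = "percona"
ATOM_SERVICE = "atom"
DB_NAME = "atom"
ATOM_ROOT = "/atom/src"


def backup_commands(backup_dir):
    """Build shell commands that dump the database and archive uploads/downloads."""
    sql_file = shlex.quote(f"{backup_dir}/atom.sql")
    tar_file = shlex.quote(f"{backup_dir}/atom-files.tar.gz")
    # The single quotes keep the host shell from expanding the variable, so
    # the password is read from the container's environment (assumed to be set).
    dump = (
        f"docker-compose exec -T {DB_SERVICE} sh -c "
        f"'exec mysqldump --single-transaction -uroot "
        f"-p\"$MYSQL_ROOT_PASSWORD\" {DB_NAME}' > {sql_file}"
    )
    # Stream a gzipped tar of the two directories to a file on the host.
    files = (
        f"docker-compose exec -T {ATOM_SERVICE} "
        f"tar -czf - -C {ATOM_ROOT} uploads downloads > {tar_file}"
    )
    return [dump, files]


def reindex_command():
    """Build the command that rebuilds the Elasticsearch index from the database."""
    return f"docker-compose exec -T {ATOM_SERVICE} php symfony search:populate"


for command in backup_commands("/backups/2019-04-22") + [reindex_command()]:
    print(command)
```

Because the search index can be rebuilt with search:populate, this sketch does not back up the Elasticsearch volume at all. The database dump and the file archive should be taken together, since the database refers to files in the uploads directory.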

Ricardo Pinho

Apr 26, 2019, 6:53:56 PM
to ica-ato...@googlegroups.com
Hi everyone, and thank you Tatiana for restarting this "docker for production ready" debate!

Yes, I fully agree with the company you work for about the need for, and interest in, a production-ready Docker install for AtoM and Archivematica.
It should start with an isolated Docker installation for each product and then move on to a third, integrated, production-ready Docker solution.

But to achieve this goal, all of us, community users and Artefactual, should work together and follow a common strategy.
So, at this point I would suggest asking Artefactual some questions:

1. Does Artefactual agree on the need of a Docker production ready installation for the products (AtoM & Archivematica)?

2. Can Artefactual describe the requirements and strategy for defining and building an official docker production ready installation for the products (AtoM & Archivematica)?

3. What are the resources and contributions from the users that Artefactual consider necessary for building a production ready installation?
(examples: funding, programmers, testing, documentation, etc?)

As a kickstart initiative, I suggest we start working on a Docker install for the next stable release of AtoM, version 2.5.0!
Following the present Docker build instructions results in this list of 7 containers:
docker-compose ps
         Name                       Command               State                 Ports
---------------------------------------------------------------------------------------------------
docker_atom_1            /atom/src/docker/entrypoin ...   Up      9000/tcp
docker_atom_worker_1     /atom/src/docker/entrypoin ...   Up      9000/tcp
docker_elasticsearch_1   /bin/bash bin/es-docker          Up      0.0.0.0:63002->9200/tcp, 9300/tcp
docker_gearmand_1        docker-entrypoint.sh gearmand    Up      4730/tcp
docker_memcached_1       docker-entrypoint.sh -p 11 ...   Up      11211/tcp
docker_nginx_1           nginx -g daemon off;             Up      0.0.0.0:63001->80/tcp
docker_percona_1         /docker-entrypoint.sh mysqld     Up      3306/tcp

What should be changed to turn this into a production-ready installation?

Finally, I must say that you can count on my contribution to this project, if Artefactual decides to start and promote it.

Thank you!
Best regards,
Ricardo Pinho

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To post to this group, send email to ica-ato...@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/a303a727-4849-4c49-badc-e73db01a4c41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Dan Gillean

May 7, 2019, 2:57:40 PM
to ICA-AtoM Users
Hi Ricardo,

Thanks for your thoughts on this interesting topic. I have had the opportunity to get some responses from our Director of Technical Services, and have added a few thoughts of my own, below. 

1. Does Artefactual agree on the need of a Docker production ready installation for the products (AtoM & Archivematica)?

Input from our Technical Director:

For AtoM we haven’t had any requests that I am aware of from paid hosting or maintenance clients to deploy AtoM with Docker in a production or development environment.  In talking to some of the Archivematica team it sounds like running Archivematica in Docker has been more of a hot topic, and at least a few organizations are already deploying Archivematica with Docker.

I’m curious what is most compelling to you and Tatiana about deploying AtoM and Archivematica in Docker?  Are you mostly interested in lighter VMs and more efficient use of resources, or process isolation, or are you looking for orchestration and horizontal scaling?  What is your business need for Docker?  Have you identified blockers for your organization to running Archivematica and AtoM on Docker in production? Are there blockers for using the recommended Ubuntu solution?

I think that the questions Tatiana has raised about how to recover from the loss of Docker containers and synchronizing data across containers need to be managed at the operational strategy level, and will require a substantial amount of work to automate.  I don’t think Docker container configuration or orchestration tools are going to make this kind of replication and recovery automation quick and easy, though I don’t have any practical experience in Docker container infrastructure or tooling so I may be wrong about this.
 
My additional thoughts: 

Beyond the sound business reasons that David has shared above, there are also some open implementation questions that Artefactual can't spend the time to investigate without support. From what I can gather doing some quick research on using Docker in production, there are some known issues that can make using Docker in production time consuming, and potentially risky if you don’t know what you are doing. For example, this article outlines how, from a security perspective, Docker is not ready to use in production out of the box - you must take further steps to ensure it is ready. 

Additionally, though some more recent Docker releases seem to have addressed some of the concerns raised in these articles, they still raise some valid points about some of the potential downsides or risks involved in using Docker:



To add to this, my understanding is that using Docker does not necessarily help us with virtualization on non-Linux servers, such as a Windows or Mac machine. Linux and Windows containers are not the same in Docker, and require different versions of Docker to run. Apparently Microsoft is now making it possible to run Linux containers on Windows, but the feature is still considered experimental and for that reason we wouldn’t recommend relying on this as a production-ready strategy. Ultimately, for a stable, production-ready version this starts to look like maintaining multiple different versions for different OSes, something we do not want to commit to doing. Virtualizing Linux on a Windows server and following the default installation instructions is equally as efficient, and likely less difficult to troubleshoot.
  


All of these issues are surmountable of course - but it requires analysis and testing to determine what the proper solution should be. 


3. What are the resources and contributions from the users that Artefactual consider necessary for building a production ready installation? (examples: funding, programmers, testing, documentation, etc?)

From our Technical Director:

Money, code, testing, and documentation contributions would all be welcome in developing production ready Docker containers for Archivematica and AtoM.  However, I believe the most critical requirement for creating production Docker containers is a long term plan for maintenance of the containers.  Our experience in developing open source software is that finding resources for the development of new features is always easier than securing ongoing resources for maintaining those new features.  Without a business plan to fund the ongoing maintenance costs of Archivematica and AtoM Docker containers, Artefactual can’t afford to take on this additional maintenance work.

For Archivematica and AtoM over the last several years we’ve invested in Ansible automation for deploying the software to traditional virtual machines (running on a hypervisor) running Ubuntu or CentOS.  So far we haven’t had a compelling business case to invest in Docker containerization or deployment of AtoM.  For Archivematica more work has been done in creating and maintaining Docker containers as the recommended development environment, but I’m not clear how much extra work would be required to make those containers ready for production use.

My additions: 

We would be happy to offer what general guidance we can, but for the funding and maintenance issues that David has outlined above, it is unlikely that Artefactual will support an official, production-ready Docker option any time soon. As David has mentioned, our system administrators have been fine-tuning our deployment scripts for Ubuntu (and now CentOS as well) for years, and we intend to continue using these for our hosted clients. We have only one developer who regularly uses the Docker Compose development environment (Radda, who has previously responded in this thread), and have honestly considered removing this from the public-facing documentation due to a lack of resources for testing, maintenance, and documentation. 

If you and other community members successfully create a production-ready Docker instance, we would be happy to help promote it, and list it on the AtoM wiki. If, over time, we see it is well-maintained by the community, and broadly used, we may reach out and consider adding official documentation for it. In the meantime, I think we would need to see a compelling use case for focusing on this deployment option, and some sponsorship to help us do the analysis, documentation, testing, and maintenance work that would be required for us to maintain this over time, before we consider adding this as an officially supported, production-ready installation option. 

As for your second question (can Artefactual describe the requirements and strategy for defining and building an official docker production ready installation for the products?), we haven't answered this directly, because a) the requirements are tied up in gaining sponsorship support for the analysis, testing, documentation, and ongoing maintenance of such an option, and b) because until we have the means and opportunity to do some of the research and analysis, we are not sure what to recommend! 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


Ricardo Pinho

May 9, 2019, 4:08:08 PM
to ica-ato...@googlegroups.com
Dear Dan,

Thank you for all the time and effort put into this amazingly detailed reply.
Sorry I can't give it the same attention right now, but here goes my personal point of view (nothing more than that!):

> I’m curious what is most compelling to you and Tatiana about deploying AtoM and Archivematica in Docker?
I can only speak for myself: I see the use of Docker (container) technology as a benefit for both developers and users.
Ease and flexibility of installation, deployment, updating and, yes, scaling.

> Virtualizing Linux on a Windows server and following the default installation instructions is equally as efficient, and likely less difficult to troubleshoot
I've been a virtualization fan for a long time, and even contributed to introducing it in the GIS Open World in 2007/9!
Your doubts about Docker are much the same as the ones about VMs in those days.
I completely understand them, but I really believe that we must take those steps today, so that tomorrow we are not left behind and can help to build a better future!
I've been struggling with Docker for almost 2 years, even if I am still much more comfortable with VMs!

> Money, code, testing, and documentation contributions would all be welcome in developing production ready Docker containers for Archivematica and AtoM.
Money and code, we don't have and can't give you! Portugal spent it all on proprietary, closed-source licensing and development.
Even the money we get from European Structural funding for development is spent on buying licenses.
As for the rest, I guess we can make a significant contribution if we have the chance to gather a strong Portuguese community, together with our Brazilian colleagues.

> b) because until we have the means and opportunity to do some of the research and analysis, we are not sure what to recommend! 
Fair enough! I really understand.

And what about AtoM 3? And https://accesstomemoryfoundation.org/
I'm aware of some detailed research, analysis, design and proposals, together with some partners.
What can we expect from it in the near future? Will it be Docker-based? eheheh

Cheers,
Ricardo Pinho


Dan Gillean

May 9, 2019, 5:21:59 PM
to ICA-AtoM Users
Hi Ricardo, 

Thank you as always for your understanding, and for helping to create such interesting and relevant conversations. 

I think that you are right in seeing an opportunity in AtoM3 - it's possible that we could make everything Docker-based from the beginning, so it is only a matter of scaling up or down resources for testing and development vs production. Some of my concerns about exactly how production-ready Docker is still stand, as does our need to find developers experienced with Docker in production who can help advise us, but it is most certainly worth considering, and could lead to less maintenance in the long run if there is only one distribution to maintain. Since it will likely still be a while before AtoM3 is anywhere near production ready, we also have some time for Docker to mature and see broader adoption. Time will tell! 

The other thing that is different, of course, is that Artefactual will not necessarily be at the center of decision-making for AtoM3. We will certainly bring our experience and opinions to the table, but part of the purpose of the Foundation is to allow the community to have more say over the direction of the project! 

With that in mind, if you (or anyone else reading this!) have not yet completed the AtoM3 survey that the Foundation is currently circulating, now would be a great time! If I recall correctly, there are some free-text responses near the end of the survey where you could add your thoughts on this. 

One of the current members of the Foundation's Board posted in the forum about this a while ago, here. You can find the survey (as well as a Proof of Concept proposal whitepaper [PDF] that Artefactual and several other companies produced last year, to help guide the discussion and offer a method of proving some of our assertions and testing an initial technology stack before we begin active development), on the Foundation website, here: 
The survey is open to all, even if you are not currently a Foundation member! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

