To follow up on this and the discussion from the general mailing list: a few thoughts on design considerations for Kubernetes deployment of Arches:
Questions to be answered:
- the Python project code-base needs to be pre-built into a per-Arches-instance Docker image - could we provide a standard CI file for doing this, e.g. for GitLab CI, which users can adapt for other CI systems? That would mean that when a project repo is created it automatically builds a usable image (a rough sketch follows this list).
- how do we best handle static assets? (we have a current approach that broadly seems to work for this, but optionally pushing to a CDN would be nicer)
- is any state (e.g. session state) managed outside of the obvious locations - postgres and couchdb? (the Arches 5 docker-compose looks like that's a no)
- what maintenance tasks/jobs need to run periodically, either manually or automatically? What one-off jobs/commands should be accommodated? (a CronJob sketch follows this list)
- how should back-ups be approached? Should this be a Kubernetes design point, or is that overextending scope as long as we give clear direction?
- what default storage types and sizes make sense? E.g. should we default to S3/minio/Blob storage for uploads? (see the values.yaml sketch below)
- initialization - what options should sit at the infrastructure level, and how do we differentiate essential first-run steps (for any deployment) from optional set-up that may need to be customized? (one option is a Helm hook Job, sketched below)
- what are the essential-to-parameterize aspects for quick set-up, e.g. the initial admin username?
- how are upgrades to the underlying project or to the Helm chart handled?
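
To make the image-build question concrete, here is a minimal sketch of a per-project GitLab CI file. It assumes a Dockerfile at the project root and that the repo's container registry is enabled; the `CI_REGISTRY_*` and `CI_COMMIT_*` variables are GitLab's built-in ones, everything else is a placeholder:

```yaml
# .gitlab-ci.yml - minimal sketch, assumes a Dockerfile at the project root
# and that the GitLab container registry is enabled for the repo.
stages:
  - build

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

Users on other CI systems would only need to replicate the build/push step, so the contract with the chart stays "an image reference", nothing more.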
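For the periodic-jobs and back-up questions, the Kubernetes-native answer is probably CronJobs. A rough sketch of a nightly pg_dump, purely as an illustration - the image, secret and claim names (arches-db-credentials, arches-backups) are made up, and whether back-ups belong in the chart at all is exactly the open question above:

```yaml
# Sketch of a nightly database back-up CronJob - names are hypothetical
# and would be defined by the chart.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: arches-db-backup
spec:
  schedule: "0 2 * * *"        # run at 02:00 every night
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:14
              envFrom:
                - secretRef:
                    name: arches-db-credentials   # supplies PGHOST, PGUSER, PGPASSWORD, PGDATABASE
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump --format=custom --file=/backups/arches-$(date +%F).dump
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: arches-backups
```

One-off jobs/commands could reuse the same pattern as plain Jobs, or be documented as `kubectl exec` / `manage.py` invocations.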
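On storage defaults and the "essential to parameterize" point, a small top-level values.yaml could keep the quick-start surface minimal. Every key below is invented for illustration - the point is just which knobs a first-time user has to touch versus which they can ignore:

```yaml
# values.yaml sketch - all keys are hypothetical, shown only to illustrate
# a possible split between required and optional settings.
arches:
  image: registry.example.org/my-arches-project:latest   # the pre-built project image
  adminUser:
    username: admin            # essential-to-parameterize for quick set-up
    existingSecret: ""         # or point at a pre-created Secret for the password

uploads:
  backend: s3                  # one of: s3 | minio | azure-blob | pvc
  bucket: arches-uploads
  existingSecret: arches-s3-credentials

postgres:
  persistence:
    size: 20Gi

elasticsearch:
  persistence:
    size: 30Gi
```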
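For separating essential first-run steps from optional, customizable set-up, Helm hooks are one option: a post-install Job runs the unavoidable steps once, while anything optional stays as documented manage.py commands. The hook annotations are standard Helm; the image and secret names are placeholders, and the exact management command to run (shown here as Arches' setup_db) is part of what we'd need to pin down:

```yaml
# Sketch of a first-run initialization Job driven by a Helm post-install hook.
apiVersion: batch/v1
kind: Job
metadata:
  name: arches-first-run
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-delete-policy": hook-succeeded   # clean up the Job once it succeeds
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: setup
          image: registry.example.org/my-arches-project:latest
          envFrom:
            - secretRef:
                name: arches-db-credentials
          command: ["python", "manage.py", "setup_db"]
```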
None of those are blockers to getting a rough testable base - I've got one running, using out-of-the-box Arches 5, on our internal test cluster (prototype code: - still a couple of fixes needed before it's properly testable, plus a redeploy to check it still works here).
In terms of usability and uptake, I would suggest tidying up that reference chart so it is as simple and quick to pick up and play with as possible for someone basically familiar with Kubernetes, even if it doesn't describe a full recipe for a production-ready system. At the same time, we can grow a parallel fork with the relevant features for enterprise-level deployment as we progressively incorporate them - e.g. linking with common Kubernetes plugins for additional network security, scalable Kubernetes-side storage, third-party integration, Kubernetes-level access controls, and network/process isolation. Charts I've seen that attempt both in one codebase can be quite hard to get to grips with and try out locally or on a dev cluster.