I am toying around with Anduril and Docker and was wondering if the Anduril workflow engine should provide some additional functionality to allow running components inside containers.
Here is the idea:
- When instantiating components, there is an additional annotation named "@docker" that is assigned the name of a docker image, inside which the component should execute (e.g. "anduril/HTSeqBam2Count"). This could maybe default to "anduril/<component-name>" or "anduril/<bundle-name>".
- If the @docker annotation is present, the Anduril engine prefixes the execution command with a user-defined docker string followed by the value of @docker (i.e. "<docker_prefix> <image_name> <command>").
- The docker prefix can be provided either globally via the 'anduril run' command line parameter, or made host-specific via hosts.conf.
- Docker prefixes can be used in both local and remote execution mode. In remote execution mode, the docker prefix is sandwiched between the SSH call and the execution command, e.g. "ssh myserver docker run anduril/HTSeqBam2Count <command>". (This actually works; I have run my workflows with it.)
- Docker images for components will be provided by component developers and made publicly available via Docker Hub. With Docker Hub, docker images get installed "automagically" on execution hosts if not present.
- Not every component needs to have its own docker image. A component can run in "classic mode" (without using docker at all), inside a generic docker image provided by Anduril (e.g. bundle-specific images), or inside its own component-specific image provided by the component developer. Note that the granularity of the association between component and image is completely up to the component and workflow developers.
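To make the proposed command composition concrete, here is a minimal Python sketch of how "<docker_prefix> <image_name> <command>" could be assembled, with the SSH call wrapped around it in remote mode. This is not Anduril code: the function names `build_command` and `default_image` and their parameters are made up for illustration. One detail worth noting is that Docker repository names must be lowercase, so the default-image helper lowercases the component name.

```python
import shlex


def default_image(component_name):
    """Hypothetical default: "anduril/<component-name>", lowercased
    because Docker repository names must be lowercase."""
    return "anduril/" + component_name.lower()


def build_command(command, docker_image=None, docker_prefix="docker run",
                  ssh_host=None):
    """Compose the argument list for one component execution.

    command       -- the component's own command, as a list of arguments
    docker_image  -- value of the proposed @docker annotation, or None
                     for "classic mode" (no docker at all)
    docker_prefix -- user-defined prefix (global or host-specific)
    ssh_host      -- remote execution host, or None for local execution
    """
    argv = list(command)
    if docker_image is not None:
        # <docker_prefix> <image_name> <command>
        argv = shlex.split(docker_prefix) + [docker_image] + argv
    if ssh_host is not None:
        # The docker prefix ends up between the SSH call and the command.
        # Note: ssh joins these arguments with spaces and the remote shell
        # re-parses them, so arguments containing spaces need extra quoting.
        argv = ["ssh", ssh_host] + argv
    return argv


if __name__ == "__main__":
    cmd = ["echo", "hello"]
    print(" ".join(build_command(cmd)))
    print(" ".join(build_command(cmd, docker_image=default_image("HTSeqBam2Count"))))
    print(" ".join(build_command(cmd, docker_image=default_image("HTSeqBam2Count"),
                                 ssh_host="myserver")))
```

A component without the annotation simply passes through unchanged, which corresponds to "classic mode".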
This solution will provide several advantages over the status quo:
- Ease of installation. No need for the user to install any third-party software on any of their execution hosts (except docker of course). Docker images are automatically downloaded and fired up upon execution of the workflow. Deploying an existing pipeline on a user's system will therefore become very easy.
- Guaranteed execution. Because a component will be tested with a specific version of a docker image, its proper execution on the user's system can be guaranteed by the developer.
- Version control. Components can no longer fail because the wrong version of some third-party software is installed. Also, version conflicts can no longer occur. For example, if component A needs python2 and component B needs python3, they can still run side-by-side. Thus, components become completely independent of each other, at least with respect to their execution environment.
- Increased reproducibility. Because, with the above solution, a workflow now defines not only the execution steps but also its execution environment (via the @docker tags), it is guaranteed that re-execution of a workflow somewhere else or later in time yields exactly the same results.
- Control over resource allocation. Via docker prefixes, a sysadmin can precisely control how many resources (CPU, memory) are granted to Anduril on an execution host. Sysadmins could therefore be less reluctant to provide computing resources to Anduril deployments.
- Slim base system. Installation of Anduril itself can be kept very slim, because software required to execute workflows (e.g. LaTeX, R packages) comes only with components that actually need it.
- Easier cloud deployments. Because of the above, elastic cloud deployments of Anduril workflows will presumably become easier. But I admit that I have not completely thought this through yet.
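As an illustration of the resource-allocation point, here is a small Python sketch that builds a docker prefix with CPU and memory limits. The helper name `docker_prefix` is hypothetical; the flags `--rm`, `--cpus` and `--memory` are standard `docker run` options. Keep in mind that these limits apply per container, so the total granted on a host also depends on how many jobs Anduril runs there at once.

```python
def docker_prefix(cpus=None, memory=None):
    """Build a docker prefix string with optional resource limits.

    cpus   -- maximum number of CPUs per container (e.g. 2 or 1.5)
    memory -- memory limit per container (e.g. "4g")
    """
    parts = ["docker", "run", "--rm"]  # --rm removes the container on exit
    if cpus is not None:
        parts.append("--cpus={}".format(cpus))
    if memory is not None:
        parts.append("--memory={}".format(memory))
    return " ".join(parts)


if __name__ == "__main__":
    # A sysadmin could put a string like this into a host-specific setting.
    print(docker_prefix(cpus=2, memory="4g"))
```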
To be clear, dockerized execution can already be accomplished with the current functionality of Anduril, for example by providing docker prefixes via 'prefix' execution mode or via customized RemoteExecute strings (I actually tried both, and both work). However, the main functionality currently missing in Anduril (at least I could not find it) is component-specific command prefixes. Without them, there is no elegant way to specify which component should run inside which execution environment (i.e. which docker image).
Marko suggested that one could think of 'hijacking' the @host annotation feature for this, i.e. we point @host to a virtual Docker computing host that defines its custom RemoteExecute command to fire up docker. The main problem with this approach is that Anduril miscounts the total available computing slots, because it is unaware that multiple virtual Docker hosts share the same physical host. Thus, physical hosts might get overallocated with jobs by the Anduril engine.
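The overallocation problem can be shown with a bit of arithmetic. The Python sketch below uses made-up host names and slot counts and does not reflect how Anduril actually tracks slots; it only assumes that each virtual host is configured with its own slot count.

```python
from collections import defaultdict


def slots_per_physical_host(virtual_hosts):
    """Sum the slots that end up on each physical machine.

    virtual_hosts maps a virtual host name to a tuple of
    (physical host name, slots configured for that virtual host).
    """
    totals = defaultdict(int)
    for _name, (physical, slots) in virtual_hosts.items():
        totals[physical] += slots
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical setup: two virtual Docker hosts, one per image,
    # both pointing at the same 8-slot machine.
    hosts = {
        "docker-python2": ("server1", 8),
        "docker-python3": ("server1", 8),
    }
    # The engine sees two independent 8-slot hosts, so up to 16 jobs
    # could be scheduled onto a machine that only has 8 slots.
    print(slots_per_physical_host(hosts))
```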
I am fairly new to Anduril, so it is quite possible that the solution proposed above is problematic, or that I am overlooking the obvious and all of this is already possible within Anduril's current feature set.
Anyways, I am open to suggestions and I would be happy to see a lively discussion on this topic.