Container Service Swarm mode not working?

Jens Petersen

Jan 22, 2018, 12:31:36 PM

to xnat_discussion

Hey Flavin (et al.),

I tried using Swarm mode with container service version 1.3.2 in XNAT 1.7.4.1 (Vagrant), but for some reason my image/container is not being executed. I set up the swarm using

docker swarm init --advertise-addr XNAT_IP

and used the resulting command to join my computer as a worker. I don't require mounts, so I skipped the docker-machine stuff. Both nodes are listed and everything looks OK as far as I can tell. When I start a container (via Command Automation), a service is created, but it doesn't seem to run. I'm new to Swarm, so I don't really know how to inspect this. Here is what I can see:

sudo docker service ls

jxlye1mtbknj laughing_carson replicated 0/1 jenspetersen/glioblastoma_segmentation:latest

sudo docker service ps laughing_carson

yqu3zfzq6toy laughing_carson.1 jenspetersen/glioblastoma_segmentation:latest MY_COMPUTER Shutdown Complete 24 minutes ago

sudo docker service logs laughing_carson

(empty output)

sudo docker service logs laughing_carson.1

no such task or service

There are no errors, but my XNAT data isn't changing, so I know it didn't work. Is there another way to view the logs? By the way, it makes no difference whether I set my XNAT node to drain or let tasks run on it too, so it shouldn't be a networking issue. Everything works well when I'm not in swarm mode.

As usual, any help is much appreciated! Thanks for your great work!

Jens

P.S. A totally different question: there is a flag "--reserve-memory" for service creation that I would like to use. Where in the code would I have to look if I wanted to integrate it?

Jens Petersen

Jan 23, 2018, 5:29:31 AM

to xnat_discussion

I found that my image's entrypoint is overridden, so it just executes an empty command... Any idea why that could be the case?

John Flavin

Jan 23, 2018, 11:46:33 AM

to XNAT Discussion board

Jens,

TL;DR Yes, there are reasons why your entrypoint is not being used when the container service runs in swarm mode. It is possible I could make this work for you, but I would have to do it carefully to avoid breaking a lot of other commands.

In what follows, I will explain the issue in more detail than anyone really needs. I do this for my own good, because writing it all out is how I can come to understand it.

The APIs for Swarm and vanilla Docker are not consistent in how they use entrypoints, commands, and arguments. Check out this comment (https://github.com/moby/moby/issues/29171#issuecomment-265274698) on a Docker issue thread; it gives a big table comparing the vanilla Docker APIs used to define the command to run when creating a container (Image.Entrypoint and Image.Cmd) with the Swarm APIs used to define the command for a service (ContainerSpec.Command and ContainerSpec.Args). I will refer to those APIs throughout this post, so the table is very useful, if not required, for understanding the rest.

Let me break down how the container service resolves the command-line string and gets it into the container. First, we resolve the command-line string in the command using the arguments. Call that CMD. When we create the container (respectively, swarm service) we set the property Image.Cmd (resp. ContainerSpec.Command) to the list ["/bin/sh", "-c", "CMD"]. So we always execute the command within a shell.
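To make the two paths concrete, here is a minimal sketch of the request bodies described in the paragraph above, using Docker Engine API field names (Cmd in the body of POST /containers/create, TaskTemplate.ContainerSpec.Command in the body of POST /services/create). The helper names are hypothetical, and this is not the container service's actual Java code; it only mirrors the shell wrapping described here.

```javascript
// Hypothetical helpers; only the Docker Engine API field names are real.

// Wrap the resolved command-line string so it always runs inside a shell.
function shellWrap(resolvedCmd) {
  return ["/bin/sh", "-c", resolvedCmd];
}

// Body for POST /containers/create (non-swarm mode).
function containerCreateBody(image, resolvedCmd) {
  return { Image: image, Cmd: shellWrap(resolvedCmd) };
}

// Body for POST /services/create (swarm mode).
function serviceCreateBody(image, resolvedCmd) {
  return {
    TaskTemplate: {
      ContainerSpec: { Image: image, Command: shellWrap(resolvedCmd) },
    },
  };
}

const cmd = "segment --input /input --output /output";
console.log(JSON.stringify(containerCreateBody("example/image:latest", cmd).Cmd));
// ["/bin/sh","-c","segment --input /input --output /output"]
```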

That gets to the reason why your entrypoint is being overridden: because we use the Swarm API ContainerSpec.Command when creating the swarm service, the entrypoint will always be overridden. That's just how the API works.

However, there is more to the story. In the latest version of the container service (1.4.0), I made a change that overrides the image entrypoint when launching containers in non-swarm mode as well. (I didn't publicize this much, but I probably should have, because it is a change that will likely impact commands.) I did this because the API I use to launch containers, Image.Cmd, caused problems when running containers from images that have an entrypoint. Say the image's entrypoint is ENTRY. When not in swarm mode, the command that Docker actually ends up executing is ENTRY /bin/sh -c CMD. That isn't what anyone wants.
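The ENTRY /bin/sh -c CMD problem can be reproduced with a small model of how Docker combines a create request with the image defaults. This is a simplified sketch under stated assumptions, not Docker's actual merge code: a null entrypoint means "not sent" (the image entrypoint is kept), and "" means "explicitly cleared", matching the two cases discussed in this thread.

```javascript
// Simplified model (an assumption, not Docker's actual merge code) of how
// a create request's Entrypoint/Cmd combine with the image defaults.
//   null -> field not sent, so the image default is kept
//   ""   -> entrypoint explicitly cleared
function effectiveArgv(image, request) {
  const sent = (v) => v !== null && v !== undefined;
  if (!sent(request.Entrypoint)) {
    // Image entrypoint kept; the request's Cmd (else the image's) follows it.
    const cmd = sent(request.Cmd) ? request.Cmd : image.Cmd;
    return [...(image.Entrypoint || []), ...(cmd || [])];
  }
  // Overriding the entrypoint also drops the image's default Cmd.
  const entrypoint = request.Entrypoint === "" ? [] : request.Entrypoint;
  return [...entrypoint, ...(sent(request.Cmd) ? request.Cmd : [])];
}

const image = { Entrypoint: ["ENTRY"], Cmd: [] };
const shellCmd = ["/bin/sh", "-c", "CMD"];

// Entrypoint not sent: the image entrypoint is prepended.
console.log(effectiveArgv(image, { Entrypoint: null, Cmd: shellCmd }).join(" "));
// ENTRY /bin/sh -c CMD

// Entrypoint explicitly cleared, as in 1.4.0 non-swarm mode.
console.log(effectiveArgv(image, { Entrypoint: "", Cmd: shellCmd }).join(" "));
// /bin/sh -c CMD
```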

So I made the decision to override the entrypoint when launching containers (though, again, what you are seeing in swarm mode is not related to my decision, it is just a natural consequence of the APIs used to create swarm services) with the assumption that whatever had been the entrypoint on the image would simply be made explicit in the command's command-line string.

But that may not have been the right decision!

Now that I have typed this all out, I can see another way things could maybe, possibly work. Instead of doing what I currently do, which is...

* Non-swarm: Image.Cmd = /bin/sh -c CMD, Image.Entrypoint = "" <--- using an empty string explicitly overrides the entrypoint

* Swarm: ContainerSpec.Command = /bin/sh -c CMD, ContainerSpec.Args = null

I could do this...

* Non-swarm: Image.Cmd = CMD, Image.Entrypoint = null <--- using null leaves image entrypoint intact

* Swarm: ContainerSpec.Command = null, ContainerSpec.Args = CMD

In the new way, if an entrypoint is set, it gets used. Everything in the resolved command-line string gets passed to it as arguments. If an entrypoint is not set, it gets the default value of /bin/sh -c. What gets executed is /bin/sh -c CMD. So that's equivalent to what I am already doing anyway!
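The swarm side of the comparison can be modeled the same way, following the Command/Args rules from the moby issue table linked earlier: Command replaces both the image entrypoint and the image Cmd, while Args alone replaces only the image Cmd. This is an approximation for illustration; the whitespace split below is a hypothetical stand-in for real command-line tokenization and ignores quoting. Only the image-with-entrypoint case is exercised.

```javascript
// Simplified model of how a swarm task's argv is built from
// ContainerSpec.Command / ContainerSpec.Args and the image defaults.
function swarmArgv(image, spec) {
  const command = spec.Command || [];
  const args = spec.Args || [];
  if (command.length > 0) {
    // Command replaces the image entrypoint and the image Cmd.
    return [...command, ...args];
  }
  if (args.length > 0) {
    // Args replace only the image Cmd; the image entrypoint is kept.
    return [...(image.Entrypoint || []), ...args];
  }
  return [...(image.Entrypoint || []), ...(image.Cmd || [])];
}

const image = { Entrypoint: ["ENTRY"], Cmd: [] };
const CMD = "--input /input --output /output";
// Hypothetical naive split for illustration only; it ignores shell quoting.
const tokens = CMD.split(/\s+/);

// Current approach: the image entrypoint is silently bypassed.
console.log(swarmArgv(image, { Command: ["/bin/sh", "-c", CMD], Args: null }).join(" "));
// /bin/sh -c --input /input --output /output

// Proposed approach: the entrypoint runs and receives CMD as arguments.
console.log(swarmArgv(image, { Command: null, Args: tokens }).join(" "));
// ENTRY --input /input --output /output
```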

The only other issue is when an image has an entrypoint set, but the person writing the command does not want to use it. I have to consider whether to allow command authors to explicitly, knowingly override the entrypoint by setting a property in the command.

Sorry that your stuff isn't working. I'll see what I can do to fix it. Give me a few days to clear a couple other things off my plate, and I will play with this issue a little.

Flavin

Jens Petersen

Jan 23, 2018, 1:01:23 PM

to xnat_discussion

Hey Flavin,

Thank you for explaining this in such detail.

First things first: I managed to solve the problem for my use case by simply removing the "command" option from the ContainerSpec. But that only works because I don't rely on commands; everything is done by my entrypoint. A short explanation of why: I'm trying to build self-contained containers, and everything I need is passed in as environment variables that are resolved by the ENTRYPOINT. I could of course use CMD instead, but as soon as the container is launched with an argument, things stop working, because that argument replaces CMD and its variables are no longer resolved. I'm sure you know this; I'm mentioning it in case others are interested as well.

The difference in APIs between Swarm and regular Docker is quite unfortunate; I came across a thread similar to the one you posted just a few hours ago. That being said, I think the alternative you suggest is superior to the current approach, at least from a design perspective. ENTRYPOINTs give the image author greater control over how users interact with the image, and I think in the majority of cases an ENTRYPOINT is added on purpose. One example could be the choice of shell. Is there a reason you're specifying /bin/sh for execution? I would leave that to the image author. So for non-swarm use, Image.Cmd = CMD and Image.Entrypoint = null seems like the right choice and the use Docker intends. If someone wants to run the container without the entrypoint, I would assume that in most cases they are also the author of the image and could build a second version without it.

For swarm mode, it is less clear to me what the intended use of Command and Args is, but I think most people don't design images differently for use with Swarm (at least XNAT users, because the whole service concept doesn't really apply here at the moment), so I would try to make it work as similarly as possible to the non-swarm case.

Just my opinion on the matter. I managed to make things work in my setting, so I can't complain :) I'm still using 1.3.2, though; in 1.4.0 the Automation and History GUI doesn't do anything!? Also, I'm still trying to add a reserve-memory flag to the service creation. Do you know how to do this, by any chance?

Best,

Jens

Jens Petersen

Jan 24, 2018, 10:45:20 AM

to xnat_discussion

For others who are interested: I managed to add a memory reservation option that can be set in the command. Basically, you start with DockerControlApi.java and replicate another optional argument such as workingDirectory. I also touched the following files: Command.java, ResolvedCommand.java, CommandEntity.java, and CommandResolutionServiceImpl.java. Now my services show the desired entry, i.e.

    "Resources": {
        "Reservations": {
            "MemoryBytes": 3145728000
        }
    }

Unfortunately, this just seems to be ignored. I have one node with a little over 30GB of RAM, and the Swarm just launched 5 containers on it. I would have thought it would queue them instead. Maybe I just didn't understand what this option is supposed to do...
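The patch itself is not shown in the thread. As a hypothetical sketch of the pattern described above, an optional value is copied into the service spec only when the command sets it, the same way a working directory would be. The input property names (reserveMemoryBytes, workingDirectory) are assumptions; ContainerSpec.Dir and Resources.Reservations.MemoryBytes are Docker Engine API service-spec fields.

```javascript
// Hypothetical resolved-command shape; only the Docker field names are real.
function buildTaskTemplate(resolved) {
  const taskTemplate = {
    ContainerSpec: { Image: resolved.image },
  };
  // Optional values are copied only when present, so unset ones leave
  // Docker's defaults untouched.
  if (resolved.workingDirectory != null) {
    taskTemplate.ContainerSpec.Dir = resolved.workingDirectory;
  }
  if (resolved.reserveMemoryBytes != null) {
    taskTemplate.Resources = {
      Reservations: { MemoryBytes: resolved.reserveMemoryBytes },
    };
  }
  return taskTemplate;
}

// 3000 MiB is the value shown in the service spec above.
const MiB = 1024 * 1024;
const spec = buildTaskTemplate({ image: "example/image:latest", reserveMemoryBytes: 3000 * MiB });
console.log(JSON.stringify(spec.Resources));
// {"Reservations":{"MemoryBytes":3145728000}}
```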

Jens Petersen

Jan 24, 2018, 12:19:30 PM

to xnat_discussion

5*3GB < 30GB. Everything works fine :D
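The arithmetic behind that check, as a rough sketch: a scheduler that honors memory reservations keeps placing tasks on a node only while the sum of their reservations fits in the node's memory. The 30 GiB node size is an assumption standing in for "a little over 30GB", and the sketch ignores CPU, placement constraints, and tasks already running.

```javascript
// Rough capacity check: how many tasks with a given memory reservation
// fit on a node (ignores CPU, other constraints, and existing tasks).
function maxTasks(nodeMemoryBytes, reservationBytes) {
  return Math.floor(nodeMemoryBytes / reservationBytes);
}

const GiB = 1024 ** 3;
const node = 30 * GiB;           // assumed node size
const reservation = 3145728000;  // 3000 MiB, as in the service spec above

console.log(maxTasks(node, reservation));  // 10
console.log(5 * reservation <= node);      // true
```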

Will Horton

Jan 24, 2018, 1:32:16 PM

to xnat_discussion

Jens, this UI bug is not something I'm seeing on my end, but it sounds like you're introducing some new wildcards that we haven't tested for in terms of your container / command setup. Are there any messages showing up in the browser console when you load the Admin UI and try to access the History and/or Automation tables?

Regards,
Will

Jens Petersen

Jan 25, 2018, 5:02:41 AM

to xnat_discussion

Hi Will,

I'm getting "Uncaught TypeError: Cannot read property 'length' of null" and the offending piece of code is this:

    function viewLink(item, wrapper){
        var label = (wrapper.description.length) ?
            wrapper.description :
            wrapper.name;

It's line 2, to be precise, so I'm guessing the wrapper's description is null, which is indeed the case if I look at my command... Since the description property is nullable for both Command and CommandWrapper, shouldn't this be checked? Or does wrapper mean something different in this context?
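One possible null-safe version of that label logic, as a sketch only; it is not necessarily the fix that went into the container service. wrapperLabel is a hypothetical stand-in for the part of viewLink that picks the label, and it assumes the wrapper object itself is present.

```javascript
// Hypothetical extraction of the label logic from viewLink: fall back to
// the wrapper name when the description is null, undefined, or empty.
function wrapperLabel(wrapper) {
  const description = wrapper.description;
  // description is nullable, so check it before reading .length
  return description != null && description.length > 0
    ? description
    : wrapper.name;
}

console.log(wrapperLabel({ name: "debug-session", description: null }));        // debug-session
console.log(wrapperLabel({ name: "debug-session", description: "Run debug" }));  // Run debug
```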

Thanks for your help!

Jens

John Flavin

Jan 25, 2018, 10:19:17 AM

to xnat_discussion

Jens,

The memory reservation property looks like a great addition. Would you be comfortable contributing that back to the container service repo?

I would want to add a little bit to it first. For instance, I would also want to add the property to Container and ContainerEntity, so we could record in the database that this property was set. And we would need to make sure this doesn’t break any tests.

Flavin

Will Horton

Jan 25, 2018, 10:54:40 AM

to xnat_discussion

Great, thanks Jens. That's an easy fix.

Jens Petersen

Jan 26, 2018, 9:58:59 AM

to xnat_discussion

Flavin, I'll gladly contribute after expanding it a little (it makes more sense to have limits and reservations for both CPU and memory). It'll probably take me a few days to get to it, though.
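For the CPU and memory extension mentioned here, the Docker Engine API service spec already has a place for each value: TaskTemplate.Resources.Limits and TaskTemplate.Resources.Reservations, each with NanoCPUs (CPUs × 10^9) and MemoryBytes. A hypothetical builder might look like this; the input property names (cpuLimit, cpuReservation, memoryLimitBytes, memoryReservationBytes) are assumptions, not existing command properties.

```javascript
// Hypothetical input: cpuLimit / cpuReservation in CPUs,
// memoryLimitBytes / memoryReservationBytes in bytes. Any may be omitted.
function buildResources(opts) {
  const toNanoCpus = (cpus) => Math.round(cpus * 1e9);
  // Build one Limits/Reservations section, or undefined if nothing is set.
  const section = (cpus, memoryBytes) => {
    const out = {};
    if (cpus != null) out.NanoCPUs = toNanoCpus(cpus);
    if (memoryBytes != null) out.MemoryBytes = memoryBytes;
    return Object.keys(out).length > 0 ? out : undefined;
  };
  const resources = {};
  const limits = section(opts.cpuLimit, opts.memoryLimitBytes);
  const reservations = section(opts.cpuReservation, opts.memoryReservationBytes);
  if (limits) resources.Limits = limits;
  if (reservations) resources.Reservations = reservations;
  return resources;
}

console.log(JSON.stringify(buildResources({ cpuLimit: 2, memoryReservationBytes: 3145728000 })));
// {"Limits":{"NanoCPUs":2000000000},"Reservations":{"MemoryBytes":3145728000}}
```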

Jens Petersen

Jan 29, 2018, 11:15:23 AM

to xnat_discussion

I quickly fixed this myself, and now I can't select my wrappers for automation. The grey text says the selection is limited by the contexts available to the wrapper. Mine is for the context xnat:mrSessionData, and I'm trying to automate on the SessionArchived event. Any idea what the problem could be? I'm on the dev branch.

Thanks!

Jens Petersen

Jan 30, 2018, 6:36:39 AM

to xnat_discussion

I found that SessionArchived only accepts xnat:imageSessionData. Are you getting rid of the subtypes, or do they just need to be added to the event options?

Will Horton

Jan 31, 2018, 5:18:48 PM

to xnat_discussion

Yes, it looks like the automation setup is currently not smart enough to map specific image session xsiTypes in the command context to the SessionArchived event. I just created two new debug commands like this one for MR and PET sessions:

    {
      "name": "debug-MR-session",
      "description": "Run the debug container with a MR session mounted",
      "contexts": [
        "xnat:mrSessionData"
      ],
      "external-inputs": [
        {
          "name": "session",
          "description": "Input session",
          "type": "Session",
          "matcher": null,
          "default-value": null,
          "required": true,
          "replacement-key": null,
          "provides-value-for-command-input": null,
          "provides-files-for-command-mount": "in",
          "via-setup-command": null,
          "user-settable": null,
          "load-children": false
        }
      ],
      "derived-inputs": [],
      "output-handlers": [
        {
          "name": "output-resource",
          "accepts-command-output": "output",
          "as-a-child-of-wrapper-input": "session",
          "type": "Resource",
          "label": "DEBUG_OUTPUT"
        }
      ]
    }

In the automation menu, only the original debug-session command (which is tied to the xnat:imageSessionData xsiType) is selectable. Bummer.

This gets into a problem of how to expose XFT and the relationships between data types in the front end. I'll log a bug for this and look into it.
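A minimal sketch of the subtype-aware matching described above as missing: walk up from the wrapper's context xsiType and accept it if it reaches the type the event is registered for. The parent map is hand-written and partial, an assumption for illustration only; a real implementation would need to read the type hierarchy from XFT.

```javascript
// Partial, hand-written parent map (assumption; the real hierarchy lives in XFT).
const PARENT = {
  "xnat:mrSessionData": "xnat:imageSessionData",
  "xnat:petSessionData": "xnat:imageSessionData",
};

// True if contextType equals eventType or descends from it.
function contextMatchesEvent(contextType, eventType) {
  for (let t = contextType; t !== undefined; t = PARENT[t]) {
    if (t === eventType) return true;
  }
  return false;
}

console.log(contextMatchesEvent("xnat:mrSessionData", "xnat:imageSessionData"));  // true
console.log(contextMatchesEvent("xnat:imageSessionData", "xnat:mrSessionData"));  // false
```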

John Flavin

Feb 1, 2018, 9:43:41 AM

to XNAT Discussion board

This is a bug, but I don't know how likely it is to be fixed. The bug is in the container service's events and automation code, which is not long for this world.

We are currently working on a big update to the event and automation infrastructure within XNAT. When that update is done, we will remove the container-service-specific event and automation code/ui/apis and migrate the container service over to use XNAT's code/ui/apis instead.

With that said, it would not be an efficient allocation of limited resources to fix bugs in code that we plan to remove soon.

Flavin
