Custom task driver advertising ip and port


jer...@fly.io

May 6, 2019, 4:34:29 PM
to Nomad
I've implemented a custom task driver that spins up Firecracker microVMs, and I'm having difficulties with the networking bit.

I'm creating a tap network interface for each VM and assigning the VM an IP in that interface's range. For example, the tap interface for a task could be `172.18.0.1` and the VM would have the IP `172.18.0.2`. I'm returning a DriverNetwork struct (as per the StartTask function signature) with the `172.18.0.2` IP, the supplied port map (right now it's "http = 80"), and AutoAdvertise set to true.
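A minimal sketch of that driver-side setup, assuming one /30 subnet per tap device (tap side gets network+1, the VM gets network+2). The `DriverNetwork` type here is a local stand-in that mirrors the `IP`, `PortMap` and `AutoAdvertise` fields of Nomad's `drivers.DriverNetwork` so the example compiles without the Nomad module; `tapAddresses` and `driverNetworkFor` are hypothetical helpers, not part of any Nomad API.

```go
package main

import (
	"fmt"
	"net"
)

// DriverNetwork is a local stand-in mirroring the fields of Nomad's
// drivers.DriverNetwork so this sketch compiles without the Nomad module.
type DriverNetwork struct {
	PortMap       map[string]int // port label -> port inside the VM
	IP            string         // address the VM is reachable on
	AutoAdvertise bool           // ask Nomad to advertise IP instead of the host address
}

// tapAddresses (hypothetical helper) derives the host-side tap address
// (network+1) and the guest address (network+2) from an IPv4 subnet.
func tapAddresses(cidr string) (hostIP, vmIP net.IP, err error) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, nil, err
	}
	base := subnet.IP.To4()
	if base == nil {
		return nil, nil, fmt.Errorf("%s is not an IPv4 subnet", cidr)
	}
	ones, bits := subnet.Mask.Size()
	if bits-ones < 2 {
		return nil, nil, fmt.Errorf("%s is too small for a tap and a guest address", cidr)
	}
	// Subnets of /30 or larger start on a multiple of 4, so +1/+2 cannot overflow.
	offset := func(n byte) net.IP {
		ip := make(net.IP, len(base))
		copy(ip, base)
		ip[3] += n
		return ip
	}
	return offset(1), offset(2), nil
}

// driverNetworkFor (hypothetical helper) builds the value a StartTask
// implementation would hand back alongside its task handle.
func driverNetworkFor(cidr string, portMap map[string]int) (*DriverNetwork, error) {
	_, vmIP, err := tapAddresses(cidr)
	if err != nil {
		return nil, err
	}
	return &DriverNetwork{PortMap: portMap, IP: vmIP.String(), AutoAdvertise: true}, nil
}

func main() {
	host, vm, err := tapAddresses("172.18.0.0/30")
	if err != nil {
		panic(err)
	}
	fmt.Println(host, vm) // 172.18.0.1 172.18.0.2

	dn, err := driverNetworkFor("172.18.0.0/30", map[string]int{"http": 80})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *dn)
}
```

In a real driver the tap device creation and the Firecracker boot would happen between these two steps; only the address bookkeeping is shown.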

I'm running the nomad agent locally with `nomad agent -dev`.

Consul's health check shows as healthy, because I set the service check to use the driver address_mode.

The issue is that Nomad shows the allocation's addresses on the loopback interface (or on whatever IP the network_interface in the client config is set to).

This is roughly my task config:

job "example" {
  group "vms" {
    task "vm" {
      driver = "firecracker"

      config {
        port_map {
          http = 80
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "http" {
            static = 80
          }
        }
      }

      service {
        name = "app123"
        port = "http"
        address_mode = "driver"
        check {
          address_mode = "driver"
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

and I get this on an allocation:

$ nomad alloc status 672164e9
ID                  = 672164e9
Eval ID             = 5dc2f315
Name                = example.vms[0]
Node ID             = 76c1e075
Job ID              = example
Job Version         = 6
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 20s ago
Modified            = 1s ago
Deployment ID       = db938cae
Deployment Health   = healthy

Task "vm" is "running"
Task Resources
CPU      Memory   Disk     Addresses
500 MHz  256 MiB  300 MiB  http: 127.0.0.1:80

Task Events:
Started At     = 2019-05-06T20:19:12Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2019-05-06T16:19:12-04:00  Started     Task started by client
2019-05-06T16:19:10-04:00  Task Setup  Building Task Directory
2019-05-06T16:19:10-04:00  Received    Task received by client

Is there a way to set it to show the 172.x IP I'm supplying it?

Does it matter? Should I just use Consul directly for service discovery? Consul appears to be storing the right address in its catalog (hence, the health check succeeds.)

Am I missing documentation on custom task drivers or are we expected to look into already-implemented task drivers (like docker, qemu, lxc, singularity, etc.)? The latter is fine, I know what it's like to run a small business :)

Thanks!

Michael Schurter

May 7, 2019, 12:44:13 PM
to jer...@fly.io, Nomad
Hi Jerome!

(The short answer is: you're doing everything right as far as I can tell!)

This is really exciting! Great job figuring out the tricky DriverNetwork/AutoAdvertise code. You see 127.0.0.1:80 in the allocation status output because that's the port assignment made by the scheduler.

The job file's resources stanza asks the scheduler to place the task somewhere with port 80 free (the static port assignment inside the task's resources stanza) on the interface specified by the client.network_interface configuration parameter. The dev agent uses the loopback device by default, which is why 127.0.0.1 is used.

All of that can be thought of as the scheduler's view of your job. The task config's port_map and the service stanzas are used by the Nomad client to configure the driver and Consul. Other services should rely on Consul for service discovery, not on the ports allocated by the Nomad scheduler.
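The scheduler's side can be pictured as simple bookkeeping: each client offers the address of client.network_interface, and a static port fits only if nothing else on that client already holds it. The toy model below is hypothetical (`clientNetwork` and `reserveStatic` are invented names, and Nomad's real scheduler tracks more, such as bandwidth and dynamic port ranges). Note that the driver's 172.x address never enters into it, which is why alloc status cannot show it.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// clientNetwork is a hypothetical, simplified model of what the scheduler
// knows about a client: the network_interface address and its reserved ports.
type clientNetwork struct {
	ip   string
	used map[int]bool
}

func newClientNetwork(ip string) *clientNetwork {
	return &clientNetwork{ip: ip, used: make(map[int]bool)}
}

// reserveStatic reserves a static port and returns the string that
// "nomad alloc status" would list under Addresses.
func (c *clientNetwork) reserveStatic(label string, port int) (string, error) {
	if c.used[port] {
		return "", fmt.Errorf("port %d already in use on %s", port, c.ip)
	}
	c.used[port] = true
	return label + ": " + net.JoinHostPort(c.ip, strconv.Itoa(port)), nil
}

func main() {
	// The dev agent uses the loopback device by default.
	dev := newClientNetwork("127.0.0.1")
	addr, err := dev.reserveStatic("http", 80)
	fmt.Println(addr, err) // http: 127.0.0.1:80 <nil>

	// A second allocation asking for static port 80 does not fit on this client.
	_, err = dev.reserveStatic("http", 80)
	fmt.Println(err) // port 80 already in use on 127.0.0.1
}
```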

Direct answers to your questions below:

On Mon, May 6, 2019 at 1:34 PM <jer...@fly.io> wrote:
Is there a way to set it to show the 172.x IP I'm supplying it?

No. The Resources block being displayed shows the resources reserved by the server for this allocation. The DriverNetwork isn't even communicated back to the server, so we'd need to add it to where the client reports resource utilization to make it visible via alloc status: https://github.com/hashicorp/nomad/blob/v0.9.1/client/allocrunner/alloc_runner.go#L896

Please open an issue if you'd like that to be exposed via the CLI. We always intended to expose it somehow but couldn't figure out the right way to distinguish it from the allocated network information. I'd hate to make the output more confusing! We also lacked the ability to pull it from clients when DriverNetwork was first implemented, so exposing it would have required storing it via Raft -- which is relatively expensive for storing debugging information. We have the ability to pull it from clients now though, so we should circle back and figure out how to expose it.
 
Does it matter? Should I just use Consul directly for service discovery? Consul appears to be storing the right address in its catalog (hence, the health check succeeds.)

It does not matter as long as you're using Consul for service discovery.
 
Am I missing documentation on custom task drivers or are we expected to look into already-implemented task drivers (like docker, qemu, lxc, singularity, etc.)? The latter is fine, I know what it's like to run a small business :)

There are, although we need to link to them more prominently. It took me a minute to find them! I'll get that fixed ASAP. https://www.nomadproject.io/docs/internals/plugins/task-drivers.html

Please feel free to file issues or PRs against the docs as well. There's obviously a lot of subtlety, and we may have missed some things! Source for the website is in our main repo: https://github.com/hashicorp/nomad/tree/master/website
 
Network namespaces are coming soon (0.10 or 0.11). Everything will be backward compatible, so don't worry about losing work. They should add a lot of expressiveness to Nomad's networking functionality. Currently the relationship between the client config, job resources, driver networks, etc. is pretty tricky to tie together.

If your code is open source, I'd love to see it and add a link from our docs when it's ready! Feel free to reach out to me directly as well.

Thanks,
Michael Schurter
Nomad Engineer

jer...@fly.io

May 7, 2019, 1:06:22 PM
to Nomad
Thanks so much Michael, that helps a lot! I could not find these docs (I didn't think to look under "internals").

If/when we do open source it, we'll definitely open a PR.

I think that answered all my questions. I do have one more, though: do I need to specify address_mode = "driver" everywhere, or could that somehow be implied like in the docker task driver examples?

Michael Schurter

May 7, 2019, 1:21:47 PM
to jer...@fly.io, Nomad
On Tue, May 7, 2019 at 10:06 AM <jer...@fly.io> wrote:
I think that answered all my questions. I do have one more, though: do I need to specify address_mode = "driver" everywhere, or could that somehow be implied like in the docker task driver examples?

If your driver is setting DriverNetwork.AutoAdvertise=true, then by default Consul advertises the IP+Port set on the DriverNetwork. This is the code that controls it: https://github.com/hashicorp/nomad/blob/v0.9.1/command/agent/consul/client.go#L1209

Service stanzas can override the auto advertisement flag by manually specifying an address mode: https://www.nomadproject.io/docs/job-specification/service.html#address_mode


The reasoning is that driver networks are often used with overlays like Weave where you want to advertise the service on its overlay address, but since health checks are run by Consul on the same host they should communicate over the host address.

The intention was for the default behavior to work for the most common use case, but it's unfortunately confusing to explain.
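A simplified model of the address selection described above, written from the documented behavior rather than copied from client.go: services default to address_mode "auto" (driver address when AutoAdvertise is set, host address otherwise), checks default to "host", and "driver" uses DriverNetwork.IP plus the port_map value. The `driverNetwork` and `hostPort` types and the `resolve` function are hypothetical; unlike Nomad, this sketch does not accept literal port numbers in driver mode.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// driverNetwork mirrors the DriverNetwork fields used here (hypothetical stand-in).
type driverNetwork struct {
	IP            string
	PortMap       map[string]int
	AutoAdvertise bool
}

// hostPort is the scheduler's allocation for a port label.
type hostPort struct {
	IP   string
	Port int
}

// Documented defaults for the service and check stanzas.
const (
	defaultServiceMode = "auto"
	defaultCheckMode   = "host"
)

// resolve picks the address handed to Consul for a service or check.
func resolve(mode, label string, host map[string]hostPort, dn *driverNetwork) (string, error) {
	switch mode {
	case "auto":
		if dn != nil && dn.AutoAdvertise {
			return resolve("driver", label, host, dn)
		}
		return resolve("host", label, host, dn)
	case "host":
		hp, ok := host[label]
		if !ok {
			return "", fmt.Errorf("port label %q not found", label)
		}
		return net.JoinHostPort(hp.IP, strconv.Itoa(hp.Port)), nil
	case "driver":
		if dn == nil {
			return "", errors.New(`address_mode "driver" requires a driver network`)
		}
		port, ok := dn.PortMap[label]
		if !ok {
			return "", fmt.Errorf("port label %q not in driver port map", label)
		}
		return net.JoinHostPort(dn.IP, strconv.Itoa(port)), nil
	default:
		return "", fmt.Errorf("unknown address_mode %q", mode)
	}
}

func main() {
	host := map[string]hostPort{"http": {IP: "127.0.0.1", Port: 80}}
	dn := &driverNetwork{IP: "172.18.0.2", PortMap: map[string]int{"http": 80}, AutoAdvertise: true}

	svc, _ := resolve(defaultServiceMode, "http", host, dn)
	chk, _ := resolve(defaultCheckMode, "http", host, dn)
	fmt.Println(svc) // 172.18.0.2:80
	fmt.Println(chk) // 127.0.0.1:80
}
```

With the values from this thread, the service resolves to the VM address while a check with no address_mode resolves to 127.0.0.1:80. So in a setup where nothing on the host forwards that port into the VM, the check would still need address_mode = "driver" even though the service no longer does.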

Hope that helps!