Bootstrapping Iris on a CoreOS cluster


James Cooper

Oct 3, 2014, 3:00:18 PM
to projec...@googlegroups.com
hey folks,

I've been playing with Docker and CoreOS.  CoreOS ships with etcd, and they provide a cluster discovery bootstrap endpoint at discovery.etcd.io that you can use to get your CoreOS cluster converged.  More info on that here: https://coreos.com/docs/cluster-management/setup/cluster-discovery/

I don't know if this is taboo to ask, but I wonder if the iris relay could optionally accept one or more host arguments, which if provided, would be used to bootstrap the cluster discovery.  The relay would continue to probe the network for other members of course.

My thinking is that if CoreOS is providing this mechanism for free, does it make sense to leverage it?  In environments with huge network address spaces (e.g. non-VPC EC2) this could prove useful.
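
A minimal sketch of the proposal, written in Python for brevity and not taken from the Iris code base: explicitly supplied seed hosts are tried first, and addresses found by the usual network probing are appended afterwards. The function name `candidate_peers` and the example addresses are hypothetical.

```python
from typing import Iterable, List


def candidate_peers(seeds: Iterable[str], probed: Iterable[str]) -> List[str]:
    """Return peer addresses to try: seeds first, then probed hosts, no duplicates."""
    ordered: List[str] = []
    seen = set()
    for addr in list(seeds) + list(probed):
        if addr not in seen:
            seen.add(addr)
            ordered.append(addr)
    return ordered


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(candidate_peers(["10.0.0.5"], ["10.0.0.7", "10.0.0.5"]))
```

With no seeds given, the list reduces to whatever probing finds, so the zero-argument behaviour would stay as it is today.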

Is this a dumb idea?

-- James

Péter Szilágyi

Oct 4, 2014, 5:03:44 AM
to James Cooper, projec...@googlegroups.com
Hey James,

  Nothing is taboo :)... yet :D

  I was also aiming to do a containerized Iris on top of Docker and CoreOS, so it's definitely an idea that should be considered. I'm a bit reluctant though to add platform specific stuff into Iris, especially if that entails additional configuration parameters. The whole point of Iris would be zero configuration - as much as possible that is - and adding such options would kinda' dent this promise.

  I am not familiar with CoreOS yet, so I don't know whether what follows is doable, but if we could do this CoreOS discovery without needing an additional configuration parameter (i.e. have the bootstrapper figure out by itself whether it's on a specific platform (CoreOS, Serf, whatnot) that offers bootstrapping help, and take advantage of it), then that would be a much more sympathetic solution and one I would be easily convinced to add/accept.
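
One way such zero-configuration detection could look, as a rough Python sketch rather than anything from the Iris bootstrapper: probe a short list of well-known local endpoints and use the first one that accepts a TCP connection. The function name `find_local_seed_service` and the endpoint list are assumptions; 4001 was etcd's default client port at the time and 2379 is the later default. A successful connect only shows that something is listening there, not that it is really etcd.

```python
import socket
from typing import Iterable, Optional, Tuple

# Assumption: etcd's client API is reachable on one of these local ports.
DEFAULT_ENDPOINTS = [("127.0.0.1", 4001), ("127.0.0.1", 2379)]


def find_local_seed_service(
    endpoints: Iterable[Tuple[str, int]] = DEFAULT_ENDPOINTS,
    timeout: float = 0.25,
) -> Optional[Tuple[str, int]]:
    """Return the first endpoint that accepts a TCP connection, or None."""
    for host, port in endpoints:
        try:
            # Open and immediately close a connection; only reachability matters.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out or unreachable: try the next candidate.
            continue
    return None
```

If nothing answers, the function returns `None` and the bootstrapper would simply fall back to its existing probing, so no configuration parameter is needed either way.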

  Btw, the bootstrapper is one of the oldest modules of the system. It was hacked together to allow working on higher level stuff and is in dire need of a rework/replacement, so now is probably the best possible time to discuss changes, features, etc. for it :) So by all means, keep the ideas flowing :D

Cheers,
  Peter


Péter Szilágyi

Oct 22, 2014, 12:18:01 PM
to James Cooper, projec...@googlegroups.com
Hi James,

  I've implemented a CoreOS/etcd based seamless seed server into the bootstrapper. It is not yet pushed to GitHub as I'll need a day more to reorganize the rest of the ad hoc bootstrap mechanisms and didn't want to push too much of a work in progress. Hopefully it'll get in there tomorrow.

  However, given that you're playing with Amazon EC2 + CoreOS, could you perhaps spare the time to test whether the seed server functionality works as expected? I haven't gotten to the point of preparing Docker images for Iris and CoreOS images below that, so if you already have something that you can just plug in and check, it would be a great help to know that it *seems* to work :)

Cheers,
  Peter

James Cooper

Oct 22, 2014, 12:21:56 PM
to Péter Szilágyi, projec...@googlegroups.com
Hi Peter,

I actually have a free day on Friday, so I could do some EC2 + CoreOS tests then.  Could you push a branch of Iris that has the etcd bootstrapper support to GitHub?  I could make a Docker image from that branch.

I'm a bit of a Docker/CoreOS novice, so no promises that I'll do things the right way, but I'm happy to give it a go and see what happens.

cheers

-- James
--

James Cooper
Principal Consultant - Bitmechanic LLC
http://www.bitmechanic.com/

Péter Szilágyi

Oct 22, 2014, 12:24:33 PM
to James Cooper, projec...@googlegroups.com
It is not yet integrated into the bootstrapper and only works through the tests, so you won't be able to use it at the moment (and I have to run off in 7 minutes, so I cannot hack together anything right now :P), but I promise to push something reasonable either later tonight or tomorrow at the latest :)

Péter Szilágyi

Oct 23, 2014, 12:07:53 PM
to James Cooper, projec...@googlegroups.com
Hi,

  I've pushed code with an updated bootstrapper. There are a few known issues still, so take it with a grain of salt: https://github.com/project-iris/iris/issues/55. You shouldn't need to configure anything at all to take advantage of the CoreOS etcd service. If it's running, Iris will integrate with it... hopefully. I've also added leveled logging to the bootstrapper (and a bit more detail, for now, to the CoreOS seeder) so you should be able to see if it discovers anything.

  No, currently there isn't a way to retrieve the active peer list. It is something I've planned for a long time (along with a lot of other monitoring stuff), but I haven't gotten to it yet (hey, but I did complete my dissertation :D). I'll keep working on this part of the system in the coming days (though maybe not till Monday), but I thought I'd ping you in case you're up for experimenting :)

Cheers,
  Peter

On Wed, Oct 22, 2014 at 7:26 PM, James Cooper <jamesp...@gmail.com> wrote:
Sounds good.  Any tips on usage would be appreciated.

Also, is there a good way to ask the Iris relay who its known peers are?  Or is our only way of measuring convergence time to look at log output?

-- James

James Cooper

Oct 24, 2014, 12:49:25 PM
to projec...@googlegroups.com
Hi there,

Ok I've started playing around with this. One thing I see is that the relays are trying to connect on random high ports.  I see stuff like this:

2014/10/24 16:35:52 pastry: failed to dial remote peer at 172.31.6.73:52046: dial tcp 172.31.6.73:52046: i/o timeout.
and:
2014/10/24 15:04:23 pastry: failed to dial remote peer at 172.31.16.232:36275: dial tcp 172.31.16.232:36275: i/o timeout.
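
To get an overview of which addresses and ports are being dialed, failure lines like the two above can be tallied with a small script. This is only a sketch: the regular expression is derived from the two quoted lines, not from the Iris source, so other log formats would need a different pattern, and the name `failed_dial_ports` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "failed to dial" lines quoted above and captures host and port.
DIAL_FAILURE = re.compile(
    r"pastry: failed to dial remote peer at (\d{1,3}(?:\.\d{1,3}){3}):(\d+):"
)


def failed_dial_ports(log_lines: Iterable[str]) -> Counter:
    """Count failed dial attempts per (host, port) pair."""
    counts: Counter = Counter()
    for line in log_lines:
        match = DIAL_FAILURE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the relay's log output line by line gives a count per peer address, which makes it easier to see whether the ports fall into a range that a security group rule could cover.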

(a) Is there a port range I should open in the EC2 security group?  Or do I need to open all TCP ports between nodes in the cluster?

(b) My original plan was to run Iris in a container on the CoreOS machines, but given how CoreOS/docker exposes ports, I'm not sure this will be possible.  Specifically I was doing something like:

docker run -d -p 55555:55555 coopernurse/iris

Which exposes port 55555 from the relay instance to the host (so other containers could connect), but it looks like Iris chats on some other random ports at runtime.

___

I'm going to switch gears and try running Iris directly on the CoreOS host with a wide open security group, but if there's a better way to do this please let me know.

-- James