CONSUL Road map

XavM

May 15, 2014, 6:05:28 AM
to consu...@googlegroups.com
Hello,

Could you give some hints about the roadmap for Consul?

We have read that:

  - v0.3 could introduce some "leader election support" for services
  - Handlers, events and query (similar to Serf) could be introduced later
  - envconsul has just been announced

What other things do you have in mind?
What would the timing be for all this?

Regards,

Xavier



Dr Nic Williams

May 15, 2014, 11:11:36 AM
to XavM, consu...@googlegroups.com
Just last night I started pondering what my options were around Consul and/or Serf to manage leader/slave elections and trigger scripts. The event handler style from Serf looked interesting, except for a service tracking its cluster and knowing whether it has been made the leader.


Mitchell Hashimoto

May 15, 2014, 12:07:42 PM
to Dr Nic Williams, XavM, consu...@googlegroups.com
Xavier,

Besides what we say publicly, we don't like to make public promises of
features unless they're already in the pipeline.

As you bullet pointed out, we have some locking primitives coming in
0.3 to help facilitate leader election and some other use cases. The
primitives we're working on are very powerful and we're excited to see
how they're used. It has taken us some time to work on these though
since it has required quite a bit of research for what the tradeoffs
are in a distributed system with regards to locks.

If you have a specific feature request, I'd be happy to comment on it.

Best,
Mitchell

XavM

May 15, 2014, 7:50:48 PM
to consu...@googlegroups.com, Dr Nic Williams, XavM
Thank you for your answer Mitchell,

My question about the roadmap is not really about one specific feature request, but more about where you plan to go, when, and how
(and where you know you will not go).


That being said, the specific feature request I have would be to expose, in some way, the event handler and query interface available in Serf.

I know about the blocking queries already available, but as far as I have seen, they work pretty well for KV and are not so friendly for catalog and health:

"X-Consul-Index" just keeps incrementing every second or so, even when no node or service has been "changed".

Those idempotent writes make blocking queries not very useful for detecting changes.
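
For reference, a blocking query against the health endpoint looks roughly like the sketch below. The `index` and `wait` query parameters and the `X-Consul-Index` response header are part of Consul's HTTP API; the agent address, the service name, and the helper names `watch` and `http_fetch` are illustrative assumptions. The loop only reports a change when the response body differs, which is one way to cope with an index that moves without any visible change.

```python
import json
import urllib.request


def http_fetch(url):
    """Perform one GET and return (X-Consul-Index, decoded JSON body)."""
    with urllib.request.urlopen(url) as resp:
        return int(resp.headers["X-Consul-Index"]), json.load(resp)


def watch(base_url, path, fetch=http_fetch, wait="30s", max_rounds=None):
    """Yield the response body each time it actually changes.

    Each request passes the last seen index, so the server holds the
    request open until the index moves or `wait` elapses.
    """
    index, last_body, rounds = 0, None, 0
    while max_rounds is None or rounds < max_rounds:
        url = f"{base_url}{path}?index={index}&wait={wait}"
        index, body = fetch(url)
        rounds += 1
        if body != last_body:  # skip wake-ups where only the index moved
            last_body = body
            yield body


# Usage (assumes a local agent on the default HTTP port 8500):
# for nodes in watch("http://127.0.0.1:8500", "/v1/health/service/myService"):
#     print(len(nodes), "instances")
```

Comparing bodies costs one decode per wake-up, so it hides the noise on the client side rather than removing its cause.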


Another concern is the distinction that surfaces between KV on the one hand and "nodes + services" on the other.

Both are really useful, but I still don't see how they will be "glued" together.
If they are not, I feel that we could end up with one great product (Consul) but two distinct functionalities and workflows (KV vs n+s).

Do you plan to allow services and nodes to be bound to a subset of keys?

This would make it possible to discover services, including the node:port pairs they are deployed on and their associated states, and also to discover their configuration.

Ex:

cat ${data_dir}/myService.json

  "service": {
    "name": "myService",
    "kv": "/services/common/conf",  // KV could be generic for the service
    "tags": [                       // or specific to a tag, overriding the generic
      {
        "name": "A",
        "kv": [ "/services/common/A/conf/", "/services/myService/A/conf/" ]
      }
    ],
    ...

GET /v1/catalog/service/myService?tag=A
[
  {
    "Node": "node1",
    "Address": "10.0.0.1",
    "ServiceID": "myService",
    "ServiceName": "myService",
    "ServiceTags": [
      "A"
    ],
    "ServicePort": 80,
    "keys": [
      "user=appUser",      // This kv comes from "/services/common/conf"
      "apiVersion=xxxx",  // This one comes from "/services/common/A/conf"
      "timeOut=3ms",       // This one comes from "/services/myService/A/conf"
    ]
  },
 ...

I don't claim this example is the right way to do it; I am just wondering out loud how KV and n+s could be tightly integrated.
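
The override order in the example above (generic keys first, tag-specific keys winning) can be approximated on the client side. This is a minimal sketch, not a Consul feature: `resolve_config` is a hypothetical helper, and the `kv` dictionary with its keys and values stands in for whatever would be read from the KV store under each prefix.

```python
def resolve_config(kv, prefixes):
    """Merge the keys found under each prefix; later prefixes override earlier ones.

    `kv` maps full key paths to values. `prefixes` is ordered from the
    most generic to the most specific.
    """
    merged = {}
    for prefix in prefixes:
        base = prefix.rstrip("/") + "/"
        for path in sorted(kv):
            if path.startswith(base):
                merged[path[len(base):]] = kv[path]
    return merged


# Stand-in for the KV store contents (hypothetical keys and values).
kv = {
    "services/common/conf/user": "appUser",
    "services/common/conf/timeOut": "10ms",
    "services/common/A/conf/apiVersion": "xxxx",
    "services/myService/A/conf/timeOut": "3ms",
}

config = resolve_config(kv, [
    "services/common/conf",
    "services/common/A/conf",
    "services/myService/A/conf",
])
print(config)
```

Here the generic `timeOut` is replaced by the value under the most specific prefix, while `user` and `apiVersion` pass through unchanged.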


Let's pretend that we are in a perfect world. What I would love to have:

  Any change triggers an event, with a change being any of the following:

    - A service is registered or deregistered
    - Service tags have changed
    - The pool of underlying nodes that expose this service and/or tag.service has changed (new nodes, fewer nodes, failing checks, etc.)
    - Some of the KV entries associated with this service or tag.service have changed (created, updated, or deleted)


One more question: do you plan to implement KV store replication between datacenters?

Anyway, congrats on the great job you have already done with all the awesome HashiProducts.

Regards,

Xavier

Brian Lalor

May 15, 2014, 9:34:58 PM
to consu...@googlegroups.com
On May 15, 2014, at 7:50 PM, XavM <mail...@gmail.com> wrote:

I know about the blocking queries already available, but as far as I have seen, they work pretty well for kv, but are not so friendly for catalog and health:

"X-Consul-Index" just keeps incrementing every second or so, even when no node or service has been "changed"

I just wanted to address this separately.  I also had this problem, but it was because the check output was different on every run.  I’ve modified my checks (I’m using the TTL style) so that the notes/output are drawn from a small, discrete set of values.  For example, I initially included the time taken to load an HTTP health check for one of my servers in the output, something like “request took 12ms”.  Well, that’s pretty variable (anywhere between 5 and 100ms, say), and each time Consul saw that the output had changed it’d trigger an update.  Instead I just set the notes to “ok” (or don’t set them at all).  It’s less useful, but it cuts down on the number of events my monitoring system needs to handle.
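
A minimal sketch of that idea: collapse the measured latency into a small, fixed set of notes so the check output only changes when the bucket changes. The function name and the thresholds are illustrative assumptions, not anything provided by Consul.

```python
def stable_note(latency_ms, warn_ms=250, crit_ms=1000):
    """Map a variable latency onto one of three fixed note strings."""
    if latency_ms >= crit_ms:
        return "very slow"
    if latency_ms >= warn_ms:
        return "slow"
    return "ok"


# Raw measurements differ on every run, but the reported note does not.
notes = [stable_note(ms) for ms in (5, 12, 48, 99)]
print(notes)
```

The resulting string would then be sent as the note of the TTL check update in place of the raw timing.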

--
Brian Lalor

Armon Dadgar

May 15, 2014, 9:35:07 PM
to XavM, consu...@googlegroups.com, XavM, Dr Nic Williams
Hey,

Sorry, I’m a bit late to this thread. At this point, the public roadmap
for Consul is the following:

0.2.X: 
    UI improvements
    Bug fix release
    Expected in the next few weeks.

0.3: 
    Experimental support for locking / leader election 
    DNS performance knobs (TTL + Stale reads)
    Expected in the next few weeks

0.4:
    Support for a handler system (à la Serf)
    Potentially exposing some of the Serf features (Event/Query)
    Refine locking / leader election
    Expected probably several weeks after 0.3

As for the other questions raised here, a short list:

* KV Replication: I plan on releasing a daemon in the next few weeks that operates
   independently to do this. It will take a source + destination DC with a key prefix, and
   replicate from the source to the destination. This will allow you to specify a particular
   DC as authoritative for a key space and replicate master/slave style to other DCs. There
   are no plans to support master-master replication, as that is a huge can of worms.

* Integration of Catalog+KV data. I admit a design flaw on my part with this. If I could do
  it again, all Consul data would be exposed over a “/proc” like file system, where some keys
  are just magically populated while others are standard file-like entries. I don’t think it’s too
  late, and a v2 API could introduce a lot more unification in how data is exposed.

* Service Configs: It seems we need a stronger convention around this. We use envconsul
  with some conventions internally, but I can see the use for tighter integration with some  
  conventions on use. I sent an email to the list about this, and would love to get feedback before
  committing to anything. I do, however, think 0.3 or 0.4 could introduce better support for this.
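
The one-way replication described in the first item can be pictured as a diff between two snapshots of a key prefix. This is only an illustration under stated assumptions: `plan_replication` is a hypothetical helper, the dictionaries and their keys stand in for a recursive KV read in each datacenter, and nothing here describes how the actual daemon will work.

```python
def plan_replication(source, dest):
    """Return (puts, deletes) that make `dest` match `source`.

    The source datacenter is authoritative: keys that differ are
    overwritten and keys missing from the source are deleted.
    """
    puts = {k: v for k, v in source.items() if k not in dest or dest[k] != v}
    deletes = sorted(k for k in dest if k not in source)
    return puts, deletes


# Hypothetical snapshots of the same prefix in two datacenters.
source = {"global/db/host": "10.0.0.5", "global/db/port": "5432"}
dest = {"global/db/host": "10.0.0.4", "global/old": "x"}

puts, deletes = plan_replication(source, dest)
print(puts, deletes)
```

Because writes only ever flow from the source to the destination, two datacenters never have to reconcile conflicting updates, which is the problem master-master replication would introduce.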

Lastly, Xavier, the X-Consul-Index does not auto-increment, so there must be lots of idempotent
writes happening in your use case. Probably worth starting a thread about that, since it shouldn’t
be the case.

Best Regards,
Armon Dadgar

Armon Dadgar

May 15, 2014, 9:37:29 PM
to consu...@googlegroups.com, Brian Lalor
Glad you brought this up! We were just talking about this. So I agree that you want to be able
to provide verbose output and you especially care when a check transitions from passing -> critical or
any other transition.

However, what we are thinking is a flag that controls how often “Output” is updated on the servers
if the state is quiescent. As an example, if the check remains in the passing state, only update
the output every 5 minutes. This way, you can have the verbose output you want, and when a check
transitions you get that output immediately, but for a stable check you get relatively up-to-date output.
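
The proposed rule can be sketched as below. The function name, its parameters, and the 300-second default are illustrative assumptions; this describes the proposal in this message, not an existing Consul option.

```python
def should_update_output(prev_status, new_status, last_update, now, interval=300):
    """Decide whether to push new check output to the servers.

    A status transition is always pushed immediately. While the status
    is unchanged, output is pushed at most once per `interval` seconds.
    """
    if new_status != prev_status:
        return True
    return now - last_update >= interval


# Times are in seconds; the last push happened at t=0.
print(should_update_output("passing", "passing", 0, 10))   # stable, too soon
print(should_update_output("passing", "critical", 0, 10))  # transition
print(should_update_output("passing", "passing", 0, 300))  # interval elapsed
```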

Thoughts?

Best Regards,
Armon Dadgar

Brian Lalor

May 15, 2014, 9:57:27 PM
to Armon Dadgar, consu...@googlegroups.com
On May 15, 2014, at 9:37 PM, Armon Dadgar <armon....@gmail.com> wrote:

Glad you brought this up! We were just talking about this.

I know.  I quoted part of Xav’s message. ;-)

However, what we are thinking is a flag that controls how often “Output” is updated on the servers
if the state is quiescent. As an example, if the check remains in the passing state, only update
the output every 5 minutes. This way, you can have the verbose output you want, and when a check
transitions you get that output immediately, but for a stable check you get relatively up-to-date output.

I think that seems reasonable.  As much as I’d like to be able to show up-to-moment information on my monitoring dashboard, the fact of the matter is that processing results for 10,000 checks takes some seconds to execute.  It’s not reasonable to update the data in real-time like that.  But I think this behavior should be laid out in the docs, as even ping checks and disk free reports can be quite variable.

Even better would be if there were a way to have Consul return what changed when a blocking query returns, rather than the current state of, say, a health check.  As it is, whenever a service changes state, I have to poll all the health states to determine the new state (critical → passing?  passing → warning?). I’m up to about 10,000 checks right now, so that’s a fair bit for my monitoring system to ingest.  What would *really* be useful would be something like /v1/health/state/changes that blocks and returns the state changes.
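
Until something like that exists, the changes can be derived on the client side by diffing two consecutive snapshots. A minimal sketch, assuming each snapshot is the decoded JSON list returned by `/v1/health/state/any` (entries carry `Node`, `CheckID`, and `Status`); `diff_states` is a hypothetical helper and the sample data is made up.

```python
def diff_states(previous, current):
    """Return {(node, check_id): (old_status, new_status)} for changed checks.

    Checks that appear or disappear are reported with None on the
    missing side.
    """
    def index(snapshot):
        return {(c["Node"], c["CheckID"]): c["Status"] for c in snapshot}

    old, new = index(previous), index(current)
    changes = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


previous = [
    {"Node": "node1", "CheckID": "web", "Status": "passing"},
    {"Node": "node2", "CheckID": "web", "Status": "critical"},
]
current = [
    {"Node": "node1", "CheckID": "web", "Status": "warning"},
    {"Node": "node2", "CheckID": "web", "Status": "critical"},
    {"Node": "node3", "CheckID": "web", "Status": "passing"},
]
changes = diff_states(previous, current)
print(changes)
```

This still transfers the full list on every wake-up, so it only moves the work out of the monitoring system; it does not replace a server-side changes endpoint.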

--
Brian Lalor
