> I suggest you set discovery.zen.minimum_master_nodes to a higher value; in
> your case, something like 2 or 3. Then, if a node loses its connection to
> the other nodes, it will not "form its own cluster", but will try to rejoin
> and form a cluster with at least that minimum number of nodes.
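
For picking the value, a common rule of thumb is a strict majority of the master-eligible nodes: (N / 2) + 1, rounded down. Here is a minimal sketch of that arithmetic. The helper name and the node counts are only illustrative assumptions, since the thread doesn't say how many master-eligible nodes the cluster has.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the strict-majority (quorum) size for N master-eligible nodes.

    Hypothetical helper for illustration; it is not part of Elasticsearch.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then +1, gives the smallest strict majority.
    return master_eligible_nodes // 2 + 1


# Example node counts are assumptions, not taken from the thread.
for n in (3, 4, 5):
    print(f"{n} master-eligible nodes -> "
          f"discovery.zen.minimum_master_nodes: {minimum_master_nodes(n)}")
```

With a majority required, two halves of a partitioned cluster cannot both elect a master, because at most one side can hold more than half of the master-eligible nodes.
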
> On Mon, Jan 16, 2012 at 10:16 PM, Grant <gr...@brewster.com> wrote:
> > We're using unicast now (Rackspace doesn't allow multicast traffic).
> > Here's a sample of what's in the logs during the issues. This kind of
> > thing was streaming pretty much continuously:
> > [2012-01-16 02:52:41,711][WARN ][indices.cluster ] [prod-es-
> > r03] [contact_documents-527859-0][0] master [[prod-es-r06][IfNWkYASSg-
> > TOZuMI7nj5w][inet[/10.180.46.203:9300]]] marked shard as started, but
> > shard have not been created, mark shard as failed
> > [2012-01-16 02:52:41,711][WARN ][cluster.action.shard ] [prod-es-
> > r03] sending failed shard for [contact_documents-527859-0][0],
> > node[zB6rqHbHQrm727WdL5iXrw], [R], s[STARTED], reason [master [prod-es-
> > r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]] marked shard
> > as started, but shard have not been created, mark shard as failed]
> > [2012-01-16 02:52:41,880][WARN ][indices.cluster ] [prod-es-
> > r03] [contact_documents-194054-1322678627][0] master [[prod-es-r06]
> > [IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]]] marked shard as
> > started, but shard have not been created, mark shard as failed
> > [2012-01-16 02:52:41,880][WARN ][cluster.action.shard ] [prod-es-
> > r03] sending failed shard for [contact_documents-194054-1322678627]
> > [0], node[zB6rqHbHQrm727WdL5iXrw], [R], s[STARTED], reason [master
> > [prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]]
> > marked shard as started, but shard have not been created, mark shard
> > as failed]
> > [2012-01-16 02:52:41,894][WARN ][indices.cluster ] [prod-es-
> > r03] [contact_documents-527859-0][0] master [[prod-es-r06][IfNWkYASSg-
> > TOZuMI7nj5w][inet[/10.180.46.203:9300]]] marked shard as started, but
> > shard have not been created, mark shard as failed
> > [2012-01-16 02:52:41,894][WARN ][cluster.action.shard ] [prod-es-
> > r03] sending failed shard for [contact_documents-527859-0][0],
> > node[zB6rqHbHQrm727WdL5iXrw], [R], s[STARTED], reason [master [prod-es-
> > r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]] marked shard
> > as started, but shard have not been created, mark shard as failed]
> > On Jan 16, 2:57 pm, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote:
> > > You might want to try switching from multicast to unicast just to
> > > eliminate a variable.
> > > Some networks don't treat multicast traffic very well.
> > > It's also useful to look at the logs for the ES nodes during these
> > > outages. What do they say?