Instances not being removed from Eureka

Andrew Spyker

Jul 9, 2013, 5:54:52 PM
to eureka_...@googlegroups.com
I am using Karyon to register service instances into Eureka - well sort of.  See [1] and [2], [3] for where I documented how I worked to get Karyon (and Eureka) starting up in my services.

If I stop the service nicely (tomcat bin/shutdown.sh), they go away from Eureka.  However, if the instance is terminated, I'd expect Eureka to time out the existence of a service.  I see in the wiki that it should likely go away in 90 seconds or so.  From what I've seen the instance stays in Eureka for a very long time (hours).

Is there a way to debug why the eureka info isn't expiring?

Given I hacked my way to Karyon, I suspect that is the issue.  However, if an instance is terminated, there is no way it's heartbeating, so it seems like something about Eureka's timeout isn't working in my environment.

nk...@netflix.com

Jul 10, 2013, 1:35:39 AM
to eureka_...@googlegroups.com
If you have only a handful of instances registered with the eureka server, it is possible that eureka has gone into self-preservation mode because a few of them have stopped sending heartbeats. Eureka does this to protect against network partitions that may prevent servers from sending heartbeats.
You would have to restart eureka server with self-preservation disabled if you are testing this scenario.
More details about this mode can be found here: https://github.com/Netflix/eureka/wiki/Understanding-Eureka-Peer-to-Peer-Communication
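
A minimal sketch of the server-side setting, assuming the eureka server loads its configuration from a file such as `eureka-server.properties` and uses the default `eureka.` property namespace (both are assumptions about the deployment, so check them against your own setup). This is meant for testing with a small number of instances only:

```properties
# Assumption: default "eureka." namespace on the eureka server.
# Disables self-preservation so that expired leases are evicted
# even when many heartbeats are missing. Restart the server afterwards.
eureka.enableSelfPreservation=false
```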

daniel....@gmail.com

Jul 10, 2013, 5:26:46 PM
to eureka_...@googlegroups.com

Self preservation mode is the most likely culprit here. Also, I'm exploring this myself, so here are my personal notes:

"If webapp is running and service healthcheck returns false periodically then the webapp service will be periodically registered with the servers. This means its lease does not expire, because it's been updated recently! This means the service will remain in the registry.

On the other hand, if the webapp uses no healthcheck callback but manually sets status to DOWN then the lease will expire and the service will be removed from the registry.

Similarly, if the webapp is killed then the lease will expire no matter what status it is set to: UP, DOWN... the lease will expire

Therefore, anything with a healthcheck callback will always remain in the registry."

Nitesh Kant

Jul 11, 2013, 2:54:14 AM
to eureka_...@googlegroups.com
There are two periodic tasks that the DiscoveryClient (responsible for registering an instance state to discovery) does in this respect:

1) Send heartbeats.
2) Update instance status (UP/DOWN) in the discovery server.

The heartbeat thread makes sure that the lease is always renewed for this instance with the eureka server. Absence of heartbeats expires the lease and hence removes the instance from eureka.

The instance update task just transmits the current status of the instance, consulting the healthcheck handler (if available), to the eureka server.

The instance is removed from eureka's registry in two scenarios:

1) Explicit cancel/unregister call from the instance.
2) Lease expiration.

2) will happen only if the heartbeats fail to reach the eureka server.
1) will happen when the application shuts down gracefully.
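
The two client tasks and the two removal paths above can be sketched with a toy model. This is not Eureka's implementation: the class `LeaseRegistrySketch`, its method names, and the 90 second lease duration (taken from the figure mentioned earlier in the thread) are all illustrative assumptions. It only shows that a status update never removes an instance, while a cancel or a missed lease does.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a lease registry. Hypothetical names; NOT Eureka's actual code. */
class LeaseRegistrySketch {
    // Assumed lease duration (the ~90 seconds mentioned earlier in the thread).
    static final long LEASE_DURATION_MS = 90_000;

    // instance id -> time of the last heartbeat, in milliseconds
    private final Map<String, Long> lastRenewal = new HashMap<>();
    // instance id -> last reported status ("UP" or "DOWN")
    private final Map<String, String> status = new HashMap<>();

    void register(String id, long nowMs) {
        lastRenewal.put(id, nowMs);
        status.put(id, "UP");
    }

    /** Heartbeat task: renews the lease irrespective of the instance status. */
    void heartbeat(String id, long nowMs) {
        if (lastRenewal.containsKey(id)) {
            lastRenewal.put(id, nowMs);
        }
    }

    /** Status update task: changes the status only, never the lease. */
    void updateStatus(String id, String newStatus) {
        if (status.containsKey(id)) {
            status.put(id, newStatus);
        }
    }

    /** Removal path 1: explicit cancel on graceful shutdown. */
    void cancel(String id) {
        lastRenewal.remove(id);
        status.remove(id);
    }

    /** Removal path 2: evict every lease that was not renewed in time. */
    void evictExpired(long nowMs) {
        lastRenewal.values().removeIf(t -> nowMs - t > LEASE_DURATION_MS);
        status.keySet().retainAll(lastRenewal.keySet());
    }

    boolean contains(String id) {
        return lastRenewal.containsKey(id);
    }

    String statusOf(String id) {
        return status.get(id);
    }
}
```

In this model an instance that reports DOWN but keeps sending heartbeats stays registered, a cancelled instance disappears immediately, and a killed instance disappears at the first eviction pass after its lease runs out.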

Daniel, 

I am assuming that when you say "if the webapp uses no healthcheck callback but manually sets status to DOWN" you mean shutdown on DiscoveryClient, which is case 1 above. In this case the lease does not expire; it is cancelled. OTOH, if you meant just updating ApplicationInfoManager's status, then the status will be updated to DOWN but the lease will keep being renewed, and hence the instance will remain in eureka's registry.

>>> "If webapp is running and service healthcheck returns false periodically then the webapp service will be periodically registered with the servers. This means its lease does not expire, because it's been updated recently! This means the service will remain in the registry"

This is not related to the healthcheck but to the heartbeat thread, which periodically renews the lease irrespective of the status.

>>> Therefore, anything with a healthcheck callback will always remain in the registry.
Again, it's not the healthcheck but the heartbeat thread.

The self-preservation mode disables all lease expiration, on the assumption that if the heartbeats received drop below a threshold percentage of those expected (defaults to 85%), the cause must be a network partition rather than a catastrophic failure. When the number of servers is very low and a few of them stop sending heartbeats, discovery thinks it's a network partition and goes into self-preservation, ignoring lease expiry. However, cancels are always honored. So, if you are in this scenario, graceful shutdown will remove the instance from eureka, whereas abrupt termination will not.
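
To see why a very small registry trips this check so easily, here is a simplified model of the arithmetic. It is a sketch under stated assumptions, not Eureka's code: the class and method names are hypothetical, the 85% default is treated as the minimum fraction of expected renewals per minute, and each instance is assumed to send one heartbeat every 30 seconds.

```java
/** Simplified model of the self-preservation check. Hypothetical names; NOT Eureka's code. */
class SelfPreservationSketch {
    // The 85% default mentioned in the thread, treated here as a minimum renewal fraction.
    static final double RENEWAL_PERCENT_THRESHOLD = 0.85;
    // Assumption: each instance sends one heartbeat every 30 seconds.
    static final int HEARTBEATS_PER_MINUTE = 2;

    /** Renewals per minute below which the server suspects a network partition. */
    static int renewalThreshold(int registeredInstances) {
        int expectedPerMinute = registeredInstances * HEARTBEATS_PER_MINUTE;
        return (int) (expectedPerMinute * RENEWAL_PERCENT_THRESHOLD);
    }

    /** Leases may expire only while observed renewals stay at or above the threshold. */
    static boolean leaseExpirationAllowed(int registeredInstances, int renewalsLastMinute) {
        return renewalsLastMinute >= renewalThreshold(registeredInstances);
    }

    public static void main(String[] args) {
        // Two instances registered, one killed: only 2 of 4 expected renewals arrive.
        System.out.println(leaseExpirationAllowed(2, 2));
        // One hundred instances registered, one killed: 198 of 200 renewals arrive.
        System.out.println(leaseExpirationAllowed(100, 198));
    }
}
```

Under these assumptions, losing one of two instances halves the renewal rate and blocks expiration, while losing one of a hundred leaves the rate well above the threshold, so the dead instance is evicted normally.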


Andrew Spyker

Jul 11, 2013, 5:15:33 PM
to eureka_...@googlegroups.com
I need to get time to look at the information here more deeply.

I will first try disabling the self-preservation mode, as I am just prototyping (small number of instances, as few as 2) and there is a good chance this could be the cause.

Is there some log4j setting you'd like to see the logs from to more clearly see what is causing it?  I could just register a service, kill the instance and wait for an hour or so and put the log file on gist.  Alternatively, if you could point me to what log messages I should look for to see if self preservation is kicking in and/or what to look for on a lease timeout due to no heartbeat I would be more than willing to look myself.

Nitesh Kant

Jul 11, 2013, 5:43:19 PM
to eureka_...@googlegroups.com
The error log here: https://github.com/Netflix/eureka/blob/master/eureka-core/src/main/java/com/netflix/eureka/PeerAwareInstanceRegistry.java#L539 will show that the self-preservation mode is enabled. Also, the eureka UI indicates when leases are not being expired.

Andrew Spyker

Jul 24, 2013, 4:02:49 PM
to eureka_...@googlegroups.com
I can confirm that after using enableSelfPreservation=false the instance expires as expected, as documented in the FAQ by Chris.
