How can I configure Eureka Server to recognize if a service is DOWN/OUT_OF_SERVICE?


andreas.l...@gmail.com

Nov 3, 2014, 2:54:48 AM
to eureka_...@googlegroups.com
Hi,

I'm trying to configure Eureka Server to recognize when a registered service is returning DOWN/OUT_OF_SERVICE, using a HealthCheckHandler on the client's DiscoveryClient.

However, the handler does not seem to be invoked by the Eureka server. I've seen some examples using Karyon; is that the only way?

How can I configure the Eureka server to call, e.g., /health on a given service to check the service's internal state?

I cannot rely solely on the service sending heartbeats.

Thanks!

Kind regards,
Andreas

tb...@netflix.com

Nov 3, 2014, 12:10:23 PM
to eureka_...@googlegroups.com, andreas.l...@gmail.com
Hi,
Eureka server expects a client to send heartbeats, not the other way around.
The heartbeat logic executes on the client side. The bottom line is that the client updates its local InstanceInfo status, which is then pushed to the Eureka server.
Can you paste the code snippet with your setup/initialization logic?
/Tomasz

andreas.l...@gmail.com

Nov 4, 2014, 2:59:28 AM
to eureka_...@googlegroups.com, andreas.l...@gmail.com

I set it up with:

discoveryClient.registerHealthCheck(new DefaultHealthCheckHandler(healthIndicatorService));

where 'healthIndicatorService' is a health check service from Spring Boot.

The settings that I guess control when the instance info is pushed out are (in DefaultEurekaClientConfig):

- instanceInfoReplicationIntervalSeconds;
- initialInstanceInfoReplicationIntervalSeconds;
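
For reference, a minimal sketch of how these two intervals might be shortened while testing. The property keys `eureka.appinfo.replicate.interval` and `eureka.appinfo.initial.replicate.time` are an assumption about what DefaultEurekaClientConfig reads and do not come from this thread, so check them against your eureka-client version. The helper class name is hypothetical.

```java
// Sketch only: the two property keys below are assumed, not taken from this thread.
final class ReplicationIntervalProps {

    // Assumed key behind instanceInfoReplicationIntervalSeconds.
    static final String REPLICATE_INTERVAL = "eureka.appinfo.replicate.interval";
    // Assumed key behind initialInstanceInfoReplicationIntervalSeconds.
    static final String INITIAL_REPLICATE_TIME = "eureka.appinfo.initial.replicate.time";

    /** Sets both intervals (in seconds) as Java system properties. */
    static void apply(int intervalSeconds, int initialDelaySeconds) {
        System.setProperty(REPLICATE_INTERVAL, Integer.toString(intervalSeconds));
        System.setProperty(INITIAL_REPLICATE_TIME, Integer.toString(initialDelaySeconds));
    }
}
```

Calling `ReplicationIntervalProps.apply(10, 5)` would have to happen before the Eureka client is initialised for the values to have any chance of being picked up.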


Here's my HealthCheckHandler:


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.Status;

import se.payzone.crypto.service.HealthIndicatorService;
import se.payzone.crypto.utils.LogMessage;

import com.netflix.appinfo.HealthCheckHandler;
import com.netflix.appinfo.InstanceInfo.InstanceStatus;

public class DefaultHealthCheckHandler implements HealthCheckHandler {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    private final HealthIndicatorService healthIndicatorService;

    public DefaultHealthCheckHandler(
            final HealthIndicatorService healthIndicatorService) {
        this.healthIndicatorService = healthIndicatorService;
    }

    @Override
    public InstanceStatus getStatus(final InstanceStatus currentStatus) {

        this.logger.debug(LogMessage.createForAction("getStatus").markStart()
                .toString());

        InstanceStatus newStatus = InstanceStatus.UP;

        final Health health = this.healthIndicatorService.health();
        if (!Status.UP.equals(health.getStatus())) {
            newStatus = InstanceStatus.OUT_OF_SERVICE;
        }

        this.logger.debug(LogMessage.createForAction("getStatus")
                .addPart("instanceStatus", newStatus).markEnd().toString());

        return newStatus;
    }

}




tb...@netflix.com

Nov 4, 2014, 6:44:48 PM
to eureka_...@googlegroups.com, andreas.l...@gmail.com
Hi,

Your code looks good. Maybe your configuration is invalid. I have written a simple client that does pretty much the same thing, but with a hardcoded instance status:

Client code:
public class SampleDiscoveryClient {

    public static void main(String[] args) {

        System.setProperty("eureka.region", "default");
        System.setProperty("eureka.environment", "test");
        System.setProperty("eureka.client.props", "sample-eureka-client");

        DiscoveryManager.getInstance().initComponent(
                new MyDataCenterInstanceConfig(),
                new DefaultEurekaClientConfig());

        ApplicationInfoManager.getInstance().setInstanceStatus(InstanceStatus.UP);
        DiscoveryClient discoveryClient = DiscoveryManager.getInstance().getDiscoveryClient();
        discoveryClient.registerHealthCheck(new DefaultHealthCheckHandler());

        System.out.println("Waiting indefinitely");
        while (true) {
            try {
                Thread.sleep(1000000);
            } catch (InterruptedException e) {
                // IGNORE
            }
        }
    }
}

Health check handler:
public class DefaultHealthCheckHandler implements HealthCheckHandler {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    @Override
    public InstanceStatus getStatus(final InstanceStatus currentStatus) {

        this.logger.debug("Called get status with current status=" + currentStatus);

        InstanceStatus newStatus = InstanceStatus.OUT_OF_SERVICE;

        this.logger.debug("Setting instance status to " + newStatus);

        return newStatus;
    }
}

Client configuration (taken from eureka/eureka-server/conf/sampleclient):
###Eureka Client configuration for Sample Eureka Client

#Properties based configuration for eureka client. The properties specified here is mostly what the users
#need to change. All of these can be specified as a java system property with -D option (eg)-Deureka.region=us-east-1
#For additional tuning options refer <url to go here>


#Region where eureka is deployed -For AWS specify one of the AWS regions, for other datacenters specify a arbitrary string
#indicating the region.This is normally specified as a -D option (eg) -Deureka.region=us-east-1
eureka.region=default

#Name of the application to be identified by other services

eureka.name=sampleEurekaClient

#Virtual host name by which the clients identifies this service
#eureka.vipAddress=eureka.mydomain.net

#The port where the service will be running and servicing requests
#eureka.port=80

#For eureka clients running in eureka server, it needs to connect to servers in other zones
eureka.preferSameZone=true

#Change this if you want to use a DNS based lookup for determining other eureka servers. For example
#of specifying the DNS entries, check the eureka-client-test.properties, eureka-client-prod.properties
eureka.shouldUseDns=false

eureka.us-east-1.availabilityZones=default

eureka.serviceUrl.default=<your_discovery_service_url>

It worked for me with my test cluster.

/Tomasz

andreas.l...@gmail.com

Nov 7, 2014, 3:47:30 AM
to eureka_...@googlegroups.com, andreas.l...@gmail.com
I tested your code and it does seem to work!

But if the result from my HealthCheckHandler changes after a while, the change does not seem to end up in the Eureka server.

It works when I call the Eureka server using the REST API, but I cannot get the status to change based on the result from the HealthCheckHandler.

Regards,
Andreas

tb...@netflix.com

Nov 7, 2014, 12:16:09 PM
to eureka_...@googlegroups.com, andreas.l...@gmail.com
Hi,

I have modified my code a little to read the instance status from the terminal; the health check then returns that value on its next invocation:

public class SampleDiscoveryClient {

    public static void main(String[] args) {

        System.setProperty("eureka.region", "default");
        System.setProperty("eureka.environment", "test");
        System.setProperty("eureka.client.props", "sample-eureka-client");

        DiscoveryManager.getInstance().initComponent(
                new MyDataCenterInstanceConfig(),
                new DefaultEurekaClientConfig());

        ApplicationInfoManager.getInstance().setInstanceStatus(InstanceStatus.UP);
        DiscoveryClient discoveryClient = DiscoveryManager.getInstance().getDiscoveryClient();
        discoveryClient.registerHealthCheck(new DefaultHealthCheckHandler());

        LineNumberReader lr = new LineNumberReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("Enter new status: ");
            try {
                String status = lr.readLine();
                DefaultHealthCheckHandler.nextStatus = InstanceStatus.valueOf(status);
                System.out.println("Set new status value to " + DefaultHealthCheckHandler.nextStatus);
            } catch (IOException e) {
                e.printStackTrace();
            } catch (IllegalArgumentException ex) {
                System.err.println("Invalid status value");
            }
        }
    }
}

public class DefaultHealthCheckHandler implements HealthCheckHandler {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    // volatile: written by the main thread, read by the client thread that calls getStatus()
    public static volatile InstanceStatus nextStatus = InstanceStatus.UP;

    @Override
    public InstanceStatus getStatus(final InstanceStatus currentStatus) {

        this.logger.debug("Called get status with current status=" + currentStatus);

        InstanceStatus newStatus = nextStatus;

        this.logger.debug("Setting instance status to " + newStatus);

        return newStatus;
    }
}

It works for me. Whenever I change the instance status, the Eureka registry gets updated the next time the health check is called (at a 30-second interval).
You should see the following log lines printed after your health check is called and changes the instance status:

DOWN
Set new status value to DOWN
Enter new status: 2014-11-07 09:09:17,368 INFO  com.netflix.discovery.DiscoveryClient$InstanceInfoReplicator:1651 [DiscoveryClient-3] [run] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak - retransmit instance info with status DOWN
2014-11-07 09:09:17,368 INFO  com.netflix.discovery.DiscoveryClient:614 [DiscoveryClient-3] [register] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak: registering service...
2014-11-07 09:09:17,443 INFO  com.netflix.discovery.DiscoveryClient:619 [DiscoveryClient-3] [register] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak - registration status: 204
2014-11-07 09:09:38,245 INFO  com.netflix.discovery.DiscoveryClient$HeartbeatThread:1590 [pool-2-thread-1] [run] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak - Re-registering apps/SAMPLEEUREKACLIENT
2014-11-07 09:09:38,246 INFO  com.netflix.discovery.DiscoveryClient:614 [pool-2-thread-1] [register] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak: registering service...
2014-11-07 09:09:38,322 INFO  com.netflix.discovery.DiscoveryClient:619 [pool-2-thread-1] [register] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak - registration status: 204
OUT_OF_SERVICE
Set new status value to OUT_OF_SERVICE
Enter new status: 2014-11-07 09:10:17,453 INFO  com.netflix.discovery.DiscoveryClient$InstanceInfoReplicator:1651 [DiscoveryClient-1] [run] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak - retransmit instance info with status OUT_OF_SERVICE
2014-11-07 09:10:17,453 INFO  com.netflix.discovery.DiscoveryClient:614 [DiscoveryClient-1] [register] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak: registering service...
2014-11-07 09:10:17,527 INFO  com.netflix.discovery.DiscoveryClient:619 [DiscoveryClient-1] [register] DiscoveryClient_SAMPLEEUREKACLIENT/lgml-tbak - registration status: 204
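
The behaviour in the log above (the health check runs periodically, and a changed status is re-sent to the server) can be modelled roughly as follows. This is a simplified sketch and not Eureka's actual implementation: the class, the local `Status` enum, and the two callbacks are hypothetical stand-ins for HealthCheckHandler.getStatus and the re-registration call.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Simplified model of a client-side status refresh; not Eureka's actual code.
final class StatusReplicatorSketch {

    enum Status { UP, DOWN, OUT_OF_SERVICE }

    private final UnaryOperator<Status> healthCheck; // stand-in for HealthCheckHandler.getStatus
    private final Consumer<Status> pushToServer;     // stand-in for re-registering with the server
    private Status current = Status.UP;

    StatusReplicatorSketch(UnaryOperator<Status> healthCheck, Consumer<Status> pushToServer) {
        this.healthCheck = healthCheck;
        this.pushToServer = pushToServer;
    }

    /** One replication cycle: ask the health check, push only if the status changed. */
    void runOnce() {
        Status next = healthCheck.apply(current);
        if (next != current) {
            current = next;
            pushToServer.accept(next);
        }
    }
}
```

In a real client something like a ScheduledExecutorService would call `runOnce()` at the replication interval; here it is left as a plain method so the cycle can be stepped by hand.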

andreas.l...@gmail.com

Nov 10, 2014, 7:44:36 AM
to eureka_...@googlegroups.com, andreas.l...@gmail.com

Hmm, your code is indeed a good example of it working.

I think I found an issue that might be messing things up for my code.

I run two copies of the same service (with different names) on the same host (same IP, same hostname) on my machine.

It seems that Eureka manages services at the host level?

The thing is that if I have a service running as two different processes on the same machine and I change the status using the REST API, both services get the new status even though my REST call targeted a single instance.

Consider two applications registered with IDs "MY-SERVICE-8042" and "MY-SERVICE-8081" on the same machine/instance called "MYHOST57".

This call will change the status for both applications:

http://localhost:8761/v2/apps/MY-SERVICE-8042/MYHOST57/status?value=UP
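
As a sketch, the same request could be built from Java like this, assuming the status change is sent as an HTTP PUT with an empty body and that Java 11's java.net.http is available. The class and method names are hypothetical; the URL layout is the one shown above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds the status-override request; sending it is left to the caller.
final class StatusOverrideRequest {

    /** Builds (but does not send) a PUT to {base}/apps/{app}/{instanceId}/status?value={status}. */
    static HttpRequest build(String baseUrl, String appName, String instanceId, String status) {
        URI uri = URI.create(baseUrl + "/apps/" + appName + "/" + instanceId
                + "/status?value=" + status);
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

For example, `StatusOverrideRequest.build("http://localhost:8761/v2", "MY-SERVICE-8042", "MYHOST57", "UP")` yields a request for the URL above, which can then be passed to `HttpClient.send`.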



//Andreas

tb...@netflix.com

Nov 10, 2014, 12:19:42 PM
to eureka_...@googlegroups.com, andreas.l...@gmail.com
I tried to replicate this error in my environment, but it always works fine. I ran two instances of the app I posted above, with two different names.
First I tried in my local deployment. I could change the statuses independently by posting a new state directly, like you did above. Only the target app instance was updated.
The same happened in an AWS deployment.
I now vaguely remember an issue with two apps deployed on the same node, but I cannot remember the details.
Looking into the code, we always start by fetching the application record, which aggregates a list of server instances. If the instances have overlapping names, it does not matter, as long as the app names are different.
/Tomasz

Andreas Eriksson

Nov 11, 2014, 2:27:37 AM
to tb...@netflix.com, eureka_...@googlegroups.com
Hmm, ok...

I can see my client sending its status to the URL apps/MY-SERVICE-8042/MYHOST57?status=UP&lastDirtyTimestamp=...

What's the name of the endpoint in Eureka that receives this update? I guess I'll have to debug the request.

PS. Thank you for spending time on something that is probably down to me having misconfigured or coded something wrong :-/


Andreas Eriksson

Nov 11, 2014, 2:49:07 AM
to tb...@netflix.com, eureka_...@googlegroups.com
I found the class InstanceResource with a 'renewLease' method.

I can see that the query param 'status' is set to UP, param 'overriddenStatus' is null and 'lastDirtyTimestamp' is set.

From what I can tell the only part setting a new status is: registry.storeOverriddenStatusIfRequired(this.id, InstanceInfo.InstanceStatus.valueOf(overriddenStatus));

But the condition to get to this code is never fulfilled:

if ((response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()) && (overriddenStatus != null) && (!InstanceInfo.InstanceStatus.UNKNOWN.equals(overriddenStatus)) && (isFromReplicaNode))

Am I on to something?
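
To make the quoted condition easier to reason about, here is a small sketch that restates it with plain types instead of the Eureka and JAX-RS classes. The class and method names are hypothetical. It suggests that a heartbeat with a null 'overriddenStatus' can never reach the override-storing line, which fits the client log earlier in the thread, where a status change is sent as a re-registration rather than as a heartbeat.

```java
// Restates the quoted renewLease condition with plain types; not Eureka's code.
final class RenewConditionSketch {

    static final int NOT_FOUND = 404;

    /** True only for a replicated renewal of an unknown lease that carries a real override. */
    static boolean storesOverride(int responseStatus, String overriddenStatus,
                                  boolean isFromReplicaNode) {
        return responseStatus == NOT_FOUND
                && overriddenStatus != null
                && !"UNKNOWN".equals(overriddenStatus)
                && isFromReplicaNode;
    }
}
```

With the values observed above (status UP, 'overriddenStatus' null), `storesOverride` is false whatever the other arguments are.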



Andreas Eriksson

Nov 11, 2014, 5:03:44 AM
to tb...@netflix.com, eureka_...@googlegroups.com
Sorry for spamming :-/

I think I found the problem.

My HealthCheckHandler returned the status 'OUT_OF_SERVICE', which seems to be a state from which you cannot get back to 'UP' or any other status (?).

It works fine when switching between 'UP' and 'DOWN'.



tb...@netflix.com

Nov 13, 2014, 11:55:32 AM
to eureka_...@googlegroups.com, tb...@netflix.com, andreas.l...@gmail.com
The typical pattern is to use UP/DOWN status values on the application side, and OUT_OF_SERVICE from the management console. For example, if you use Asgard (https://github.com/Netflix/asgard), you can disable services there (set OUT_OF_SERVICE) when doing red/black pushes.
Have you found out the reason why setting OUT_OF_SERVICE does not work for you?
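
A minimal sketch of that pattern on the application side: the health result is mapped only to UP or DOWN, and OUT_OF_SERVICE is left to external tooling. The local enum is a stand-in for Eureka's InstanceStatus, and the class and method names are hypothetical.

```java
// Application-side mapping that never produces OUT_OF_SERVICE.
final class UpDownMappingSketch {

    enum InstanceStatus { UP, DOWN, OUT_OF_SERVICE } // local stand-in for Eureka's enum

    /** Maps an application health flag to UP or DOWN. */
    static InstanceStatus fromHealth(boolean healthy) {
        return healthy ? InstanceStatus.UP : InstanceStatus.DOWN;
    }
}
```

In the handler posted earlier in the thread, this would amount to returning DOWN instead of OUT_OF_SERVICE when the Spring Boot health status is not UP.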

johnw...@gmail.com

Oct 23, 2015, 1:10:03 PM
to eureka_netflix, andreas.l...@gmail.com

I believe I am seeing a similar issue. If I send an OUT_OF_SERVICE request to a particular app on a given host that runs multiple apps, all Eureka-enabled apps on that same host start registering an OUT_OF_SERVICE state.

For example, in my test environment I have a Eureka server instance and 3 Eureka-enabled apps running across 3 nodes. If I issue an OUT_OF_SERVICE to appA running on node1 (curl -X PUT http://node1:9090/eureka/v2/apps/appA/node1/status?value=OUT_OF_SERVICE), all apps (including the Eureka server) on node1 start showing an OUT_OF_SERVICE state. Is this expected behavior?
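
The thread does not say what causes this. Purely as an illustration, one way such behaviour could arise is if status overrides were stored under the instance ID alone (here the hostname) rather than under the application name plus instance ID. The sketch below contrasts the two keying schemes; every name in it is hypothetical and it makes no claim about how the Eureka server is actually written.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of an override map keyed too coarsely.
final class OverrideKeySketch {

    private final Map<String, String> byInstanceId = new HashMap<>();     // key: "node1"
    private final Map<String, String> byAppAndInstance = new HashMap<>(); // key: "appA/node1"

    /** Records an override under both keying schemes for comparison. */
    void override(String app, String instanceId, String status) {
        byInstanceId.put(instanceId, status);
        byAppAndInstance.put(app + "/" + instanceId, status);
    }

    /** Coarse lookup: ignores the app name, so apps on one host share an entry. */
    String lookupByInstanceId(String app, String instanceId) {
        return byInstanceId.get(instanceId);
    }

    /** Fine-grained lookup: each app on a host has its own entry. */
    String lookupByAppAndInstance(String app, String instanceId) {
        return byAppAndInstance.get(app + "/" + instanceId);
    }
}
```

After `override("appA", "node1", "OUT_OF_SERVICE")`, the coarse lookup returns OUT_OF_SERVICE for appB on node1 as well, while the fine-grained lookup returns null for appB.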

Thanks.

-John

Tomasz Bak

Oct 23, 2015, 1:44:00 PM
to eureka_...@googlegroups.com, andreas.l...@gmail.com
I can confirm that this is the current behavior, and it is actually a bug in the implementation.
Please report it as an issue on github.com/Netflix/eureka. However, it may take some weeks before we have time to fix it.



johnw...@gmail.com

Oct 23, 2015, 2:31:53 PM
to eureka_netflix, andreas.l...@gmail.com
Will do. Thanks for the quick response.