Hazelcast Exception Handling


Nathan Williams

Sep 28, 2017, 7:45:33 PM
to Hazelcast
Something I've found challenging when working with Hazelcast is understanding what to expect exception-wise.

As a caller, I generally do not know:
- What exceptions may result from a given method call on a Hazelcast API.
- How possible exceptions may vary depending on the topology and locality of the Hazelcast cluster I'm working with.
- Whether a JDK exception like IllegalStateException is from Hazelcast or from other code I'm running in the same vicinity (short of wrapping every Hazelcast-specific call in a try/catch).
- How changes in Hazelcast's internal implementation over time could impact any exception handling strategy I may try to employ.

Is this something that is improving as Hazelcast matures, or is there good documentation or a set of best practices around it? Given the complexity of what Hazelcast is doing, I'm not surprised that this is a challenge, but from a usage perspective it has been a source of friction. It would be nice if there were at least a way to identify exceptions thrown by Hazelcast (without having to scan the stack trace).

http://docs.hazelcast.org/docs/latest/javadoc/com/hazelcast/core/HazelcastException.html
http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#common-exception-types

Christoph Engelbert

Oct 16, 2017, 2:55:13 AM
to Hazelcast
Hey Nathan,

First of all, sorry for the long wait.

The answer to your question is a bit long and complicated, but to start on a positive note: we are always trying to improve exception handling, and I expect Hazelcast 4 to streamline the exceptions into a more uniform scheme.

OK, so let's begin. As a general rule, we try to keep all of the common exceptions that are defined by the reimplemented APIs. That is, if the JavaDoc of the original API defines an exception for a special case, there's a very high chance that Hazelcast throws this exception in the cases described there.

However, you might see an extended stack trace, which is common to almost all exceptions in Hazelcast, containing a stack frame that looks similar to: at ------ End remote and begin local stack-trace ------.(Unknown Source). This shows that the stack trace actually consists of two different traces that Hazelcast merged before returning it. As the separator's wording indicates, the frames before it show the call path on the remote entity involved in the call (the Hazelcast member node), and the frames after it contain the actual local call path, just as you'd expect. A deserialization exception, for example, is very likely to originate on a node, so it is thrown there and transported to the client. The client then merges its local call path with the exception received from the node, separating the two with the separator above.
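
To make this concrete, here is a minimal sketch of how a caller could split such a merged trace at the separator frame. It uses only the JDK. The class and method names are hypothetical, and matching on the rendered separator text is an assumption based on the frame quoted above, not a documented Hazelcast contract.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MergedStackTrace {
    // Separator text as quoted in the post. Matching on the rendered frame is
    // an assumption about how the marker appears, not a documented contract.
    static final String SEPARATOR = "End remote and begin local stack-trace";

    /** Returns the index of the separator frame, or -1 if there is none. */
    static int separatorIndex(StackTraceElement[] frames) {
        for (int i = 0; i < frames.length; i++) {
            if (frames[i].toString().contains(SEPARATOR)) {
                return i;
            }
        }
        return -1;
    }

    /** Frames before the separator; empty if the trace has no separator. */
    static List<StackTraceElement> beforeSeparator(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        int idx = separatorIndex(frames);
        if (idx < 0) {
            return Collections.emptyList();
        }
        return Arrays.asList(Arrays.copyOfRange(frames, 0, idx));
    }

    /** Frames after the separator; the whole trace if there is no separator. */
    static List<StackTraceElement> afterSeparator(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        int idx = separatorIndex(frames);
        return Arrays.asList(Arrays.copyOfRange(frames, idx + 1, frames.length));
    }

    public static void main(String[] args) {
        // Synthetic trace standing in for a merged one; the frame names are made up.
        RuntimeException e = new RuntimeException("demo");
        e.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("first.Part", "run", "Part.java", 10),
            new StackTraceElement("------ " + SEPARATOR + " ------", "", null, -1),
            new StackTraceElement("second.Part", "call", "Part.java", 20),
        });
        System.out.println("before: " + beforeSeparator(e));
        System.out.println("after:  " + afterSeparator(e));
    }
}
```

Logging the two halves separately can make it easier to see which frames belong to which side of the call.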

As for the exception types themselves, I'd recommend the page http://docs.hazelcast.org/docs/latest-development/manual/html/Common_Exception_Types.html as a starting point. However, it does not cover all exceptions. The following list adds a few more types.

- HazelcastSerializationException: This exception is most likely to be thrown from the client protocol implementation and gives information about problems serializing or deserializing a client / node packet.
- QueryResultSizeExceededException: Can be thrown when you query a large data set and the result size is above a configurable threshold. This exception is meant to prevent an out-of-memory error on the caller's side.
- ConfigMismatchException: When a node tries to join with a non-matching configuration (XML or programmatic), this exception is thrown to prevent different behavior on different nodes (for example, one node evicting while the others don't).
- TransactionException: I guess the name speaks for itself.

The most important one, however, is OperationTimeoutException. That exception can be thrown from all calls that involve remote operations. If the called node, for any reason, cannot respond within a reasonable (configurable) time, this exception will be thrown. It is unlikely to happen, but nodes might hang in GC phases or be unresponsive for some other reason (deadlocks, JVM crash, ...) and therefore not respond.

A general rule is that every special Hazelcast-related exception inherits from HazelcastException. Most exception types, except OperationTimeoutException and HazelcastInstanceNotActiveException, are documented in the Hazelcast-specific JavaDocs, such as those of IMap. A number of methods from Map and ConcurrentMap are also overridden to add information to the JavaDoc of the original methods. If something is not documented, please file a bug.
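
As a rough illustration of relying on that base class: with Hazelcast on the classpath, a plain catch (HazelcastException e) or an instanceof check is the direct way to do it. The sketch below performs the same ancestry check by class name, only so that it runs without any dependency. The helper names are hypothetical, and the fully qualified name is taken from the JavaDoc URL linked earlier in this thread.

```java
final class ExceptionAncestry {
    // Fully qualified name as it appears in the JavaDoc URL posted in this thread.
    static final String HAZELCAST_BASE = "com.hazelcast.core.HazelcastException";

    /** True if t's class, or any of its superclasses, has the given name. */
    static boolean hasAncestor(Throwable t, String className) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            if (c.getName().equals(className)) {
                return true;
            }
        }
        return false;
    }

    /** Name-based stand-in for "t instanceof HazelcastException". */
    static boolean isHazelcastException(Throwable t) {
        return hasAncestor(t, HAZELCAST_BASE);
    }

    public static void main(String[] args) {
        // Only JDK exceptions are available here, so this one is not a match.
        Throwable t = new IllegalStateException("demo");
        System.out.println("Hazelcast exception? " + isHazelcastException(t));
        System.out.println("RuntimeException?    " + hasAncestor(t, "java.lang.RuntimeException"));
    }
}
```

A check like this only catches exceptions that really extend the base class, so common JDK exceptions raised inside Hazelcast will not match.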

As for your question about IllegalStateException, that is hard to say. Normally Hazelcast tries to use as few common Java exception types (those not extending HazelcastException) as possible, to make it easier to tell Hazelcast-related exceptions apart from user-code exceptions. However, in places where a common exception is the best match, you might still see them bubbling up. In general I'd recommend first assuming that a common exception comes from user code, but the stack traces should give more detail.
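
If you do want to use the stack trace for that, here is a small heuristic sketch. It checks whether the frame that threw the exception, or any frame in the trace, belongs to a com.hazelcast package. The helper names are hypothetical, and the check is only a heuristic: user code invoked by Hazelcast, or a JDK method called from Hazelcast code, will put a non-Hazelcast frame on top.

```java
final class ExceptionOrigin {
    // Package prefix of Hazelcast classes, as seen in Hazelcast stack frames.
    static final String HAZELCAST_PREFIX = "com.hazelcast.";

    /** True if the top frame, i.e. the code that threw, is in a Hazelcast package. */
    static boolean thrownFromHazelcast(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        return frames.length > 0
                && frames[0].getClassName().startsWith(HAZELCAST_PREFIX);
    }

    /** True if any frame in the trace is in a Hazelcast package. */
    static boolean passedThroughHazelcast(Throwable t) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (frame.getClassName().startsWith(HAZELCAST_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Synthetic trace; the application frame name is made up.
        IllegalStateException e = new IllegalStateException("demo");
        e.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("com.hazelcast.client.HazelcastClient",
                    "newHazelcastClient", "HazelcastClient.java", 72),
            new StackTraceElement("com.example.App", "main", "App.java", 12),
        });
        System.out.println("thrown from Hazelcast:    " + thrownFromHazelcast(e));
        System.out.println("passed through Hazelcast: " + passedThroughHazelcast(e));
    }
}
```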

As for changes to the exceptions, we try to keep behavior the same across non-major versions, as we understand that users depend on existing exception behavior. Exception messages, on the other hand, sometimes change. It is also possible that new exceptions are introduced to match new code branches or requirements, but those are most likely to extend HazelcastException.

As mentioned at the beginning, Hazelcast 3 has grown to where it is, and a future Hazelcast 4 will streamline a lot of internals to align exception handling, data structure APIs, and configuration. Since we try not to make any breaking changes in non-major versions, we are deferring those changes until then.

I hope this helps a bit. If anything is missing, feel free to ask further questions.

Chris

Nathan Williams

Oct 16, 2017, 7:09:57 PM
to Hazelcast
Thanks for the in-depth reply. Having a common base class (HazelcastException) is preferable, but I understand that changing signatures is a breaking change that has to be carefully coordinated and communicated.

One particular IllegalStateException I was running into was in a client (3.9-EA) where the cluster can't be reached or has died.  It seems to be a popular exception type in the Hazelcast code base for failed sanity checks in general (though I don't know how many are able to leak out to the caller like this one).

10/16/17 12:42:57.310 PM  IllegalStateException: Unable to connect to any address in the config! The following addresses were tried: [localhost/127.0.0.1:5701, localhost/127.0.0.1:5702] {Thread[qtp302366050-22,5,main]}

    at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.connectToClusterInternal(ClientConnectionManagerImpl.java:760)
    at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.connectToCluster(ClientConnectionManagerImpl.java:719)
    at com.hazelcast.client.connection.nio.DefaultClientConnectionStrategy.start(DefaultClientConnectionStrategy.java:44)
    at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.start(ClientConnectionManagerImpl.java:251)
    at com.hazelcast.client.impl.HazelcastClientInstanceImpl.start(HazelcastClientInstanceImpl.java:415)
    at com.hazelcast.client.HazelcastClientManager.newHazelcastClient(HazelcastClientManager.java:78)
    at com.hazelcast.client.HazelcastClient.newHazelcastClient(HazelcastClient.java:72)


