node failure/stops inserting


Colin

Mar 1, 2011, 9:22:07 PM
to hector...@googlegroups.com
I am writing to a small cluster of 4 nodes using the latest Hector client.

I am setting hostConfig.setAutoDiscoverHosts(true), and a custom ConsistencyLevel of HConsistencyLevel.ANY is being used.

When I take a node down, I receive:

Problem executing insert:: May not be enough replicas present to handle consistency level.
me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
    at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:203)
    at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:200)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:200)
    at com.cep.darkstar.offramp.TopicToCassandra$Publish.insert(TopicToCassandra.java:71)
    at com.cep.darkstar.offramp.TopicToCassandra$Publish.run(TopicToCassandra.java:90)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16485)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
    ... 16 more

Is this the expected behavior? Replication factor for the keyspace is set
at 1.

When I bring the node back up, insertion resumes.

I'm not initially connecting to the node that I'm bringing down.

-----Original Message-----
From: hector...@googlegroups.com [mailto:hector...@googlegroups.com]
On Behalf Of Nate McCall
Sent: Tuesday, March 01, 2011 1:38 PM
To: hector...@googlegroups.com
Subject: Re: MultigetSliceQuery fails with IndexOutOfBoundsException, when Integer type keys and column names are used

Cassandra is pretty efficient with storing bytes, yes. IMO, the amount of
space that would save is not at all worth the effort.
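For a rough sense of the size of the saving Nate is dismissing, compare a fixed 4-byte packed key against a hand-packed 2-byte key over a large row count (the row count here is purely hypothetical, not from this thread):

```java
public class KeySpaceSaving {
    public static void main(String[] args) {
        // Hypothetical row count, just to put a number on it.
        long rows = 10_000_000L;
        int intSerializerKeyBytes = 4; // IntegerSerializer packs a fixed four bytes
        int rawKeyBytes = 2;           // a hand-packed two-byte key

        long savedBytes = rows * (intSerializerKeyBytes - rawKeyBytes);
        System.out.printf("%.1f MB saved%n", savedBytes / (1024.0 * 1024.0));
        // About 19 MB across ten million rows - negligible next to the
        // per-column metadata (timestamps, lengths, etc.) Cassandra stores anyway.
    }
}
```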

On Tue, Mar 1, 2011 at 1:32 PM, Aklin_81 <asd...@gmail.com> wrote:
> Nate, another thing you said: "IntegerSerializer is going to be
> packing your key as a four byte integer." Does this mean that I should
> store values much smaller than 4 bytes as serialized byte[], since
> that could help me economize some space, however minimal? I thought
> that since Cassandra works only with byte[] to store data, it would
> store integer values efficiently, consuming only the necessary bytes.
>
>
>
> On Tue, Mar 1, 2011 at 10:16 PM, Nate McCall <na...@datastax.com> wrote:
>> IntegerSerializer is going to be packing your key as a four byte
>> integer. How did you insert this data? Via the cassandra-cli? If so,
>> that may have packed the raw two byte value in that case, thus the
>> mismatch. If you created and inserted the row initially with
>> IntegerSerializer, then it would work (if you did use
>> IntegerSerializer for insertion, then we have an issue).
>>
>> On Tue, Mar 1, 2011 at 12:35 AM, Aklin_81 <asd...@gmail.com> wrote:
>>> My column family (Standard) uses the bytes type comparator, and I am
>>> passing <Integer, Integer, byte[]> as the serializers for
>>> <keySerializer, nameSerializer, valueSerializer>.
>>>
>>> I tried the following code:
>>> *************************
>>> Integer[] keys={43};
>>> Integer[] columns={42};
>>>
>>> MultigetSliceQuery<Integer, Integer,byte[]> query =
>>> HFactory.createMultigetSliceQuery("trialKeyspace",
>>> integerSerializer, integerSerializer, BYTES_ARRAY_SERIALIZER);
>>>
>>> query.setColumnFamily("ProfileData").setKeys(keys).setColumnNames(columns);
>>>
>>>
>>> QueryResult<Rows<Integer, Integer, byte[]>> queryResult =
>>> query.execute();
>>>
>>> *****************************
>>>
>>> However, if I change the keySerializer & 'keys' to be of byte[]
>>> type, then the query is successful.
>>>
>>>
>>>
>>>
>>> On Tue, Mar 1, 2011 at 9:31 AM, Nate McCall <na...@datastax.com> wrote:
>>>> What are the values for the Serializers in the above code? Do they
>>>> match up with how you have declared the column family comparator?
>>>>
>>>> On Mon, Feb 28, 2011 at 8:59 PM, Asil <asd...@gmail.com> wrote:
>>>>> Can anyone point out if I have done anything wrong above?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>
>>
>
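The key mismatch Nate describes above can be seen with plain java.nio, without Hector at all (the value 43 and the two-byte packing are illustrative):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class KeyPackingDemo {
    public static void main(String[] args) {
        // IntegerSerializer-style packing: a fixed four bytes, big-endian.
        byte[] fourByteKey = ByteBuffer.allocate(4).putInt(43).array();

        // A raw two-byte encoding of the same value, as a CLI insert might store it.
        byte[] twoByteKey = new byte[] { 0, 43 };

        // Cassandra compares row keys byte-for-byte, so these are different keys:
        // a lookup with one encoding will not find a row written with the other.
        System.out.println(Arrays.toString(fourByteKey)); // [0, 0, 0, 43]
        System.out.println(Arrays.toString(twoByteKey));  // [0, 43]
        System.out.println(Arrays.equals(fourByteKey, twoByteKey)); // false
    }
}
```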

Nate McCall

Mar 1, 2011, 10:05:43 PM
to hector...@googlegroups.com, Colin
Well, we are only wrapping what is coming back from Cassandra in this
case - the UnavailableException. Could you give the full example of how
you are setting CL.ANY? In theory, the coordinator should write a hint
locally and acknowledge the write as successful in this case. Do you
perchance have hinted handoff disabled for this column family?

Colin

Mar 1, 2011, 10:49:30 PM
to Nate McCall, hector...@googlegroups.com
Here's the consistency level class, and I'm not disabling hinted handoff:

// consistency level
private static class MyConsistencyLevel implements ConsistencyLevelPolicy {
    @Override
    public HConsistencyLevel get(OperationType op, String arg1) {
        switch (op) {
            case READ: return HConsistencyLevel.ONE;
            case WRITE: return HConsistencyLevel.ANY;
        }
        return HConsistencyLevel.ONE;
    }

    @Override
    public HConsistencyLevel get(OperationType op) {
        switch (op) {
            case READ: return HConsistencyLevel.ONE;
            case WRITE: return HConsistencyLevel.ANY;
        }
        return HConsistencyLevel.ONE;
    }
}

Nate McCall

Mar 1, 2011, 11:00:32 PM
to hector...@googlegroups.com
That looks fine - just make sure you are using this policy to create
the Keyspace via one of the overloaded HFactory#createKeyspace methods
that takes a CL policy.

Colin

Mar 1, 2011, 11:03:22 PM
to hector...@googlegroups.com
Will this work:

CassandraHostConfigurator hostConfig = new CassandraHostConfigurator(cassandraHostName + ":9160");
hostConfig.setAutoDiscoverHosts(true);
cluster = HFactory.createCluster("xxx", hostConfig);
ko = HFactory.createKeyspace("yyy", cluster);
ko.setConsistencyLevelPolicy(mcl);

--
Colin Clark
+1 315 886 3422 cell
+1 701 212 4314 office
http://cloudeventprocessing.com
http://blog.cloudeventprocessing.com
@EventCloudPro

Nate McCall

Mar 1, 2011, 11:08:31 PM
to hector...@googlegroups.com, Colin
It should. It would be more direct to do the following, though:
ko = HFactory.createKeyspace("yyy", cluster, mcl);

Colin

Mar 1, 2011, 11:15:55 PM
to Nate McCall, hector...@googlegroups.com
Ok, I did that, and I'm still getting the same error.

This seems rather serious to me - I hope that I'm just doing something wrong
here.

Nate McCall

Mar 1, 2011, 11:19:03 PM
to colp...@gmail.com, hector...@googlegroups.com
At this point I would recommend turning logging up to DEBUG on the
Cassandra node coordinating the request, to see what is showing up on
the server.

Colin

Mar 1, 2011, 11:34:46 PM
to Nate McCall, hector...@googlegroups.com
This is the debug trace from the node the Hector client is connected to.

This is what occurs right after I kill the node, and then after I restart
it (once the node restarts, the client resumes inserting):

DEBUG 22:33:14,566 Pre-mutation index row is null
DEBUG 22:33:14,566 applying index row DecoratedKey(4d4f4d, 4d4f4d):ColumnFamily(<anonymous> [32666437346236302d343438362d313165302d623233352d343438376663643432303762:false:0@1299040394006006,])
DEBUG 22:33:14,566 applying index row DecoratedKey(41, 41):ColumnFamily(<anonymous> [32663335336630332d343438362d313165302d623233352d343438376663643432303762:false:0@1299040392944045,])
DEBUG 22:33:14,566 applying index row DecoratedKey(0000012e74d75b16, 0000012e74d75b16):ColumnFamily(<anonymous> [32666437346236302d343438362d313165302d623233352d343438376663643432303762:false:0@1299040394006000,])
DEBUG 22:33:14,567 applying index row DecoratedKey(4649584f7264657241636b, 4649584f7264657241636b):ColumnFamily(<anonymous> [32666437346236302d343438362d313165302d623233352d343438376663643432303762:false:0@1299040394006009,])
DEBUG 22:33:14,566 Processing response on a callback from 10062@/192.168.1.8
DEBUG 22:33:14,566 applying index row DecoratedKey(0000012e74d756f0, 0000012e74d756f0):ColumnFamily(<anonymous> [32663335336630332d343438362d313165302d623233352d343438376663643432303762:false:0@1299040392944039,])
DEBUG 22:33:14,567 applying index row DecoratedKey(4649584f7264657241636b, 4649584f7264657241636b):ColumnFamily(<anonymous> [32663335336630332d343438362d313165302d623233352d343438376663643432303762:false:0@1299040392944048,])
INFO 22:33:17,801 error writing to /192.168.1.5
DEBUG 22:33:17,801 error was
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:112)
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:88)
DEBUG 22:33:18,801 attempting to connect to /192.168.1.5
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:19,711 logged out: #<User allow_all groups=[]>
DEBUG 22:33:20,435 ... timed out
DEBUG 22:33:20,435 logged out: #<User allow_all groups=[]>
DEBUG 22:33:20,454 ... timed out
DEBUG 22:33:20,455 logged out: #<User allow_all groups=[]>
DEBUG 22:33:20,455 ... timed out
DEBUG 22:33:20,455 logged out: #<User allow_all groups=[]>
DEBUG 22:33:20,465 ... timed out
DEBUG 22:33:20,465 logged out: #<User allow_all groups=[]>
DEBUG 22:33:20,470 ... timed out
DEBUG 22:33:20,470 logged out: #<User allow_all groups=[]>
DEBUG 22:33:20,552 ... timed out
DEBUG 22:33:20,552 logged out: #<User allow_all groups=[]>
INFO 22:33:20,802 InetAddress /192.168.1.5 is now dead.
DEBUG 22:33:20,802 Resetting pool for /192.168.1.5
DEBUG 22:33:29,806 attempting to connect to /192.168.1.5
DEBUG 22:33:40,810 attempting to connect to /192.168.1.5
DEBUG 22:33:53,815 attempting to connect to /192.168.1.5
DEBUG 22:33:58,709 Disseminating load info ...
INFO 22:34:06,821 Node /192.168.1.5 has restarted, now UP again
INFO 22:34:06,822 Checking remote schema before delivering hints
DEBUG 22:34:06,822 schema for /192.168.1.5 matches local schema
INFO 22:34:06,822 Sleeping 3928ms to stagger hint delivery
DEBUG 22:34:07,399 attempting to connect to /192.168.1.5
DEBUG 22:34:07,402 Node /192.168.1.5 state normal, token 143438270201575387536392136519526174611
INFO 22:34:07,402 Node /192.168.1.5 state jump to normal
DEBUG 22:34:07,402 clearing cached endpoints
DEBUG 22:34:07,402 clearing cached endpoints
DEBUG 22:34:07,402 clearing cached endpoints
DEBUG 22:34:07,402 No bootstrapping or leaving nodes -> empty pending ranges for FIX
DEBUG 22:34:07,403 No bootstrapping or leaving nodes -> empty pending ranges for BBO
INFO 22:34:10,750 Started hinted handoff for endpoint /192.168.1.5
INFO 22:34:10,750 Finished hinted handoff of 0 rows to endpoint /192.168.1.5
DEBUG 22:34:58,710 Disseminating load info ...

Nate McCall

Mar 2, 2011, 11:31:55 AM
to colp...@gmail.com, hector...@googlegroups.com
The 2nd-to-last line of the logging output indicates no hints were
captured on this node. There are two possible scenarios here:
- There is an issue in Cassandra where CL.ANY is not doing what it is
supposed to
- The mutation issued by Hector is not getting set to CL.ANY

I'll see if I can reproduce this locally.
