Secondary indices don't work with IntegerSerializer due to padded zeros


tcn

Jan 18, 2011, 1:47:50 PM
to hector-dev

Ran Tavory

Jan 18, 2011, 3:53:47 PM
to hecto...@googlegroups.com, tcn
What do you guys think?
@tcn, you make a good point that this should be fixed on the server. By any chance, did you bring it up on users@cassandra?


On Tue, Jan 18, 2011 at 8:47 PM, tcn <timo.n...@gmail.com> wrote:
https://github.com/rantav/hector/issues#issue/126



--
/Ran

tcn

Jan 19, 2011, 3:16:12 AM
to hector-dev


On Jan 18, 9:53 pm, Ran Tavory <ran...@gmail.com> wrote:
> @tcn you make a good point that this should be fixed on the server, by any
> chance did you bring it up on users@cassandra?

Sure, but no response so far...

tcn

Jan 19, 2011, 3:39:13 AM
to hector-dev
Err, seems to be a big/little endian issue...somewhere.

@Override
public ByteBuffer toByteBuffer(final Integer obj) {
    if (obj == null) {
        return null;
    }
    final int l = obj;
    return ByteBuffer.wrap(new byte[] {
        (byte) l,
        (byte) (l >>> 8),
        (byte) (l >>> 16),
        (byte) (l >>> 24)
    });
}

[default@tracking] get crawler where rc<98 and user_agent=foo;
-------------------
RowKey: 1295426032931
=> (column=rc, value=1627389952, timestamp=1295426033110000)
=> (column=url, value=http://www/0, timestamp=1295426033109000)
=> (column=user_agent, value=foo, timestamp=1295426033093000)

1 Row Returned.

[default@tracking] get crawler where rc=97 and user_agent=foo;

0 Row Returned.
[default@tracking] get crawler where rc>97 and user_agent=foo;
-------------------
RowKey: 1295426032931
=> (column=rc, value=1627389952, timestamp=1295426033110000)
=> (column=url, value=http://www/0, timestamp=1295426033109000)
=> (column=user_agent, value=foo, timestamp=1295426033093000)

1 Row Returned.
[default@tracking] get crawler where rc>98 and user_agent=foo;

0 Row Returned.
[default@tracking] get crawler where rc=1627389952 and user_agent=foo;
-------------------
RowKey: 1295426032931
=> (column=rc, value=1627389952, timestamp=1295426033110000)
=> (column=url, value=http://www/0, timestamp=1295426033109000)
=> (column=user_agent, value=foo, timestamp=1295426033093000)

1 Row Returned.


OMG...
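
The stored value 1627389952 is consistent with a byte-order mismatch: it is 97 with its four bytes reversed. A minimal sketch, assuming the value is written low byte first, as in the toByteBuffer method quoted above, and then read back as a big-endian number (the class and method names here are made up for illustration):

```java
import java.math.BigInteger;

final class EndianMismatchDemo {
    // Same byte order as the toByteBuffer method quoted above: low byte first.
    static byte[] toLittleEndian(int l) {
        return new byte[] {
            (byte) l,
            (byte) (l >>> 8),
            (byte) (l >>> 16),
            (byte) (l >>> 24)
        };
    }

    // Assumption: the reader treats the bytes as a big-endian two's-complement
    // number, which is how java.math.BigInteger interprets a byte array.
    static BigInteger readBigEndian(byte[] bytes) {
        return new BigInteger(bytes);
    }

    public static void main(String[] args) {
        // 97 is 0x61; written low byte first it becomes [0x61, 0, 0, 0],
        // which reads as 0x61000000 in big-endian order.
        System.out.println(readBigEndian(toLittleEndian(97))); // 1627389952
    }
}
```

That would explain why rc=1627389952 matches the row while rc=97 does not.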

Nate McCall

Jan 19, 2011, 10:10:36 AM
to hecto...@googlegroups.com
Can you turn on debug logging on cassandra? (You can poke this through
StorageServiceMBean#setLogger)

tcn

Jan 19, 2011, 10:45:31 AM
to hector-dev
This works as expected:

.addInsertion(now, cf, createColumn("rc", new String(new BigInteger("97").toByteArray()), SS, SS)).execute();

[default@tracking] get crawler where user_agent=foo and rc=97;
-------------------
RowKey: 1295451825452
=> (column=rc, value=97, timestamp=1295451825634000)
=> (column=url, value=http://www/, timestamp=1295451825633000)
=> (column=user_agent, value=foo, timestamp=1295451825620000)

1 Row Returned.
[default@tracking] get crawler where user_agent=foo and rc>=97;
-------------------
RowKey: 1295451825452
=> (column=rc, value=97, timestamp=1295451825634000)
=> (column=url, value=http://www/, timestamp=1295451825633000)
=> (column=user_agent, value=foo, timestamp=1295451825620000)

1 Row Returned.
[default@tracking] get crawler where user_agent=foo and rc>=96;
-------------------
RowKey: 1295451825452
=> (column=rc, value=97, timestamp=1295451825634000)
=> (column=url, value=http://www/, timestamp=1295451825633000)
=> (column=user_agent, value=foo, timestamp=1295451825620000)

1 Row Returned.
[default@tracking] get crawler where user_agent=foo and rc<=96;

0 Row Returned.
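
One caveat with this workaround: passing the BigInteger bytes through new String(...) only preserves them while every byte is valid in the charset used for decoding and for the later string serialization. A rough sketch, assuming UTF-8 on both sides (the thread does not say which charset was in effect, and the class name is illustrative):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringRoundTripDemo {
    // Mimics new String(bigInteger.toByteArray()) followed by string
    // serialization, with UTF-8 assumed for both steps.
    static byte[] roundTrip(String decimal) {
        byte[] raw = new BigInteger(decimal).toByteArray();
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // True if the bytes that come out equal the BigInteger bytes that went in.
    static boolean survives(String decimal) {
        return Arrays.equals(new BigInteger(decimal).toByteArray(), roundTrip(decimal));
    }

    public static void main(String[] args) {
        System.out.println(survives("97"));  // bytes [97] are plain ASCII
        System.out.println(survives("256")); // bytes [1, 0] are plain ASCII
        System.out.println(survives("200")); // bytes [0, -56] include a non-ASCII byte
    }
}
```

Under that assumption, 97 and 256 come through intact, but a value such as 200 does not, so the trick is only safe for a subset of numbers.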

tcn

Jan 19, 2011, 10:54:03 AM
to hector-dev
On Jan 19, 4:10 pm, Nate McCall <n...@riptano.com> wrote:
> Can you turn on debug logging on cassandra? (You can poke this through
> StorageServiceMBean#setLogger)

Which class(es)?

Nate McCall

Jan 19, 2011, 11:32:46 AM
to hecto...@googlegroups.com
org.apache.cassandra unfortunately. The relevant output could come
from any of db, io, net, service and locator packages.

tcn

Jan 19, 2011, 12:06:10 PM
to hector-dev


On Jan 19, 5:32 pm, Nate McCall <n...@riptano.com> wrote:
> org.apache.cassandra unfortunately. The relevant output could come
> from any of db, io, net, service and locator packages.

Maybe watching my example with a debugger makes more sense then :)

Nate McCall

Jan 19, 2011, 12:07:48 PM
to hecto...@googlegroups.com
If it is local, that is the best way.

Nate McCall

Jan 19, 2011, 1:46:11 PM
to hecto...@googlegroups.com, T Jake Luciani
[Looping in Jake Luciani]
Jake,
Timo discovered an issue with number serialization to which you might
have some insight:
https://github.com/rantav/hector/issues#issue/126

and the following thrift issue seems related:
https://issues.apache.org/jira/browse/THRIFT-773

I guess we could force number types to BigInteger, but I'm afraid
there might be deeper issues. Thoughts?

On Wed, Jan 19, 2011 at 12:05 PM, tcn <timo.n...@gmail.com> wrote:
> On Jan 19, 6:07 pm, Nate McCall <n...@riptano.com> wrote:
>> If it is local, that is the best way.
>
> I meant you guys :)
>
> So, this one behaves weirdly:
>
>  .addInsertion(now, cf, createColumn("rc", 256, SS, new IntegerSerializer())).execute();
>
> which serializes to [0,0,1,0], i.e. little endian.
>
> This one works perfectly:
>
>  .addInsertion(now, cf, createColumn("rc", new String(new BigInteger("256").toByteArray()), SS, SS)).execute();
>
> which serializes to [1,0], i.e. big endian and no padding of zeros.
>
> This is where my knowledge about thrift/cassandra/hector ends. I saw
> that the IntegerType of cassandra uses BigInteger.toByteBuffer which
> serializes big endian. When googling for thrift endian you find
> THRIFT-773.
>
> For somebody who knows cassandra and its code base better than me it
> should be much easier to figure out what's going on.
>

Jake Luciani

Jan 19, 2011, 6:53:28 PM
to Nate McCall, hecto...@googlegroups.com
773 only relates to C++

But from http://download.oracle.com/javase/6/docs/api/java/math/BigInteger.html I see toByteArray() always serializes to big-endian

-Jake
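
To make the padding point from the subject line concrete: a four-byte padded encoding and the minimal BigInteger encoding can represent the same number while being different byte sequences. A small sketch (whether the index lookup compares raw bytes or decoded values is an assumption here, not something confirmed in this thread; the class and method names are illustrative):

```java
import java.math.BigInteger;
import java.util.Arrays;

final class PaddingDemo {
    // Compare two encodings as numbers.
    static boolean sameValue(byte[] a, byte[] b) {
        return new BigInteger(a).equals(new BigInteger(b));
    }

    // Compare two encodings as raw byte sequences.
    static boolean sameBytes(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] padded = {0, 0, 1, 0};                         // 256 in four big-endian bytes
        byte[] minimal = new BigInteger("256").toByteArray(); // [1, 0]

        System.out.println(sameValue(padded, minimal)); // equal as numbers
        System.out.println(sameBytes(padded, minimal)); // not equal as bytes
    }
}
```

If any step of the lookup matches on raw bytes, the padded form would miss entries written in the minimal form, and the other way around.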

Ed Anuff

Jan 19, 2011, 6:58:54 PM
to hecto...@googlegroups.com
Hey, just saw this thread. I just finished modifying these to get rid of our hand-rolled integer and long serializing and deserializing in favor of using the methods in ByteBuffer.
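
For reference, a ByteBuffer-based version along those lines might look like the following. This is only a sketch of the approach, not the actual Hector change, and the class and method names are hypothetical. ByteBuffer defaults to big-endian byte order, so putInt writes the most significant byte first, still as a fixed four bytes.

```java
import java.nio.ByteBuffer;

final class ByteBufferIntCodec {
    // Serialize an Integer as four big-endian bytes (ByteBuffer's default order).
    static ByteBuffer toByteBuffer(Integer obj) {
        if (obj == null) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.allocate(4);
        buffer.putInt(obj);
        buffer.flip(); // make the four written bytes readable from position 0
        return buffer;
    }

    // Read the int at the buffer's current position without moving the position.
    static Integer fromByteBuffer(ByteBuffer buffer) {
        if (buffer == null || buffer.remaining() < 4) {
            return null;
        }
        return buffer.getInt(buffer.position());
    }

    public static void main(String[] args) {
        ByteBuffer encoded = toByteBuffer(256);
        System.out.println(java.util.Arrays.toString(encoded.array())); // [0, 0, 1, 0]
        System.out.println(fromByteBuffer(encoded));                    // 256
    }
}
```

Note that this fixes the byte order but keeps the zero padding, so it still differs from the minimal BigInteger encoding.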

ed

tcn

Jan 21, 2011, 10:08:21 AM
to hector-dev
This is probably more a question for the Cassandra guys, but anyway. I'm
wondering how it could possibly be that

where indexed_col1=cond1 and indexed_col2=cond2

returns results while flipping the conditions doesn't. Could it be
that Cassandra internally doesn't use both secondary indices, but always
uses only the first one and resolves the second condition with a nested
loop?