Component values in Composite column query results

Matt Stump

Dec 19, 2011, 2:43:18 PM
to hector-users
I'm working on a patch to add composite columns to clj-hector, the
Clojure wrapper for hector, and I'm running into an issue I was hoping
you guys could shed light on.

I'm adding Components to a Composite instance using the addComponent
method with the following values:

["col"
#<AsciiSerializer
me.prettyprint.cassandra.serializers.AsciiSerializer@158c4d4c>
"AsciiType"
#<ComponentEquality EQUAL>]

I insert my value into cassandra, and since this is a test I use the
same Composite instance to perform a column slice query. Upon
inspecting the results I always get a Composite whose Component values
are always instances of java.nio.HeapByteBuffer instead of the
expected String. This is true no matter what I specify for serializer
and comparator values when adding Components to my Composite instance.

Is this expected behavior?

Is there any way I can get my expected type instead of
java.nio.HeapByteBuffer?

I'm using hector "1.0-1". Any help would be greatly appreciated.

Nate McCall

Dec 19, 2011, 3:06:55 PM
to hector...@googlegroups.com, Ed Anuff
Hmm, judging from:
https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/hector/api/beans/AbstractComposite.java#L143
already-constructed components won't get de/serialized correctly. CC'ing
Ed to get his thoughts on this.

Ed Anuff

Dec 19, 2011, 3:45:56 PM
to Nate McCall, hector...@googlegroups.com
I'll take a look at it later this afternoon.

Matt Stump

Dec 19, 2011, 3:52:31 PM
to hector...@googlegroups.com, Nate McCall
I should mention that I'm seeing the same results with a row fetch as
well as a column slice.

hpgisler

Dec 19, 2011, 4:27:31 PM
to hector-users
Hi,

Sorry to crash into this discussion...

Does this perhaps have to do with this post?
http://groups.google.com/group/hector-users/browse_thread/thread/269ccdb759096384
I am having the same problem when using:
columnslice.getColumnByName(nameComposite)
columnslice has the column in question inside it, but when I try to
retrieve it via getColumnByName it is not found in the hashmap, because
the hashmap has stored the composite via HeapByteBuffer, which does not
match up to nameComposite.
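
The mismatch described above can be illustrated without Hector. The sketch below is a hypothetical model, not Hector's actual ColumnSlice code: it assumes the fetched columns sit in a HashMap keyed by the list of component values, that fetched components are ByteBuffers, and that the lookup key was built from Strings. All class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the lookup problem; none of these names are Hector APIs.
class LookupMismatchSketch {

    static ByteBuffer ascii(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Columns as they might come back: component values are raw ByteBuffers.
    static Map<List<Object>, String> fetchedColumns() {
        Map<List<Object>, String> columns = new HashMap<>();
        columns.put(Arrays.<Object>asList(ascii("col"), ascii("name")), "v");
        return columns;
    }

    public static void main(String[] args) {
        Map<List<Object>, String> columns = fetchedColumns();

        // Key built from the original String values: a String never equals a ByteBuffer,
        // so the lookup misses.
        List<Object> stringKey = Arrays.<Object>asList("col", "name");
        System.out.println("String key:     " + columns.get(stringKey));

        // Key built from the serialized bytes: ByteBuffer compares by content, so it matches.
        List<Object> byteKey = Arrays.<Object>asList(ascii("col"), ascii("name"));
        System.out.println("ByteBuffer key: " + columns.get(byteKey));
    }
}
```

Under these assumptions, one workaround is to build the lookup key from the same serialized bytes that the fetched components hold.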

Any thoughts?

(Sorry again, if this is not related to the current discussion)
Regards
Hanspeter

Nate McCall

Dec 19, 2011, 4:31:07 PM
to hector...@googlegroups.com
Yep - that looks related. I'll dig into this a bit more later tonight as well.

Matt Stump

Dec 20, 2011, 5:17:38 PM
to hector...@googlegroups.com
Just curious if you guys made any progress. I'm going to go off and implement support for dynamic composites in clj-hector in the hopes that it isn't affected.  If it is also borked, I'll start digging into the hector code at the points that Nate mentioned.  Unless I can get it resolved I think it's going to hold up our roll out of DSE.  

Matt Stump

Dec 20, 2011, 5:58:08 PM
to hector...@googlegroups.com
Just finished implementing DynamicComposites, and from my initial tests it looks like it's unaffected by the bug.

Nate McCall

Dec 21, 2011, 12:20:36 PM
to hector...@googlegroups.com
Matt, let me know if anything comes up and I'll take a look. The
hardest part of tearing into this is having some real-world use cases
against it to make sure it doesn't blow up for existing users.

Thanks for the update though.

Matt Stump

Dec 27, 2011, 3:31:14 PM
to hector...@googlegroups.com
So it's not the line of code that you suspected.  I dug into this a bit before heading out on vacation but I didn't have time to post the results.  

The problem is that the Components which make up the Composite in the result set have an incorrect serializer. This only occurs when I'm using Composite, and does not happen when I use DynamicComposite. Both the Composite and DynamicComposite tests run through the same code; the only differences are the column family definition, the type (I've switched out Composite for DynamicComposite), and the corresponding serializer.

It doesn't look like there are any tests that cover this use case in hector.  Can someone please validate that deserialization of components for static composites does work?

I create a column family with the following attributes:

name: A
comparator: CompositeType
column-type: STANDARD
default-validation-class: AsciiType
comparator-alias: (AsciiType, AsciiType)

I create a composite with the following two components:

{:value "col", :n-serializer :ascii, :comparator :ascii, :equality :equal}
{:value "name", :n-serializer :ascii, :comparator :ascii, :equality :equal}

I put a column value with the following attributes:

name: [col, name] <- string repr of the Composite created above
value: v
n-serializer: me.prettyprint.cassandra.serializers.CompositeSerializer@24812051
v-serializer: me.prettyprint.cassandra.serializers.TypeInferringSerializer@56101751

And when I fetch the row as I iterate through Composite.getComponents this is what I see:

value: java.nio.HeapByteBuffer[pos=0 lim=3 cap=3] 
serializer: me.prettyprint.cassandra.serializers.ByteBufferSerializer@7ea4b9da 
bytes: java.nio.HeapByteBuffer[pos=0 lim=3 cap=3] 
comparator: BytesType

Nate McCall

Dec 27, 2011, 3:50:06 PM
to hector...@googlegroups.com
This may be as simple as us not having AsciiType in the
SerializerTypeInferrer.

Can you quickly toggle these to UTF8 in the CFDef and see if they work
as anticipated?

Matt Stump

Dec 27, 2011, 4:09:58 PM
to hector-users
Nope, same result. I used string serializer and utf-8 comparator when
populating my components. I also switched the CFDef to use
'(UTF8Type, UTF8Type)' and UTF8Type as the validator. My
DynamicComposite test is also using ASCII and it's passing.

Nate McCall

Dec 27, 2011, 4:17:06 PM
to hector...@googlegroups.com
Thanks for the quick feedback - I'll start digging into this on the
Hector side.

Nate McCall

Dec 27, 2011, 6:08:19 PM
to hector...@googlegroups.com
Actually, after digging into this again and thinking about it, we need
to take a step back here and look at the inherent differences between
Composite and DynamicComposite (DC).

What I initially lost sight of is that, with Composite, no type info
comes back from the column names, only the start and end positions of
each component. Because they are statically defined, we assume the user
knows that the component in position "2" is a "string" because that is
always the case for this column family (or else the original write
would not have validated).
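
A minimal sketch of what that means for the reader of a static composite, assuming each component is laid out as a 2-byte length, the value bytes, and one end-of-component byte. It uses only the JDK; `StaticCompositeSketch` and its methods are hypothetical names, not Hector classes. Splitting the name yields raw byte slices, and the caller has to apply the type it already knows for each position.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; it models the static composite layout and is not Hector code.
class StaticCompositeSketch {

    // Encode each value as <2-byte length><bytes><end-of-component byte>.
    static ByteBuffer encode(List<String> values) {
        int size = 0;
        for (String v : values) {
            size += 2 + v.getBytes(StandardCharsets.US_ASCII).length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.US_ASCII);
            out.putShort((short) bytes.length);
            out.put(bytes);
            out.put((byte) 0); // end-of-component byte, 0 in a stored column name
        }
        out.flip();
        return out;
    }

    // Split a composite name into raw slices. Nothing in the bytes says what type each slice is.
    static List<ByteBuffer> split(ByteBuffer name) {
        ByteBuffer in = name.duplicate();
        List<ByteBuffer> parts = new ArrayList<>();
        while (in.hasRemaining()) {
            int length = in.getShort() & 0xFFFF;
            ByteBuffer part = in.slice();
            part.limit(length);
            parts.add(part);
            in.position(in.position() + length + 1); // skip the value and its end-of-component byte
        }
        return parts;
    }

    // The caller applies the type it already knows for a given position (ASCII here).
    static String asAscii(ByteBuffer raw) {
        return StandardCharsets.US_ASCII.decode(raw.duplicate()).toString();
    }

    public static void main(String[] args) {
        List<ByteBuffer> parts = split(encode(Arrays.asList("col", "name")));
        for (ByteBuffer part : parts) {
            System.out.println(part + " -> " + asAscii(part));
        }
    }
}
```

The `split` step is where the HeapByteBuffer values seen earlier in the thread would come from; `asAscii` stands in for the per-position knowledge that only the column family definition (or the user) has.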

You are getting the correct typing back from DC because the column
name contains a separator *and* a marker about what type the next
component is composed of (hence the dynamic nature of it).
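
For contrast, here is a rough sketch of a per-component type marker. It assumes the marker is a 2-byte header that is either an alias (high bit set, alias character in the low byte) or the length of a comparator name followed by that name; treat this layout as an assumption to verify against the Cassandra source. `DynamicHeaderSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a per-component type header; not Hector or Cassandra code.
class DynamicHeaderSketch {

    // Write a one-character alias header: high bit set, alias in the low byte.
    static void writeAlias(ByteBuffer out, char alias) {
        out.putShort((short) (0x8000 | (alias & 0xFF)));
    }

    // Write a full comparator name header: <2-byte length><name>.
    static void writeName(ByteBuffer out, String comparator) {
        byte[] bytes = comparator.getBytes(StandardCharsets.US_ASCII);
        out.putShort((short) bytes.length);
        out.put(bytes);
    }

    // Read whichever header form is present and report the type it names.
    static String readType(ByteBuffer in) {
        int header = in.getShort() & 0xFFFF;
        if ((header & 0x8000) != 0) {
            return "alias " + (char) (header & 0xFF);
        }
        byte[] name = new byte[header];
        in.get(name);
        return new String(name, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32);
        writeAlias(buf, 'a');
        writeName(buf, "UTF8Type");
        buf.flip();
        System.out.println(readType(buf));
        System.out.println(readType(buf));
    }
}
```

Because the type travels with every component, a reader can pick a deserializer without consulting the column family definition, which is the difference from the static case.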

DynamicComposite is safe to use if you adhere to the restrictions
described in:
https://issues.apache.org/jira/browse/CASSANDRA-3625
(Matt actually reminded me of this issue offline, ftr). The gist is
that you can move the types around in different rows, just not within
the same row (or things will go horribly wrong and you will corrupt
your data).

As a hack, you could still use Composite but store your own 'type
marker' component in the composite before every real component and
parse that on the way out in order to provide the functionality for
which you are looking.
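
One way the hack could look, as a sketch only: each real value is preceded by a one-character marker component, and the reader decodes (marker, value) pairs. The marker alphabet ("s" for string, "l" for long) and the `TypeMarkerSketch` class are invented for illustration and are not part of Hector. The column family's comparator would still have to declare a type for every position, markers included.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "type marker" workaround; not part of Hector.
class TypeMarkerSketch {

    static ByteBuffer ascii(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Put a one-character marker component before each value: "s" = string, "l" = long.
    static List<ByteBuffer> tag(List<Object> values) {
        List<ByteBuffer> components = new ArrayList<>();
        for (Object value : values) {
            if (value instanceof String) {
                components.add(ascii("s"));
                components.add(ascii((String) value));
            } else if (value instanceof Long) {
                components.add(ascii("l"));
                ByteBuffer raw = ByteBuffer.allocate(8);
                raw.putLong((Long) value);
                raw.flip();
                components.add(raw);
            } else {
                throw new IllegalArgumentException("unsupported type: " + value.getClass());
            }
        }
        return components;
    }

    // Read (marker, value) pairs back and decode each value according to its marker.
    static List<Object> untag(List<ByteBuffer> components) {
        if (components.size() % 2 != 0) {
            throw new IllegalArgumentException("expected marker/value pairs");
        }
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < components.size(); i += 2) {
            String marker = StandardCharsets.US_ASCII.decode(components.get(i).duplicate()).toString();
            ByteBuffer raw = components.get(i + 1).duplicate();
            switch (marker) {
                case "s":
                    values.add(StandardCharsets.US_ASCII.decode(raw).toString());
                    break;
                case "l":
                    values.add(raw.getLong());
                    break;
                default:
                    throw new IllegalArgumentException("unknown marker: " + marker);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<ByteBuffer> components = tag(Arrays.<Object>asList("col", 42L));
        System.out.println(untag(components));
    }
}
```

The cost is that every composite doubles its component count, and the marker scheme becomes a private convention that every reader and writer of the column family has to share.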

Come to think of it, if enough people would find that useful I'll just add
it as a wrapper of some sort.

Matt Stump

Dec 27, 2011, 10:12:56 PM
to hector-users
I understand your point, but the asymmetric support for serialization
is a little odd.  If that was your intent, it feels like you shouldn't
offer the ability to even specify a serializer when adding a component
to the Composite. You should just take a buffer of bytes. If you offer
the ability to specify a serializer, and you perform the serialization
for the user, and you are required to supply type aliases in the
CFDef, it would follow that you would also provide deserialization
when fetching results. I don't understand the point of all of these
facilities, meta information, and type aliasing if we don't make use
of them.

Matt Stump

Dec 27, 2011, 10:44:07 PM
to hector-users
I understand this is still being decided, but how is CQL going to
handle this? I imagine that if the users of CQL just got back a byte
stream for component values it would probably be unacceptable. You
could do things like cache the CFDef and use the type alias info, but
I'm not sure if that's queryable, and that seems like a hack.

Nate McCall

Dec 28, 2011, 1:09:59 PM
to hector...@googlegroups.com
We had a whole validation layer in hector early on that touched the CF
meta data. We got burned a number of times from API changes while
transitioning from 0.6 to 0.7 so we took it out.

Also, dealing with caching the CfDefs is a real PITA when cluster
sizes start to grow and you have lots of endpoints talking to a
cluster.

You make a good point about exposing this functionality through the
API if we aren't going to support it (admittedly, it's telling that I
needed to code dive to even remember the Composite minutiae).

We are certainly open to suggestions/patches/pull requests on API behavior.

Nate McCall

Dec 28, 2011, 1:12:10 PM
to hector...@googlegroups.com
As for CQL, that's a huge TBD. Yes, the syntax will inevitably be
obtuse and difficult to handle correctly.

Keep your eyes on http://wiki.apache.org/cassandra/Cassandra2474 for
the latest here.
