Hector vs CQL

Sameer Farooqui

May 12, 2011, 2:42:19 PM

to hector-users

Can one of the Hector developers comment on the future of Hector now
that CQL is out for 0.8?

From Eric Evans on the Cassandra mailing list: "The client space as a
whole *is* a mess, despite heroic efforts on the part of our third-
party API maintainers. The root cause goes back to... the RPC
interface is baroque, and too tightly coupled to Cassandra's
internals. The third-party library maintainers can only do so much to
paper over that. The solution here is... CQL."

Ed Anuff replied: "The client libraries are not a mess. Some might be,
some are not - Hector, which is the one I contribute to, is pretty
good. Client libraries aren't going away."

So does the release of CQL mean that, moving forward, CQL will be the
de facto method of connecting to Cassandra for most Java developers?

Is Hector going to keep putting wrappers around Thrift or are there
plans to send CQL? If so, do you anticipate the Hector syntax changing
in a future release?

Nate McCall

May 12, 2011, 3:08:12 PM

to hector...@googlegroups.com

First off, I have every intention of adding support for CQL in hector
before Cassandra 0.8.0 goes "gold." (Patricio and I particularly have
been bogged down on Brisk the past couple of weeks, so there has not
been the movement here that I wanted).

That said, I've pontificated on this a couple of times recently, but
to sum, here are my thoughts:
- If anyone wants to switch their app to JDBC and CQL I suggest you
grep the cql source directory for "UnsupportedOperationException"
(Note: the Cassandra folks have done an *excellent* job for an
initial implementation, but the driver currently has a very narrow set
of use cases)

- Good luck finding a JDBC pooling library that understands:
* Consistency levels
* Idempotence
* Intermittent failures and timeouts
* All nodes sharing the same role

- Try a CQL query interchanging "<" and "<=" and see what happens
(there is work in progress to fix this one though)
- The thrift methods are not going anywhere anytime soon
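
As a rough illustration of the pooling concerns listed above, here is a minimal Java sketch of a failover loop. It assumes any node can serve any request, and it replays an operation on another host only when the caller declares it idempotent. `FailoverSketch`, `TransientFailure`, and `execute` are hypothetical names, not part of Hector or any JDBC pooling library, and consistency levels are left out for brevity.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: none of these names come from Hector or a JDBC pool. */
class FailoverSketch {

    /** Stand-in for a timeout or a dropped connection. */
    static class TransientFailure extends RuntimeException {
        TransientFailure(String message) {
            super(message);
        }
    }

    /**
     * Runs op against each host in turn until one succeeds.
     * Every node plays the same role, so any host is a valid retry target.
     * A non-idempotent operation is attempted once only, because a write
     * that timed out may already have been applied.
     */
    static <T> T execute(List<String> hosts, boolean idempotent, Function<String, T> op) {
        TransientFailure last = null;
        for (String host : hosts) {
            try {
                return op.apply(host);
            } catch (TransientFailure e) {
                last = e;
                if (!idempotent) {
                    break; // not safe to replay on another node
                }
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("no hosts configured");
        }
        throw last;
    }
}
```

A generic JDBC pool has no way to know whether a statement is safe to replay, which is why the `idempotent` flag has to come from the caller in this sketch.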

Given the above, one of the next big design changes I want to put
through is encapsulating the connection logic further (as some folks
have suggested/already worked on) for two main reasons:
- Continue to support existing APIs for our installed base as long as
Cassandra has them
- Start to work on building out a JDBC driver that uses this pooling
logic and does address the issues I mentioned above

I also want to go further and add large cluster support in the form of
proxies. Anyone who has worn the network admin hat will agree with me
that 200 app servers each talking to 80 Cassandra nodes is not a
terribly good idea for a number of reasons. (Open to feature requests,
design suggestions, code submissions here as always).

Hopefully others will chime in with what they are looking for as well.

Ed Anuff

May 12, 2011, 4:25:25 PM

to hector...@googlegroups.com

Honestly, I think that the course of action is going to be to wait and
see. There are a *lot* of unrealistic expectations around CQL that I
think we need to let people find out about for themselves before
racing to figure out where Hector should go.

CQL makes one piece of talking to Cassandra simpler, and it's a very
nice thing to have. It also provides a higher level mechanism than
thrift, although thrift is still its transport. However, most of the
issues people have with Cassandra were long ago erased by most of the
major client libraries, and what people have been attributing to
Thrift has, in fact, just been the Cassandra data model. CQL won't
change that. Thrift is universally maligned, but the fascinating
thing is that when I talk to application developers using Cassandra and
then to the developers actually working on Cassandra, I find there is
very little overlap between what the two groups dislike about Thrift.

As a Java developer, using JDBC and PreparedStatement makes it easier
to do basic queries and statements, but you're going to be using JDBC
in a way you've never done before, largely because Cassandra just
doesn't have types of the kind JDBC was principally designed to operate
on. So, you'll be using the metadata APIs in JDBC much more than you
probably have done, if, in fact, you've done so at all. Most Java
developers never have.
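
For readers who have not touched the JDBC metadata APIs, a minimal sketch of what that looks like follows. It uses only the standard `java.sql.ResultSetMetaData` interface and assumes the driver in use actually populates it; `MetadataSketch` and `describeColumns` are hypothetical names for illustration.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; only the java.sql types are real API. */
class MetadataSketch {

    /** Lists "label:typeName" for each column the driver reports. */
    static List<String> describeColumns(ResultSetMetaData meta) throws SQLException {
        List<String> out = new ArrayList<>();
        // JDBC column indexes start at 1, not 0.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            out.add(meta.getColumnLabel(i) + ":" + meta.getColumnTypeName(i));
        }
        return out;
    }
}
```

Typical use would be `MetadataSketch.describeColumns(resultSet.getMetaData())`, called on each result when the set of columns is not known in advance.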

I think in practice, we're going to have to find a way to let you mix
CQL and Hector usage, because there's going to be a lot of object
mapping and even marshalling of the wide range of Java types that CQL
won't help with on its own. The JDBC driver throws
UnsupportedExceptions for the majority of Java types passed to it.

Just my 2 cents.

Ed

Bill

May 14, 2011, 10:01:37 PM

to hector...@googlegroups.com

On 12/05/11 21:25, Ed Anuff wrote:
> Honestly, I think that the course of action is going to be to wait and
> see. There are a *lot* of unrealistic expectations around CQL that I
> think we need to let people find out about for themselves before
> racing to figure out where Hector should go.
>
> CQL makes one piece of talking to Cassandra simpler, and it's a very
> nice thing to have. It also provides a higher level mechanism than
> thrift, although thrift is still its transport. However, most of the
> issues people have with Cassandra were long ago erased by most of the
> major client libraries, and what people have been attributing to
> Thrift has, in fact, just been the Cassandra data model. CQL won't
> change that.

Exactly. CQL may end up simply highlighting inconsistencies with
Cassandra's data model (CFs and SCFs are disjoint, etc). It may be that
CQL can be hooked into the existing Query stuff with minimal growth in
the current API - I'd be reluctant to see PreparedStatement/ResultSet
style interfaces added (times 2 of course, for CF/SCF) before knowing
how things will play out.

Bill

Nate McCall

May 15, 2011, 10:08:47 AM

to hector...@googlegroups.com

You make a good point about CF/SCF inconsistencies, but I will say
there is no intention on the Cassandra side to ever support SCF via
CQL. See:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CQL-v1-0-0-why-super-column-family-not-descirbed-in-it-td6333784.html

(Ed - not sure if you meant for CASSANDRA-2231 to have this outcome,
but I think we all owe you a beer if this ends up contributing so much
to SC replacement :-)

David Boxenhorn

May 15, 2011, 10:33:23 AM

to hector...@googlegroups.com

After official composite columns come along, I think the next step (i.e. in 0.9) should be to use them to implement super columns internally, and throw out the super column code base (except for the bit that supports the Thrift super column API).

Ed Anuff

May 15, 2011, 12:17:54 PM

to hector...@googlegroups.com

All of the composite column stuff got kicked off in this original
thread, appropriately titled "Is SuperColumn necessary?". It's still
a surprisingly relevant discussion:

http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-SuperColumn-necessary-td4963369.html

What I think became better understood over the last year has been the
dual role that supercolumns play. On one hand, they're important for
index creation, and I think that composite column names are superior
to them for that. On the other, they're used as a kind of poor man's
object serialization mechanism, which I'm not a fan of. In almost any
case where I'd use an SCF for the latter, I've always found that
storing a serialized object in the column value works just as well.

This is actually a wider issue which probably should go in a separate
email thread or blog post, but I'll bring it up here since it's
extremely relevant to client development. In many ways, this all boils
down to the question of how much should Cassandra actually be aware of
what goes in the column value? Historically it's been very little.
Cassandra needs to know a lot about the column name, but the column
value was originally entirely application (or client) dependent. This
is a big strength of Cassandra, IMHO. However, for many users, it's a
big part of what I mean when I say people have difficulty with the
data model, because most conventional databases are entirely focused
on the column value. If you look at what Hector does, a very big part
of it is in dealing with how to map Java objects and types to and from
column values, whether via serializers, column family templates, or
object mapping.
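
To make the serializer idea concrete, here is a minimal sketch of mapping Java types to and from the raw bytes of a column value. The `ValueSerializer` interface and the `SerializerSketch` class are hypothetical and only illustrate the pattern; Hector's own serializer API may differ in names and signatures.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical interface; not Hector's actual serializer API. */
interface ValueSerializer<T> {
    ByteBuffer toBytes(T value);

    T fromBytes(ByteBuffer bytes);
}

class SerializerSketch {

    /** Stores a Long as 8 big-endian bytes. */
    static final ValueSerializer<Long> LONG = new ValueSerializer<Long>() {
        public ByteBuffer toBytes(Long value) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            buf.putLong(value);
            buf.flip(); // switch from writing to reading
            return buf;
        }

        public Long fromBytes(ByteBuffer bytes) {
            // duplicate() leaves the caller's buffer position untouched
            return bytes.duplicate().getLong();
        }
    };

    /** Stores a String as UTF-8 bytes. */
    static final ValueSerializer<String> UTF8 = new ValueSerializer<String>() {
        public ByteBuffer toBytes(String value) {
            return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
        }

        public String fromBytes(ByteBuffer bytes) {
            return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
        }
    };
}
```

The point of the sketch is that the store only ever sees opaque bytes; knowing that a given value is a long or a string lives entirely in the client.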

In order to make Cassandra less dependent on the clients, that sort of
thing ends up needing to move into Cassandra. The first instance of
this was Cassandra secondary indexes. In order for those to work, the
column value type needed to be one of the Cassandra comparator types.
I think that as people start pulling the threads in CQL, there
will be an increasing need for more of that in order to
make CQL work like SQL. Things like the validation_class in the
column family definition are part of this. I have very mixed feelings
about how well that is going to work out. I like things that make it
easier for people to get up to speed faster on Cassandra and I think
it can broaden adoption. However, I think it ends up becoming a much
bigger challenge to do right than would be anticipated at first
glance. My ultimate concern is the question of focus. Is the most
important thing for Cassandra's success that the Cassandra "hello
world" is as easy as the SQL "hello world" or the Mongo "hello world",
or is the most important thing that people can run Cassandra at scale
in production on Amazon EC2 without trying to figure out the
intricacies of compaction and memory consumption and all the other
issues that scare the bejeezus out of me when I see people raising
them on the Cassandra list? I'm happy to write a few more lines of
code in my app.

Ed

B. Todd Burruss

May 16, 2011, 12:37:34 AM

to hector...@googlegroups.com

i'm not sure what to say about this. i think it is refreshing that Cassandra (and NoSql in general) is not tied to JDBC, JDO, JPA, etc. These technologies are not a good fit for Cassandra. I am the author of the Hector Object Mapper (HOM), and Nate and I have tried to be as JPA compliant as can be (and I know DataStax is working a lot harder than me at being JPA 2.0 compliant), but i don't really think it is necessary or desired for most folks that are familiar with Cassandra. I created JPA-like annotations to help map Cassandra rows to POJOs because i was tired of repeating the same code over and over - nothing more.

When CQL came along i thought the primary driver was to add a layer between a client and the thrift data structures. if that is the case, then i'm behind it. however, if the goal is to make Cassandra JDBC compliant, then i think this course is bad. NoSql is good because it isn't strapped with the confines of existing specifications. it remains easy to use and fast. add JDBC and i think we've made it more complicated than needed.

Since NoSql doesn't really fit into JDBC and JPA semantics, I believe that creating these types of interfaces to systems like Cassandra only postpones a necessary stretch of the learning curve for becoming a good Cassandra developer. And if developers use technologies like JDBC to interact with Cassandra (at this stage), they will eventually have a "come to Jesus" moment and wish that JDBC wasn't hiding the details.

i think there will probably be some sort of semantic abstraction to Cassandra eventually, but not JDBC - i'm not sure what advantage is gained by being JDBC compliant. Spring has made itself famous by abstracting the salient parts of complex technologies - maybe it will be the forerunner in this space with spring-data.

David Boxenhorn

May 16, 2011, 4:27:27 AM

to hector...@googlegroups.com

If I were starting from scratch (which I'm not), I'd definitely start with HOM and work from there. HOM has the potential to be Cassandra's Spring. Things that I'd add to HOM (which I won't, because I already built my own, less general, equivalent of HOM) include support for object trees, so I can model complex entities, and dirty flags so I can load, change, and save objects. (Neither of these is very difficult to do. Now that we have composite columns, they are easier.)
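
One way the dirty-flag idea could look, as a minimal sketch: an entity remembers which columns changed since it was loaded, so a save only writes those. `TrackedEntity` and its methods are hypothetical and are not part of HOM.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of dirty-flag tracking; not HOM's API. */
class TrackedEntity {
    private final Map<String, Object> columns = new LinkedHashMap<>();
    private final Set<String> dirty = new LinkedHashSet<>();

    /** Populates a column from the store without marking it dirty. */
    void load(String name, Object value) {
        columns.put(name, value);
    }

    /** Changes a column and remembers that it must be written back. */
    void set(String name, Object value) {
        columns.put(name, value);
        dirty.add(name);
    }

    /** Returns only the changed columns and clears the flags, as a save would. */
    Map<String, Object> flush() {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (String name : dirty) {
            changed.put(name, columns.get(name));
        }
        dirty.clear();
        return changed;
    }
}
```

A real mapper would hand the map returned by `flush()` to a batch mutation; that step is omitted here.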
