That said, I've pontificated on this a couple of times recently, but
to sum up, here are my thoughts:
- If anyone wants to switch their app to JDBC and CQL I suggest you
grep the cql source directory for "UnsupportedOperationException"
(Note: the Cassandra folks have done an *excellent* job for an
initial implementation, but the driver currently has a very narrow set
of use cases)
- Good luck finding a JDBC pooling library that understands:
* Consistency levels
* Idempotence
* Intermittent failures and timeouts
* All nodes sharing the same role
- Try a CQL query interchanging "<" and "<=" and see what happens
(there is work in progress to fix this one though)
- The thrift methods are not going anywhere anytime soon
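
To make the pooling bullets above a little more concrete, here is a minimal Java sketch of the decision a Cassandra-aware pool has to make when a call times out. All names in it (`RetryingExecutor`, `Operation`, the `idempotent` flag, the host strings) are hypothetical and are not taken from Hector or from any JDBC pooling library. It only covers idempotence and timeouts, not consistency levels: because all nodes share the same role, an idempotent operation can be retried on the next host, while a non-idempotent one is failed immediately.

```java
import java.util.List;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; not Hector's or any JDBC pool's actual API. */
final class RetryingExecutor {

    /** A unit of work that can be run against any single host. */
    interface Operation<T> {
        T execute(String host) throws TimeoutException;
    }

    private final List<String> hosts;

    RetryingExecutor(List<String> hosts) {
        this.hosts = hosts;
    }

    /**
     * Tries each host in order. A timeout moves on to the next host only
     * when the caller has declared the operation idempotent.
     */
    <T> T run(Operation<T> op, boolean idempotent) throws TimeoutException {
        TimeoutException last = null;
        for (String host : hosts) {
            try {
                return op.execute(host);
            } catch (TimeoutException e) {
                last = e;
                if (!idempotent) {
                    // The write may or may not have been applied; do not repeat it.
                    throw e;
                }
            }
        }
        if (last == null) {
            throw new IllegalStateException("no hosts configured");
        }
        throw last; // every host timed out
    }
}
```

A generic JDBC pool typically treats a failed connection as something to evict and replace; it has no place to carry a flag like `idempotent`, which is the gap the bullets point at.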
Given the above, one of the next big design changes I want to put
through is encapsulating the connection logic further (as some folks
have suggested/already worked on) for two main reasons:
- Continue to support existing APIs for our installed base as long as
Cassandra has them
- Start to work on building out a JDBC driver that uses this pooling
logic and does address the issues I mentioned above
I also want to go further and add large cluster support in the form of
proxies. Anyone who has worn the network admin hat will agree with me
that 200 app servers each talking to 80 cassandra nodes is not a
terribly good idea for a number of reasons. (Open to feature requests,
design suggestions, code submissions here as always).
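
The arithmetic behind the 200-by-80 concern can be sketched as follows. The class name and the proxy count of 4 are illustrative assumptions, as is the simplification of one connection per client/server pair (real pools usually hold several).

```java
/** Hypothetical back-of-the-envelope helper; not part of Hector. */
final class ConnectionCount {

    /** Every app server holds a connection to every Cassandra node. */
    static long direct(long appServers, long nodes) {
        return appServers * nodes;
    }

    /** Every app server connects to every proxy, every proxy to every node. */
    static long viaProxies(long appServers, long proxies, long nodes) {
        return appServers * proxies + proxies * nodes;
    }
}
```

Under these assumptions, `direct(200, 80)` is 16000 connections, while `viaProxies(200, 4, 80)` is 1120.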
Hopefully others will chime in with what they are looking for as well.
CQL makes one piece of talking to Cassandra simpler, and it's a very
nice thing to have. It also provides a higher level mechanism than
thrift, although thrift is still its transport. However, most of the
issues people have with Cassandra were long ago erased by most of the
major client libraries, and what people have been attributing to
Thrift has, in fact, just been the Cassandra data model. CQL won't
change that. Thrift is universally maligned, but the fascinating
thing is that when I talk to application developers using Cassandra and
then to the developers actually working on Cassandra, I find there is
very little overlap between what the two groups dislike about Thrift.
As a Java developer, using JDBC and PreparedStatement makes it easier
to do basic queries and statements, but you're going to be using JDBC
in a way you've never done before, largely because Cassandra just
doesn't have the kind of types JDBC was principally designed to operate
on. So, you'll be using the metadata APIs in JDBC much more than you
probably have done, if, in fact, you've done so at all. Most Java
developers never have.
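
For anyone who has not touched the JDBC metadata APIs, the following is a minimal sketch of what "using the metadata" looks like. It uses only standard `java.sql.ResultSetMetaData` calls (`getColumnCount`, `getColumnName`, `getColumnTypeName`); the class and method names `ColumnDescriber.describe` are hypothetical, and nothing here confirms which of these calls the Cassandra driver actually implements.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; works against any JDBC ResultSetMetaData. */
final class ColumnDescriber {

    /** Returns one "name:typeName" entry per column in the result. */
    static List<String> describe(ResultSetMetaData meta) throws SQLException {
        List<String> columns = new ArrayList<>();
        // JDBC column indexes are 1-based.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            columns.add(meta.getColumnName(i) + ":" + meta.getColumnTypeName(i));
        }
        return columns;
    }
}
```

With a relational database you usually know the columns up front and never call this; with rows whose columns can differ from one row to the next, code like this would have to run per result before any values can be read.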
I think in practice, we're going to have to find a way to let you mix
CQL and Hector usage, because there's going to be a lot of object
mapping and even marshalling of the wide range of Java types that CQL
won't help with on its own. The JDBC driver throws
UnsupportedOperationException for the majority of Java types passed to it.
Just my 2 cents.
Ed
Exactly. CQL may end up simply highlighting inconsistencies with
Cassandra's data model (CFs and SCFs are disjoint, etc). It may be that
CQL can be hooked into the existing Query stuff with minimal growth in
the current API - I'd be reluctant to see PreparedStatement/ResultSet
style interfaces added (times 2 of course, for CF/SCF) before knowing
how things will play out.
Bill
(Ed - not sure if you meant for CASSANDRA-2231 to have this outcome,
but I think we all owe you a beer if this ends up contributing so much
to SC replacement :-)
What I think became better understood over the last year has been the
dual role that supercolumns play. On one hand, they're important for
index creation, and I think that composite column names are superior
to them for that. On the other, they're used as a kind of poor man's
object serialization mechanism, which I'm not a fan of. In almost any
case where I'd use an SCF for the latter, I've always found that
storing a serialized object in the column value works just as well.
This is actually a wider issue which probably should go in a separate
email thread or blog post, but I'll bring it up here since it's
extremely relevant to client development. In many ways, this all boils
down to the question of how much should Cassandra actually be aware of
what goes in the column value? Historically it's been very little.
Cassandra needs to know a lot about the column name, but the column
value was originally entirely application (or client) dependent. This
is a big strength of Cassandra, IMHO; however, for many users, it's a
big part of what I mean when I say people have difficulty with the
data model, because most conventional databases are entirely focused
on the column value. If you look at what Hector does, a very big part
of it is in dealing with how to map Java objects and types to and from
column values, whether via serializers, column family templates, or
object mapping.
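
As an illustration of the client-side mapping described above, here is a minimal Java sketch of a value serializer. The names `ValueSerializer`, `LongValueSerializer`, and `Utf8ValueSerializer` are hypothetical and deliberately not Hector's own interfaces; the point is only that Cassandra stores opaque bytes and the client decides what they mean.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side mapping between a Java type and a column value. */
interface ValueSerializer<T> {
    byte[] toBytes(T value);
    T fromBytes(byte[] bytes);
}

/** Stores a long as 8 big-endian bytes (ByteBuffer's default byte order). */
final class LongValueSerializer implements ValueSerializer<Long> {
    public byte[] toBytes(Long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public Long fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}

/** Stores a String as its UTF-8 encoding. */
final class Utf8ValueSerializer implements ValueSerializer<String> {
    public byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Nothing on the server side knows which serializer produced a given value, so two clients that disagree on the encoding will silently read garbage. That is the trade-off being discussed here.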
In order to make Cassandra less dependent on the clients, that sort of
thing ends up needing to move into Cassandra. The first instance of
this was Cassandra secondary indexes. In order for those to work, the
column value type needed to be one of the Cassandra comparator types.
I think that as people start pulling the threads in CQL, there
will be an increasing need for there to be more of that in order to
make CQL work like SQL. Things like the validation_class in the
column family definition are part of this. I have very mixed feelings
about how well that is going to work out. I like things that make it
easier for people to get up to speed faster on Cassandra and I think
it can broaden adoption. However, I think it ends up becoming a much
bigger challenge to do right than would be anticipated at first
glance. My ultimate concern is the question of focus. Is the most
important thing for Cassandra's success that the Cassandra "hello
world" is as easy as the SQL "hello world" or the Mongo "hello world",
or is the most important thing that people can run Cassandra at scale
in production on Amazon EC2 without trying to figure out the
intricacies of compaction and memory consumption and all the other
issues that scare the bejeezus out of me when I see people raising
them on the Cassandra list? I'm happy to write a few more lines of
code in my app.
Ed