Cassandra input support now working ...

burtonator

Feb 6, 2012, 6:31:34 PM
to peregrine...@googlegroups.com
I have Cassandra input support working. I need to test it some more, but the general functionality is there.

I have a branch called burton-cassandra-support that I have to merge into default first.

It's somewhat maintainable, since I'm just using the stock Hadoop+Cassandra InputFormat and decorating it to look like a Peregrine JobInput.

Output should be easy after this point.

The general idea is that you just write a normal job, but instead of reading from a file you read from:

cassandra://localhost:9160/mykeyspace/graph

which is just a URI for building the config used in Cassandra.
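
To make the URI idea concrete, here is a rough sketch of how such a URI could be pulled apart before building the Cassandra config. This is not the code on the branch: the class name, the Target holder, the reading of the second path segment as the column family, and the fallback to port 9160 are all assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of Peregrine: splits a cassandra:// URI
// into the values an input configuration would need.
final class CassandraUriSketch {

    /** Hypothetical holder for the parsed pieces. */
    static final class Target {
        final String host;
        final int port;
        final String keyspace;
        final String columnFamily; // assumption: second path segment

        Target(String host, int port, String keyspace, String columnFamily) {
            this.host = host;
            this.port = port;
            this.keyspace = keyspace;
            this.columnFamily = columnFamily;
        }
    }

    static Target parse(String uri) {
        URI u = URI.create(uri);
        if (!"cassandra".equals(u.getScheme())) {
            throw new IllegalArgumentException("not a cassandra:// URI: " + uri);
        }
        if (u.getHost() == null || u.getPath() == null) {
            throw new IllegalArgumentException("missing host or path: " + uri);
        }
        // "/mykeyspace/graph" splits into ["", "mykeyspace", "graph"]
        String[] parts = u.getPath().split("/");
        if (parts.length != 3 || !parts[0].isEmpty()
                || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("expected /keyspace/name in: " + uri);
        }
        // Assumption: fall back to 9160 when the URI carries no port.
        int port = (u.getPort() == -1) ? 9160 : u.getPort();
        return new Target(u.getHost(), port, parts[1], parts[2]);
    }
}
```

Calling CassandraUriSketch.parse("cassandra://localhost:9160/mykeyspace/graph") would give host localhost, port 9160, keyspace mykeyspace and column family graph, which could then be copied into whatever config the InputFormat expects.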

There is more work to be done of course:

- It would be nice to actually have the Peregrine unit tests start up Cassandra, import data into it, then have Peregrine map over it.

- I haven't run any benchmarks.

- We're taking the key/value maps for Cassandra records and mapping them to a new interface, since EVERY 'record' in Cassandra is key/value based. This wastes a bit of CPU, but I can't think of an elegant way to avoid it without breaking a LOT of abstractions.

- We don't do any routing based on hostname, so data is just read randomly. This is the FALLBACK case for when you're reading data from non-local machines, but we should do something more intelligent. Mapping directly by hostname is the first step, but understanding network topology will probably be required eventually.
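
For the hostname routing item, the first step could look roughly like the sketch below. Nothing like this exists in Peregrine yet, and every name here is hypothetical. It assumes each split reports the hostnames holding its data (as Hadoop's InputSplit.getLocations() does) and that we know the hostnames of our workers. A split goes to a worker that holds its data when there is one, and falls back to round-robin otherwise. It makes no attempt to balance load or understand topology.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of hostname-based split assignment.
final class LocalitySketch {

    /**
     * @param splitLocations split id -> hostnames holding that split's data
     * @param workers        hostnames of the available workers
     * @return worker hostname -> split ids assigned to it
     */
    static Map<String, List<String>> assign(
            Map<String, List<String>> splitLocations, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("no workers available");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String worker : workers) {
            plan.put(worker, new ArrayList<String>());
        }
        int next = 0; // round-robin cursor for the non-local fallback
        for (Map.Entry<String, List<String>> e : splitLocations.entrySet()) {
            String chosen = null;
            // Prefer the first listed host that is also one of our workers.
            for (String host : e.getValue()) {
                if (plan.containsKey(host)) {
                    chosen = host;
                    break;
                }
            }
            // FALLBACK: no worker is local to this split.
            if (chosen == null) {
                chosen = workers.get(next % workers.size());
                next++;
            }
            plan.get(chosen).add(e.getKey());
        }
        return plan;
    }
}
```

Understanding network topology would mean replacing the exact hostname match with some notion of distance (same rack, same datacenter), which this sketch leaves out.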

