Hi,
I am new to ElephantDB. I read the blog post introducing ElephantDB at
http://backtype.posterous.com/introducing-elephantdb-a-distributed-database
and tried to run the sample from that post. The example uses
elephantdb-cascading to read key/value pairs from a file and store
them in ElephantDB.
First of all, using a JCascalog query, I stored some data into a
Hadoop sequence file. In this query I defined the sink as
Api.hfsSeqfile("/tmp/gendercount"). The query executed correctly and
the results were written to the sink. The output contains two fields,
gender and count, where gender is a String and count is an int, and
the data in the sequence file is stored in byte (serialized) form.
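For reference, here is roughly what that query looked like (the input
tap at /tmp/people and its fields "?person" and "?gender" are
approximated from memory; only the sink is exactly as above):

import jcascalog.Api;
import jcascalog.Subquery;
import jcascalog.op.Count;

// Roughly the query I ran: count records per gender and write the
// result to the sequence file sink.
Subquery query = new Subquery("?gender", "?count")
    .predicate(Api.hfsSeqfile("/tmp/people"), "?person", "?gender")
    .predicate(new Count(), "?count");

Api.execute(Api.hfsSeqfile("/tmp/gendercount"), query);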
Then, I decided to store the contents of /tmp/gendercount into
ElephantDB, so I wrote the following code:
// Source: the sequence file produced by the JCascalog query above
Tap source = new Hfs(new SequenceFile(new Fields("key", "value")), "/tmp/gendercount");

// Sink: a 32-shard ElephantDB domain using JavaBerkDB persistence
DomainSpec spec = new DomainSpec(new JavaBerkDB(), new HashModScheme(), 32);
ElephantDBTap sink = new ElephantDBTap("/tmp/elephantdb/gendercount", spec, new ElephantDBTap.Args(), TapMode.SINK);

// Shard the key/value pairs and write them into the domain
Pipe p = new Pipe("pipe");
p = new KeyValTailAssembly(p, sink);

FlowConnector flowConnector = new HadoopFlowConnector();
flowConnector.connect(source, sink, p).complete();
The source tap reads the key/value pairs, where the keys are Strings
and the values are ints.
When I ran this code, KeyValTailAssembly threw a ClassCastException
complaining that the String key could not be cast to byte[]. Here is
the relevant portion of the stack trace:
java.lang.ClassCastException: java.lang.String cannot be cast to [B
at elephantdb.cascading.KeyValTailAssembly$Shardize.operate(KeyValTailAssembly.java:42)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
Line 42 of KeyValTailAssembly does indeed try to cast an Object to
byte[]; the exception is thrown on the (byte[]) key cast. Here is the
relevant code from KeyValTailAssembly:
public void operate(FlowProcess process, FunctionCall call) {
    Object key = call.getArguments().getObject(0);
    int shard = shardIndex((byte[]) key);
    call.getOutputCollector().add(new Tuple(shard));
}

I wonder why I am getting this exception, especially since both the
key and the value are simple data types and they are being read from
a Hadoop sequence file. Would someone please help me understand what
I am missing?
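In case it helps clarify what I am after, here is an untested sketch
of the kind of conversion step I suspect might be needed before the
KeyValTailAssembly; the ToBytes class and its serialization choices
are purely my own guess, not anything I found in the ElephantDB docs:

import java.nio.ByteBuffer;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

// My guess at a conversion step: turn the String key and int value
// into byte arrays before they reach KeyValTailAssembly.
public class ToBytes extends BaseOperation implements Function {
    public ToBytes() {
        super(2, new Fields("key", "value"));
    }

    public void operate(FlowProcess process, FunctionCall call) {
        String key = call.getArguments().getString(0);
        int value = call.getArguments().getInteger(1);
        byte[] keyBytes = key.getBytes();                               // naive String serialization
        byte[] valBytes = ByteBuffer.allocate(4).putInt(value).array(); // naive int serialization
        call.getOutputCollector().add(new Tuple(keyBytes, valBytes));
    }
}

// ...and in the flow, instead of wiring the pipe straight into
// KeyValTailAssembly:
// p = new Each(p, new Fields("key", "value"), new ToBytes(), Fields.RESULTS);
// p = new KeyValTailAssembly(p, sink);

But I am not sure whether something like this is the intended
approach, or whether the tap/spec is supposed to handle the
serialization for me.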
Best regards,
Tushar Deshpande