Hi,
I'm having issues loading a super column family into Pig.
I've defined the keyspace and column family with cassandra-cli:
create keyspace testdb with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options =
[{replication_factor:1}];
create column family supercolumntest with column_type=Super and
comparator=UTF8Type and key_validation_class=UTF8Type and
subcomparator=UTF8Type;
and loaded some data with pycassaShell:
>>> pool=pycassa.connect('testdb')
>>> table=pycassa.ColumnFamily(pool,'supercolumntest')
>>> table.insert('1',{'2':{'3':'4'}})
1317656760822247L
>>> table.get('1')
OrderedDict([(u'2', OrderedDict([(u'3', '4')]))])
>>> table.insert('1',{'2':{'5':'6'}})
1317656833986201L
>>> table.get('1')
OrderedDict([(u'2', OrderedDict([(u'3', '4'), (u'5', '6')]))])
>>> table.insert('1',{'7':{'8':'9'}})
1317657074414167L
>>> table.get('1')
OrderedDict([(u'2', OrderedDict([(u'3', '4'), (u'5', '6')])), (u'7', OrderedDict([(u'8', '9')]))])
I then verified with cassandra-cli that all the data was there:
[default@unknown] use testdb;
[default@testdb] get supercolumntest['1'];
=> (super_column=2,
(column=3, value=34, timestamp=1317656760822247)
(column=5, value=36, timestamp=1317656833986201))
=> (super_column=7,
(column=8, value=39, timestamp=1317657074414167))
Returned 2 results.
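For reference, the values printed by the CLI (34, 36, 39) look like the hex encoding of the bytes I inserted ('4', '6', '9'). I assume this is because I did not set a default_validation_class, so the CLI prints raw bytes. A small Python sketch of that sanity check (decode_cli_value is just a helper name I made up for this mail):

```python
import binascii

# Values exactly as printed by cassandra-cli for supercolumntest['1'].
cli_values = ["34", "36", "39"]


def decode_cli_value(hex_text):
    """Turn a hex string printed by the CLI back into ASCII text.

    Assumption: the CLI shows the raw column value as hex because no
    validation class was configured for the column family.
    """
    return binascii.unhexlify(hex_text).decode("ascii")


decoded = [decode_cli_value(value) for value in cli_values]
print(decoded)
```

The decoded values are 4, 6 and 9, which matches what pycassa returned, so the data itself looks intact.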
Finally, I use Pig to load and process the column family:
grunt> A = LOAD 'cassandra://testdb/supercolumntest' USING
CassandraStorage();
grunt> dump A;
[...]
HadoopVersion            PigVersion  UserId  StartedAt            FinishedAt           Features
0.20.203.1-brisk1-beta2  0.8.3       frodo   2011-10-03 17:53:42  2011-10-03 17:53:59  UNKNOWN
Success!
[...]
(1,{(,{(3,4),(5,6)}),(,{(8,9)})})
Unexpectedly, the super column names ('2' and '7') are not returned.
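To be explicit about what I expected, here is a rough Python sketch that renders the row I inserted in the same textual shape that dump prints. The to_pig_tuple helper is hypothetical and only mimics the output format; it does not involve Pig or CassandraStorage at all.

```python
from collections import OrderedDict


def to_pig_tuple(key, super_columns):
    """Render a row as (key,{(super_column,{(column,value),...}),...}).

    Hypothetical helper: it only imitates the text format of Pig's dump.
    """
    bags = []
    for sc_name, columns in super_columns.items():
        inner = ",".join("(%s,%s)" % (name, value)
                         for name, value in columns.items())
        bags.append("(%s,{%s})" % (sc_name, inner))
    return "(%s,{%s})" % (key, ",".join(bags))


# The row as returned by table.get('1') in pycassaShell.
row = OrderedDict([
    ("2", OrderedDict([("3", "4"), ("5", "6")])),
    ("7", OrderedDict([("8", "9")])),
])

expected = to_pig_tuple("1", row)
actual = "(1,{(,{(3,4),(5,6)}),(,{(8,9)})})"  # what dump A printed

print(expected)
print(expected == actual)
```

The only difference between the two strings is the empty first field of each inner tuple, where I expected the super column name.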
Even when I provide a schema:
A = LOAD 'cassandra://testdb/supercolumntest' USING CassandraStorage()
AS (a:chararray, sc{T:tuple(b:chararray,c:chararray)})
I get the same result. Am I missing something? Could you please help
me?
I've seen some related bugs concerning Pig and super columns:
https://issues.apache.org/jira/browse/PIG-1866
https://issues.apache.org/jira/browse/PIG-1849
but I'm not sure whether these bugs are the cause of my problem.
I'm using Brisk-beta2 on Ubuntu 11.04.
Thank you,
Marco