pig supercolumn key

pixman

Oct 3, 2011, 12:32:14 PM
to Brisk Users
Hi,
I'm having issues loading a super column family into Pig.

I've defined the keyspace and column family with cassandra-cli:

create keyspace testdb with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options =
[{replication_factor:1}];
create column family supercolumntest with column_type=Super and
comparator=UTF8Type and key_validation_class=UTF8Type and
subcomparator=UTF8Type;

and loaded some data with pycassaShell:

>>> pool=pycassa.connect('testdb')
>>> table=pycassa.ColumnFamily(pool,'supercolumntest')
>>> table.insert('1',{'2':{'3':'4'}})
1317656760822247L
>>> table.get('1')
OrderedDict([(u'2', OrderedDict([(u'3', '4')]))])
>>> table.insert('1',{'2':{'5':'6'}})
1317656833986201L
>>> table.get('1')
OrderedDict([(u'2', OrderedDict([(u'3', '4'), (u'5', '6')]))])
>>> table.insert('1',{'7':{'8':'9'}})
1317657074414167L
>>> table.get('1')
OrderedDict([(u'2', OrderedDict([(u'3', '4'), (u'5', '6')])), (u'7',
OrderedDict([(u'8', '9')]))])

I then verified with cassandra-cli that all the data was there:

[default@unknown] use testdb;
[default@testdb] get supercolumntest['1'];
=> (super_column=2,
(column=3, value=34, timestamp=1317656760822247)
(column=5, value=36, timestamp=1317656833986201))
=> (super_column=7,
(column=8, value=39, timestamp=1317657074414167))
Returned 2 results.
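
The values 34, 36 and 39 above look different from what I inserted ('4', '6', '9'). As far as I can tell this is only the cli printing the raw bytes as hex, assuming it does so because I set no default_validation_class on the column family. A quick check in Python (the list name hex_values is just for illustration):

```python
import binascii

# Values exactly as printed by cassandra-cli in the output above.
hex_values = ["34", "36", "39"]

# Assumption: the cli shows untyped values as hex-encoded bytes,
# so decoding them as ASCII should give back the inserted strings.
decoded = [binascii.unhexlify(h).decode("ascii") for h in hex_values]

for h, d in zip(hex_values, decoded):
    print("%s -> %s" % (h, d))
```

So the stored data itself seems to match what I inserted.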

Finally, I use Pig to load and process the column family:

grunt> A = LOAD 'cassandra://testdb/supercolumntest' USING
CassandraStorage();
grunt> dump A;
[...]
HadoopVersion PigVersion UserId StartedAt
FinishedAt Features
0.20.203.1-brisk1-beta2 0.8.3 frodo 2011-10-03 17:53:42
2011-10-03 17:53:59 UNKNOWN

Success!
[...]

(1,{(,{(3,4),(5,6)}),(,{(8,9)})})

Unexpectedly, the supercolumn keys ('2' and '7') are not returned.
Even when I provide a schema:

A = LOAD 'cassandra://testdb/supercolumntest' USING CassandraStorage()
AS (a:chararray, sc{T:tuple(b:chararray,c:chararray)})

I get the same result. Am I missing something? Could you please help
me?
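
To make clear what I expected, here is a small Python sketch that renders the same row the way I thought Pig would print it. The helper to_pig_tuple is hypothetical (it is not part of pycassa or Pig) and only formats a nested dict like the one returned by table.get('1'):

```python
from collections import OrderedDict


def to_pig_tuple(row_key, supercolumns):
    """Format a super column row as a Pig-style tuple string.

    Hypothetical helper for illustration only: each supercolumn becomes
    (name,{(column,value),...}) and the row becomes (key,{...}).
    """
    bags = []
    for sc_name, columns in supercolumns.items():
        cols = ",".join("(%s,%s)" % (name, value)
                        for name, value in columns.items())
        bags.append("(%s,{%s})" % (sc_name, cols))
    return "(%s,{%s})" % (row_key, ",".join(bags))


# Same data as inserted with pycassaShell above.
row = OrderedDict([
    ("2", OrderedDict([("3", "4"), ("5", "6")])),
    ("7", OrderedDict([("8", "9")])),
])

expected = to_pig_tuple("1", row)
print(expected)
```

This gives (1,{(2,{(3,4),(5,6)}),(7,{(8,9)})}), which is the Pig output above with the supercolumn names '2' and '7' filled in where the empty fields are.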

I've seen some related bugs about Pig and supercolumns:

https://issues.apache.org/jira/browse/PIG-1866
https://issues.apache.org/jira/browse/PIG-1849

but I'm not sure if these bugs cause my problem.

I'm using Brisk-beta2 on Ubuntu 11.04.

Thank you,

Marco