creating indexed CF in pycassa doesn't work for me


Daniel Gabriele

Mar 28, 2013, 3:29:05 PM
to pycassa...@googlegroups.com
I have spent the last few hours trying to use pycassa to set up a column family with a secondary index, and it does not work. Specifically, I am able to index the column family only if I manually set the column_metadata field in cassandra-cli. I tried setting every validation_class property in every constructor and "alter" method to UTF8Type, but nothing worked. Also, when I set everything to UTF8Type and then called alter_column_family(...., comparator='UTF8Type'), I got a message about incompatible comparator types. Why can't I do this? Must I set the column metadata manually, outside of pycassa, before defining indices? Thanks.

I'm using the latest version of Cassandra and Pycassa obtained via the Arch Linux package manager.

import pycassa

KEYSPACE = 'test'
COL_FAM = 'users'

sys = pycassa.SystemManager()
sys.create_keyspace(KEYSPACE, pycassa.SIMPLE_STRATEGY, {'replication_factor': '1'})
sys.create_column_family(KEYSPACE, COL_FAM)

pool = pycassa.ConnectionPool(KEYSPACE)
users = pycassa.ColumnFamily(pool, COL_FAM)

try:
    ## insert some rows
    for idx, key in enumerate(list('abcdef')):
        users.insert(key, {'age': unicode(idx)})

    ## create index, expression, clause
    sys.create_index(KEYSPACE, COL_FAM, 'age', pycassa.UTF8_TYPE, pycassa.KEYS_INDEX)
    expr = pycassa.index.create_index_expression('age', '1')
    clause = pycassa.index.create_index_clause([expr])

    ## this SHOULD print "b 1" but doesn't print anything. why not?
    for key, user in users.get_indexed_slices(clause):
        print key, user['age']

except Exception as e:
    print e
finally:
    ## always clean up the test keyspace
    sys.drop_keyspace(KEYSPACE)
    sys.close()

Tyler Hobbs

Mar 28, 2013, 3:42:33 PM
to pycassa...@googlegroups.com
It's possible that the secondary index isn't built by the time your query is issued.  Try sleeping for a couple of seconds between creating the index and issuing the query as a start.  Alternatively, creating the index and then inserting the data might do the trick.

I don't see anything else that's obviously incorrect.


On Thu, Mar 28, 2013 at 2:29 PM, Daniel Gabriele <theo...@gmail.com> wrote:
> Also, if I tried to set everything to UTF8Type and then called alter_column_family(...., comparator='UTF8Type') I would get a message about incompatible comparator types. Why can't I do this?

The comparator controls what order Cassandra stores columns in (in memory and on disk), so changing a comparator could potentially require re-writing all data. For this reason, Cassandra doesn't let you change the comparator.
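
To illustrate why a different comparator implies a different on-disk order, here is a plain-Python analogy (not Cassandra code, and not how Cassandra implements comparators): the same column names sort differently depending on whether they are compared as text or as numbers, so data laid out under one ordering is not valid under the other.

```python
# Analogy only: the same column names under two different orderings.
names = ["2", "10", "1"]

# Text-style comparison (roughly what a text comparator such as UTF8Type implies).
as_text = sorted(names)

# Numeric comparison (roughly what an integer-style comparator implies).
as_numbers = sorted(names, key=int)

print(as_text)     # ['1', '10', '2']
print(as_numbers)  # ['1', '2', '10']
```

Since the two orderings disagree, switching from one to the other would mean re-sorting every stored row.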

--
Tyler Hobbs
DataStax

Daniel Gabriele

Mar 28, 2013, 4:44:21 PM
to pycassa...@googlegroups.com
Well, damn. You were right about switching the order of the statements. Goddamnit! I spent so many hours in vain! Most vile Fate, what have you done! Nevertheless, this seems like a bug to me, because wouldn't it be more logical to assume that an index should be created after the data, not the other way around? I tried sleeping for a second before trying the alternative ordering, but it had no effect. I would think a second would be long enough, but maybe not.

The correct ordering is as follows:

    ## create index, expression, clause
    sys.create_index(KEYSPACE, COL_FAM, 'age', pycassa.UTF8_TYPE, pycassa.KEYS_INDEX)
    expr = pycassa.index.create_index_expression('age', '1')
    clause = pycassa.index.create_index_clause([expr])

    ## insert some rows
    for idx, key in enumerate(list('abcdef')):
        users.insert(key, {'age': unicode(idx)})

Tyler Hobbs

Mar 28, 2013, 5:04:59 PM
to pycassa...@googlegroups.com
On Thu, Mar 28, 2013 at 3:44 PM, Daniel Gabriele <theo...@gmail.com> wrote:
> Well, damn. You were right about switching the order of the statements. Goddamnit! I spent so many hours in vain! Most vile Fate, what have you done!

Hah, I know the feeling.

> Nevertheless, this seems like a bug to me because wouldn't it be more logical to assume that an index should be created after the data, not the other way around?

Either way will work. It's just that secondary index creation is asynchronous in Cassandra: create_index() blocks only until the schema itself is updated, not until the index is built. There's a way to look at the status of index builds, but I can't seem to recall where. It might be through JMX (i.e. nodetool) or a Thrift method.

> I tried sleeping for a second before trying the alternative ordering, but it had no effect. I would think a second would be long enough, but maybe not.

Yeah, I would think so as well, but I'm not exactly sure how index builds are scheduled in Cassandra.
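
Because create_index() returns before the index is built, one alternative to a fixed sleep is to retry the query until it returns something or a timeout expires. The sketch below is only that, a sketch: poll_until_nonempty is a hypothetical helper (not part of pycassa), and a non-empty result does not prove the index build has finished. Given the report above that a one-second sleep did not help, there is no guarantee polling will succeed either.

```python
import time


def poll_until_nonempty(query, timeout=10.0, interval=0.5):
    """Call query() repeatedly until it yields at least one row or time runs out.

    query is any zero-argument callable returning an iterable of rows.
    Returns the rows as a list; the list is empty if the timeout was reached.
    """
    deadline = time.time() + timeout
    while True:
        rows = list(query())
        if rows or time.time() >= deadline:
            return rows
        time.sleep(interval)


# Hypothetical usage with the names from the script earlier in the thread:
# rows = poll_until_nonempty(lambda: users.get_indexed_slices(clause))
```

The query is passed as a callable so that each retry issues a fresh request instead of re-reading an exhausted generator.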

--
Tyler Hobbs
DataStax

Daniel Gabriele

Mar 28, 2013, 5:56:05 PM
to pycassa...@googlegroups.com
I see.

By the way, I was wondering if there's a method somewhere for simply checking whether a column family exists in a keyspace, given its name. For example, something like pycassa.columnfamily.exists('mycf')  # returns bool

Thanks again. 



Tyler Hobbs

Mar 28, 2013, 5:59:22 PM
to pycassa...@googlegroups.com

On Thu, Mar 28, 2013 at 4:56 PM, Daniel Gabriele <theo...@gmail.com> wrote:
> I was wondering if there's a method somewhere for simply checking whether a column family exists in a keyspace, given its name.... for example, pycassa.columnfamily.exists('mycf') # returns bool

I would use SystemManager.get_keyspace_column_families(), for example:

cf_exists = "mycf" in sys.get_keyspace_column_families("mykeyspace")
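
The same membership check can be wrapped in a small create-if-missing helper. This is a minimal sketch: ensure_column_family is a hypothetical name, not a pycassa function, and it assumes sys_mgr is a SystemManager-like object exposing get_keyspace_column_families() and create_column_family() as used in this thread.

```python
def ensure_column_family(sys_mgr, keyspace, name, **cf_kwargs):
    """Create the column family only if it does not already exist.

    Returns True if the column family was created, False if it was present.
    Extra keyword arguments are passed through to create_column_family().
    """
    if name in sys_mgr.get_keyspace_column_families(keyspace):
        return False
    sys_mgr.create_column_family(keyspace, name, **cf_kwargs)
    return True


# Hypothetical usage with the names from the script earlier in the thread:
# ensure_column_family(sys, KEYSPACE, COL_FAM)
```

The check and the create are two separate calls, so this is not safe against another client creating the same column family in between.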


--
Tyler Hobbs
DataStax

Daniel Gabriele

Mar 28, 2013, 7:40:08 PM
to pycassa...@googlegroups.com
Thanks again. 

By the way, my name is Daniel Gabriele. Yes, lots of "els" in that name. I am using Cassandra for the first time, so Pycassa has already come in handy.


