It was changed this way in Pycassa to reflect the upstream changes in
Cassandra, where the keyspace is now set on a per-connection basis. This
was done because a keyspace is the per-application namespace, in the
same way that a database is in an RDBMS. Passing the keyspace on every
call was wasteful and confusing to people trying to understand the
data/query model. This was a bit controversial, but I still think it
was wise.

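To make the difference concrete, here is a minimal sketch of the two call styles. `PerCallClient`, `KeyspaceBoundClient` and the in-memory `DATA` dict are hypothetical stand-ins, loosely modelled on the old and new Thrift interfaces; they are not Pycassa or Cassandra code.

```python
# Hypothetical in-memory stand-in for a cluster; not the real Thrift API.
DATA = {
    "Keyspace1": {"Users": {"jsmith": {"name": "John"}}},
    "Keyspace2": {"Users": {"jsmith": {"name": "Jane"}}},
}


class PerCallClient:
    """Old style (hypothetical): every call names the keyspace explicitly."""

    def get(self, keyspace, column_family, key):
        return DATA[keyspace][column_family][key]


class KeyspaceBoundClient:
    """New style (hypothetical): the keyspace is set once per connection."""

    def __init__(self):
        self.keyspace = None

    def set_keyspace(self, keyspace):
        self.keyspace = keyspace

    def get(self, column_family, key):
        if self.keyspace is None:
            raise RuntimeError("no keyspace set on this connection")
        return DATA[self.keyspace][column_family][key]


old = PerCallClient()
print(old.get("Keyspace1", "Users", "jsmith"))  # keyspace repeated on every call

new = KeyspaceBoundClient()
new.set_keyspace("Keyspace1")  # chosen once, like picking a database in an RDBMS
print(new.get("Users", "jsmith"))
```

Both calls return the same row; the only difference is whether the namespace travels with each request or lives on the connection.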
> The proper way to approach this could be to introduce a `Session` in
> pycassa. A `Connection` can have multiple sessions, a session is bound
> to a keyspace, may store state (scanners, counters, whatnot), and all
> operations are done using the session instance as interface - instead
> of Connection as today. Hell, we might even add a session id for good
> measure.
>
> Not sure I like the complexity of that, but it's also not cool
> breaking iterative scanners reading data by switching keyspace on the
> connection underneath it - from another thread.
>
> The easiest way is no doubt to keep it as it is now, plus disallowing
> set_keyspace for safety. This enforces one connection/session per
> thread of execution, and leaves the rare case of keyspace multiplexing
> to the application.

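The scanner problem described in the quote can be reproduced with a small self-contained sketch. Everything here (`SharedConnection`, `PinnedConnection`, `scan`, the in-memory `DATA`) is hypothetical and only models a connection whose keyspace is mutable shared state; it is not Pycassa's API. The background thread is joined immediately so the result is deterministic.

```python
import threading

# Hypothetical in-memory keyspaces; a stand-in for a real cluster.
DATA = {
    "Keyspace1": {"Users": ["a1", "a2", "a3", "a4"]},
    "Keyspace2": {"Users": ["b1", "b2", "b3", "b4"]},
}


class SharedConnection:
    """Hypothetical connection whose keyspace can be changed at any time."""

    def __init__(self, keyspace):
        self.keyspace = keyspace

    def set_keyspace(self, keyspace):
        self.keyspace = keyspace

    def get_page(self, column_family, start, count):
        # The keyspace is read on every request, not when the scan began.
        return DATA[self.keyspace][column_family][start:start + count]


def scan(conn, column_family, page_size=2):
    """Lazy scanner that fetches one page per round trip."""
    start = 0
    while True:
        page = conn.get_page(column_family, start, page_size)
        if not page:
            return
        yield from page
        start += page_size


conn = SharedConnection("Keyspace1")
rows = scan(conn, "Users")
seen = [next(rows), next(rows)]  # first page comes from Keyspace1

# Another thread switches the keyspace while the scan is suspended.
t = threading.Thread(target=conn.set_keyspace, args=("Keyspace2",))
t.start()
t.join()

seen.extend(rows)  # remaining pages silently come from Keyspace2
print(seen)  # ['a1', 'a2', 'b3', 'b4']


class PinnedConnection(SharedConnection):
    """The 'disallow set_keyspace' option: keyspace fixed at creation."""

    def set_keyspace(self, keyspace):
        raise RuntimeError("keyspace is fixed; open another connection instead")


pinned = PinnedConnection("Keyspace1")
try:
    pinned.set_keyspace("Keyspace2")
except RuntimeError as exc:
    print(exc)
```

With the pinned variant, the only way to reach a second keyspace is a second connection, which is exactly the one-connection-per-thread discipline the quote describes.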
I see what you're saying, but the only way someone could do this would
be to access the raw Thrift methods on their connection object (not
through any means of Pycassa's API). There is always some way to shoot
yourself in the foot, and bypassing Pycassa to directly access a
connection object in use by pycassa.ColumnFamily instances is squarely
in Better-Know-What-You're-Doing territory.

> Otherwise, we'll need some kind of `Session` encapsulation to manage
> state and synchronization.
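For comparison, the `Session` idea from the quoted proposal might look roughly like this. It is a hypothetical sketch: `Connection`, `Session`, `execute` and the in-memory `DATA` are invented for illustration, and the `switches` counter only stands in for the `set_keyspace` round trips a shared per-connection keyspace would require. It shows where the extra complexity comes from: every request has to take a lock so that switching the keyspace and issuing the request happen together.

```python
import itertools
import threading

# Hypothetical in-memory keyspaces standing in for a cluster.
DATA = {
    "Keyspace1": {"Users": {"jsmith": "John"}},
    "Keyspace2": {"Users": {"jsmith": "Jane"}},
}


class Session:
    """Hypothetical session: bound to one keyspace, carries its own state."""

    def __init__(self, connection, session_id, keyspace):
        self.connection = connection
        self.session_id = session_id
        self.keyspace = keyspace  # never changes after creation
        self.counters = {}        # example of per-session state

    def get(self, column_family, key):
        self.counters["reads"] = self.counters.get("reads", 0) + 1
        return self.connection.execute(self.keyspace, column_family, key)


class Connection:
    """Hypothetical connection that hands out keyspace-bound sessions."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._current_keyspace = None
        self.switches = 0

    def session(self, keyspace):
        with self._lock:
            return Session(self, next(self._ids), keyspace)

    def execute(self, keyspace, column_family, key):
        # The lock makes "switch keyspace, then run the request" atomic,
        # so one session never reads from another session's keyspace.
        with self._lock:
            if self._current_keyspace != keyspace:
                self._current_keyspace = keyspace  # stands in for set_keyspace()
                self.switches += 1
            return DATA[self._current_keyspace][column_family][key]


conn = Connection()
s1 = conn.session("Keyspace1")
s2 = conn.session("Keyspace2")
print(s1.session_id, s1.get("Users", "jsmith"))  # 1 John
print(s2.session_id, s2.get("Users", "jsmith"))  # 2 Jane
print(conn.switches)  # 2
```

Interleaving sessions on different keyspaces costs a switch on nearly every request, which is part of why a single keyspace per connection is the simpler default.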

In Cassandra-land / Thrift RPC-land, people understand (or are being
made to understand) that *if* you're trying to access more than one
keyspace from a single application, then you need more than one
connection. In Pycassa the "connection" is kind of a misnomer, since
under the covers it's a pool of connections, but the vernacular works
nonetheless. The way things stand today, it's consistent: you need a
new connection instance for each keyspace you want to use.

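If an application really does need several keyspaces, the rule above means keeping one connection per keyspace. A minimal sketch of that bookkeeping follows; `KeyspaceConnection` and `Connections` are hypothetical names, not Pycassa classes, and `localhost:9160` (Cassandra's default Thrift port) is only a placeholder server list.

```python
import threading


class KeyspaceConnection:
    """Hypothetical stand-in for a Pycassa-style connection (really a pool),
    bound to a single keyspace when it is created."""

    def __init__(self, keyspace, servers=("localhost:9160",)):
        self.keyspace = keyspace
        self.servers = list(servers)


class Connections:
    """Hypothetical application-side registry: one connection per keyspace."""

    def __init__(self, servers=("localhost:9160",)):
        self._servers = servers
        self._by_keyspace = {}
        self._lock = threading.Lock()

    def for_keyspace(self, keyspace):
        # The lock keeps two threads from creating duplicate connections.
        with self._lock:
            conn = self._by_keyspace.get(keyspace)
            if conn is None:
                conn = KeyspaceConnection(keyspace, self._servers)
                self._by_keyspace[keyspace] = conn
            return conn


registry = Connections()
users = registry.for_keyspace("Keyspace1")
audit = registry.for_keyspace("Audit")      # a second keyspace, a second connection
same = registry.for_keyspace("Keyspace1")   # reused, not re-created
print(users is same, users is audit)  # True False
```

The multiplexing lives in the application, not in the connection, which matches the "new connection instance for each keyspace" rule.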
Another way to approach this is to look at what you'd be optimizing
for: the one-application, multiple-keyspace use case. Right or wrong,
the Cassandra project has decided that's a corner case, and it's
discouraging folks from going there. And like I said before, it's
been a bit controversial, but only a bit, IMO.

--
Eric Evans
john.er...@gmail.com