Switching keyspace on a connection? Not without a seat belt.


Daniel Lundin

Aug 28, 2010, 7:20:04 AM
to pycassa-devel
I don't think set_keyspace is a good API call, as it modifies
shared state.

Besides login, it's the only state on a connection, and switching
keyspace on the fly can and will break stuff in threaded apps.

I think it was better before, shipping keyspace on all calls, state
kept firmly on the client side and the API kept stateless.

The proper way to approach this could be to introduce a `Session` in
pycassa. A `Connection` can have multiple sessions, a session is bound
to a keyspace, may store state (scanners, counters, whatnot), and all
operations are done using the session instance as interface - instead
of Connection as today. Hell, we might even add a session id for good
measure.

Not sure I like the complexity of that, but it's also not cool
breaking iterative scanners reading data by switching keyspace on the
connection underneath it - from another thread.
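
To make the hazard concrete, here is a minimal sketch. `FakeConnection` and `scan` are hypothetical stand-ins written for this illustration, not pycassa or Thrift classes, and the keyspace switch that another thread might perform is done inline so the outcome is deterministic.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection with a mutable keyspace."""

    def __init__(self, data):
        self._data = data          # {keyspace: {key: value}}, plays the server
        self.keyspace = None

    def set_keyspace(self, keyspace):
        self.keyspace = keyspace   # shared, mutable state on the connection

    def get(self, key):
        return self._data[self.keyspace].get(key)


def scan(conn, keys):
    """Lazily read keys one at a time, like an iterative scanner."""
    for key in keys:
        yield key, conn.get(key)


data = {
    "app_a": {"k1": "a1", "k2": "a2"},
    "app_b": {"k1": "b1", "k2": "b2"},
}
conn = FakeConnection(data)
conn.set_keyspace("app_a")

scanner = scan(conn, ["k1", "k2"])
first = next(scanner)          # read from app_a, as intended
conn.set_keyspace("app_b")     # what another thread could do mid-scan
second = next(scanner)         # same scanner now reads from app_b
```

The scanner never raises; it simply returns a row from the wrong keyspace, which is what makes this kind of bug hard to spot.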

Easiest way is no doubt as it is now + disallowing set_keyspace for
safety. This enforces one connection/session per thread of execution,
and leaves the rare case of keyspace multiplexing to the application.
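
A rough sketch of that option, under assumptions: `FixedKeyspaceConnection` and `get_connection` are hypothetical names invented here, not part of pycassa. The keyspace is fixed at construction with no setter, and `threading.local` gives each thread its own connection per keyspace.

```python
import threading


class FixedKeyspaceConnection:
    """Hypothetical connection whose keyspace cannot change after creation."""

    def __init__(self, keyspace):
        self._keyspace = keyspace

    @property
    def keyspace(self):
        # Read-only: there is no setter and no set_keyspace() method.
        return self._keyspace


_local = threading.local()


def get_connection(keyspace):
    """Return the calling thread's connection for `keyspace`, creating it on first use."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    if keyspace not in conns:
        conns[keyspace] = FixedKeyspaceConnection(keyspace)
    return conns[keyspace]


seen = {}


def worker(name):
    # Each thread records the connection it was handed.
    seen[name] = get_connection("app_a")


threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

An application that really needs two keyspaces just asks for two connections; nothing is shared between threads, so nothing can be switched underneath a reader.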

Otherwise, we'll need some kind of `Session` encapsulation to manage
state and synchronization.
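
For illustration only, the `Session` idea might look roughly like this. Every class and method below is hypothetical; an in-memory dict plays the server, and it accepts the keyspace on each call, which the real per-connection keyspace protocol does not.

```python
import threading


class Connection:
    """Hypothetical connection; not pycassa's actual class."""

    def __init__(self, store):
        self._store = store            # {keyspace: {key: value}}, plays the server
        self._lock = threading.Lock()  # serializes use of the shared transport
        self._next_id = 0

    def session(self, keyspace):
        """Create a session bound to one keyspace for its whole lifetime."""
        with self._lock:
            self._next_id += 1
            return Session(self, keyspace, self._next_id)

    def _call(self, keyspace, op, key, value=None):
        # The session supplies its keyspace on every call, so the
        # connection itself carries no "current keyspace" state.
        with self._lock:
            table = self._store.setdefault(keyspace, {})
            if op == "get":
                return table.get(key)
            table[key] = value


class Session:
    """Keyspace-bound interface; per-session state (scanners, counters) would live here."""

    def __init__(self, connection, keyspace, session_id):
        self._connection = connection
        self.keyspace = keyspace
        self.session_id = session_id

    def get(self, key):
        return self._connection._call(self.keyspace, "get", key)

    def insert(self, key, value):
        self._connection._call(self.keyspace, "insert", key, value)


conn = Connection({})
users = conn.session("users")
logs = conn.session("logs")
users.insert("k", "from users")
logs.insert("k", "from logs")
```

Against a server that keeps the keyspace on the connection, `_call` would presumably have to set the keyspace and issue the request inside the same locked section, which is where the extra complexity comes from.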

/d

Eric Evans

Aug 28, 2010, 10:31:25 AM
to pycass...@googlegroups.com
On Sat, Aug 28, 2010 at 6:20 AM, Daniel Lundin <d...@eintr.org> wrote:
> I don't think set_keyspace is a good API call, as it modifies
> shared state.
>
> Besides login, it's the only state on a connection, and switching
> keyspace on the fly can and will break stuff in threaded apps.
>
> I think it was better before, shipping keyspace on all calls, state
> kept firmly on the client side and the API kept stateless.

It was changed like this in Pycassa to reflect the upstream changes in
Cassandra, where keyspace is now set on a per connection basis. This
was done because keyspace is the per-application namespace in the same
way that databases are in an RDBMS. Passing the keyspace on every
call was wasteful and confusing to people trying to understand the
data/query model. This was a bit controversial but I still think it
was wise.

> The proper way to approach this could be to introduce a `Session` in
> pycassa. A `Connection` can have multiple sessions, a session is bound
> to a keyspace, may store state (scanners, counters, whatnot), and all
> operations are done using the session instance as interface - instead
> of Connection as today. Hell, we might even add a session id for good
> measure.
>
> Not sure I like the complexity of that, but it's also not cool
> breaking iterative scanners reading data by switching keyspace on the
> connection underneath it - from another thread.
>
> Easiest way is no doubt as it is now + disallowing set_keyspace for
> safety. This enforces one connection/session per thread of execution,
> and leaves the rare case of keyspace multiplexing to the application.

I see what you're saying, but the only way someone could do this would
be to access the raw thrift methods on their connection object (not
through any means of Pycassa's API). There is always some way to shoot
yourself in the foot, and bypassing Pycassa to directly access a
connection object in use by pycassa.ColumnFamily instances is squarely
in Better-Know-What-You're-Doing territory.

> Otherwise, we'll need some kind of `Session` encapsulation to manage
> state and synchronization.

In Cassandra-land / Thrift RPC-land, people understand (or are being
made to understand) that *if* you're trying to access more than one
keyspace from a single application, then you need more than one
connection. In Pycassa the "connection" is kind of a misnomer since
under the covers it's a pool of connections, but the vernacular works
nonetheless. The way things stand today is consistent: you need a
new connection instance for each keyspace you want to use.

Another way to approach this is to look at what you're optimizing for,
the one-application-multiple-keyspace use-case. Right or wrong, the
Cassandra project has decided that's a corner-case, and it's
discouraging folks from going there. And like I said before, it's
been a bit controversial, but only a bit IMO.

--
Eric Evans
john.er...@gmail.com
