Multiple sessions for multiple keyspaces


SMA Tester

Feb 4, 2014, 12:10:33 AM
to java-dri...@lists.datastax.com
Hello,

We have about 100 keyspaces with pretty much the same schema.

I created 100 sessions (one session per keyspace), and they stay alive for the life of the app.

Questions
1. Have I got this wrong?

2. Performance impact: I ran into "too many files open" and the app died.

3. The Cassandra nodes have been churning at up to 80-90% CPU since this migration from the Hector API.

4. I see 100s of worker threads (Thread [Cassandra Java Driver worker-38]) created and destroyed.

5. Do I even have the option of using a single Session?
To keep the performance impact to a minimum, I am considering uglifying my queries by tagging keyspaces as needed.
Example:
String query = "SELECT * FROM " + keyspace1 + ".table";
PreparedStatement ps = session.prepare(query);

I fear I am going to hit a wall or a dead end by doing these string substitutions.
Right now I have a session per keyspace, so PreparedStatement ps = session.prepare("SELECT * FROM table") is cleaner.

Can you clear this up, guys?

Sylvain Lebresne

Feb 4, 2014, 6:06:27 AM
to java-dri...@lists.datastax.com
> We have about 100 keyspaces with pretty much the same schema.
>
> I created 100 sessions (one session per keyspace), and they stay alive for the life of the app.
>
> Questions
> 1. Have I got this wrong?

Each Session in the driver has its own separate set of per-host connection
pools and hence is not particularly lightweight. Having 100 of them is thus
expected to have a relatively big footprint. Personally, I would certainly
avoid it by using one Session and fully qualified keyspace names in queries
if I were in that situation.

> 2. Performance impact: I ran into "too many files open" and the app died.

Again, each Session will have a pool of connections for each C* host, and
by default each pool has at least 2 connections and can grow up to 8
connections under load (those are the defaults; they can be changed in
PoolingOptions). Even for a relatively modest cluster of 10 nodes, that's
already 100 Sessions * 10 nodes * 2 connections per node = 2000 connections.
While that's not something a modern system can't handle, this might go over
the maximum number of file descriptors allowed per process on your system
(which is 1024 by default on Linux, I believe). If that's the case (which you
can try to validate using lsof ('lsof -n | grep java')), you can always raise
your system limits (google ulimit and/or /etc/security/limits.conf), though
again, using just one Session with fully qualified keyspace names is likely a
much better solution overall.
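
To make the arithmetic above concrete, here is a minimal sketch in plain Java (no driver dependency). The class and method names are made up for illustration; the 10 hosts and the 2 to 8 connections per pool are the figures quoted above, and each connection is assumed to cost one socket, hence one file descriptor.

```java
// Back-of-the-envelope estimate of client-side connections.
// ConnectionFootprint and totalConnections are illustrative names, not driver APIs.
final class ConnectionFootprint {

    /** One connection pool exists per (session, host) pair. */
    static long totalConnections(int sessions, int hosts, int connectionsPerPool) {
        return (long) sessions * hosts * connectionsPerPool;
    }

    public static void main(String[] args) {
        int hosts = 10;       // the "relatively modest cluster" from the reply
        int corePerPool = 2;  // default minimum quoted in the reply
        int maxPerPool = 8;   // default maximum quoted in the reply

        System.out.println(totalConnections(100, hosts, corePerPool)); // 2000
        System.out.println(totalConnections(100, hosts, maxPerPool));  // 8000
        System.out.println(totalConnections(1, hosts, corePerPool));   // 20
    }
}
```

With a single Session the same cluster needs 20 connections at the default minimum, which is well under a 1024 file-descriptor limit.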

> 3. The Cassandra nodes have been churning at up to 80-90% CPU since this migration from the Hector API.

I don't think anyone can help you here without many more details (what exactly
you were doing before, what exactly you are doing now, what has changed,
etc.). But keep in mind that with 100 sessions, you have at least 200
connections per Cassandra node. I don't know if that explains some unexpected
CPU usage, but it certainly is not optimal; in particular, I'm not sure you
were using Hector the same way.

> 4. I see 100s of worker threads (Thread [Cassandra Java Driver worker-38])   created and destroyed

There is nothing particularly wrong with that, though you definitely won't see
that many if you have fewer Sessions.

> 5. Do I even have the option of using a single Session?
> To keep the performance impact to a minimum, I am considering uglifying my queries by tagging keyspaces as needed.
> Example:
> String query = "SELECT * FROM " + keyspace1 + ".table";
> PreparedStatement ps = session.prepare(query);
>
> I fear I am going to hit a wall or a dead end by doing these string substitutions.
> Right now I have a session per keyspace, so PreparedStatement ps = session.prepare("SELECT * FROM table") is cleaner.

It seems you've answered your own question. You can indeed use a single Session
through fully qualified keyspace names in the query. And there is no wall to be
hit by doing this. You can also use the QueryBuilder instead of doing
manual string concatenation.
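
As a rough sketch of the string-concatenation route: keyspace and table names cannot be bind markers in a prepared statement, so they have to be spliced into the query text, and it is prudent to validate them first. The helper below is hypothetical (QualifiedQueries, qualify and selectAll are not driver APIs), the table name "users" is a placeholder, and it deliberately accepts only unquoted identifiers.

```java
import java.util.regex.Pattern;

// Hypothetical helper for building keyspace-qualified CQL; not part of the driver.
final class QualifiedQueries {

    // Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Returns "keyspace.table" after checking both parts. */
    static String qualify(String keyspace, String table) {
        return checked(keyspace) + "." + checked(table);
    }

    static String selectAll(String keyspace, String table) {
        return "SELECT * FROM " + qualify(keyspace, table);
    }

    private static String checked(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain CQL identifier: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        String query = selectAll("keyspace1", "users");
        System.out.println(query); // SELECT * FROM keyspace1.users
        // With the driver, as in the question: PreparedStatement ps = session.prepare(query);
    }
}
```

Rejecting anything that is not a plain identifier keeps a tenant or keyspace name from smuggling extra CQL into the statement.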

--
Sylvain

oded peer

Feb 4, 2014, 8:26:03 AM
to java-dri...@lists.datastax.com
I have a multi-tenant SaaS. I am planning on creating a Cassandra user and keyspace per tenant for security reasons.
If one Session is not an option, what are my alternatives?

Sylvain Lebresne

Feb 4, 2014, 9:03:24 AM
to java-dri...@lists.datastax.com
On Tue, Feb 4, 2014 at 2:26 PM, oded peer <peer...@gmail.com> wrote:
I have a multi-tenant SaaS. I am planning on creating a Cassandra user and keyspace per tenant for security reasons.
If one Session is not an option, what are my alternatives?

Hmm, it seems I haven't been clear. One session *is* an option and is in fact
*the* option I recommend if you have more than a handful of keyspaces. Just
use fully qualified keyspace names in your queries.

If you have just 2 or 3 keyspaces, using one session per keyspace is fine if you
prefer it over using fully qualified keyspace names. But even in that case using
just one Session is a perfectly reasonable choice too; it's probably mainly a
matter of personal taste.

As an aside, I'll note that "lots and lots of keyspaces" is not a case Cassandra
particularly optimizes for. I'm sure you guys know what you're doing, but I would
certainly be careful about going crazy there.

--
Sylvain
 

To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-us...@lists.datastax.com.

SMA Tester

Feb 4, 2014, 11:24:54 AM
to java-dri...@lists.datastax.com
Sylvain,
As always, THANK YOU so much for the prompt reply and the GREAT driver code. It has made my life so much better than with Hector.

1. By "wall" I meant that NOT all CQL queries can be keyspace-qualified. I am looking into the CQL reference to see if there is a case where you HAVE TO open a session to that keyspace.
2. I have had mixed responses on 100s of keyspaces; so far I have realized that C* performance depends more on the number of tables than on the number of keyspaces.
3. For clarity and completeness, can we put your suggestions in the driver Javadocs? You have done an amazing job on the Javadocs, and thank you a million for that.
Adding this caution to the Javadocs might have helped me implement "SINGLE SESSION" only. Right now I am staying up late at night recycling the app, despite increasing the file limits.
I got the impression, and trust me it is everywhere, that it should be a SINGLE SESSION PER KEYSPACE.

Scott Lewis

Feb 5, 2014, 2:23:55 PM
to java-dri...@lists.datastax.com
One thought:

Would it be possible to add a Java driver configuration API to allow the per-session thread pool size to be controlled/managed?

Then...the environment in which the java-driver runs could make its own decisions about # of keyspaces/sessions vs. number of threads per session.

Scott

SMA Tester

Feb 5, 2014, 2:45:56 PM
to java-dri...@lists.datastax.com
A session per keyspace does make for cleaner code; really!!
QueryBuilder is NOT my favorite; in fact I have never used it. I would rather wait for an ORM to mature.

Right now the same code, session.prepare("SELECT * from Table"), is called by 100s of sessions/keyspaces.
I am fighting a different fire: the "Re-preparing prepared statements" warning.

The driver does so much good; if it took care of these real-time problems, that would make it the best.

The Hector API actually sets the keyspace in the query, and it uses single-session managed connections. I think the driver followed a similar pattern.
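
A common way to avoid preparing the same query string more than once is to keep prepared statements in a cache keyed by the query text. The sketch below is an assumption about how that could look, not something taken from the driver: PreparedCache is a hypothetical helper, and it is generic so that it runs without a cluster. With the driver, the type parameter would be PreparedStatement and the function would be session::prepare.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical prepare-once cache; P stands in for the driver's PreparedStatement.
final class PreparedCache<P> {
    private final ConcurrentMap<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> preparer;

    PreparedCache(Function<String, P> preparer) {
        this.preparer = preparer;
    }

    /** Prepares the query on first use and returns the cached result afterwards. */
    P get(String query) {
        return cache.computeIfAbsent(query, preparer);
    }

    public static void main(String[] args) {
        AtomicInteger prepareCalls = new AtomicInteger();
        // Stand-in for session::prepare so the sketch runs without a cluster.
        PreparedCache<String> cache = new PreparedCache<>(query -> {
            prepareCalls.incrementAndGet();
            return "prepared[" + query + "]";
        });
        cache.get("SELECT * FROM keyspace1.users");
        cache.get("SELECT * FROM keyspace1.users"); // served from the cache
        cache.get("SELECT * FROM keyspace2.users");
        System.out.println(prepareCalls.get()); // 2
    }
}
```

With a single Session and fully qualified names, the query text differs per keyspace, so each keyspace's statement is prepared once and then reused.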


Alex Popescu

Feb 5, 2014, 2:58:10 PM
to java-dri...@lists.datastax.com

On Wed, Feb 5, 2014 at 11:23 AM, Scott Lewis <scott...@gmail.com> wrote:
Would it be possible to add java driver configuration API to allow the per-session thread pool size to be controlled/managed? 

Would you mind explaining a bit how you think this changes things?

As far as I can tell this would make the configuration phase more complicated. Also the management of threads
would become quite complicated too, as you'd not only have to deal with it at the session level, but
also across all sessions to make sure you are not overwhelming the client.


--

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru

Scott Lewis

Feb 5, 2014, 4:10:28 PM
to java-dri...@lists.datastax.com
On 2/5/2014 11:58 AM, Alex Popescu wrote:

On Wed, Feb 5, 2014 at 11:23 AM, Scott Lewis <scott...@gmail.com> wrote:
Would it be possible to add java driver configuration API to allow the per-session thread pool size to be controlled/managed? 

Would you mind explaining a bit how you think this changes things?

As far as I can tell this would make the configuration phase more complicated.

Yes...I expect it could make the configuration phase somewhat more complicated.  Of course, with reasonable defaults and a configuration API, the complexity doesn't really have to be intrusive for the common use(s).


Also the management of threads
would become quite complicated too, as you'd not only have to deal with it at the session level, but
also across all sessions to make sure you are not overwhelming the client.

I agree that management of threads could become complicated.  But my observation is that...frequently for scaling needs...such management/control of resources (in this case threads) is going to be needed.

Scott


Alex Popescu

Feb 6, 2014, 2:50:39 AM
to java-dri...@lists.datastax.com
So we both agree about the added complexity. What are the advantages? Considering you can already tune these at the cluster level and that having hundreds of keyspaces is not really a use case Cassandra is optimizing for, why would we want a more complicated API and thread management?

--

:- a)

@al3xandru

Sylvain Lebresne

Feb 6, 2014, 5:01:41 AM
to java-dri...@lists.datastax.com
Probably more importantly, managing thread pool sizes per-Session wouldn't really solve the problem we're talking about. It's not like the default number of connections per host per Session is currently high, it's 2. You can globally set it to 1 if you want, but if you have 100+ sessions, you still have 100 connections per-host, which is clearly wasteful. Being able to tweak each session manually won't help much really.

The fact is, if you have many keyspaces, you should use fully qualified keyspace names, not per-connection keyspace logging, because that forces you to have one connection per keyspace for each host. It's not specific to the java driver in any way, it's inherent to the concept of having a per-connection default keyspace (and I don't mean that as a bad thing; logging to a keyspace is convenient for interactive sessions for instance, or if you have a handful of keyspaces, which is by and large the norm).

As for whether per-Session thread pool size configuration would be useful in general (outside this "lots of keyspaces" problem), let's say that I'm not convinced it's worth the complexity because I don't think it really buys much (if anything really). And if you really really want to have 2 sessions with different settings, you can always create 2 Cluster instances: the Session is the heavy part in the driver, but the Cluster doesn't add all that much overhead on top of it, and having 2 Cluster instances is probably what you want if you really want to go fine-grained (because you will be able to tweak all settings, have separate metrics, etc.). But again, I suspect very very few users will need to go there.

--
Sylvain




Scott Lewis

Feb 6, 2014, 4:41:30 PM
to java-dri...@lists.datastax.com
On 2/5/2014 11:50 PM, Alex Popescu wrote:
<stuff deleted>

So we both agree about the added complexity.

Yes, although I would guess we might estimate the cost of that complexity somewhat differently...i.e. I don't think it means that much more complexity...and I don't think additional complexity is going to be much of a problem for anyone...given the config stage for cluster creation that already exists in the driver.


What are the advantages? Considering you can already tune these at the cluster level

Could you explain what you mean by 'tune these'?  I'm specifically referring to the number of threads running on the client for a given session.


and that having hundreds of keyspaces is not really a use case Cassandra is optimizing for,

My first question about this is:  why not?  This isn't directly related to this discussion though, so please don't feel obligated to answer.


why would we want a more complicated API and thread management?

In short: to give the client control (aka mgmt) of local resource usage (in this case threads).   There are presumably use cases (e.g. the Cassandra client is running in a webserver/application server) where resource usage/mgmt/control is very important.

And just from experience...I've seen a lot of APIs with thread pooling, and when the scaling needs change unexpectedly, and/or use cases expand, it's very common for the environment to need to manage those resources.

Scott

Scott Lewis

Feb 6, 2014, 4:52:31 PM
to java-dri...@lists.datastax.com
On 2/6/2014 2:01 AM, Sylvain Lebresne wrote:
> Probably more importantly, managing thread pool sizes per-Session
> wouldn't really solve the problem we're talking about. It's not like
> the default number of connections per host per Session is currently
> high, it's 2. You can globally set it to 1 if you want, but if you
> have 100+ sessions, you still have 100 connections per-host, which is
> clearly wasteful. Being able to tweak each session manually won't help
> much really.

I wasn't talking about the number of connections (although I can imagine
that there might be need...at some point...to be able to configure that
as well). I was referring specifically to number of threads/session.

>
> The fact is, if you have many keyspaces, you should use fully
> qualified keyspace name, not per-connection keyspace logging, because
> that forces you to have one connection per keyspace for each host.
> It's not specific to the java driver in any way, it's inherent to the
> concept of having per-connection default keyspace (and I don't mean
> that as a bad thing, logging to a keyspace is convenient for
> interactive sessions for instance, or if you have a handful of
> keyspaces, which is by and large the norm).

Yes...the norm...now.

>
> As for whether per-Session thread pool size configuration would be
> useful in general (outside this "lots of keyspace" problem), let's say
> that I'm not convinced it's worth the complexity because I don't think
> it really buys much (if anything really).

Ok...but your judgment of 'worth' might be different from...say...the
New York Times' website admin's :).

> And if you really really want to have 2 sessions with different
> settings, you can always create 2 Cluster instances: the Session is the
> heavy part in the driver, but the Cluster doesn't add all that much
> overhead on top of it, and having 2 Cluster instances is probably what
> you want if you really want to go fine-grained (because you will be
> able to tweak all settings, have separate metrics, etc.). But again,
> I suspect very very few users will need to go there.

I don't understand what you are saying here. Is there a code example
that does this? What would be the effect on the per-Session thread
pool size...of creating 2 Cluster instances? And returning to Alex's
point: wouldn't this be fairly complicated in terms of extra
config complexity?

Scott

Sylvain Lebresne

Feb 7, 2014, 4:13:54 AM
to java-dri...@lists.datastax.com
Ok, it doesn't seem that what you're talking about is all that linked to the initial subject, and I think I'm slightly confused about what you are asking about/suggesting now. But since you're talking about threads, what I can tell you is that the number of sessions does not directly influence the number of threads, in the sense that there are no per-session dedicated threads. There are a number of Netty worker threads, and the driver has a few internal ExecutorServices so it can do its work (creating connections and host pools, attempting reconnections, ...) without blocking other stuff, but all of those are shared by all sessions and don't grow with more sessions.

--
Sylvain 





Scott Lewis

Feb 7, 2014, 11:35:10 AM
to java-dri...@lists.datastax.com
On 2/7/2014 1:13 AM, Sylvain Lebresne wrote:
Ok, it doesn't seem that what you're talking about is all that linked to the initial subject and I think I'm slightly confused on what you are asking about/suggesting now. But since you're talking about threads,

Yes.


what I can tell you is that the number of sessions does not directly influence the number of threads, in the sense that there are no per-session dedicated threads. There are a number of Netty worker threads, and the driver has a few internal ExecutorServices so it can do its work (creating connections and host pools, attempting reconnections, ...) without blocking other stuff, but all of those are shared by all sessions and don't grow with more sessions.

Here's what this thread was started with:

Hello,


We have about 100 keyspaces with pretty much same Schema

I created 100 sessions (one session per keyspace); and they are alive till the life of App.

...stuff deleted...

4. I see 100s of worker threads (Thread [Cassandra Java Driver worker-38])   created and destroyed ..

...stuff deleted....

[Scott]  And my original question was whether configuration could be added (or already exists) to reduce/manage/control the '100s of worker threads' associated with 100 sessions.   Perhaps this can be done right now with netty configuration (since it seems netty is being used)...but that's what I don't know.

Scott


SMA Tester

Feb 11, 2014, 9:10:23 AM
to java-dri...@lists.datastax.com
Scott,

It would be better to use all this time to rewrite the code for convenience, especially the end user's convenience:
- Handle re-preparing queries seamlessly instead of complaining with WARNs
- Execute batches in the order of add(); get rid of the query timestamp confusion
- A lightweight Session per keyspace, sharing connection pooling at the Cluster level rather than the session level
- Use modifiable collections

and a lot more that I think a good driver should have. Work around Cassandra's silly CQL clauses and other fundamentals that are discouraged.
This could have been an amazing driver if it had focused on user friendliness.


SMA Tester

Feb 11, 2014, 10:54:21 AM
to java-dri...@lists.datastax.com
It turns out the main culprits for "too many files open", or the socket leak, are:
- unlimited creation of threads
- too many hung sockets that are closing but not disconnecting; see http://stackoverflow.com/questions/9114311/httpurlconnection-leaking-sockets-with-cant-identify-protocol-error-message
We are seeing 100s of these; I am looking into the code, but I could reproduce it with pure driver client code.

Sylvain Lebresne

Feb 11, 2014, 12:34:26 PM
to java-dri...@lists.datastax.com
On Fri, Feb 7, 2014 at 5:35 PM, Scott Lewis <scott...@gmail.com> wrote:

I created 100 sessions (one session per keyspace); and they are alive till the life of App.

...stuff deleted...

4. I see 100s of worker threads (Thread [Cassandra Java Driver worker-38])   created and destroyed ..

...stuff deleted....

[Scott]  And my original question was whether configuration could be added (or already exists) to reduce/manage/control the '100s of worker threads' associated with 100 sessions.   Perhaps this can be done right now with netty configuration (since it seems netty is being used)...but that's what I don't know.

I see.  As I said, that part is not really a problem. Those 'Java Driver worker' threads are not the Netty workers actually, but they are still not per-session. They are just threads created through an ExecutorService that the Cluster instance uses to do work it doesn't want to block on. Such work includes creating per-host connection pools when a new host is added, or when a Session is created. So really, what those messages mean is that 100 tasks are created, one for each session. Now, it happens that in versions before 1.0.5/2.0.0-rc1 the driver was not limiting the number of threads the ExecutorService could create, so if 100 sessions are created basically simultaneously, that thread pool would create on the order of 100 threads to handle the load of 'connection pool creation' tasks, and once this is done would destroy those threads because they are no longer useful. We fixed that in recent versions of the driver so that no more than 'the number of processors' threads are created. And arguably we could expose configuration to fine-tune that ExecutorService, but in any case, this is not directly associated with the number of Sessions, and fine-tuning that is honestly at best a minor concern (all you might hope to gain by fine-tuning is saving a few milliseconds for the creation of your 100 sessions, which sounds like more effort than it is worth).
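
The cap described above (no more threads than processors, with extra tasks waiting in a queue) can be illustrated with a plain java.util.concurrent sketch. This is not the driver's actual code: BoundedWorkers and peakThreads are hypothetical names, the 30-second keep-alive is an arbitrary choice, and the tasks are empty stand-ins for "create the connection pools for one session".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: shows how a bounded pool handles 100 simultaneous tasks.
final class BoundedWorkers {

    /** Runs `tasks` short jobs on a pool capped at `maxThreads`; returns the peak thread count. */
    static int peakThreads(int tasks, int maxThreads) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                maxThreads, maxThreads,                // core == max: the pool never exceeds maxThreads
                30, TimeUnit.SECONDS,                  // arbitrary idle timeout
                new LinkedBlockingQueue<Runnable>());  // extra tasks wait here instead of spawning threads
        executor.allowCoreThreadTimeOut(true);         // idle threads are eventually destroyed
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(done::countDown);     // stand-in for a pool-creation task
            }
            done.await();
            return executor.getLargestPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        int peak = peakThreads(100, processors);
        System.out.println(peak <= processors); // true
    }
}
```

An unbounded pool would instead create a thread for every task that arrives while the others are busy, which matches the burst of worker threads reported in the original question.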

--
Sylvain

Scott Lewis

Feb 11, 2014, 1:04:03 PM
to java-dri...@lists.datastax.com
Hi Sylvain,


On 2/11/2014 9:34 AM, Sylvain Lebresne wrote:
On Fri, Feb 7, 2014 at 5:35 PM, Scott Lewis <scott...@gmail.com> wrote:

I created 100 sessions (one session per keyspace); and they are alive till the life of App.

...stuff deleted...

4. I see 100s of worker threads (Thread [Cassandra Java Driver worker-38])   created and destroyed ..

...stuff deleted....

[Scott]  And my original question was whether configuration could be added (or already exists) to reduce/manage/control the '100s of worker threads' associated with 100 sessions.   Perhaps this can be done right now with netty configuration (since it seems netty is being used)...but that's what I don't know.

I see.  As I said, that part is not really a problem. Those 'Java Driver worker' threads are not the Netty workers actually, but they are still not per-session. They are just threads created through an ExecutorService that the Cluster instance uses to do work it doesn't want to block on. Such work includes creating per-host connection pools when a new host is added, or when a Session is created. So really, what those messages mean is that 100 tasks are created, one for each session. Now, it happens that in versions before 1.0.5/2.0.0-rc1 the driver was not limiting the number of threads the ExecutorService could create, so if 100 sessions are created basically simultaneously, that thread pool would create on the order of 100 threads to handle the load of 'connection pool creation' tasks, and once this is done would destroy those threads because they are no longer useful. We fixed that in recent versions of the driver so that no more than 'the number of processors' threads are created.

Thanks for the explanation.


And arguably we could expose configuration to fine-tune that ExecutorService, but in any case, this is not directly associated with the number of Sessions, and fine-tuning that is honestly at best a minor concern (all you might hope to gain by fine-tuning is saving a few milliseconds for the creation of your 100 sessions, which sounds like more effort than it is worth).

The benefit of creating/exposing such configuration is that people (i.e. consumers of the driver...with possibly unanticipated use cases) might disagree with your assessment of it being a 'minor concern'.

Yes...configuration complexity is slightly increased...although honestly I happen to think that's a 'minor concern'...but the value is in the flexibility for consumers.

Scott

Sylvain Lebresne

Feb 11, 2014, 1:16:11 PM
to java-dri...@lists.datastax.com

And arguably we could expose configuration to fine-tune that ExecutorService, but in any case, this is not directly associated with the number of Sessions, and fine-tuning that is honestly at best a minor concern (all you might hope to gain by fine-tuning is saving a few milliseconds for the creation of your 100 sessions, which sounds like more effort than it is worth).

The benefit of creating/exposing such configuration is that people (i.e. consumers of the driver...with possibly unanticipated use cases) might disagree with your assessment of it being a 'minor concern'.

Yes...configuration complexity is slightly increased...although honestly I happen to think that's a 'minor concern'...but the value is in the flexibility for consumers.

In case that wasn't clear, I actually agree that it's worth exposing (the earlier remarks were about adding a per-session configuration, because we didn't understand what you were talking about exactly, but that one is perfectly reasonable). I was just explaining that it's basically irrelevant to the overall 'lots of Sessions' problem that started this thread. And it's not yet exposed because I didn't take the time initially and nobody has reminded me to do it so far. So thanks for doing it now (a JIRA ticket would be even better so I don't forget once more).

--
Sylvain

Scott Lewis

Feb 11, 2014, 1:28:39 PM
to java-dri...@lists.datastax.com
Hi Sylvain,

On 2/11/2014 10:16 AM, Sylvain Lebresne wrote:
> <stuff deleted>

> reminded me to do it so far. So thanks for doing it now (a JIRA ticket
> would be even better so I don't forget once more).

Yes indeed. Here's a new feature bug:

https://datastax-oss.atlassian.net/browse/JAVA-262

Thanks,

Scott


Giovanni Botta

Jun 26, 2014, 10:13:22 AM
to java-dri...@lists.datastax.com
It seems you've answered your own question. You can indeed use a single Session
through fully qualified keyspace names in the query. And there is no wall to be
hit by doing this. You can also use the QueryBuilder instead of doing
manual string concatenation.

I think this is the best option. I think it's very misleading that the javadoc specifically mentions that you can only use one session per keyspace (http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html). Using one session for multiple keyspaces is possible and should be better documented.

Alex Popescu

Jun 26, 2014, 12:31:33 PM
to java-dri...@lists.datastax.com



Giovanni Botta

Jun 26, 2014, 2:52:53 PM
to java-dri...@lists.datastax.com
I was talking about the javadoc, but thanks for the article!

Giovanni