Caching the PreparedStatement (Java driver)

590 views
Skip to first unread message

Ajay

unread,
Feb 27, 2015, 1:04:06 PM2/27/15
to user, java-dri...@lists.datastax.com
Hi,

We are building REST APIs for Cassandra using the Cassandra Java Driver.

So as per the below guidlines from the documentation, we are caching the Cluster instance (per cluster) and the Session instance (per keyspace) as they are multi thread safe.
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/fourSimpleRules.html

As the Cluster and Session instance(s) are cached in the application already and also as the PreparedStatement provide better performance, we thought to build the PreparedStatement for REST query implicitly (as REST calls are stateless) and cache the PreparedStatemen. Whenever a REST query is invoked, we look for a PreparedStatement in the cache and create and put it in the cache if it doesn't exists. (The cache is a in-memory fixed size LRU based).

Is a safe approach to cache PreparedStatement in the client side?.  Looking at the Java driver code, the Cluster class stores the PreparedStatements as a weak reference (to rebuild when a node is down or a  new node added).

Thanks
Ajay

Olivier Michallat

unread,
Feb 27, 2015, 7:35:30 PM2/27/15
to java-dri...@lists.datastax.com
Hi Ajay,

Yes, it is safe to hold a reference to PreparedStatement instances in your client code. If you always run the same pre-defined statements, you can store them as fields in your resource classes.

If your statements are dynamically generated (for example, inserting different subsets of the columns depending on what was provided in the REST payload), your caching approach is valid. When you evict a PreparedStatement from your cache, the driver will also remove the corresponding id from its internal cache. If you re-prepare it later it might still be in the Cassandra-side cache, but that is not a problem.

One caveat: you should be reasonably confident that your prepared statements will be reused. If your query strings are always different, preparing will bring no advantage.

--

Olivier Michallat

Driver & tools engineer, DataStax


To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-us...@lists.datastax.com.

Ajay

unread,
Feb 27, 2015, 10:47:56 PM2/27/15
to java-dri...@lists.datastax.com

Thanks Olivier.

Most of the REST query calls would come from other applications to write/read to/from Cassandra which means most queries from an application would be same (same column families but different  values).

Thanks
Ajay

Ajay

unread,
Feb 28, 2015, 7:26:00 AM2/28/15
to java-dri...@lists.datastax.com, user
Hi,

My earlier question was whether it is safe to cache PreparedStatement (using Java driver) in the client side for which I got it confirmed by Olivier.

Now the question is do we really need to cache the PreparedStatement in the client side?.

Lets take a scenario as below:

1) Client fires a REST query "SELECT * from Test where Pk = val1";
2) REST service prepares a statement "SELECT * from Test where Pk = ?"
3) Executes the PreparedStatement by setting the values.
4) Assume we don't cache the PreparedStatement
5) Client fires another REST query "SELECT * from Test where Pk = val2";
6) REST service prepares a statement "SELECT * from Test where Pk = ?"
7) Executes the PreparedStatement by setting the values.

In this case, is there any benefit of using the PreparedStatement?

    From the Java driver code, the Session.prepare(query) doesn't check whether a similar query was prepared earlier or not. It directly call the server passing the query. The return from the server is a PreparedId. Do the server maintains a cache of Prepared queries or it still perform the all the steps to prepare a query if the client calls to prepare the same query more than once (using the same Session and Cluster instance which I think doesn't matter)?.

Thanks
Ajay

Vishy Kasaravalli

unread,
Feb 28, 2015, 2:16:14 PM2/28/15
to java-dri...@lists.datastax.com, user
On Feb 28, 2015, at 4:25 AM, Ajay <ajay....@gmail.com> wrote:

Hi,

My earlier question was whether it is safe to cache PreparedStatement (using Java driver) in the client side for which I got it confirmed by Olivier.

Now the question is do we really need to cache the PreparedStatement in the client side?.

Lets take a scenario as below:

1) Client fires a REST query "SELECT * from Test where Pk = val1";
2) REST service prepares a statement "SELECT * from Test where Pk = ?"
3) Executes the PreparedStatement by setting the values.
4) Assume we don't cache the PreparedStatement
5) Client fires another REST query "SELECT * from Test where Pk = val2";
6) REST service prepares a statement "SELECT * from Test where Pk = ?"
7) Executes the PreparedStatement by setting the values.


You should avoid re-preparing the statement (step 6 above). When you create a prepared statement, a round trip to server is involved. So you should create it once and reuse it. You can bind it with different values and execute the bound statement each time. 

ja

unread,
May 15, 2015, 8:58:02 AM5/15/15
to java-dri...@lists.datastax.com, us...@cassandra.apache.org
Hi,

Isn't it a good to have feature for the java driver to maintain a cache of PreparedStatements (PS) . Any reason why it's left to the application to do the same? . I am currently implementing a cache of PS that is loaded at app startup, but how do i ensure this cache is always good to use? . Say, there's a restart on the Cassandra server side, this cache would be stale and I assume the next use of a PS from cache would fail. Any way to recover from this.

Thanks,
Joseph

Vishy Kasaravalli

unread,
May 15, 2015, 12:43:25 PM5/15/15
to java-dri...@lists.datastax.com, user

Answers in-ilne. 

On Feb 28, 2015, at 4:25 AM, Ajay <ajay....@gmail.com> wrote:

Hi,

My earlier question was whether it is safe to cache PreparedStatement (using Java driver) in the client side for which I got it confirmed by Olivier.

Now the question is do we really need to cache the PreparedStatement in the client side?.

Lets take a scenario as below:

1) Client fires a REST query "SELECT * from Test where Pk = val1";
2) REST service prepares a statement "SELECT * from Test where Pk = ?"
3) Executes the PreparedStatement by setting the values.
4) Assume we don't cache the PreparedStatement
5) Client fires another REST query "SELECT * from Test where Pk = val2";
6) REST service prepares a statement "SELECT * from Test where Pk = ?"
7) Executes the PreparedStatement by setting the values.

In this case, is there any benefit of using the PreparedStatement?

No. On the contrary, there is a disadvantage of extra round trip to server for each query in this case. 


    From the Java driver code, the Session.prepare(query) doesn't check whether a similar query was prepared earlier or not. It directly call the server passing the query. The return from the server is a PreparedId. Do the server maintains a cache of Prepared queries or it still perform the all the steps to prepare a query if the client calls to prepare the same query more than once (using the same Session and Cluster instance which I think doesn't matter)?.

Server maintains a Map of prepareId, CQLStatement. To prepare a query, all the that server does is take a hash of the query string which becomes prepareId and stick that in the Map. It does the same thing irrespective of same query prepared in the past or not. 

Today java-driver has no knowledge of how to map prepare statement to prepareId. Hence it can not do any optimization to cut off the extra round trip to server even when the same query is repeatedly prepared by the client. 

However if the java-driver  “understands” the mapping of prepare statement to prepareId, it can cut off the extra round trip by looking up its own cache. This also opens up number of other optimizations related to creation of prepared statements that will be super helpful when the number of prepared statements and the number of cassandra instances are large. 

There are number of optimizations possible in preparation of prepared statements. Follow along these if interested:

https://datastax-oss.atlassian.net/browse/JAVA-714 : Avoid needless round trips to server to create prepared statements during session init
https://datastax-oss.atlassian.net/browse/JAVA-658 : Optimize prepareAllQueries in onUp
https://datastax-oss.atlassian.net/browse/JAVA-725 : SessionManager should invoke PREPARE on all nodes in parallel

Ajay

unread,
May 15, 2015, 1:02:18 PM5/15/15
to java-dri...@lists.datastax.com, user
Hi Joseph,

Java driver currently caches the prepared statements but using a weak reference i.e the cache will hold it as long the client code uses it. So in turn means that we need to cache the same.

But I am also not sure of what happens when a cached prepared statement is executed after cassandra nodes restart. Does the server prepared statements cache is persisted or in memory?. If it is in memory, how do we handle stale prepared statement in the cache?

Thanks
Ajay

Ajay

unread,
May 15, 2015, 1:21:29 PM5/15/15
to java-dri...@lists.datastax.com
Thanks Vishy.

Does it mean that as part of these tickets, the java-driver will understand how to map prepareId and prepared statement and vice versa?. If that is the case, it could be nice to expose an API in the session to reload a prepared statement given a prepareId. As I understand prepared statement is required to create BoundStatement and to bind values for which the query metadata is required (like data type of each bind variables and so on).

Then the client could have a LRU cache for prepared statement (to avoid ever growing memory) and when a  prepared statement not found in the cache, it can be reloaded.

Thanks
Ajay

Joseph Anish Alex

unread,
May 16, 2015, 12:27:31 AM5/16/15
to java-dri...@lists.datastax.com
Thanks, Ajay. As you mentioned , if the driver PS cache is exposed to the client (via session), it would also help in re-initializing the PS if server nodes are restarted in addition to alleviating the need for maintaining a cache on client side. Currently, as i understand it, such exceptions would need to be handled by the client and statement will have to be re-prepared in such scenarios. 

-Joseph

Ajay

unread,
May 16, 2015, 12:55:59 AM5/16/15
to java-dri...@lists.datastax.com
Joseph,

As per Tyler, the client need not handle stale prepared statements as cassandra and java-driver handle it automatically behind the scene.

But I think it still make sense to have an API in the session to load the prepared statement given prepareId for scenarios as below:

1) When we have webservices using java-driver, the requests can be handled by different JVMs (different Tomcat instance in our case), the 1st JVM will prepare the statement and the subsequent requests might be handled by different JVM and it will not find it in its in memory cache and so have to prepare the statment again.

But I remember it is mentioned somewhere that prepared statement is tied upto the session which prepares it. So not sure whether it can be used from a different session connected to the same cluster.

2) Secondly as I mentioned earlier, it helps to have a LRU based cache for prepared statements and if not found in cache, can be reloaded.

Thanks
Ajay

Ajay

unread,
May 16, 2015, 12:57:36 AM5/16/15
to java-dri...@lists.datastax.com

Joseph,

As per Tyler, the client need not handle stale prepared statements as cassandra and java-driver handle it automatically behind the scene.

But I think it still make sense to have an API in the session to load the prepared statement given prepareId for scenarios as below:

1) When we have webservices using java-driver, the requests can be handled by different JVMs (different Tomcat instance in our case), the 1st JVM will prepare the statement and the subsequent requests might be handled by different JVM and it will not find it in its in memory cache and so have to prepare the statment again.

But I remember it is mentioned somewhere that prepared statement is tied upto the session which prepares it. So not sure whether it can be used from a different session connected to the same cluster.

2) Secondly as I mentioned earlier, it helps to have a LRU based cache for prepared statements and if not found in cache, can be reloaded.

Thanks
Ajay

Reply all
Reply to author
Forward
0 new messages