Caching Layer Issues


aakas...@natero.com

Jun 10, 2015, 12:35:40 AM
to jooq...@googlegroups.com
So, I'm working on using jOOQ to create a caching layer over Postgres. I've been using the MockConnection/MockDataProvider objects to intercept every query, and this is working, but I'm having a few issues.

First, how do I distinguish between reads and writes? That is, how do I tell whether a query is an insert/update/etc. or a select, given only the MockExecuteContext that's passed into the execute method in MockDataProvider?

And I'm a bit confused about how I can do invalidations. The basic scheme I'm implementing right now is that whenever a "write" query is made to a table, I invalidate all cached queries that involve that table. This goes back to my first question, on telling different types of queries apart, but also brings up another issue: how would I identify the tables used in a query given only the SQL string and the bindings (both are attributes of MockExecuteContext)?

Also, is this a correct approach to caching? My first thought was to override the fetch() method, but that method is final, and I'd rather not change something already embedded in jOOQ itself. This is the only other way I could think of to intercept all requests so I could create a separate, persistent caching layer.

I have seen this question, but I'm still not clear on how Lukas recommended identifying tables from the query object. I can try to implement Postgres NOTIFY, but I wanted something native to jOOQ first. I've seen this issue pop up a lot too, but I'm not sure how it applies.

Keep in mind that I'm new to jOOQ, so it's quite possible that I'm missing something obvious.

Thanks!

Lukas Eder

Jun 11, 2015, 3:14:21 AM
to jooq...@googlegroups.com
Hello

(for the record, this discussion is held also on Stack Overflow: http://stackoverflow.com/q/30762177/521799)

I suspect you followed up on our blog post example that we've given here:

We've explicitly left out the topic of cache invalidation, because that's obviously the hardest part of caching in general.

The nature of the MockDataProvider is very low-level. We designed it to also work with non-jOOQ JDBC statements, so you don't have access to jOOQ types like Configuration from within a MockDataProvider. The design is mostly suitable for caching master data that hardly ever changes, or that can be invalidated on a scheduled basis, rather than on an event basis.

If you really want to implement a fine-grained per-table cache on top of a MockConnection that invalidates every time the underlying table is modified, then you will probably need to implement some tricks using ThreadLocal in order to communicate your table structures to your cache engine.

Another option would be to write a VisitListener / ExecuteListener in addition to the MockDataProvider. The VisitListener could be used to extract tables from a query, whereas the ExecuteListener would activate your MockConnection if needed.

In both cases, NOTIFY is probably the best way to trigger cache invalidations directly from the database, e.g. via triggers on INSERT, UPDATE, DELETE.

Regarding your questions:

2015-06-10 2:44 GMT+02:00 <aakas...@natero.com>:
First, how do I distinguish between reads and writes?

Within a MockDataProvider, you could resort to string manipulation, e.g. regexes. jOOQ internally tends to use PreparedStatement.execute(), rather than executeQuery() or executeUpdate(), in order to materialise multiple ResultSets and update counts, so on the JDBC level, you can't really be sure.
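For instance, a string-based classification might look like the following sketch. The keyword list and helper names are purely illustrative (not jOOQ API), and real-world SQL with CTEs, comments, or batches would need more care:

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch: classify a SQL string as a read or a write by its
// leading keyword. Class and method names are assumptions, not jOOQ API.
class QueryKind {

    private static final Pattern WRITE = Pattern.compile(
        "^\\s*(insert|update|delete|merge|truncate)\\b",
        Pattern.CASE_INSENSITIVE);

    // true if the statement starts with a write keyword
    public static boolean isWrite(String sql) {
        return WRITE.matcher(sql).find();
    }

    // true if the statement is a plain top-level SELECT
    public static boolean isRead(String sql) {
        return sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
    }
}
```

Inside MockDataProvider.execute(), you would then call something like QueryKind.isWrite(ctx.sql()) before deciding whether to invalidate.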

If you're using VisitListeners / ExecuteListeners, then you can detect reads / writes more easily.
 
Also, is this a correct approach to caching?

Well, you're writing a very sophisticated cache, and if you execute it well, we'll be very happy to merge that into jOOQ :)
The question is really: Do you need such a low-level, generic, sophisticated cache, or would a simple service-level cache suffice?
 
My first thought was to override the fetch() method, but that method is final, and I'd rather not change something already embedded in jOOQ itself

Overriding API might seem like a quick win, but we've evolved this API and its SPIs for a while now, so in the long run, you always want to inject behaviour via SPIs (and request missing features from us).
 
Keep in mind that I'm new to jOOQ, so it's quite possible that I'm missing something obvious.

Well, you won't be too new to jOOQ after implementing this, that's for sure. A great way to learn about jOOQ's internals! :)

Hope this helps for starters.
I'm very happy to answer any further, specific questions that you may have.

Cheers
Lukas

aakas...@natero.com

Jun 11, 2015, 2:30:49 PM
to jooq...@googlegroups.com
the ExecuteListener would activate your MockConnection if needed.

How would this work? A MockConnection is registered when the DSLContext object is created, so how would you activate it through a listener?

The VisitListener could be used to extract tables from a query

I was thinking about this. But a VisitListener can't distinguish between reads and writes; an ExecuteListener can, but it can't extract tables. How could I have them communicate? 

Anyways, thanks for the help! I'll continue thinking about your suggestions and working on this. 

Lukas Eder

Jun 12, 2015, 3:55:41 AM
to jooq...@googlegroups.com
Hello,

2015-06-11 20:30 GMT+02:00 <aakas...@natero.com>:
the ExecuteListener would activate your MockConnection if needed.

How would this work? A MockConnection is registered when the DSLContext object is created, so how would you activate it through a listener?

There is no direct link between the two, but you can implement your own "protocol" for them to communicate. A very simple version using ThreadLocal:

VisitListener:

    if (somecondition)
        Global.threadLocal.set(true);

MockConnection:

    if (Boolean.TRUE.equals(Global.threadLocal.get()))
        // Do something

That's of course not very robust. You might find a much better way for the two to communicate in your implementation.
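For completeness, a self-contained version of that handshake might look like this. The Global class and its method names are just placeholders, not jOOQ API:

```java
// A minimal "Global" holder for the ThreadLocal handshake sketched above.
// The listener sets a flag while rendering a query; the MockConnection side
// reads and clears it on the same thread. All names are illustrative.
class Global {

    public static final ThreadLocal<Boolean> threadLocal = new ThreadLocal<>();

    // Called from the VisitListener when the condition of interest is seen
    public static void signal() {
        threadLocal.set(true);
    }

    // Called from the MockConnection side; reads and resets the flag
    public static boolean consumeSignal() {
        boolean set = Boolean.TRUE.equals(threadLocal.get());
        threadLocal.remove();
        return set;
    }
}
```

Resetting the flag on read avoids a stale signal leaking into the next statement executed on the same thread.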

The VisitListener could be used to extract tables from a query

I was thinking about this. But a VisitListener can't distinguish between reads and writes; an ExecuteListener can, but it can't extract tables. How could I have them communicate? 

The VisitListener can also distinguish between reads and writes, although not as easily as the ExecuteListener. You can always look into VisitContext.clauses() to see what the top-level statement is.
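For illustration, that clause-based decision can be factored into a pure helper operating on clause names. The names mirror org.jooq.Clause constants (SELECT, INSERT, UPDATE, DELETE, MERGE), but the helper itself is a sketch, not jOOQ API:

```java
import java.util.List;
import java.util.Set;

// Sketch: decide read vs. write from the clause stack that a VisitListener
// sees via VisitContext.clauses(). The first element corresponds to the
// top-level statement's clause.
class ClauseKind {

    private static final Set<String> WRITES =
        Set.of("INSERT", "UPDATE", "DELETE", "MERGE");

    public static boolean isTopLevelWrite(List<String> clauses) {
        return !clauses.isEmpty() && WRITES.contains(clauses.get(0));
    }
}
```

In a real VisitListener you would map the Clause[] array to names (or compare enum constants directly) before applying this check.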

aakas...@natero.com

Jun 15, 2015, 7:10:01 PM
to jooq...@googlegroups.com
Hey Lukas,

So I have a basic way to extract tables from the queries our application uses through some regex in MockDataProvider, but I wanted to try using the VisitListener to do this instead, as that seems a more extensible way of implementing this. I was messing around with the VisitListener, and I had a few questions on an example I've been running:

This is the original query: insert into "org2"."accounts" ("name") values ('acct4'), ('acct5')

And this is some of the output I've printed:

-----------------------------------------

BEGIN CLAUSE LISTING:


INSERT

INSERT_INSERT_INTO

TABLE

TABLE_REFERENCE


END LISTING


QUERYPART LISTING: 

insert into "org2"."accounts" ("name") values ('acct4'), ('acct5')

"org2"."accounts"

"org2"."accounts"

END QUERYPART LISTING

-----------------------------------------

-----------------------------------------

BEGIN CLAUSE LISTING:


INSERT

INSERT_INSERT_INTO

TABLE

TABLE_REFERENCE


END LISTING


QUERYPART LISTING: 

insert into "org2"."accounts" ("name") values ('acct4'), ('acct5')

"org2"."accounts"

END QUERYPART LISTING

-----------------------------------------

-----------------------------------------

BEGIN CLAUSE LISTING:


INSERT

INSERT_INSERT_INTO

FIELD


END LISTING


QUERYPART LISTING: 

insert into "org2"."accounts" ("name") values ('acct4'), ('acct5')

"name"

"name"

END QUERYPART LISTING

-----------------------------------------

-----------------------------------------

BEGIN CLAUSE LISTING:


INSERT

INSERT_INSERT_INTO

FIELD


END LISTING


QUERYPART LISTING: 

insert into "org2"."accounts" ("name") values ('acct4'), ('acct5')

"name"

END QUERYPART LISTING

-----------------------------------------


The first listing is the result of VisitContext.clauses(), and the second is the result of VisitContext.queryParts(). This isn't all of the output; just the first few iterations. Can you give me a general overview of what's happening here? I'm really not understanding.


Thank you!

aakas...@natero.com

Jun 15, 2015, 7:26:51 PM
to jooq...@googlegroups.com
Also, this output is from repeated visitEnd() calls.

Lukas Eder

Jun 16, 2015, 9:26:21 AM
to jooq...@googlegroups.com
Hello,

Just to be sure, you removed a lot of query parts in your listing, right? I only see the tables, not the bind variables, for instance...

What's going on here:

clauseStart() / clauseEnd() events indicate that a certain "Clause" is starting / ending. A "Clause" is an artificial marker in jOOQ's AST transformation that is used to help identify individual SQL AST elements that are not directly represented by a QueryPart. For instance, the whole SELECT statement is a QueryPart, but the SELECT statement has a SELECT clause, a FROM clause, a WHERE clause, a GROUP BY clause, etc. If you could only listen to "QueryPart" events, it would be hard to distinguish predicates / conditions of the WHERE clause from those of the HAVING clause, for instance.

In jOOQ 4.0, we might replace these clauses by actual query parts that model the respective clauses, but we're not sure about the exact design for this, yet.

visitStart() / visitEnd() events simply indicate that a given QueryPart is about to be rendered to the SQL output (or its bind variables extracted).

A high-level outline of a cache invalidation algorithm via VisitListener would include:

- Listen to visitStart() events that involve org.jooq.Table
- Check to see if you're "inside" of the INSERT_INSERT_INTO clause (as opposed to, perhaps, a table that is generated for a subquery inside of the INSERT_VALUES or INSERT_SELECT clause)
- If applicable, invalidate your cache through your own signalling (see suggestions from my previous E-Mail: e.g. via ThreadLocal)
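A sketch of that scope check, factored into a pure helper over the clause stack (clause names mirror org.jooq.Clause constants; the helper itself is illustrative, not jOOQ API):

```java
import java.util.List;

// Sketch of the scope check from the outline above: a table reference should
// only trigger invalidation when it is the INSERT target, i.e. when
// INSERT_INSERT_INTO is the nearest enclosing statement clause and no
// subquery (SELECT) has opened since.
class InsertTargetCheck {

    public static boolean isInsertTarget(List<String> clauses) {
        // Walk the clause stack from innermost to outermost
        for (int i = clauses.size() - 1; i >= 0; i--) {
            String c = clauses.get(i);
            if ("INSERT_INSERT_INTO".equals(c))
                return true;   // reached the INTO clause first: target table
            if ("SELECT".equals(c))
                return false;  // inside a subquery: not the INTO target
        }
        return false;
    }
}
```

For the INSERT shown earlier, the stack at the table reference is roughly [INSERT, INSERT_INSERT_INTO, TABLE, TABLE_REFERENCE], so the check passes; a table inside an INSERT .. SELECT subquery would hit SELECT first and be skipped.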

I hope this helps,
Cheers
Lukas

2015-06-16 1:26 GMT+02:00 <aakas...@natero.com>:
Also, this output is from repeated visitEnd() calls.


aakas...@natero.com

Jun 17, 2015, 4:56:17 PM
to jooq...@googlegroups.com
Hey,

I do have invalidation set up (the cache uses Redis, so the ThreadLocal trick didn't end up being necessary), and this did end up working. Your method works like a charm for INSERT/UPDATE/DELETE/MERGE queries, making invalidation by table easy. Right now I'm trying to make it work similarly with arbitrarily nested SELECT queries; I can already grab tables from those using a recursive string-parsing method, but this would be cleaner. Otherwise, my current implementation, using a MockConnection/MockConnectionProvider/MockDataProvider, an ExecuteListener and VisitListener, and Jedis (the Java client for Redis), is fully functional with table-based invalidation and result storage, and works on all types of queries. Thanks for all the help!

Do you have any suggestions on getting the tables of nested select queries (and from joins in those queries) through the VisitListener? Or should I just stick to my string parsing methods?

Lukas Eder

Jun 18, 2015, 2:41:33 AM
to jooq...@googlegroups.com
2015-06-17 22:56 GMT+02:00 <aakas...@natero.com>:
Hey,

I do have invalidation set up (the cache uses Redis, so the ThreadLocal trick didn't end up being necessary), and this did end up working. Your method works like a charm for INSERT/UPDATE/DELETE/MERGE queries, making invalidation by table easy. Right now I'm trying to make it work similarly with arbitrarily nested SELECT queries; I can already grab tables from those using a recursive string-parsing method, but this would be cleaner. Otherwise, my current implementation, using a MockConnection/MockConnectionProvider/MockDataProvider, an ExecuteListener and VisitListener, and Jedis (the Java client for Redis), is fully functional with table-based invalidation and result storage, and works on all types of queries. Thanks for all the help!

Sounds excellent! :)
Do you plan on publishing this work? E.g. via GitHub, or via a blog post? I think it would be very useful for the community.

Do you have any suggestions on getting the tables of nested select queries (and from joins in those queries) through the VisitListener? Or should I just stick to my string parsing methods?

As promised, we've blogged about a similar use-case using a VisitListener to implement row-level security here:

This implementation adds predicates to queries (and subqueries) every time a given table is encountered, in order to protect access to it. Perhaps that's similar to what you've had in mind?

Or what is the use-case of "getting the tables of nested select queries" in your case? Could you provide an example, please?

Cheers,
Lukas

aakas...@natero.com

Jun 18, 2015, 5:46:14 PM
to jooq...@googlegroups.com
Never mind, I ended up finding a solution through the VisitListener interface. The use case was that I wanted to create Redis lists of queries that involve the same table, so that when a table is written to, I can easily invalidate it by fetching the list of all cached queries that involve that table and deleting them from Redis. This ended up being a lot easier than I thought, and it works as expected now! 
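A simplified, in-memory version of that scheme, with plain maps standing in for the Redis lists and keys (in the real implementation these would be Jedis calls; everything here is an illustrative stand-in, not the production code):

```java
import java.util.*;

// In-memory sketch of the invalidation scheme described above: each table
// maps to the set of cached query keys that touch it; a write against a
// table deletes exactly those cache entries.
class TableInvalidatingCache {

    private final Map<String, String> cache = new HashMap<>();           // query -> result
    private final Map<String, Set<String>> tableIndex = new HashMap<>(); // table -> queries

    public void put(String query, String result, Collection<String> tables) {
        cache.put(query, result);
        for (String t : tables)
            tableIndex.computeIfAbsent(t, k -> new HashSet<>()).add(query);
    }

    public String get(String query) {
        return cache.get(query);
    }

    // Called when a write against 'table' is detected
    public void invalidate(String table) {
        Set<String> queries = tableIndex.remove(table);
        if (queries != null)
            for (String q : queries)
                cache.remove(q);
    }
}
```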

Sure, I can make a blog post about this. I can't put it on GitHub, because this is a project for the company I'm interning at, so the code isn't something I can share, but I can write a post about it, involving a simplified version of my work. 

Lukas Eder

Jun 19, 2015, 2:29:11 AM
to jooq...@googlegroups.com
2015-06-18 23:46 GMT+02:00 <aakas...@natero.com>:
Never mind, I ended up finding a solution through the VisitListener interface. The use case was that I wanted to create Redis lists of queries that involve the same table, so that when a table is written to, I can easily invalidate it by fetching the list of all cached queries that involve that table and deleting them from Redis. This ended up being a lot easier than I thought, and it works as expected now! 

Very glad it worked out!
 
Sure, I can make a blog post about this. I can't put it on GitHub, because this is a project for the company I'm interning at, so the code isn't something I can share, but I can write a post about it, involving a simplified version of my work.

That would be excellent. Let me know if you need any further help.

Cheers,
Lukas

aakas...@natero.com

Jun 23, 2015, 6:25:04 PM
to jooq...@googlegroups.com
Hey,

So the cache is working now, and it will soon be deployed into our production environment. I'm working on the logical next step, which is to change a lot of our queries to improve cache performance, doing things like reducing the granularity of timestamps (rounding to the nearest hour rather than the current millisecond when we can, for example) so queries can be reused. A major problem with this is that we currently use a lot of jOOQ's automatically generated aliases (like alias_393222), which cause unnecessary cache misses. 

We have a lot of queries already written, so I'd like to avoid going through and manually attaching an .as() call to each one. Is there anything else I can do to resolve this issue? Maybe edit the query through the VisitListener again, or something?

Thanks,

Aakash

aakas...@natero.com

Jul 11, 2015, 8:02:56 PM
to jooq...@googlegroups.com
Hey,

So I ended up making that blog post; you can see it here: http://aakashjapi.com/caching-with-jooq-and-redis/ . Tell me what you think! Also, ignore that earlier reply I made to this (that was sent for review); I accidentally used my other email. 

aakas...@berkeley.edu

Jul 13, 2015, 11:36:44 AM
to jooq...@googlegroups.com
Hey Lukas,

So I did end up writing that blog post. You can see it here: http://aakashjapi.com/caching-with-jooq-and-redis/ . Tell me what you think!

Lukas Eder

Jul 13, 2015, 1:34:15 PM
to jooq...@googlegroups.com
Hi Aakash,

I'm sorry for the delay - I had been on vacation, and our team hadn't picked up all the open conversations on the user group.

That's an interesting point. SQL doesn't really have a useful syntax for this, although it would be nice to be able to rename the batch of SELECT fields in one go, similar to common table expressions, but without the execution plan implications that CTEs often have (such as materialisation).

The generated aliases use hash codes. Future jOOQ versions might assign indexes instead, but in the past, we've found indexes to be dangerous in the context of potential VisitListeners, plain SQL or other means of interaction...

I'd like to understand more about this use-case. How does it apply, exactly?

Lukas Eder

Jul 13, 2015, 1:35:20 PM
to jooq...@googlegroups.com
2015-07-12 2:02 GMT+02:00 <aakas...@natero.com>:
Hey,

So I ended up making that blog post; you can see it here: http://aakashjapi.com/caching-with-jooq-and-redis/ . Tell me what you think! Also, ignore that earlier reply I made to this (that was sent for review); I accidentally used my other email.

Great news! I'll read through it in a bit and give you a reply by the end of the week!

Thanks a lot for providing all this insight. This is extremely useful for us, and for the community.

Cheers
Lukas

Lukas Eder

Jul 17, 2015, 3:26:29 AM
to jooq...@googlegroups.com
Hi Aakash,

First of all: I think this is the first post on your blog, right? Excellent first-time post, and we're very flattered that you chose to write about jOOQ.
I've already had a couple of friends point the post out to me in the meantime. I've also noticed that you've posted it to /r/programming, which is a difficult environment. Have you considered /r/java as well? It's a challenging topic either way; people might not read to the end before giving feedback...

Now for a bit of feedback from my side: 

Methodology:

I really like the simplicity of the examples and how you build the article, always pointing out that "in case of this or that added / removed complexity, a different route might be chosen". I think a new user will understand the purpose of your MockDataProvider implementation. So, definitely great kudos for your writing style!

Performance:

Perhaps there are a couple of performance considerations to be made, as these SPI implementations heavily influence the overall performance of your application (e.g. visitEnd() is really called for each and every QueryPart in each and every query that you're running). As always with such optimisations, measuring is important to see if the effect is significant:

1. In your MockDataProvider implementation, I'd delay generating the inlined SQL until it is really needed. The regular expression can also match on the SQL statement containing bind variables, i.e. the one from ctx.sql().
2. In your visitStart() implementation, I suspect that you could abort execution early if the ctx.queryPart() instanceof Table check fails. This would save you from creating the Clause[] array and iterating through it.
3. The same is true for your visitEnd() implementation.

While such optimisations are usually premature in normal environments, they're not in SPI implementations. I've tried to document this in the following blog post:

It would certainly be interesting to point out to the reader that they should be extra wary about performance within the VisitListener implementation. In fact, I'll add remarks to the Javadoc right now.

Correctness:

An interesting bit to mention would be that your current visitEnd() implementation might not be entirely correct inside a derived table (although the current example probably works for your specific use-case). You'll still be within the SELECT_FROM clause, but a new subselect has "started", so you should reset the nesting scope: if you're in a subselect, you have to be inside the SELECT_FROM clause of that subselect, not only the one from the top-level select. Of course, that would be excessive for your examples, but it might be worth mentioning. Let me know if this makes sense.

Feedback:

What would be nice (but perhaps it's a little early for this) would be a section in the conclusion listing advantages and drawbacks of your approach. Perhaps there are performance implications; perhaps there were bugs or caveats in jOOQ or in Redis, etc. You may be critical, of course :-)



In any case, thanks again very much for this very useful article. We've linked it from jooq.org/community (you're lucky, the list is sorted alphabetically) and will keep posting it from our @JavaOOQ Twitter account a couple of times. And of course, if you are interested in writing a follow-up post, I'll be very happy to assist you!

Cheers,
Lukas

Aakash Japi

Jul 18, 2015, 9:05:13 PM
to jooq...@googlegroups.com
Hey Lukas,

About the correctness issue: currently, in my production code, I have a check that makes sure that if something inside SELECT_FROM is an instance of a Table, it doesn't also contain the word "select" or "join". This ensures that the only things extracted are actual table names (assuming, of course, that none of our table/column names contain "select" or "join", which currently they don't). I think this does what you were talking about: resetting the scope if the current QueryPart is a nested query. I also thought it was a bit excessive to add this check to the post's example, but I can definitely add a note about it.

Your performance improvements are noted. I'll go try them out this week when I get back to work.

As for the feedback, we're still in the process of fully deploying this system right now, but I definitely can make a follow-up post about how we deployed this architecture onto a production system, how it handled load, how much it improved performance, etc. I think that would be a good thing to see for anyone wanting to do something similar. 

Anyways, thanks for all the help!

From,
Aakash





she...@sminq.com

Sep 29, 2015, 5:26:18 AM
to jOOQ User Group
Hi Aakash, Lukas

We have recently deployed our app using the same caching technique (great article, Aakash), and it's working quite well. We have now included the optimisations pointed out by Lukas and will check the performance after this update.

Regards,
Sheldon

Lukas Eder

Sep 29, 2015, 9:34:14 AM
to jooq...@googlegroups.com
Great, thanks Sheldon. Looking forward to hearing more about your experiments. Another blog post might be great, too! :-)

Cheers,
Lukas