Experiences with jOOQ / join deduplication


Oskar Norrback

Nov 24, 2014, 6:33:51 AM
to jooq...@googlegroups.com
Hello guys!

We are a new startup and decided to try jOOQ instead of JPA in the backend of our first product and we haven’t regretted it so far. We wrote a blog post summarizing our experiences and thoughts with jOOQ here.

The biggest minus with using jOOQ so far has been the “join deduplication boilerplate” issue I mention in the blog post. 

Does my suggestion of adding some basic association support between the generated Record types make sense to anyone else? E.g. using my blog example: that a CustomerRecord can hold a list of OrderRecords?

The next step would then be that I could automagically convert a Result of a CUSTOMER.leftJoin(ORDER) query into CustomerRecords with the orders populated. Currently I have to write this logic myself, repeatedly in slight variations, and it feels like boilerplate that could perhaps come with the library. It looks like at least some of the Python ActiveRecord implementations support associations to some extent (SQLObject, SQLAlchemy). I'm not looking for automatic fetching of associations (and the can of worms that comes with it), just something that would simplify the step of converting JOIN results into Records with associations.
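To make the boilerplate concrete, here's roughly the kind of logic I keep rewriting, as a plain Java sketch. The Customer, Order, and JoinedRow types here are hypothetical stand-ins for the generated CustomerRecord/OrderRecord and the flat join result, just to show the grouping step:

```java
import java.util.*;

// Hypothetical stand-ins for jOOQ's generated CustomerRecord / OrderRecord
record Order(long id, long customerId, String item) {}
record Customer(long id, String name, List<Order> orders) {}

// One flat row of a CUSTOMER LEFT JOIN ORDER result; orderId is null
// for customers without any orders
record JoinedRow(long customerId, String customerName, Long orderId, String item) {}

class JoinDedup {
    // Group the denormalised rows back into customers with nested orders
    static List<Customer> dedupe(List<JoinedRow> rows) {
        Map<Long, Customer> byId = new LinkedHashMap<>();
        for (JoinedRow r : rows) {
            Customer c = byId.computeIfAbsent(r.customerId(),
                id -> new Customer(id, r.customerName(), new ArrayList<>()));
            if (r.orderId() != null) { // LEFT JOIN: customer may have no orders
                c.orders().add(new Order(r.orderId(), r.customerId(), r.item()));
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

It's not much code in isolation, but it has to be re-specialised for every join shape, which is exactly the repetition I'd like the library to absorb.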


Regards,

Oskar Norrback

Lukas Eder

Dec 2, 2014, 3:18:50 AM
to jooq...@googlegroups.com, Oskar Norrback
Hi Oskar,

I've left this discussion for others to answer, but it looks as though there hasn't been much feedback so far.

We are aware that the lack of "join deduplication" support is a real limitation of jOOQ (or of any SQL-oriented approach). As I told you before off-list, the main reason for this problem is that by using JOIN or LEFT JOIN, you are effectively denormalising your result set, bringing it into a tabular form that no longer contains the full information of your original join path. Automatic "deduplication" isn't really possible, even if it appears to be in very simple cases.

HQL and JPQL do not have this problem, because they are languages tailored precisely to navigating object graphs, and you have no control over the number of SQL queries that are effectively generated under the hood in order to materialise such an object graph. While this can be very convenient, their approach leads to many other missing parts, e.g. UNIONs, FULL OUTER JOINs, CTEs, derived tables, and many other features that can only be done with SQL. Those features, in turn, don't work well with second-level caching, which is needed to navigate and, more importantly, persist the object graph efficiently.

The way more SQL-oriented APIs usually work around this limitation is by providing synthetic join operations that are not really SQL joins. We've looked into this kind of solution in the past but haven't gotten around to implementing it. For instance, LLBLGen calls these things "prefetch paths".

The can of worms you refer to is probably inevitable. Consider the following section from the SQLAlchemy manual:

The Importance of Ordering

A query which makes use of subqueryload() in conjunction with a limiting modifier such as Query.first(), Query.limit(), or Query.offset() should always include Query.order_by() against unique column(s) such as the primary key, so that the additional queries emitted by subqueryload() include the same ordering as used by the parent query. Without it, there is a chance that the inner query could return the wrong rows.


As soon as a framework adds synthetic clauses to the SQL language, those clauses will have a huge impact on all of the language. This is true for jOOQ more than for any of the mentioned SQL APIs, as jOOQ really models all of SQL as a language.

Long story short, we have recognised this limitation and we're thinking about it and we also appreciate all feedback from the community with respect to possible solutions. But it is not at all an easy problem to solve thoroughly. As a library vendor, there's no such thing as a "not automatic but just something" solution, I'm afraid.

Having said that, if there is a quick win where we could add some convenience API, we'll do that, of course. We're constantly adding new methods that do similar things to existing ones, but with less code.

Regards,
Lukas


Lukas Eder

Dec 2, 2014, 5:14:11 AM
to jooq...@googlegroups.com, Oskar Norrback
I've browsed through the archives of discussions and reflections on this matter and I now remember this piece. The LEFT OUTER JOIN approach to denormalising relations prior to serialisation over the wire is actually completely wrong if you want to model queries to an object graph in Java. Consider the following Java "schema"

class Author {

    // The author name
    String name;

    // The books he's written
    Book[] books;

    // The films he's written
    Film[] films;
}

If you had LEFT OUTER JOINed both BOOK and FILM to the AUTHOR table, you'd wind up with a cartesian product BOOK × FILM, which has no meaning at all. LEFT OUTER JOIN seems useful at first sight when navigating down a single acyclic path of relationships. As soon as you encounter cycles or multiple paths, denormalisation prevents you from pursuing this approach any further.
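To see why, here is a plain Java sketch (hypothetical class, merely simulating the flat result set of such a double join): every book gets paired with every film, so an author with b books and f films produces b × f rows, none of which correspond to anything in the object graph:

```java
import java.util.*;

class CartesianJoinDemo {
    // Simulates AUTHOR LEFT JOIN BOOK LEFT JOIN FILM for a single author:
    // every book is paired with every film, so an author with b books and
    // f films produces b * f joined rows
    static List<String[]> joinRows(String author, List<String> books, List<String> films) {
        List<String[]> rows = new ArrayList<>();
        for (String b : books)
            for (String f : films)
                rows.add(new String[] { author, b, f });
        return rows;
    }
}
```

An author with 2 books and 3 films yields 6 rows, and there is no way to tell from those rows which book "belongs to" which film, because no such relationship ever existed.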

There are essentially two solutions to this problem:

1. Run multiple queries

If I'm not mistaken, that's what Hibernate does. You can easily run a first query for all the authors, and then run a second query to get all the books and a third one to get all the films:

SELECT * FROM author WHERE [ some predicate ]
SELECT * FROM book WHERE author_id IN (SELECT id FROM author WHERE [ some predicate ])
SELECT * FROM film WHERE author_id IN (SELECT id FROM author WHERE [ some predicate ])

The above is optimal. Less optimal would be to pass in an IN-predicate list of author IDs in both subsequent queries. Even worse would be N+1, i.e. fetch each book/film for each author one record at a time.
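The stitching step of this approach can be sketched in plain Java as follows (hypothetical helper; in-memory maps and entry lists stand in for the three result sets):

```java
import java.util.*;
import java.util.stream.*;

class MultiQueryFetch {
    record Author(long id, String name, List<String> books, List<String> films) {}

    // Stitch the results of the three separate queries back into an object graph.
    // authorRows: id -> name; bookRows / filmRows: (authorId, title) pairs.
    static List<Author> assemble(Map<Long, String> authorRows,
                                 List<Map.Entry<Long, String>> bookRows,
                                 List<Map.Entry<Long, String>> filmRows) {
        // Group child rows by author id
        Map<Long, List<String>> books = bookRows.stream()
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                     Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        Map<Long, List<String>> films = filmRows.stream()
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                     Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        // Attach the grouped children to each author
        return authorRows.entrySet().stream()
            .map(e -> new Author(e.getKey(), e.getValue(),
                     books.getOrDefault(e.getKey(), List.of()),
                     films.getOrDefault(e.getKey(), List.of())))
            .collect(Collectors.toList());
    }
}
```

Note that because BOOK and FILM are fetched in separate queries, there is no spurious cartesian product: each collection is grouped independently by author id.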

2. Do it the SQL way

The SQL standard knows the concept of collection types, such as arrays, sets, or multisets. These can be used to create nested sets, which would really be the best match for object-graph queries.

SELECT
    a.name,
    MULTISET(SELECT * FROM book WHERE author_id = a.id) books,
    MULTISET(SELECT * FROM film WHERE author_id = a.id) films
FROM
    author a
WHERE
    [ some predicate ]

Unfortunately, only a few databases implement these operators, e.g. Oracle or Informix. If they all did, we would never have gone through the hassle of implementing ORMs.