Please share your thoughts on how to replace JPA/Hibernate by jOOQ [Was: New to JOOQ - Loading and saving entities that contain children entities]


Lukas Eder

Jul 27, 2013, 11:58:52 AM
to jooq...@googlegroups.com, elect...@gmail.com
Thanks electrotype, for sharing this on the jOOQ User Group!
Myself, I won't respond to your request here for two reasons:

1. I have already given an answer to your SO question.
2. I am very interested in an "unbiased" feedback from the user group.

So please, everyone who is interested in this discussion: 

Feel free to post your opinion / share your experiences on this topic

Cheers
Lukas

2013/7/27 <elect...@gmail.com>
Hi! I'm new to JOOQ!

I'm looking for a solution to replace JPA/Hibernate as the persistence engine in my new Java web application. I'm looking for something closer to SQL and JOOQ seems to be a really good fit!

But I have concerns about how JOOQ can replace Hibernate in some use cases, in particular for loading and saving entities that contain children entities.

Lukas has already helped me on the question but I would like more details. Any help from the JOOQ community would be really appreciated!

Since I already asked my question on StackOverflow, I think the best is to link to it rather than to copy it here (and there is a 100 points bounty on it!) :

http://stackoverflow.com/questions/17860161/replacing-a-full-orm-jpa-hibernate-by-a-lighter-solution-recommended-pattern


Thanks a lot in advance for any input!

electrotype


--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

elect...@gmail.com

Jul 27, 2013, 12:07:18 PM
to jooq...@googlegroups.com, elect...@gmail.com
Thanks for that emphasis on my question, Lukas.

Your help and the fact that you are very open-minded about discussing the pros and cons of different solutions really makes JOOQ attractive to me!

Lukas Eder

Jul 28, 2013, 4:53:32 AM
to jooq...@googlegroups.com, elect...@gmail.com

2013/7/27 <elect...@gmail.com>

Thanks for that emphasis on my question, Lukas.

Your help and the fact that you are very open-minded about discussing the pros and cons of different solutions really makes JOOQ attractive to me!

Thanks for that feedback. Well, even if jOOQ solves a lot of problems, it's not a magic bullet either. Every mature idea has its merits. This includes JPA/Hibernate of course, which are excellent tools for their specific problem domain.

Cheers
Lukas

khunye...@gmail.com

Jul 28, 2013, 12:26:12 PM
to jooq...@googlegroups.com
A bit long, sorry.

This is just my experience (partially) replacing OpenJPA with JOOQ in a few projects that have around 250 tables. Not sure it is useful for anybody, but the system is being used right now in the wild, so it might count for something for some people.

I used Hibernate for a few years before switching over to OpenJPA (solely for its compile-time enhancement ability at that time). And for the last six months, my team has used JOOQ alongside OpenJPA. Basically, we use OpenJPA whenever we need to do any insert/update/delete, when it is easier to do, and for small queries where the query time is not important. Of course, the old code with OpenJPA stays the same until we need to:

1. Speed up the query time.
2. Modify any query that is in JPQL.

Our systems have around 250 tables, so replacing all OpenJPA code with JOOQ all at once is out of the question. And we do like the way JPA handles cascade deletes and insertions and refreshes. The easy way to get a transaction is good as well.

I never benchmarked the speed-up, but our JOOQ queries seem to be at least one order of magnitude faster than the OpenJPA JPQL queries they replaced. That is the reason I started using JOOQ when I first became aware of it (when we needed to get up to 1,000 rows from queries with many joined tables in under 2 seconds or so, and I was spending way too much time fiddling with FetchType.EAGER and FetchType.LAZY and not on the business logic). Our schemas are broadly in 3NF, so joining many tables together is very common for us, sometimes with many left outer joins as well.

But speed is not everything for us, or we would have used JDBC directly. The type-safety of JOOQ is very useful for us (except that we sometimes have 30 to 40 columns in a select. We can't always do a select() without specifying the columns, because we would run out of memory when we need to join, say, 20 tables at the same time. As usual, we found out that we can't always use limits and a first result row to restrict how many rows a query might return. Any solution other than specifying exactly which columns we want? We do re-use many queries, so we sometimes get runtime errors when we forget to specify a column in the select.).

Ever since we started using JOOQ, we refactor the database schema even more often as the system evolves, which is a huge advantage for us. JPQL was really that big an inhibitor for us to change anything in the schema at all. Adding tables, etc., is fine, but changing names and such is not fun with JPQL queries.

Can't emphasize enough how useful syntax errors are whenever we refactor a database schema and rebuild the JOOQ jar we have. We are a 3-person team with one tester, so we prefer front-loading everything as much as possible. A syntax error is always much easier to handle than a support call from the field because of a spelling mistake in a JPQL statement and a missed regression test case (as said before, it is still possible to get runtime issues in JOOQ if we miss specifying a column in a select, of course).  As it happened, I just had that dreadful feeling this morning, trying to find all occurrences of a column in all the JPQL queries we have, just because we changed the name of the column to something more appropriate.

The main thing that I wish I could do easier in JOOQ is probably something that is impossible to do in a framework like JOOQ: the ability to write expressions involving the fields in a shorter way, and still make it type-safe. But it is still only a one-time thing to make sure the expression is correct. Can be difficult to figure out what it is later on though:

CAL_IN_WEEKLY.DAY_OF_WEEK.equal(weekday).and(
    CAL_IN_WEEKLY.START_WEEK_NUMBER.lessOrEqual(startingWeekNumber).and(
        (val(startingWeekNumber).sub(CAL_IN_WEEKLY.START_WEEK_NUMBER)).mod(
            CAL_IN_WEEKLY.FOR_EVERY).equal(val(0))))

Counting the parentheses does remind me of my Lisp coding days. Of course, if this is the wrong way to do it, and there is an easier way, I am all ears. By the way, I still prefer this to JPQL.

So, given what I know now and with my experience with both JPA and JOOQ, if I had to start a project from scratch, I would use OpenJPA for inserts/updates/deletes, and JOOQ for everything else. And absolutely no JPQL.

Christopher Deckers

Jul 28, 2013, 5:45:37 PM
to jooq...@googlegroups.com
Hi all,

If I understand the question, electrotype wants advice on how to move from an object hierarchy tree to a record-based model.

I personally do not use jOOQ, but we have our in-house mechanism which is very similar, so I think I can answer the question.

Looking at your example:
findPersonById_simple(long id)
findPersonById_withAdressCity(long id)
findPersonById_withJob(long id)
findPersonById_withAdressCityAndJob(long id)
... we do have these, but in a simpler version:
findPersonById(long id, boolean isLight)
so we either get an object containing only what we consider fundamental data, or all sorts of sub-relations.

The "light" aspect became awkward as the product grew, because the light/full granularity was not enough. We now generally favor smaller objects, where related data is retrieved as a separate entity. These separate entities are not fields of the main entity. Note that in our case, the bottleneck is generally the number of such entities, so we have these kinds of calls:
findPersonByIds(long[] ids)
findPersonAdressCity(Person[] persons)
findPersonJobs(Person[] persons)
If we have 1000 people for which we want all this data, we only issue 3 queries. In fact, some of the above calls like findPersonByIds may not even go to the database because we have all sorts of caching in place for frequently requested data. And in some specific cases, we even have dedicated upfront calls like "loadPersonsAndJobsCaches" so that individual calls to our persistence layer do not go to the database.
That being said, if we find a particular area of the product to have performance issues, we can define new API to retrieve all of this in a single query and return the result as an aggregate (like maps relating all the persons to their jobs, etc.).
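This batching pattern can be sketched in plain Java (all class and method names here are illustrative stand-ins, not an actual API): each relation is fetched in bulk for the whole set of ids, so the query count stays constant no matter how many persons are processed.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative sketch of the batched-lookup pattern: one bulk call per
// relation for the whole id set, instead of one call per person (N+1).
class PersonBatchDao {

    // Stand-in for "SELECT ... WHERE ID IN (...)" returning persons by id
    Map<Long, String> findPersonsByIds(Collection<Long> ids) {
        return ids.stream()
                  .collect(Collectors.toMap(Function.identity(), id -> "person-" + id));
    }

    // Stand-in for a bulk query of each person's address city
    Map<Long, String> findAddressCities(Collection<Long> personIds) {
        return personIds.stream()
                        .collect(Collectors.toMap(Function.identity(), id -> "city-" + id));
    }

    // Stand-in for a bulk query of each person's jobs
    Map<Long, List<String>> findJobs(Collection<Long> personIds) {
        return personIds.stream()
                        .collect(Collectors.toMap(Function.identity(), id -> List.of("job-" + id)));
    }
}
```

For 1000 persons this still issues exactly three bulk calls; the caller stitches the resulting maps together in memory.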

When saving data, we track what is modified, and issue the right calls on the right objects, eventually using batches. We don't magically process a whole object hierarchy.


Please do take note that the above approaches are what works for our product and may not be appropriate for all applications!


I also second Khun Yee Fung when he says how useful jOOQ syntax errors are when changing the database: this is a great time saving feature. In fact, because we have those errors, the cost of modifying the database structure dropped to a point that we do not hesitate and aggressively change it whenever we feel it is needed or desirable.


I would also like to mention tools or facilities that revolve around jOOQ that are worth considering.
For us, the jOOQ Console is what boosted our development. When we have a doubt or a database-related issue, we open it, log our requests, see the times they take or the number of rows that are returned, perform some ad-hoc queries to better analyze the problem, and look at the stack traces to see where in our application the code is misbehaving.
We also use the execute listeners to perform all sorts of query manipulations and statistics. One example from our nightly test run is a data import which should be a no-op if the data is already there. So our test imports data in a fresh database, and re-imports it with execute listeners that track insert/update statements with a few known exceptions. It saved us more than once from bad equals() implementations that would result in bloated databases and all sorts of other bugs!
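The no-op re-import check can be sketched with a simple counter over the SQL statements a listener sees. This is a stand-alone illustration of the bookkeeping only; in jOOQ the same logic would be hooked into the ExecuteListener SPI.

```java
import java.util.*;

// Counts write statements observed by the persistence layer, so a test can
// re-import data and assert that (almost) no INSERT/UPDATE was issued.
class WriteStatementCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Call this for every statement about to be executed
    void beforeExecute(String sql) {
        String verb = sql.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        if (verb.equals("INSERT") || verb.equals("UPDATE") || verb.equals("DELETE"))
            counts.merge(verb, 1, Integer::sum);
    }

    int count(String verb) {
        return counts.getOrDefault(verb, 0);
    }
}
```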

Hope this helps,
-Christopher

elect...@gmail.com

Jul 28, 2013, 9:42:00 PM
to jooq...@googlegroups.com
Thanks to both of you for those very interesting and complete replies!



The "light" aspect became awkward as the product grew, because the light/full granularity was not enough. We now generally favor smaller objects, where related data is retrieved as a separate entity. These separate entities are not fields of the main entity.
 
Chrriis, if I understand well, you would use entities like :
------------

public class Person
{
    long id;
    String name;
    String someOtherSimpleProperty;
}

public class Address
{
    long id;
    String streetName;
    String someOtherSimpleProperty;
}

public class PersonAddress
{
        Person person;
        Address address;
}
------------
So there is no "address" property directly in the Person entity...You actually have to retrieve the associated PersonAddress in a separate request. And all "associations of entities" are managed as standalone objects?

If a method needs to do some processing on a Person and needs to access some of its associated entities, it would have a signature like :

------------
public void doSomeProcessingOnAPerson(Person person, PersonAddress personAddress, PersonJob personJob, Set<Person> personFriends, ...)
{
    //...
}
------------

Instead of :

------------
public void doSomeProcessingOnAPerson(Person person)
{
    //...
}
------------

Do I understand you correctly?

Thanks again for the help!


Durchholz, Joachim

Jul 29, 2013, 5:08:23 AM
to jooq...@googlegroups.com
There are various things that cause difficulties in Hibernate, and that I hope will be easier with Jooq (still not using Jooq yet, unfortunately):
 
1) Session demarcation. Any exception invalidates the Session, which means you lose the running transaction and the only recourse is to recreate a Session (easy), reload all data (now where's that promise you can code with entities as if they were standard Pojos...), redo any modifications (bad if that was a lot of work) and commit again (that's easy, again). I hope that Jooq does not tie transactional logic to any fragile resources like that.
 
2) It's possible to control which dependent entities are autoloaded in Hibernate when loading a main entity, but the API stinks. Several layers of fallbacks, needs twiddling in at least two places which use gratuitously different terminology, and in some situations the setting is advisory only, i.e. Hibernate's behaviour may vary after an upgrade.
I think you always control that explicitly in Jooq.
 
3) Hibernate has three entirely different APIs (Criteria, HQL, SQL), with different abilities and different semantics (SQL doesn't affect the session cache, for example; Criteria and HQL have a largely overlapping feature set and semantics but some things that you can do you cannot do in the other).
If you find that one API does not suit you, you need to rewrite, and the APIs are so drastically different that you can throw away the Criteria code you have if you need HQL.
Jooq doesn't have this kind of largely overlapping (hence redundant), semantically different (hence requiring full rewrites) nonsense. At least I hope so :-)
 
4) Hibernate is a bit thin on documenting details. Docs are mostly there, but sometimes distributed across official docs, tips&tricks sections, and books; plus, sometimes the docs are incomplete and you have to experiment (and hope that behaviour doesn't change during the next update).
Can't say much about Jooq's documentation since I haven't used it in anger yet, but what I did see didn't look like it left many stones unturned.
 
5) Hibernate is full of subtle semantic distinctions. They're so subtle (but important) that, sometimes, the Hibernate team decides to correct a mistake. Since the issues are subtle, you don't know your usage of Hibernate was affected until the upgrade hits you right in the face.
Jooq has simpler semantics. That's partly because it doesn't try anything fancy with a session cache. (There have been moves towards subtler semantics though. I'm a bit worried there.)
 
Just my 2c.
Not all of these points are worded particularly carefully, and some issues I haven't thought through to their end; it's more or less my current view of things.
HTH.

Christopher Deckers

Jul 29, 2013, 6:00:18 AM
to jooq...@googlegroups.com
Hi electrotype,


Chrriis, if I understand well, you would use entities like :
------------

public class Person {
    long id;
    String name;
    String someOtherSimpleProperty;
}

public class Address {
    long id;
    String streetName;
    String someOtherSimpleProperty;
}

If an address is bound to a person and retrieved by passing Person objects, then we could have the Person as a field of Address. A leaf can know its parents if it is always retrieved through the parent.
This means that:
 
public class PersonAddress {
        Person person;
        Address address;
}

... may not be useful.

 
So there is no "address" property directly in the Person entity...You actually have to retrieve the associated PersonAddress in a separate request. And all "associations of entities" are managed as standalone objects?

Yes. The idea is that there are so many objects and relations in the system that it does not make sense to retrieve all of it. For example, the address of a person is mostly useful on the person definition screen. Eventually we output that information in reports too. But persons are also of interest for some contracts, in which case addresses are not needed on those contract screens, etc. We don't want to load a person, its contracts, addresses, and all sorts of related objects, so each screen or area of the product retrieves what it needs (with or without caching at the persistence layer).

 
If a method needs to do some processing on a Person and needs to access some of its associated entities, it would have a signature like :

------------
public void doSomeProcessingOnAPerson(Person person, PersonAddress personAddress, PersonJob personJob, Set<Person> personFriends, ...) {
    //...
}

Well, yes, though generally the method that performs the processing is able to retrieve associated data itself. So we do have such signatures:
 
------------
public void doSomeProcessingOnAPerson(Person person) {
    //...
}

where the method internally retrieves the person's address, job, etc. by performing other calls. Generally, we pass data as in your first signature when saving a modification to one of the associated objects.

In our product, we have a business layer that exposes all the above API. Each call can internally call other business services. Business services related to database stuff have a persistence layer for their data retrieval. We have some caching at the business or persistence layer depending on what makes sense.

I want to stress again that this is how it is done in our application, and there might be other ways/patterns better suited to your use cases.

-Christopher

Lukas Eder

Jul 31, 2013, 8:38:19 AM
to jooq...@googlegroups.com

2013/7/29 Durchholz, Joachim <Joachim....@hennig-fahrzeugteile.de>

There are various things that cause difficulties in Hibernate, and that I hope will be easier with Jooq (still not using Jooq yet, unfortunately):

When will you finally use it! :-)
I'm looking forward to tons of feedback from you, once you're starting to delve into details...
 
1) Session demarcation. Any exception invalidates the Session, which means you lose the running transaction and the only recourse is to recreate a Session (easy), reload all data (now where's that promise you can code with entities as if they were standard Pojos...), redo any modifications (bad if that was a lot of work) and commit again (that's easy, again). I hope that Jooq does not tie transactional logic to any fragile resources like that.

Hmm, I'm surprised that you have to reload all data when you get an exception. It's a common pattern to try to insert something, check for a constraint violation, and then insert something else or update something. (I know, race conditions, but sometimes that's the easiest way). So you're saying, with Hibernate you cannot make use of exceptions from constraint violations, or raised from stored procedures?

jOOQ does not interact with your sessions / transactions / connections directly. Since jOOQ 3.0, you're in full control of those lifecycles through the ConnectionProvider SPI.

There are no plans to add anything like transaction / session support any time soon. Other tools do that much better.
 
2) It's possible to control which dependent entities are autoloaded in Hibernate when loading a main entity, but the API stinks. Several layers of fallbacks, needs twiddling in at least two places which use gratuitiously different terminology, and in some situations the setting is advisory only i.e. Hibernate's behaviour may vary after an upgrade.
I think you always control that explicitly in Jooq.

There's no such thing as autoloading in jOOQ. There's only explicitly writing out all the joins. This is debatable, of course, as JOIN syntax is a bit verbose. But even if it were simplified, it would still be explicit.

On the other hand, if, for instance, you want to "auto-load" addresses when loading customers, you can already do that through SQL views.

3) Hibernate has three entirely different APIs (Criteria, HQL, SQL), with different abilities and different semantics (SQL doesn't affect the session cache, for example; Criteria and HQL have a largely overlapping feature set and semantics but some things that you can do you cannot do in the other).
If you find that one API does not suit you, you need to rewrite, and the APIs are so drastically different that you can throw away the Criteria code you have if you need HQL.
Jooq doesn't have this kind of largely overlapping (hence redundant), semantically different (hence requiring full rewrites) nonsense. At least I hope so :-)

We could argue that the SQL DSL has some overlap / redundancy with the "ActiveRecords" / UpdatableRecords. I'll get back to that when answering your bullet 5)

4) Hibernate is a bit thin on documenting details. Docs are mostly there, but sometimes distributed across official docs, tips&tricks sections, and books; plus, sometimes the docs are incomplete and you have to experiment (and hope that behaviour doesn't change during the next update).
Can't say much about Jooq's documentation since I haven't used it in anger yet, but what I did see didn't look like it left many stones unturned.

I'd say that both documentations are about equal. Except that I'm not aware of any books on jOOQ (yet).
 
5) Hibernate is full of subtle semantic distinctions. They're so subtle (but important) that, sometimes, the Hibernate team decides to correct a mistake. Since the issues are subtle, you don't know your usage of Hibernate was affected until the upgrade hits you right in the face.
Jooq has simpler semantics. That's partly because it doesn't try anything fancy with a session cache. (There have been moves towards subtler semantics though. I'm a bit worried there.)

Again, jOOQ's "ActiveRecord" / UpdatableRecords feature is a bit subtle. They ship with dirty flags on every attribute in order to control what happens on store(), update(), insert() calls. And then, there's the optimistic locking feature, of course, which some love, others hate, and which I have to maintain ;-)
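The per-attribute dirty-flag idea can be illustrated with a toy record. This is a sketch of the concept only, not jOOQ's actual UpdatableRecord implementation: each setter marks its field as changed, and only the changed fields end up in the UPDATE's SET clause.

```java
import java.util.*;

// Toy record with per-field dirty flags: store() logic can consult the
// changed set to decide which columns to include in the UPDATE.
class TrackedRecord {
    private final Map<String, Object> values = new LinkedHashMap<>();
    private final Set<String> changed = new LinkedHashSet<>();

    void set(String field, Object value) {
        values.put(field, value);
        changed.add(field); // mark dirty
    }

    // SET clause built from dirty fields only; null means nothing to store
    String updateSetClause() {
        if (changed.isEmpty())
            return null;
        StringJoiner sj = new StringJoiner(", ");
        for (String f : changed)
            sj.add(f + " = ?");
        return sj.toString();
    }

    // After a successful store(), all flags are reset
    void markStored() {
        changed.clear();
    }
}
```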

But in general, jOOQ doesn't (try to) solve as many different problems from different problem domains, which makes it easier to foresee semantics.

Cheers
Lukas

Lukas Eder

Jul 31, 2013, 8:56:21 AM
to jooq...@googlegroups.com, khunye...@gmail.com

A bit long, sorry.

This is just my experience (partially) replacing OpenJPA with JOOQ in a few projects that have around 250 tables. Not sure it is useful for anybody, but the system is being used right now in the wild, so it might count for something for some people.

I'm aware of much larger systems using jOOQ...
 
I used Hibernate for a few years before switching over to OpenJPA (solely for its compile-time enhancement ability at that time). And for the last six months, my team has used JOOQ alongside OpenJPA. Basically, we use OpenJPA whenever we need to do any insert/update/delete, when it is easier to do, and for small queries where the query time is not important. Of course, the old code with OpenJPA stays the same until we need to:

1. Speed up the query time.
2. Modify any query that is in JPQL.

Our systems have around 250 tables, so replacing all OpenJPA code with JOOQ all at once is out of the question. And we do like the way JPA handles cascade deletes and insertions and refreshes. The easy way to get a transaction is good as well.

Mixing is a common use-case. After all, JPA's strengths lie in persisting complex object graphs, and jOOQ will not evolve in that direction. So jOOQ is not a replacement for those features.

I never benchmarked the speed-up, but our JOOQ queries seem to be at least one order of magnitude faster than the OpenJPA JPQL queries they replaced. That is the reason I started using JOOQ when I first became aware of it (when we needed to get up to 1,000 rows from queries with many joined tables in under 2 seconds or so, and I was spending way too much time fiddling with FetchType.EAGER and FetchType.LAZY and not on the business logic). Our schemas are broadly in 3NF, so joining many tables together is very common for us, sometimes with many left outer joins as well.

But speed is not everything for us, or we would have used JDBC directly. The type-safety of JOOQ is very useful for us (except that we sometimes have 30 to 40 columns in a select. We can't always do a select() without specifying the columns, because we would run out of memory when we need to join, say, 20 tables at the same time. As usual, we found out that we can't always use limits and a first result row to restrict how many rows a query might return. Any solution other than specifying exactly which columns we want? We do re-use many queries, so we sometimes get runtime errors when we forget to specify a column in the select.).

Ever since we started using JOOQ, we refactor the database schema even more often as the system evolves, which is a huge advantage for us. JPQL was really that big an inhibitor for us to change anything in the schema at all. Adding tables, etc., is fine, but changing names and such is not fun with JPQL queries.

Can't emphasize enough how useful syntax errors are whenever we refactor a database schema and rebuild the JOOQ jar we have. We are a 3-person team with one tester, so we prefer front-loading everything as much as possible. A syntax error is always much easier to handle than a support call from the field because of a spelling mistake in a JPQL statement and a missed regression test case (as said before, it is still possible to get runtime issues in JOOQ if we miss specifying a column in a select, of course).  As it happened, I just had that dreadful feeling this morning, trying to find all occurrences of a column in all the JPQL queries we have, just because we changed the name of the column to something more appropriate.

Great feedback! Good to know how much this helps you!

The main thing that I wish I could do easier in JOOQ is probably something that is impossible to do in a framework like JOOQ: the ability to write expressions involving the fields in a shorter way, and still make it type-safe. But it is still only a one-time thing to make sure the expression is correct. Can be difficult to figure out what it is later on though:

CAL_IN_WEEKLY.DAY_OF_WEEK.equal(weekday).and(
    CAL_IN_WEEKLY.START_WEEK_NUMBER.lessOrEqual(startingWeekNumber).and(
        (val(startingWeekNumber).sub(CAL_IN_WEEKLY.START_WEEK_NUMBER)).mod(
            CAL_IN_WEEKLY.FOR_EVERY).equal(val(0))))

Counting the parentheses does remind me of my Lisp coding days. Of course, if this is the wrong way to do it, and there is an easier way, I am all ears. By the way, I still prefer this to JPQL.

Yeah, you have a couple of options here that will decrease your lispiness by 20% or so :-)

1. You could use a naming strategy to let the code generator shorten your table name references. I think I should actually implement that (i.e. an example thereof) in a future release (https://github.com/jOOQ/jOOQ/issues/2661)
2. You can use Scala or any other JVM language that lets you omit parentheses. Of course, that can turn out to be a big mistake, too, as you cannot omit *all* parentheses, only "some"
3. You can omit val(...) in most cases, as the API is usually overloaded to accept both T and Field<T> for the same method. E.g. equal(0), instead of equal(val(0))
4. You can reorganise your AND operators as in:

      CAIW.DAY_OF_WEEK.eq(weekday)
      .and(CAIW.START_WEEK_NUMBER.le(startingWeekNumber)) // "close the and" here
      .and(val(startingWeekNumber).sub(CAIW.START_WEEK_NUMBER).mod(CAIW.FOR_EVERY).eq(0))

A bit better? Note, there is no precedence between operators in jOOQ when chaining method calls. So you don't need parentheses around this expression as in Java / SQL:

    (startingWeekNumber - CAIW.START_WEEK_NUMBER) % CAIW.FOR_EVERY

You can just write

    A.sub(B).mod(C)

If you wanted to apply the precedence to the modulo operation, then you'd write this, explicitly:

    A.sub(B.mod(C))

The same applies to predicate construction, with .and(), .or(), .not(). There's no precedence apart from the one you set explicitly by the way you chain methods.

This might be worth documenting somewhere...
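The "precedence is exactly the chaining order" rule can be demonstrated with a toy expression builder (a hypothetical mini-DSL, not jOOQ's actual classes): every method call wraps the expression built so far, so the parentheses in the generated SQL mirror the call chain and nothing else.

```java
// Toy expression type: each call wraps the receiver, so there is no
// hidden operator precedence -- only the order in which methods chain.
class Expr {
    private final String sql;

    Expr(String sql) {
        this.sql = sql;
    }

    Expr sub(Expr other) {
        return new Expr("(" + sql + " - " + other.sql + ")");
    }

    Expr mod(Expr other) {
        return new Expr("(" + sql + " % " + other.sql + ")");
    }

    @Override
    public String toString() {
        return sql;
    }
}
```

Here `a.sub(b).mod(c)` renders as `((A - B) % C)`, while `a.sub(b.mod(c))` renders as `(A - (B % C))`.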

Lukas Eder

Jul 31, 2013, 9:06:00 AM
to jooq...@googlegroups.com



2013/7/29 Christopher Deckers <chr...@gmail.com>

Hi electrotype,


Chrriis, if I understand well, you would use entities like :
------------

public class Person {
    long id;
    String name;
    String someOtherSimpleProperty;
}

public class Address {
    long id;
    String streetName;
    String someOtherSimpleProperty;
}

If an address is bound to a person and retrieved by passing Person objects, then we could have the Person as a field of Address. A leaf can know its parents if it is always retrieved through the parent.
This means that:
 
public class PersonAddress {
        Person person;
        Address address;
}

... may not be useful.

Every time I have tried to tackle these topics with jOOQ or before jOOQ, I ran into the limitations imposed by the object-relational impedance mismatch. While M:N relations are somewhat feasible through helper classes as above, or through sophisticated collections such as those provided by Eclipse's EMF, there is simply no sensible way to express an M:N:O relation in Java, not to speak of an M:N:O:P relation. While such relations are not easy to model with SQL tables either, they are very easy to model with SQL views (i.e. with JOINs). Once you have such a complex table source, you just add predicates on 3 out of 4 relations (e.g. M, O, P) and you get something like a sophisticated view on N.

The OR impedance mismatch already shows how M:N can create a mess, which is why I'm not trying to solve any of these problems in jOOQ. You know your SQL / Java domain models better than jOOQ does, and you can always inject RecordMappers and, since jOOQ 3.1, even a central RecordMapperProvider.

In a way, jOOQ forces you to cleanly separate your relational model (possibly driving a better relational design) from your OO model (possibly driving a better OO design) at the cost of having to map things yourself again. I'm gathering a lot of information to unleash a series of blog posts around these topics, i.e. how these mappers (ORMs and LINQesque query-it-all APIs) will always create an impedance mismatch.
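That "map it yourself" step can be sketched as follows, with a plain Map<String, Object> standing in for a fetched record and illustrative column keys (jOOQ's RecordMapper<R, E> has the same single-method shape):

```java
import java.util.*;

// A hand-written row-to-entity mapper: one function per target type,
// keeping the relational and OO models explicitly separate.
interface RowMapper<E> {
    E map(Map<String, Object> row);
}

class Person {
    final long id;
    final String name;

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class Mappers {
    // Column keys here are illustrative, not from an actual schema
    static final RowMapper<Person> PERSON =
        row -> new Person((Long) row.get("PERSON.ID"), (String) row.get("PERSON.NAME"));
}
```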

Durchholz, Joachim

Jul 31, 2013, 11:28:02 AM
to jooq...@googlegroups.com
>> (still not using Jooq yet, unfortunately):
>
> When will you finally use it! :-)

Heh. When I'm at a new job I fear... no resources for that shift :-(
But my current employer does have a job opening announced, maybe something will happen.

> Hmm, I'm surprised that you have to reload all data
> when you get an exception.

Well, you'll need to do that anyway. You're starting a new transaction, and somebody may have modified your records behind your back.

The real issue is that you can't reinsert the existing Pojos into the new Session; the new Session insists on its own, new set of Pojos.
Not a problem if you do short transactions^Wconversations exclusively, i.e. never keep any Pojo around for longer than immediately necessary, keep only primary keys. It's the standard model of operation for web services, so that isn't a problem for the standard application domain of Hibernate.
For long-running applications, you want to keep the Pojos around. You keep them in caches. But with Hibernate, you can't, because the Session may die and the Pojo becomes unusable. You could keep track of what caches those Pojos are in and replace them with new ones as a new Session comes up, but (a) this is wasteful because you reload even if the Pojo happens to be never written back but you don't know this, (b) all hopes of modularity go out of the window.

The best you can do is to always work with detached objects and, at commit time, always reload and merge() them. The trouble is that you lose the single-UPDATE optimization; you now have two round trips to the database. For every friggin' object.

And the Hibernate book devotes an entire chapter to long conversations...

> It's a common pattern to try to insert something, check for
> a constraint violation, and then insert something else or
> update something. [...] So you're saying, with Hibernate you
> cannot make use of exceptions from constraint violations, or
> raised from stored procedures?

The official word is that the Session should not be relied on. If you look carefully, you can't even roll it back or close it, the Hibernate docs say it's invalid. There is no distinction between "harmless" and "destroying" exceptions.

In practice, people ignore that and simply continue with a Session that had an exception. The trouble is that this will probably work, most of the time, except when it doesn't.

Regards,
Jo

Durchholz, Joachim

unread,
Jul 31, 2013, 11:40:29 AM7/31/13
to jooq...@googlegroups.com
> Every time I have tried to tackle these topics with
> jOOQ or before jOOQ, I ran into the limitations
> imposed by the object-relational impedance mismatch.

I've been wondering whether the mismatch isn't simply a consequence of mapping things wrongly.

Caution: Long, potentially incoherent post about totally unusual ways to address the issue; probably useless for Jooq or any other existing O/R framework. I'm sharing this for the perspective, not because it offers any solutions.

We currently all map relationships as references. To-one associations as normal member variables, to-many associations as some Collection.
That's what we usually want, but not always. Sometimes you really need to do the join on the Java side.

Now I'm wondering: Shouldn't we simply follow database logic?
Map each primary key to an Id data type. Set up Map<Id,Entity> objects, one for each mapped table. Write the getters so that they go through the maps instead of through Java references. No setters - you manipulate relationships by changing Id values, not by bending pointers (and I guess that's the core difference).
Essentially, these Maps are just caches of what's in the database.
Like with any cache, coherency checking (a.k.a. concurrent update detection a.k.a. optimistic locking) would be the hardest thing to get right.
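That idea can be sketched in a few lines of plain Java. Everything here is hypothetical - the Id type, the per-table HashMap standing in for a cache, the entity classes - none of it is any existing framework's API:

```java
import java.util.HashMap;
import java.util.Map;

public class IdMappedModel {

    // A typed primary-key wrapper instead of a raw long.
    record CustomerId(long value) {}

    static class Customer {
        final CustomerId id;
        final String name;
        Customer(CustomerId id, String name) { this.id = id; this.name = name; }
    }

    static class Order {
        CustomerId customerId; // the relationship is an Id value, not a pointer

        Order(CustomerId customerId) { this.customerId = customerId; }

        // The getter goes through the table map (the cache of the customer
        // table); there is no setter taking a Customer object - you change
        // the relationship by changing the Id value.
        Customer customer(Map<CustomerId, Customer> customerTable) {
            return customerTable.get(customerId);
        }
    }

    // Re-point an order from one customer to another purely via Id values.
    static String demo() {
        Map<CustomerId, Customer> customers = new HashMap<>();
        Customer alice = new Customer(new CustomerId(1), "Alice");
        Customer bob = new Customer(new CustomerId(2), "Bob");
        customers.put(alice.id, alice);
        customers.put(bob.id, bob);

        Order order = new Order(alice.id);
        String before = order.customer(customers).name;
        order.customerId = bob.id; // "bending" the relationship = assigning an Id
        String after = order.customer(customers).name;
        return before + "->" + after;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Alice->Bob"
    }
}
```

Note how the map really is just a cache of the table: evicting or reloading an entry never invalidates the Order, because the Order only holds the key.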

Such an O/R system would be quite different from the usual ones.

Just food for thought. Philosophy that doesn't fill anybody's bowl.
Thanks for reading :-)

Lukas Eder

unread,
Jul 31, 2013, 11:51:56 AM7/31/13
to jooq...@googlegroups.com



2013/7/31 Durchholz, Joachim <Joachim....@hennig-fahrzeugteile.de>

>> (still not using Jooq yet, unfortunately):
>
> When will you finally use it! :-)

Heh. When I'm at a new job I fear... no resources for that shift :-(
But my current employer does have a job opening announced, maybe something will happen.

> Hmm, I'm surprised that you have to reload all data
> when you get an exception.

Well, you'll need to do that anyway. You're starting a new transaction, and somebody may have modified your records behind your back.

The real issue is that you can't reinsert the existing Pojos into the new Session; the new Session insists on its own, new set of Pojos.
Not a problem if you do short transactions^Wconversations exclusively. I.e. never keep any Pojo around for longer than immediately necessary, keep only primary keys. It's the standard model of operation for web services, so that isn't a problem for the standard application domain of Hibernate.
For long-running applications, you want to keep the Pojos around. You keep them in caches. But with Hibernate, you can't, because the Session may die and the Pojo becomes unusable. You could keep track of what caches those Pojos are in and replace them with new ones as a new Session comes up, but (a) this is wasteful because you reload even if the Pojo happens to be never written back but you don't know this, (b) all hopes of modularity go out of the window.

The best you can do is to always work with detached objects, and at commit time, always reload and merge() them. The trouble is that you lose the single-UPDATE optimization; you now have two round trips to the database. For every friggin' object.

And the Hibernate book devotes an entire chapter to long conversations...

Aw. I try to be open minded on that subject, so I'm not going to comment :-)
 
> It's a common pattern to try to insert something, check for
> a constraint violation, and then insert something else or
> update something. [...] So you're saying, with Hibernate you
> cannot make use of exceptions from constraint violations, or
> raised from stored procedures?

The official word is that the Session should not be relied on. If you look carefully, you can't even roll it back or close it, the Hibernate docs say it's invalid. There is no distinction between "harmless" and "destroying" exceptions.

In practice, people ignore that and simply continue with a Session that had an exception. The trouble is that this will probably work, most of the time, except when it doesn't.

You probably mean this:

Stéphane Cl

unread,
Aug 4, 2013, 6:08:30 AM8/4/13
to jooq...@googlegroups.com
Hello,

Not a problem if you do short transactions^Wconversations exclusively. I.e. never keep any Pojo around for longer than immediately necessary, keep only primary keys. It's the standard model of operation for web services, so that isn't a problem for the standard application domain of Hibernate.

Exactly what I do, hibernate entities are not to be used as DTOs.

I am using both hibernate and jooq, in the same project but different sections. I use jooq mostly for reporting and displaying tabular data, I use hibernate mostly for my business logic because operations on a single table are quite rare.
I know that many people hate hibernate because they think it's too much magic; that's partly true, but I'd say it's also probably the most misused framework on the planet. There are countless posts everywhere showing that people rely too much on lazy loading and long-lived sessions: they would like to deal with their database as if it were fully loaded into memory, run out of resources and experience slowness because they want to stay in an object-oriented world and totally ignore SQL and stored procedures.
Imho hibernate itself is rarely the unique cause of disaster: you can carefully display and check the SQL output of every operation, and if it's close enough to what you would write yourself, it can't be bad.





Now what really prevents me from using jooq everywhere?  I stopped using mybatis the exact same week I learnt about jooq, but I could not do the same with hibernate for the following reason:
Jooq is excellent for issuing complex SQL, it has everything you could dream of, strong typing, a very readable DSL but it doesn't help much when you need to efficiently load and manipulate graphs. 
Suppose I want to load a collection of objects with 2 or more one-to-many relationships, I have the following options 

1) Fetch a collection of records from the main table, and use fetchX methods to navigate which is ok for 1 or 2 records but totally inefficient when used in a loop
2) Issue one query + 2 correlated queries and do the mapping in memory, which is error prone, requires a lot of code and has a huge negative impact on my productivity.

Given enough free time, I could think of a way to implement caching for the fetchXXX methods, but past the 2 or 3 basic use cases, I'd be reinventing hibernate.
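For what it's worth, the in-memory mapping of option 2 is mechanical enough to sketch with plain collections. All names are made up, and the "fetched" lists stand in for the results of three separate queries:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GraphAssembly {

    // Stand-ins for rows returned by three separate queries.
    record Customer(long id, String name, List<Order> orders, List<Address> addresses) {}
    record Order(long id, long customerId, String item) {}
    record Address(long id, long customerId, String city) {}

    // Attach both child lists to their parents by grouping each child
    // result set on its foreign key: an in-memory PK-FK join. Unlike a
    // single flat 3-way JOIN, this cannot produce a cartesian product
    // between the two independent one-to-many relations.
    static List<Customer> assemble(List<Customer> customers,
                                   List<Order> orders,
                                   List<Address> addresses) {
        Map<Long, List<Order>> ordersByFk =
            orders.stream().collect(Collectors.groupingBy(Order::customerId));
        Map<Long, List<Address>> addressesByFk =
            addresses.stream().collect(Collectors.groupingBy(Address::customerId));

        return customers.stream()
            .map(c -> new Customer(c.id(), c.name(),
                    ordersByFk.getOrDefault(c.id(), List.of()),
                    addressesByFk.getOrDefault(c.id(), List.of())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> result = assemble(
            List.of(new Customer(1, "Acme", List.of(), List.of())),
            List.of(new Order(10, 1, "bolts"), new Order(11, 1, "nuts")),
            List.of(new Address(20, 1, "Berlin")));
        System.out.println(result.get(0).orders().size());    // 2
        System.out.println(result.get(0).addresses().size()); // 1
    }
}
```

The grouping itself is a one-liner; the productivity cost Stéphane describes comes from repeating this per entity graph, which is exactly the boilerplate a framework could generate.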

It doesn't mean there isn't any solution in between (I mean between JPA and Jooq): an excellent .Net commercial ORM called LLBLGen, which is also based on generated code, offered a great, predictable graph loading API without relying on obscure caching mechanisms.

Durchholz, Joachim

unread,
Aug 5, 2013, 5:12:54 AM8/5/13
to jooq...@googlegroups.com
> I know that many people hate hibernate because they think
> it's too much magic, it's partly true but I'd say it's
> also probably the most misused framework on the planet.

I guess both the hate and the misuse come from misrepresentation. The "Hibernate in Action" book even goes to great lengths to explain how to do "long-running conversations that span multiple transactions".

> There are countless of posts everywhere showing that people
> rely too much on lazy loading and long lived sessions, they
> would like to deal with their database just like if it was
> fully loaded into memory,

Well, there's the promise that "you can code with entities just like with Pojos".
In some ways you actually can, but in others you can't, and the docs don't spell out the border. Again, that's more a documentation problem.

> run out of resources and experience slowness because they
> want to stay in an object oriented world and totally ignore
> SQL, stored procedures.

That's yet another issue. You can't use stored procedures if you want to keep migration costs under control. Or, put another way: If you use stored procedures, you should keep the business logic in the database, and then any shitty ORM will do and you don't need Hibernate that much anymore.

> Imho hibernate itself is rarely the unique cause of
> disaster, you can carefully check display and check
> the SQL output or every operation, and if it's close
> enough to what you would write yourself, it can't be bad.

Well it isn't. It generates different projections and joins than those I'd do, and I can't easily make it generate the code I want it to.
Now if Hibernate were better at generating good SQL than I am, that would actually be a good thing, but Hibernate's SQL is sometimes very inferior.

Hibernate also doesn't even begin to handle bulk queries. In the end, you write your business logic twice, once in Java for Hibernate and once in SQL for the nightly batch runs... THAT part of the impedance mismatch remains in full force.
Now again that's where Hibernate collects hate points: The docs promise that it's the end to that mismatch.

In general, I think Hibernate fails to clearly spell out the limits of what it can do, so people come to it with expectations that Hibernate can't match.
Not the peoples' fault I think.

> Suppose I want to load a collection of objects with 2
> or more one-to-many relationships, I have the following
> options
>
> 1) Fetch a collection of records from the main table,
> and use fetchX methods to navigate which is ok for 1 or
> 2 records but totally inefficient when used in a loop

Hehe, no, you wouldn't want single-record fetches.

You could fetch in groups, using
WHERE (pk-fields-of-child-table) IN ((fk-values), (fk-values), ...)
where fk-values are those foreign-key values found in the main table records you already have.

It's mildly inefficient because you need to transmit all the foreign key values back, but you can run the queries in parallel.
For interactive use, the latency isn't very nice because each table requires its own round trip; you want a single query for that use case.
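A minimal sketch of that group fetch, showing only the chunking and the generated SQL text (table and column names are placeholders; a real implementation would bind the values instead of inlining them):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class BatchedChildFetch {

    // Build one "... WHERE fk IN (...)" query per chunk of foreign-key
    // values taken from the parent rows already in hand. A handful of
    // group round trips replaces one query per parent record; chunking
    // keeps the IN list below typical database limits (e.g. 1000 on Oracle).
    static List<String> childQueries(String childTable, String fkColumn,
                                     List<Long> fkValues, int chunkSize) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < fkValues.size(); i += chunkSize) {
            String inList = fkValues
                .subList(i, Math.min(i + chunkSize, fkValues.size()))
                .stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
            queries.add("SELECT * FROM " + childTable
                + " WHERE " + fkColumn + " IN (" + inList + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : childQueries("order_detail", "order_id",
                List.of(1L, 2L, 3L), 2)) {
            System.out.println(q);
        }
    }
}
```

As noted above, each child table still costs its own round trip, so for latency-sensitive interactive use a single joined query remains preferable.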

> 2) Issue one query + 2 correlated queries queries and
> do the mapping in memory, which is error prone,
> requires a lot of code and has a huge negative impact
> on my productivity.

Is Jooq not able to notice that the same record got returned twice?
It should, but there are complications with unique keys, and I'm not sure how those finally got sorted out.

> Given enough free time, I could think of a way to
> implement caching for the fetchXXX methods, but past
> the 2 or 3 basic use cases, I'd be reinventing hibernate.

You essentially reinvent Hibernate's first-level cache.
Which would be a Good Thing IMHO. I've been wanting to have direct access to that cache all the time: iterate over its records, inspect it for existence of a record, flush exactly what I want flushed, inspect the staleness status (known-current, known-stale, unknown), that kind of stuff.
(That's one of the things I dislike about Hibernate: it consists of a lot of building blocks, but the individual blocks aren't accessible, it's all hidden behind a facade and you can't properly look into what it's doing, or use some of its mechanisms in isolation where it would be useful.)

> It doesn't mean there isn't any solution in between
> (I mean between JPA and Jooq), an excellent .Net
> commercial ORM called LLblgen which is also based on
> generated code offered a great, predictable graph
> loading AP without relying on obscure caching mechanisms.

Sounds intriguing.
Are there any whitepapers around?

Stéphane Cl

unread,
Aug 5, 2013, 10:12:18 AM8/5/13
to jooq...@googlegroups.com
Hello,

I guess both the hate and the misuse come from misrepresentation. The "Hibernate in Action" book even goes to great lengths to explain how to do "long-running conversations that span multiple transactions".


Does it? I can't even imagine myself manipulating detached, explosive objects.
 
That's yet another issue. You can't use stored procedures if you want to keep migration costs under control. Or, put another way: If you use stored procedures, you should keep the business logic in the database, and then any shitty ORM will do and you don't need Hibernate that much anymore. (...) Hibernate also doesn't even begin to handle bulk queries (...)

Well, I am not a big fan of writing and maintaining stored procedures but sometimes it's the best way to handle bulk operations.
It's a bit like comparing DOM and SAX, you use one when you can, the other one when you have to.
 
Sounds intriguing.
Are there any whitepapers around?

Not sure, it has been a while since I left my previous job and I haven't used it since then.
It generated record classes that were somewhat similar to Jooq, along with a representation of their relationships.

Some info about fetching graphs can be found here (it's an old version, I am sure it has become much easier):

At loading time, you could basically tell the framework which branches of the object graph should be pre-loaded; this could go to any depth.
For example, from an OrderDetail entity it was possible to branch Order; from there you could branch Customer, then Address, Country, anything; the framework would then take care of efficiently filling the graph using either SQL IN clauses or subqueries.

 

Durchholz, Joachim

unread,
Aug 5, 2013, 12:38:11 PM8/5/13
to jooq...@googlegroups.com
>> I guess both the hate and the misuse come from
>> misrepresentation. The "Hibernate in Action"
>> book even goes to great lengths to explain how
>> to do "long-running conversations that span
>> multiple transactions".
>
> Does it?

The revised edition does.
Christian Bauer, Gavin King 2007: Java Persistence with Hibernate.

pp. 401 ff. "Working with detached objects"
No mention of the StaleObjectStateException there.
But I stand corrected: You can indeed reattach an existing Pojo to a Session. Via Session#lock. Talk about intuitive naming...

pp. 423 ff. "Working with detached entity instance"
Oh somebody tell me what the difference between an "object" and an "entity instance" is... oh right, it's the same chapter, but about JPA this time. Except no mention of lock() this time.

But that's just tangential. The conversation beef is chapter 11:
pp. 476 ff. "Implementing conversations"
Talks about all variants:
pp. 478 ff.: Conversation = Transaction = Session
pp. 486 ff.: One conversation holding multiple Transactions = Sessions, detached instances survive between Transactions/Sessions
pp. 489 ff.: Conversation = Session holding multiple Transactions, objects stay attached (but that doesn't work as soon as the Session dies due to any exception)
pp. 497 ff.: Next chapter (JPA)

> I can't even imagine myself manipulating
> detached, explosive objects.

Well, I *thought* I could use the model described on pp. 489 ff. Except Sessions are too fragile - and no mention of that in the entire book. (Read that thing multiple times.)

>> That's yet another issue. You can't use stored
>> procedures if you want to keep migration costs
> under control. Or, put another way: If you use
>> stored procedures, you should keep the business
>> logic in the database, and then any shitty ORM
>> will do and you don't need Hibernate that much
>> anymore. (...) Hibernate also doesn't even begin
>> to handle bulk queries (...)
>
> Well, I am not a big fan of writing and maintaining
> stored procedures but sometimes it's the best way
> to handle bulk operations.

100% agreed.

>> Sounds intriguing.
>> Are there any whitepapers around?
>
> Not sure, it has been a while since I left my previous
> job and I haven't used it since then.
> It generated record classes that were somewhat similar
> to Jooq, along with a representation of their
> relationships.

I guess you need that.
Record classes to represent rows, and a relationship representation to know what foreign keys to follow.

Stéphane Cl

unread,
Aug 5, 2013, 2:10:52 PM8/5/13
to jooq...@googlegroups.com

The revised edition does.
Christian Bauer, Gavin King 2007: Java Persistence with Hibernate.

pp. 401 ff. "Working with detached objects"
No mention of the StaleObjectStateException there.
But I stand corrected: You can indeed reattach an existing Pojo to a Session. Via Session#lock. Talk about intuitive naming...

pp. 423 ff. "Working with detached entity instance"
Oh somebody tell me what the difference between an "object" and an "entity instance" is... oh right, it's the same chapter, but about JPA this time. Except no mention of lock() this time.

But that's just tangential. The conversation beef is chapter 11:
pp. 476 ff. "Implementing conversations"
Talks about all variants:
pp. 478 ff.: Conversation = Transaction = Session
pp. 486 ff.: One conversation holding multiple Transactions = Sessions, detached instances survive between Transactions/Sessions
pp. 489 ff.: Conversation = Session holding multiple Transactions, objects stay attached (but that doesn't work as soon as the Session dies due to any exception)
pp. 497 ff.: Next chapter (JPA)

> I can't even imagine myself manipulating
> detached, explosive objects.
 
Well, I *thought* I could use the model described on pp. 489 ff. Except Sessions are too fragile - and no mention of that in the entire book. (Read that thing multiple times.)


I guess the reason behind this is that when a database exception is thrown, it is too difficult to figure out which parts of the session's object graphs are invalid or should be restored. This would require a very complex system, if at all feasible.
In the hibernate philosophy, you should just discard the session, log the exception somewhere and perhaps show some message or throw a 500 error. 
In many apps that's actually acceptable; many DB frameworks wrap JDBC exceptions into some random unchecked exception class because in most cases, other than apologizing to the user, there is not much that can be done.

I agree it's a problem when you want to treat some predictable exceptions as non-fatal (issuing a delete, watching for an FK violation, and printing "no sorry, record is in use")...
As for detaching and reattaching objects, I'd just say no thanks! Detached objects are to be considered stuffed with TNT; I'd rather use IDs and DTOs, which I know are safe and serializable. In a transaction-oriented app that's usually fine if you keep it stateless.
 
I guess you need that.
Record classes to represent rows, and a relationship representation to know what foreign keys to follow.

I like this approach that let you choose which relations you need and control when they are loaded.

The biggest problem is that when using such record classes some parts of the graphs may not have been loaded, so you'll always need to know how the data were loaded.
That being said, if you keep your data loading stuff close enough to the code using it and don't pass half-filled data objects everywhere, you should be fine. I am not a big fan of reusable DAO classes anyway; I only create them for the most common and simple entities.
 

Lukas Eder

unread,
Aug 6, 2013, 4:06:10 AM8/6/13
to jooq...@googlegroups.com, Stéphane Cl
Hi guys,

Just a short note from me on this interesting discussion. While I won't delve into the Hibernate pros and cons ;-), I'll happily discuss LLBLGen on this user group with you. I have found it to be quite similar to jOOQ in various aspects. It's good to see that C# also leaves room for tools like these, even if C# has LINQ. C# developers tend to love LINQ-to-Objects for its conciseness, much as I expect Java developers to love the new Java 8 Streams API combined with lambdas.

But there is a lot of controversy around LINQ-to-SQL, which shows another type of impedance mismatch in such approaches: that of an API trying to abstract a querying language into a "higher-level" language (e.g. LINQ, JPQL), omitting many of the underlying technology's (SQL's) features. I've blogged about this mismatch here:

Feel free to extend this discussion to a SQL/jOOQ to HQL/JPQL comparison :-)

Anyway, I'm glad that you outline the pros of LLBLGen. I'd be very curious about other good aspects of that framework, since you have experience with it!

Cheers
Lukas


2013/8/5 Stéphane Cl <stefa...@gmail.com>

--

Stéphane Cl

unread,
Aug 6, 2013, 5:37:23 AM8/6/13
to jooq...@googlegroups.com, Stéphane Cl
Hi lukas,
I used llblgen before linq came out; at that time, there were many commercial ORMs available and nhibernate was there as a free alternative. Unlike in the Java world, non-free frameworks are quite popular and developers don't mind paying for useful stuff.

What I liked most about llblgen was its strongly typed querying API, which was a direct benefit of code generation (just like Jooq).
Being more SQL focused, jooq would have been a slightly better tool for writing complex queries and fetching tabular data.
 
Llblgen chose a higher level of abstraction, but without abusing the POJO principle: it chose a mix between object and SQL and went further in handling relationships and graphs.

Here is an example taken from their current documentation about how you would load a collection of Customers, along with their Address and a collection of Orders:

// QuerySpec alternative
var customers = new EntityCollection<CustomerEntity>();
using(var adapter = new DataAccessAdapter())
{
	var qf = new QueryFactory();
	var q = qf.Customer
		.Where(CustomerFields.Country=="Germany")
		.WithPath(CustomerEntity.PrefetchPathOrders
				.WithSubPath(OrderEntity.PrefetchPathOrderDetails), 
			 CustomerEntity.PrefetchPathVisitingAddress);
	adapter.FetchQuery(q, customers);
}

Notice the .WithPath parameter...
The framework would then load the Customers and then fill the Orders relationship using a second query, with either an SQL IN or an EXISTS subquery depending on how many customers were returned.

Here is an example that goes even further :

var employees= new EntityCollection<EmployeeEntity>();
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
	var qf = new QueryFactory();
	var q = qf.Employee
		.WithPath(EmployeeEntity.PrefetchPathOrders
				.WithOrdering(OrderFields.OrderDate.Descending())
				.WithLimit(1));
	adapter.FetchQuery(q, employees);
}
 
Of course you can do that using Jooq too, but you'll write the queries yourself and implement a nice, non-reusable algorithm for populating your graph. Let's face it, everyone hates writing that kind of code. I do it many times a day, and that's the reason why I am still using hibernate.

Below is an interesting feature for Jooq :

using(var adapter = new DataAccessAdapter())
{
	var qf = new QueryFactory();
	var q = qf.Order
		   .From(QueryTarget.InnerJoin(OrderEntity.Relations.CustomerEntityUsingCustomerId))
		   .Where(CustomerFields.CustomerId==myCustomer.CustomerId);
	adapter.FetchQuery(q, myCustomer.Orders);
}

Notice the OrderEntity.Relations.CustomerEntityUsingCustomerId which is a generated object...
Wouldn't it be nice to have FK-based relations in the jooq Table classes and be able to pass them to the join(..) method in the query DSL rather than relying on join(X).onKey()?

To conclude, I'd say that llblgen is a good showcase of what a data access solution can do without relying on obscure concepts.
I don't think we can ask Jooq to do the same things, they are different tools, but some help when dealing with graphs would greatly complement it and, as shown above, it may be possible to do so without implementing a crazy caching mechanism.

Durchholz, Joachim

unread,
Aug 6, 2013, 7:38:06 AM8/6/13
to jooq...@googlegroups.com
>> pp. 489 ff.: Conversation = Session holding multiple
>> Transations, objects stay attached (but that doesn't
>> work as soon as the Session dies due to any exception)
>>
>> Well, I *thought* I could use the model described on
>> pp. 489 ff. Except Sessions are too fragile - and no
>> mention of that in the entire book.
>
> I guess the reason behind this is that when a database
> exception is thrown, it is too difficult to figure out
> which parts of the session objects graphs are invalid
> or should be restored.

That has been my thinking, too. But:
1) There are ways to find out whether a JDBC session and the transaction are still alive. It's hard because many JDBC drivers have bugs, but in the worst case, you can simply change something inconsequential that will be undone on rollback, and query that.
2) You still can't put the Hibernate session back into the state it was before the failure. You have no way to inspect the first-level cache to find out what objects were changed and would need to be set up again! You have to do all that at the application level.

> In the hibernate philosophy, you should just discard
> the session, log the exception somewhere and perhaps
> show some message or throw a 500 error.

That's my thinking, too - and that's why I conclude that Hibernate simply isn't built for long-running conversations, despite Bauer/King claiming the opposite.
(For me, the nasty thing was that I had my application designed around long-running conversations. I discovered this fragility after a year of happy coding. Hibernate rightfully earned my hate for that!)

> As for detaching, reattaching objects, I'd just say
> no thanks! Detached objects are to be considered stuffed
> with TNT, I'd rather use IDs and DTOs which I know are
> safe and serializable.

Hm... well, sort of.
You want all kinds of caches, e.g. for filling in drop-down lists. They might go stale, but you don't want to load these objects from the database every time unless you have to.

You might say that drop-down lists aren't that important, and I'd say you're right :-)
... but they are just a model for the use case I actually have: a mostly-constant rules table that currently has a five-digit number of rows, which we really don't want reloaded on every request.
Of course, keeping track of changes in that table and updating the cache is one of the areas where ad-hockery abounds.

> if you keep your data loading stuff close enough to the
> code using it and don't pass half filled data objects
> everywhere, you should be fine.

I think this would work for 99% of the application coding out there.
I do not think that this scales to arbitrary code complexity though, in particular if you have to observe space constraints - then you MUST work with incompletely-loaded object graphs.
I don't see this becoming a use case in business applications :-)
... but for the AI logic in game server programming, I do see this happening, on a regular basis actually.
(Databases and game servers aren't a match made in heaven anyway... you need Eventual Consistency, Integrity, and Durability, but no Immediate Consistency and Atomicity is just a nice-to-have. Probably a case for NoSQL. But then most game servers still run on classic databases, if only to allow ad-hoc SQL queries, and they just take the performance hit.)

Lukas Eder

unread,
Aug 8, 2013, 5:00:09 AM8/8/13
to jooq...@googlegroups.com, Stéphane Cl
Hi Stéphane,

2013/8/6 Stéphane Cl <stefa...@gmail.com>

Hi lukas,
I used llblgen before linq came out; at that time, there were many commercial ORMs available and nhibernate was there as a free alternative. Unlike in the Java world, non-free frameworks are quite popular and developers don't mind paying for useful stuff.

I think the Java world wouldn't mind paying for some useful stuff either. Sometimes, I get the feeling that much of (Free!) Open Source will lead to a big long-term maintenance nightmare for consumers / developers. While getting quite a few reads, I didn't get many comments on this article, unfortunately:
 
What I liked most about llblgen was its strongly typed querying API, which was a direct benefit of code generation (just like Jooq).
Being more SQL focused, jooq would have been a slightly better tool for writing complex queries and fetching tabular data.
 
Llblgen chose a higher level of abstraction, but without abusing the POJO principle: it chose a mix between object and SQL and went further in handling relationships and graphs.

Here is an example taken from their current documentation about how you would load a collection of Customers, along with their Address and a collection of Orders:

// QuerySpec alternative
var customers = new EntityCollection<CustomerEntity>();
using(var adapter = new DataAccessAdapter())
{
	var qf = new QueryFactory();
	var q = qf.Customer
		.Where(CustomerFields.Country=="Germany")
		.WithPath(CustomerEntity.PrefetchPathOrders
				.WithSubPath(OrderEntity.PrefetchPathOrderDetails), 
			 CustomerEntity.PrefetchPathVisitingAddress);
	adapter.FetchQuery(q, customers);
}

Notice the .WithPath parameter...
The framework would then load the Customers and then fill the Orders relationship using a second query, with either an SQL IN or an EXISTS subquery depending on how many customers were returned.

I see. It's like a proprietary JOIN syntax. If I get this correctly, the "WithPath" syntax will join dependent entities, producing nested tables for every result customer instead of duplicating customer entries...
I think that something like this would be interesting for jOOQ in the long run. Let's put that on the roadmap:

Here is an example that goes even further :

var employees= new EntityCollection<EmployeeEntity>();
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
	var qf = new QueryFactory();
	var q = qf.Employee
		.WithPath(EmployeeEntity.PrefetchPathOrders
				.WithOrdering(OrderFields.OrderDate.Descending())
				.WithLimit(1));
	adapter.FetchQuery(q, employees);
}
 
Of course you can do that using Jooq too, but you'll write the queries yourself and implement a nice, non-reusable algorithm for populating your graph. Let's face it, everyone hates writing that kind of code. I do it many times a day, and that's the reason why I am still using hibernate.

True.
 
Below is an interesting feature for Jooq :

using(var adapter = new DataAccessAdapter())
{
	var qf = new QueryFactory();
	var q = qf.Order
		   .From(QueryTarget.InnerJoin(OrderEntity.Relations.CustomerEntityUsingCustomerId))
		   .Where(CustomerFields.CustomerId==myCustomer.CustomerId);
	adapter.FetchQuery(q, myCustomer.Orders);
}

Notice the OrderEntity.Relations.CustomerEntityUsingCustomerId which is a generated object...
Wouldn't it be nice to have FK-based relations in the jooq Table classes and be able to pass them to the join(..) method in the query DSL rather than relying on join(X).onKey()?

The onKey() syntax supports referencing explicit foreign key references, e.g.

    TABLE_A.join(TABLE_B).onKey(GENERATED_FOREIGN_KEY_REFERENCE);

Maybe this feature is not visible enough in the docs? But it would probably be nice to avoid redundancy by omitting TABLE_B entirely, e.g.

    TABLE_A.joinOnKey(GENERATED_FOREIGN_KEY_REFERENCE);

I'll just put that up on the roadmap as well:

To conclude I'd say that llblgen is a good showcase of what a data access solution can do without reyling on osbcure concepts. 
I don't think we can ask Jooq to do the same things, they are both different tools, but some help when dealing with graphs would greatly complement it and as shown above, it may be possible to do without implementing a crazy caching mechanism.

I agree. The added value is obvious, and these features don't contradict jOOQ's mission, as they are still very "relational". Should jOOQ ever be ported to C#, it would be good to have the right features ready to compete with LLBLGen Pro.

Thanks a lot for your insight!
Cheers
Lukas

Lukas Eder

unread,
Aug 9, 2013, 4:02:14 AM8/9/13
to jooq...@googlegroups.com, Stéphane Cl
Hi Stéphane,

2013/8/6 Stéphane Cl <stefa...@gmail.com>
Hi lukas,

One thing that I keep forgetting to mention is the fact that jOOQ can be integrated very easily with third-party libraries such as ModelMapper:

This is particularly useful when "pre-fetching" subpaths from a given entity. While Hibernate is then also capable of persisting the whole graph in a single unit of work, a jOOQ-ModelMapper solution would at least help loading the graph.

Cheers
Lukas

Stéphane Cl

unread,
Aug 9, 2013, 5:00:34 AM8/9/13
to jooq...@googlegroups.com, Stéphane Cl
Hi Lukas,
I had a quick look, but from what I understand, mapping a base entity with, for example, two one-to-many relationships would still be a nightmare.
Using a join would get you tons of duplicates (basically a cartesian product between both relations), and using subqueries is still going to have you write in-memory PK-FK based joins.

However, I find ModelMapper interesting for achieving the exact opposite: turning a graph into a DTO.
Best

Lukas Eder

unread,
Aug 9, 2013, 5:13:18 AM8/9/13
to jooq...@googlegroups.com, Stéphane Cl
Hi Stéphane,

2013/8/9 Stéphane Cl <stefa...@gmail.com>

Hi Lukas,
I had a quick look, but from what I understand, mapping a base entity with, for example, two one-to-many relationships would still be a nightmare.
Using a join would get you tons of duplicates (basically a cartesian product between both relations), and using subqueries is still going to have you write in-memory PK-FK based joins.

Huh, I haven't given this proper thought. Yes, you're probably right. The showcase is probably good for linear "graphs" only, not for graphs with cartesian products in them. I guess that's where LLBLGen shines, as the proprietary query extensions hold all the relevant meta-data to know how to "untangle" the cartesian product between the >1 joined relationships.
 
However, I find ModelMapper interesting for achieving the exact opposite: turning a graph into a DTO.

Great! Should you play around with it any further (in the context of databases / DTOs / etc), feel free to document your experience also here on this user group.
 
Cheers
Lukas

Stéphane Cl

unread,
Aug 9, 2013, 6:43:44 AM8/9/13
to jooq...@googlegroups.com, Stéphane Cl

Huh, I haven't given this proper thought. Yes, you're probably right. The showcase is probably good for linear "graphs" only, not for graphs with cartesian products in them. I guess that's where LLBLGen shines, as the proprietary query extensions hold all the relevant meta-data to know how to "untangle" the cartesian product between the >1 joined relationships.


Exactly... Efficiently loading graphs often requires multiple queries (e.g. one for the base entity and one for each of the branches, except one-to-one relations, which can be obtained through joins). The basic idea is that eliminating duplicates from large resultsets isn't going to be that much faster than issuing a second SQL query, not to mention network traffic and memory bloat. IN (...) predicates or EXISTS subqueries are usually well optimized in modern RDBMSs.

I am not sure that LLBLGen actually used joins for 1:1 and 0:1 at the time I was using it; it followed the "one branch -> one query" principle, which was already good enough in most use cases and so much better out of the box than any N+1.

Lukas Eder

unread,
Aug 9, 2013, 8:40:33 AM8/9/13
to jooq...@googlegroups.com, Stéphane Cl



2013/8/9 Stéphane Cl <stefa...@gmail.com>


Huh, I haven't given this proper thought. Yes, you're probably right. The showcase is probably good for linear "graphs" only, not for graphs with cartesian products in them. I guess that's where LLBLGen shines, as the proprietary query extensions hold all the relevant meta-data to know how to "untangle" the cartesian product between the >1 joined relationships.


Exactly... Efficiently loading graphs often requires multiple queries (e.g. one for the base entity and one for each of the branches, except one-to-one relations, which can be obtained through joins). The basic idea is that eliminating duplicates from large resultsets isn't going to be that much faster than issuing a second SQL query, not to mention network traffic and memory bloat. IN (...) predicates or EXISTS subqueries are usually well optimized in modern RDBMSs.

I am not sure that LLBLGen actually used joins for 1:1 and 0:1 at the time I was using it; it followed the "one branch -> one query" principle, which was already good enough in most use cases and so much better out of the box than any N+1.

Yes, this solution turns N+1 into 1+1, which is acceptable. I'd have to think about nesting pre-fetch paths, though, i.e. when a pre-fetch path contains another pre-fetch path...
Probably, rather than designing some wicked SQL syntax extension for this, there should be simple API methods that can fetch "parents" or "children" in a single query, based on a Result<R>. This would align well with the existing


Maybe Result.fetchParents(ForeignKey) and Result.fetchChildren(ForeignKey)? In-memory join operations could then be done through the Result.intoGroups() methods.


I'll register this idea as #2686:
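The 1+1 pattern behind this idea can be sketched with plain collections. To be clear about what is real and what is not: Result.intoGroups() is an actual jOOQ method, but fetchParents()/fetchChildren() are only the proposed API, and the Customer/Order types plus the class and method names below are illustrative stand-ins, not generated jOOQ records:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the proposed 1+1 fetching pattern.
// Query 1 would fetch the parents; query 2 would fetch all children for
// those parents at once. The in-memory "join" is then just a grouping by
// the foreign key, which is what Result.intoGroups() does for jOOQ records.
public class OneQueryPerRelation {
    record Customer(int id, String name) {}
    record Order(int id, int customerId) {}

    // Group the child rows by their FK value, as intoGroups() would:
    static Map<Integer, List<Order>> groupByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::customerId));
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer(1, "A"), new Customer(2, "B"));
        // In real code this list would come from one query with an IN predicate
        List<Order> orders = List.of(new Order(10, 1), new Order(11, 1), new Order(12, 2));

        Map<Integer, List<Order>> byCustomer = groupByCustomer(orders);
        for (Customer c : customers) {
            System.out.println(c.name() + ": "
                + byCustomer.getOrDefault(c.id(), List.of()).size() + " orders");
        }
    }
}
```

Two queries and one grouping pass, regardless of how many parents there are, instead of one child query per parent.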

Durchholz, Joachim

unread,
Aug 9, 2013, 9:02:27 AM8/9/13
to jooq...@googlegroups.com
> The base idea is that eliminating duplicates
> from large resultsets isn't going to be that
> much faster than issuing a second SQL query,
> not talking about network traffic and memory
> bloat.

That's actually not always true. Depending on network latencies, the additional round trips can be devastating.
E.g. with a 10-millisecond roundtrip and 10 queries, you're already at 0.1 seconds, which can be beyond what's acceptable.

Of course it depends on whether latency is relevant in the first place; e.g. batch processing usually cares about throughput rather than latency and can parallelize to make the latency irrelevant.

It also depends on how much wiggle room you have with latency.
If users are used to a second-long delay -> no problem.
If graph depth is only two or three -> no problem.
If latencies are around 1 ms -> no problem (unless graph depth is above 50 or so).

> IN (....) statement or EXISTS subqueries are usually well optimized in modern rdbms.

That works, but only up to a limit; I have a vague memory that it's limited to something like 2000 entries.
General graph loading code would need a strategy for that.

OTOH if you're loading thousands of rows, you probably want to have everything running on the database anyway and not do graph loading at all.
Things get interesting if you need to do it in Java anyway; then you'll still end up with graph loading even if it's slow.


tl;dr: There are some kinds of requirements where doing subselects or IN queries has issues.

Lukas Eder

unread,
Aug 9, 2013, 9:06:34 AM8/9/13
to jooq...@googlegroups.com



2013/8/9 Durchholz, Joachim <Joachim....@hennig-fahrzeugteile.de>

> The base idea is that eliminating duplicates
> from large resultsets isn't going to be that
> much faster than issuing a second SQL query,
> not talking about network traffic and memory
> bloat.

That's actually not always true. Depending on network latencies, the additional round trips can be devastating.
E.g. with a 10-millisecond roundtrip and 10 queries, you're already at 0.1 seconds, which can be beyond what's acceptable.

Stéphane was referring to fetching 2 or more subpaths, which usually denormalises results to produce cartesian products. If we're talking about 10 subpaths, issuing 10 queries is certainly better than producing a cartesian product from 10 entities just to remove duplicates again in Java memory...

Durchholz, Joachim

unread,
Aug 9, 2013, 9:19:35 AM8/9/13
to jooq...@googlegroups.com
> Stéphane was referring to fetching 2 or more
> subpaths, which usually denormalises results
> to produce cartesian products.

Ah sorry, I had thought this was being proposed for each element of a join path.
Hadn't read closely enough, sorry for the spam.
I agree that cartesian product blowup would be worse than latency.

I believe that the point about limits to the list of values still stands.

Lukas Eder

unread,
Aug 9, 2013, 9:23:25 AM8/9/13
to jooq...@googlegroups.com



2013/8/9 Durchholz, Joachim <Joachim....@hennig-fahrzeugteile.de>
You mean, limits to the number of bind values? Yes, there are some limits:

- Oracle IN predicate: 1000 elements 
- Ingres: 1024 total bind values
- SQLite: 999 total bind values
- Sybase ASE: 2000 total bind values
- SQL Server 2008 R2: 2100 total bind values

jOOQ can transform the SQL if any of the above limits is exceeded, though, either by splitting the IN predicate or by inlining bind values.
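For illustration, the IN-splitting transformation can be sketched standalone: x IN (a, b, c, ...) is rewritten as (x IN (chunk1) OR x IN (chunk2) OR ...). jOOQ's internal implementation differs; the class and method below are hypothetical names that only render SQL text:

```java
import java.util.*;

// Sketch of splitting an oversized IN predicate to respect vendor limits
// (e.g. Oracle's 1000-element cap on IN lists). Values are inlined here
// for simplicity; a real implementation would also handle bind values.
public class InPredicateSplitter {
    static String inPredicate(String column, List<Integer> values, int maxPerList) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < values.size(); i += maxPerList) {
            List<Integer> chunk = values.subList(i, Math.min(i + maxPerList, values.size()));
            // "[1, 2]" -> "1, 2"
            chunks.add(column + " IN (" + chunk.toString().replaceAll("[\\[\\]]", "") + ")");
        }
        return chunks.size() == 1 ? chunks.get(0) : "(" + String.join(" OR ", chunks) + ")";
    }

    public static void main(String[] args) {
        System.out.println(inPredicate("orderID", List.of(1, 2, 3, 4, 5), 2));
        // (orderID IN (1, 2) OR orderID IN (3, 4) OR orderID IN (5))
    }
}
```

Semantically the OR-of-INs is equivalent to the original predicate, which is why the rewrite is safe for the databases listed above.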

Stéphane Cl

unread,
Aug 9, 2013, 12:08:09 PM8/9/13
to jooq...@googlegroups.com
I am well aware that IN queries have their limitations too.
Here is how LLBLGen used to solve this problem:

1) It would fill the main collection first, eg :

SELECT * 
FROM orders 
WHERE orders.customerID = @customerID
ORDER BY orders.checkout_date

2) Then it would decide which query to use for fetching the relation:

a) In case there are fewer than 50 orders (you can configure how many values exactly), it would use an IN predicate like this one:

SELECT *
FROM orderdetails
WHERE orderdetails.orderID IN (1, 5, 8, 20 ...)

b) In case there are more orders than that, it would re-use the first query:

SELECT * 
FROM orderdetails 
WHERE orderdetails.orderID IN (

     SELECT orderId 
     FROM orders 
     WHERE orders.customerID = @customerId
)

Maybe it would be better to use EXISTS, but it's already a nice alternative to the infamous N+1.
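A minimal sketch of this adaptive decision rule in plain Java, rendering only the SQL text: the class, method, and threshold constant are illustrative names (not LLBLGen or jOOQ API), while the table/column names and the threshold of 50 follow the example above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Adaptive child-fetching strategy: for small parent results, inline the
// parent keys in an IN list (case 2a); for large ones, re-use the parent
// query's WHERE clause as a subquery instead (case 2b).
public class PrefetchStrategy {
    static final int THRESHOLD = 50; // configurable in LLBLGen

    static String childQuery(List<Integer> orderIds, String parentWhereClause) {
        if (orderIds.size() <= THRESHOLD) {
            // 2a) small parent result: inline the already-fetched keys
            String ids = orderIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
            return "SELECT * FROM orderdetails WHERE orderdetails.orderID IN (" + ids + ")";
        }
        // 2b) large parent result: nest the original query's predicate
        return "SELECT * FROM orderdetails WHERE orderdetails.orderID IN ("
             + "SELECT orderId FROM orders WHERE " + parentWhereClause + ")";
    }
}
```

Either way, each relation costs exactly one additional query, which is what turns the N+1 into a 1+1.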
Best

Stéphane Cl

unread,
Aug 9, 2013, 1:36:46 PM8/9/13
to jooq...@googlegroups.com
Sorry guys for all the typos in my last answer.

@Joachim

Ah sorry, I had thought this was being proposed for each element of a join path. 
Hadn't read closely enough, sorry for the spam. 
I agree that cartesian product blowup would be worse than latency. 

To be honest, even in the case where there is no cartesian product (e.g. a collection of customers with their orders), it would already be nice to have.
Using the record API, you would be trading an N+1 for a 1+1. As you said, 1+1 may not beat a JOIN with some processing (let's call that 1+0).

As a starting point, taking an easy route such as implementing prefetch paths using one query per relation, ignoring whether it's 1:1 or 1:n, would be excellent.
100% agreed that in some cases it's only second best behind a join. However, it's better than any N+1, and it's probably easier to implement a generic system that would reuse your initial query results (or reuse the initial query itself in a subquery) than a sophisticated one that would modify your initial SQL query to include the necessary joins and be smart enough to know how to decompose the resultset into a graph.

Lukas Eder

unread,
Aug 10, 2013, 3:17:34 AM8/10/13
to jooq...@googlegroups.com

2013/8/9 Stéphane Cl <stefa...@gmail.com>

I am well aware that IN queries have their limitations too.
Here is how LLBLGen used to solve this problem:

1) It would fill the main collection first, eg :

SELECT * 
FROM orders 
WHERE orders.customerID = @customerID
ORDER BY orders.checkout_date

2) Then it would decide which query to use for fetching the relation:

a) In case there are fewer than 50 orders (you can configure how many values exactly), it would use an IN predicate like this one:

SELECT *
FROM orderdetails
WHERE orderdetails.orderID IN (1, 5, 8, 20 ...)

b) In case there are more orders than that, it would re-use the first query:

SELECT * 
FROM orderdetails 
WHERE orderdetails.orderID IN (

     SELECT orderId 
     FROM orders 
     WHERE orders.customerID = @customerId
)

Aha, I see... I must've missed the idea behind using IN / EXISTS from your previous discussions. I wonder if there are any significant caveats when transforming the original SQL (1) into the subselect (2b) for arbitrary SQL statements (allowing for GROUP BY, LIMIT, etc.). Again, I feel that implementing something on a CRUD level might be much simpler and more effective, than adding a new synthetic SQL clause.

I'm sure the LLBLGen guys had thought about this thoroughly, though, given the maturity of their products.

Maybe it would be better to use EXISTS, but it's already a nice alternative to the infamous N+1.

Sophisticated databases should be able to transform an IN predicate into an equivalent EXISTS predicate, although the difference might not be relevant in the execution plans. In other databases, NOT EXISTS can perform worse than LEFT JOINs or IN:

I don't think there's a general preference rule to prefer one or the other SQL rendering, which holds true across all DB vendors...

Cheers
Lukas


Lukas Eder

unread,
Jan 6, 2015, 4:45:20 AM1/6/15
to jooq...@googlegroups.com
Some feedback for this discussion: the design vision around nested collections in jOOQ will be aligned with what the SQL standard has to offer. The SQL standard allows MULTISET usage in SELECT clauses, which is the SQL way to express a nested collection. The relevant issue is this one here:

JPA's joined nested collections are a workaround to materialise object graphs. LLBLGen's prefetch paths are an alternative syntax for what should really be multisets.

We're currently evaluating how MULTISETs can be integrated in the jOOQ API and, more importantly, how they can be emulated (at least partially) where they're not natively supported by the underlying ORDBMS. Databases with native support are CUBRID, Informix, and Oracle.

Lukas Eder

unread,
Jul 22, 2021, 4:00:05 AM7/22/21
to jOOQ User Group
And a few years later, MULTISET and MULTISET_AGG support have finally been added to jOOQ 3.15, see:


I'm really excited about this new feature. I think this is how SQL-oriented folks really want to work with nested collections, as it fits the relational paradigm very well, at least if those ORDBMS extensions are accepted.

Cheers,
Lukas