On Record, ResultSet, and Converters


Robert DiFalco

Jan 8, 2015, 12:41:16 PM
to jooq...@googlegroups.com
I'm reposting part of this from an email thread. I have a complex dynamic query where I ultimately map from a ResultSet instead of using Record. Lukas was curious why I was using ResultSet instead of Record to create my beans. Here's my reply:

Thanks Lukas, on Record vs. ResultSet this is probably because I am not familiar with all of JOOQ, especially the Converter classes. I did TRY Record, but here were my issues:

1. Using your Record class creates several temporary objects per-field on every row. In particular two ConvertAll converters. These seemed unnecessary for me since I could just optimally call #getLong or #getString, etc. on the ResultSet without extra overhead. 

2. The ConvertAll implementation is not super efficient for Numbers. For example, I use INT in my database, but my Object model uses Long so that if I change the implementation I will not impact clients. To convert from Integer to Long you first get the row value as an Integer from the ResultSet. Then you perform a #toString on it, then you convert it to a BigInteger, then finally get the Long value. I'd suggest adding a case to ConvertAll something like this:

     if (Number.class.isAssignableFrom(fromClass)) {
         return (U) (Long) ((Number) from).longValue();
     }

You might want to think about this for all the number to number conversions. 
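That suggestion, generalised to the other number-to-number conversions, could look roughly like this. This is a stand-alone sketch only; ConvertAll is jOOQ-internal, so the class and method names here are illustrative:

```java
// Illustrative sketch of a Number-to-Number fast path, as proposed above.
// This mirrors the idea only; it is not jOOQ's actual ConvertAll code.
public class NumberConvert {

    @SuppressWarnings("unchecked")
    static <U> U convertNumber(Object from, Class<U> toClass) {
        if (from instanceof Number) {
            Number n = (Number) from;

            // Direct primitive widening/narrowing, no toString()/BigInteger detour
            if (toClass == Long.class)    return (U) (Long)    n.longValue();
            if (toClass == Integer.class) return (U) (Integer) n.intValue();
            if (toClass == Short.class)   return (U) (Short)   n.shortValue();
            if (toClass == Byte.class)    return (U) (Byte)    n.byteValue();
            if (toClass == Double.class)  return (U) (Double)  n.doubleValue();
            if (toClass == Float.class)   return (U) (Float)   n.floatValue();
        }
        throw new IllegalArgumentException("Unsupported conversion to " + toClass);
    }
}
```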

I thought about creating a synthetic Record for this query that I could create dynamically (to match the dynamic nature of the query), then I could control the field names, types, and bindings. There is probably a better way to do it but I'm still exploring.

3. One other reason, but not too important if they were equally efficient, is that it was easier to write my lambda to rs.getLong(pos) instead of record.getValue(pos, Long.class).

Is there a simple inline way to specify converters as fields are added to a result set in such a way that they are used when results are read? Or do I have to do it globally in Configuration? Basically, I would like NOT to have to specify a Type to record#getValue, since if the type is right in the field, it should be in Record correctly. The double ConvertAll instantiation from record#getValue(int,Class<?>) to #convert0 seems a bit much.

Thanks for being so responsive and open to my opinions!


Lukas Eder

Jan 8, 2015, 1:47:17 PM
to jooq...@googlegroups.com
2015-01-08 18:41 GMT+01:00 Robert DiFalco <robert....@gmail.com>:
I'm reposting part of this from an email thread. I have a complex dynamic query where I ultimately map from a ResultSet instead of using Record. Lukas was curious why I was using ResultSet instead of Record to create my beans. Here's my reply:

Thanks Lukas, on Record vs. ResultSet this is probably because I am not familiar with all of JOOQ, especially the Converter classes. I did TRY Record, but here were my issues:

1. Using your Record class creates several temporary objects per-field on every row. In particular two ConvertAll converters.

That's true, but they go out of scope almost immediately, so they are GC'ed very quickly. Have you profiled this area and found a significant issue or is this more of a general concern?
 
These seemed unnecessary for me since I could just optimally call #getLong or #getString, etc. on the ResultSet without extra overhead.

*Some* data type conversion still needs to be done when you call ResultSet.getLong() or ResultSet.getString(). Especially in the ResultSet.getLong() case, as you're unboxing a Long, and then perhaps boxing it again.
 
2. The ConvertAll implementation is not super efficient for Numbers. For example, I use INT in my database, but my Object model uses Long so that if I change the implementation I will not impact clients. To convert from Integer to Long you first get the row value as an Integer from the ResultSet. Then you perform a #toString on it, then you convert it to a BigInteger, then finally get the Long value. I'd suggest adding a case to ConvertAll something like this:

     if (Number.class.isAssignableFrom(fromClass)) {
         return (U) (Long) ((Number) from).longValue();
     }

You might want to think about this for all the number to number conversions. 

You're absolutely right. The current implementation is just the fallback for random types that are not a Number. I have registered an issue for this:
 
I thought about creating a synthetic Record for this query that I could create dynamically (to match the dynamic nature of the query), then I could control the field names, types, and bindings. There is probably a better way to do it but I'm still exploring.

You could probably do that, but beware of our understanding of "backwards-compatibility":

We might be adding new methods to the org.jooq.Record type between minor releases

3. One other reason, but not too important if they were equally efficient, is that it was easier to write my lambda to rs.getLong(pos) instead of record.getValue(pos, Long.class).

Is there a simple inline way to specify converters as fields are added to a result set in such a way that they are used when results are read?

Yes, if you're using code generation, then you can register Converters (or Bindings, or both) directly with your Fields:
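Such a converter has roughly the following shape. To keep this sketch self-contained, the Converter interface is re-declared locally with the same methods as org.jooq.Converter; with code generation it would normally be attached to a column via the generator's forcedTypes configuration:

```java
// Local re-declaration of jOOQ's Converter<T, U> shape, so this compiles
// without jOOQ on the classpath. T = database type, U = user type.
interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();
}

// An INT column exposed as Long in the object model, as described above.
public class IntegerToLongConverter implements Converter<Integer, Long> {
    @Override public Long    from(Integer db) { return db == null ? null : db.longValue(); }
    @Override public Integer to(Long user)    { return user == null ? null : user.intValue(); }
    @Override public Class<Integer> fromType() { return Integer.class; }
    @Override public Class<Long>    toType()   { return Long.class; }
}
```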
 
Or do I have to do it globally in Configuration? Basically, I would like NOT to have to specify a Type to record#getValue, since if the type is right in the field, it should be in Record correctly.

That's already the case. The <T> type information that you provide with your SELECT statement is going to be the type that is materialised in the Result / Record for that type.
 
The double ConvertAll instantiation from record#getValue(int,Class<?>) to #convert0 seems a bit much.

I'll explain further in the other discussion that you've started. In short: after 6 years of existence and almost 4000 issues on GitHub, there are reasons for most of these things :) (although there is also legacy, and a couple of bugs, such as the above inefficient Number conversion)

Thanks for being so responsive and open to my opinions!

Well, thank YOU for taking the time to write them down! I will soon blog about the value of the "fresh user" to a community that has already gotten used to many of the quirks of a platform. Every friction that a fresh user reports is a friction to be taken very seriously.

Cheers
Lukas

Robert DiFalco

Jan 8, 2015, 2:12:49 PM
to jooq...@googlegroups.com
Thanks for the information! I have some type questions but I'll tack them onto the other thread.

I haven't profiled it; I probably should. It was just a general concern that built up after seeing the number conversions, and from generally being more familiar with the ins and outs of result sets -- I knew what I was dealing with implementation-wise there. I know what the PostgreSQL driver does with result sets, and I was having some trouble stepping through the JOOQ result-processing code with a debugger (there's a lot of code). But I'm more familiar with it now.

And FWIW, my first thought is always that (a) there is a reason for each thing JOOQ does that evolved over time, and (b) with a feature-rich library I am probably taking the wrong approach because I don't yet understand the feature set in a holistic manner. Then I work from there. :)

Robert DiFalco

Jan 8, 2015, 7:03:27 PM
to jooq...@googlegroups.com
I just did a little performance testing.

FWIW, using Record is about 100ms per query slower than ResultSet. 

But worse yet using DSLContext is about 250ms slower than raw Connection for a simple joined query. For Hibernate/JPA, using a NamedNativeQuery is about the same time as using a raw JDBC connection and about 300ms faster than JOOQ.

About 120ms is extra overhead for Result vs. ResultSet and about 130ms is parsing the actual query from the fluent interface. The time includes getting a pooled connection, committing, and closing it.

I'll try to put it into an isolated benchmark when I get a chance, right now the benchmark is intermixed into my server with a lot of proprietary stuff in it.  

Robert DiFalco

Jan 8, 2015, 7:25:14 PM
to jooq...@googlegroups.com
FWIW, for this test I did not do ANY POJO mapping. For the JOOQ-with-Record version I just did #fetch(r -> new Test(r.value1(), r.value2(), r.value3()))

Lukas Eder

Jan 9, 2015, 2:15:13 AM
to jooq...@googlegroups.com
2015-01-09 1:03 GMT+01:00 Robert DiFalco <robert....@gmail.com>:
I just did a little performance testing.

FWIW, using Record is about 100ms per query slower than ResultSet. 

But worse yet using DSLContext is about 250ms slower than raw Connection for a simple joined query. For Hibernate/JPA, using a NamedNativeQuery is about the same time as using a raw JDBC connection and about 300ms faster than JOOQ.

For a single query? That would be extremely surprising, or perhaps, specific to a particular API usage that might be out of the ordinary. It would be interesting to see this in action with a sample project...

Robert DiFalco

Jan 9, 2015, 2:21:25 AM
to jooq...@googlegroups.com
I will try to put one together tomorrow morning PST. I could very well be doing something wrong. 

Sent from my iPhone

Lukas Eder

Jan 9, 2015, 3:06:33 AM
to jooq...@googlegroups.com
Thank you very much.
This issue is fixed on GitHub master, btw: https://github.com/jOOQ/jOOQ/issues/3909
It will be merged to 3.5.2, 3.4.5, and 3.3.5


Robert DiFalco

Jan 9, 2015, 12:01:18 PM
to jooq...@googlegroups.com
As you suspected, there was something wrong with my timing code. Here's a more formal comparison for a query returning a single record. It is much better. I'm not surprised that JPA is faster given that it is using a pre-parsed query with @NamedQuery. I'm getting some weird overhead with @Transactional and JOOQ that I need to figure out. For what it's worth, when the number of records returned is around 250 the difference between the two disappears. 

JOOQ-NP means no-parse, just sending a query string and processing Results. JOOQ-RS does a parse like JOOQ but processes a ResultSet directly instead of a JOOQ Result object.

JPA     (avg=755.4us, total=3.776s, samples=4998, ops=1323.97)
JOOQ    (avg=824.4us, total=4.121s, samples=4998, ops=1213.11)
JOOQ-NP (avg=790.1us, total=3.949s, samples=4998, ops=1265.64)
JOOQ-RS (avg=821.9us, total=4.108s, samples=4998, ops=1216.65)
-- JPA     is 9.14% faster than JOOQ   
-- JOOQ-NP is 4.33% faster than JOOQ   
-- JOOQ-RS is 0.29% faster than JOOQ   

Lukas Eder

Jan 10, 2015, 6:33:22 AM
to jooq...@googlegroups.com
Hi Robert,

Mere numbers don't really help assess where time is lost. While I trust you have set up your benchmarks as well as possible, there might be a lot of hidden caveats that lead to significant overheads that would not normally appear in production. Things like:

- Debug logging being turned on
- Tests not being bootstrapped (jOOQ needs to load all schema metadata, and caches a lot of stuff, just like JPA, of course)

Ideally, for us to help you and verify your results, it would be great if you could publish your complete benchmarks in a completely reproducible way, e.g. on GitHub.

I'm aware you private-messaged me parts of your benchmark code, but for the sake of traceability of this discussion, also for future readers, it would be best if all relevant information were contained in and available from this Google Groups discussion.

Cheers,
Lukas


Lukas Eder

Jan 14, 2015, 11:25:00 AM
to jooq...@googlegroups.com
Hello,

I did a bit of profiling myself, and I've found a couple of hotspots, which I'm summarising here:

There are a couple of places where excessive memory is consumed by not reusing references to certain objects. For example:

- Avoid wrapping Field[] in a new RowImpl every time a Record is created
- Avoid creating unnecessary Iterators and ArrayLists in AbstractStore.getAttachables()
- Replace ArrayList<ExecuteListener> by ExecuteListener[] in ExecuteListeners to avoid excessive Iterator creation
- AbstractScope creates an excessive amount of HashMaps, internally

Other hotspots include unnecessary processing, such as:

- Add RenderKeywordStyle.AS_IS

There are more hotspots, which I'll resolve as we go.

I believe that outside of a benchmark, the overhead is probably negligible, but I can certainly see that there is an overhead to be noticed in a benchmark. The largest overhead in memory consumption is caused by the fact that all values are wrapped in boxed types, e.g. java.lang.Integer. This will inevitably put some pressure on the GC.

The largest overhead in CPU usage is caused by various occasions of accessing Context.data(), a HashMap that is used while rendering SQL, for the different QueryParts to communicate with each other. These features are needed for emulation of complex SQL clauses. There is certainly room for optimisation there as well, but the improvement will not be easy. I'll keep looking for potential improvements.

Also, there is some risk of contention in SchemaMapping, which lazy-initialises its own data. As few users actually use the SchemaMapping, I'm sure there is potential for bypassing that feature when it is not active.


I'll keep you posted on this thread as soon as I discover anything else.

Cheers
Lukas

Robert DiFalco

Jan 14, 2015, 1:38:30 PM
to jooq...@googlegroups.com
Very cool, I will check out these issues.

I had a question about SchemaMapping. I noticed some "synchronized" blocks in there. But I could not figure out how these are used in a contended manner. Will these be shared by any DSLContext that shares the same Configuration? 

It seems like you could make this non-blocking by using ConcurrentMap#computeIfAbsent. If you can't use Java 8 you can just use #get and #putIfAbsent. 

Since it's rarely contended for a single mapping entry, this would really move things along when many queries need to access this structure at once for different tables. Right now, even though they map different things, they all back up on the entire map. There is an edge case where you will needlessly compute a duplicate value, but that is far outweighed by not requiring every thread to wait on the entire map.
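The proposal could be sketched like this (names are illustrative; the real SchemaMapping caches jOOQ-internal structures, not strings):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of a non-blocking mapping cache: per-key lazy initialisation
// without a synchronized block over the whole map.
public class MappingCache {
    private final ConcurrentMap<String, String> mapped = new ConcurrentHashMap<>();

    // Java 8: atomic per-key computation; other keys are never blocked
    String map(String table) {
        return mapped.computeIfAbsent(table, this::compute);
    }

    // Pre-Java-8 alternative: get, then putIfAbsent. A duplicate compute()
    // may happen under a race (the edge case noted above), but no thread
    // ever waits on the entire map.
    String mapPreJava8(String table) {
        String v = mapped.get(table);
        if (v == null) {
            String computed = compute(table);
            String prev = mapped.putIfAbsent(table, computed);
            v = (prev != null) ? prev : computed;
        }
        return v;
    }

    private String compute(String table) {
        return "mapped_" + table; // placeholder for the real schema mapping lookup
    }
}
```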

Lukas Eder

Jan 15, 2015, 2:15:42 AM
to jooq...@googlegroups.com
Yes, the org.jooq.SchemaMapping reference is shared by any DSLContext that uses the same Configuration.

I'd first like to see the actual benchmark that leads to contention before we fix this. Perhaps I'm still missing something. My own (single-threaded, so far) benchmarks show that *some* time is spent in the actual map.contains() and map.get() methods, and that time can certainly be optimised when users do not specify any mapping at all. But the synchronized blocks are guarded by double-checked locking, so apart from the application bootstrapping phase, I suspect that you shouldn't enter those blocks too often in production.

Obviously, benchmarks tend to be strongly biased towards being in a bootstrapping phase. It's hard to avoid that effect.

I prefer not to use ConcurrentMaps whenever I can. The long-term overhead such maps impose on read operations is far more significant than the bit of time saved while bootstrapping essentially read-only data (caches).