Time To First Row on large ResultSet


Val E

Nov 9, 2020, 3:56:17 PM
to H2 Database
I'm playing around with a large H2 data set via JDBC. The db in question is 40 GB+, and the table has 300 million records.

When I do a simple "select * from mylargetable", it takes 10-15 minutes before I see the first row and the data starts flowing.

This was unexpected, as H2 has been extremely performant in all other situations.

Is there something I can do to reduce this Time To First Row? If I add a limit to the query, the results start coming back faster: LIMIT 10000 comes back sub-second, LIMIT 1000000 comes back in a few seconds, and so on. The TTFR keeps growing as the limit increases.

Is H2 trying to fully fill some buffer before starting to return data?
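
For concreteness, by TTFR I mean roughly this (a sketch, given an open Connection conn; the timer starts before executeQuery(), since that call itself can block):

    long t0 = System.nanoTime();
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM mylargetable")) {
        if (rs.next()) {
            // time-to-first-row: gap between issuing the query and the first row
            System.out.printf("TTFR: %.3f s%n", (System.nanoTime() - t0) / 1e9);
        }
        while (rs.next()) {
            // keep streaming the remaining rows
        }
    }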



val

Nov 10, 2020, 12:03:30 AM
to H2 Database
On further inspection I can see that H2 is writing a large temp file before my results start flowing: 20+ GB of data in about 23 minutes. Likely the whole contents of my large table rewritten to disk.

I'm guessing this is how the H2 client/driver (not the db engine itself) behaves to handle a ResultSet: it fully serializes the contents of the ResultSet to disk first and then serves that. Perhaps this is necessary to support scrollable ResultSets? If this were not taking place on an SSD it would have taken hours to see a result. And I don't need the ResultSet to support cursors/scrolling.

I guess I expected something more like ResultSet streaming; perhaps H2 doesn't support that.

And before anyone says that it is bad practice to have such a large result set: this is being used for machine learning. I'm dumping the data into a neural network.

330,180,000 records
Time to First Row: 23 minutes, 27.473 seconds

val

Nov 10, 2020, 12:31:40 AM
to H2 Database
Well, here I go into a monologue, answering my own question.

After searching for many fruitless hours, I finally stumbled on a useful post here: https://groups.google.com/g/h2-database/c/rgpD_y3Xp7w

Adding LAZY_QUERY_EXECUTION=1 to the connection URL is closer to what I was looking for.
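
For anyone else trying this, the URL setting looks roughly like so (a sketch; the database path and table name are of course placeholders for yours):

    import java.sql.*;

    String url = "jdbc:h2:/data/mylargedb;LAZY_QUERY_EXECUTION=1";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM mylargetable")) {
        while (rs.next()) {
            // rows start flowing almost immediately, instead of after a temp-file build
        }
    }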

TTFR is now 0.017 seconds.

Interestingly enough, the overall elapsed time to iterate over all 300 million rows was ~20 minutes, which is comparable to the previous TTFR of 23 minutes it took to first serialize the whole ResultSet to disk. In that case the whole elapsed time was 29 minutes: 23 minutes to build the result set and 6 minutes to iterate the 300 million records.

I wonder where the bottleneck is. What would it take to bring this 20-minute total iteration time down to the 6 minutes it takes if you prebuild the ResultSet first?

Noel Grandin

Nov 10, 2020, 1:52:39 AM
to h2-da...@googlegroups.com, val
> On Tuesday, November 10, 2020 at 12:03:30 AM UTC-5 val wrote:
> I'm guessing this is how the H2 client/driver (not the db engine itself) behaves to handle a ResultSet. It will
> fully serialize the contents of the ResultSet to disk first and then serve that. Perhaps this is necessary to

No, it's just the easiest way to handle multiple resultsets performantly, because then the engine doesn't have to
remember any state about the resultset.

On 2020/11/10 7:31 am, val wrote:
> Adding LAZY_QUERY_EXECUTION=1 to the connection URL is closer to what I was looking for.

Yeah, that is currently our best answer for extremely large result sets. Note that it only works for relatively simple
queries, and it will likely be quite bad performance-wise if you have more than one connection attempting to hit the same table.

The alternative is to grab chunks of the resultset at a time, using OFFSET...LIMIT....
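
Something like this (an untested sketch; it assumes a stable, indexed ordering column, here called "id"):

    int chunk = 100_000;
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM mylargetable ORDER BY id LIMIT " + chunk + " OFFSET ?")) {
        for (long offset = 0; ; offset += chunk) {
            ps.setLong(1, offset);
            int rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++;
                    // process the row
                }
            }
            if (rows < chunk) {
                break; // last, partial page
            }
        }
    }

Note that a large OFFSET makes the engine skip that many rows on every page, so on a very big table a keyset filter (WHERE id > ? ORDER BY id LIMIT n) is usually cheaper.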

>
> I wonder where the bottleneck is. What would it take to bring this 20-minute total iteration time down to the 6 minutes
> it takes if you prebuild the ResultSet first?
>

You'd have to run a profiler, either our simple one (see our performance page), or one of the other ones, to see where
the bottleneck is.
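
The simple one is used roughly like this (from memory; see the performance page for the exact usage):

    import org.h2.util.Profiler;

    Profiler prof = new Profiler();
    prof.startCollecting();
    // ... run the big scan here ...
    prof.stopCollecting();
    System.out.println(prof.getTop(3)); // top 3 stack traces by sample count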

TBH, H2 is mostly optimised for small quick queries, not really large ones.

Andreas Reichel

Nov 10, 2020, 3:03:23 AM
to h2-da...@googlegroups.com, val
Noel and Val, good morning.

Would you like to give us a hint where exactly this "temporary serialization of a resultset" happens in the code?
We could try to modify it so it writes flat ASCII files or CSV files instead of binary blobs (on demand, per hint), which would enable us to read line by line and access the first rows while the full content is still being written.

Best regards
Andreas

val

Nov 10, 2020, 12:21:47 PM
to H2 Database


I haven't traced it through all the way; org.h2.result.LocalResultImpl#addRowsToDisk is a good starting point.

By the way, I was using the word "serialize" loosely; it's not using Java serialization to serialize the result set. From a quick code review, it's placing the result sets into
org.h2.mvstore.db.MVPlainTempResult
org.h2.mvstore.db.MVSortedTempResult

So it's using these MVStore classes to temporarily persist the ResultSet to disk.

Based on the information here, http://www.h2database.com/html/mvstore.html, it may be possible to start returning rows from the MVStore while the full content is still being written. That's probably easier than replacing this whole implementation with a new mechanism like CSV or ASCII, as it already contains the ResultSet metadata like types, sizes, etc., which would need a brand new custom implementation to be encoded in CSV/ASCII.

Evgenij Ryazanov

Nov 10, 2020, 8:13:24 PM
to H2 Database
Hello.

There is nothing to be done about temporary results in H2. If you have large queries and they may return parts of their results early, use `LAZY_QUERY_EXECUTION=TRUE`. If some query doesn't support lazy execution, there is nothing to be done about it in H2 itself. With `LAZY_QUERY_EXECUTION=TRUE`, H2 uses a LocalResult if and only if rows need post-processing, such as sorting, DISTINCT filtering, or OFFSET / FETCH filtering, and these operations can't be performed early, for example because there are no compatible indexes or some previous step needs the complete set of rows. In all these cases the rows passed to LocalResult are not the same rows as will be returned by the ResultSet.
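
For example (hypothetical queries; whether the second one needs a LocalResult depends on the available indexes):

    -- Can stream lazily: a plain scan needs no post-processing.
    SELECT * FROM MYLARGETABLE;

    -- Likely needs a LocalResult: sorting on an unindexed column means every
    -- row must be seen before the first one can be returned.
    SELECT * FROM MYLARGETABLE ORDER BY SOME_UNINDEXED_COLUMN;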

val

Nov 11, 2020, 12:14:38 AM
to H2 Database
If I understand what Evgenij is saying, that temp file is how the H2 db engine behaves after all; it is NOT the result set, but an internal temporary H2 structure.

I still don't understand why ResultSet iteration is so much faster from this temporary structure than directly from the database using LAZY_QUERY_EXECUTION. We are talking 6 minutes vs 20 minutes. I could understand if there were other requirements like sorting, filtering, or joining; then I could see how that would require a temporary structure. But in this case the query is a basic SELECT * FROM TABLE, so why would the LAZY_QUERY_EXECUTION approach not be able to iterate all the rows in the same 6 minutes as with the temporary structure?

Perhaps the rows are stored differently in the db than in the temporary structure, in a way that makes them easier/cheaper to iterate?

Separately: should we use LAZY_QUERY_EXECUTION=TRUE by default? If H2 can already detect whether it needs to use a temporary structure or not, is there any drawback? It's a bit of an odd programming pattern to have to set up different connection URLs depending on what you are trying to select.

Noel Grandin

Nov 11, 2020, 12:48:50 AM
to H2 Database

You can turn LAZY_QUERY_EXECUTION on and off using
   SET LAZY_QUERY_EXECUTION ON/OFF
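
So it can also be scoped to one session around a single big query (a sketch):

    try (Statement stmt = conn.createStatement()) {
        stmt.execute("SET LAZY_QUERY_EXECUTION ON");
        try (ResultSet rs = stmt.executeQuery("SELECT * FROM mylargetable")) {
            while (rs.next()) {
                // stream rows lazily
            }
        }
        stmt.execute("SET LAZY_QUERY_EXECUTION OFF");
    }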

But, as I said earlier, it has restrictions: it locks internal data structures for the duration of retrieving the resultset, so it is not great for setups that need multiple concurrent queries.


Andrei Tokar

Nov 13, 2020, 12:05:27 PM
to H2 Database
Hi Val,
The reason for the execution time difference that you see between a full scan of the original table (LAZY_QUERY_EXECUTION, 20 min) vs. a scan of the intermediate temp table (6 min) could be that the database is very sparsely populated (as a result of inserts/updates during its construction), whereas the temp table is presumably pretty dense (built in a special "append" mode).

I wonder, what is the size of that temp file? It would show you the amount of real data you have, and I suspect it's somewhat smaller than the 40+ GB of your database.

You can try "SHUTDOWN COMPACT" (assuming you are using v1.4.200 and have a backup), and most likely the size of the db and its scan time will go down after that (to ~6 min for lazy exec?), unless I am missing something and lazy execution has some significant overhead.