Dear All, compliments of the day.
Very sorry for a long text ahead.
Using the H2 database for some reporting, I stumbled over a few performance concerns.
For a small bank, we use the H2 database as backend for an IFRS compliant accounting software and build with it the financial reports, like trial balance and disclosures. (With the same software, our prime customers use Oracle on quite large servers with more than 24 core and 128 GB Ram).
The reporting related workflow executes 6 simple steps:
Report Definition:
1) Define the Report Source Query and persist that into a table. If that table does not exist, it is built by a CREATE TABLE ... AS SELECT ... FROM statement. If that table exists, an INSERT INTO ... SELECT ... FROM is used instead.
Also that table will be indexed automatically, based on the WHERE clause.
2) Define many small Aggregate Query Definitions based on that Source Query Table
3) Define the mapping between the result sets and columns of these Aggregate Query Definitions and Excel Spreadsheets and store everything in an Report Template XML file
Report Compilation:
4) Select the Report Template XML file
5) Execute the Report Source Query
6) Execute the Aggregate Queries and transfer the data into the Excel file
In general, step 6) will be very fast and step 5) can be monitored and tuned in order to make it fast enough (usually we use the "Long-Ops", Oracle ADDM and SQL Tuner and apply SQL Profiles)
Now my challenge: The relevant customer is small enough (30k live accounts) and H2 performs extremely well normally. Also the Trialbalance and the Disclosures are built well and within 1-2 minutes -- but only on the first execution after the program start.
Whenever I start exactly the same report directly after the first report is built (without any change of data or report definitions, without any concurrent activity etc.), it will build the Trial Balance fast and then it is stuck forever on the Report Source Query for the Disclosures. I have no idea why. Killing the program and restarting it will give the same result: first execution very fast, following execution does not return data forever, but only for that Disclosure Report. Trialbalance is still fast and both queries (Trialbalance and Disclosures) are fairly complex with a lot of joins and sub-queries.
My questions are: how come? Should a repeated execution not be faster because of the query cache? And how can I see and monitor, what the database is actually doing and what execution plan has been applied for the running queries? (I know how to get an execution plan before execution, but not ex-post for an already running query.)
Are there any tools similar to "Long-Ops" and/or V$SESSION?
And how am I supposed to monitor and analyze such situations?
Important for the records: I hate Oracle and love H2 database very much and mention the Oracle tool-set only as illustration, what I am looking for. I am convinced that H2 can handle the load and the queries very well in general, but only something goes wrong after the first executions. I want to find and fix that cause.
Thank you so much already and best regards!
Andreas