Impossibly slow INSERT when there's a computed SELECT column

135 views
Skip to first unread message

JND

unread,
Apr 12, 2012, 8:48:32 AM4/12/12
to h2-da...@googlegroups.com
Hi,

Wondering if this rings a bell with anyone. I have some tables with a few million rows each. It's all speedy enough unless I have a computed column (for example VARCHAR(5) AS (SELECT ... FROM indexed_relation...). To begin with, if I add an extra million rows to a table that has no computed column, the operation only takes a few seconds. If I then add a computed column to that table, for example a computed column that selects on a related table (using PK), the 'alter table' takes a few seconds. So there is no problem with the computed select column per se. But then, if I insert another million rows, the operation takes hours (if it ever finishes at all...). So inserting rows when there is an existing computed select column is unusably problematic. Just thought someone else might have definitely or definitely NOT experienced this same issue. It's consistent for H2 1.2 as well as 1.3. (I'll try to file a test case and bug report in the next few days, but have to get other things done to meet a deadline, having lost a whole afternoon and most of today to tracing this issue.)

James

Noel Grandin

unread,
Apr 12, 2012, 1:08:03 PM4/12/12
to h2-da...@googlegroups.com
http://h2database.com/html/features.html#computed_columns

"A computed column is a column whose value is calculated before storing. "

> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/h2-database/-/tS1mfNz04fEJ.
> To post to this group, send email to h2-da...@googlegroups.com.
> To unsubscribe from this group, send email to
> h2-database...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.

JND

unread,
Apr 12, 2012, 8:09:56 PM4/12/12
to h2-da...@googlegroups.com
On Friday, April 13, 2012 1:08:03 AM UTC+8, Noel Grandin wrote:
http://h2database.com/html/features.html#computed_columns

"A computed column is a column whose value is calculated before storing. "

That is correct Noel, that's why I'm using them. As I described, they're very fast except when they're inexplicably not, and the latter is the problem.
 

Thomas Mueller

unread,
Apr 13, 2012, 12:14:37 AM4/13/12
to h2-da...@googlegroups.com

Hi,

I think it's rather uncommon to use a query for a computed column. What is your use case? What does the query look like exactly?

I don't know what the problem could be. The most obvious things is to ensure the query is fast, but I guess you have already verified that, right? Then, I would probably use the built-in profiler or just "jps -l" and "jstack -l <pid>" to get a few full thread dumps. That way you can find out what exactly is slow.

Regards,

Thomas

--

Noel Grandin

unread,
Apr 13, 2012, 1:54:03 AM4/13/12
to h2-da...@googlegroups.com
Read the bit about "the value is calculated before storing"

Thats why your inserts are slow.

> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To view this discussion on the web visit

> https://groups.google.com/d/msg/h2-database/-/Vein3YJLMqcJ.

JND

unread,
Apr 13, 2012, 2:04:11 AM4/13/12
to h2-da...@googlegroups.com
Nope, it's very fast in terms of what I wrote: “If I then add a computed column to that table, for example a computed column that selects on a related table (using PK), the 'alter table' takes [merely] a few seconds.” This is based on my understanding that the values are calculated upon ADD COLUMN. If you’re saying that the values are instead calculated on the fly (not during ADD COLUMN) then it’s different than the documentation. The performance problem only occurs with INSERT and MERGE. I’ll work on the profiling and bug report when I get the chance.

Noel Grandin

unread,
Apr 13, 2012, 3:36:28 AM4/13/12
to h2-da...@googlegroups.com, JND
The computed column is literally calculated whenever page data is written to disk i.e. whenever an INSERT or UPDATE or similar query is executed.

So, no, we don't calculate the data when doing an "alter table add column".

On 2012-04-13 08:04, JND wrote:
Nope, it's very fast in terms of what I wrote: �If I then add a computed column to that table, for example a computed column that selects on a related table (using PK), the 'alter table' takes [merely] a few seconds.��This is based on my understanding that the values are calculated upon ADD COLUMN.�If you�re saying that the values are instead calculated on the fly (not during ADD COLUMN) then it�s different than the documentation. The performance problem only occurs with INSERT and MERGE. I�ll work on the profiling and bug report when I get the chance.


On Friday, April 13, 2012 1:54:03 PM UTC+8, Noel Grandin wrote:
Read the bit about "the value is calculated before storing"

Thats why your inserts are slow.

--

JND

unread,
Apr 13, 2012, 3:43:51 AM4/13/12
to h2-da...@googlegroups.com, JND
On Friday, April 13, 2012 3:36:28 PM UTC+8, Noel Grandin wrote:
> So, no, we don't calculate the data when doing an "alter table add column".

In my case, as mentioned in my post, the table already contains millions of rows. So when I add the computed column to the existing table with those rows, you are saying the new calculated values are NOT calculated for the existing rows during ADD COLUMN. So calculation is deferred? Since the ADD COLUMN is fast, and the calculated values are reliably present upon SELECT, the implementation is ideal. Is it possible to get this behaviour with INSERT?

Noel Grandin

unread,
Apr 13, 2012, 3:51:16 AM4/13/12
to h2-da...@googlegroups.com, JND
Not at present.

We'd need a new feature that basically said "don't store/cache this
computed column"

If you want to work on such a thing, the place to start looking in the
code is probably by working backwards from this method:
org.h2.table.Column#computeValue(Session, Row)

You're also welcome to try running a profiler, and seeing if you can
spot something. It's quite possible your use case is triggering a
performance issue we haven't seen before.

JND

unread,
Apr 13, 2012, 9:16:44 AM4/13/12
to h2-da...@googlegroups.com
On Friday, April 13, 2012 12:14:37 PM UTC+8, Thomas Mueller wrote:
The most obvious things is to ensure the query is fast, but I guess you have already verified that, right? Then, I would probably use the built-in profiler or just "jps -l" and "jstack -l <pid>" to get a few full thread dumps. That way you can find out what exactly is slow.

Thanks Thomas, yes the query is fast. I've now derived a test case and obtained the common stack traces so I can file a bug report.

JND

unread,
Apr 13, 2012, 9:58:34 AM4/13/12
to h2-da...@googlegroups.com, JND
Hi Noel,

I've produced two test cases from this issue:

1. Computed columns (including computed SELECT) are evaluated and stored during ADD COLUMN. This happens rapidly and I'm pleased with the performance in my real-life scenario. So there is 'no reason' for bulk INSERT to be slow.

create table test1 (a1 identity, b1 int);
insert into test1 values(1, -1);
create table test2 (a2 identity);
insert into test2 values(1);
alter table test2 add column b2 int as select b1 from test1 where a1=a2;
truncate table test1; -- erase b1 to test if b2 has already been stored...
select * from test2; -- ...yes it has

2. I can reproduce slow INSERT without computed column, if the table contains an indexed column. In my real-life scenario, the computed column was a red herring because it simply changed the number of indexed columns in my destination table. Although 'obviously' INSERT is slower than INSERT SORTED, the problem on my test machines (H2 1.3.166) is that INSERT for 30 000 rows is 20-80 times slower than INSERT SORTED. In my real-life scenario, although I was using INSERT SORTED, the destination table includes an addition indexed column that is not part of the sort.


James
Reply all
Reply to author
Forward
0 new messages