Problem with SELECT ... ORDER BY no data, revisited!

sverhagen

unread,

Nov 5, 2009, 8:27:51 PM11/5/09

to H2 Database

Hi, all.

Tonight I've encountered two (independent) production systems going
into failure on an issue that has remarkable resemblance with a
resolved H2 database issue from the past:

References:

Group:
http://groups.google.com/group/h2-database/browse_frm/thread/41c475eb55a299d1

Change log for 1.1.111:
"Queries that are ordered by an indexed column returned no rows in
certain cases (if all rows were deleted from the table previously, and
there is a low number of rows in the table, and when not using other
conditions, and when using the default b tree index)."

We've indeed also seen this problem in the past with 1.1.108. That's
the exact reason for upgrading to 1.1.111. We've not seen the problem
anymore for a while until mysteriously seeing it twice in one night,
now.

Because of Java 1.4 we're currently still at 1.1.111. I also don't see
anything in the change log of later versions that is clearly related
to this problem. I've verified to be absolutely sure that these
machines are running the exact 1.1.111 version.

Problem that we're seeing is a table that's reduced to a small number
of rows (2 in the example below) on which a certain SELECT query
unexpectedly gains no records in result:
SELECT count(*) FROM embedded_events; --> count = 2
SELECT * FROM embedded_events ORDER BY ID; --> 'no rows'

Workarounds:
SELECT * FROM embedded_events ORDER BY ID DESC; --> '2 rows'
SELECT * FROM embedded_events ORDER BY ID+0; --> '2 rows'

Given these were production systems I had at some point to cut corners
and restart stuff. I did make a copy of one of the databases, while it
was still running and giving us the faulty results. Not sure if that's
of any use (I've placed it on another machine and am not seeing the
problem anymore).

Can I provide any more information to allow the gurus here to analyze
this problem?

Regards, Sander.

Thomas Mueller

unread,

Nov 9, 2009, 3:12:41 PM11/9/09

to h2-da...@googlegroups.com

Hi,

> Because of Java 1.4 we're currently still at 1.1.111.

Did you try Retrotranslator? See also
http://www.h2database.com/html/build.html#portability

> I also don't see anything in the change log of later versions that is clearly related
> to this problem.

That's true, I don't see what it could be... You are not using views
that use GROUP BY? If yes it could be the problem described in version
1.1.112: "GROUP BY queries with a self-join (join to the same table)
that were grouped by columns with indexes returned the wrong result in
some cases."

> SELECT count(*) FROM embedded_events; --> count = 2
> SELECT * FROM embedded_events ORDER BY ID; --> 'no rows'

In theory it could be a corrupted index file, after a power failure...
If that's the case the problem will disappear after deleting the
.index.db file. To avoid corrupted index files, append ;LOG=2 to the
database URL. See also
http://www.h2database.com/html/grammar.html#set_log

> (I've placed it on another machine and am not seeing the problem anymore).

The reason could be that the index was re-built when opening due to a
bug that was fixed in version 1.1.112: "...indexes were always
re-built unnecessarily when opening"

Regards,
Thomas

sverhagen

unread,

Nov 10, 2009, 3:38:25 AM11/10/09

to H2 Database

Hi, thanks for that.

> > Because of Java 1.4 we're currently still at 1.1.111.
>
> Did you try Retrotranslator? See alsohttp://www.h2database.com/html/build.html#portability

Retrotranslation scares me. Anyway, we've recently moved to a target
platform that would actually allow me to run the database under 1.6.
I'm just not entirely seeing the point of moving to a newer version
(effectively taking regression risks) if there's nothing in the change
log indicating that I really should? Also, the fact that this is not
something that's happening on a daily basis (apparently) I have no
way, after upgrading to whatever newest version, to tell that we've
actually improved anything :-(

> > (I've placed it on another machine and am not seeing the problem anymore).
>
> The reason could be that the index was re-built when opening due to a
> bug that was fixed in version 1.1.112: "...indexes were always
> re-built unnecessarily when opening"

Here again I was using 1.1.111, so I'd assume not.

Could this database dump be, however, of any use to you?

> That's true, I don't see what it could be... You are not using views
> that use GROUP BY?

No views, just a few tables. Simple queries.

> In theory it could be a corrupted index file, after a power failure...

If that would be the case than it has at least been running
(seemingly) fine for days before causing troubles. Instead, it looks
really much alike the problem I referred to earlier (the one
supposedly fixed in 1.1.111).

Best regards, Sander.

Thomas Mueller

unread,

Nov 12, 2009, 1:30:13 PM11/12/09

to h2-da...@googlegroups.com

Hi,

> if there's nothing in the change
> log indicating that I really should?

Not necessarily, that's true. The only issue that affects stability
may be in version 1.1.114: In some situations, an
ArrayIndexOutOfBoundsException was thrown when adding rows. This was
caused by a bug in the b-tree code.

> (effectively taking regression risks)

Upgrading is always a risk. The risk can be reduced with test cases.
There are quite many test cases for H2. If you want to reduce the
risk, I suggest to implement and run tests (for example randomized
tests).

>> The reason could be that the index was re-built when opening due to a
>> bug that was fixed in version 1.1.112: "...indexes were always
>> re-built unnecessarily when opening"
>
> Here again I was using 1.1.111, so I'd assume not.

> Could this database dump be, however, of any use to you?

I'm not sure if it will help in this case. It only helps if the
problem can be reproduced.

In any case, I suggest backup a database from time to time to SQL
script using SCRIPT TO '...'.

Regards,
Thomas

sverhagen

unread,

Nov 13, 2009, 11:15:48 AM11/13/09

to H2 Database

This is all well and fine, but are we going to undertake anything to
reproduce and fix it, or leave it to this? I can possibly write this
off as a one-time fluke, but the general trust in H2 as a database is
getting big hits within our organisation, due to this incident. I'm
generally very positive about H2, but this incident is leading to a
lot of questions now :-(

On Nov 12, 7:30 pm, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

Thomas Mueller

unread,

Nov 14, 2009, 4:49:10 AM11/14/09

to h2-da...@googlegroups.com

Hi,

> but are we going to undertake anything to
> reproduce and fix it, or leave it to this?

Of course it would be great if we can find out what the root cause of
the problem was.

- What is your database URL?
- Do you use Tomcat or another web server?
Do you unload or reload the web application?
- Did you use multiple connections?
- With which version of H2 was this database created?
You can find it out using:
select * from information_schema.settings where name='CREATE_BUILD'
- Do you use any settings or special features (for example, the setting
LOG=0, or two phase commit, linked tables, cache settings)?
- Did the application run out of memory (once, or multiple times)?
- Is the application multi-threaded?
- What operating system, file system, and virtual machine
(java -version) do you use?
- Is it (or was it at some point) a networked file system?
- How big is the database (file sizes)?
- Is the database usually closed normally, or is process terminated
forcefully or the computer switched off?
- Is it possible to reproduce this problem using a fresh database
(sometimes, or always)?
- Are there any other exceptions (maybe in the .trace.db file)?
Could you send them please?

> I can possibly write this off as a one-time fluke

I wouldn't. Stability is very important. Let's see what we can do.

Regards,
Thomas

sverhagen

unread,

Nov 15, 2009, 8:33:20 AM11/15/09

to H2 Database

On Nov 14, 10:49 am, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

> - What is your database URL?

jdbc:h2:tcp://127.0.0.1:9080/projectname

> - Do you use Tomcat or another web server?
> Do you unload or reload the web application?

No, we're running an application from Main(), which is the only user
of the database. (Except for the occasional service engineer tunneling
into the web UI of H2.)

> - Did you use multiple connections?

There's one application connecting to the database using the following
setup:

<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
destroy-method="close">
<property name="driverClassName" value="org.h2.Driver"/>
<property name="url"><value>${database.url}</value></property>
<property name="username"><value>${database.user}</value></
property>
<property name="password"><value>${database.password}</value></
property>
<property name="maxActive" value="90"/>
</bean>

As far as I can tell we have no more than two threads working on the
table that gave us the problems (doing a sort of provider/consumer
pattern).

> - With which version of H2 was this database created?

CREATE_BUILD 111
We're recreating the database on every software re-install, so we're
not carrying much legacy behind us in that respect.

> - Do you use any settings or special features (for example, the setting
> LOG=0, or two phase commit, linked tables, cache settings)?

I would not say so. All default setting.
This is our command-line:

java -cp .:bin/h2.jar -Xmx30m -Dh2.bindAddress=localhost -
Duser.timezone=Europe/Amsterdam org.h2.tools.Server -tcp -tcpPort 9080
-web -webPort 8080

> - Did the application run out of memory (once, or multiple times)?

There is a small chance that our application did, although not likely
after recent changes to the amount of assigned memory. Also, H2 is not
running in-process. The H2 application I've never seen running out of
memory.

> - Is the application multi-threaded?

Answered above.

> - What operating system, file system, and virtual machine
> (java -version) do you use?

uname -a
Linux 000bab24fa03 2.6.24-24-generic #1 SMP Sat Aug 22 01:06:14 UTC
2009 i686 GNU/Linux

cat /etc/issue
Ubuntu 8.04.3 LTS \n \l

java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Client VM (build 14.2-b01, mixed mode, sharing)

> - Is it (or was it at some point) a networked file system?

No, this is Compactflash

> - How big is the database (file sizes)?

37M projectname.18.log.db
11M projectname.19.log.db
7.3M projectname.data.db
37M projectname.index.db
4.0K projectname.lobs.db
99 projectname.lock.db
(This taken from a randomly running system.)

> - Is the database usually closed normally, or is process terminated
> forcefully or the computer switched off?

Never closed normally. Power down or "kill -9" are common scenarios.

> - Is it possible to reproduce this problem using a fresh database
> (sometimes, or always)?

No, it appears to be a rare problem.

> - Are there any other exceptions (maybe in the .trace.db file)?
> Could you send them please?

Will send the database files privately.

Thomas Mueller

unread,

Nov 17, 2009, 2:17:27 PM11/17/09

to h2-da...@googlegroups.com

Hi,

Everything looks 'normal', except for:

> Compactflash

> Never closed normally. Power down or "kill -9" are common scenarios.

I'm not saying this is a file system problem, but it's just that I
didn't test this scenario well enough. I didn't test with compact
flash at all, and I didn't test the power down enough. I do run power
failure tests now regularly with the "page store" engine, but I didn't
before. The "kill -9" case is tested (part of the regular unit test),
but not many test cases are written.

I suggest to avoid specially "power down". Also "kill -9" is a bit
risky (just from the relatively low number of test cases).

Regards,
Thomas

sverhagen

unread,

Nov 17, 2009, 4:35:25 PM11/17/09

to H2 Database

> I'm not saying this is a file system problem, but it's just that I
> didn't test this scenario well enough. I didn't test with compact
> flash at all,

Do you know any reasons here to assume Compactflash to work
differently than, say, a common hard disk, given that it's all ext3 so
that H2 may not see a difference?

> and I didn't test the power down enough. I do run power
> failure tests now regularly with the "page store" engine, but I didn't
> before. The "kill -9" case is tested (part of the regular unit test),
> but not many test cases are written.

Well, we're currently in no position to avoid power down. It's
inherent to our application that personnel walks to these particular
machines, open up the cabinet doors and hit the power button. We can
tell them not to do so, but they'll decide to know better. We
currently have no support in our hardware platform for graceful
shutdown in case of any such power interruption.
Because of the risk (well, it's actually a certainty) of unexpected
power down, we never bothered to shutdown in any other way than a
"kill -9".

Back to the original issue, can we do something to analyze that one
further, or perhaps even implement some analysis for when it ever
happens again?

Thanks for your continuing attention in respect to this matter.
Regards, Sander.

Thomas Mueller

unread,

Nov 23, 2009, 5:43:53 PM11/23/09

to h2-da...@googlegroups.com

Hi,

> Do you know any reasons here to assume Compactflash to work
> differently than, say, a common hard disk, given that it's all ext3 so
> that H2 may not see a difference?

The behavior on power failure is different when using SSD (compared to
regular hard disks). See also
http://www.yaffs.net/yaffs-nand-specific-flash-file-system-introductory-article

> Well, we're currently in no position to avoid power down.

OK, then I suggest to use online backups, or another database that is
well tested for power down. Kill -9 is very different than power down
by the way. There is a test case for kill -9. But kill -9 will not
write data to the wrong page on the disk (unlike power down when using
hard disks). However version 1.1.x of H2 is not well tested for power
down; I do test power down with H2 version 1.2.x now (but using a
regular hard disk; I didn't test with compact flash so far). I guess
99% robustness can be achieved (but I don't think 100%; let's see).

Regards,
Thomas

sverhagen

unread,

Nov 23, 2009, 6:59:18 PM11/23/09

to H2 Database

On Nov 23, 11:43 pm, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

> OK, then I suggest to use online backups, or another database that is
> well tested for power down

Well, I appreciate your honesty. For the moment I'm still liking H2
database. But hey, yes, that 1% may kill me ;-)

I'll try going for the scenario in which problems are just the least
likely.

We've already resorted to using the "...ORDER BY ID, ID" workaround.

Somewhere down this discussion you've mentioned the automatic re-
generation of indexes on startup. It appears that I'm accidently
having those (first fixed in 1.1.112). Would that something to want
after a restart, given that those may still often be triggered by a
power-down, hence possibly corrupting my indexes? Is there some kind
of switch starting 1.1.112?

Another thing to do then would be to go to 1.2.x, given that you've
started testing kill -9 scenarios.

Are there any other measures that you can recommend to:
1. Reduce the chance on reoccurence?
2. Be able to post the best possible reports whenever this happens
again in the future?

Thanks again, Sander.

David Brown

unread,

Nov 23, 2009, 7:41:28 PM11/23/09

to h2-da...@googlegroups.com

On Mon, Nov 23, 2009 at 03:59:18PM -0800, sverhagen wrote:
>On Nov 23, 11:43�pm, Thomas Mueller <thomas.tom.muel...@gmail.com>
>wrote:
>> OK, then I suggest to use online backups, or another database that is
>> well tested for power down
>
>Well, I appreciate your honesty. For the moment I'm still liking H2
>database. But hey, yes, that 1% may kill me ;-)

You may just find that Thomas is just a bit more honest about
reliability than other databases seem to be.

I just got a JVM up an running on a small device that uses SD as it's
primary storage. I may be able to come up with some kind of test to
see how it deals with bad powerdowns.

David

sverhagen

unread,

Nov 23, 2009, 8:03:54 PM11/23/09

to H2 Database

On Nov 24, 1:41 am, David Brown <h2...@davidb.org> wrote:
> I just got a JVM up an running on a small device that uses SD as it's
> primary storage. I may be able to come up with some kind of test to
> see how it deals with bad powerdowns.

I'm not even on such a small footprint device anymore, although my
project started out on one. (ARM9, 128MB RAM. Yeah, yeah, size is
relative.) We had this problem on the old platform with a pre-1.1.111,
now had it on the new platform with 1.1.111. The new platform is
nothing that'll make you lift your eyebrows: plain x86 architecture,
some Celeron Mobile something. I've also been thinking to offer some
test setup, but I can hardly imagine that that will make any of your
hearts beat any faster, given the commodity kind of hardware we're
using now (except for the CF, perhaps). Anyway, I'm happy to help, as
long as it's from a H2 user's perspective; I'm not seeking to become
any kind of expert on its inner workings :-)

Sander.

David Brown

unread,

Nov 24, 2009, 2:31:44 AM11/24/09

to h2-da...@googlegroups.com

On Mon, Nov 23, 2009 at 05:03:54PM -0800, sverhagen wrote:

>On Nov 24, 1:41�am, David Brown <h2...@davidb.org> wrote:
>> I just got a JVM up an running on a small device that uses SD as it's
>> primary storage. �I may be able to come up with some kind of test to
>> see how it deals with bad powerdowns.
>
>I'm not even on such a small footprint device anymore, although my
>project started out on one. (ARM9, 128MB RAM. Yeah, yeah, size is
>relative.)

Interestingly, the only thing really small about this device is the
physical footprint. It runs at 1Ghz, has 1GB of RAM, and 16GB of SD.
SD is used because of physical size.

David

Reply all

Reply to author

Forward