Use of IN

87 views
Skip to first unread message

essence

unread,
May 12, 2012, 1:13:33 AM5/12/12
to H2 Database
I am getting a bit lost, Trying to improve efficiency of my delete
statements, which has an IN selection, and an id int primary key.

In the manual it has:

PreparedStatement prep = conn.prepareStatement(
"SELECT * FROM TABLE(X INT=?) T INNER JOIN TEST ON T.X=TEST.ID");
prep.setObject(1, new Object[] { "1", "2" });
ResultSet rs = prep.executeQuery();


and on a search with Google i found a message from Tom:

PreparedStatement prep = conn.prepareStatement(
"SELECT * FROM users WHERE login IN (?)");
prep.setObject(1, new Object[] { "1", "2" });
ResultSet rs = prep.executeQuery();


but in my delete query I get an exception:

org.h2.jdbc.JdbcSQLException: Data conversion error converting "(1,
2)"; SQL statement:
Caused by: java.lang.NumberFormatException: For input string: "(1, 2)"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:
48)
at java.lang.Long.parseLong(Long.java:410)
at java.lang.Long.parseLong(Long.java:468)
at org.h2.value.Value.convertTo(Value.java:796)


I am trying to filter on an attribute which has an id, of type int.

I have tried

ps1.setObject(1, new Object[]{new Integer(1),new
Integer(2)});

and
ps1.setObject(1, new Integer[]{1,2});

no joy.

My prepared statement is:

ps1 = connection.prepareStatement("delete from
doublearray where doublearray.wellresultid IN (?)");

essence

unread,
May 12, 2012, 1:37:10 AM5/12/12
to H2 Database
ps. my latest and probably fastest so far is a bit crude:

ps1 = connection.prepareStatement("delete from
doublearray where doublearray.wellresultid = ?");
for (int i = 0; i < arrayList.size(); i++) {
ps1.setInt(1, (Integer) arrayList.get(i));
ps1.addBatch();
}
ps1.executeBatch();
ps1.clearBatch();

essence

unread,
May 12, 2012, 12:01:34 PM5/12/12
to H2 Database
I have improved it a bot by doing it 5 at a time with

ps5 = connection.prepareStatement("delete from
doublearray where doublearray.wellresultid IN (?,?,?,?,?)");


and then using code above to do the rest.

I am generally deleting about 400 rows.

The critical thing is that the table has a column with a array type,
which holds about 3000 values.

So if H2 has to read a row before deleting, the read may be the
bottleneck which is harder to avoid.

I read somewhere about how a merge can be used.

Are there any plans to avoid the read in a delete?

essence

unread,
May 13, 2012, 3:55:41 AM5/13/12
to H2 Database
ps One thing i forgot to add, the delete performance is MUCH faster on
a table without any array data type. I can;t remember where I read it,
but Thomas said that a delete always reads data, which would explain
this difference.

What about doing a trick like doing an update first to remove the
array data, (set it to null) and then delete the rows?

Or does an update also read data before updating it?

essence

unread,
May 13, 2012, 4:27:02 AM5/13/12
to H2 Database
Just trued it.

Good news, delete is MUCH faster (x 100 or so) after I have set the
array data to null.

Bad news, the update is just as slow as the original delete so nothing
overall gained.

I think this simply confirms that a delete reads the whole row
including the array, and so does an update.

I think Thomas has said there are no short term plans to change this
behaviour, but there are some tricks with a merge which I might look
at.

Thomas Mueller

unread,
May 18, 2012, 4:39:59 AM5/18/12
to h2-da...@googlegroups.com
Hi,

Generally, you should try not to use conditions, and an index on very
large column (for example a very large text column, or an array column
with 3000 values). If you need to delete multiple rows, you could try:

delete from test where id in (select x from table(x int=?))

Regards,
Thomas

essence

unread,
May 19, 2012, 12:55:02 AM5/19/12
to H2 Database
I think my conclusions were that it is nothing to do with conditions
or indexes or anything, the bottleneck is that when deleting a row,
the complete row is read, and if that row contains a column with a
large array, the reading of the array is the bottleneck (even though
it is about to be deleted).

Is this conclusion correct? If so, will it be any time soon that this
behaviour (redundant reading of columns) is changed? I didn;t see it
on any road map.

Finally, how do i send donations to support the work of this great
database!

On May 18, 9:39 am, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

Thomas Mueller

unread,
May 21, 2012, 1:36:24 PM5/21/12
to h2-da...@googlegroups.com
Hi,

the bottleneck is that when deleting a row,
the complete row is read, and if that row contains a column with a
large array, the reading of the array is the bottleneck (even though
it is about to be deleted).

Yes. However I don't think converting the byte array to a row is the problem. I guess the problem is that the page is read (from disk). This is unavoidable unless if you delete all rows within a page.

What you could do is delete all rows of a table, using "truncate" or "drop table".

Regards,
Thomas

essence

unread,
May 26, 2012, 4:44:51 PM5/26/12
to H2 Database
OK, thanks for clarifying, the problem is indeed disc access, and the
longer the row the fewer rows are on a page, I guess, so effectively
long arrays will slow down deletion because more page reads need to be
performed.

I have only pursued this as a counterpart to all the advice about IN.
Getting rid of IN does not always help!

Interesting that you keep the array data on the same page as the rest
of the row, I assume? I thought some databases put this stuff (and
blobs etc.) somewhere different, maybe for this same reason.

On May 21, 6:36 pm, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

Rami Ojares

unread,
May 27, 2012, 7:43:43 AM5/27/12
to h2-da...@googlegroups.com
Indeed. There's a lot to wonder in this world.
I have been always wondering why H2 supports array data structure at all
that does not nicely fit into the relational model.

- Rami

essence

unread,
May 28, 2012, 1:30:57 AM5/28/12
to H2 Database
We are going a bit off topic, but I use arrays to hold arrays (!) of
about 3000 float objects. I have never done the test, but I assume
that holding this in conventional relational tables would have a
negative impact on performance. Retrieval performance is vital for my
application. Plus memory starts to become an issue - do I want to
store all those other indices for each float value?

Maybe I am making the normal mistake of assuming bottlenecks before
doing benchmarks (in my experience, my initial guesstimate of
bottlenecks is ALWAYS incorrect).

Use of arrays in relational databases has been around since Ingres in
the mid 80's, they had some very nice user defined objects way back
then, and some ideas on how to extend the query language (I think you
could define your own extensions). Postgres continues to lead in the
area. Such a shame Ingres fell by the wayside. The grandfathers of
relational databases (http://en.wikipedia.org/wiki/
Michael_Stonebraker) were behind these concepts, they wrote a
manifesto for 3rd (or 4th?) generation relational databases. Aha,
found it, worth reading by anybody interested in databases, a classic
paper.

http://www.cl.cam.ac.uk/teaching/2003/DBaseThy/or-manifesto.pdf

Rami Ojares

unread,
May 28, 2012, 3:04:32 AM5/28/12
to h2-da...@googlegroups.com
Definately off-topic but who's counting?
I suggest The Third Manifesto by Chris Date and Hugh Darwen.
http://www.thethirdmanifesto.com/

The simple reason why arrays do not fit in the relational model is
that in relational model values are stored as attributes (column) of a
relation (table) in tuples (row).

The attribute values are supposed to be atomic.
They can be arbitrarily complex but that complexity should be opaque
to the relational query engine.

Type specific accessor methods can operate on the internals of the type.

Note: This is not a deficiency. An array can be represented in a
relational model
as a relation with 2 attributes.

CREATE TABLE ARRAY_1(position INT PRIMARY KEY, value varchar(or whatever))

The user needs to manage the position if he wants to keep indexing
contiguous and starting from 0 or 1.
He could also calculate the position at query time.

One issue that complicates the ARRAY type is that one needs to define
the type of array elements (subtype?)
Copying from java generics: ARRAY<INT>(maxlength)
Then one would need to define multiple accessor methods to arrays so
that arrays could be used
as first class citizen in queries.

Gotta go, but tell me why cant you model your array data into a normal
table using the conventional types?
What is the feature you need from an array?

- Rami

essence

unread,
May 29, 2012, 2:06:56 AM5/29/12
to H2 Database
Shit, I wrote a response, but lost it.

Date's manifesto is a lot less entertaining, with fewer bon mots, than
Stonebraker's, so I am staying with Stonebraker.

I have always loved the phrase 'they simply need educating'. So true.

On May 28, 8:04 am, Rami Ojares <rami.oja...@gmail.com> wrote:
> Definately off-topic but who's counting?
> I suggest The Third Manifesto by Chris Date and Hugh Darwen.http://www.thethirdmanifesto.com/

Rami Ojares

unread,
May 29, 2012, 9:06:53 AM5/29/12
to h2-da...@googlegroups.com
On 29.5.2012 9:06, essence wrote:
> Date's manifesto is a lot less entertaining, with fewer bon mots, than
> Stonebraker's, so I am staying with Stonebraker.

Boring? Maybe.
Correct? Definately.
For wit and wordplay I read Shakespeare ;-)

- Rami

essence

unread,
May 30, 2012, 12:34:02 AM5/30/12
to H2 Database
It is only the naive who want correctness, I want pragmatism and
performance.

All it takes is a benchmark, use an array data type or put all the
values in rows,

Time a query, time a deletion, look at memory.


So you want to store an image as a byte/bit in each row?

You can keep your 'correctness'.

Rami Ojares

unread,
May 30, 2012, 8:04:34 AM5/30/12
to h2-da...@googlegroups.com
I do apologize if I came across as arrogant or impudent.
I do not think that this is a competition.
And as a fairly liberal person I do enjoy when people can do things in a
way that suits them.

All I am really saiyng is that Date's book have changed my way of viewing
the relational databases. The clearest and most practical book I have
ever read about sql is Date's "A Guide to SQL Standard".
Looks small, looks boring but the content is pure knowledge.

So while I do agree that a programmer needs to test his works in practice
and work out the final solution often by trial and error, I also have
experienced
those fantastic moments when an idea has been so solid and logical
that the code flows out beautifully from clearly defined logical principles.

"So you want to store an image as a byte/bit in each row"

Atomicity is always a matter of perspective.
Atomic to whom?
Image can be seen as atomic from the dbms perspective, when it has the same
operators defined for it just like an integer. Like equality, order
(which it does not have),
and so on and so forth. So using array or any other type is completely
fine with me
as long as the type specific ideas do not start their journey as part of
the relational machinery.

To be fair that was not your intention. You just wanted that the dbms
would do this type
specific optimization for the array type. And frankly there is nothing
incorrect about that! :-)
I simply reacted against array because often people start pushing all
kinds of datastructures
into relational model. If you want to use datastructures freely then I
would recommend using
an object database. Or maybe JackRabbit where Thomas is also involved.

The point of relational database is to use the relation as the
datastructure with which
you can model anything.

And finally rather than speak of correctness I would talk about
designing systems
whose complexity does not grow exponentially when applied to more and
more complex tasks.

- Rami

essence

unread,
Jun 1, 2012, 12:58:08 AM6/1/12
to H2 Database
Thanks, Rami, I don;t think we are in disagreement, although I can
come across as a bit harsh myself! I went through all the object v
relational database stuff in the early 1990's, and always tended to be
on the side of relational vendors who want to extend, rather than the
revolutionary object vendors. I remember UniSQL and something from HP
whose name I forget, they have both fallen by the wayside, but Oracle
carries on (never been a fan of Oracle, always was an Ingres
supporter, but that shows the power of marketing v technology).

But I still love the way Stonebraker writes - 'they just need
educating' - how many times have we all wished we had the balls to say
that? Stonebraker has held fast to the relational principles and the
need for query languages, so there is probably not too much space
between him and Codd/Date.

Completely off topic, but I was a participant in some workshops about
'what is a good data model' which is rarely talked about. This got
into ISO standards the EXPRESS language, and Shell published some
useful work, it's amazing how there are lots of books about the
semantics of SQL (the nuts and bolts), but almost nothing about how to
evaluate data models for quality and extensibility and maintainability
and support for different business purposes.

There have emerged quite sophisticated templates for modelling various
aspects - my main interest was plant and machinery and maintenance and
procurement. Not a simple topic. I can give references if interested.
OK, you forced me, here it is, I participated in many of the workshops
he chaired in the early 1990's.

http://www.amazon.com/Developing-High-Quality-Data-Models/dp/0123751063

How did we get here from 'IN' ?
Reply all
Reply to author
Forward
0 new messages