Re: Java Object ORDER BY and VARCHAR comparisons

357 views
Skip to first unread message

Steve McLeod

unread,
Jun 17, 2012, 6:42:04 AM6/17/12
to h2-da...@googlegroups.com
Hi Christopher,

Comparing an Object with a String using toString() seems to me like a misuse of toString(). It will solve your particular case, but I don't think it is a good general approach.

You do make a good point that if a Java Object that implements Comparable is stored in a column, then one should expect that ordering would use the Object's compareTo(...) method.

In your case, I'd recommend adding an additional database column to the table. This column would be for the String representation, with type VARCHAR, and would be for the purposes of ordering.

Regards,

Steve


On Saturday, 16 June 2012 15:31:31 UTC+2, Christopher Deckers wrote:
Hi,

We are using H2 (h2-2012-05-23) and we faced a few issues.

One of them is that I replaced a VARCHAR field with a Java object (OTHER) that contains the string and some additional information.
The problem now is that it breaks existing ORDER BY clauses and equality tests with VARCHARs, though it implements the CompareTo method (for objects of same type) and the toString() method.

I think that comparing the serialized bytes is nonsense for the Java object case, and I think that a Java Object's toString and compareTo method should be used when appropriate.

As a temporary(?) solution, I changed the following code:

* In "Value.getHigherOrder(int t1, int t2)", line 325, I added:

        if(t1 == Value.STRING && t2 == Value.JAVA_OBJECT || t2 == Value.STRING && t1 == Value.JAVA_OBJECT) {
            return Value.STRING;
        }


* In "ValueJavaObject", I added these:

    @Override
    protected int compareSecure(Value v, CompareMode mode) {
        if(v.getType()  == Value.STRING) {
            Object o = Utils.deserialize(getBytesNoCopy());
            return o.toString().compareTo(v.getString());
        }
        if(v.getType() == Value.JAVA_OBJECT) {
            Object o = Utils.deserialize(getBytesNoCopy());
            if(o instanceof Comparable) {
                Object o2 = Utils.deserialize(v.getBytesNoCopy());
                if(o.getClass() == o2.getClass()) {
                    return ((Comparable)o).compareTo(o2);
                }
            }
        }
        return super.compareSecure(v, mode);
    }

    @Override
    public String getString() {
        return Utils.deserialize(getBytesNoCopy()).toString();
    }


Note: I think JAVA_OBJECT should take low order only for String case which is why I did not try to change "Value.getOrder(int type)".

What do you think of my solution, can there be any problems with what I changed?
Is there something that could be done in H2 itself to solve my issue?

Cheers,
-Christopher

Christopher Deckers

unread,
Jun 17, 2012, 7:38:08 AM6/17/12
to h2-da...@googlegroups.com
Hi Steve,

> Comparing an Object with a String using toString() seems to me like a misuse
> of toString(). It will solve your particular case, but I don't think it is a
> good general approach.

Well, if like Comparable a marker interface would remove that feel of
misuse, then so be it. But toString is not that bad to me because, as
the Javadoc puts it: "Returns a string representation of the object.".
Someone using a Java object in a database is already an advanced user,
and comparing an object with a string would be logical to compare the
string representation of that object rather than its serialized bytes.
Not implementing the toString method just falls into the category of
errors like not implementing hashCode and equals when using HashMaps.


> In your case, I'd recommend adding an additional database column to the
> table. This column would be for the String representation, with type
> VARCHAR, and would be for the purposes of ordering.

First, this is a workaround. Second, if that workaround could work in
my case, I would do it :) Unfortunately, my use case is more complex
so I have to modify H2 itself...

I am basically auto converting an existing tabular structure to tables
in H2 with type detections and so on. There has to be a strict mapping
between the fields of my system and the columns of the tables so I
cannot insert additional columns. Simple types worked fine except for
one special type (and there could be more coming...) for which I
started using the Java object approach.

Conceptually, there has to be a way for a user-defined type (Java
Object) to also define how it behaves:
- how to convert it to other types.
- how to compare it, whether against other Java objects or other SQL types.

I think Comparable is a logical choice, and I think comparison to
other types would make sense for the String type, for which the
toString method is in my view a good candidate (cf. its Javadoc, not
its misuse by users).

Cheers,
-Christopher

Noel Grandin

unread,
Jun 25, 2012, 8:19:21 AM6/25/12
to h2-da...@googlegroups.com, Christopher Deckers
Hi

I'd be willing to take the Comparable part of this patch if it included
an addition to our unit-test framework. (hint, hint :-)

Also, see
http://www.h2database.com/html/build.html#providing_patches

Regards, Noel Grandin
> --
> You received this message because you are subscribed to the Google
> Groups "H2 Database" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/h2-database/-/hhqWqrVav4cJ.
> To post to this group, send email to h2-da...@googlegroups.com.
> To unsubscribe from this group, send email to
> h2-database...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.


Christopher Deckers

unread,
Jul 14, 2012, 4:22:03 AM7/14/12
to Noel Grandin, h2-da...@googlegroups.com
Hi Noel,

> I'd be willing to take the Comparable part of this patch if it included an
> addition to our unit-test framework. (hint, hint :-)

I was about to set it up to work on it when I saw the following:

"Version 1.3.168 (2012-07-13)
New system property "h2.serializeJavaObject" (default: true) that
allows to disable serializing Java objects, so that the objects
compareTo and toString methods can be used."

So, instead I downloaded the latest, installed it in our system, and
set the property: System.setProperty("h2.serializeJavaObject",
"false");

Then I tried something like:
SELECT * FROM Data WHERE Object = 'SomeText'

and got the error:
Hexadecimal string contains non-hex character: "SomeText"; SQL
statement: SELECT * FROM Data WHERE Object = 'SomeText' [90004-168]

Have I done something wrong or is there something I am missing?

Cheers,
-Christopher

Sergi Vladykin

unread,
Jul 14, 2012, 7:09:31 AM7/14/12
to h2-da...@googlegroups.com, Noel Grandin
Hi,

This is known limitation. Currently H2 does not know how to convert string to your object type other then to parse hexadecimal data. For such a query you have to use prepared statement and set parameter as .setObject(1, yourObject, Types.JAVA_OBJECT). yourObject here is not a String but object of type you stored in database. If you want to find object whos toString() method returns "SomeText" you can write it as ...where 'SomeText' = cast(Object as varchar).
Read more at http://h2database.com/javadoc/org/h2/constant/SysProperties.html#h2.serializeJavaObject

Sergi  

Thomas Mueller

unread,
Jul 14, 2012, 7:14:17 AM7/14/12
to h2-da...@googlegroups.com
Hi,

The current implementation of this feature is geared towards high
performance, which means trying to avoid serialization (at least for
in-memory databases), trying to avoid calling toString, and rather
calling the classes compareTo, hashCode, and equals methods.

Regards,
Thomas
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/h2-database/-/O8dKpofboP4J.

Christopher Deckers

unread,
Jul 14, 2012, 8:15:08 AM7/14/12
to h2-da...@googlegroups.com
> This is known limitation. Currently H2 does not know how to convert string
> to your object type other then to parse hexadecimal data.

This is not the problem. The real problem is that Strings are trying
to be transformed to the Java object rather than transforming the Java
object to String. When the object is not serialized (cf newly
introduced system property), the object has a String representation
which should be used when comparing to strings.

My original implementation, which was not performant because it was
deserializing for every operation (and for which I was planning to do
what you essentially did), at least got this right (in my opinion). I
believe that transforming a String as if it were the bytes of a
serialized object is nonsense, especially in the case where we don't
serialize.

Hope this helps,
-Christopher

Christopher Deckers

unread,
Jul 14, 2012, 8:18:34 AM7/14/12
to h2-da...@googlegroups.com
Hi Thomas,

> The current implementation of this feature is geared towards high
> performance, which means trying to avoid serialization (at least for
> in-memory databases), trying to avoid calling toString, and rather
> calling the classes compareTo, hashCode, and equals methods.

There is no performance problem to consider that comparing String and
Non-Serialized Java Object should use String as the comparison type
instead of the object. This comparison would then obviously use the
toString method of the Non-Serialized Java Object for the comparison
with the String.

Cheers,
-Christopher

Thomas Mueller

unread,
Jul 14, 2012, 10:34:04 AM7/14/12
to h2-da...@googlegroups.com
Hi,

If you are interested in the string representation, why don't you just
use varchar? It's good that you care about your use case, but there
are other uses cases to be considered.

> The real problem is that Strings are trying
> to be transformed to the Java object rather than transforming the Java
> object to String.

The rules about which type is 'higher' or 'lower' than other types is
something very basic, and I wouldn't want to change it just because it
might be a bit easier for a few cases (well, just for this case). When
types don't match, the value of the lower type is converted to the
value of the higher type. The order of types is implemented in
Value.getOrder(int type). It is roughly: null, varchar, int, double,
timestamp, object, array. If varchar would be at the very end of the
list, then the following example would not find the row in the table:

drop table test;
create table test(ts timestamp primary key);
insert into test values('2001-01-01 10:00:00');
select * from test where ts = '2001-01-01 10:00:00';

because the string representation of the timestamp is actually
'2001-01-01 10:00:00.0' (ends with '.0').

Another reason why the database tries to convert 'SomeText' to a java
object is so the database can use an index if there is an index for
the given column. So basically the database, to allow using an index,
needs to use:

where columnName = cast('SomeText' as <columnType>)

Otherwise you would end up in a table scan each time. In the same way,
for the following example, an index is used:

create table test(id int primary key);
insert into test values(1), (10), (100);
select * from test where id = '1';

It would be very strange if the database would use different
conversion rules depending on whether there is an index or not.

> This comparison would then obviously use the
> toString method of the Non-Serialized Java Object for the comparison
> with the String.

That would mean you can't use an index on that column for such
queries. For you that might be acceptable, but I think it's not.

Regards,
Thomas

Christopher Deckers

unread,
Jul 15, 2012, 6:37:15 AM7/15/12
to h2-da...@googlegroups.com
Hi Thomas,

> If you are interested in the string representation, why don't you just
> use varchar?

I would if I could, but I can't so I won't.


> It's good that you care about your use case, but there
> are other uses cases to be considered.

Yes, let's come to that, and let's be pragmatic.
It's good that you care about the current logic at the low level
implementation, but there is actual usage to be considered.


> The rules about which type is 'higher' or 'lower' than other types is
> something very basic, and I wouldn't want to change it just because it
> might be a bit easier for a few cases (well, just for this case).

The current implementation is, in my view, based on a wrong
assumption: transitivity of type ordering. You consider that if A < B
and B < C, then A < C. So you assigned constants and all was well.
Suddenly, the Java Object comes into play.

Type conversions should be handled by pairs, and not as if there was a
global order. Only for the case of the Java Object, String vs Java
Object should use String. I will explain below my thinking.
But first, my simple not-intrusive change is this:

in "Value.getHigherOrder(int t1, int t2)":
// Start
if (!SysProperties.SERIALIZE_JAVA_OBJECT && (t1 == Value.STRING && t2
== Value.JAVA_OBJECT || t2 == Value.STRING && t1 ==
Value.JAVA_OBJECT)) {
return Value.STRING;
}
// End
int o1 = getOrder(t1);
int o2 = getOrder(t2);
return o1 > o2 ? t1 : t2;


> Another reason why the database tries to convert 'SomeText' to a java
> object is so the database can use an index if there is an index for
> the given column. So basically the database, to allow using an index,
> needs to use:
>
> where columnName = cast('SomeText' as <columnType>)

Again, low-level implementation focus... So according to you, users
are going to do:
SELECT * FROM SomeTable WHERE MyObject='<insert bytes here as a string>'

Yeah, right :)


> That would mean you can't use an index on that column for such
> queries. For you that might be acceptable, but I think it's not.

If a String is to be used for comparison, so be it. Or improve the
index on Object mechanism to also index the String comparison. But you
cannot say that "users can only use byte representation because that
way they can benefit of the index".


Let's move on to why I think this String approach is desirable. Let's
use a hypothetical example so we understand better each other.

Suppose we have this class (pseudo code):
City {
long id;
String name;
toString() { return name; }
compareTo(c) { compare names, then ids}
}

A column is using that City class. Users can join, and so on, which
allows to find the relation of cities from different tables. If a user
selects it, 2 possibilities:
- the tool displaying the result has a high level understanding of
that object and has a custom display (even contextual navigation,
etc.).
- the tool displays the string representation. Note that they know it
is more than the string displayed because of the type of the column
and eventually other factors (documentation, conventions on naming,
etc).

Now, if a VARCHAR were used, then "Paris" could appear several times,
and joins would not work.
When the user sees "Paris", and wants to know more about it, then he
could select cities = 'Paris', and join on it.

The assumption you made that the toString method returns a unique
String that can be mapped to the object is wrong. A String
representation of an object is not necessarily unique. If it were,
then VARCHAR would be used. And having a unique String representation
like "Paris [7834671]" would make it useless in the tools.

You said that Strings should try to convert to the Java Object, but if
the implementor created the Java Object, they probably also exposed a
function to create a City instance: TO_CITY(long, String).
Index would work in that case, but this is for advanced usage because
in most cases the String representation is enough for a user to drill
down or understand the object instances they are looking at. They also
don't have to figure out all the other fields to know to create the
city object.


I hope you will understand that having a string representation makes
it user friendly to view results of a SELECT, and that users expect to
be able to filter using that information.


Hope this helps,
-Christopher

Sergi Vladykin

unread,
Jul 15, 2012, 2:49:09 PM7/15/12
to h2-da...@googlegroups.com
Hi,

As I wrote you can write your query as
select * from data where cast(object as varchar) = 'bla'
and it is correct because you clearly understand here that index on field "object" will never be used because your actual objects were indexed but not results of the cast. I think that introduction of special case you suggest will make database behavior less predictable for user. Also at some point conversion rules may change so that H2 will not try to parse hexadecimal string but just return String object (and for other types such as int, bigint, etc.. the same behavior, e.g. return Integer, Long) because I think current behavior with hexadecimal is meaningless in almost all cases. Then your logic will be inconsistent because if one stores in the same column String '123' and Long 123 and tries to find only String '123' query will return Long as well while if one will try to find Long 123 query will return only Long. Good software design is about having as little special cases as possible. And all you trying to win here is just few keystrokes.
I'm against such a change.

Sergi

Christopher Deckers

unread,
Jul 15, 2012, 5:12:53 PM7/15/12
to h2-da...@googlegroups.com
Hi Sergi,

> As I wrote you can write your query as
> select * from data where cast(object as varchar) = 'bla'

Yes I know, but what I am looking for is a compelling argument. My logic is:
- String to Object is inherently not possible.
- Object to String is possible.
- Why does H2 try to convert String to Object and not vice versa?

> Also at some point
> conversion rules may change so that H2 will not try to parse hexadecimal
> string but just return String object (and for other types such as int,
> bigint, etc.. the same behavior, e.g. return Integer, Long) because I think
> current behavior with hexadecimal is meaningless in almost all cases.

Yes, I fully agree: I don't see where that would mean anything. But
returning the object directly does not tell how to adapt types when
types are not compatible (see below).

> Then
> your logic will be inconsistent because if one stores in the same column
> String '123' and Long 123 and tries to find only String '123' query will
> return Long as well while if one will try to find Long 123 query will return
> only Long.

Ah, at last! A good argument :)

This could be a problem indeed, but I wonder if in practice people
would use simple Java types as mixed content in an OTHER column. In
any case, I think I now have a good case for suggesting the change of
behavior from our fork at work to use explicit cast.

Let's push the discussion further though.
This comment reminds me that I only store a particular type in a
column, no mixed types, and I found it was a limitation that I could
not say "this column is of Java Object Type X". In a way, I am really
looking for a Custom Type definition and Java Object seems like the
begining of such functionality. If a Class could be attached to the
OTHER column type, then we could specify "Object.class" for the
current behavior, or "City.class", etc.

Maybe such definition could go as far as being able to define a
TypeConverter. Not only could we compare Strings to Objects, but we
could potentially compare numbers, etc.


> Good software design is about having as little special cases as
> possible.

Highest order between String and Object is subjective, saying it is a
special case is arguable. In fact, Java Object is a special case, most
database engines do not have them.


> And all you trying to win here is just few keystrokes.

Assume SomeID is a BIGINT.

SELECT * FROM Data WHERE SomeID = '547839'
vs
SELECT * FROM Data WHERE SomeID = CAST('547839' AS BIGINT)
SELECT * FROM Data WHERE CAST(SomeID AS VARCHAR) = '547839'
SELECT * FROM Data WHERE SomeID = CAST('547839' AS VARCHAR)

I think we saved a few keystrokes with the first form. Normal users do
not necessarily understand this level of details. And what they can do
with numbers, they may expect to do with Objects.


So, to sum up, we are down to 2 issues here:

1/ Current message does not tell anything meaningful to normal users.
"Hexadecimal string contains non-hex character" should read "Automatic
conversion to String is not supported, explicitely cast using
CAST(field AS VARCHAR)".

2/ Whether there should be a mechanism to convert types.

I don't really care about 2/ for my cases, but maybe 1/ could be improved.

Cheers,
-Christopher

Rami Ojares

unread,
Jul 15, 2012, 5:51:29 PM7/15/12
to h2-da...@googlegroups.com

> Let's push the discussion further though.
> This comment reminds me that I only store a particular type in a
> column, no mixed types, and I found it was a limitation that I could
> not say "this column is of Java Object Type X". In a way, I am really
> looking for a Custom Type definition and Java Object seems like the
> begining of such functionality. If a Class could be attached to the
> OTHER column type, then we could specify "Object.class" for the
> current behavior, or "City.class", etc.

See:
http://www.h2database.com/html/grammar.html?highlight=domain&search=domain#create_domain
Domain is the sql term for type
That's the concept around which custom types and inheritance should be
built.
Currently the implementation is minimal.
But that's the place where it should be extended if one so desires.
Hacking some few features into one part of the database (namely the
Object/OTHER type) would make H2 more difficult to master.
When defining a type one would need to provide to DBMS the following
- Set of values the type can have (that's essentially how type is defined)
- representation(s) of the type. That is how those values are
represented as literals
- storage of the type. That is how the db should store the value
"physically".

If the DBMS already supports the basic types like "list of characters",
"integers", "list of bytes", boolean etc.
Then all other types can be derived (inherited) from those basic types
using constraints on the set of values of the parent type.

Further some basic functions for a type should be defined like equality,
often ordering related types like < and >.
And finally casting functions that define to which types can this type
be converted.

Autocasting is typical for scripting languages and personally I don't
think it fits well within dbms because it can be the source of hard to
find problems in complex scenarios.

> Assume SomeID is a BIGINT.
>
> SELECT * FROM Data WHERE SomeID = '547839'
> vs
> SELECT * FROM Data WHERE SomeID = CAST('547839' AS BIGINT)
> SELECT * FROM Data WHERE CAST(SomeID AS VARCHAR) = '547839'
> SELECT * FROM Data WHERE SomeID = CAST('547839' AS VARCHAR)
>
> I think we saved a few keystrokes with the first form. Normal users do
> not necessarily understand this level of details. And what they can do
> with numbers, they may expect to do with Objects.

Rather than make the dbms undocumented and unpredictable (trying to
guess what the user really wants)
I would rather order the novice database user to learn and understand
the concept of types.


> 1/ Current message does not tell anything meaningful to normal users.
> "Hexadecimal string contains non-hex character" should read "Automatic
> conversion to String is not supported, explicitely cast using
> CAST(field AS VARCHAR)".

+1
Good documentation/error messages never hurt

- Rami Ojares

Christopher Deckers

unread,
Jul 16, 2012, 2:36:18 AM7/16/12
to h2-da...@googlegroups.com
Hi Remi,

> Domain is the sql term for type
> That's the concept around which custom types and inheritance should be
> built.

Thanks, declaring my own type is an improvement already:
CREATE DOMAIN MYOBJECT AS OTHER

> If the DBMS already supports the basic types like "list of characters",
> "integers", "list of bytes", boolean etc.
> Then all other types can be derived (inherited) from those basic types using
> constraints on the set of values of the parent type.

But there is no composition (struct) so OTHER to the rescue.


By-the-way, my first attempt failed.
I have "DATABASE_TO_UPPER" set to "FALSE", so I naturally thought I could do:
CREATE DOMAIN MyObject AS OTHER
but then this statement failed:
CREATE TABLE "MyTable" (AnObject MyObject)

I also tried without success:
CREATE DOMAIN "MyObject" AS OTHER
CREATE TABLE "MyTable" (AnObject "MyObject")

I understand that case sensitivity is special, and for the type
declaration this is mostly internal so I can use an all-caps form. I
just wanted to let you know my findings.

Cheers,
-Christopher

Rami Ojares

unread,
Jul 16, 2012, 6:48:37 AM7/16/12
to h2-da...@googlegroups.com

> By-the-way, my first attempt failed.
> I have "DATABASE_TO_UPPER" set to "FALSE", so I naturally thought I could do:
> CREATE DOMAIN MyObject AS OTHER
> but then this statement failed:
> CREATE TABLE "MyTable" (AnObject MyObject)

Seems like a bug ...
Anyway this is what the documentation has to say about DATABASE_TO_UPPER

"Setting this to 'false' is experimental."

My advice: Be more insensitive, I mean case-insensitive.

- Rami

Christopher Deckers

unread,
Jul 16, 2012, 7:09:47 AM7/16/12
to h2-da...@googlegroups.com
Hi Rami,

> "Setting this to 'false' is experimental."

I know, but it is this very setting that allows us to use H2.

> My advice: Be more insensitive, I mean case-insensitive.

Yeah, we do what we can.

Do not underestimate case sensitivity though, because it allows
interesting integration scenarios.

We extended our data export tooling (which is very rich, has a lot of
business logic and is fairly easy to use) to support output to
embedded H2 for SQL-based tools consumption. The data export model is
case sensitive. This integration required the use of Java Object for
some specific data with business logic, because there is a 1-1
relation between that model and H2.
We also have intermediate steps that use H2 for internal consumption
(SQL query to extract and re-feed our tool).

So far, experimental or not, we are very pleased with H2!

Cheers,
-Christopher

Thomas Mueller

unread,
Jul 20, 2012, 11:58:10 AM7/20/12
to h2-da...@googlegroups.com
Hi,

DATABASE_TO_UPPER: it would be really nice if the type declaration /
usage would also work as expected. At some point (but not in the near
future, maybe H2 version 2.0) I would like to make DATABASE_TO_UPPER
the default. Patches are welcome of course!

Regards,
Thomas
> --
> You received this message because you are subscribed to the Google Groups "H2 Database" group.
Reply all
Reply to author
Forward
0 new messages