Re: BINARY sorting is incorrect

115 views
Skip to first unread message
Message has been deleted

Noel Grandin

unread,
Dec 11, 2012, 2:10:22 AM12/11/12
to h2-da...@googlegroups.com, Matthew Painter
Tricky.

In this case
   org.h2.value.ValueBytes#compareSecure
is calling
   org.h2.util.Utils#compareNotNull(byte[], byte[])
to do the comparison, and it treats the data as a sequence of signed bytes.

Since, for example, 0x99 == -103, it naturally sorts below 0x09.

You should really not be relying on any kind of sort order when it comes to BINARY columns, any sort order we could define over it would always be unsuitable to someone.

On 2012-12-10 15:14, Matthew Painter wrote:
Hi all,

Just wanted to check that this is a bug:

create table bin( x binary(1) );
insert into bin(x) values (x'09'),(x'0a'),(x'99'),(x'aa');
select * from bin order by x;

X  
99
aa
09
0a
(4 rows, 0 ms)

The sorting should of course be:

X  
09
0a
99
aa
(4 rows, 0 ms)

Am I right in thinking this is a bug? Or is there an obscure setting to correct his?


Thanks :)


Matt --
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/Kib7PA3uF4wJ.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

Matthew Painter

unread,
Dec 11, 2012, 3:30:26 PM12/11/12
to h2-da...@googlegroups.com, Matthew Painter
This is contrary to the behaviour in all databases/column stores I've ever come across. As such I'd view it as a bug.

The logic to change is pretty small (see below). HBase, HSQLDB, SQL Server, and MySQL all implement actual binary sorting. 

With this feature, we can't use H2 with MySQL compatibility for testing, and we'll have to stick to HSQLDB.

I guess I could always write a patch :)

M

public static int unsignedByteToInt(byte b) {
return (int) b & FF;
}

public static int compare(final byte[] o1, int offset1, int length1, final byte[] o2, int offset2, int length2) {
final int len = Math.min(length1, length2);
for (int i = 0; i < len; i++) {
byte byte1 = o1[offset1];
byte byte2 = o2[offset2];
if (byte1 != byte2) {
return unsignedByteToInt(byte1) > unsignedByteToInt(byte2) ? 1 : -1;
}
offset1++;
offset2++;
}
final int d = length1 - length2;
return d == 0 ? 0 : (d < 0 ? -1 : 1);

Thomas Mueller

unread,
Dec 12, 2012, 12:11:12 AM12/12/12
to H2 Google Group
Hi,

I understand the current behavior is unexpected (it's also unexpected to me). Maybe we should change it, but then such a change wouldn't be backward compatible, so we would have to wait for the next major release (1.4 I guess).

Regards,
Thomas

Sergi Vladykin

unread,
Dec 12, 2012, 1:48:57 AM12/12/12
to h2-da...@googlegroups.com
Hi,

I guess we can create respective system property which will default to current behavior and remove it in 1.4.

Sergi

Matthew Painter

unread,
Dec 20, 2012, 2:32:37 PM12/20/12
to h2-da...@googlegroups.com
When's 1.4 due? Working out whether or not to fork for now...?

Matthew Painter

unread,
Mar 27, 2013, 2:38:13 PM3/27/13
to h2-da...@googlegroups.com
Hi all,

Did anything ever come of this? I would be super keen to use H2 if it only did correct binary sorting!

Thanks

Matt

Thomas Mueller

unread,
Mar 27, 2013, 2:52:10 PM3/27/13
to H2 Google Group
Hi,

I'm sorry, it will take a while until version 1.4 is out. If you want, you could add a system property to the code that allows you to switch the behavior for version 1.3 - patches are welcome!

Regards,
Thomas






--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.

To post to this group, send email to h2-da...@googlegroups.com.

Matthew Painter

unread,
Mar 27, 2013, 3:21:45 PM3/27/13
to h2-da...@googlegroups.com
Thanks for the update. It is on my list of things to do - just very low down right now!

Thanks again

matt

Noel Grandin

unread,
Mar 28, 2013, 4:27:56 AM3/28/13
to h2-da...@googlegroups.com, Matthew Painter
I have created a new command

SET BINARY_COLLATION UNSIGNED

which should do what you want.

Matthew Painter

unread,
Mar 28, 2013, 4:29:55 AM3/28/13
to Noel Grandin, h2-da...@googlegroups.com
Awesome! I will give it a go :)

Thanks so much!

Matt


--
Sent from Gmail Mobile
Reply all
Reply to author
Forward
0 new messages