How to calculate md5sum in a computed column?

1,326 views
Skip to first unread message

Christian MICHON

unread,
Apr 13, 2012, 3:51:52 PM4/13/12
to h2-da...@googlegroups.com
Hi,

I need to detect repetitions and updates in (relatively) small H2 tables.

As such, I currently implemented using ruby and an external ORM a md5sum calculation of all the keys concatenated into a string following their natural order, which is stored back into a specific key.

This involves a lot of communication with the H2 server, and I wish to have it done natively (using java itself).

I wish now to use:
1/ computed columns, so this can be done on the fly within H2, not with ruby/ORM.
2/ a function alias to java md5sum function (I've not found how to do that either yet)

Is this possible? Could someone jumpstart me on this issue? I've been performing some search on this group, without finding any clue.

1 important point to note: this table has no primary key, and I need to calculate the md5sum only once (at insertion inside this table).

Thanks in advance.
Christian

Christian MICHON

unread,
Apr 13, 2012, 6:30:15 PM4/13/12
to h2-da...@googlegroups.com
On Friday, April 13, 2012 9:51:52 PM UTC+2, Christian MICHON wrote:
Hi,

I need to detect repetitions and updates in (relatively) small H2 tables.

As such, I currently implemented using ruby and an external ORM a md5sum calculation of all the keys concatenated into a string following their natural order, which is stored back into a specific key.

This involves a lot of communication with the H2 server, and I wish to have it done natively (using java itself).
(...)


After many experiments, I finally cracked it.

Sharing it here, in case someone else finds this useful one day...

CREATE ALIAS MD5 FOR "org.apache.commons.codec.digest.DigestUtils.md5Hex(java.lang.String)";
CREATE TABLE TEST(ID IDENTITY, NAME VARCHAR(255), MD5 VARCHAR(32) AS MD5(NAME));
INSERT INTO TEST (NAME) VALUES('The quick brown fox jumps over the lazy dog');
SELECT * FROM TEST ORDER BY ID;

Really powerful tool this H2 db... Congratulations again...

Christian 

Thomas Mueller

unread,
Apr 19, 2012, 12:04:36 PM4/19/12
to h2-da...@googlegroups.com
Hi,

H2 supports SHA256 natively, I would use it instead of MD5: http://h2database.com/html/functions.html#hash 

Regards,
Thomas


--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/UV44utlLiAYJ.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

Christian MICHON

unread,
Apr 19, 2012, 12:37:01 PM4/19/12
to h2-da...@googlegroups.com
Yes, I knew about it.

Yet, this gives me 2 problems:
- increase of 32 char to 64 char for storing the hash. Maybe I should store this in a different way, like binary to save half the space?
- I cannot figure out how to have the same results between hash() in H2 and sha256 from OpenSSL/ruby

As said, I use ruby and in ruby  OpenSSL::Digest::Digest.new("sha256") == "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

Could you please show me how to get the same value using hash() in H2?

I tried call hash('SHA256',stringtoutf8('sha256'),1) without success...

Regards,
Christian
To unsubscribe from this group, send email to h2-database+unsubscribe@googlegroups.com.

Christian MICHON

unread,
Apr 19, 2012, 12:50:29 PM4/19/12
to h2-da...@googlegroups.com
Ah... got it: SHA256 is equivalent to ruby's sha2, and not sha256 from openssl (this was completely wrong).

In ruby:
>> require 'digest/sha2'
=> true
>> Digest::SHA2.hexdigest 'sha256'
=> "5d5b09f6dcb2d53a5fffc60c4ac0d55fabdf556069d6631545f42aa6e3500f2e"

In H2:
CALL HASH('SHA256', STRINGTOUTF8('sha256'), 1);
gives:
X'5d5b09f6dcb2d53a5fffc60c4ac0d55fabdf556069d6631545f42aa6e3500f2e'  
5d5b09f6dcb2d53a5fffc60c4ac0d55fabdf556069d6631545f42aa6e3500f2e

So we have a match.

My 1st question still interests me: how to efficiently store this sha256 in H2? Which would be a good data type? 

Regards,
Christian

Steve McLeod

unread,
Apr 20, 2012, 6:54:39 AM4/20/12
to h2-da...@googlegroups.com
Hi Christian,

Is disk space/database size really a concern for your application? If not, just storing in a varchar(64) makes your code simple. 

Christian MICHON

unread,
Apr 20, 2012, 9:56:08 AM4/20/12
to h2-da...@googlegroups.com
Hi Steve,

yes, after x millions of records, it would be nice to have the checksum taking minimal space.

I've some code making this happen in jruby (it was not easy to convert into byte array as I'm using prepared statements for initial insertions).

Comparing now performance and size.

Thanks

Christian

vrota...@gmail.com

unread,
Apr 21, 2012, 2:07:18 AM4/21/12
to h2-da...@googlegroups.com
On Fri, Apr 20, 2012 at 4:56 PM, Christian MICHON <christia...@gmail.com> wrote:
Hi Steve,

yes, after x millions of records, it would be nice to have the checksum taking minimal space.


H2 has a BINARY SQL type mapped to byte[]. Most probably, you can't go lower than that

--
   Vasile Rotaru

Thomas Mueller

unread,
Apr 24, 2012, 2:19:06 AM4/24/12
to h2-da...@googlegroups.com
Hi,

Yes, BINARY is the right data type for a hash / checksum.

Regards,
Thomas
--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.

Christian MICHON

unread,
Apr 24, 2012, 3:37:19 AM4/24/12
to h2-da...@googlegroups.com
Hi,

As mentioned, I was already looking at byte arrays, therefore I was
implying I was already using BINARY(16) for md5sum.

It now works and gives decent runtime performance, while saving more
than 200MB on my 2GB+ H2 database.

Thanks.
Christian

Reply all
Reply to author
Forward
0 new messages