Storing of UTF-8 glyphs in H2

924 views
Skip to first unread message

Dieter Blerms

unread,
Nov 19, 2015, 10:42:43 PM11/19/15
to H2 Database
Hello,

atm I'm developing a Java-Application with an embedded H2 DB. The DB is migrated from mySQL (kinda dictionary). The SQL-dump has about 500.000 lines (5 tables). The mySQL DB has about 15 MB and the migrated H2 DB has about 45 MB.
After exporting the DB, all UTF8 glyphs are escaped with the \u syntax. So for each glyph i have now 6 chars to represent it. I also tried several other Java based DBs like HSQLDB - same result. After some investigations i found out, that Java
uses UTF16.

Is there a way to avoid the \u 6-char syntax for each non-ASCII glyph and store it instead in pure UTF-8?

Regards,
Dieter

Noel Grandin

unread,
Nov 20, 2015, 1:34:08 AM11/20/15
to h2-da...@googlegroups.com
H2 internally stores character data in UTF8.

Dieter Blerms

unread,
Nov 20, 2015, 8:08:26 AM11/20/15
to H2 Database
Am Freitag, 20. November 2015 07:34:08 UTC+1 schrieb Noel Grandin:
H2 internally stores character data in UTF8.

Thx for the quick answer.

Is there a way to get also a UTF8 db dump instead of replacements like 

STRINGDECODE('\u ...

?

Noel Grandin

unread,
Nov 20, 2015, 8:26:10 AM11/20/15
to h2-da...@googlegroups.com

Dieter Blerms

unread,
Nov 20, 2015, 9:36:43 AM11/20/15
to H2 Database

Am Freitag, 20. November 2015 14:26:10 UTC+1 schrieb Noel Grandin:
http://h2database.com/html/grammar.html#script

First of all: thank you Noel Grandin for your help and the quick answers.

I tried the following:

SCRIPT TO '~/backup_test3.sql' CHARSET 'UTF-8'

and this

SCRIPT TO '~/backup_test3.sql' CHARSET 'UTF8'

The output SQL is always the same: UTF chars escaped with \u syntax.

Is the syntax correct I used or what did I wrong?

Regards and thanks in advance
Dieter
 

Noel Grandin

unread,
Nov 20, 2015, 11:40:22 AM11/20/15
to H2 Database
Ah, my apologies that doesn't affect the escaping of the strings. But I don't really understand your problem. Why does the syntax in the dump file matter? It's just a temporary format before you load it into something else.
--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.
To post to this group, send email to h2-da...@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

Dieter Blerms

unread,
Nov 20, 2015, 11:59:21 AM11/20/15
to H2 Database
Well, the database size increases more an more. For e.g.:

- The dump size is about 33 MB (with escaped utf-8 chars). When i import this dump in a new, embedded H2 db, the DB-size is about 29 MB.
- When i close the db (via H2 web console), the DB-size is about 47 MB.

So what happens when closing the DB that the file size increases from 29 MB to 47 MB?

I will investigate the behaviour for a in-memory instance of this DB.

Regards,
Dieter

Dieter Blerms

unread,
Nov 20, 2015, 12:26:27 PM11/20/15
to H2 Database


Am Freitag, 20. November 2015 17:59:21 UTC+1 schrieb Dieter Blerms:

So what happens when closing the DB that the file size increases from 29 MB to 47 MB?

I think i found out what happens:
as the tables are cached tables, the index tables are stored in memory and after closing the db they will be stored (for persistence).

Is this right?

I was just wondering, why the DB size is so much bigger as for mySQL (there the size of the closed DB is 25 MB - dump size is about 15 MB).

The Java-App I'm writing shall be an offline version of a php-website. The (read-only) db access there is the mySQL database. The Java-App should be as small as possible and run on Linux, Windows and Mac. I wanted to keep the migrated H2-db in memory for faster db queries. Now i think it would be better to keep it as a embedded, file based db and just keep the query-results in memory.

Sorry for all this. I'm new to H2 and similar db-systems. 

regards
Dieter 

Thomas Mueller

unread,
Nov 20, 2015, 12:56:56 PM11/20/15
to h2-da...@googlegroups.com
Hi,

What problem do you want to solve?

The "\uxxxx" you should only see in the SQL script, not in the actual database file.

> The mySQL DB has about 15 MB and the migrated H2 DB has about 45 MB.

For H2, did you shrink the database using "shutdown defrag"?

Is there a way to get also a UTF8 db dump instead of replacements 

No, right now creating a SQL script in that way is not supported. However, you could use CSVWRITE, create your own export, or convert the SQL script. But why would you need it?

Regards,
Thomas


On Friday, November 20, 2015, Noel Grandin <noelg...@gmail.com> wrote:
H2 internally stores character data in UTF8.

Dieter Blerms

unread,
Nov 20, 2015, 5:10:53 PM11/20/15
to H2 Database


Am Freitag, 20. November 2015 18:56:56 UTC+1 schrieb Thomas Mueller:
Hi,

What problem do you want to solve?

The "\uxxxx" you should only see in the SQL script, not in the actual database file.

> The mySQL DB has about 15 MB and the migrated H2 DB has about 45 MB.

For H2, did you shrink the database using "shutdown defrag"?

This actually worked :-) 
DB size after SHUTDOWN DEFRAG is 20 MB.


Is there a way to get also a UTF8 db dump instead of replacements 

No, right now creating a SQL script in that way is not supported. However, you could use CSVWRITE, create your own export, or convert the SQL script. But why would you need it?

I can import a SQL-dump with UTF-8 chars in it (without \u syntax). When i export it again, the \u syntax is used. This makes the dump unreadable and raises the file size to more than 50%.
So I don't really need it without the \u syntax. But I also can't see any advantage in it. Have you planned to get this feature working for the future (SCRIPT TO '<filename>' CHARSET 'UTF8')?

Thx for your help :-) 
Reply all
Reply to author
Forward
0 new messages