Standalone SQL parser based on the H2 Parser code

Mike Goodwin

unread,

Nov 11, 2008, 7:23:15 PM11/11/08

to h2-da...@googlegroups.com

Hi Thomas,

I have converted org.h2.command.Parser so that it returns sql abstract syntax objects - i.e. cut-down versions of many of the objects under
   org.h2.value
   org.h2.expression
   org.h2.command.dml

I have done this as there does not appear to be any suitable open source option for this. I spent quite a lot of time looking and decided that the h2 code was the best starting point.

http://stackoverflow.com/questions/141499/any-java-libraries-out-there-that-validate-sql-syntax

Currently only select queries are supported, but adding more support is not difficult (copy-paste and modify code from the original h2 parser).

The question is, would you be interested in hosting the code in the h2 project itself? Or if you prefer I will find another home for it on googlecode/sourceforge or similar.

thanks!

Mike

Thomas Mueller

unread,

Nov 14, 2008, 11:33:01 AM11/14/08

to h2-da...@googlegroups.com

Hi,

One problem is that the database verifies at some point if the tables,
columns, functions and so on actually exist. I guess you don't need
this?

> would you be interested in hosting the code in the h2
> project itself? Or if you prefer I will find another home for it on
> googlecode/sourceforge or similar.

I don't really know... It could be a separate project, or a sub-module
in H2 (for example in the 'tools' section).

I'm wondering if there is a better solution than an 'hard-coded'
abstract syntax tree. H2 has a BNF parser and uses it for the
auto-complete feature. Maybe this would make more sense? That way you
can easily change the syntax (just change one file, help.csv
currently) without having to re-compile. But you wouldn't have regular
Java classes any longer.

Regards,
Thomas

Mike Goodwin

unread,

Nov 15, 2008, 8:37:24 AM11/15/08

to h2-da...@googlegroups.com

Hi,

> One problem is that the database verifies at some point if the tables,
> columns, functions and so on actually exist. I guess you don't need
> this?

Actually it takes a database interface, as an input, which in turn
produces schema interfaces, schemas produce tables. Schemas and
databases are simple interfaces. So it is up to the utiliser if they
wish to be bothered about the existence of tables/columns (default
implementations exist that are not bothered).

> I don't really know... It could be a separate project, or a sub-module
> in H2 (for example in the 'tools' section).

Its not a big deal, so feel free to say no, or perhaps there is just a
dusty corner where you can put it. I understand that its added
complication. I just know that every now and then people want to do
this kind of thing. Although often the requirements can sound a bit
dubious - looking around people wanting to parse sql have mentioned
security checking. There are other cases such as people wanting to do
SQL migration projects and so on. In anycase doing it is useful to me
in its own right, it is only a couple of days work.

> I'm wondering if there is a better solution than an 'hard-coded'
> abstract syntax tree.

I'd say 'hard coded' objects are a feature and a convenience, and
ultimately you will want to represent the sql in the most handy way
possible (and even several ways through polymorphism).

>H2 has a BNF parser and uses it for the
> auto-complete feature.

Yeah, I had a go with both bits. The bnf and the parser. The parser
just seemed like the least effort and also like the most flexible
solution. If you want to write something that understands as many
dialects as possible I would say grammars are not ideal (I may be
wrong as I have not worked with them a lot, but observations about
what actually is in use seem to confirm this POV).

thanks,

Mike

jim schmidt

unread,

Nov 16, 2008, 12:14:44 PM11/16/08

to h2-da...@googlegroups.com

An immediate application that comes to mind would be to enhance the Hibernate reverse engineering of views for the .hbm.xml files (External meta data for Object Relational Mapping). The foreign key relationship for views are not available from the jdbc Database Metadata . Obviously in many cases there is no such relationship as in aggregation in which no primary key is preserved, but there are a great many cases where this would be useful in order to facilitate the reverse engineering of large schemas. I am working on Aerospace ERP model and it has 450+ views that could be enhanced in this fashion. If anyone is interested in colloborating on an open source hibernate reverse engineering enhancment based on the parsing of the sql, drop me a line. Getting the view text from Oracle is trivial. I love using H2 embedded mode for unit tests. Thanks Thomas for a wonderful product.

Mike Goodwin

unread,

Nov 16, 2008, 6:43:11 PM11/16/08

to h2-da...@googlegroups.com

Hi Jim,

I am not working with hibernate so I won't be much help with that, but
I am all for completing/improving the parser bit of it. The source
archive is only about 50kb, I won't email it to the list. Would you
like me to email it to you perhaps? It will include a couple of simple
examples?

thanks,

Mike

Thomas Mueller

unread,

Nov 18, 2008, 4:02:47 PM11/18/08

to h2-da...@googlegroups.com

Hi,

Just email it to me, I will have a look. Maybe I can add it to src/tools.

Regards,
Thomas

Thomas Mueller

unread,

Nov 24, 2008, 1:16:49 PM11/24/08

to h2-da...@googlegroups.com

Hi,

Thanks for the code! It looks good, but I think I will not add it to
H2 currently.

If the parser should be compatible with the regular H2 parser,
somebody would have to maintain it. I'm afraid I will not have time to
do that. On the other hand, maybe the parser shouldn't be fully
compatible with H2. The code you sent only supports DML (data
manipulation), and doesn't support DDL (data definition) statements.

Maybe you want to create a (new) open source project for this parser?

Another option is to change H2 to allow the 'agnostic' or 'tolerant'
behavior, that is to automatically create the required database
objects when required. Up to some point you can already do that, by
extending the schema class, and some people actually use this feature
already (in a commercial product). But there is no nice API yet. Then
a new utility class is required that changes the current schema, and
'hides' the existing schemas. Another option is to use interfaces
instead of classes, to make the parser implementation independent. But
it would make development a bit more tricky, so I like to avoid doing
that unless many people need it.

Regards,
Thomas

Mike Goodwin

unread,

Nov 24, 2008, 9:35:10 PM11/24/08

to h2-da...@googlegroups.com

> If the parser should be compatible with the regular H2 parser,
> somebody would have to maintain it. I'm afraid I will not have time to
> do that. On the other hand, maybe the parser shouldn't be fully
> compatible with H2.

May as well be fully compatible with at least one database! Though I'm
confused, h2 supports different modes, for different dialects of sql.
How complete are these and how complete do they aim to be?

> The code you sent only supports DML (data
> manipulation), and doesn't support DDL (data definition) statements.

The idea would be to add more as it is needed. For my current purposes
that would mostly just be DML and maybe not 100% of that.

> Maybe you want to create a (new) open source project for this parser?

Right, you are probably the best person to make that call. Though it
may end up with a very dry/derivative name, such as h2_based_parser,
its current provisional name. Also it would just be to put it out
there. Is that responsible? I don't know. Its simple enough that it
should be a good starting point for any developper, if its not yet
complete enough.

> Another option is to use interfaces
> instead of classes, to make the parser implementation independent. But

I think this would be the best engineered option, but it may go
against the principal of keeping h2 small/fast (unless you are up for
using a bytecode shrinker in which case it theoretically could mean no
increase - if the shrinker can subsume the interfaces).

> it would make development a bit more tricky, so I like to avoid doing
> that unless many people need it.

Personally I would have thought the abstraction would make things
simpler, easier to test in isolation ... etc. So aside from the
initial refactoring I would disagree. In anycase its not all that big
a deal. I'm guessing you won't but do say if you plan to go that
direction.

- mike

Reply all

Reply to author

Forward