Antlr3 MySQL Parser can not parse some sql statements

166 views
Skip to first unread message

Wang Jing

unread,
Jan 20, 2015, 7:12:06 AM1/20/15
to antlr-di...@googlegroups.com
Hi everyone!
    I am planning to build a simple sql parser to analyze mysql statements. I use antlr3 to build this parser. here is a piece of my antlr3 file.

    this file works fine for some sql statement, but others will cause a exception.

    for example, "set @@global.sysvar = 123" and "set @@session.sysvar = 123" works fine.

    but "set global sysvar = 123" and "set session sysvar" will cause a exception : Recognition exception MismatchedTokenException(0!=0) and Error node 3 (,1:4], resync=set global sysvarid = 123>/0)




grammar zhihu;

options 
{
    language=Java;
    output=AST;
    //ASTLabelType=CommonTree; 
    backtrack=true;
}

fragment A_ :    'a' | 'A';
fragment B_ :    'b' | 'B';
fragment C_ :    'c' | 'C';
fragment D_ :    'd' | 'D';
fragment E_ :    'e' | 'E';
fragment F_ :    'f' | 'F';
fragment G_ :    'g' | 'G';
fragment H_ :    'h' | 'H';
fragment I_ :    'i' | 'I';
fragment J_ :    'j' | 'J';
fragment K_ :    'k' | 'K';
fragment L_ :    'l' | 'L';
fragment M_ :    'm' | 'M';
fragment N_ :    'n' | 'N';
fragment O_ :    'o' | 'O';
fragment P_ :    'p' | 'P';
fragment Q_ :    'q' | 'Q';
fragment R_ :    'r' | 'R';
fragment S_ :    's' | 'S';
fragment T_ :    't' | 'T';
fragment U_ :    'u' | 'U';
fragment V_ :    'v' | 'V';
fragment W_ :    'w' | 'W';
fragment X_ :    'x' | 'X';
fragment Y_ :    'y' | 'Y';
fragment Z_ :    'z' | 'Z';

GLOBAL            : G_ L_ O_ B_ A_ L_  ;
SESSION            : S_ E_ S_ S_ I_ O_ N_  ;
SET : S_ E_ T_ ;

//SYSDATE                : S_ Y_ S_ D_ A_ T_ E_  ;
//SYSTEM_USER            : S_ Y_ S_ T_ E_ M_ '_' U_ S_ E_ R_  ;

EQ : '=';
SET_VAR : ':=' ;

COMMA : ',' ;
DOT : '.' ;

INTEGER_NUM: ('0'..'9')+ ;

ID:    
    ( 'A'..'Z' | 'a'..'z' | '_' | '$') ( 'A'..'Z' | 'a'..'z' | '_' | '$' | '0'..'9' )*
;

SYS_VAR 
:  GLOBAL ID
|  SESSION ID
| ('@@' |  ('@@' GLOBAL DOT) | ('@@' SESSION DOT)) ID 
;

WHITE_SPACE    : ( ' '|'\r'|'\t'|'\n' ) {$channel=HIDDEN;} ;

set_sysvar_statement:
    SET SYS_VAR (SET_VAR | EQ) INTEGER_NUM (COMMA SYS_VAR (SET_VAR | EQ) INTEGER_NUM)*
    ;



zhihu.g

Mike Lischke

unread,
Jan 20, 2015, 7:31:46 AM1/20/15
to antlr-di...@googlegroups.com
Wang,

the grammars I have seen for MySQL are all incomplete and/or, so I sat down and wrote a full MySQL grammar from scratch, which is used as part of MySQL Workbench (http://mysqlworkbench.org/). This is grammar is not only complete for the latest server but for all server versions since 4.0. You can even switch the feature set at runtime per version and per sql mode.

However, this grammar uses some actions and data structures that require the C target, but it should be easy to adjust that for other targets.

To get the grammar download MySQL Workbench source code and look in the library/mysql.parser/grammar folder. There are 2 grammars, MySQL.g and MySQLSimpleParser.g. The second one is a copy of the first one but without tree rewriting (for quick syntax checks) and shares the lexer with the main grammar.

Of course, nothing is bug free, so if you find any statement that doesn't parse well, please let me know. Btw. there's an update comming up soon with a few fixes (like for subselect parsing).

Am 20.01.2015 um 13:12 schrieb Wang Jing <wang...@gmail.com>:

Hi everyone!
    I am planning to build a simple sql parser to analyze mysql statements. I use antlr3 to build this parser. here is a piece of my antlr3 file.

    this file works fine for some sql statement, but others will cause a exception.

    for example, "set @@global.sysvar = 123" and "set @@session.sysvar = 123" works fine.

    but "set global sysvar = 123" and "set session sysvar" will cause a exception : Recognition exception MismatchedTokenException(0!=0) and Error node 3 (<13="">,1:4], resync=set global sysvarid = 123>/0)

All these statements work fine with my grammar. Just checked.


Wang Jing

unread,
Jan 20, 2015, 9:52:56 PM1/20/15
to
Ohhhh ~ Thanks very much!!!

That is just I need !!! Great Job!!

在 2015年1月20日星期二 UTC+8下午8:31:46,Mike Lischke写道:

Jim Idle

unread,
Jan 20, 2015, 11:36:31 PM1/20/15
to antlr-di...@googlegroups.com
Mike,

We should talk to Monty about this perhaps. At one point I was talking to him about re-doing the parser in ANTLR/C to squeeze out a little more performance, but we never pursued it. However I think it would take a C runtime lite to pare it down to the bare bones and at this point it might be better doing it with v4 and C++. Anyway, just a thought as you have put so much in the workbench and your work might be usable for more than that :)

Jim


On Wed, Jan 21, 2015 at 10:52 AM, Wang Jing <wang...@gmail.com> wrote:
Ohhhh ~ Thanks very much!!!

That is just all my need !!! Great Job!!


在 2015年1月20日星期二 UTC+8下午8:31:46,Mike Lischke写道:
Wang,

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Lischke

unread,
Jan 21, 2015, 3:31:51 AM1/21/15
to antlr-di...@googlegroups.com
Hey Jim,

We should talk to Monty about this perhaps. At one point I was talking to him about re-doing the parser in ANTLR/C to squeeze out a little more performance, but we never pursued it. However I think it would take a C runtime lite to pare it down to the bare bones and at this point it might be better doing it with v4 and C++. Anyway, just a thought as you have put so much in the workbench and your work might be usable for more than that :)

Good thought, but I know from discussions with the MySQL server team that people hesitate very much to do such a big switch. As you know, there is no formal method to test if the new parser behaves exactly like the old one. Even with many unit tests you can not be 100% confident. But I can at least offer that option to Monty...



Wang Jing

unread,
Jan 21, 2015, 5:23:25 AM1/21/15
to antlr-di...@googlegroups.com
Hi Mike

Thanks for your MySQL.g and MySQLSimple.g files. 

I'm planning to build a MySQL proxy. The proxy need a mysql sql parser to recognize the name of databases and tables.

I tried to extract some related files from mysql-workbench-community-6.2.4 and successfully builded these file to a static libarary. 


But when I test this libaray with some sql statement, I got a corruption and coredump. Here is my test file : https://github.com/wangjild/antlr3-mysql/blob/master/main.c

I just use antlr3 less than one month,  and spend a hole day, nothing found. 

If you have some time, please help me ~ Thank you !




在 2015年1月20日星期二 UTC+8下午8:31:46,Mike Lischke写道:
Wang,

Mike Lischke

unread,
Jan 21, 2015, 5:39:14 AM1/21/15
to antlr-di...@googlegroups.com
Hi Wang,

I tried to extract some related files from mysql-workbench-community-6.2.4 and successfully builded these file to a static libarary. 

I see you have not only used the pure generated lexer + parsers but included most of the mysql.parser project.


But when I test this libaray with some sql statement, I got a corruption and coredump. Here is my test file : https://github.com/wangjild/antlr3-mysql/blob/master/main.c

Well, in your test app you are using the raw parser without setting it up properly. So, either use the C++ parser wrapper (in mysql-parser.h/cpp) or create a full setup. See mysql-parser.cpp how this has to be done. It's essentialy that you set a proper context struct reference, as the parser relies on that.

Additionally, as written in the comment at the start of the grammar file, you have to implement a handfull of support routines like one for checking charsets (for string repertoires). You look up all of that in mysql-parser.cpp.  In your case you compiled the C++ wrapper along with everything else in your project, which will link the support routines with those defined in the C++ wrapper, but you don't instantiate it.

So, in summary: only compile the raw parser (you only need one not both) + lexer and do the setup as done in mysql-parser.cpp (or described in the grammar file) yourself, but do not include any of the cpp files. Or use all the C++ classes exclusively.

Wang Jing

unread,
Jan 21, 2015, 7:38:45 AM1/21/15
to antlr-di...@googlegroups.com
I got it! sorry!

在 2015年1月21日星期三 UTC+8下午6:39:14,Mike Lischke写道:
Reply all
Reply to author
Forward
0 new messages