SQL parser: ANTLR4 can't parse '.*' but can parse '. *'. What might be the problem?

xu Bruce

Aug 2, 2015, 8:32:17 AM
to antlr-discussion
 
I want to create a SQL parser. The problem is described below.

(The full grammar can be found here: https://github.com/ihainan/GBase-8a-MPP-Cluster-SQL-Parser/blob/master/src/main/java/cn/edu/bit/linc/sqlparser/antlr/uniformSQL.g4)

This is the simplified grammar:
grammar uniformSQL;

SELECT: 'select';
FROM: 'from';
DOT    : '.' ;
ASTERISK: '*' ;

ID:
     ( 'A'..'Z' | 'a'..'z' | '_' | '$' | '0'..'9' )+
;

schema_name : any_name ;
table_name  : any_name ;
any_name
 : ID
 | keyword
 ;

data_manipulation_statements:
      select_statement
;

select_statement:
        select_expression (  UNION (ALL | DISTINCT)?  select_expression )*
;

select_expression:
    SELECT
    ( ALL | DISTINCT  )?
    select_list
    (
        FROM table_references
        ( where_clause )?
        ( groupby_clause )?
        ( having_clause )?
    ) ?
    ( orderby_clause )?
    ( limit_clause )?
;

select_list:
      displayed_column ( COMMA displayed_column )*
       // |   ASTERISK  
;

table_spec:
    ( schema_name DOT )? table_name
;

displayed_column :
      ASTERISK
    | table_spec DOT ASTERISK     
    |
    ( column_spec (alias)? )
    |
    ( bit_expr (alias)? )
;

When the input is: select a.* from b

the output is: line 1:8 extraneous input '.*' expecting {<EOF>, ORDER, FROM, UNION, LIMIT, ';',}
The resulting parse tree is shown in inline image 1.

When the input is: select a. * from b (with a blank between DOT and ASTERISK),
the SQL is parsed correctly. The parse tree is shown in inline image 2.

So why can't ANTLR parse "a.*"?

[Inline image 1: parse tree for "select a.* from b"]
[Inline image 2: parse tree for "select a. * from b"]

Terence Parr

Aug 2, 2015, 12:22:20 PM
to antlr-di...@googlegroups.com
It shows ".*" as one token, I think, so your real grammar should likely split it into '.' '*'.
T

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kevin Cummings

Aug 2, 2015, 2:30:00 PM
to antlr-di...@googlegroups.com
On 08/02/15 08:32, xu Bruce wrote:
>
> I want to create a SQL parser ,the problem is below:
>
> (the full grammar can be found here:
> https://github.com/ihainan/GBase-8a-MPP-Cluster-SQL-Parser/blob/master/src/main/java/cn/edu/bit/linc/sqlparser/antlr/uniformSQL.g4)

Looking at your full grammar file, line 347 is:

ALL_FIELDS : '.*' ;

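This token conflicts with DOT ASTERISK and takes precedence over it
when there is no intervening whitespace between them, because the lexer
prefers the longest match: '.*' becomes a single ALL_FIELDS token
instead of DOT followed by ASTERISK.
Perhaps you should replace references to:
DOT ASTERISK
by
(DOT ASTERISK | ALL_FIELDS)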
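
The longest-match behaviour described above can be illustrated with a small toy lexer. This is only a sketch in Python, not ANTLR itself: the function names (`id_length`, `tokenize`, `names`) and the two rule lists are made up for the example, and the token names are borrowed from the grammar in this thread. It assumes the usual lexer convention that the longest match wins and that ties go to the rule listed first.

```python
# Toy longest-match lexer. NOT ANTLR: a hypothetical illustration only.

def id_length(text, pos):
    """Length of the identifier run (A-Z, a-z, 0-9, '_', '$') starting at pos."""
    end = pos
    while end < len(text) and (
        (text[end].isascii() and text[end].isalnum()) or text[end] in "_$"
    ):
        end += 1
    return end - pos


def tokenize(text, literals):
    """Split text into (token_name, lexeme) pairs using longest match.

    literals is an ordered list of (name, literal_text). The longest match
    wins; on a tie the literal listed first wins, and literals beat ID.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace only separates tokens
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, literal in literals:
            if text.startswith(literal, pos) and len(literal) > best_len:
                best_name, best_len = name, len(literal)
        n = id_length(text, pos)
        if n > best_len:  # ID only wins when it is strictly longer
            best_name, best_len = "ID", n
        if best_len == 0:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


def names(text, literals):
    """Return only the token names, for easy comparison."""
    return [name for name, _ in tokenize(text, literals)]


# Rule sets modelled on the thread: with and without the '.*' token.
BASE = [("SELECT", "select"), ("FROM", "from"), ("DOT", "."), ("ASTERISK", "*")]
WITH_ALL_FIELDS = BASE + [("ALL_FIELDS", ".*")]

if __name__ == "__main__":
    print(names("select a.* from b", WITH_ALL_FIELDS))
    print(names("select a. * from b", WITH_ALL_FIELDS))
    print(names("select a.* from b", BASE))
```

With the `.*` rule present, `a.*` yields `ID ALL_FIELDS`, which a parser rule expecting `DOT ASTERISK` cannot accept. Adding a blank, or dropping the `.*` rule, yields `ID DOT ASTERISK`.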

--
Kevin J. Cummings
kjc...@verizon.net
cumm...@kjchome.homeip.net
cumm...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://www.linuxcounter.net/)

Jim Idle

Aug 2, 2015, 9:17:18 PM
to antlr-di...@googlegroups.com
You have a rule:

ALL_FIELDS : '.*' ;

You cannot have that together with the individual rules for those tokens: whenever '.' is immediately followed by '*', the lexer will produce ALL_FIELDS rather than DOT and ASTERISK.

Also, I think that grammar is going to be very inefficient with all those case-insensitive rules. A better solution is to write the rules in upper case (or lower case, if you prefer) and then create your own input stream that overrides LA() to return the upper-case version of each character. Your tokens will still span, and return, the text in the stream unaltered, so you know what the user actually typed, but the lexer only needs to match upper case.
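
The stream idea above can be sketched as follows. This is a self-contained Python toy, not the ANTLR runtime: the class `UpperCaseLookaheadStream`, its `get_text` method, and the helper `match_keyword` are hypothetical names invented for the illustration, and only positive lookahead offsets are handled. It shows the two halves of the technique: lookahead reports upper-cased characters, while text retrieval returns what the user typed.

```python
# Toy model of a case-insensitive lookahead stream.
# Hypothetical stand-in for an ANTLR input stream, not the real class.

class UpperCaseLookaheadStream:
    EOF = -1

    def __init__(self, text):
        self.text = text
        self.index = 0  # position of the next unconsumed character

    def LA(self, offset):
        """Return the upper-cased code point `offset` characters ahead (1-based)."""
        i = self.index + offset - 1
        if offset < 1 or i >= len(self.text):
            return self.EOF
        ch = self.text[i]
        upper = ch.upper()
        # Some characters upper-case to several characters; leave those as-is.
        return ord(upper) if len(upper) == 1 else ord(ch)

    def consume(self):
        """Advance past one character."""
        self.index += 1

    def get_text(self, start, stop):
        """Return the original, unaltered text for the inclusive range [start, stop]."""
        return self.text[start:stop + 1]


def match_keyword(stream, keyword):
    """Match an upper-case keyword at the current position.

    Returns the text exactly as typed, or None (consuming nothing) on failure.
    """
    start = stream.index
    for k, ch in enumerate(keyword, start=1):
        if stream.LA(k) != ord(ch):
            return None
    for _ in keyword:
        stream.consume()
    return stream.get_text(start, stream.index - 1)


if __name__ == "__main__":
    stream = UpperCaseLookaheadStream("SeLeCt a")
    print(match_keyword(stream, "SELECT"))
```

Because matching only ever sees upper case, each keyword needs a single upper-case rule, yet the returned token text still preserves the user's original casing.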

Jim



xu Bruce

Aug 2, 2015, 9:33:54 PM
to antlr-discussion
Kevin, thanks very much. Your suggestion helps a lot.

On Monday, August 3, 2015 at 2:30:00 AM UTC+8, Kevin Cummings wrote:

xu Bruce

Aug 2, 2015, 9:40:54 PM
to antlr-discussion
Thanks for your suggestion, Jim.

On Monday, August 3, 2015 at 9:17:18 AM UTC+8, Jim Idle wrote: