ANTLR nested function grammar in Java: how to accept dots and special characters in arguments

Vikram A S

Dec 21, 2019, 9:00:24 AM

to antlr-discussion

I have a grammar for functions with arguments, split into the lexer and parser below:

**MyFunctionsLexer.g4**

    lexer grammar MyFunctionsLexer;
    FUNCTION: 'FUNCTION';
    NAME: [A-Za-z0-9]+;
    DOT: '.';
    COMMA: ',';
    L_BRACKET: '(';
    R_BRACKET: ')';
    WS : [ \t\r\n]+ -> skip;

**MyFunctionsParser.g4**


    parser grammar MyFunctionsParser;
    options { tokenVocab=MyFunctionsLexer; }
    functions : function* EOF;
    function : FUNCTION '.' NAME '(' (argument (',' argument)*)? ')';
    argument: (NAME | function);

The lexer and parser above work for me, but how can I accept arguments that contain special characters such as . (dot), / (forward slash), _ (underscore), - (hyphen), * (asterisk) and \ (backslash) alongside alphanumeric characters? I tried adding the dot to the NAME rule in the lexer, first as `NAME: [A-Za-z0-9.]+;` and then as `NAME: [A-Za-z0-9\\.]+;`, but both give an error. How can I add these special characters?

Sample input and output:
Working input:


    1 INPUT: FUNCTION.toString(String)
    2 INPUT: FUNCTION.getTimestamp()
    3 INPUT: FUNCTION.getSubstring(FUNCTION.toString("testtt"),0,1)

Not working input:

    1 INPUT: FUNCTION.toString(input.test.csv)
    2 INPUT: FUNCTION.toString(input.test/.csv)
    3 INPUT: FUNCTION.toString(input-test_csv)
    4 INPUT: FUNCTION.concat(FUNCTION.redis(FUNCTION.toString("testtt")),"0",input-test_csv)

I am using ANTLR 4. How can I change the grammar to accept these inputs, which I will then validate in a visitor implementation?

Mike Cargal

Dec 21, 2019, 9:27:11 AM

to antlr-discussion

You didn't show the error you received, but I suspect at least one problem is that, by adding the dot to the pattern for NAME, you've taken away the ability to recognize your functions (which will always be followed by a dot). Remember that ANTLR always tries to find the longest matching token, so FUNCTION.toString now matches the NAME lexer rule, which consumes more of the input stream than the FUNCTION or DOT lexer rules, and it becomes a single NAME token. There's a way to have ANTLR dump the token stream when it tries to parse the input. You should look into that, get all of the tokenization right, and then move on to the parse rules.
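To make the longest-match rule concrete, here is a rough sketch that imitates it with java.util.regex. It is not ANTLR and not the generated lexer. The class `LongestMatchDemo` and its methods `tokenize` and `rules` are hypothetical names invented for this illustration, and the tie-break (on equal length, the rule listed first wins) is modeled on ANTLR's documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simulation of longest-match tokenization with java.util.regex.
// This is NOT ANTLR; it only illustrates the rule described above.
class LongestMatchDemo {

    // Try every rule at the current position and keep the longest match.
    // On a tie the rule listed first wins (strict '>' below).
    static List<String> tokenize(Map<String, Pattern> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // mirrors '-> skip'
                tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    // Rule order mirrors the lexer in the question; nameChars is the
    // character class used for NAME (an assumption of this sketch).
    static Map<String, Pattern> rules(String nameChars) {
        Map<String, Pattern> r = new LinkedHashMap<>();
        r.put("FUNCTION", Pattern.compile("FUNCTION"));
        r.put("NAME", Pattern.compile(nameChars + "+"));
        r.put("DOT", Pattern.compile("\\."));
        r.put("COMMA", Pattern.compile(","));
        r.put("L_BRACKET", Pattern.compile("\\("));
        r.put("R_BRACKET", Pattern.compile("\\)"));
        r.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        return r;
    }

    public static void main(String[] args) {
        String input = "FUNCTION.toString(String)";
        // Original NAME: FUNCTION and DOT survive as separate tokens.
        // [FUNCTION:FUNCTION, DOT:., NAME:toString, L_BRACKET:(, NAME:String, R_BRACKET:)]
        System.out.println(tokenize(rules("[A-Za-z0-9]"), input));
        // NAME with a dot: the whole prefix is swallowed as one NAME.
        // [NAME:FUNCTION.toString, L_BRACKET:(, NAME:String, R_BRACKET:)]
        System.out.println(tokenize(rules("[A-Za-z0-9.]"), input));
    }
}
```

With the dotted NAME the parser never sees a FUNCTION token, so the `function` rule cannot match even though every character was consumed without complaint.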

BTW... generally, names that allow dots don't allow them to begin with a dot. If that's your intention, you'd need to rework the NAME rule to be something more like:

    NAME: [A-Za-z0-9][A-Za-z0-9.]*;

(Also, looking at the two rules as you pasted them: inside a lexer character set [...] a dot is a literal dot, not the any-character wildcard, so the first rule does add the dot. In the second, \\ is an escaped backslash, so that set admits a backslash as well as the dot.)

Definitely look at the token stream and get your tokenization right first.

som ghos

Dec 21, 2019, 10:03:47 AM

to antlr-di...@googlegroups.com

Hi, can anyone tell me where I can get an ANTLR grammar file for Teradata?

--

SOM
ANY TIME ANY PLACE

Vikram A S

Dec 22, 2019, 2:12:42 AM

to antlr-discussion

Thank you for the response. The grammar I posted in the question works if the argument contains no dot or is itself a function. Now I want to modify it to handle the following input.

Lexer:

    lexer grammar MyFunctionsLexer;
    FUNCTION: 'FUNCTION';
    NAME: [A-Za-z0-9]+;
    DOT: '.';
    COMMA: ',';
    L_BRACKET: '(';
    R_BRACKET: ')';
    WS : [ \t\r\n]+ -> skip;

Parser:

    parser grammar MyFunctionsParser;
    options { tokenVocab=MyFunctionsLexer; }
    functions : function* EOF;
    //function : FUNCTION '.' NAME '(' (function | argument (',' argument)*) ')';
    function : FUNCTION '.' NAME '(' (argument (',' argument)*)? ')';
    argument: (NAME | function );

I'm using ANTLR 4 with Java. The grammar accepts a function in the following format:

    FUNCTION.getSubstring(FUNCTION.toString("testtt"),0,1)

But I also want to accept input like this:

    FUNCTION.concat(FUNCTION.redis(FUNCTION.toString("testtt")),"0",input-test.csv)

In the function above, the arguments "0" and input-test.csv contain characters such as the double quote ("), digits (0-9), dot (.), hyphen (-) and underscore (_), so the lexer and parser I am using fail. Can you please suggest the changes I need to make to accept both nested functions and these inputs as arguments?

Below is the GitHub link for my code.

https://github.com/VIKRAMAS/AntlrNestedFunctionParser.git

Mike Cargal

Dec 22, 2019, 8:37:47 AM

to antlr-discussion

"testtt" and "0" look like they are intended to be strings (it would be quite the unusual grammar if they aren't)

Your lexer doesn't define a String token. (You can take a look at just about any ANTLR grammar for an example of a String token definition.)
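The thread does not show a String rule. A shape often seen in ANTLR grammars is something like `STRING : '"' ( ~["\\\r\n] | '\\' . )* '"' ;` (offered here as an assumption, not taken from the poster's repository). The sketch below expresses the same pattern with java.util.regex so it can be tried without generating a lexer. `StringTokenDemo` and `findStrings` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex stand-in for a double-quoted string token; not ANTLR itself.
class StringTokenDemo {

    // An opening quote, then any run of either a backslash escape or a
    // character that is not a quote, backslash, CR or LF, then a closing quote.
    static final Pattern STRING =
            Pattern.compile("\"(?:\\\\.|[^\"\\\\\\r\\n])*\"");

    // Collect every quoted string found in the input, quotes included.
    static List<String> findStrings(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input =
                "FUNCTION.concat(FUNCTION.redis(FUNCTION.toString(\"testtt\")),\"0\",input-test_csv)";
        // prints ["testtt", "0"]
        System.out.println(findStrings(input));
    }
}
```

In a real lexer the string rule would sit alongside NAME, and the `argument` parser rule would need an alternative for the new token.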

Seriously, use grun with the -tokens option.

It also seems a bit like you're just trying to find your way bit by bit. I'd HIGHLY recommend getting the PragProg ANTLR book and going through it. It's not a tough read and will give you the foundation you need to build upon.
