A simple parser using ANTLR


Heet Sheth

Dec 22, 2015, 12:23:10 AM
to antlr-discussion
I am just getting started. I know which tools I can use, but I don't know the specifics.
ANTLR4 - building parser and lexer files.
ANTLRWorks2 - to test the grammar

Now, how do I write a simple grammar (both lexer and parser rules) for the following input?
Barack 123
(Text Numbers)

And how can I test whether an input file is valid according to the parser?
Do I need to write another Java file that feeds the input to the parser, writes the stream to another file if it is valid, and discards it otherwise?
Kindly explain the flow.

Mike Lischke

Dec 22, 2015, 5:04:17 AM
to antlr-di...@googlegroups.com
Hi,

> I am just getting started. I know which tools I can use, but I don't know the specifics.
> ANTLR4 - building parser and lexer files.
> ANTLRWorks2 - to test the grammar
>
> Now, how do I write a simple grammar (both lexer and parser rules) for the following input?
> Barack 123
> (Text Numbers)

Is this all you want to parse, or is the actual input more complex? If it is no more complex than this, you don't need ANTLR. Simply split the line on whitespace and you are done.
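
A minimal sketch of that whitespace split in plain Java, for illustration only. The class name `SplitExample`, the method `parseLine`, and the assumption that "Text" means ASCII letters and "Numbers" means decimal digits are not from the thread.

```java
import java.util.regex.Pattern;

final class SplitExample {
    // One or more whitespace characters separate the two fields.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /**
     * Returns {text, number} when the line has the shape "Text Numbers",
     * or null when it does not. The letter/digit checks are an assumption
     * about what "Text" and "Numbers" mean.
     */
    static String[] parseLine(String line) {
        String[] parts = WHITESPACE.split(line.trim());
        if (parts.length != 2) {
            return null;
        }
        if (!parts[0].matches("[A-Za-z]+") || !parts[1].matches("[0-9]+")) {
            return null;
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] fields = parseLine("Barack 123");
        System.out.println(fields == null ? "invalid" : fields[0] + " / " + fields[1]);
    }
}
```

Returning `null` for a rejected line keeps the sketch short; a real program might prefer `Optional` or an exception.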

Mike
--
www.soft-gems.net

Heet Sheth

Dec 22, 2015, 5:49:00 AM
to antlr-discussion
The actual input string is a bit more complex, as shown below (this is the actual language):
Dec 17 14:00:00 103.56.229.11 firewall,info FFFW forward: in:<pppoe-mm.demo.649> out:sfp-sfpplus1.vlan113, proto TCP (ACK,PSH), 10.0.15.245:49831->103.235.46.39:443, NAT (10.0.15.245:49831->202.173.127.253:49831)->103.235.46.39:443, len 250

So I guess I must use ANTLR to generate a parser.
I want to do two things:
1. Accept the string if it matches the above format after parsing.
2. After accepting it, print only specific fields to a particular file, as shown below.
Expected output in a file:
Dec , 17 , 14:00:00 , 103.56.229.11 , pppoe-mm.demo.649 , TCP , 10.0.15.245:49831 , 202.173.127.253:49831, 103.235.46.39:443
NOTE: We don't want to perform any operation on the above string, just extract specific fields, so we don't need a hierarchical tree.
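
For illustration, the field extraction described above can be sketched in plain Java with `java.util.regex`, with no extra jar. Everything here is an assumption inferred from the single sample line: the class name `FirewallLineExtractor`, the regular expression, the field order, and the ` , ` separator. Real log data may need a looser or stricter pattern.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirewallLineExtractor {
    // IPv4 address followed by ":port" (shape assumed from the sample line).
    private static final String ADDR = "\\d{1,3}(?:\\.\\d{1,3}){3}:\\d+";

    private static final Pattern LINE = Pattern.compile(
        "^(?<month>[A-Z][a-z]{2})\\s+(?<day>\\d{1,2})\\s+(?<time>\\d{2}:\\d{2}:\\d{2})\\s+"
        + "(?<host>\\d{1,3}(?:\\.\\d{1,3}){3})\\s+"
        + ".*?\\bin:<(?<iface>[^>]+)>"                       // in:<interface>
        + ".*?\\bproto\\s+(?<proto>\\w+)"                    // proto TCP
        + ".*?,\\s*(?<src>" + ADDR + ")->(?<dst>" + ADDR + ")"
        + ",\\s*NAT\\s+\\(" + ADDR + "->(?<nat>" + ADDR + ")\\)->" + ADDR);

    /** Returns the selected fields joined by " , ", or empty if the line does not match. */
    static Optional<String> extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(String.join(" , ",
            m.group("month"), m.group("day"), m.group("time"), m.group("host"),
            m.group("iface"), m.group("proto"),
            m.group("src"), m.group("nat"), m.group("dst")));
    }

    public static void main(String[] args) {
        String sample = "Dec 17 14:00:00 103.56.229.11 firewall,info FFFW forward: "
            + "in:<pppoe-mm.demo.649> out:sfp-sfpplus1.vlan113, proto TCP (ACK,PSH), "
            + "10.0.15.245:49831->103.235.46.39:443, "
            + "NAT (10.0.15.245:49831->202.173.127.253:49831)->103.235.46.39:443, len 250";
        System.out.println(extract(sample).orElse("no match"));
    }
}
```

An empty `Optional` signals that the line does not have the expected format, which covers the "accept or reject" requirement without building a parse tree.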

I am attaching my partial grammar file (I'm still working on it); kindly have a look. Am I on the right track?
check.g4

Mike Cargal

Dec 23, 2015, 2:07:44 PM
to ANTLR List
That looks like a log file. My first comment would be to ask whether the application producing that file has defined a formal grammar for it (maybe not in ANTLR syntax, but a formal specification that commits the output to a prescribed format). Most logging will have some consistency at the start of the log line (often configurable within the application or logging framework), and would possibly allow you to know that every log line begins with something like:

Dec 17 14:00:01 103.56.229.11

But the remainder of the log entry will be basically free form. You might be able to take advantage of knowing that log entries of the same type follow a consistent pattern.

I’ve written a language using ANTLR, and sometimes have to do log file analysis. For log file analysis, I always reach for something like Ruby and make heavy use of regular expressions to look for the particular log entries I’m interested in. It wouldn’t even come to mind to try to write an ANTLR grammar to pull a log file apart. I’d do something like check for a regular expression at the start of a line to identify whether it begins a log entry, and then gather all the lines until the next line that matches that pattern (it’s not unusual to see log entries that are more than one line long). Then I can start applying regexes one by one to identify which type of log entry I’m looking at and pull out the information I need based upon knowledge of the consistency of that particular log.
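
The "match the start of an entry, then gather lines until the next match" step could look roughly like this. It is sketched in Java rather than Ruby because the original poster works in Java. The class name `LogEntryGrouper` and the header pattern (`Mon DD HH:MM:SS` at the start of a line) are assumptions based on the sample line earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LogEntryGrouper {
    // Assumed entry header: "Mon DD HH:MM:SS" followed by whitespace.
    private static final Pattern ENTRY_START =
        Pattern.compile("^[A-Z][a-z]{2}\\s+\\d{1,2}\\s+\\d{2}:\\d{2}:\\d{2}\\s");

    /** Groups raw lines into entries; continuation lines are joined with '\n'. */
    static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (ENTRY_START.matcher(line).find()) {
                // A new header closes the entry collected so far.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
            // Lines that appear before the first header are dropped in this sketch.
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }
}
```

Each grouped entry can then be tested against per-type regexes one at a time, as described above.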

If you really want to use ANTLR, you can probably work something out, but if there’s no commitment on the part of the application producing the log file to follow a particular grammar, then I suspect you’ll be chasing exceptions for a very long time (or punt and put in a “gobble up everything until the next line (that starts with …)” type of rule, and then use it).

To me, the fact that you don’t need a parse tree result is probably a strong indication that ANTLR is overkill.


Heet Sheth

Dec 24, 2015, 8:59:44 AM
to antlr-discussion
Yes, as @Mike mentioned, our log file consists of two types of data, viz. success and failure. We are halfway through. We have a large database of logs that we have to parse. We can now parse the correct input from the input file and write it to the success.txt file. The issue occurs when the input does not satisfy our grammar. How can we resolve this? You can have a look at our project code here
Information: We are working in Java, without an IDE, just a simple editor.
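
One possible way to handle non-matching input is to keep it rather than stop on it: route each line into an accepted or a rejected bucket, whatever validity check is used (a parse that reports no errors, or a regular expression). This is only a sketch under assumptions. The class `LineRouter`, its `Result` type, and the validator passed in as a `Predicate<String>` are hypothetical names, not part of the project code referred to above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class LineRouter {
    /** Holds the two output buckets. */
    static final class Result {
        final List<String> accepted = new ArrayList<>();
        final List<String> rejected = new ArrayList<>();
    }

    /** Sends each line to the accepted or rejected bucket based on the validator. */
    static Result route(List<String> lines, Predicate<String> isValid) {
        Result result = new Result();
        for (String line : lines) {
            boolean ok;
            try {
                ok = isValid.test(line);
            } catch (RuntimeException e) {
                // A validator that throws on bad input counts as a rejection.
                ok = false;
            }
            (ok ? result.accepted : result.rejected).add(line);
        }
        return result;
    }
}
```

The two lists can then be written out separately, for example with `java.nio.file.Files.write(Paths.get("success.txt"), result.accepted)` and a second, hypothetical file for the rejected lines, so that invalid input is recorded for inspection.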

Heet Sheth

Dec 28, 2015, 8:04:43 AM
to antlr-discussion

I completed my project using ANTLR4.

However, for the project as a whole I am supposed to remove all dependencies on .jar files. To the best of my knowledge, I am not able to remove the dependency on the antlr4 jar file, so I'm planning to switch to JavaCC. I now have a grammar and Java files that are compatible with ANTLR4.

How can I make the switch, and what must I consider?


