Custom GrammarMatcher Implementation Checking for Operators


jeffd...@gmail.com

Dec 6, 2016, 2:12:22 PM
to Jep Java Users
Hello,

My team is trying to implement a custom GrammarMatcher to pass to our Jep parser to determine if a stream of tokens is a valid expression in our product's macro language.

The first check for a valid expression we wanted to implement was simple: does the expression contain any operators? (If there are no operators in the expression, then it cannot be valid.) If no operators were found, we wanted JEP to signal a grammar exception. If operators were found, we wanted JEP to move on to the other grammar matchers.

We created a GrammarMatcher implementing the match method as below. However, we ran into an issue: if the match method returned null when paramLookahead2Iterator.peekNext() was an operator token, an exception would be thrown. So it seems that whether returning null from match leads to an exception depends on the state of the Lookahead2Iterator. Is that correct? It was also unclear to us whether we needed to return a valid Node object created with a NodeFactory, and if so, what type of Node should be returned (we saw in the implementation of some of the standard GrammarMatchers included in the Jep jar that their match methods all returned a node of some type).

Is there any simple way to create a matcher that checks whether an operator is in the token stream without throwing a ParseException? We thought of making a copy of the Lookahead2Iterator parameter so that the original iterator's state is not changed; however, Lookahead2Iterator does not have a constructor that makes copying it trivial. Thanks!

public Node match(Lookahead2Iterator<Token> paramLookahead2Iterator, GrammarParser paramGrammarParser) throws ParseException {
    Token localToken = null;
    // Scan forward through the token stream looking for an operator.
    while (paramLookahead2Iterator.peekNext() != null) {
        localToken = paramLookahead2Iterator.peekNext();
        if (localToken.isOperator()) {
            // Operator found: return null so the parser moves on to the other matchers.
            return null;
        }
        // consume() advances the iterator shared with the parser, so this token
        // is no longer available to the matchers that run afterwards.
        paramLookahead2Iterator.consume();
    }
    // Reached the end without finding an operator.
    // (localToken is still null here if the input was empty.)
    throw new GrammarException("No operator in expression.", localToken.getLineNumber(), localToken.getColumnNumber());
}

Richard Morris

Dec 7, 2016, 8:12:38 AM
to Jep Java Users, jeffd...@gmail.com
My initial thought is that you possibly don't want to use a GrammarMatcher for this task.

The ConfigurableParser breaks parsing into three steps:
    tokenising - creating a set of tokens representing the input
    filtering  - mainly to remove comments and whitespace
    parsing - using grammatical rules to assemble the node tree

The default implementation bundles these together in a single method, parse(java.io.Reader stream), but each stage can also be performed separately. The methods are:
   public List<Token> scan(Reader stream) throws ParseException - does the tokenizing
   public Iterator<Token> filter(List<Token> input) throws ParseException - does the filtering and produces an iterator
   public Node parse(Iterator<Token> it) throws ParseException - assembles the node tree

If you want to implement your own syntax checker, you could do this at the filtering stage, either by subclassing ConfigurableParser or by calling the methods separately. There you have access to the full List of Tokens and are not constrained to peeking at the next one or two tokens.
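As a rough sketch of the filtering-stage idea, the check itself only needs to scan the whole token list without consuming anything. This is plain Java, not the Jep API: SimpleToken is a hypothetical stand-in for Jep's Token (only isOperator(), which the original post uses, is modelled) and Tok is a hypothetical implementation for illustration.

```java
import java.util.List;

// Hypothetical stand-in for Jep's Token; only isOperator() is modelled.
interface SimpleToken {
    boolean isOperator();
}

// Hypothetical token class used for illustration only.
final class Tok implements SimpleToken {
    final String text;
    private final boolean operator;

    Tok(String text, boolean operator) {
        this.text = text;
        this.operator = operator;
    }

    @Override
    public boolean isOperator() {
        return operator;
    }
}

final class OperatorCheck {
    // Looks at every token in the list; nothing is consumed, so later stages still see them all.
    static boolean containsOperator(List<? extends SimpleToken> tokens) {
        return tokens.stream().anyMatch(SimpleToken::isOperator);
    }

    // Rejects input with no operator. In a Jep subclass this would throw a ParseException instead.
    static void requireOperator(List<? extends SimpleToken> tokens) {
        if (!containsOperator(tokens)) {
            throw new IllegalArgumentException("No operator in expression.");
        }
    }
}
```

In a ConfigurableParser subclass, the same check could presumably be run on the List<Token> returned by scan(reader) before handing it to filter() and parse(), so the token stream the grammar matchers see is left untouched.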
 
Another thing you could try is to make sure Jep.setImplicitMul(boolean) is set to false. Then an operator is required between two numbers or variables, so "5 x" would raise an exception. This means that, apart from a single number or variable, it is impossible for an input not to contain an operator.

The general contract of a GrammarMatcher is: can the next few tokens be assembled to form the type of node I am trying to construct? If so, return the Node; if not, return null. Only on rare occasions should a matcher throw an exception, when the input is known to be wrong, say when an opening round bracket is closed by a square bracket. Matchers are not really intended for checking the whole input.
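To illustrate that contract, here is a self-contained sketch in which a matcher peeks at the next two tokens, consumes them and returns a result only if they fit, and otherwise returns null without touching the iterator. Lookahead2 and PercentMatcher are hypothetical stand-ins (strings instead of Jep Tokens, a String instead of a Node); they are not Jep classes.

```java
import java.util.List;

// Hypothetical two-token lookahead iterator over strings, loosely modelled on the
// peekNext()/consume() calls used in this thread. Not Jep's Lookahead2Iterator.
final class Lookahead2 {
    private final List<String> tokens;
    private int pos = 0;

    Lookahead2(List<String> tokens) { this.tokens = tokens; }

    String peekNext() { return pos < tokens.size() ? tokens.get(pos) : null; }
    String peekNextnext() { return pos + 1 < tokens.size() ? tokens.get(pos + 1) : null; }
    void consume() { pos++; }
    int position() { return pos; }
}

// Hypothetical matcher for "<number> %", e.g. "5 %".
final class PercentMatcher {
    static String match(Lookahead2 it) {
        String first = it.peekNext();
        String second = it.peekNextnext();
        // Decide using lookahead only; on failure nothing has been consumed.
        if (first == null || second == null) return null;
        if (!first.matches("\\d+") || !second.equals("%")) return null;
        // The match is certain, so it is now safe to consume both tokens.
        it.consume();
        it.consume();
        return "Percent(" + first + ")"; // stands in for a Node built with a NodeFactory
    }
}
```

The ordering is what matters: peek first, consume only once the match is certain. A matcher that consumes tokens and then returns null presumably leaves the parser at a different position from the one it started at.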

Often things are much easier to check once the input has been parsed and the node tree assembled. 
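For example, once a tree exists, "does the expression contain an operator" becomes a simple recursive walk. The sketch below uses a hypothetical minimal tree (Expr, Num, Var, Op, Fun) rather than Jep's node classes; with Jep you would instead walk the returned Node's children and test for operator nodes.

```java
import java.util.List;

// Hypothetical minimal expression tree, standing in for Jep's node classes.
abstract class Expr {
    // Leaf nodes have no children by default.
    List<Expr> children() { return List.of(); }
}

final class Num extends Expr {
    final double value;
    Num(double value) { this.value = value; }
}

final class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
}

// An operator application such as 5 + x.
final class Op extends Expr {
    final String symbol;
    private final List<Expr> args;
    Op(String symbol, Expr... args) { this.symbol = symbol; this.args = List.of(args); }
    @Override List<Expr> children() { return args; }
}

// A function call such as sin(x): not an operator itself, but its arguments may contain one.
final class Fun extends Expr {
    final String name;
    private final List<Expr> args;
    Fun(String name, Expr... args) { this.name = name; this.args = List.of(args); }
    @Override List<Expr> children() { return args; }
}

final class TreeCheck {
    // Depth-first search: true if any node in the tree is an operator node.
    static boolean containsOperator(Expr node) {
        if (node instanceof Op) return true;
        for (Expr child : node.children()) {
            if (containsOperator(child)) return true;
        }
        return false;
    }
}
```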

A more complex route might be to change the type of iterator used. Only having a two-token lookahead is quite restrictive, but it serves the purpose for most maths expressions. It should be possible to create an iterator with any number of steps of lookahead, and one which could rewind if parsing a complex subexpression failed. This would involve changing the way the ShuntingYard class is called. There is a method ShuntingYard.setIterator(Lookahead2Iterator) which sets the iterator used. To use this you would have to create your own subclass of Lookahead2Iterator, implement all its methods like next(), nextnext() and consume(), and add your own methods, maybe something like Token lookahead(int n).
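A minimal sketch of such an iterator, in plain Java and independent of Jep: tokens are pulled lazily from an ordinary Iterator into a buffer, lookahead(n) peeks n tokens ahead, and mark()/reset() let a failed sub-parse be rewound. The class name RewindableLookahead and the mark/reset methods are hypothetical; hooking it into Jep would still mean wrapping it in a Lookahead2Iterator subclass as described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical iterator with arbitrary lookahead and rewind. Every token read from the
// source is kept in the buffer, so earlier positions can be revisited after a failure.
final class RewindableLookahead<T> {
    private final Iterator<T> source;
    private final List<T> buffer = new ArrayList<>();
    private int pos = 0; // index in buffer of the next unconsumed token

    RewindableLookahead(Iterator<T> source) { this.source = source; }

    // Returns the n-th upcoming token (n = 1 is the next token), or null past the end.
    T lookahead(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        int index = pos + n - 1;
        while (buffer.size() <= index && source.hasNext()) {
            buffer.add(source.next());
        }
        return index < buffer.size() ? buffer.get(index) : null;
    }

    T peekNext() { return lookahead(1); }
    T peekNextnext() { return lookahead(2); }

    // Advances past the next token; does nothing at the end of the input.
    void consume() {
        if (lookahead(1) != null) pos++;
    }

    // Remembers the current position so a failed sub-parse can be undone.
    int mark() { return pos; }

    // Returns to a position previously obtained from mark().
    void reset(int mark) {
        if (mark < 0 || mark > buffer.size()) throw new IllegalArgumentException("invalid mark");
        pos = mark;
    }
}
```

Keeping every consumed token makes rewinding trivial, at the cost of memory proportional to the input, which is usually acceptable for expressions.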

To use the ShuntingYard with this, you would need to subclass ConfigurableParser and change the parse method to something like

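    public Node parse(Iterator<Token> it) throws ParseException {
        // gpf is the GrammarParserFactory held by ConfigurableParser; setIterator() is a
        // ShuntingYard method, so this assumes the factory produces a ShuntingYard.
        ShuntingYard sy = (ShuntingYard) gpf.newInstance(this);
        // MyIterator is your own Lookahead2Iterator subclass wrapping the filtered tokens.
        MyIterator myitt = new MyIterator(it);
        sy.setIterator(myitt);
        Node node = sy.parseSubExpression();
        // Fail if nothing could be parsed, or if tokens are left over after the expression.
        if (node == null) throw new ParseException("Could not parse expression");
        if (myitt.peekNext() != null) throw new ParseException("Unexpected tokens after expression");
        return node;
    }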

I might make it easier to change the iterator in the next release which will be coming soon.

Hope that's of some help. It might help to give an example of the kind of input you are trying to parse, to really get a feel for the best method for your problem.

Richard 
