How to parse string literals with line breaks?


gattin...@googlemail.com

Sep 13, 2017, 6:13:19 AM
to Jep Java Users
Hi!

Currently I'm facing the problem that Jep seems unable to parse expressions that contain string literals with line breaks (i.e. containing "\n" or "\r\n").

Here is a very simple test showing the problem:

@Test
public void jepTest() throws ParseException {
    ConfigurableParser cp = new ConfigurableParser();
    cp.addSingleQuoteStrings();
    Jep jep = new Jep(cp);

    String expression = "'line\nbreak'";
    Node node = jep.parse(expression);
    System.out.println(jep.toString(node));
}

Running the test will throw the following exception:
com.singularsys.jep.ParseException: Could not match text ''line'.

    at com.singularsys.jep.configurableparser.Tokenizer.nextTokenMultiLine(Unknown Source)
    at com.singularsys.jep.configurableparser.Tokenizer.scan(Unknown Source)
    at com.singularsys.jep.configurableparser.ConfigurableParser.scan(Unknown Source)
    at com.singularsys.jep.configurableparser.ConfigurableParser.parse(Unknown Source)
    at com.singularsys.jep.Jep.parse(Unknown Source)



If you remove the "\n" from the expression string, then the test passes, printing "linebreak" on the console.

Is there any setting within the ConfigurableParser to enable line breaks within string literals in expressions?

I appreciate any help on this problem.

Kind regards,
Marcus

gattin...@googlemail.com

Sep 13, 2017, 8:48:02 AM
to Jep Java Users
I expect that the console output is:
line
break

This is important because the result of evaluating (not parsing) the expression with jep.evaluate(jep.parse(expression)) should be
line
break
too.

Richard Morris

Sep 13, 2017, 9:47:12 AM
to Jep Java Users
I see you are trying to match a string broken across multiple lines. The tokenizer in the configurable parser processes one line at a time.
This generally makes things fairly efficient but does require some special procedures for working with multi-line elements.
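
To illustrate why a line-based tokenizer trips over this input, here is a minimal standalone sketch. It does not use Jep at all: the class LineAtATimeDemo and its methods are hypothetical stand-ins, and the real tokenizer may split and match differently. It only shows that once the text is cut at the line break, neither piece is a complete quoted string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Jep.
final class LineAtATimeDemo {

    // Stand-in for a single-line string matcher: succeeds only if the line
    // starts with a quote and has a closing quote on the same line.
    static boolean matchesWholeString(String line) {
        return line.startsWith("'") && line.indexOf('\'', 1) >= 0;
    }

    // Applies the matcher to each line separately, the way a tokenizer that
    // works one line at a time would see the input.
    static List<Boolean> matchPerLine(String expression) {
        List<Boolean> results = new ArrayList<>();
        for (String line : expression.split("\r?\n")) {
            results.add(matchesWholeString(line));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(matchPerLine("'linebreak'"));    // [true]
        System.out.println(matchPerLine("'line\nbreak'"));  // [false, false]
    }
}
```

The first half, 'line, has an opening quote but no closing one, which is consistent with the "Could not match text ''line'." message in the original report.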

There is a class com.singularsys.jep.configurableparser.matchers.MultiLineMatcher which can handle such cases. It is used for working with multiline /* .. */ comments but can also be used to match multiline strings.

Its constructor requires three arguments. 
    A TokenMatcher which matches the start of the element
    A TokenMatcher which matches the end of the element
    A TokenBuilder which constructs a token of the appropriate type.

An example of matching a multiline string would be

        // matches the start of the string
        // a ' at the start of the string and no subsequent '
        TokenMatcher start = new TokenMatcher() {
            private static final long serialVersionUID = 1L;

            @Override
            public Token match(String s) throws ParseException {
                if (s.startsWith("'") && s.indexOf('\'', 1) == -1)
                    return new StringToken(s, s.substring(1), '\'', false);
                return null;
            }

            @Override
            public void init(Jep jep) { }
        };

        // Matcher for the end of the string
        // looks for a ' somewhere in the string
        TokenMatcher end = new TokenMatcher() {
            private static final long serialVersionUID = 1L;

            @Override
            public Token match(String s) throws ParseException {
                int pos = s.indexOf('\'');
                if (pos >= 0)
                    return new StringToken(s.substring(0, pos + 1), s.substring(0, pos), '\'', true);
                return null;
            }

            @Override
            public void init(Jep jep) { }
        };

        // Builds the result
        TokenBuilder tb = new TokenBuilder() {
            private static final long serialVersionUID = 1L;

            @Override
            public Token match(String s) throws ParseException {
                return null;
            }

            @Override
            public void init(Jep jep) { }

            // builds the token, the s here will be that matched by the start, that matched by the end and all intermediate lines
            // String token constructor requires the whole text matched, then the string with quotes removed, then the delimiter
            @Override
            public Token buildToken(String s) {
                return new StringToken(s, s.substring(1, s.length() - 1), '\'', false);
            }
        };

        // Now combine these into a MultiLineMatcher
        TokenMatcher m = new MultiLineMatcher(start, end, tb);

        // Initialise Jep and test
        ConfigurableParser cp = new StandardConfigurableParser();
        cp.addTokenMatcher(m);
        Jep jep = new Jep(cp);
        String s1 = "'line\nbreak'";
        Node n1 = jep.parse(s1);
        jep.println(n1);


 The above implementation is a bit basic. It does not check for escaped single quotes. At the moment it is the last matcher tested; you might need to adjust the order.

Hope it does what you need.

Richard

gattin...@googlemail.com

Sep 15, 2017, 12:46:06 PM
to Jep Java Users
Hi Richard,

thank you for your answer. Unfortunately this approach does not work for slightly more sophisticated expressions.
  1. If the expression is "if(true, 'line\nbreak', 'else')" then the parsing fails with an exception.
  2. Furthermore it would be quite nice if the parser worked with either single or double quoted strings within an expression.
The first one is crucial for me, the second one is a nice-to-have.

Hope you can point me in the right direction to solve it.

Kind regards,
Marcus

Richard Morris

Sep 15, 2017, 1:52:51 PM
to Jep Java Users


On Friday, 15 September 2017 17:46:06 UTC+1, Marcus Gattinger wrote:
Hi Richard,

thank you for your answer. Unfortunately this approach does not work for little more sophisticated expressions.
  1. If the expression is "if(true, 'line\nbreak', 'else')" then the parsing fails with an exception.

That's odd. I've checked it on my end, just adding the following to the end of the above code, and it works fine.

        String s4 = "if(true, 'line\nbreak', 'else')";
        Node n4 = jep.parse(s4);
        jep.println(n4);

Do you have a complete example where it fails?
  2. Furthermore it would be quite nice, if the parse would work with either a single or double quoted strings within an expression.
The first one is crucial for me, the second one a nice to have.

For the second you probably need to write a second MultiLineMatcher to match double-quoted strings, identical to the above but with the ' changed to a ".
 

gattin...@googlemail.com

Sep 17, 2017, 3:06:36 PM
to Jep Java Users
Richard,

here is the complete test code:

@Test
public void test() throws ParseException, EvaluationException {

    // matches the start of the string
    // a ' at the start of the string and no subsequent '
    TokenMatcher start = new TokenMatcher() {
        private static final long serialVersionUID = 1L;

        @Override
        public Token match(String s) throws ParseException {
            if (s.startsWith("'") && s.indexOf('\'', 1) == -1) {
                return new StringToken(s, s.substring(1), '\'', false);
            }
            return null;
        }

        @Override
        public void init(Jep jep) { }
    };

    // Matcher for the end of the string
    // looks for a ' somewhere in the string
    TokenMatcher end = new TokenMatcher() {
        private static final long serialVersionUID = 1L;

        @Override
        public Token match(String s) throws ParseException {
            int pos = s.indexOf('\'');
            if (pos >= 0) {
                return new StringToken(s.substring(0, pos + 1), s.substring(0, pos), '\'', true);
            }
            return null;
        }

        @Override
        public void init(Jep jep) { }
    };

    // Builds the result
    TokenBuilder tb = new TokenBuilder() {
        private static final long serialVersionUID = 1L;

        @Override
        public Token match(String s) throws ParseException {
            return null;
        }

        @Override
        public void init(Jep jep) { }

        // builds the token, the s here will be that matched by the start, that matched by the end and all intermediate lines
        // String token constructor requires the whole text matched, then the string with quotes removed, then the delimiter
        @Override
        public Token buildToken(String s) {
            return new StringToken(s, s.substring(1, s.length() - 1), '\'', false);
        }
    };

    // Now combine these into a MultiLineMatcher
    TokenMatcher m = new MultiLineMatcher(start, end, tb);

    ConfigurableParser cp = new ConfigurableParser();
    cp.addSingleQuoteStrings();
    cp.addTokenMatcher(m);

    Jep jep = new Jep(cp);

    String expression = "if(true,'line\nbreak','else')";
    Node node = jep.parse(expression);
    String parsedExpression = jep.toString(node);
    String evaluatedExpression = jep.evaluate(node).toString();
    System.out.println(parsedExpression);
    System.out.println(evaluatedExpression);
}

And here is the exception thrown in line
Node node = jep.parse(expression);

com.singularsys.jep.ParseException: Could not match text 'if(true,'line'.


    at com.singularsys.jep.configurableparser.Tokenizer.nextTokenMultiLine(Unknown Source)
    at com.singularsys.jep.configurableparser.Tokenizer.scan(Unknown Source)
    at com.singularsys.jep.configurableparser.ConfigurableParser.scan(Unknown Source)
    at com.singularsys.jep.configurableparser.ConfigurableParser.parse(Unknown Source)
    at com.singularsys.jep.Jep.parse(Unknown Source)



Regards,
Marcus

Richard Morris

Sep 17, 2017, 5:48:17 PM
to Jep Java Users
Ah yes. The problem is in the set up of the ConfigurableParser.

In my code I have

TokenMatcher m =  new MultiLineMatcher(start,end,tb);
ConfigurableParser cp = new StandardConfigurableParser();
cp.addTokenMatcher(m);

In yours you have

 TokenMatcher m = new MultiLineMatcher(start, end, tb);

 ConfigurableParser cp = new ConfigurableParser();
 cp.addSingleQuoteStrings();
 cp.addTokenMatcher(m);


There is a big difference between ConfigurableParser() and StandardConfigurableParser(). The first has an empty set of rules - it can match precisely nothing. The StandardConfigurableParser() has a complete set of matching rules for comments, single and double quoted strings, etc.

The full set of things it matches is

        cp.addHashComments();
        cp.addSlashComments();
        cp.addSingleQuoteStrings();
        cp.addDoubleQuoteStrings();
        cp.addWhiteSpace();
        cp.addExponentNumbers();
        cp.addOperatorTokenMatcher();
        cp.addSymbols("(",")","[","]",",");
        cp.setImplicitMultiplicationSymbols("(","[");
        cp.addIdentifiers();
        cp.addSemiColonTerminator();
        cp.addWhiteSpaceCommentFilter();
        cp.addBracketMatcher("(",")");
        cp.addFunctionMatcher("(",")",",");
        cp.addListMatcher("[","]",",");
        cp.addArrayAccessMatcher("[","]");

 If you want to create your own parser you need to include most of these elements or alternatives, in roughly the order given.


The simplest fix is just to replace ConfigurableParser() with StandardConfigurableParser() and it will work.
A slightly better option would be to add the MultiLineMatcher just after the cp.addSingleQuoteStrings(); and
cp.addDoubleQuoteStrings(); lines.
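
As a rough illustration of why both the rule set and its order matter, here is a standalone sketch. It assumes a tokenizer that tries its matchers in list order and takes the first hit; that is an assumption made for the example, not a description of Jep's internals. OrderedMatchersDemo, prefix and firstMatch are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo class, not part of Jep.
final class OrderedMatchersDemo {

    // Hypothetical matcher: returns the matched prefix of the text, or null.
    static Function<String, String> prefix(String literal) {
        return text -> text.startsWith(literal) ? literal : null;
    }

    // Tries the matchers in list order and returns the first hit.
    static String firstMatch(List<Function<String, String>> matchers, String text) {
        for (Function<String, String> matcher : matchers) {
            String hit = matcher.apply(text);
            if (hit != null) {
                return hit;
            }
        }
        throw new IllegalStateException("Could not match text '" + text + "'.");
    }

    public static void main(String[] args) {
        // Same two rules, different order, different token.
        System.out.println(firstMatch(Arrays.asList(prefix("<"), prefix("<=")), "<=3"));  // <
        System.out.println(firstMatch(Arrays.asList(prefix("<="), prefix("<")), "<=3"));  // <=

        // A rule set with no matching rule fails outright.
        try {
            firstMatch(Collections.emptyList(), "if(true)");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // Could not match text 'if(true)'.
        }
    }
}
```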


Hope that makes sense

Richard



gattin...@googlemail.com

Sep 18, 2017, 2:09:46 AM
to Jep Java Users
Sorry, my fault - you're right of course.

In the production code we use the ConfigurableParser with all the explicit settings but replace the
IdentifierTokenMatcher.basicIndetifierMatcher
by the
IdentifierTokenMatcher.dottedIndetifierMatcher
to allow variable names that contain dots within (btw: the method names are both misspelled).

So your solution works as expected.
Thank you again, Richard.

Kind regards,
Marcus

gattin...@googlemail.com

Oct 2, 2017, 7:34:57 AM
to Jep Java Users
I've enhanced the Matchers a bit, so that they are able to parse escaped quotes, too.

Here is the code for anyone who is interested:

/**
 * Creates a token matcher to accept double quoted string literals that span multiple lines.
 *
 * @return The created token matcher.
 */
private static TokenMatcher createDoubleQuotedMultiLineMatcher() {
    // Create a matcher that matches the start of the string (looks for a double quote at the start of the string).
    TokenMatcher startOfDoubleQuoteTokenMatcher = new TokenMatcher() {

        private static final long serialVersionUID = 1L;

        @Override
        public Token match(final String s) throws ParseException {
            // The isEmpty() check guards charAt(0) against an empty input.
            if (s.isEmpty() || s.charAt(0) != '"') {
                return null;
            }

            return new StringToken(s, s.substring(1), '"', false);
        }

        @Override
        public void init(final Jep jep) { }
    };

    // Create a matcher that matches the end of the string (looks for a double quote somewhere in the string).
    TokenMatcher endOfDoubleQuoteTokenMatcher = new TokenMatcher() {

        private static final long serialVersionUID = 1L;

        @Override
        public Token match(final String s) throws ParseException {
            int indexOfDoubleQuote = s.indexOf('"');
            if (indexOfDoubleQuote < 0) {
                return null;
            }

            // Check number of backslashes to decide whether or not the double quote is escaped.
            while (true) {
                int numberOfPrecedingBackslashes = 0;
                for (int index = indexOfDoubleQuote - 1; index >= 0; index--) {
                    if (s.charAt(index) != '\\') {
                        break;
                    }
                    numberOfPrecedingBackslashes++;
                }

                boolean isEscaped = numberOfPrecedingBackslashes % 2 == 1;
                if (!isEscaped) {
                    break;
                }

                indexOfDoubleQuote = s.indexOf('"', indexOfDoubleQuote + 1);
            }

            // No unescaped double quote found.
            if (indexOfDoubleQuote < 0) {
                return null;
            }

            return new StringToken(s.substring(0, indexOfDoubleQuote + 1), s.substring(0, indexOfDoubleQuote), '"', true);
        }

        @Override
        public void init(final Jep jep) { }
    };

    // Create a token builder to build the complete double quoted token.
    TokenBuilder doubleQuoteTokenBuilder = new TokenBuilder() {

        private static final long serialVersionUID = 1L;

        @Override
        public Token match(final String s) throws ParseException {
            return null;
        }

        @Override
        public void init(final Jep jep) { }

        @Override
        public Token buildToken(final String s) {
            // Builds the token, the s here will be that matched by the start, that matched by the end and all intermediate lines.
            // String token constructor requires the whole text matched, then the string with quotes removed, then the delimiter.
            return new StringToken(s, s.substring(1, s.length() - 1), '\"', false);
        }
    };

    // Now combine the token matcher pair into a multi line token matcher.
    return new MultiLineMatcher(startOfDoubleQuoteTokenMatcher, endOfDoubleQuoteTokenMatcher, doubleQuoteTokenBuilder);
}

/**
 * Creates a token matcher to accept single quoted string literals that span multiple lines.
 *
 * @return The created token matcher.
 */
private static TokenMatcher createSingleQuotedMultiLineMatcher() {
    // Create a matcher that matches the start of the string (looks for a single quote at the start of the string).
    TokenMatcher startOfSingleQuoteTokenMatcher = new TokenMatcher() {

        private static final long serialVersionUID = 1L;

        @Override
        public Token match(final String s) throws ParseException {
            // The isEmpty() check guards charAt(0) against an empty input.
            if (s.isEmpty() || s.charAt(0) != '\'') {
                return null;
            }

            return new StringToken(s, s.substring(1), '\'', false);
        }

        @Override
        public void init(final Jep jep) { }
    };

    // Create a matcher that matches the end of the string (looks for a single quote somewhere in the string).
    TokenMatcher endOfSingleQuoteTokenMatcher = new TokenMatcher() {

        private static final long serialVersionUID = 1L;

        @Override
        public Token match(final String s) throws ParseException {
            int indexOfSingleQuote = s.indexOf('\'');
            if (indexOfSingleQuote < 0) {
                return null;
            }

            // Check number of backslashes to decide whether or not the single quote is escaped.
            while (true) {
                int numberOfPrecedingBackslashes = 0;
                for (int index = indexOfSingleQuote - 1; index >= 0; index--) {
                    if (s.charAt(index) != '\\') {
                        break;
                    }
                    numberOfPrecedingBackslashes++;
                }

                boolean isEscaped = numberOfPrecedingBackslashes % 2 == 1;
                if (!isEscaped) {
                    break;
                }

                indexOfSingleQuote = s.indexOf('\'', indexOfSingleQuote + 1);
            }

            // No unescaped single quote found.
            if (indexOfSingleQuote < 0) {
                return null;
            }

            return new StringToken(s.substring(0, indexOfSingleQuote + 1), s.substring(0, indexOfSingleQuote), '\'', true);
        }

        @Override
        public void init(final Jep jep) { }
    };

    // Create a token builder to build the complete single quoted token.
    TokenBuilder singleQuoteTokenBuilder = new TokenBuilder() {

        private static final long serialVersionUID = 1L;

        @Override
        public Token match(final String s) throws ParseException {
            return null;
        }

        @Override
        public void init(final Jep jep) { }

        @Override
        public Token buildToken(final String s) {
            // Builds the token, the s here will be that matched by the start, that matched by the end and all intermediate lines.
            // String token constructor requires the whole text matched, then the string with quotes removed, then the delimiter.
            return new StringToken(s, s.substring(1, s.length() - 1), '\'', false);
        }
    };

    // Now combine the token matcher pair into a multi line token matcher.
    return new MultiLineMatcher(startOfSingleQuoteTokenMatcher, endOfSingleQuoteTokenMatcher, singleQuoteTokenBuilder);
}
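
The escape check in the two end matchers above differs only in the quote character, so it could perhaps be pulled out into one helper. The following is a minimal standalone sketch of that idea; QuoteScan and indexOfUnescaped are hypothetical names that are not part of Jep, and the helper only covers the backslash-counting step, not the token construction.

```java
// Hypothetical helper class, not part of Jep.
final class QuoteScan {

    // Returns the index of the first occurrence of quote in s that is not
    // preceded by an odd number of backslashes, or -1 if there is none.
    static int indexOfUnescaped(String s, char quote) {
        int pos = s.indexOf(quote);
        while (pos >= 0) {
            int backslashes = 0;
            for (int i = pos - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
                backslashes++;
            }
            if (backslashes % 2 == 0) {
                return pos;  // even count: the quote itself is not escaped
            }
            pos = s.indexOf(quote, pos + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfUnescaped("break'", '\''));        // 5
        System.out.println(indexOfUnescaped("it\\'s done'", '\''));  // 10
        System.out.println(indexOfUnescaped("no quote here", '"'));  // -1
    }
}
```

With such a helper, each end matcher would only need to call it with its own delimiter and build the StringToken from the returned index.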