ConfigurableParser cannot parse expressions with Chinese variable names


andy.yo...@gmail.com

Jan 7, 2015, 3:55:42 PM
to jep-...@googlegroups.com
Hi, I have an issue parsing Chinese variable names with the ConfigurableParser. It works fine with English variable names. Below is my test code. Does the ConfigurableParser support Chinese (multi-byte character) variable names? If yes, what am I missing?

Thanks

Andy



import com.singularsys.jep.Jep;
import com.singularsys.jep.ParseException;
import com.singularsys.jep.configurableparser.ConfigurableParser;


public class TestConfigurableParser {

    /**
     * @param args
     * @throws ParseException
     */
    public static void main(String[] args) {
        ConfigurableParser cp = new ConfigurableParser();
        cp.addHashComments();
        cp.addSlashComments();
        cp.addDoubleQuoteStrings();
        cp.addWhiteSpace();
        cp.addExponentNumbers();
        cp.addOperatorTokenMatcher();
        cp.addSymbols("(",")","[","]",",");
        cp.setImplicitMultiplicationSymbols("(","[");
        cp.addIdentifiers();
        cp.addSemiColonTerminator();
        cp.addWhiteSpaceCommentFilter();
        cp.addBracketMatcher("(",")");
        cp.addFunctionMatcher("(",")",",");
        cp.addListMatcher("[","]",",");
        cp.addArrayAccessMatcher("[","]");

        // Construct the Jep instance and set the parser
        Jep jep = new Jep();
        jep.setComponent(cp);
       
        String expression = "ProductA + 1";
        try {
            jep.parse(expression);
        } catch (ParseException e) {
            e.printStackTrace();
            System.out.println("English expression doesn't work.");
        }
       
        expression = "产品A + 1";
        try {
            jep.parse(expression);
        } catch (ParseException e) {
            e.printStackTrace();
            System.out.println("Chinese expression doesn't work.");
        }

    }
}

===== output =====
com.singularsys.jep.ParseException: Could not match text '产品A + 1'.
    at com.singularsys.jep.configurableparser.Tokenizer.nextTokenMultiLine(Unknown Source)
    at com.singularsys.jep.configurableparser.Tokenizer.scan(Unknown Source)
    at com.singularsys.jep.configurableparser.ConfigurableParser.scan(Unknown Source)
    at com.singularsys.jep.configurableparser.ConfigurableParser.parse(Unknown Source)
    at com.singularsys.jep.Jep.parse(Unknown Source)
    at com.singularsys.jep.Jep.parse(Unknown Source)
    at TestConfigurableParser.main(TestConfigurableParser.java:44)
Chinese expression doesn't work.


Richard Morris

Jan 7, 2015, 7:15:27 PM
to jep-...@googlegroups.com
By default it only accepts ASCII characters in variable names. To change this behaviour, you need to specify a different regexp that matches the characters you want. I'm not really an expert on this, but I think the regexp [\\p{L}_][\\p{L}\\p{N}_\\.]* (written here as a Java string literal) should match. You need to pass it to an IdentifierTokenMatcher and add that matcher where your code currently calls cp.addIdentifiers():

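// Assumed package for IdentifierTokenMatcher; check it against your Jep version.
import com.singularsys.jep.configurableparser.matchers.IdentifierTokenMatcher;

ConfigurableParser cp = new ConfigurableParser();
cp.addHashComments();
cp.addSlashComments();
cp.addSingleQuoteStrings();
cp.addDoubleQuoteStrings();
cp.addWhiteSpace();
cp.addExponentNumbers();
cp.addOperatorTokenMatcher();
cp.addSymbols("(",")","[","]",",");
cp.setImplicitMultiplicationSymbols("(","[");
// Used instead of cp.addIdentifiers(): a name starts with a Unicode letter or '_',
// followed by Unicode letters, numbers, '_' or '.'
cp.addTokenMatcher(new IdentifierTokenMatcher("[\\p{L}_][\\p{L}\\p{N}_\\.]*"));
// ... the remaining calls (addSemiColonTerminator() onwards) stay as in the original code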
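
If you want to check the regexp on its own before wiring it into Jep, something like the following should do. It is only a sketch using java.util.regex and does not touch Jep at all. The class name IdentifierRegexDemo, the helper leadingIdentifier, and the ASCII-only pattern are made up for illustration; in particular, the ASCII pattern is an assumption and not necessarily the pattern Jep uses by default. The Chinese name 产品A is written with Unicode escapes so the file compiles whatever the source encoding is.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierRegexDemo {
    // Illustrative ASCII-only identifier pattern (an assumption, not Jep's real default).
    public static final Pattern ASCII_ID =
            Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // The Unicode-aware pattern suggested in this thread.
    public static final Pattern UNICODE_ID =
            Pattern.compile("[\\p{L}_][\\p{L}\\p{N}_\\.]*");

    /** Returns the identifier matched at the start of the input, or null if there is none. */
    public static String leadingIdentifier(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        // lookingAt() anchors the match at the start of the input,
        // which is roughly what a tokenizer does at the current position.
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String english = "ProductA + 1";
        String chinese = "\u4ea7\u54c1A + 1"; // the Chinese expression from the question

        for (String expr : new String[] {english, chinese}) {
            System.out.println(expr
                    + " -> ascii=" + leadingIdentifier(ASCII_ID, expr)
                    + ", unicode=" + leadingIdentifier(UNICODE_ID, expr));
        }
    }
}
```

With the ASCII-only pattern nothing matches at the start of the Chinese expression, so the helper returns null, while the Unicode pattern picks up the whole name. Both patterns return ProductA for the English expression.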
