Parentheses containing long regular expressions in a regexp are improperly rejected as invalid

Santiago Valencia

Nov 15, 2016, 10:11:15 AM
to checkstyle
Hi,
Please have a look at the below regexp that is being rejected when Sonar tries to execute Checkstyle.

[INFO] BUILD FAILURE

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 03:20 min

[INFO] Finished at: 2016-11-14T09:48:25+00:00

[INFO] Final Memory: 108M/451M

[INFO] ------------------------------------------------------------------------

[ERROR] Failed to execute goal org.sonarsource.scanner.maven:sonar-maven-plugin:3.2:sonar (default-cli) on project fulfilment-manager-service: Can not execute Checkstyle: cannot initialize module RegexpHeader - Cannot set property 'header' in module RegexpHeader to '^\/\*(\s\*{1,100})?$\n^\s\*\s{1,50}Copyright\s\d\d\d\d?[-\d\d\d\d]+\sOur\sCompany\sName\sHere$\n((^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}$\n^\s\*\sWho\s\s:\s\b.{1,}$\n^\s\*\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*\sSource\scontrol$\n^\s\*\s{1,50}\$Revision:\s[0-9]*\.?[0-9]+\s\$\n^\s\*\s{1,50}\$Author:\s[A-Za-z0-9]*\s\$\n^\s\*\s{1,50}\$Date:\s\b.{1,}\$\n^\s\*$\n)|(^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n))?^\s\*{1,}\/$\n': line 3 in header specification is not a regular expression: InvocationTargetException -> [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.


Even though this is a perfectly valid regexp header (checked on several regular expression testing sites with a positive result), for some reason Checkstyle doesn't like this part:

((^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}$\n^\s\*\sWho\s\s:\s\b.{1,}$\n^\s\*\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*\sSource\scontrol$\n^\s\*\s{1,50}\$Revision:\s[0-9]*\.?[0-9]+\s\$\n^\s\*\s{1,50}\$Author:\s[A-Za-z0-9]*\s\$\n^\s\*\s{1,50}\$Date:\s\b.{1,}\$\n^\s\*$\n)|(^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n))?

Curiously, the above subset of the full regular expression passes as valid when not inside parentheses. Checkstyle accepts

^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}$\n^\s\*\sWho\s\s:\s\b.{1,}$\n^\s\*\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*\sSource\scontrol$\n^\s\*\s{1,50}\$Revision:\s[0-9]*\.?[0-9]+\s\$\n^\s\*\s{1,50}\$Author:\s[A-Za-z0-9]*\s\$\n^\s\*\s{1,50}\$Date:\s\b.{1,}\$\n^\s\*$\n

and also
^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n

as valid regexps. It also accepts the full regular expression when we escape the parentheses, but of course that yields a very different regular expression: each escaped parenthesis matches a literal parenthesis character in the text instead of delimiting a group.
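
The difference between grouping and escaped parentheses can be seen in a minimal Java sketch using `java.util.regex`, the engine that appears in the stack traces in this thread. The class name and the sample patterns are made up for illustration and are not taken from the header above.

```java
import java.util.regex.Pattern;

class EscapedParenDemo {
    // Unescaped parentheses form an optional group: matches "ab" or "".
    static final Pattern GROUPED = Pattern.compile("^(ab)?$");
    // Escaped parentheses are literal characters; only the final ")" is optional.
    static final Pattern ESCAPED = Pattern.compile("^\\(ab\\)?$");

    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(GROUPED, "ab"));    // true
        System.out.println(matches(GROUPED, "(ab)"));  // false
        System.out.println(matches(ESCAPED, "(ab)"));  // true
        System.out.println(matches(ESCAPED, "ab"));    // false
    }
}
```

Both patterns compile, but they describe different text, which is why escaping the parentheses is not a usable workaround here.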

What's more, to make it trickier, Checkstyle accepts things like ^(a|b)$ as a valid regular expression, but for some reason it doesn't seem to accept long regular expressions contained inside parentheses.

Just to make it clear, the outer double parentheses are not the problem, as (^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}$\n^\s\*\sWho\s\s:\s\b.{1,}$\n^\s\*\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*\sSource\scontrol$\n^\s\*\s{1,50}\$Revision:\s[0-9]*\.?[0-9]+\s\$\n^\s\*\s{1,50}\$Author:\s[A-Za-z0-9]*\s\$\n^\s\*\s{1,50}\$Date:\s\b.{1,}\$\n^\s\*$\n) alone will not pass as valid either.

Any idea how we could make it pass and at the same time keep the regexp as it is?

Thanks in advance,
Santiago Valencia

R Veach

Nov 15, 2016, 10:44:37 AM
to checkstyle
The complete exception I am getting is:


com.puppycrawl.tools.checkstyle.api.CheckstyleException: cannot initialize module RegexpHeader - Cannot set property 'header' to '^\/\*(\s\*{1,100})?$\n^\s\*\s{1,50}Copyright\s\d\d\d\d?[-\d\d\d\d]+\sOur\sCompany\sName\sHere$\n((^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}$\n^\s\*\sWho\s\s:\s\b.{1,}$\n^\s\*\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*\sSource\scontrol$\n^\s\*\s{1,50}\$Revision:\s[0-9]*\.?[0-9]+\s\$\n^\s\*\s{1,50}\$Author:\s[A-Za-z0-9]*\s\$\n^\s\*\s{1,50}\$Date:\s\b.{1,}\$\n^\s\*$\n)|(^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n))?^\s\*{1,}\/$\n' in module RegexpHeader
    at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:429)
Checkstyle ends with 1 errors.
    at com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:141)
    at com.puppycrawl.tools.checkstyle.Main.runCheckstyle(Main.java:421)
    at com.puppycrawl.tools.checkstyle.Main.runCli(Main.java:359)
    at com.puppycrawl.tools.checkstyle.Main.main(Main.java:174)
Caused by: com.puppycrawl.tools.checkstyle.api.CheckstyleException: Cannot set property 'header' to '^\/\*(\s\*{1,100})?$\n^\s\*\s{1,50}Copyright\s\d\d\d\d?[-\d\d\d\d]+\sOur\sCompany\sName\sHere$\n((^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}$\n^\s\*\sWho\s\s:\s\b.{1,}$\n^\s\*\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*\sSource\scontrol$\n^\s\*\s{1,50}\$Revision:\s[0-9]*\.?[0-9]+\s\$\n^\s\*\s{1,50}\$Author:\s[A-Za-z0-9]*\s\$\n^\s\*\s{1,50}\$Date:\s\b.{1,}\$\n^\s\*$\n)|(^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n))?^\s\*{1,}\/$\n' in module RegexpHeader
    at com.puppycrawl.tools.checkstyle.api.AutomaticBean.tryCopyProperty(AutomaticBean.java:182)
    at com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:134)
    at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:425)
    ... 4 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.commons.beanutils.PropertyUtilsBean.invokeMethod(PropertyUtilsBean.java:2127)
    at org.apache.commons.beanutils.PropertyUtilsBean.setSimpleProperty(PropertyUtilsBean.java:2108)
    at org.apache.commons.beanutils.BeanUtilsBean.copyProperty(BeanUtilsBean.java:437)
    at com.puppycrawl.tools.checkstyle.api.AutomaticBean.tryCopyProperty(AutomaticBean.java:172)
    ... 6 more
Caused by: org.apache.commons.beanutils.ConversionException: line 3 in header specification is not a regular expression
    at com.puppycrawl.tools.checkstyle.checks.header.RegexpHeaderCheck.postProcessHeaderLines(RegexpHeaderCheck.java:158)
    at com.puppycrawl.tools.checkstyle.checks.header.AbstractHeaderCheck.loadHeader(AbstractHeaderCheck.java:182)
    at com.puppycrawl.tools.checkstyle.checks.header.AbstractHeaderCheck.setHeader(AbstractHeaderCheck.java:156)
    at com.puppycrawl.tools.checkstyle.checks.header.RegexpHeaderCheck.setHeader(RegexpHeaderCheck.java:178)
    ... 14 more
Caused by: java.util.regex.PatternSyntaxException: Unclosed group near index 8
((^\s\*$
        ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.accept(Pattern.java:1813)
    at java.util.regex.Pattern.group0(Pattern.java:2908)
    at java.util.regex.Pattern.sequence(Pattern.java:2051)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.group0(Pattern.java:2905)
    at java.util.regex.Pattern.sequence(Pattern.java:2051)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)
    at com.puppycrawl.tools.checkstyle.checks.header.RegexpHeaderCheck.postProcessHeaderLines(RegexpHeaderCheck.java:155)
    ... 17 more

It is hard to read your expression. If you could give us a smaller case, it might be easier to say what the problem is.

CS first splits the regexp by lines, breaking it apart into smaller sections, so we can tell users exactly which line failed and not just that the expression failed.
Most likely, there is a disconnect between where you think the regexp ends for a line and where CS thinks it ends for a line, and this is creating an invalid expression and your exception.
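
A rough Java sketch of the behavior described here, under the assumption that the header is cut at each escaped `\n` and every piece is compiled on its own. This is not Checkstyle's actual code; the class name, method name and the shortened header are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HeaderSplitDemo {
    /**
     * Splits a header on the literal two-character sequence "\n" and compiles
     * each piece on its own. Returns the 1-based number of the first piece
     * that is not a valid regexp, or -1 if every piece compiles.
     */
    static int firstInvalidLine(String header) {
        // The regex \\n (written "\\\\n" in Java source) matches a backslash followed by 'n'.
        String[] lines = header.split("\\\\n");
        for (int i = 0; i < lines.length; i++) {
            try {
                Pattern.compile(lines[i]);
            } catch (PatternSyntaxException e) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical shortened header: the optional group spans two "lines".
        String header = "^/\\*$\\n^ \\* Copyright$\\n(^ \\*$\\n^ \\* What$\\n)?^ \\*/$";
        // The third piece is "(^ \*$", which has an unclosed group.
        System.out.println(firstInvalidLine(header));
    }
}
```

Compiled as one expression the sample header is valid; it only becomes invalid once it is cut into per-line pieces.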

Have you checked whether you get the same problem with the property headerFile?
http://checkstyle.sourceforge.net/config_header.html#RegexpHeader

Roman Ivanov

Nov 15, 2016, 2:38:35 PM
to R Veach, checkstyle
Hi Santiago,

http://checkstyle.sourceforge.net/report_issue.html#How_to_report_a_bug

Please try to reproduce the problem with our CLI. It will help to distinguish where the problem lies (Sonar, Checkstyle, ...) and to reproduce it on our side.

Santiago Valencia

Nov 16, 2016, 4:38:01 AM
to checkstyle, rvea...@gmail.com
Thanks for your answers.

I now have a shorter version, using just one of the two alternative headers, which I'll explain so you can understand exactly what the expression expects.

The new regular expression would be:

^\/\*(\s\*{1,100})?$\n^\s\*\s{1,50}Copyright\s\d\d\d\d?[-\d\d\d\d]+\sOur\sCompany\sName\sHere$\n(^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n)?^\s\*{1,}\/$\n
This fails with exactly the same error.

The expected valid headers would be:

/*
 * Copyright 2016 Our Company Name Here
 */


/* *******
 * Copyright 2014 Our Company Name Here
 *
 * What : CatalogueResource Who : SValencia When : 16 Nov 2016
 *
 * **************************************
 */

The new regular expression can be divided as follows:
1) ^\/\*(\s\*{1,100})?$\n^\s\*\s{1,50}Copyright\s\d\d\d\d?[-\d\d\d\d]+\sOur\sCompany\sName\sHere$\n

This part matches the top two lines of either header:

/*
 * Copyright 2016 Our Company Name Here

and

/* *******
 * Copyright 2014 Our Company Name Here


2) (^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n)?

This part relates to the second header only. It's optional because it's not present in the first header. The expected text is literally as follows:

 *
 * What : CatalogueResource Who : SValencia When : 16 Nov 2016
 *
 * **************************************

3) ^\s\*{1,}\/$\n
This last part matches the space, asterisk(s) and a final slash, ending with a new line, that is to say:
 */

Hope it helps. I only have access to Sonar at the moment, but I'll see if I can get access to a CLI and will report back with its output. Otherwise I will ask a developer for help, but it could take some time before I have something to share with you.

Thanks again,
Santiago Valencia

R Veach

Nov 16, 2016, 7:41:14 AM
to checkstyle, rvea...@gmail.com
2) (^\s\*$\n^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?)$\n^\s\*$\n^\s\*(\s\*{1,100})?$\n)?

This part relates to the second header only. It's optional because it's not present in the first header. The expected text is literally as follows:

This is where the failure is.

Caused by: org.apache.commons.beanutils.ConversionException: line 3 in header specification is not a regular expression
    at com.puppycrawl.tools.checkstyle.checks.header.RegexpHeaderCheck.postProcessHeaderLines(RegexpHeaderCheck.java:159)
    at com.puppycrawl.tools.checkstyle.checks.header.AbstractHeaderCheck.loadHeader(AbstractHeaderCheck.java:182)
    at com.puppycrawl.tools.checkstyle.checks.header.AbstractHeaderCheck.setHeader(AbstractHeaderCheck.java:156)
    at com.puppycrawl.tools.checkstyle.checks.header.RegexpHeaderCheck.setHeader(RegexpHeaderCheck.java:178)
    ... 14 more
Caused by: java.util.regex.PatternSyntaxException: Unclosed group near index 7
(^\s\*$
      ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.accept(Pattern.java:1813)
    at java.util.regex.Pattern.group0(Pattern.java:2908)
    at java.util.regex.Pattern.sequence(Pattern.java:2051)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)
    at com.puppycrawl.tools.checkstyle.checks.header.RegexpHeaderCheck.postProcessHeaderLines(RegexpHeaderCheck.java:155)
    ... 17 more

This check currently doesn't support multi-line matching while maintaining order, so this is not a bug. You would have to submit this as a feature request.
The check is expecting a one to one ratio of lines in the input to lines we broke the expression into. It doesn't support matching 2 lines with 1 expression.
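
The one-to-one rule can be pictured with a small Java sketch, assuming each header expression is tested against the file line at the same position. This illustrates the idea only and is not Checkstyle's implementation; the names and sample patterns are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LineByLineDemo {
    /**
     * Tests file line i against header pattern i. Returns the 1-based number
     * of the first line that does not match (or is missing), or -1 if all match.
     */
    static int firstMismatch(List<String> headerPatterns, List<String> fileLines) {
        for (int i = 0; i < headerPatterns.size(); i++) {
            if (i >= fileLines.size()
                    || !Pattern.compile(headerPatterns.get(i)).matcher(fileLines.get(i)).find()) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("^/\\*$", "^ \\* Copyright \\d{4}$", "^ \\*/$");
        // Three header lines against three patterns: every position matches.
        System.out.println(firstMismatch(patterns, Arrays.asList("/*", " * Copyright 2016", " */")));
        // An extra " *" line shifts the closing line, so position 3 no longer matches.
        System.out.println(firstMismatch(patterns, Arrays.asList("/*", " * Copyright 2016", " *", " */")));
    }
}
```

With this model there is no way for a single expression to consume two file lines, which is why an optional multi-line group cannot be expressed.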

The first thing this check does is use the '\n's to identify where to break the big regexp apart.
It does this correctly for items 1 and 3, but it ignores the ()s in item 2, and this is where it incorrectly breaks the expression apart and creates an invalid pattern.
Instead of keeping 2 as `(blah\nblah)?` it thinks it should break it into `(blah` and `blah)?` and those aren't valid expressions.

Since this regexp contains a fully optional line, you can try to use the parameter `multiLines`, but it won't maintain order.
http://checkstyle.sourceforge.net/config_header.html#RegexpHeader
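
To show why a repeatable line does not maintain order, here is a simplified Java sketch. It assumes that a header line listed in `multiLines` may match zero or more consecutive file lines, and it uses a greedy matcher without backtracking. It is an illustration under those assumptions, not Checkstyle's algorithm, and the patterns are shortened, hypothetical versions of the ones in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class MultiLinesDemo {
    // Shortened, hypothetical header patterns; line 3 is the repeatable one.
    static final List<String> PATTERNS = Arrays.asList(
            "^/\\*.*$",
            "^ \\* Copyright \\d{4} .*$",
            "^ \\*$|^ \\* What : .*$",
            "^ \\* \\*+$",
            "^ \\*/$");

    /** Returns true if the file lines satisfy the patterns; repeatable holds 1-based line numbers. */
    static boolean headerMatches(List<String> patterns, Set<Integer> repeatable,
                                 List<String> fileLines) {
        int fileIndex = 0;
        for (int p = 0; p < patterns.size(); p++) {
            Pattern pattern = Pattern.compile(patterns.get(p));
            if (repeatable.contains(p + 1)) {
                // Repeatable line: consume zero or more matching file lines.
                while (fileIndex < fileLines.size()
                        && pattern.matcher(fileLines.get(fileIndex)).find()) {
                    fileIndex++;
                }
            } else if (fileIndex < fileLines.size()
                    && pattern.matcher(fileLines.get(fileIndex)).find()) {
                fileIndex++;
            } else {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
                "/* *******",
                " * Copyright 2014 Our Company Name Here",
                " *",
                " * What : CatalogueResource Who : SValencia When : 16 Nov 2016",
                " *",
                " * **************************************",
                " */");
        System.out.println(headerMatches(PATTERNS, Collections.singleton(3), file));
    }
}
```

Because the repeatable expression is an alternation, the "What" line and the bare ` *` lines are accepted in any order and any number, which is the trade-off mentioned above.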


Example:
````
$ cat TestClass.java

/* *******
 * Copyright 2014 Our Company Name Here
 *
 * What : CatalogueResource Who : SValencia When : 16 Nov 2016
 *
 * **************************************
 */

public class TestClass {
    void method() {
    }
}

$ cat TestConfig.xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
          "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
          "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<module name="Checker">
    <property name="charset" value="UTF-8"/>

  <module name="RegexpHeader">
    <property name="header" value="^\/\*(\s\*{1,100})?$\n^\s\*\s{1,50}Copyright\s\d\d\d\d?[-\d\d\d\d]+\sOur\sCompany\sName\sHere$\n(^\s\*$)|(^\s\*\sWhat\s:\s\b.{1,}\sWho\s:\s\b.{1,}\sWhen\s:\s(.*?))\n^\s\*(\s\*{1,100})?$\n^\s\*{1,}\/$"/>
    <property name="multiLines" value="3"/>
    <property name="fileExtensions" value="java"/>
  </module>
</module>

$ java -jar checkstyle-7.2-all.jar -c TestConfig.xml TestClass.java
Starting audit...
Audit done.
````


 

R Veach

Nov 16, 2016, 7:45:53 AM
to checkstyle, rvea...@gmail.com
I forgot to mention that my example had some minor changes to the regexp you gave us.

Santiago Valencia

Nov 16, 2016, 8:02:11 AM
to checkstyle, rvea...@gmail.com
Thank you R Veach and Roman Ivanov.

Regards,
Santiago Valencia