Analysis failure in Java Properties plugin


G. Ann Campbell

May 3, 2016, 7:51:18 AM
to SonarQube
Hi David,

I upgraded the Java Properties plugin on Nemo yesterday, and today I found that the analysis of the Wicket project failed with:

Unable to analyze file: /scratch/jenkins/workspace/ZZZZ_SLAVE_SIZE_wicket/wicket-core/src/main/java/org/apache/wicket/Application_ar.properties: Unable to highlight file [moduleKey=org.apache.wicket:wicket-core, relative=src/main/java/org/apache/wicket/Application_ar.properties, basedir=/scratch/jenkins/workspace/ZZZZ_SLAVE_SIZE_wicket/wicket-core] from offset 3650 to offset 3675: 3675 is not a valid offset for file [moduleKey=org.apache.wicket:wicket-core, relative=src/main/java/org/apache/wicket/Application_ar.properties

There aren't many more details but let me know if you need me to try to dredge something up.


Ann

---
G. Ann CAMPBELL | SonarSource
Product Owner

mjdet...@gmail.com

May 3, 2016, 9:00:28 AM
to SonarQube
The file likely contains characters that are not part of ISO-8859-1. According to the Java specification, properties files should be encoded as ISO-8859-1, and any character outside that charset should be written as a "\uXXXX" escape. This conversion can be done with the native2ascii tool that is bundled with the JDK.

The only option right now is to fix the file or exclude it from analysis.
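
To illustrate the "\uXXXX" representation mentioned above, here is a minimal sketch of the kind of conversion involved. It is not native2ascii itself; the class and method names are made up for this example, and it assumes that escaping every character above 0x7F is acceptable for the file in question.

```java
final class UnicodeEscaper {

    private UnicodeEscaper() {
    }

    /**
     * Returns a copy of the input in which every char above 0x7F is replaced
     * by a backslash-u escape with four hex digits. ASCII passes through unchanged.
     * Hypothetical helper for illustration only.
     */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                // Surrogate pairs are escaped as two consecutive escapes.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The value is five Arabic letters, written as escapes so that this
        // source file itself stays pure ASCII.
        System.out.println(escape("greeting=\u0645\u0631\u062d\u0628\u0627"));
    }
}
```

A file converted this way contains only ASCII bytes, so it decodes identically as ISO-8859-1 and as UTF-8.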

G. Ann Campbell

May 3, 2016, 9:43:15 AM
to SonarQube, mjdet...@gmail.com
I'm not sure whether the file in question changed, but regardless of what is or is not valid in it, it shouldn't fail the analysis altogether. Log a parsing error, sure, but don't kill the analysis.


Ann

David Racodon

May 3, 2016, 1:16:10 PM
to G. Ann Campbell, SonarQube, mjdet...@gmail.com
Hi guys,

This is not a parsing issue; otherwise it would have been reported as one, and the analysis would have moved on to the next file without crashing.

As discussed 

David RACODON
Freelance QA Consultant

--
You received this message because you are subscribed to the Google Groups "SonarQube" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sonarqube+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sonarqube/f7ec95f4-ab8d-4392-a623-fb8afea40dfa%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

G. Ann Campbell

May 3, 2016, 1:26:31 PM
to SonarQube, ann.ca...@sonarsource.com, mjdet...@gmail.com
Sorry, you're right. Not parsing.


Ann



David Racodon

May 3, 2016, 1:29:19 PM
to G. Ann Campbell, SonarQube, mjdet...@gmail.com
The full email...

Hi guys,

This is not a parsing issue; otherwise it would have been reported as one, and the analysis would have moved on to the next file without crashing.

As discussed a couple of months ago with Matthew (https://github.com/racodond/sonar-jproperties-plugin/issues/15), this is a corner case. It happens during syntax highlighting when:
  • The file is not encoded in ISO-8859-1, as required by the Java specification
  • A single character is decoded as multiple characters by the ISO-8859-1 decoder
The case here is the same one Matthew faced: a UTF-8 encoded file containing characters that are decoded as multiple characters by the ISO-8859-1 decoder. I didn't really want to deal with this specific use case. The plugin documentation already states that files must be encoded in ISO-8859-1, and it points to this potential encoding issue as the cause of this kind of error: https://github.com/racodond/sonar-jproperties-plugin/blob/master/README.md#notes
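
The length mismatch described above can be reproduced with the standard library alone. This is only a sketch of the effect, not the plugin's code: the class and method names are invented for the example, and which component of the analysis uses which decoding is not stated in this thread.

```java
import java.nio.charset.StandardCharsets;

final class OffsetMismatchDemo {

    private OffsetMismatchDemo() {
    }

    /**
     * Encodes the text as UTF-8, decodes those bytes as ISO-8859-1, and
     * returns the number of chars in the result. Hypothetical helper.
     */
    static int lengthWhenMisdecoded(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte to exactly one char, so each multi-byte
        // UTF-8 sequence turns into several chars.
        return new String(utf8, StandardCharsets.ISO_8859_1).length();
    }

    public static void main(String[] args) {
        // Five Arabic letters, each encoded as two bytes in UTF-8.
        String value = "\u0645\u0631\u062d\u0628\u0627";
        System.out.println("chars as written: " + value.length());
        System.out.println("chars when misdecoded: " + lengthWhenMisdecoded(value));
    }
}
```

Offsets computed on the longer, misdecoded text can point past the end of the same file read with its real encoding, which would be consistent with the "3675 is not a valid offset" message in the original report.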

I'm inclined not to do anything about this, but your feedback is more than welcome.

Thank you

Regards,

David RACODON
Freelance QA Consultant


Michel Pawlak

May 4, 2016, 4:32:26 AM
to SonarQube, ann.ca...@sonarsource.com, mjdet...@gmail.com
Hi David,

Have you thought about using Apache Tika?


You can then:
  • call `setDeclaredEncoding(String encoding)` to give the detector a hint
  • call `detect()` to get the `CharsetMatch` instance for the charset that best matches the input
  • call `getLanguage()` on the `CharsetMatch` instance to get the language as an ISO string
  • call `getConfidence()` on the `CharsetMatch` instance to get the confidence on a scale from 0 to 100
  • decide, based on your own confidence threshold, whether to go ahead with highlighting or do something else instead, such as raising a SQ issue or ignoring the file and logging the problem
I'm pretty sure this would be an elegant solution to this problem.
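
The last step, deciding whether to highlight a file, can be sketched without any detector library. Note that this is not Tika and produces no confidence score: it swaps in a stricter yes/no check from the standard library, namely whether the bytes contain non-ASCII data that also decodes cleanly as UTF-8. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingGuard {

    private EncodingGuard() {
    }

    /**
     * Returns true when the content has at least one non-ASCII byte and is
     * well-formed UTF-8, which suggests the file is UTF-8 rather than
     * ISO-8859-1. Heuristic only: some ISO-8859-1 byte runs are also valid UTF-8.
     */
    static boolean looksLikeMultibyteUtf8(byte[] content) {
        boolean hasNonAscii = false;
        for (byte b : content) {
            if (b < 0) { // bytes 0x80..0xFF are negative in Java
                hasNonAscii = true;
                break;
            }
        }
        if (!hasNonAscii) {
            return false; // pure ASCII reads the same in both encodings
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false; // not valid UTF-8, so treat it as single-byte text
        }
    }

    /** Highlight only when the content does not look like multi-byte UTF-8. */
    static boolean shouldHighlight(byte[] content) {
        return !looksLikeMultibyteUtf8(content);
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 bytes, highlight: " + shouldHighlight(latin1));
        System.out.println("UTF-8 bytes, highlight: " + shouldHighlight(utf8));
    }
}
```

A caller could skip highlighting and log a warning when `shouldHighlight` returns false, so that one badly encoded file does not stop the whole analysis.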

Hope it helps.

Michel


David Racodon

May 4, 2016, 5:56:39 AM
to Michel Pawlak, SonarQube, mjdet...@gmail.com, ann.ca...@sonarsource.com

Thanks a lot for your feedback Michel!
I'll give it a try and keep you posted.

David

David Racodon

Jun 1, 2016, 4:48:38 AM
to Michel Pawlak, SonarQube, Matthew DeTullio, G. Ann Campbell
I'm not really convinced by the accuracy of the library's encoding detection, so I will leave it as it is for now.

David RACODON
Freelance QA Consultant
