[JIRA] (JENKINS-53901) Using readFile does not handle UTF-8 with BOM files


jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:31:08 AM
to jenkinsc...@googlegroups.com
Jakub Pawlinski created an issue
 
Jenkins / Bug JENKINS-53901
Using readFile does not handle UTF-8 with BOM files
Issue Type: Bug
Assignee: Andrew Bayer
Components: pipeline-model-definition-plugin
Created: 2018-10-04 10:30
Environment: Jenkins 2.73.1 and Jenkins 2.81 Pipeline Groovy Plugin 2.40
Priority: Blocker
Reporter: Jakub Pawlinski

The readFile step, when used inside an environment closure, whether top-level or in a stage, causes the following error:
an exception which occurred:
in field com.cloudbees.groovy.cps.impl.BlockScopeEnv.locals
in object com.cloudbees.groovy.cps.impl.LoopBlockScopeEnv@29044815
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@25c9f135
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@307ab985
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@5a92c230
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@37a0a42f
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@184a6ff5
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@676c6c8d
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@19f01356
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@74d1467b
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@4d098490
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@28223d82
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@6e27611b
in field com.cloudbees.groovy.cps.Continuable.e
in object org.jenkinsci.plugins.workflow.cps.SandboxContinuable@78ff9c41
in field org.jenkinsci.plugins.workflow.cps.CpsThread.program
in object org.jenkinsci.plugins.workflow.cps.CpsThread@7841b6fe
in field org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.threads
in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce
in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce
Caused: java.io.NotSerializableException: java.util.TreeMap$Entry
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:860)
at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
at java.util.HashMap.internalWriteEntries(HashMap.java:1785)
at java.util.HashMap.writeObject(HashMap.java:1362)
at sun.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
at java.util.TreeMap.writeObject(TreeMap.java:2438)
at sun.reflect.GeneratedMethodAccessor176.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:140)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:458)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:434)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:422)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:362)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

A test repo was created to replicate this.

https://github.com/sflynn-dell/pipeline-test

Branches:
declarative-script - readFile is successful when used inside a script closure.
declarative-env - readFile fails when used inside an environment closure.

 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:44:06 AM
to jenkinsc...@googlegroups.com
Jakub Pawlinski updated an issue
Change By: Jakub Pawlinski

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:44:06 AM
to jenkinsc...@googlegroups.com
Jakub Pawlinski updated an issue
Change By: Jakub Pawlinski
Attachment: Newtonsoft.Json.nuspec

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:51:02 AM
to jenkinsc...@googlegroups.com
Jakub Pawlinski updated an issue
Change By: Jakub Pawlinski
Environment: Jenkins 2.121.2 and Jenkins 2.81 Pipeline Groovy Plugin 2.54 (was: Jenkins 2.73.1 and Jenkins 2.81 Pipeline Groovy Plugin 2.40)

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:51:03 AM
to jenkinsc...@googlegroups.com
Jakub Pawlinski updated an issue
Using Jenkins ver. 2.121.2, I'm extracting an XML file (nuspec) from some NuGet packages and trying to parse it. In most cases it works fine, but in some the XML was written using UTF-8 with BOM encoding, and then the parser gets upset and reports:
{code:java}
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
{code}

The way I'm parsing xml is:
{code:java}
@NonCPS
def parsePackage(packageName, packageVersion) {
  def packageFullName = "${packageName}.${packageVersion}"
  bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
  bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""

  def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
  def nuspecContent = readFile file:nuspecPath
  def nuspecXML = new XmlSlurper( false, false ).parseText(nuspecContent)
  println nuspecXML.metadata.version
  
  def newXml = XmlUtil.serialize(nuspecXML)
  return newXml
}
{code}

It looks like readFile does not support UTF-8 with a BOM, as it passes the leading BOM character into the returned string.

 

I tried to replicate it directly in Groovy with:
{code:java}
def xmldata = new File("Newtonsoft.Json.nuspec").text
def pkg = new XmlSlurper().parseText(xmldata)
println pkg.metadata.version.text()
{code}

But here the leading BOM character is not passed into the xmldata variable.

 

Attached example nuspec with BOM in it.
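
The symptom described above can be reproduced outside Jenkins. What follows is a minimal Java sketch, not the readFile implementation: it assumes a tiny made-up XML document in place of the attached nuspec, and the names BomProlog, sampleBytes and parsesAsText are hypothetical. It decodes BOM-prefixed bytes as UTF-8, shows that U+FEFF survives as the first character of the String, and then hands that String to the JDK's XML parser as text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class BomProlog {
    /** UTF-8 bytes of a small hypothetical XML document, preceded by the BOM bytes EF BB BF. */
    static byte[] sampleBytes() {
        byte[] xml = "<package><metadata><version>1.0</version></metadata></package>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[xml.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, out, 3, xml.length);
        return out;
    }

    /** Returns true if the text parses as XML and false if the parser reports a SAX error. */
    static boolean parsesAsText(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decoded = new String(sampleBytes(), StandardCharsets.UTF_8);
        // First character of the decoded text, as a hex code point.
        System.out.println(Integer.toHexString(decoded.charAt(0)));
        // Parsing with the leading BOM character still in place.
        System.out.println(parsesAsText(decoded));
        // Parsing after dropping the first character.
        System.out.println(parsesAsText(decoded.substring(1)));
    }
}
```

The second call is the one expected to fail, since the parser sees a character before the root element; dropping the first character lets the same text parse.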

 

 

andrew.bayer@gmail.com (JIRA)

Oct 15, 2018, 5:45:02 AM
to jenkinsc...@googlegroups.com
Andrew Bayer assigned an issue to Unassigned
Change By: Andrew Bayer
Component/s: workflow-basic-steps-plugin (was: pipeline-model-definition-plugin)
Assignee: Unassigned (was: Andrew Bayer)

svanoort@cloudbees.com (JIRA)

Oct 15, 2018, 10:00:02 AM
to jenkinsc...@googlegroups.com
Sam Van Oort commented on Bug JENKINS-53901
 
Re: Using readFile does not handle UTF-8 with BOM files

Jakub Pawlinski This is a known issue with the Unicode spec and the Java platform's implementation of it, not with Pipeline. In UTF-8 the BOM is neither required nor recommended; since the BOM is essentially meaningless in UTF-8, Java passes it through transparently.

First I'd make sure to add the "encoding: 'UTF-8'" argument to your readFile step to ensure the file is read as UTF-8. Then do postprocessing to correct for the nonstandard input.

Some suggested solutions are available on StackOverflow.

Personally, I'd do something like this to sanitize your input:

/** Removes the BOM character that a UTF-8 BOM decodes to. */
private static String removeUTF8BOM(String s) {
    return s.replace("\uFEFF", "");
}

(The BOM bytes EF BB BF decode to the single character U+FEFF, so the escape to match in the decoded string is \uFEFF.)

There are also code snippets out there that take a more efficient approach and only consider the leading characters of the String.
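
As a rough illustration of the "leading characters only" variant mentioned above, here is a small Java sketch. The class and method names (BomStripper, stripLeadingBom) are hypothetical and not taken from any Jenkins plugin; the only assumption is that the input has already been decoded to a String, as it would be when returned by readFile.

```java
final class BomStripper {
    /** The character that the UTF-8 bytes EF BB BF decode to. */
    private static final char BOM = (char) 0xFEFF;

    /** Removes one leading U+FEFF if present; every other character is left untouched. */
    static String stripLeadingBom(String s) {
        if (s == null || s.isEmpty() || s.charAt(0) != BOM) {
            return s;
        }
        return s.substring(1);
    }

    public static void main(String[] args) {
        String withBom = BOM + "<?xml version=\"1.0\"?><package/>";
        // The stripped text should start directly with the XML declaration.
        System.out.println(stripLeadingBom(withBom).startsWith("<?xml"));
    }
}
```

Unlike a global replace, this inspects a single character and leaves any U+FEFF that occurs later in the text alone.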

svanoort@cloudbees.com (JIRA)

Oct 15, 2018, 10:01:02 AM
to jenkinsc...@googlegroups.com
Sam Van Oort closed an issue as Not A Defect
 

This is due to a known problem with Java's implementation of the UTF-8 spec. Suggested an easy workaround in Pipeline code to solve the issue.

Change By: Sam Van Oort
Status: Closed (was: Open)
Resolution: Not A Defect

jakub@pawlinski.pl (JIRA)

Nov 5, 2018, 5:44:02 AM
to jenkinsc...@googlegroups.com
Jakub Pawlinski commented on Bug JENKINS-53901
 
Re: Using readFile does not handle UTF-8 with BOM files

OK, but if it's a Java issue, why could I not replicate it locally using Groovy Version: 2.6.0-alpha-1 JVM: 1.8.0_111 Vendor: Oracle Corporation OS: Windows 10?

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:12:02 PM
to jenkinsc...@googlegroups.com

I guess Sam used vague wording.  It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.

$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % ord(u[0]))'
FEFF

Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files.  Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.

public static String deBOM(String fromUTF8) {
    if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 11:03:02 PM
to jenkinsc...@googlegroups.com
Ilguiz Latypov edited a comment on Bug JENKINS-53901
I guess Sam used vague wording.  It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % ord(u[0]))'
FEFF
{code}
Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files.  Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.
{code:java}
public static CharSequence deBOM(CharSequence s) {
    if (s == null) {
        return null
    } else if (s.length() == 0) {
        return s
    } else if (s[0] == '\uFEFF') {
        return s.drop(1)
    } else {
        return s
    }
}
{code}
[https://stackoverflow.com/questions/5406172/utf-8-without-bom]

Perhaps a newer XmlSlurper performs this sanitization.

ilatypov@yahoo.ca (JIRA)

Jul 30, 2019, 1:25:02 AM
to jenkinsc...@googlegroups.com

This appears to be a post-modern BOM that is supposed to tell the encoding of the file before decoding it.

          +------------------+----------+
          | Leading sequence | Encoding |
          +------------------+----------+
          | FF FE 00 00      | UTF-32LE |
          | 00 00 FE FF      | UTF-32BE |
          | FF FE            | UTF-16LE |
          | FE FF            | UTF-16BE |
          | EF BB BF         | UTF-8    |
          +------------------+----------+

http://www.rfc-editor.org/rfc/rfc4329.txt

So readFile needs a mode (or a special value for encoding) to enable automatic decoding of that post-modern BOM and then apply the detected decoding to the rest of the contents.

ilatypov@yahoo.ca (JIRA)

Jul 30, 2019, 1:26:03 AM
to jenkinsc...@googlegroups.com
Ilguiz Latypov edited a comment on Bug JENKINS-53901
This appears to be a post-modern BOM that is supposed to tell the encoding of the file before decoding it.
{noformat}
          +------------------+----------+
          | Leading sequence | Encoding |
          +------------------+----------+
          | FF FE 00 00      | UTF-32LE |
          | 00 00 FE FF      | UTF-32BE |
          | FF FE            | UTF-16LE |
          | FE FF            | UTF-16BE |
          | EF BB BF         | UTF-8    |
          +------------------+----------+
{noformat}
[http://www.rfc-editor.org/rfc/rfc4329.txt]

So readFile needs a mode (or a special value for the {{encoding}} parameter) to sense the post-modern BOM and decode the contents accordingly.
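
To make the proposal concrete, here is a minimal Java sketch of BOM sniffing driven by the table above. It is not how readFile works today; BomSniffer, decode and the fallback parameter are hypothetical names, and the UTF-32 charsets are looked up by name on the assumption that the running JVM provides them (common JDKs do, but the Java specification does not guarantee it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {
    /** Decodes data with the charset announced by a leading BOM, or with fallback if there is none. */
    static String decode(byte[] data, Charset fallback) {
        // Check the 4-byte signatures first: FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE).
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) {
            return decodeFrom(data, 4, Charset.forName("UTF-32LE"));
        }
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) {
            return decodeFrom(data, 4, Charset.forName("UTF-32BE"));
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return decodeFrom(data, 2, StandardCharsets.UTF_16LE);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return decodeFrom(data, 2, StandardCharsets.UTF_16BE);
        }
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return decodeFrom(data, 3, StandardCharsets.UTF_8);
        }
        // No recognised BOM: decode everything with the caller's charset.
        return decodeFrom(data, 0, fallback);
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    /** Decodes the bytes from offset to the end, so the BOM itself never reaches the String. */
    private static String decodeFrom(byte[] data, int offset, Charset charset) {
        return new String(data, offset, data.length - offset, charset);
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(decode(utf16le, StandardCharsets.UTF_8));
    }
}
```

Following the table's order means that a UTF-16LE file whose first character is U+0000 would be taken for UTF-32LE; the sketch accepts that ambiguity.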

jakub@pawlinski.pl (JIRA)

Sep 9, 2019, 6:52:02 AM
to jenkinsc...@googlegroups.com

Same issue with readCSV, and possibly with all other ways of reading files via Jenkins. The issue with readCSV is more severe, as I cannot step in between reading the content of the file and that content being processed into a Commons CSV structure. The only way to do this is to readFile and parse it manually, which makes readCSV (and other functionality like it) redundant.

I still don't understand why you claim it's not a Jenkins issue but a Java issue when it's not replicable even in a newer Groovy version.
