[JIRA] (JENKINS-53901) Using readFile does not handle UTF-8 with BOM files

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:31:08 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski created an issue

Jenkins /

JENKINS-53901

Using readFile does not handle UTF-8 with BOM files

Issue Type:	Bug
Assignee:	Andrew Bayer
Components:	pipeline-model-definition-plugin
Created:	2018-10-04 10:30
Environment:	Jenkins 2.73.1 and Jenkins 2.81 Pipeline Groovy Plugin 2.40
Priority:	Blocker
Reporter:	Jakub Pawlinski

The readFile step, when used inside an environment closure, whether top-level or in a stage, causes the following error:
an exception which occurred:
in field com.cloudbees.groovy.cps.impl.BlockScopeEnv.locals
in object com.cloudbees.groovy.cps.impl.LoopBlockScopeEnv@29044815
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@25c9f135
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@307ab985
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@5a92c230
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@37a0a42f
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@184a6ff5
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@676c6c8d
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@19f01356
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@74d1467b
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@4d098490
in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@28223d82
in field com.cloudbees.groovy.cps.impl.CallEnv.caller
in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@6e27611b
in field com.cloudbees.groovy.cps.Continuable.e
in object org.jenkinsci.plugins.workflow.cps.SandboxContinuable@78ff9c41
in field org.jenkinsci.plugins.workflow.cps.CpsThread.program
in object org.jenkinsci.plugins.workflow.cps.CpsThread@7841b6fe
in field org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.threads
in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce
in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce
Caused: java.io.NotSerializableException: java.util.TreeMap$Entry
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:860)
at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
at java.util.HashMap.internalWriteEntries(HashMap.java:1785)
at java.util.HashMap.writeObject(HashMap.java:1362)
at sun.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
at java.util.TreeMap.writeObject(TreeMap.java:2438)
at sun.reflect.GeneratedMethodAccessor176.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:140)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:458)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:434)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:422)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:362)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

A test repo was created to replicate this.

https://github.com/sflynn-dell/pipeline-test

Branches:
declarative-script - readFile is successful when used inside a script closure.
declarative-env - readFile fails when used inside an environment closure.

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:44:06 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski updated an issue

Change By:	Jakub Pawlinski

Using Jenkins 2.121.2, I'm extracting an XML file (.nuspec) from some NuGet packages and trying to parse it. In most cases this works fine, but some of the files were written as UTF-8 with a BOM, and then the parser fails with:
{code:java}
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
{code}

This is how I parse the XML:

{code:java}
import groovy.xml.XmlUtil

// Note: calling Pipeline steps (bat, readFile) from a @NonCPS method is not supported.
@NonCPS
def parsePackage(packageName, packageVersion) {
  def packageFullName = "${packageName}.${packageVersion}"
  bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
  bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""

  def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
  def nuspecContent = readFile file: nuspecPath
  def nuspecXML = new XmlSlurper(false, false).parseText(nuspecContent)
  println nuspecXML.metadata.version

  def newXml = XmlUtil.serialize(nuspecXML)
  return newXml
}
{code}

It looks like readFile does not support UTF-8 with a BOM: it passes the leading BOM character through into the returned string.

I tried to reproduce it directly in Groovy with:

{code:java}
def xmldata = new File("Newtonsoft.Json.nuspec").text
def pkg = new XmlSlurper().parseText(xmldata)
println pkg.metadata.version.text()
{code}

But here the leading BOM character is not passed into the xmldata variable.

Attached is an example .nuspec file with a BOM in it.

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:44:06 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski updated an issue

Change By:	Jakub Pawlinski
Attachment:	Newtonsoft.Json.nuspec

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:51:02 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski updated an issue

Change By:	Jakub Pawlinski
Environment:	Jenkins 2.121.2 and Jenkins 2.81 Pipeline Groovy Plugin 2.54 (previously: Jenkins 2.73.1 and Jenkins 2.81 Pipeline Groovy Plugin 2.40)

jakub@pawlinski.pl (JIRA)

Oct 4, 2018, 6:51:03 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski updated an issue

Change By:	Jakub Pawlinski

Using Jenkins 2.121.2, I'm extracting an XML file (.nuspec) from some NuGet packages and trying to parse it. In most cases this works fine, but some of the files were written as UTF-8 with a BOM, and then the parser fails with:
{code:java}
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
{code}

This is how I parse the XML:

{code:java}
import groovy.xml.XmlUtil

// Note: calling Pipeline steps (bat, readFile) from a @NonCPS method is not supported.
@NonCPS
def parsePackage(packageName, packageVersion) {
  def packageFullName = "${packageName}.${packageVersion}"
  bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
  bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""

  def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
  def nuspecContent = readFile file: nuspecPath
  def nuspecXML = new XmlSlurper(false, false).parseText(nuspecContent)
  println nuspecXML.metadata.version

  def newXml = XmlUtil.serialize(nuspecXML)
  return newXml
}
{code}

It looks like readFile does not support UTF-8 with a BOM: it passes the leading BOM character through into the returned string.

I tried to reproduce it directly in Groovy with:

{code:java}
def xmldata = new File("Newtonsoft.Json.nuspec").text
def pkg = new XmlSlurper().parseText(xmldata)
println pkg.metadata.version.text()
{code}

But here the leading BOM character is not passed into the xmldata variable.

Attached is an example .nuspec file with a BOM in it.
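To see the same effect outside Jenkins, here is a minimal Java sketch. It assumes (this is not confirmed in the thread) that readFile decodes the file with the JVM's standard UTF-8 decoder; the class `NuspecVersion` and the sample XML are made up for illustration. The decoder keeps the three BOM bytes as the single character U+FEFF, and dropping that one character before parsing lets the JDK's DOM parser read the document.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NuspecVersion {
    // Decode raw file bytes as UTF-8. The JDK decoder keeps a leading
    // BOM (EF BB BF) as the character U+FEFF instead of dropping it.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Remove one leading U+FEFF, if present, before the text reaches a parser.
    static String stripLeadingBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Parse XML text and return the content of the first version element.
    static String readVersion(String xmlText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlText)));
        return doc.getElementsByTagName("version").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, trimmed-down nuspec content preceded by a UTF-8 BOM.
        byte[] body = "<package><metadata><version>1.0.0</version></metadata></package>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] raw = new byte[body.length + 3];
        raw[0] = (byte) 0xEF;
        raw[1] = (byte) 0xBB;
        raw[2] = (byte) 0xBF;
        System.arraycopy(body, 0, raw, 3, body.length);

        String decoded = decodeUtf8(raw);
        System.out.println(decoded.charAt(0) == '\uFEFF'); // the BOM survived decoding
        System.out.println(readVersion(stripLeadingBom(decoded)));
    }
}
```

Running `main` should print `true` followed by `1.0.0`.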

andrew.bayer@gmail.com (JIRA)

Oct 15, 2018, 5:45:02 AM

to jenkinsc...@googlegroups.com

Andrew Bayer assigned an issue to Unassigned

Change By:	Andrew Bayer
Component/s:	workflow-basic-steps-plugin (previously pipeline-model-definition-plugin)
Assignee:	Unassigned (previously Andrew Bayer)

svanoort@cloudbees.com (JIRA)

Oct 15, 2018, 10:00:02 AM

to jenkinsc...@googlegroups.com

Sam Van Oort commented on JENKINS-53901

Jakub Pawlinski, this is a known issue with the Unicode spec and the Java platform's implementation of it, not with Pipeline. In UTF-8 the BOM is neither required nor recommended; since it is essentially meaningless in UTF-8, Java transparently passes it through as an ordinary character.

First, I'd add the {{encoding: 'UTF-8'}} argument to your readFile step to make sure the file is read as UTF-8. Then post-process the result to correct for the nonstandard input.

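Personally, I'd do something like this to sanitize your input:

{code:java}
/** Removes the UTF-8 BOM, which decodes to the single character U+FEFF. */
private static String removeUTF8BOM(String s) {
    return s.replace("\uFEFF", "");
}
{code}

(The three bytes EF BB BF decode to the one character \uFEFF, so that is the escape to use; "\uEFBBBF" would instead mean the unrelated character U+EFBB followed by the letters "BF".)

There are also code snippets out there that take a more targeted approach and only check the first character of the String.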
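The difference between replacing every U+FEFF and checking only the first character shows up in a small Java sketch (class and method names are hypothetical). `removeAll` is the `replace` approach from the comment above, using the `\uFEFF` escape; `removeLeading` only touches a BOM at position 0, so a U+FEFF later in the text, where it would act as a legacy zero-width no-break space, is preserved.

```java
class BomStrip {
    // The single character that the UTF-8 bytes EF BB BF decode to.
    static final char BOM = '\uFEFF';

    // Replace-based approach: removes U+FEFF wherever it occurs in the text.
    static String removeAll(String s) {
        return s.replace(String.valueOf(BOM), "");
    }

    // Leading-only approach: inspects just the first character, so a U+FEFF
    // further into the text is kept.
    static String removeLeading(String s) {
        if (s == null || s.isEmpty() || s.charAt(0) != BOM) {
            return s;
        }
        return s.substring(1);
    }

    public static void main(String[] args) {
        String text = BOM + "a" + BOM + "b";
        System.out.println(removeAll(text).length());     // 2
        System.out.println(removeLeading(text).length()); // 3
    }
}
```

`main` prints `2` and then `3`: only the leading-only version keeps the inner U+FEFF.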

svanoort@cloudbees.com (JIRA)

Oct 15, 2018, 10:01:02 AM

to jenkinsc...@googlegroups.com

Sam Van Oort closed an issue as Not A Defect

This is due to a known problem with Java's implementation of the UTF-8 spec. Suggested an easy workaround in Pipeline code to solve the issue.

Change By:	Sam Van Oort
Status:	Open → Closed
Resolution:	Not A Defect

jakub@pawlinski.pl (JIRA)

Nov 5, 2018, 5:44:02 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski commented on JENKINS-53901

OK, but if it's a Java issue, why couldn't I reproduce it locally with Groovy Version: 2.6.0-alpha-1 JVM: 1.8.0_111 Vendor: Oracle Corporation OS: Windows 10?
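A possible answer, offered as an assumption rather than verified against Groovy 2.6.0-alpha-1: Groovy's `File.getText()` reads through `groovy.util.CharsetToolkit`, which guesses the charset and skips a detected BOM, whereas a plain JDK `InputStreamReader` (and presumably readFile, which runs on the JVM) keeps it as U+FEFF. The Java sketch below (hypothetical class name) contrasts the two behaviours; `readSkippingBom` only imitates the BOM-skipping part, not Groovy's charset guessing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BomSkippingRead {
    // Drain a Reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Plain JDK decoding: a leading BOM comes back as the character U+FEFF.
    static String readPlain(InputStream in) throws IOException {
        return readAll(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Decode as UTF-8 but drop a single leading U+FEFF; any other first
    // character is pushed back so the content is returned unchanged.
    static String readSkippingBom(InputStream in) throws IOException {
        PushbackReader r = new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = r.read();
        if (first != -1 && first != '\uFEFF') {
            r.unread(first);
        }
        return readAll(r);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'o', (byte) 'k'};
        System.out.println(readPlain(new ByteArrayInputStream(raw)).length());       // 3
        System.out.println(readSkippingBom(new ByteArrayInputStream(raw)).length()); // 2
    }
}
```

`main` should print `3` and then `2`: the plain read keeps the BOM character in front of `ok`.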

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:12:02 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov commented on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.

 
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF
{code}
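The same check in Java, plus the reverse direction, as a small sketch (hypothetical class name): decoding EF BB BF as UTF-8 yields U+FEFF, and encoding U+FEFF shows that UTF-16 produces different bytes for each byte order (the distinction a BOM exists to signal), while UTF-8 has a single form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomBytes {
    // Hex-dump the bytes that the character U+FEFF encodes to in a charset.
    static String bomHex(Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Java counterpart of the Python one-liner: decode EF BB BF as UTF-8.
    static String decodedCodePoint() {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        String s = new String(bom, StandardCharsets.UTF_8);
        return String.format("%04X", (int) s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(decodedCodePoint());                  // FEFF
        System.out.println(bomHex(StandardCharsets.UTF_8));      // EF BB BF
        System.out.println(bomHex(StandardCharsets.UTF_16BE));   // FE FF
        System.out.println(bomHex(StandardCharsets.UTF_16LE));   // FF FE
    }
}
```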

Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.

 
{code:java}
public static String deBOM(String fromUTF8) {
    if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}
{code}

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:13:02 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.

{code:java}

$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF

{code}

Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.

{code:java}

public static String deBOM(String fromUTF8) {
    if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}

{code}

https://stackoverflow.com/questions/5406172/utf-8-without-bom

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:15:03 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF
{code}
Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.
{code:java}
public static String deBOM(String fromUTF8) {
    if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}
{code}

[https://stackoverflow.com/questions/5406172/utf-8-without-bom]

Perhaps newer XmlSlurper versions perform this sanitization.

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:15:04 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF
{code}
Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.
{code:java}
public static String deBOM(String fromUTF8) {
    if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}
{code}
[https://stackoverflow.com/questions/5406172/utf-8-without-bom]

Perhaps newer XmlSlurper versions perform this sanitization.

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:54:01 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF
{code}
Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.
{code:java}

public static CharSequence deBOM(CharSequence fromUTF8) {
    if (fromUTF8 == null) {
        return null
    }
    else if(fromUTF8.length() == 0) {
        return fromUTF8
    }
    else if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}

{code}
[https://stackoverflow.com/questions/5406172/utf-8-without-bom]

Perhaps newer XmlSlurper versions perform this sanitization.

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 10:54:02 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF
{code}
Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.
{code:java}
public static CharSequence deBOM(CharSequence fromUTF8) {
    if (fromUTF8 == null) {
        return null
    }
    else if(fromUTF8.length() == 0) {
        return fromUTF8
    }
    else if(fromUTF8[0] == '\uFEFF') {
        return fromUTF8[1..-1]
    }
    else {
        return fromUTF8
    }
}
{code}
[https://stackoverflow.com/questions/5406172/utf-8-without-bom]

Perhaps newer XmlSlurper versions perform this sanitization.

ilatypov@yahoo.ca (JIRA)

Jul 22, 2019, 11:03:02 PM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

I guess Sam used vague wording. It's the files that harbour the UTF-8-encoded BOM mark at the beginning, which is useless because UTF-8's bytewise storage does not depend on the architecture's byte order.
{code:java}
$ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % (ord(u[0]),))'
FEFF
{code}
Microsoft creates files with these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. Now every program that reads such files needs to trim the Unicode BOM character at the beginning of the contents after decoding to Unicode.
{code:java}

public static CharSequence deBOM(CharSequence s) {
    if (s == null) {
        return null
    } else if (s.length() == 0) {
        return s
    } else if (s[0] == '\uFEFF') {
        return s.drop(1)
    } else {
        return s
    }
}
{code}
[https://stackoverflow.com/questions/5406172/utf-8-without-bom]

Perhaps newer XmlSlurper versions perform this sanitization.

ilatypov@yahoo.ca (JIRA)

Jul 30, 2019, 1:25:02 AM

to jenkinsc...@googlegroups.com

Ilguiz Latypov commented on JENKINS-53901

This appears to be a "post-modern" use of the BOM, one that is supposed to announce the file's encoding before the file is decoded.

 
{noformat}
+------------------+----------+
| Leading sequence | Encoding |
+------------------+----------+
| FF FE 00 00      | UTF-32LE |
| 00 00 FE FF      | UTF-32BE |
| FF FE            | UTF-16LE |
| FE FF            | UTF-16BE |
| EF BB BF         | UTF-8    |
+------------------+----------+
{noformat}

http://www.rfc-editor.org/rfc/rfc4329.txt

So readFile needs a mode (or a special value for encoding) that detects this post-modern BOM and then applies the detected encoding when decoding the rest of the contents.
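One way such a mode could behave, as a minimal Java sketch (class and method names are hypothetical; this is not an existing readFile option): compare the leading bytes against the table above, longest signature first, decode the remainder with the announced charset, and fall back to a caller-supplied charset when no BOM is present.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Leading byte sequences from the RFC 4329 table, longest first, so that
    // FF FE 00 00 (UTF-32LE) wins over its own prefix FF FE (UTF-16LE).
    private static final int[][] SIGNATURES = {
        {0xFF, 0xFE, 0x00, 0x00},
        {0x00, 0x00, 0xFE, 0xFF},
        {0xEF, 0xBB, 0xBF},
        {0xFF, 0xFE},
        {0xFE, 0xFF},
    };
    private static final String[] NAMES = {"UTF-32LE", "UTF-32BE", "UTF-8", "UTF-16LE", "UTF-16BE"};

    // True if data begins with the given byte signature.
    private static boolean startsWith(byte[] data, int[] sig) {
        if (data.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) {
                return false;
            }
        }
        return true;
    }

    // Index of the matching signature, or -1 when there is no BOM.
    private static int match(byte[] data) {
        for (int i = 0; i < SIGNATURES.length; i++) {
            if (startsWith(data, SIGNATURES[i])) {
                return i;
            }
        }
        return -1;
    }

    // Name of the encoding announced by the BOM, or null when there is none.
    static String detect(byte[] data) {
        int i = match(data);
        return i < 0 ? null : NAMES[i];
    }

    // Decode with the BOM-announced encoding (dropping the BOM itself),
    // or with the fallback charset when no BOM is present.
    static String decode(byte[] data, Charset fallback) {
        int i = match(data);
        if (i < 0) {
            return new String(data, fallback);
        }
        int skip = SIGNATURES[i].length;
        return new String(data, skip, data.length - skip, Charset.forName(NAMES[i]));
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00, 0x69, 0x00};
        System.out.println(detect(utf16le) + " -> " + decode(utf16le, StandardCharsets.UTF_8));
    }
}
```

Running `main` should print `UTF-16LE -> hi`. A real implementation would also need a policy for content whose first bytes merely look like a BOM, for example UTF-16LE text that happens to begin with U+0000.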

ilatypov@yahoo.ca (JIRA)

Jul 30, 2019, 1:26:01 AM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

This appears to be a "post-modern" use of the BOM, one that is supposed to announce the file's encoding before the file is decoded.

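{noformat}
+------------------+----------+
| Leading sequence | Encoding |
+------------------+----------+
| FF FE 00 00      | UTF-32LE |
| 00 00 FE FF      | UTF-32BE |
| FF FE            | UTF-16LE |
| FE FF            | UTF-16BE |
| EF BB BF         | UTF-8    |
+------------------+----------+{noformat}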
[http://www.rfc-editor.org/rfc/rfc4329.txt]

So readFile needs a mode (or a special value for the {{encoding}} parameter) to enable automatic detection of that post-modern BOM and then apply the detected encoding to the rest of the contents.

ilatypov@yahoo.ca (JIRA)

Jul 30, 2019, 1:26:03 AM

to jenkinsc...@googlegroups.com

Ilguiz Latypov edited a comment on JENKINS-53901

This appears to be a "post-modern" use of the BOM, one that is supposed to announce the file's encoding before the file is decoded.
{noformat}
          +------------------+----------+
          | Leading sequence | Encoding |
          +------------------+----------+
          | FF FE 00 00      | UTF-32LE |
          | 00 00 FE FF      | UTF-32BE |
          | FF FE            | UTF-16LE |
          | FE FF            | UTF-16BE |
          | EF BB BF         | UTF-8    |
          +------------------+----------+{noformat}
[http://www.rfc-editor.org/rfc/rfc4329.txt]

So readFile needs a mode (or a special value for the {{encoding}} parameter) to sense the post-modern BOM and then decode the rest of the contents accordingly.

jakub@pawlinski.pl (JIRA)

Sep 9, 2019, 6:52:02 AM

to jenkinsc...@googlegroups.com

Jakub Pawlinski commented on JENKINS-53901

Same issue with readCSV, and possibly with every other way of reading files in Jenkins. The issue with readCSV is more severe, as I cannot step in between reading the file's content and that content being parsed into a Commons CSV structure. The only way around it is to use readFile and parse the content manually, which makes readCSV (and similar functionality) redundant.
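The effect on CSV input can be sketched in Java (hypothetical class; the naive comma split stands in for a real CSV parser such as Commons CSV): with the BOM still in place, the first header name is "\uFEFFid" rather than "id", so looking that column up by name fails. As a possible workaround, assuming your version of Pipeline Utility Steps supports a {{text}} argument for readCSV (as far as I know recent versions do), the content could be read with readFile, stripped of the BOM, and then passed to readCSV as text.

```java
import java.util.Arrays;
import java.util.List;

class CsvHeaderBom {
    // Naive header extraction (no quoting support): first line split on commas.
    static List<String> headerColumns(String csvText) {
        String firstLine = csvText.split("\r?\n", 2)[0];
        return Arrays.asList(firstLine.split(","));
    }

    // Remove one leading U+FEFF, if present.
    static String stripLeadingBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        String text = "\uFEFFid,name\n1,Json\n";
        System.out.println(headerColumns(text).contains("id"));                  // false
        System.out.println(headerColumns(stripLeadingBom(text)).contains("id")); // true
    }
}
```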

I still don't understand why you say this is a Java issue rather than a Jenkins one, when it doesn't reproduce even with a newer Groovy version.