synchronize.sh fails on "Character reference "&#x1" is an invalid XML character"


Clas

Jul 29, 2009, 3:05:14 PM
to JetS3t Users

Hi,

I have a problem running the synchronize.sh script for one of my
directories/users. I use JetS3t 0.7.1 with:

./synchronize.sh UP --nodelete server-name /home/user UP Local

After:

[/home/user] => S3[server-name]

I get the following error:

ERROR [org.jets3t.service.multithread.S3ServiceMulti$ThreadGroupManager] A thread failed with an exception. Firing ERROR event and cancelling all threads
org.jets3t.service.S3ServiceException: Failed to parse XML document with handler class org.jets3t.service.impl.rest.XmlResponsesSaxParser$ListBucketHandler
    at org.jets3t.service.impl.rest.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:122)
    at org.jets3t.service.impl.rest.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:190)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.listObjectsInternal(RestS3Service.java:1121)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.listObjectsChunkedImpl(RestS3Service.java:1087)
    at org.jets3t.service.S3Service.listObjectsChunked(S3Service.java:1488)
    at org.jets3t.service.multithread.S3ServiceMulti$ListObjectsRunnable.run(S3ServiceMulti.java:1846)
    at java.lang.Thread.run(Thread.java:595)
Caused by: org.xml.sax.SAXParseException: Character reference "&#x1" is an invalid XML character.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanCharReferenceValue(XMLScanner.java:1304)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCharReference(XMLDocumentFragmentScannerImpl.java:1259)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1753)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.jets3t.service.impl.rest.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:113)
    ... 6 more
Exception in thread "main" org.jets3t.service.S3ServiceException: Failed to list all objects in S3 bucket
    at org.jets3t.service.utils.FileComparer$1.s3ServiceEventPerformed(FileComparer.java:452)
    at org.jets3t.service.multithread.S3ServiceMulti.fireServiceEvent(S3ServiceMulti.java:199)
    at org.jets3t.service.multithread.S3ServiceMulti$1.fireErrorEvent(S3ServiceMulti.java:307)
    at org.jets3t.service.multithread.S3ServiceMulti$ThreadGroupManager.run(S3ServiceMulti.java:2483)
    at org.jets3t.service.multithread.S3ServiceMulti.listObjects(S3ServiceMulti.java:288)
    at org.jets3t.service.utils.FileComparer$2.run(FileComparer.java:482)
    at org.jets3t.service.utils.FileComparer.listObjectsThreaded(FileComparer.java:480)
    at org.jets3t.service.utils.FileComparer.listObjectsThreaded(FileComparer.java:557)
    at org.jets3t.service.utils.FileComparer.buildS3ObjectMapPartial(FileComparer.java:628)
    at org.jets3t.apps.synchronize.Synchronize.uploadLocalDirectoryToS3(Synchronize.java:316)
    at org.jets3t.apps.synchronize.Synchronize.run(Synchronize.java:909)
    at org.jets3t.apps.synchronize.Synchronize.main(Synchronize.java:1418)
Caused by: org.jets3t.service.S3ServiceException: Failed to parse XML document with handler class org.jets3t.service.impl.rest.XmlResponsesSaxParser$ListBucketHandler
    at org.jets3t.service.impl.rest.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:122)
    at org.jets3t.service.impl.rest.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:190)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.listObjectsInternal(RestS3Service.java:1121)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.listObjectsChunkedImpl(RestS3Service.java:1087)
    at org.jets3t.service.S3Service.listObjectsChunked(S3Service.java:1488)
    at org.jets3t.service.multithread.S3ServiceMulti$ListObjectsRunnable.run(S3ServiceMulti.java:1846)
    at java.lang.Thread.run(Thread.java:595)
Caused by: org.xml.sax.SAXParseException: Character reference "&#x1" is an invalid XML character.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanCharReferenceValue(XMLScanner.java:1304)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCharReference(XMLDocumentFragmentScannerImpl.java:1259)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1753)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.jets3t.service.impl.rest.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:113)
    ... 6 more

When running synchronize.sh on other directories, no error is given.
What might be the problem? Is it related to path names, or something
else? Does anyone have a clue? Is there a way to find out which
file(s) cause the problem?

/Clas

James Murty

Jul 30, 2009, 1:55:08 AM
to jets3t...@googlegroups.com
Hi Clas,

I'm afraid you have hit an old and unresolved issue with S3: it is possible to store objects in S3 with names that cannot be properly expressed in the XML listing response from the service.

There is a long and ultimately fruitless discussion about the problem here:
http://developer.amazonwebservices.com/connect/thread.jspa?threadID=10869

JetS3t does some XML pre-parsing to avoid some problems like this, but it does not handle content such as "&#x13;".
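
The failure is easy to reproduce outside JetS3t, because XML 1.0 forbids most control characters even when they are written as character references. The following is a minimal sketch that uses Python's standard-library parser rather than the Xerces parser from the stack trace; the helper name `parses` and the sample key names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted by a conforming XML 1.0 parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An ordinary key name parses fine.
print(parses("<Key>reports/2009.txt</Key>"))
# A reference to control character 0x01 is rejected, as in the error above.
print(parses("<Key>reports/bad&#x1;name</Key>"))
# The same goes for 0x13.
print(parses("<Key>reports/bad&#x13;name</Key>"))
```

A strict parser has no way to recover from such a reference, so a single bad key name makes the whole listing response unreadable.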

The only way around this problem may be to use a Ruby-based tool to list the objects in this bucket so you can identify and remove the offending object. I recall that the Ruby XML parser is much less strict than most, which actually makes it better in this case. I would recommend the s3cmd tool available here:
http://s3sync.net/wiki
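
If a lenient tool is not available, another option is to scan the raw listing XML as plain text instead of parsing it. The sketch below assumes you have already saved the raw listing response as a string (fetching it is not shown) and that key names appear inside `<Key>` elements, as they do in S3 listings. The function names are hypothetical.

```python
import re

# Numeric character references such as &#x1; or &#19;
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")
# Text between <Key> and </Key>, taken as-is without XML parsing
KEY = re.compile(r"<Key>(.*?)</Key>", re.DOTALL)


def is_xml10_codepoint(cp):
    """True if cp is in the Char production of the XML 1.0 specification."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def ref_codepoint(match):
    """Convert a matched character reference to its integer code point."""
    body = match.group(1)
    return int(body[1:], 16) if body.startswith("x") else int(body)


def keys_with_invalid_refs(listing_xml):
    """Return the raw text of every key that contains an illegal reference."""
    bad = []
    for key in KEY.findall(listing_xml):
        refs = CHAR_REF.finditer(key)
        if any(not is_xml10_codepoint(ref_codepoint(m)) for m in refs):
            bad.append(key)
    return bad
```

For a listing that contains `<Key>bad&#x1;.txt</Key>`, the function returns the raw text `bad&#x1;.txt`, which shows where in the name the control character sits.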

Aside from dealing with this troublesome object name, you should find out how this item was stored in S3 in the first place. If it was uploaded by Synchronize, you should check the name of the file on your system to ensure it doesn't contain odd control characters; otherwise the same issue will recur.
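
One way to run that check on the local side is to walk the directory before uploading and report any names that contain control characters. This is a minimal sketch, not part of JetS3t. The function names are hypothetical, and it deliberately flags every control character, including ones such as carriage return that XML 1.0 technically allows.

```python
import os
import sys


def is_xml10_char(ch):
    """True if ch is in the Char production of the XML 1.0 specification."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_suspect_name(name):
    """True if name holds a control character or a character illegal in XML 1.0."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F or not is_xml10_char(ch)
               for ch in name)


def find_suspect_names(root):
    """Return paths under root whose file or directory name looks suspect."""
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_suspect_name(name):
                suspects.append(os.path.join(dirpath, name))
    return suspects


if __name__ == "__main__":
    # repr() makes invisible characters show up as escapes such as '\x01'.
    for path in find_suspect_names(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(repr(path))
```

Running it against the directory being synchronized (for example `/home/user`) prints each suspect path with its hidden characters escaped, so the files can be renamed before the next upload.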

I hope this helps somewhat. It's a tricky problem, unfortunately.

James

---
http://www.jamesmurty.com

Clas

Jul 30, 2009, 2:22:01 PM
to JetS3t Users

Thanks for the answer, James. Yes, s3cmd reads the file names nicely.
I would have used the Ruby-based tools if it weren't for the fact
that they require Ruby 1.8.4 or later, which I am not able to
upgrade to on this specific server. I think I have located the files
with bad file names (with a carriage return at the end?), which seem
to be files generated by a "bad" PHP script. Thanks!

/Clas

James Murty

Jul 31, 2009, 1:10:11 AM
to jets3t...@googlegroups.com
Hi Clas,

I'm glad you've been able to identify the bad file, and I'm also glad that it wasn't Synchronize that uploaded it :)

It's a hassle that S3 accepts filenames that cannot subsequently be listed, but it's just one of those quirks you need to try and avoid.

James