Unable to pull full commit history; Malformed XML not well-formed (invalid token)

Carl Henrik Bernhoft

Sep 29, 2022, 8:15:56 AM
to us...@subversion.apache.org
Using SVN version:
svn, version 1.14.1 (r1886195)
  compiled May 21 2022, 10:52:35 on x86_64-pc-linux-gnu.

I tried pulling the commit log from https://github.com/haskell/random/branches/master but the process snags on https://github.com/haskell/random/commit/a44b801ab0033970660396a42462c4f7b4df56bb which corresponds to revision 897.

The error is reproducible from the command line with 
svn log --xml -r897 https://github.com/haskell/random/branches/master
 
The output is
svn: E175009: The XML response contains invalid XML
svn: E130003: Malformed XML: not well-formed (invalid token) at line 8


I was expecting control characters to be stripped, fuzzified or otherwise handled. The offending line in the commit, when printed with 'git log', displays as: 'Merged Martin SjA?gren's patch for multiline descriptionsb'
which looks like reasonable output by comparison.
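
A minimal sketch of the kind of handling meant here, written in Python purely for illustration. It is not what Subversion or GitHub actually does; the function name `sanitize_for_xml` and the `?` replacement are hypothetical choices. It replaces every character that falls outside the XML 1.0 `Char` production before the text would be embedded in an XML response:

```python
import re

# Everything NOT allowed by the XML 1.0 "Char" production.
# Tab, line feed and carriage return are the only permitted C0 controls.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def sanitize_for_xml(text: str, replacement: str = "?") -> str:
    """Replace characters that cannot appear in an XML 1.0 document.

    Hypothetical helper for illustration only.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)

# \x08 (backspace) is an assumed stand-in; the actual byte in r897 was not identified.
print(sanitize_for_xml("multiline descriptions\x08"))
```

Legitimate non-ASCII text such as "Sjögren" passes through unchanged; only the disallowed control characters are replaced.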

I saw that handling Unicode and control characters was a topic of discussion years ago, but this case looks like a bug to me. It doesn't seem reasonable for the process to crash when pulling a remote commit log of a repo I don't own/control.

Jon Daley via users

Sep 29, 2022, 10:16:08 AM
to Carl Henrik Bernhoft, us...@subversion.apache.org
Interesting. I can confirm the behavior (I have the same version as you).
Unfortunately, I can't help you. Google seems to say the repo is
corrupted, and needs to be fixed.
--
Jon Daley
https://jon.limedaley.com

Daniel Sahlberg

Sep 29, 2022, 10:51:34 AM
to Carl Henrik Bernhoft, us...@subversion.apache.org
On Thu, Sep 29, 2022 at 14:15, Carl Henrik Bernhoft <c...@bernhoft.no> wrote:
> Using SVN version:
> svn, version 1.14.1 (r1886195)
>   compiled May 21 2022, 10:52:35 on x86_64-pc-linux-gnu.
>
> I tried pulling the commit log from https://github.com/haskell/random/branches/master but the process snags on https://github.com/haskell/random/commit/a44b801ab0033970660396a42462c4f7b4df56bb which corresponds to revision 897.
>
> The error is reproducible from the command line with
> svn log --xml -r897 https://github.com/haskell/random/branches/master
>
> The output is
> svn: E175009: The XML response contains invalid XML
> svn: E130003: Malformed XML: not well-formed (invalid token) at line 8


The error reproduces without the --xml argument as well:
svn: E175009: The XML response contains invalid XML
svn: E130003: Malformed XML: not well-formed (invalid token) at line 8

I suppose the "XML" mentioned in the error message is there because the HTTP/WebDAV protocol is XML-based. XML is quite picky about how certain characters are encoded.
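
To illustrate how picky XML is, here is a small sketch in Python, whose standard-library parser is built on Expat and reports the same "not well-formed (invalid token)" wording. The document strings and the choice of `\x1b` (ESC) as the offending character are assumptions made for the example; the actual byte in the r897 response was not captured.

```python
import xml.etree.ElementTree as ET

def parse_error(document: str):
    """Return the parser's error message, or None if the document is well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError as exc:
        return str(exc)
    return None

# Hypothetical response fragments, not the real server output.
good = "<log><msg>patch for multiline descriptions</msg></log>"
bad = "<log><msg>patch for multiline descriptions\x1b</msg></log>"

print(parse_error(good))  # well-formed, so no error is reported
print(parse_error(bad))   # a raw control character makes the parser give up
```

A single raw control character in element content is enough to make the whole document unparseable, which would match the client giving up on the response instead of showing a slightly mangled log message.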

> I was expecting control characters to be stripped, fuzzified or otherwise handled. The offending line in the commit, when printed with 'git log', displays as: 'Merged Martin SjA?gren's patch for multiline descriptionsb'
> which looks like reasonable output by comparison.
>
> I saw that handling Unicode and control characters was a topic of discussion years ago, but this case looks like a bug to me. It doesn't seem reasonable for the process to crash when pulling a remote commit log of a repo I don't own/control.

I'm guessing, though I haven't verified it yet since I couldn't figure out a quick way to sniff the network traffic, that GitHub's servers are not encoding the character properly when they send it to the client. My google-fu wasn't enough to find out whether the error has been discussed on the GitHub side.

Kind regards,
Daniel