This process has worked just great for hundreds of downloaded
documents.
Recently I have gotten this error from Word 2003:
The xml file xxxxxxxx cannot be opened because there are
problems with the contents
An invalid character was found in text content
Error location: Line: 3, col: 2034
I do not have a text editor that allows me to go to this location in
the file. Can anyone suggest such?
I did use Notepad++ and with UTP-8 encoding and word wrap on, I
scrolled down to the end of the file. About 20 characters from the
end, the display showed unintelligible characters. If I changed the
encoding to ASCII, I could see the text, which looked like normal
characters.
1) How can I get to the root cause?
2) Any text editors that will let me go to precisely the specified row
and column?
Bob
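One way to get to the reported spot without a special editor is a short script. The following is only a sketch, assuming the saved file is supposed to be UTF-8 and that the problem is a byte that is not valid UTF-8; the function name `find_invalid_utf8` and the sample bytes are made up for illustration. It reports the line, the character column, the byte offset and the value of the first offending byte.

```python
def find_invalid_utf8(data: bytes):
    """Return (line, column, byte_offset, byte_value) for the first byte
    that is not valid UTF-8, or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start  # index of the first byte that failed to decode

    prefix = data[:offset]                     # everything before it is valid
    line = prefix.count(b"\n") + 1             # 1-based line number
    line_start = prefix.rfind(b"\n") + 1       # byte index where that line begins
    # Column is counted in characters (not bytes), 1-based.
    column = len(prefix[line_start:].decode("utf-8")) + 1
    return line, column, offset, data[offset]


# Hypothetical sample: a stray 0xBF byte in front of a name on line 2.
sample = "<a>\n<b>caf\u00e9 ".encode("utf-8") + b"\xbf" + b"Name</b>\n</a>"
print(find_invalid_utf8(sample))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function. The column reported here is a character count, which may or may not match how Word counts columns.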
Emacs.
Alternatively, if allowed, post the file somewhere we can retrieve it from.
///Peter
--
XML FAQ: http://xml.silmaril.ie/
"Bob Alston" <boba...@gmail.com> wrote in message
news:7998345c-67b7-4ec5...@r34g2000yqj.googlegroups.com...
> I have downloaded a formatted xml data stream and saved it as a *.txt
> file. I normally load it into Word 2003 and then output it as an
> *.xml document. The request and response specifies UTP-8.
>
> This process has worked just great for hundreds of downloaded
> documents.
>
> Recently I have gotten this error from Word 2003:
>
> The xml file xxxxxxxx cannot be opened because there are
> problems with the contents
>
> An invalid character was found in text content
> Error location: Line: 3, col: 2034
>
> I do not have a text editor that allows me to go to this location in the
> file. Can anyone suggest such?
If it is just a character column you could use Notepad with Wrap off and
Status bar on.
Is it possible that Word is pointing me to the wrong place?
Bob
Yes; if the document contains multibyte characters (e.g. UTF-8) and if one
of them is corrupt, it may push the byte offsets out, so the pointer may
be on the wrong byte.
Use Emacs, or any of the large XML editors that allow you to load a
malformed file in order to repair it. Notepad is not such an editor.
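To illustrate how the offsets drift, here is a small sketch. It assumes nothing about how Word itself counts columns; it only shows that in UTF-8 the character column and the byte column of the same position differ as soon as a multibyte character comes before it. The helper name `char_and_byte_column` is made up for the example.

```python
def char_and_byte_column(line: str, target: str):
    """Return the 1-based (character column, byte column) at which
    `target` first appears in `line` when the line is stored as UTF-8."""
    index = line.index(target)                        # 0-based character index
    bytes_before = len(line[:index].encode("utf-8"))  # bytes taken by the text before it
    return index + 1, bytes_before + 1


# Each "é" is one character but two bytes in UTF-8.
print(char_and_byte_column("\u00e9\u00e9\u00e9 X", "X"))
```

A tool that reports a byte column and an editor that moves by character column will disagree on any line like this one, which is one way an error pointer can appear to be in the wrong place.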
"Bob Alston" <boba...@gmail.com> wrote in message
news:d8944787-a884-4f78...@q32g2000yqb.googlegroups.com...
Yes. The error message frequently refers to the root file when in fact the
error is in a script file that it calls. A symptom of that being the case
occurs when the reported "line number" exceeds the number of lines in the
root file. I would try using ProcMon to clarify the context of the error
better.
Good luck
Robert
---
Unfortunately, while it could replace the character for me, it could
not tell me exactly where the character is or why it is invalid.
Can anyone tell me if this is a common invalid character in XML and
the likely cause?
Bob
"Bob Alston" <boba...@gmail.com> wrote in message
news:27df1fd6-4e62-4873...@h9g2000yqm.googlegroups.com...
> I downloaded a free 30 day trial of Altova XMLSpy and opened the
> *.txt file using that software. It told me there was one invalid
> character that should not be present using UTF-8 encoding.
> It said the offending character was: 0xBF and it showed an upside down
> "?" in front of the 0xBF
That's what Charmap says. So I would suspect a substitution already
occurred somewhere else--or your new tool is misinterpreting something too.
<eg>
BTW how does this correlate with what Notepad showed you ("simple lowercase
character in normal word")? E.g. consider the context, not just the
problem character.
Robert
---
I found the offending character. It was the 0xBF character. It
immediately preceded the name of a person, at the end of a paragraph
of text. Sort of a "signature" apparently identifying the author.
The person's name is middle eastern.
In our character set, it is a box drawing character. You can enter it
easily by holding down the ALT key while typing 191 on the keypad.
Bob
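The upside down "?" and the box drawing character can both be the same byte seen through different code pages. A quick sketch, assuming the byte really is a lone 0xBF: Windows-1252 and Latin-1 show it as an inverted question mark, the old DOS code page 437 (which ALT+191 on the keypad normally uses on US systems) shows a box drawing corner, and on its own it is not valid UTF-8 at all, because 0xBF can only be a continuation byte there.

```python
raw = bytes([0xBF])

# The same single byte under three legacy code pages.
interpretations = {name: raw.decode(name) for name in ("cp1252", "latin-1", "cp437")}
for name, char in interpretations.items():
    print(f"{name}: {char!r} (U+{ord(char):04X})")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(raw))              # a lone continuation byte
print("\u00bf".encode("utf-8"))        # how an inverted question mark is stored in UTF-8
print("\ufffd".encode("utf-8"))        # the Unicode replacement character also contains 0xBF
```

So a file that is otherwise UTF-8 but contains a bare 0xBF most likely had a Windows-1252 or Latin-1 character pasted in without conversion, or a multibyte sequence that lost its other bytes along the way.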
At a wild random guess, that part of the file was copied and pasted from
a document that had been written on an obsolete system using
Windows-1252 or something like it. A previous process found the character
(possibly part of a different multibyte character) and turned it into a
0xBF. See http://en.wikipedia.org/wiki/Unicode_Specials for details of this
behaviour. The solution is to go back to whoever generated the document,
tell them it is not Unicode-compliant, and ask them to edit it and
regenerate it if they want it processed. They should also fix their input
systems to make sure it doesn't happen again.
Thanks for the scenario on how it might have happened.
I think the key here is for the system generating the XML response to
use filters to ensure that, if the encoding is UTP-8, only UTP-8
characters are included. It seems to me that is the responsibility of
the system generating the XML response.
Bob
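A filter of that kind could look roughly like the sketch below. It is an assumption about what such a check might do, not a description of what the generating system actually runs; the name `check_xml_payload` is invented. It rejects bytes that are not strict UTF-8 and characters outside the range XML 1.0 allows in a document.

```python
import re

# Characters permitted by the XML 1.0 "Char" production; anything else is illegal.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_payload(payload: bytes):
    """Return a list of problems found in the payload (empty if none)."""
    try:
        text = payload.decode("utf-8")  # strict decoding: any bad byte raises
    except UnicodeDecodeError as err:
        bad = payload[err.start]
        return [f"invalid UTF-8 byte 0x{bad:02X} at byte offset {err.start}"]
    return [
        f"character U+{ord(m.group()):04X} not allowed in XML 1.0 at index {m.start()}"
        for m in _ILLEGAL_XML_CHARS.finditer(text)
    ]


print(check_xml_payload(b"<a>\xbfName</a>"))
```

Run against the response before it is sent, a check like this would flag the stray byte at the source instead of leaving it for the recipient to discover in Word.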
I should have also mentioned that the document in question is
generated by a State government system.
Great fun.
Bob
Oops. I incorrectly stated UTP-8 above. It should have been UTF-8.
Guess that just shows I know about unshielded twisted pairs <grin>
Bob