Issue with special characters - OpenStream

Murillo Braga

Jan 31, 2017, 11:32:27 AM
to Caché, Ensemble, DeepSee

Hello mates,


In the input method of my business service, I think I've found a bug in the OpenStream method of %XML.Reader.


Basically, pInput is an XML document that contains a special character (a left single quote: ‘).


Here is the piece of code:


Method OnProcessInput(pInput As %Stream.Object, Output pOutput As %RegisteredObject, ByRef pHint As %String) As %Status
{
    // Create an instance of %XML.Reader
    Set xread = ##class(%XML.Reader).%New()
    Set xread.IgnoreNull = 1

    // Begin processing of the stream
    Do pInput.Rewind()

    Set sc = xread.OpenStream(pInput)

    If $$$ISERR(sc) {
        // Write the error text to a file so it can be inspected
        Set fs = ##class(%Stream.FileCharacter).%New()
        Set fs.Filename = "c:\TEMP\OnProcessInput00.xml"
        Set fs.TranslateTable = "UTF8"
        Do fs.Write($$$StatusDisplayString(sc))
        Set tSC = fs.%Save()
    }


The file was created, and it contained this error message: ERROR #6301: SAX XML Parser Error: invalid character 0x18 while processing Anonymous Stream at line 1 offset 4627


So I presume that

    Set sc = xread.OpenStream(pInput)

is failing because the stream contains a special character.


What do you guys think?


Thanks in advance

Gertjan Klein

Jan 31, 2017, 12:48:42 PM
to intersystems...@googlegroups.com
Murillo Braga wrote:

> In the input method (which is part of my business service), I guess that
> I've found a bug on the OpenStream method, that belongs to %XML.Reader.

That seems unlikely.

> Basically the pInput is a XML which contains a special character
> (single left quote >>> ‘ ).

That may or may not be related.

[...]
> Here I had the file created and the error message written in it (ERROR
> #6301: SAX XML Parser Error: invalid character 0x18 while processing
> Anonymous Stream at line 1 offset 4627).
>
> So I presume that [...] Is erroring, because the stream contains
> a special character.

That seems correct. I have confirmed that OpenStream errors on a test
file with a 0x18 byte in it.

The question is: why is that byte there? Most XML is transmitted as
UTF-8. If this is the case with your data, this byte directly maps to
the ASCII CAN (Cancel, Ctrl-X) control code. That byte is valid in UTF-8,
but it is not an allowed character in XML 1.0 [1]. It is a control code,
not a printable character.
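
To check a stream for this kind of problem before handing it to the parser, something like the following could be used. It is only a sketch: the method name FindInvalidXMLChar is made up for illustration, it assumes pInput is a character stream, and it only looks for control codes below 0x20 other than tab, LF and CR (the part of the XML 1.0 character rule that matters here).

```objectscript
/// Hypothetical helper, not part of %XML.Reader.
/// Returns the 1-based position of the first control character below 0x20
/// that XML 1.0 does not allow (everything except tab, LF and CR),
/// or 0 if none is found.
ClassMethod FindInvalidXMLChar(pInput As %Stream.Object) As %Integer
{
    Set found = 0, base = 0
    Do pInput.Rewind()
    While (('pInput.AtEnd) && (found = 0)) {
        Set chunk = pInput.Read(16000)
        For i=1:1:$Length(chunk) {
            Set code = $Ascii(chunk, i)
            If (code < 32) && (code '= 9) && (code '= 10) && (code '= 13) {
                Set found = base + i
                Quit
            }
        }
        Set base = base + $Length(chunk)
    }
    // Leave the stream positioned at the start for the next reader
    Do pInput.Rewind()
    Quit found
}
```

A non-zero result gives the position to inspect; it does not say how the character got there.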

Could it be that your data is a mix of ASCII or UTF-8 and UTF-16? In
big-endian UTF-16, a left single quote (U+2018) is encoded as the two
bytes 0x20 and 0x18, which read as space and CAN when interpreted as
ASCII or UTF-8.
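
The byte values can be verified with a little arithmetic on the code point. This is just an illustration, meant to be run from a terminal session:

```objectscript
    // U+2018 (left single quotation mark) is 8216 in decimal
    Set code = 8216
    // Split the 16-bit value into its two bytes (big-endian order)
    Set high = code \ 256
    Set low = code # 256
    Write "high byte: 0x", $ZHex(high), " low byte: 0x", $ZHex(low), !
```

This prints 0x20 for the high byte and 0x18 for the low byte, the same 0x18 that the parser is complaining about.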

Otherwise it seems most likely that your data got corrupted somehow.
Perhaps examining it directly, around the area where the offending byte
occurs, can give you some clue.
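
One way to look at that area is to print the character codes around the reported offset. The sketch below assumes pInput is the stream from your method, that it is run somewhere Write output is visible (a terminal session or a test method), and that the parser's offset 4627 roughly matches a character position in the stream, which may not hold exactly. That is why it prints a window of 20 positions on either side.

```objectscript
    // Read enough of the stream to cover the reported offset
    Do pInput.Rewind()
    Set data = pInput.Read(4700)
    For i=4607:1:4647 {
        Set code = $Ascii(data, i)
        // $Ascii returns -1 past the end of the string
        Quit:code<0
        Write i, ": ", code, " (0x", $ZHex(code), ")", !
    }
    Do pInput.Rewind()
```

If a 32 (0x20) shows up immediately before the 24 (0x18), that would point toward the UTF-16 explanation.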

Regards,
Gertjan.

[1] https://en.wikipedia.org/wiki/Valid_characters_in_XML