I am using .Net and XMLDocument.SelectNodes to search XML documents. Up
til now this worked very well. However, ever since I was assigned to parse
an Excel document (Excel|Save as) stored as an XML document I started
doubting myself.
Somehow when I try to read an Excel XML document it will traverse ALL
nodes when I use "//*" as query but it will not select one single
worksheet when I use "//Worksheet". In the latter it returns zero nodes.
Did MS build something into XPath refraing me from querying Excel XML
documents?
The XML looks fine. Does anybody have a clue?
Thanks, André
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>André Jager</Author>
<LastAuthor>André Jager</LastAuthor>
<Created>2004-07-02T06:18:24Z</Created>
<Company>SPE Unna</Company>
<Version>11.5606</Version>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>9210</WindowHeight>
<WindowWidth>15195</WindowWidth>
<WindowTopX>0</WindowTopX>
<WindowTopY>90</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21" ss:Name="Euro">
<NumberFormat
ss:Format="_-* #,##0.00\ "€"_-;\-*
#,##0.00\ "€"_-;_-* "-"??\ "€"_-;_-@_-"/>
</Style>
</Styles>
<Worksheet ss:Name="MyWorksheet">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="3"
x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="60">
<Row>
<Cell><Data ss:Type="String">Test</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="Number">123</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">Test2</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="Number">1234</Data></Cell>
</Row>
<Row>
<Cell ss:Index="2" ss:StyleID="s21"
ss:Formula="=SUM(R[-2]C:R[-1]C)"><Data
ss:Type="Number">1357</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.4921259845"/>
<Footer x:Margin="0.4921259845"/>
<PageMargins x:Bottom="0.984251969" x:Left="0.78740157499999996"
x:Right="0.78740157499999996" x:Top="0.984251969"/>
</PageSetup>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>7</ActiveRow>
<ActiveCol>5</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Tabelle2">
<Table ss:ExpandedColumnCount="0" ss:ExpandedRowCount="0"
x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="60"/>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.4921259845"/>
<Footer x:Margin="0.4921259845"/>
<PageMargins x:Bottom="0.984251969" x:Left="0.78740157499999996"
x:Right="0.78740157499999996" x:Top="0.984251969"/>
</PageSetup>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Tabelle3">
<Table ss:ExpandedColumnCount="0" ss:ExpandedRowCount="0"
x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="60"/>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.4921259845"/>
<Footer x:Margin="0.4921259845"/>
<PageMargins x:Bottom="0.984251969" x:Left="0.78740157499999996"
x:Right="0.78740157499999996" x:Top="0.984251969"/>
</PageSetup>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Cheers,
Jeff Rafter
"André Jager" <andre...@spe-unna.de> wrote in message
news:opsahy25...@saturn.spe-unna.de...
cheers, André
On Fri, 2 Jul 2004 08:06:38 -0700, Jeff Rafter <n...@no.com> wrote:
> Is this really the document you are trying to parse? It isn't
> wellformed. It
> would need something like encoding="ISO-8859-1" in the <?xm?>
> declaration.
>
> Cheers,
> Jeff Rafter
>
> "André Jager" <andre...@spe-unna.de> wrote in message
> news:opsahy25...@saturn.spe-unna.de...
>> Hello,
>>
>> I am using .Net and XMLDocument.SelectNodes to search XML documents. Up
>> til now this worked very well. However, ever since I was assigned to
>> parse
>> an Excel document (Excel|Save as) stored as an XML document I started
>> doubting myself.
>>
>> Somehow when I try to read an Excel XML document it will traverse ALL
>> nodes when I use "//*" as query but it will not select one single
>> worksheet when I use "//Worksheet". In the latter it returns zero nodes.
>>
>> Did MS build something into XPath refraing me from querying Excel XML
>> documents?
>>
>> The XML looks fine. Does anybody have a clue?
>>
>> Thanks, André
Is that tongue in cheek? I can imagine it.... : ) Basically the problem is
an encoding problem-- the document claims to be UTF-8 or UTF-16 (Unicode)...
because that is the default. However, the "é" in your name and the Euro sign
are not valid UTF-8 codepoints. Delete those and it should work fine. If you
need to automate the solution, probably the easiest way is to read the
document into a stream/string, then convert any character whose value is >
127 to "&#nnn;" This is fairly safe for what you are doing. Of course, if
you have to worry about more esoteric characters (like from astral planes or
surrogate pairs) it gets a bit more tricky.
HTH,
Jeff Rafter
I found some info: http://www.topxml.com/people/bosley/defaultns.asp
thanks, André
--