Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

XPath and Excel generated XML does not work??

52 views
Skip to first unread message

André Jager

unread,
Jul 2, 2004, 2:31:43 AM7/2/04
to
Hello,

I am using .Net and XMLDocument.SelectNodes to search XML documents. Up
til now this worked very well. However, ever since I was assigned to parse
an Excel document (Excel|Save as) stored as an XML document I started
doubting myself.

Somehow when I try to read an Excel XML document it will traverse ALL
nodes when I use "//*" as query but it will not select one single
worksheet when I use "//Worksheet". In the latter it returns zero nodes.

Did MS build something into XPath refraing me from querying Excel XML
documents?

The XML looks fine. Does anybody have a clue?

Thanks, André

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>André Jager</Author>
<LastAuthor>André Jager</LastAuthor>
<Created>2004-07-02T06:18:24Z</Created>
<Company>SPE Unna</Company>
<Version>11.5606</Version>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>9210</WindowHeight>
<WindowWidth>15195</WindowWidth>
<WindowTopX>0</WindowTopX>
<WindowTopY>90</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21" ss:Name="Euro">
<NumberFormat
ss:Format="_-* #,##0.00\ &quot;€&quot;_-;\-*
#,##0.00\ &quot;€&quot;_-;_-* &quot;-&quot;??\ &quot;€&quot;_-;_-@_-"/>
</Style>
</Styles>
<Worksheet ss:Name="MyWorksheet">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="3"
x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="60">
<Row>
<Cell><Data ss:Type="String">Test</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="Number">123</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">Test2</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="Number">1234</Data></Cell>
</Row>
<Row>
<Cell ss:Index="2" ss:StyleID="s21"
ss:Formula="=SUM(R[-2]C:R[-1]C)"><Data
ss:Type="Number">1357</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.4921259845"/>
<Footer x:Margin="0.4921259845"/>
<PageMargins x:Bottom="0.984251969" x:Left="0.78740157499999996"
x:Right="0.78740157499999996" x:Top="0.984251969"/>
</PageSetup>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>7</ActiveRow>
<ActiveCol>5</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Tabelle2">
<Table ss:ExpandedColumnCount="0" ss:ExpandedRowCount="0"
x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="60"/>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.4921259845"/>
<Footer x:Margin="0.4921259845"/>
<PageMargins x:Bottom="0.984251969" x:Left="0.78740157499999996"
x:Right="0.78740157499999996" x:Top="0.984251969"/>
</PageSetup>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Tabelle3">
<Table ss:ExpandedColumnCount="0" ss:ExpandedRowCount="0"
x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="60"/>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.4921259845"/>
<Footer x:Margin="0.4921259845"/>
<PageMargins x:Bottom="0.984251969" x:Left="0.78740157499999996"
x:Right="0.78740157499999996" x:Top="0.984251969"/>
</PageSetup>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jeff Rafter

unread,
Jul 2, 2004, 11:06:38 AM7/2/04
to
Is this really the document you are trying to parse? It isn't wellformed. It
would need something like encoding="ISO-8859-1" in the <?xm?> declaration.

Cheers,
Jeff Rafter

"André Jager" <andre...@spe-unna.de> wrote in message
news:opsahy25...@saturn.spe-unna.de...

André Jager

unread,
Jul 2, 2004, 5:48:01 PM7/2/04
to
This is the original document Excel generated. I am not that deep into
XML, I can't imagine Microsoft not to be conform ;-)

cheers, André

On Fri, 2 Jul 2004 08:06:38 -0700, Jeff Rafter <n...@no.com> wrote:

> Is this really the document you are trying to parse? It isn't
> wellformed. It
> would need something like encoding="ISO-8859-1" in the <?xm?>
> declaration.
>
> Cheers,
> Jeff Rafter
>
> "André Jager" <andre...@spe-unna.de> wrote in message
> news:opsahy25...@saturn.spe-unna.de...
>> Hello,
>>
>> I am using .Net and XMLDocument.SelectNodes to search XML documents. Up
>> til now this worked very well. However, ever since I was assigned to
>> parse
>> an Excel document (Excel|Save as) stored as an XML document I started
>> doubting myself.
>>
>> Somehow when I try to read an Excel XML document it will traverse ALL
>> nodes when I use "//*" as query but it will not select one single
>> worksheet when I use "//Worksheet". In the latter it returns zero nodes.
>>
>> Did MS build something into XPath refraing me from querying Excel XML
>> documents?
>>
>> The XML looks fine. Does anybody have a clue?
>>
>> Thanks, André

Jeff Rafter

unread,
Jul 3, 2004, 12:54:47 AM7/3/04
to
> This is the original document Excel generated. I am not that deep into
> XML, I can't imagine Microsoft not to be conform ;-)

Is that tongue in cheek? I can imagine it.... : ) Basically the problem is
an encoding problem-- the document claims to be UTF-8 or UTF-16 (Unicode)...
because that is the default. However, the "é" in your name and the Euro sign
are not valid UTF-8 codepoints. Delete those and it should work fine. If you
need to automate the solution, probably the easiest way is to read the
document into a stream/string, then convert any character whose value is >
127 to "&#nnn;" This is fairly safe for what you are doing. Of course, if
you have to worry about more esoteric characters (like from astral planes or
surrogate pairs) it gets a bit more tricky.

HTH,
Jeff Rafter


André Jager

unread,
Jul 4, 2004, 5:24:51 PM7/4/04
to
It looks like my problem lies in the namespaces used. The default
namespace is not selected. When I prefix the default namespace with some
arbitrary characters, the XPath query works.

I found some info: http://www.topxml.com/people/bosley/defaultns.asp

thanks, André

--

0 new messages