Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Shred XML, the best way?

17 views
Skip to first unread message

Tor Magne Ingvaldsen

unread,
Nov 17, 2004, 1:25:39 PM11/17/04
to
I store XML-documents in an image-column in ASE. I need to shred it into
relational data once and never acsesses the document again, just keep it
for archiving. The relational structure is header - detail so I need to
find each line item in the xml and shred it into a row.

I have embedded the shredding in a stored procedure like this:
Isolate the logic in tempdb, a copy of the relational tables and a indexed
copy of the xml-document via xmlparse(). This is fast.

After shredding the header I have a while loop until all line items are
shredded:
@idx is a line counter inside the while loop.

/* xmlextract expressions */
select @XPathEANBarcode = "/Navision_SOC_1/Line[" + convert(varchar,@idx)
+ "]/Number/text()",
@XPathModelNo = "/Navision_SOC_1/Line[" + convert(varchar,@idx)
+ "]/Common_Item_Number/text()",
... and so on

insert #MyTmpTable (....
EANBarcode,
ModelNo,
.....
select ...
stream_to_string(xmlextract(@XPathEANBarcode, Stream)), --
EANBarcode
stream_to_string(xmlextract(@XPathModelNo, Stream)), -- ModelNo
....
from #EnvelopeInStream

At last I inserts the relational data into the real tables in a short
transaction.

This is great functionality and easy to code.

But I have som questions.
I wonder if this is the clever way to do it?
To shred the document it will the be number of detail columns multiplied
with the number of detail rows acesses into the document through
xmlextract. This will take some time, especially with large documents. Are
there better ways?

While working with this I have been forced to increase the configuration
parameter "heap memory per user" quite a lot. What impact will this have
on the server as a whole?

I have just done this test on a new Solaris-box:
XML-doc 1,5Mb when in the file system.
1000 Line Items.
"Total shred time" 4 minutes.
I configured "heap memory per user" to 0,8Mb (800000)
When i tried to test 3000 Line Items the "heap memory per user" was not
large enough.
I
have little experience with xml, will this be faster or slower than to
parse the document in "standard parsers" outside the database server?

Will it be large impact on the other processes in the ASE-server? It seems
memory- and CPU-intensive.

Thanks!
Tor Ingvaldsen

Jeff Tallman

unread,
Nov 21, 2004, 10:38:16 PM11/21/04
to

Shredding outside the server is probably the best option now as the
external parsers can take advantage of XML schema support, etc....

ASE 15beta2 will have some better XML handling capabilities - when the
beta is available, you may want to participate as your exact problem
(shredding) is one that is being focused on.

Tor I

unread,
Nov 22, 2004, 9:03:33 AM11/22/04
to
Thank you for your time and answer!

> Shredding outside the server is probably the best option now as the
> external parsers can take advantage of XML schema support, etc....

Probably, but in our system we just need the options ASE offers.
I still think this is great functionality because it's fast and easy to
implement/deploy (stored proc).

I don't see this as a problem yet. I have adjusted the procedure some and
the "shredding-time" for a thousand detail lines (or 1,5Mb) is now just
under 3 minutes. I'm just curious of whether this is much compared to
shred it outside the ASE and do 1000 inserts (over the network). I will
test this myself.
It doesn't matter if the processing time takes place inside the ASE-server
provided that this will not steal disproportionate resources from the
server.
Therefore I was curious about the "heap memory per user" parameter and if
there were some experience about this. It seems to me that ASE needs
enough "heap memory per user" to hold the whole XML Document in memory. In
theory I think that is OK provided we have enough memory. We will see when
we starts to get some traffic in the system.

> ASE 15beta2 will have some better XML handling capabilities - when the
> beta is available, you may want to participate as your exact problem
> (shredding) is one that is being focused on.

I might do that, thanks for the tip-off!

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Tor I

unread,
Nov 23, 2004, 5:00:39 AM11/23/04
to
Some more numbers if someone are interested.
We received 243 relatively small XML-files yesterday and I traced the
execution time (getdate()).
The largest document had 171 detail lines and the smallest just a few,
average 12 detail lines.
Average "shred time" was 0.95 sec. The largest 14 sec and the smallest
0.02 sec.
That is, no problem to receive and shred a lot of small XML-files.

Thomas

unread,
Apr 6, 2005, 6:51:37 AM4/6/05
to
I need to shred 20mb FIXML files. Quoting from the Sybase
document 'Managing XML with Adaptive Server Enterprise':

"ASE also provides flexible ways outside the transaction
path to shred the XML data and populate tables with the
relevant trade data in relational form."

So what are these flexible ways? I haven't found much help
in the Sybase documentation or on this website. I haven't
tried the solution offered in this thread but it looks like
it would struggle performance wise with larger files. Does
anyone have a working solution for FIXML?

Any hints and help would be greatly appreciated.

TIA,
Thomas

Tor I

unread,
Apr 11, 2005, 2:32:59 PM4/11/05
to
Hi,
I can't answer your spesific questions, but can give some general
information from my experiences.

I have not found much information either, but used this manual.
http://manuals.sybase.com:80/onlinebooks/group-as/asg1251e/xmlb

I've found the "shredding functionality" in ASE very useful. We shred many
different XML-formats and we implements new "XML-parsers" very fast as
stored procedures. This is easy and cheap, but the great disadvantage is
that this form for parsing needs the whole document in memory. ASE has to
be configured with enough "heap memory per user" to hold the document.
This is obviously not good enough if the size of the documents are
indeterminable. Another problem (as with many XML parsers) is that
performance degrades exponentially as the size of the the document
decreases, but ASE's indexing mechanism helps somewhat. If the documents
are relatively small it is very fast.

We need to shred the XML-document only once and would rather wanted a
"cursor style" API who could scan the document from top to bottom in one
run. An API who does this is of course StAX and we have begun to implement
in StAX outside ASE instead. We can parse very large files now, but the
development time is much, much longer.

StAX in ASE would have been something.

Regards
Tor

--

0 new messages