I have embedded the shredding in a stored procedure like this:
Isolate the logic in tempdb, a copy of the relational tables and a indexed
copy of the xml-document via xmlparse(). This is fast.
After shredding the header I have a while loop until all line items are
shredded:
@idx is a line counter inside the while loop.
/* xmlextract expressions */
select @XPathEANBarcode = "/Navision_SOC_1/Line[" + convert(varchar,@idx)
+ "]/Number/text()",
@XPathModelNo = "/Navision_SOC_1/Line[" + convert(varchar,@idx)
+ "]/Common_Item_Number/text()",
... and so on
insert #MyTmpTable (....
EANBarcode,
ModelNo,
.....
select ...
stream_to_string(xmlextract(@XPathEANBarcode, Stream)), --
EANBarcode
stream_to_string(xmlextract(@XPathModelNo, Stream)), -- ModelNo
....
from #EnvelopeInStream
At last I inserts the relational data into the real tables in a short
transaction.
This is great functionality and easy to code.
But I have som questions.
I wonder if this is the clever way to do it?
To shred the document it will the be number of detail columns multiplied
with the number of detail rows acesses into the document through
xmlextract. This will take some time, especially with large documents. Are
there better ways?
While working with this I have been forced to increase the configuration
parameter "heap memory per user" quite a lot. What impact will this have
on the server as a whole?
I have just done this test on a new Solaris-box:
XML-doc 1,5Mb when in the file system.
1000 Line Items.
"Total shred time" 4 minutes.
I configured "heap memory per user" to 0,8Mb (800000)
When i tried to test 3000 Line Items the "heap memory per user" was not
large enough.
I
have little experience with xml, will this be faster or slower than to
parse the document in "standard parsers" outside the database server?
Will it be large impact on the other processes in the ASE-server? It seems
memory- and CPU-intensive.
Thanks!
Tor Ingvaldsen
ASE 15beta2 will have some better XML handling capabilities - when the
beta is available, you may want to participate as your exact problem
(shredding) is one that is being focused on.
> Shredding outside the server is probably the best option now as the
> external parsers can take advantage of XML schema support, etc....
Probably, but in our system we just need the options ASE offers.
I still think this is great functionality because it's fast and easy to
implement/deploy (stored proc).
I don't see this as a problem yet. I have adjusted the procedure some and
the "shredding-time" for a thousand detail lines (or 1,5Mb) is now just
under 3 minutes. I'm just curious of whether this is much compared to
shred it outside the ASE and do 1000 inserts (over the network). I will
test this myself.
It doesn't matter if the processing time takes place inside the ASE-server
provided that this will not steal disproportionate resources from the
server.
Therefore I was curious about the "heap memory per user" parameter and if
there were some experience about this. It seems to me that ASE needs
enough "heap memory per user" to hold the whole XML Document in memory. In
theory I think that is OK provided we have enough memory. We will see when
we starts to get some traffic in the system.
> ASE 15beta2 will have some better XML handling capabilities - when the
> beta is available, you may want to participate as your exact problem
> (shredding) is one that is being focused on.
I might do that, thanks for the tip-off!
--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
"ASE also provides flexible ways outside the transaction
path to shred the XML data and populate tables with the
relevant trade data in relational form."
So what are these flexible ways? I haven't found much help
in the Sybase documentation or on this website. I haven't
tried the solution offered in this thread but it looks like
it would struggle performance wise with larger files. Does
anyone have a working solution for FIXML?
Any hints and help would be greatly appreciated.
TIA,
Thomas
I have not found much information either, but used this manual.
http://manuals.sybase.com:80/onlinebooks/group-as/asg1251e/xmlb
I've found the "shredding functionality" in ASE very useful. We shred many
different XML-formats and we implements new "XML-parsers" very fast as
stored procedures. This is easy and cheap, but the great disadvantage is
that this form for parsing needs the whole document in memory. ASE has to
be configured with enough "heap memory per user" to hold the document.
This is obviously not good enough if the size of the documents are
indeterminable. Another problem (as with many XML parsers) is that
performance degrades exponentially as the size of the the document
decreases, but ASE's indexing mechanism helps somewhat. If the documents
are relatively small it is very fast.
We need to shred the XML-document only once and would rather wanted a
"cursor style" API who could scan the document from top to bottom in one
run. An API who does this is of course StAX and we have begun to implement
in StAX outside ASE instead. We can parse very large files now, but the
development time is much, much longer.
StAX in ASE would have been something.
Regards
Tor
--