We are having some performance issues with XML shredding.
At this point we are extracting data from XML files from nearly 60 different
companies, and therefore 60 different XML structures. The total volume
of XML is about 350 MB, and we are trying to extract the data as fast as
possible.
Our current system extracts, transforms and loads the data in about five
minutes. To be satisfied, however, we would like to get this down to about one minute.
We use the nodes()/CROSS APPLY technique to shred the XML into our
internal format.
This is how we shred the data.
------------------------------
1) Load the XML into a temporary table (#XmlTable)
2) Create an XML index
3) Query (like below; the paths and types are placeholders)

INSERT INTO #TransformedData
SELECT
    T0.T.value('field1[1]', 'varchar(100)'),
    T1.T.value('field2[1]', 'varchar(100)')
FROM
    #XmlTable
CROSS APPLY
    data.nodes('root') AS T0(T)
CROSS APPLY
    T0.T.nodes('level1') AS T1(T)

DROP TABLE #XmlTable

4) Pass the temporary table #TransformedData into the common/shared
transformation procedure

EXEC LookupData
-------------------------------
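For concreteness, the steps above can be sketched end to end on a tiny, invented document. All table, element and attribute names below are hypothetical, and an XML index on a temp table requires a clustered primary key (check that your SQL Server version supports XML indexes on temp tables):

```sql
-- Hypothetical minimal example of the load / index / shred steps
DECLARE @x xml = N'
<companies>
  <company name="Acme">
    <level1 value="10"/>
    <level1 value="20"/>
  </company>
</companies>';

-- 1) Load the XML into a temp table (clustered PK needed for the XML index)
CREATE TABLE #XmlTable (id int IDENTITY PRIMARY KEY, data xml);
INSERT INTO #XmlTable (data) VALUES (@x);

-- 2) Create a primary XML index
CREATE PRIMARY XML INDEX IX_XmlTable_data ON #XmlTable (data);

-- 3) Shred with nodes()/CROSS APPLY
SELECT
    T0.T.value('@name',  'varchar(100)') AS CompanyName,
    T1.T.value('@value', 'int')          AS Level1Value
FROM #XmlTable
CROSS APPLY data.nodes('/companies/company') AS T0(T)
CROSS APPLY T0.T.nodes('level1') AS T1(T);

DROP TABLE #XmlTable;
```

This returns one row per level1 element, paired with its parent company's attributes.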
This is very I/O intensive and it makes the system slow. Are there any
other good ways to parse the XML inside SQL Server? Or should we perhaps
move the shredding outside the SQL environment into, for instance, a C#
method which bulk loads the data?
Regards,
Johnny
Examples of Bulk Importing and Exporting XML Documents
http://msdn.microsoft.com/en-us/library/ms191184.aspx
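The MSDN article linked above covers, among other things, loading a whole XML file as a single value via OPENROWSET with the SINGLE_BLOB option. A minimal hedged sketch (the file path and target table are hypothetical):

```sql
-- Bulk-import one XML file as a single value and cast it to the xml type
INSERT INTO #XmlTable (data)
SELECT CAST(BulkColumn AS xml)
FROM OPENROWSET(BULK 'C:\data\company01.xml', SINGLE_BLOB) AS src;
```

SINGLE_BLOB is generally the safe choice here, since it avoids encoding mismatches that can occur with SINGLE_CLOB/SINGLE_NCLOB and XML declarations.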
You should probably compare OPENXML performance with your nodes()/CROSS
APPLY method. I much prefer nodes() with CROSS APPLY, but some people
report OPENXML as faster, particularly with larger documents. Try it with
your data!
SQLXML Bulkload is fast for loading data if you can get it working, but it
does no transforming. It can be called from C#.
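For comparison, an OPENXML version of the same kind of shred might look like this (document shape and column names are invented for illustration):

```sql
DECLARE @doc nvarchar(max) = N'<root><level1 id="1"/><level1 id="2"/></root>';
DECLARE @h int;

-- Parse the document into an in-memory DOM and get a handle to it
EXEC sp_xml_preparedocument @h OUTPUT, @doc;

-- flags = 1: attribute-centric mapping
SELECT id
FROM OPENXML(@h, '/root/level1', 1)
     WITH (id int '@id');

-- Always release the DOM, or the memory is leaked for the session
EXEC sp_xml_removedocument @h;
```

The sp_xml_removedocument call is not optional housekeeping; forgetting it is the classic OPENXML memory leak.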
Let us know how you get on!
"Johnny Persson" wrote:
Thank you for your answer!
I am a bit doubtful about the OPENXML idea. My main issue with OPENXML
is that you store your XML in a variable, which means that you are
unable to put an index on it. My limited experience says that OPENXML
rarely beats the nodes()/CROSS APPLY method.
I will try the bulk importing first, though; if I am not pleased with
the result I will certainly try the OPENXML method :)
What would you say about the I/O impact of joining the bulk-loaded
tables before the transformations/lookups?
Anyway, I will try the bulk load right away.. thanks once again!
I ended up with a CLR C# parser. The performance is much better now!
Thank you!
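For readers curious how a CLR parser like this gets wired up on the SQL Server side, a hedged sketch of the registration (the assembly, namespace and method names are all hypothetical; the C# body itself is not shown):

```sql
-- Enable CLR integration (server-level setting, requires sysadmin)
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Register the compiled C# assembly (path is hypothetical)
CREATE ASSEMBLY XmlShredder
FROM 'C:\assemblies\XmlShredder.dll'
WITH PERMISSION_SET = SAFE;

-- Expose a static C# method as a T-SQL stored procedure
CREATE PROCEDURE dbo.ShredXml
    @xml xml
AS EXTERNAL NAME XmlShredder.[XmlShredder.Parser].Shred;
```

The CLR procedure can then stream rows back (for instance via SqlPipe) or bulk-insert them, which is presumably where the speedup over pure T-SQL shredding comes from.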
On 2010-03-08 11:03, Bob wrote:
OPENXML builds a DOM in memory, so there is no need for any index on the
variable.
In fact, I can confirm that on large documents OPENXML leaves nodes()
far behind in the dust.
On small documents, however, it is foolish to use it, as it grabs 1/8 of SQL
Server's memory, and bad handling or forgetting to close documents will leak memory.
"Johnny Persson" <a@a.a> wrote in message
news:eiGOXvsv...@TK2MSFTNGP06.phx.gbl...
Good to know. How large XML documents are we talking about?
Regards,
Johnny
http://blog.scmike.com/2010/tsql/fast-shredding-xml-sql-server/
"Johnny Persson" wrote: