We are having some performance issues with XML shredding.
At this point we are extracting data from XML files from nearly 60 different
companies, and therefore 60 different XML structures. The total volume
of XML is about 350 MB, and we are trying to extract the data as fast as
possible.
Our current system extracts, transforms and loads the data in about five
minutes. To be satisfied, however, we would like to get this down to about one minute.
We use the nodes()/CROSS APPLY technique to shred the XML into our
internal format.
This is how we shred the data.
------------------------------
1) Load the XML into a temporary table (#XmlTable)
2) Create an XML index
3) Query (like below; the paths and types are placeholders)

INSERT INTO #TransformedData
SELECT
    T0.T.value('field1[1]', 'varchar(100)'),
    T1.T.value('field2[1]', 'varchar(100)')
FROM
    #XmlTable
CROSS APPLY
    data.nodes('root') AS T0(T)
CROSS APPLY
    T0.T.nodes('level1') AS T1(T)

DROP TABLE #XmlTable

4) Pass the temporary table #TransformedData into the common/shared
transformation procedure

EXEC LookupData
-------------------------------
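For concreteness, the steps above can be sketched end to end on a tiny, invented document. All table, element and attribute names below are hypothetical, and an XML index on a temp table requires a clustered primary key (check that your SQL Server version supports XML indexes on temp tables):

```sql
-- Hypothetical minimal example of the load / index / shred steps
DECLARE @x xml = N'
<companies>
  <company name="Acme">
    <level1 value="10"/>
    <level1 value="20"/>
  </company>
</companies>';

-- 1) Load the XML into a temp table (clustered PK needed for the XML index)
CREATE TABLE #XmlTable (id int IDENTITY PRIMARY KEY, data xml);
INSERT INTO #XmlTable (data) VALUES (@x);

-- 2) Create a primary XML index
CREATE PRIMARY XML INDEX IX_XmlTable_data ON #XmlTable (data);

-- 3) Shred with nodes()/CROSS APPLY
SELECT
    T0.T.value('@name',  'varchar(100)') AS CompanyName,
    T1.T.value('@value', 'int')          AS Level1Value
FROM #XmlTable
CROSS APPLY data.nodes('/companies/company') AS T0(T)
CROSS APPLY T0.T.nodes('level1') AS T1(T);

DROP TABLE #XmlTable;
```

This returns one row per level1 element, paired with its parent company's attributes.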
This is very I/O intensive and it makes the system slow. Are there any
other good ways to parse the XML inside SQL Server? Or should we perhaps
move the shredding outside the SQL environment into, for instance, a C#
method which bulk loads the data?
Regards,
Johnny
Examples of Bulk Importing and Exporting XML Documents
http://msdn.microsoft.com/en-us/library/ms191184.aspx
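The MSDN article linked above covers, among other things, loading a whole XML file as a single value via OPENROWSET with the SINGLE_BLOB option. A minimal hedged sketch (the file path and target table are hypothetical):

```sql
-- Bulk-import one XML file as a single value and cast it to the xml type
INSERT INTO #XmlTable (data)
SELECT CAST(BulkColumn AS xml)
FROM OPENROWSET(BULK 'C:\data\company01.xml', SINGLE_BLOB) AS src;
```

SINGLE_BLOB is generally the safe choice here, since it avoids encoding mismatches that can occur with SINGLE_CLOB/SINGLE_NCLOB and XML declarations.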
You should probably compare OPENXML performance with your nodes()/CROSS
APPLY method. I much prefer nodes() with CROSS APPLY, but some people
report OPENXML as faster, particularly with larger documents. Try it with
your data!
SQLXML Bulkload is fast for loading data if you can get it working, but it
does no transforming. It can be called from C#.
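For comparison, an OPENXML version of the same kind of shred might look like this (document shape and column names are invented for illustration):

```sql
DECLARE @doc nvarchar(max) = N'<root><level1 id="1"/><level1 id="2"/></root>';
DECLARE @h int;

-- Parse the document into an in-memory DOM and get a handle to it
EXEC sp_xml_preparedocument @h OUTPUT, @doc;

-- flags = 1: attribute-centric mapping
SELECT id
FROM OPENXML(@h, '/root/level1', 1)
     WITH (id int '@id');

-- Always release the DOM, or the memory is leaked for the session
EXEC sp_xml_removedocument @h;
```

The sp_xml_removedocument call is not optional housekeeping; forgetting it is the classic OPENXML memory leak.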
Let us know how you get on!
"Johnny Persson" wrote:
Thank you for your answer!
I am a bit doubtful about the OPENXML idea. My main issue with OPENXML
is that you store your XML in a variable, which means that you are
unable to put an index on it. My limited experience says that OPENXML
rarely beats the nodes()/CROSS APPLY method.
I will try the bulk importing first, though; if I am not pleased with
the result I will certainly try the OPENXML method :)
What would you say about the I/O impact of joining the bulk-loaded
tables before the transformations/lookups?
Anyway, I will try the bulk load right away.. thanks once again!
I ended up with a CLR C# parser. The performance is much better now!
Thank you!
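For readers curious how a CLR parser like this gets wired up on the SQL Server side, a hedged sketch of the registration (the assembly, namespace and method names are all hypothetical; the C# body itself is not shown):

```sql
-- Enable CLR integration (server-level setting, requires sysadmin)
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Register the compiled C# assembly (path is hypothetical)
CREATE ASSEMBLY XmlShredder
FROM 'C:\assemblies\XmlShredder.dll'
WITH PERMISSION_SET = SAFE;

-- Expose a static C# method as a T-SQL stored procedure
CREATE PROCEDURE dbo.ShredXml
    @xml xml
AS EXTERNAL NAME XmlShredder.[XmlShredder.Parser].Shred;
```

The CLR procedure can then stream rows back (for instance via SqlPipe) or bulk-insert them, which is presumably where the speedup over pure T-SQL shredding comes from.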
On 2010-03-08 11:03, Bob wrote:
OPENXML builds a DOM in memory, so there is no need for any index on the
variable.
In fact, I can confirm that on large documents OPENXML leaves nodes()
far behind in the dust.
On small documents, however, it is foolish to use it, as it grabs 1/8 of SQL
Server's memory, and bad handling or forgetting to close documents will leak memory.
"Johnny Persson" <a@a.a> wrote in message
news:eiGOXvsv...@TK2MSFTNGP06.phx.gbl...
Good to know. How large XML documents are we talking about?
Regards,
Johnny
http://blog.scmike.com/2010/tsql/fast-shredding-xml-sql-server/
"Johnny Persson" wrote: