Performance of Pull using iterate/dsl vs Pull API vs Tree with scala.xml

65 views
Skip to first unread message

Jason Calabrese

unread,
Sep 8, 2012, 1:19:24 PM9/8/12
to scale...@googlegroups.com
All,

I'm getting some surprising performance results doing a Pull using the iterate/dsl method.  It's much slower than what I get using the Pull API directly.

I'd attached some sample code that produces these results.  I don't completely trust my memory usage calculations, but the timing should be correct.

scala> XMLParsePerf("genfile", "/Users/jcalabre/temp/ref_perf500.xml", "500")
scala> XMLParsePerf("perf", "/Users/jcalabre/temp/ref_perf500.xml", "500")
REF COUNT: 500
FILE SIZE: 1 MB
PULL (iterrate/dsl): time=2162ms mem=52 MB
PULL (mutable): time=101ms mem=13 MB
TREE (standard scala.xml): time=489ms mem=45 MB

I have tests that verify that these different parsing methods produce the same results.

I'm hoping that I'm doing something wrong when using iterate, since I like that style much better. 

-Jason
XMLParsePerf.scala
XMLParse.scala

Chris Twiner

unread,
Sep 8, 2012, 2:16:59 PM9/8/12
to scale...@googlegroups.com

Hi Jason,

Can you tell me which version you are using please? There was a perf problem in pre 0.4.2 versions.

It would also be great if you could send me that xml you used.  I will create a caliper run for it and see what's up.

If you aren't using 0.4.2 you can expect half the runtime saved at least.  I am also planning an alternate iterate version (using the same mutable builder approach as the tree parsing) for 0.5, which should drop that down yet further.

Cheers,
Chris

Jason Calabrese

unread,
Sep 8, 2012, 8:03:34 PM9/8/12
to scale...@googlegroups.com
Chris,

I'm using 0.4.3.  The xml is generated in the code that I attached in the original post above.  It's a very simple structure.  I'll attach it here anyway since that may make things easier for you.

If you get a chance take a look at the code I attached, it's very possible that I'm not using the dsl correctly and that could be part of the problem.

Thanks for taking a look at this.

-Jason
ref_perf500.xml

Chris Twiner

unread,
Sep 9, 2012, 7:12:08 AM9/9/12
to scale...@googlegroups.com
hi Jason,

sorry it was a silly question to ask for the file, but I've only now
just looked at the code on the laptop (I seriously need to get a
.scala viewer configured on android).

I can see a couple of things (outside of using caliper to do warm ups
and separate tests in a new vm run):

1) the iterate function expects the full path, so its should be
iterate(List("references"l, "reference"l), pull)

Fixing this reduces the time usage (especially memory usage) considerably.

2) as the dsl xpaths are converted to Iterables (when used in a for
loop) there is a conversion and sorting cost

//children <- (reference \\)
//child <- (children *)
child <- (reference \*)

would seem to suffice (as you want all direct element nodes of the
reference and children isn't used)

and lastly we can also add the normal Scales loading into the perf mix ^_^

In your actual app the above changes should make a big impact But I'm
intrigued and I'll attempt to convert this to a caliper test and check
the end resulting count is correct :-) (and also a little
implementation inefficiency in iterate itself).

Cheers,
Chris

Jason Calabrese

unread,
Sep 9, 2012, 2:08:09 PM9/9/12
to scale...@googlegroups.com
Chris,

Thanks for the tip about using the full path for iterate and also using \*.

I saw that your example were using the full pah, but I intentionally didn't use the full path since within my application the reference elements can be nested a little different depending on how I'm getting the data.  I can make the caller of my parse method pass the path since it would be know at that time, but it would be nice to not have to do that since it adds a little complexity and isn't need when using the pull api directly.

For my previous test I was starting with a fresh vm each time and that was penalizing the iterate timing compared to the normal pull api since it was run first.  I think that was the main problem with my test.  

After multiple runs there isn't a significant difference when using the full path and \* compared to using \\ and then * with the partial path.

Those changes helped, the first run dropped to:
scala> XMLParsePerf("perf", "/Users/jcalabre/temp/ref_perf500.xml", "500")
REF COUNT: 500
FILE SIZE: 1 MB
PULL (iterate/dsl): time=1457ms mem=20 MB
PULL (mutable): time=110ms mem=12 MB
TREE (standard scala.xml): time=479ms mem=18 MB

After multiple runs Hotspot kicks in and the Scales loading is removed. Then the iterate/dsl overhead almost disappears:

scala> XMLParsePerf("perf", "/Users/jcalabre/temp/ref_perf500.xml", "500")
REF COUNT: 500
FILE SIZE: 1 MB
PULL (iterate/dsl): time=52ms mem=4 MB
PULL (mutable): time=32ms mem=12 MB
TREE (standard scala.xml): time=85ms mem=16 MB

Thanks again for looking into this.

-Jason

Chris Twiner

unread,
Sep 11, 2012, 4:47:39 PM9/11/12
to scale...@googlegroups.com
Hiya,

I've got around to the caliper version and I see the following:

[info] size benchmark ms linear runtime
[info] 50 ScalesXmlTreeNameValue 8.00 =
[info] 50 ScalaXMLTree 6.01 =
[info] 50 ScalesXmlTree 7.65 =
[info] 50 ScalesIterateFull 21.17 =
[info] 50 ScalesIterateFullOld 19.76 =
[info] 50 PullMutable 3.73 =
[info] 100 ScalesXmlTreeNameValue 18.14 =
[info] 100 ScalaXMLTree 11.32 =
[info] 100 ScalesXmlTree 15.10 =
[info] 100 ScalesIterateFull 38.73 ===
[info] 100 ScalesIterateFullOld 38.23 ===
[info] 100 PullMutable 7.31 =
[info] 500 ScalesXmlTreeNameValue 66.56 =====
[info] 500 ScalaXMLTree 66.14 =====
[info] 500 ScalesXmlTree 59.46 ====
[info] 500 ScalesIterateFull 193.90 ===============
[info] 500 ScalesIterateFullOld 178.11 ==============
[info] 500 PullMutable 35.40 ==
[info] 1000 ScalesXmlTreeNameValue 216.23 =================
[info] 1000 ScalaXMLTree 135.15 ==========
[info] 1000 ScalesXmlTree 134.76 ==========
[info] 1000 ScalesIterateFull 348.37 ============================
[info] 1000 ScalesIterateFullOld 369.05 ==============================
[info] 1000 PullMutable 70.17 =====
[info] 5000 ScalesXmlTreeNameValue 734 ========
[info] 5000 ScalaXMLTree 864 =========
[info] 5000 ScalesXmlTree 780 ========
[info] 5000 ScalesIterateFull 989 ===========
[info] 5000 ScalesIterateFullOld 1115 ============
[info] 5000 PullMutable 460 =====
[info] 10000 ScalesXmlTreeNameValue 1510 =================
[info] 10000 ScalaXMLTree 2655 ==============================
[info] 10000 ScalesXmlTree 1548 =================
[info] 10000 ScalesIterateFull 1918 =====================
[info] 10000 ScalesIterateFullOld 1900 =====================
[info] 10000 PullMutable 935 ==========

which matches my intuitions on the speed. The test is attached.

I disabled the children, child code as I'm assuming it produces
incorrect results. This wasn't immediately clear to me, and for that
I apologise, it should have given how annoying it was to implement :-)

When using \\ at the start of an xpath expression it (according to the
xpath 2 spec) it also includes the current node. The expression \\*
includes the "reference" element as well.

I didn't mean to indicate that you should only use a full path, its
just easier to predict the results, but your xml is very regular so I
think the approach pretty good looking.

The results are driven from the upcoming 0.4.4 release (the
ScalesIterateFullOld is the 0.4.3 solution). In all cases your hand
written pull code wins by a significant margin. I'll see if I can't
close that gap a bit for 0.5 :-)
JCParsePerf.scala

Jason Calabrese

unread,
Sep 15, 2012, 11:20:51 PM9/15/12
to scale...@googlegroups.com
Thanks Chris, I had a busy week and just got a chance to read over this now.

Chris Twiner

unread,
Sep 24, 2012, 3:37:11 PM9/24/12
to scale...@googlegroups.com
You had spurred my interest Jason ^_^ and the result is a reasonable
book keeping cost for having a declarative style :

[info] size benchmark ms linear runtime
[info] 5000 ScalesIterateFullRaw 500 ================
[info] 5000 PullMutable 410 =============
[info] 10000 ScalesIterateFullRaw 929 ==============================
[info] 10000 PullMutable 880 ============================

40% of the time was in the iteratee code, which isn't necessary in
iterate as its only one path and that evaluation is hidden behind an
iterator. Another 20% of that time was in the sorting of XPaths, but
for your example iterate call we know they must all be in document
order (if you had a \^ or similar in there we could not). So the
other code is then:

for {
reference <- iterate(List("references"l, "reference"l), pull)
} yield {
for {
child <- raw(reference \*)
} yield {
(localName(child), text(child))
}
}.groupBy(_._1.toUpperCase).mapValues(_.toList.map(_._2))

and the code for iterate is in the feature/aalto branch. I'll merge
it over for a 0.4.4 as it should be binary compatible with 0.4.3.
Reply all
Reply to author
Forward
0 new messages