note about parsing speed of xml vs sxml?


'John Clements' via users-redirect

Jun 26, 2020, 11:05:42 PM
to Racket Users
I’m parsing a large-ish Apple plist file (18 megabytes), and I find that the built-in xml parsing (read-xml) takes about five times as long as the sxml version (11 seconds vs 2.4 seconds on my machine), and that the plist parser takes longer still, at 18 seconds.

Would anyone object if I added a margin note to this effect to the xml docs?

John



Neil Van Dyke

Jun 27, 2020, 12:33:08 AM
to Racket Users
I think anyone using XML or HTML seriously with Racket should probably
at least be told of the SXML family of tools.  And warned about the
compatibility problems.

Though we needn't tell them *everywhere* XML&HTML appear in the docs.  For example, I
figure a tutorial for Racket Web Server shouldn't distract readers with
that.

As you know, :) there are some useful tools using SXML, and Oleg's SSAX
parser has some different properties than core Racket's XML parser.

Complication: The incompatibility between SXML and core Racket's
representations of XML&HTML is an unfortunate accident of parallel
invention, and I think will tend to be confusing to new people.  I once
tried to address the confusion in the `sxml-intro` documentation
package, "https://www.neilvandyke.org/racket/sxml-intro/", and I'm
unhappy with the result.  The details in my document say more than
perhaps anyone will ever want to know, and, "optics"-wise, make the
situation look worse than it actually is in practice.  I think you could
do a more graceful job of this.

(Someday, someone might undertake the large task of SXML-ifying all the
many non-SXML bits of Racket, and incidentally reunite Racket with the
rest of the Scheme community in that regard.  I started, with one piece,
but got interrupted.
"https://www.neilvandyke.org/racket/rws-html-template/"  :)

Hendrik Boom

Jun 27, 2020, 8:33:15 AM
to Neil Van Dyke, Racket Users
On Sat, Jun 27, 2020 at 12:33:02AM -0400, Neil Van Dyke wrote:
> I think anyone using XML or HTML seriously with Racket should probably at
> least be told of the SXML family of tools.  And warned about the
> compatibility problems.
>
> Though we needn't tell them *everywhere* XML&HTML appear in the docs.  For example, I
> figure a tutorial for Racket Web Server shouldn't distract readers with
> that.
>
> As you know, :) there are some useful tools using SXML, and Oleg's SSAX
> parser has some different properties than core Racket's XML parser.
>
> Complication: The incompatibility between SXML and core Racket's
> representations of XML&HTML is an unfortunate accident of parallel
> invention, and I think will tend to be confusing to new people.  I once
> tried to address the confusion in the `sxml-intro` documentation package,
> "https://www.neilvandyke.org/racket/sxml-intro/", and I'm unhappy with the
> result.  The details in my document say more than perhaps anyone will ever
> want to know, and, "optics"-wise, make the situation look worse than it
> actually is in practice.  I think you could do a more graceful job of this.

On the contrary; this is the kind of information I need when choosing
between the various representations.

But in section 4. Appendix there is one bit of pervasive confusion:
you present several differences, but do not make it clear which way the
difference goes. When you say, for example, "The SXML keyword symbols
may be lowercase", do you mean that SXML itself allows this to be done
to its keywords, or that it does not but that xexpr allows its SXML
keywords to be lower case?

>
> (Someday, someone might undertake the large task of SXML-ifying all the many
> non-SXML bits of Racket, and incidentally reunite Racket with the rest of
> the Scheme community in that regard.  I started, with one piece, but got
> interrupted. "https://www.neilvandyke.org/racket/rws-html-template/"  :)

From this, and from the general drift of your sxml-intro, I surmise that
the intent is for Racket to become fully SXML compliant, and new
software should be written for SXML, not xexpr, if at all possible.
Is this correct, and is this Racket policy? If so, it should be stated
explicitly in the sxml-intro. A statement like this (if not gainsaid by
the opposite camp (if any)) would eliminate much of the confusion
experienced by new users. It should of course also be said in the xexpr
documentation.

Finally, I seem to remember that one of the tools mentioned somewhere
for handling xml turned out not to be findable. I don't know any more
if it was mentioned in your document or elsewhere, but it might be worth
checking that the ones you mention are still available.

-- hendrik

Neil Van Dyke

Jun 27, 2020, 10:56:24 AM
to Racket Users
Hendrik Boom wrote on 6/27/20 8:33 AM:
> But in section 4. Appendix there is one bit of pervasive confusion: you present several differences, but do not make it clear which way the difference goes. When you say, for example, "The SXML keyword symbols may be lowercase", do you mean that SXML itself allows this to be done to its keywords, or that it does not but that xexpr allows its SXML keywords to be lower case?

Thank you.  I didn't phrase that well.  In section "Appendix:
SXML/xexp", that bulleted list is describing "SXML/xexp", relative to
canonical SXML.

That first bulletpoint is something on which I think SXML was
ambiguous.  (Some Scheme readers or symbol tables forced or disregarded
case, but others thankfully didn't.  Although, IIRC, Oleg's code was
consistent in how it used case, the ambiguity of the case of the symbols
in SXML presented portability problems when other people wrote code,
especially if they interpreted it differently and exercised their
preferences, and you then tried to combine their code.  Many Scheme
programmers emphasized personal preference, and we can imagine that a
small language with powerful linguistic extension, and a convention of
writing one's own interpreter, might attract rugged individualists.) 
"SXML/xexp" tried to mitigate that in a portable way, by saying both
all-uppercase and all-lowercase were supported, and that no other mixing
of case was permitted.

> From this, and from the general drift of your sxml-intro, I surmise
> that the intent is for Racket to become fully SXML compliant, and new
> software should be written for SXML, not xexpr, if at all possible.

There's no policy that I know of.

I think switching would be better overall, but switching is a lot of
work.  And, in a sense, there's less focus in practice on XML and HTML
than there used to be, so less reason than before to invest in
switching.  I suspect any switch won't happen wholesale, but telling
people about the separate `sxml` package might result in some future
projects choosing to use SXML.  I don't know how much activity there
will be in future projects.

When I first started with Scheme, I was actually lucky in my timing, in
avoiding fragmenting the XML/HTML representations even worse. The first
Scheme code I wrote was an HTML parser, and, initially, I made my own
obvious s-expression encoding of HTML, which turned out to be very
similar to Racket's `xexpr`.  But I quickly saw Oleg's XML work, and so
reworked my code to emit SXML, so that the fancy XML tools could also be
used with real-world HTML.   The switch to SXML was trivial for me then,
but the switch would've been hard for Racket (aka PLT Scheme), by the
time SXML became a de facto standard for the Scheme community.

> Finally, I seem to remember that one of the tools mentioned somewhere for handling xml turned out not to be findable. I don't know any more if it was mentioned in your document or elsewhere, but it might be worth checking that the ones you mention are still available.

Are you thinking of Jim Bender's `sxml-match`?  I need to fix that dead
link (can't release a new version at the moment), probably to point to
the PLaneT package that the text mentions.
http://planet.racket-lang.org/display.ss?package=sxml-match.plt&owner=jim

Alex Harsanyi

Jun 27, 2020, 8:16:35 PM
to Racket Users
Looking at the source for `read-xml`, it seems to be using `list->string` in several places.  That is, it reads characters one-by-one and constructs a list by appending a character to the end of it, then calls `list->string` to produce the string.  I suspect read-xml could be made faster by using `string-append` in these cases.


Alex.

Hendrik Boom

Jun 28, 2020, 3:07:48 PM
to Racket Users
On Sat, Jun 27, 2020 at 05:16:34PM -0700, Alex Harsanyi wrote:
> Looking at the source for `read-xml`, it seems to be using `list->string`
> in several places. That is, it reads characters one-by-one and constructs
> a list by appending a character to the end of it, then calls `list->string`
> to produce the string. I suspect read-xml could be made faster by using
> `string-append` in these cases.

So you would be copying and reallocating strings instead of cons-cells?

The way to eliminate all that allocation is to implement a mutable
string buffer that is likely big enough and insert characters (likely
one at a time, if I read you correctly) without allocating new storage
each time (unless you've made the buffer too small, in which case, double
its size).

Then allocate the right amount of space for a string once and copy
the buffer into it when the string has been completely read in.
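
A minimal sketch of that buffer in Racket, for illustration only. The names `charbuf`, `make-charbuf`, `charbuf-add!`, `charbuf->string!`, and `read-until` are hypothetical and not part of `read-xml` or any library; the initial capacity is an arbitrary guess.

```racket
#lang racket/base

;; A growable character buffer: characters are written into a mutable
;; string, which is doubled in size whenever it fills up.
(struct charbuf (chars len) #:mutable)

;; The default capacity of 64 is an arbitrary guess; it must be positive.
(define (make-charbuf [capacity 64])
  (charbuf (make-string capacity) 0))

(define (charbuf-add! b c)
  (define old (charbuf-chars b))
  (define n (charbuf-len b))
  (when (= n (string-length old))
    ;; Buffer full: allocate one twice as large and copy the contents over.
    (define new (make-string (* 2 (string-length old))))
    (string-copy! new 0 old)
    (set-charbuf-chars! b new))
  (string-set! (charbuf-chars b) n c)
  (set-charbuf-len! b (add1 n)))

;; Allocate a string of exactly the right size, then reset the buffer
;; so it can be reused for the next token.
(define (charbuf->string! b)
  (define s (substring (charbuf-chars b) 0 (charbuf-len b)))
  (set-charbuf-len! b 0)
  s)

;; Usage: collect characters up to (and consuming) a delimiter or eof.
(define (read-until in delim)
  (define b (make-charbuf 4)) ; tiny on purpose, to exercise the doubling
  (let loop ()
    (define c (read-char in))
    (unless (or (eof-object? c) (char=? c delim))
      (charbuf-add! b c)
      (loop)))
  (charbuf->string! b))

(read-until (open-input-string "hello, world<tag>") #\<) ; => "hello, world"
```

Whether this beats consing up a list and calling `list->string` would have to be measured; the sketch only shows the shape of the technique.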

-- hendrik

Neil Van Dyke

Jun 28, 2020, 3:56:12 PM
to Racket Users
If anyone wants to optimize `read-xml` for particular classes of use,
without changing the interface, it might be very helpful to run your
representative tests using the statistical profiler.

The profiler text report takes a little while of tracing through
manually to get a feel for how to read and use it, but it can be
tremendously useful, and is worth learning to do if you need performance.

After a first pass with that, you might also want to look at how costly
allocations/GC are, and maybe do some controlled experiments around
that.  For example, force a few GC cycles, run your workload under
profiler, check GC time during, and forced time after.  If you're
dealing with very large graphs coming out of the parser, I don't know
whether those are enough to matter with the current GC mechanism, but
maybe also check GC time while you're holding onto large graphs, when
you release them, and after they've been collected.  At some point, GC
gets hard for at least me to reason about, but some things make sense,
and other things you decide when to stop digging. :)  If you record all
your measurements, you can compare empirically how different changes
to the code affect things, hopefully in representative situations.
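
As a rough sketch of such an experiment, assuming the `profile` library from the main Racket distribution: the synthetic document and the helper names `workload` and `measure` are made up for illustration, so substitute a representative file of your own for real measurements.

```racket
#lang racket/base
(require profile   ; statistical (sampling) profiler
         xml)      ; read-xml

;; Stand-in workload: a synthetic document built in memory.
(define sample-xml
  (string-append "<root>"
                 (apply string-append
                        (for/list ([i (in-range 20000)])
                          (format "<item id=\"~a\">text ~a</item>" i i)))
                 "</root>"))

(define (workload)
  (read-xml (open-input-string sample-xml)))

;; Force a major collection, run the thunk, and report the CPU and GC
;; milliseconds it consumed along with its result.
(define (measure thunk)
  (collect-garbage 'major)
  (define gc0 (current-gc-milliseconds))
  (define cpu0 (current-process-milliseconds))
  (define result (thunk))
  (values result
          (- (current-process-milliseconds) cpu0)
          (- (current-gc-milliseconds) gc0)))

(define-values (doc cpu-ms gc-ms) (measure workload))
(printf "cpu: ~a ms, of which gc: ~a ms\n" cpu-ms gc-ms)

;; Sample the same workload with the statistical profiler; the text
;; report is printed to the current output port.
(void (profile-thunk workload))
```

Recording the `cpu-ms` and `gc-ms` figures before and after each code change gives the empirical comparison described above.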

I went through a lot of these exercises to optimize a large system, and
sped up dynamic Web page loads dramatically in the usual case (to the
point we were then mainly limited by PostgreSQL query cost, not much by
the application code in Scheme, nor our request&response network I/O),
and also greatly reduced the pain of intermittent request latency spikes
due to GC.

One of the hotspots, I did half a dozen very different implementations,
including C extension, and found an old-school pure Scheme
implementation was fastest.  I compared the performance of the
implementation using something like `shootout`, but there might be
better ways now in Racket. https://www.neilvandyke.org/racket/shootout/ 
I also found we could be much faster if we made a change to what the
algorithm guarantees, since it was more of a consistency check that
turned out to be very expensive and very redundant, due to all the ways
that utility code ended up being used.

In addition to contrived experiments, I also rigged up a runtime option
so that the server would save data from the statistical profiler for
each request a Web server handled in production.  Which was tremendously
useful, since it gave us real-world examples that were also difficult to
synthesize (e.g., complex dynamic queries), and we could go from Web
logs and user feedback, to exactly what happened.

(In that system I optimized, we used Oleg's SXML tools very heavily
throughout the system, plus some bespoke SXML tools for HTML and XML. 
There was one case in which someone had accidentally used the `xml`
module, not knowing it was incompatible with the rest of the system,
which caused some strange failures (no static checking) before it was
discovered, and we changed that code to use SXML.)

Ryan Culpepper

Jun 28, 2020, 5:30:43 PM
to Neil Van Dyke, Racket Users
Thanks Alex for pointing out the use of list->string. I've created a PR (https://github.com/racket/racket/pull/3275) that changes that code to use string ports instead (similar to Hendrik's suggestion, but the string port handles resizing automatically). Could someone (John?) with some large XML files lying around try the changes and see if they help?

Ryan


--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/68624c9a-df35-14a3-a912-df806799a7e0%40neilvandyke.org.

Hendrik Boom

Jun 28, 2020, 6:13:37 PM
to ry...@racket-lang.org, Neil Van Dyke, Racket Users
On Sun, Jun 28, 2020 at 11:30:27PM +0200, Ryan Culpepper wrote:
> Thanks Alex for pointing out the use of list->string. I've created a PR (
> https://github.com/racket/racket/pull/3275) that changes that code to use
> string ports instead (similar to Hendrik's suggestion, but the string port
> handles resizing automatically). Could someone (John?) with some large XML
> files lying around try the changes and see if they help?

I'm currently using sxml to read the current openGL specification.
2564178 bytes in one file.
(a) Is that relevant to your request?
(b) What do I do to have a choice of which sxml to use?
(c) Do I need to figure out using xml as well?
(d) And what do you want the test to do?

-- hendrik

Alex Harsanyi

Jun 28, 2020, 6:44:13 PM
to Racket Users
I suggested using `string-append` because, in my own performance investigations reading 100 MB+ CSV files, constructing short tokens using string-append was faster than using a string port. Perhaps there is a fixed overhead with string ports which makes `string-append` faster for short strings, but I don't know at what string length string ports become faster.

I think using string ports will definitely be faster than using `list->string`, but for the difference between `string-append` and string ports, some performance measurement might be needed.
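
One way to take that measurement, as a hedged sketch: the three builders below are simplified stand-ins for the strategies discussed in this thread, not code from `read-xml`, and the helper names (`build/list`, `build/append`, `build/port`, `bench`) and token lengths are assumptions chosen for illustration.

```racket
#lang racket/base

;; Three ways to build a string one character at a time.
(define (build/list chars)
  ;; cons onto the front, reverse once at the end
  (let loop ([cs chars] [acc '()])
    (if (null? cs)
        (list->string (reverse acc))
        (loop (cdr cs) (cons (car cs) acc)))))

(define (build/append chars)
  ;; allocates a fresh string for every character
  (for/fold ([s ""]) ([c (in-list chars)])
    (string-append s (string c))))

(define (build/port chars)
  ;; the string port grows its own storage as needed
  (define out (open-output-string))
  (for ([c (in-list chars)])
    (write-char c out))
  (get-output-string out))

;; Time repeated construction of a token of the given length. The
;; repetition count shrinks as the token grows, so each run handles
;; roughly the same total number of characters.
(define (bench label build len)
  (define chars (for/list ([i (in-range len)]) #\x))
  (define reps (quotient 400000 len))
  (collect-garbage 'major)
  (printf "~a, length ~a: " label len)
  (time (for ([i (in-range reps)])
          (build chars))))

(for ([len (in-list '(4 32 256))])
  (bench "list->string " build/list len)
  (bench "string-append" build/append len)
  (bench "string port  " build/port len))
```

The crossover point, if any, will depend on the Racket version and on whether it is Racket CS or BC, so the numbers are only meaningful on the machine that produced them.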

Thanks for looking into this,
Alex.

Alex Harsanyi

Jun 28, 2020, 9:01:28 PM
to Racket Users
I tested your string port version and I also wrote a "string-append" version of the xml reader and they are both slower by about 10-15% on my machine, when compared to the current read-xml implementation which uses `list->string`.  It looks like `list->string` is not the bottleneck here.

There are some small improvements that can be made from micro-optimizations.  For example, I changed `name-char?` to not use `name-start?` but instead check for all chars, and I also changed `lex-name` to construct the list in reverse and use `(list->string (reverse chars))`, plus I reordered the cond conditions to check the common case first (that the next character is a name-char? and not a 'special one).  However, this resulted in only about a 5-10% speed improvement, nowhere near the 4x speedup from using `sxml` that John reported.
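
To illustrate the shape of that change, here is a simplified sketch. The `name-char?` and `lex-name` below are stand-ins written for this example and are not the actual definitions in the xml collection; in particular, real XML name rules allow many more characters.

```racket
#lang racket/base

;; Simplified stand-in predicate: one direct check over all accepted
;; characters, with no call to a separate name-start test.
(define (name-char? c)
  (or (char-alphabetic? c)
      (char-numeric? c)
      (memv c '(#\_ #\: #\- #\.))))

;; Read a name from `in`, stopping before the first non-name character.
(define (lex-name in)
  (let loop ([chars '()])
    (define c (peek-char in))
    (cond
      ;; common case first: another name character
      [(and (char? c) (name-char? c))
       (read-char in)
       (loop (cons c chars))]
      ;; end of name or end of input: reverse once, build the string
      [else (list->string (reverse chars))])))

(lex-name (open-input-string "xml:lang=\"en\"")) ; => "xml:lang"
```

Consing onto the front and reversing once keeps each step constant-time, whereas appending to the end of a list copies the list on every character.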

In the end, it may well be that speeding up `read-xml` can only be done by these types of micro-optimizations.  Another thing I looked into was the "pattern" used for reading: all the `read-xml` code uses the pattern of "peeking" the next character, deciding if it is good, then reading it.  This is much slower than just reading the characters directly.  These are the results from just reading in a 14 MB XML file:

    read-char only:  cpu time: 312 real time: 307 gc time: 0
    read-char-or-special only:  cpu time: 750 real time: 741 gc time: 0
    peek-char than read-char:  cpu time: 1234 real time: 1210 gc time: 0
    peek-char-or-special than read-char-or-special:  cpu time: 1688 real time: 1690 gc time: 0

Using this code:

    #lang racket/base
    ;; Times four ways of consuming a file one character at a time.
    ;; `time` prints the cpu, real, and gc milliseconds for each run.

    (define file-name "your-test-file-here.xml")

    ;; 1. Read each character directly.
    (printf "read-char only~%")
    (collect-garbage 'major)
    (time
     (call-with-input-file file-name
       (lambda (in)
         (let loop ([c (read-char in)])
           (if (eof-object? c)
               (void)
               (loop (read-char in)))))))

    ;; 2. Read directly, but allow for non-character "special" values.
    (printf "read-char-or-special only~%")
    (collect-garbage 'major)
    (time
     (call-with-input-file file-name
       (lambda (in)
         (let loop ([c (read-char-or-special in)])
           (if (eof-object? c)
               (void)
               (loop (read-char-or-special in)))))))

    ;; 3. Peek at each character first, then consume it.
    (printf "peek-char than read-char~%")
    (collect-garbage 'major)
    (time
     (call-with-input-file file-name
       (lambda (in)
         (let loop ([c (peek-char in)])
           (if (eof-object? c)
               (void)
               (begin
                 (void (read-char in))
                 (loop (peek-char in))))))))

    ;; 4. Peek, then consume, using the "-or-special" variants.
    (printf "peek-char-or-special than read-char-or-special~%")
    (collect-garbage 'major)
    (time
     (call-with-input-file file-name
       (lambda (in)
         (let loop ([c (peek-char-or-special in)])
           (if (eof-object? c)
               (void)
               (begin
                 (void (read-char-or-special in))
                 (loop (peek-char-or-special in))))))))

Alex.


Bonface M. K.

Jun 29, 2020, 6:21:12 AM
to Neil Van Dyke, Racket Users
Thanks for this! Tbh, I never knew of this.

--
Bonface M. K. (https://www.bonfacemunyoki.com)
One Divine Emacs To Rule Them All
GPG key = D4F09EB110177E03C28E2FE1F5BBAE1E0392253F

Hendrik Boom

Jun 29, 2020, 8:09:52 AM
to Racket Users
On Sun, Jun 28, 2020 at 06:01:27PM -0700, Alex Harsanyi wrote:
> I tested the your string port version and I also wrote a "string-append"
> version of the xml reader and they are both slower by about 10-15% on my
> machine, when compared to the current read-xml implementation which uses
> `list->string`. It looks like `list->string` is not the bottleneck here.

Odd -- to remove all that storage-allocation overhead and to find it
gets slower...

Perhaps the overhead lies in the Scheme interpreter? Does it allocate
lots of storage?
Would using chez racket help any?

-- hendrik

Alex Harsanyi

Jun 29, 2020, 7:03:21 PM
to Racket Users
I installed the sxml package out of curiosity, and while it is faster, it is not the 4 times faster that your tests indicate. I used the following test program with a 14 MB XML file (a bike ride in TCX format):

    #lang racket/base
    (require xml    ; read-xml
             sxml)  ; ssax:xml->sxml, from the separately installed sxml package

    (define file-name "../MyPackages/more-df-tests/tcx-data/2015-09-27-0755_Road_Cycling_WF.tcx")
    ;; Make sure the file is in the cache
    (call-with-input-file file-name
      (lambda (in) (let loop ([c (read-char in)]) (unless (eof-object? c) (loop (read-char in))))))
    ;; Time the SSAX parser, then read-xml, each after a major collection
    (collect-garbage 'major)
    (time (void (call-with-input-file file-name (lambda (in) (ssax:xml->sxml in null)))))
    (collect-garbage 'major)
    (time (void (call-with-input-file file-name read-xml)))

On my laptop the times are:

     ssax:xml->sxml : cpu time: 4031 real time: 4128 gc time: 157
     read-xml: cpu time: 9578 real time: 10031 gc time: 3270

The big difference I found so far is that `read-xml` will store the location (line number, column and file offset) for each element, and enables `port-count-lines!` by default.  If I use:

    (parameterize ([xml-count-bytes #t])
      (time (void (call-with-input-file file-name read-xml))))

The results are much closer together, although `read-xml` is still slower and spends more time in the garbage collector:

     ssax:xml->sxml :  cpu time: 4187 real time: 4233 gc time: 202
     read-xml: cpu time: 5797 real time: 5824 gc time: 1251

Perhaps a note could be added to the documentation indicating that users can speed up `read-xml` significantly if they set `xml-count-bytes` to #t.

Alex.


Neil Van Dyke

Jun 29, 2020, 7:48:14 PM
to Racket Users
Is even 2x speedup helpful for your purpose?  3 seconds is one old magic
number for user patience in HCI, so I suppose there's still a big
difference between 4 seconds and almost 10 seconds?

For large (and absolutely massive) XML... SSAX can shine even better
than in this comparison, since you can, say, populate a database *while
you're parsing, without first constructing the intermediate
representation* of xexpr or SXML.  GC-wise, with the database-populating
scenario, you'll probably end up with small, little-referencing, local,
short-lived allocations.  Besides GC costs, you'll also use less RAM
(possibly lower AWS bill), and be less likely to push into swap (which
would be bad for performance).

In addition to SSAX's current performance characteristics and
opportunities... There might also be opportunity to optimize SSAX
significantly for Racket.  Oleg is a famously capable Scheme programmer,
but he was writing SSAX in fairly portable Scheme code, a couple decades
ago, when he wrote SSAX.  I did an initial packaging of SSAX for PLT
Scheme, Kirill Lisovsky later did many packagings of various SXML-ish
tools (including his own), and then John Clements did more work to
package Oleg's SXML-ish tools for Racket... But I don't know that anyone
has had motivation to try to optimize Racket's SSAX port, using current
Racket features, and tuning for current performance characteristics.

Side note regarding performance comparison... FWIW, SSAX might be doing
some things `read-xml` doesn't, such as namespace resolution, entity
reference resolution, and some validation.

Alex Harsanyi

Jun 29, 2020, 9:24:37 PM
to Racket Users
On Tuesday, June 30, 2020 at 7:48:14 AM UTC+8 Neil Van Dyke wrote:
> Is even 2x speedup helpful for your purpose?

Yes it is, and for my purpose `read-xml` is fine even without any speed improvement.  In the sports field, XML (via the TCX format) is a legacy technology.  Typical TCX files are about 1 MB in size; the 14 MB one is very large.  Setting `xml-count-bytes` to #t while calling `read-xml` gets me a speed improvement at a low effort, but it is not worth adding another package dependency just to support a legacy technology.

> 3 seconds is one old magic
> number for user patience in HCI, so I suppose there's still a big
> difference between 4 seconds and almost 10 seconds?

I am not sure where you got the 3 seconds from, but even 3 seconds is too long to wait on a button callback.  For large files, both read-xml and sxml would need to have a progress dialog with a cancel button, or some other form of user feedback, if one wants to make a "well behaved" GUI.
 
> For large (and absolutely massive) XML... SSAX can shine even better
> than in this comparison, since you can, say, populate a database *while
> you're parsing, without first constructing the intermediate
> representation* of xexpr or SXML.  GC-wise, with the database-populating
> scenario, you'll probably end up with small, little-referencing, local,
> short-lived allocations.  Besides GC costs, you'll also use less RAM
> (possibly lower AWS bill), and be less likely to push into swap (which
> would be bad for performance).

... if you are willing to deal with the complexity of a SAX interface, that is.  I have written code for parsing documents (correctly!) using a SAX interface, and the resulting code was so complex that I had to use a code generator for it, but yes, the resulting code was very fast.   Would I do it again? No.

The complexity of SAX parsing is probably why most people use a DOM style interface...
 
> In addition to SSAX's current performance characteristics and
> opportunities... There might also be opportunity to optimize SSAX
> significantly for Racket. Oleg is a famously capable Scheme programmer,
> but he was writing SSAX in fairly portable Scheme code, a couple decades
> ago, when he wrote SSAX.  I did an initial packaging of SSAX for PLT
> Scheme, Kirill Lisovsky later did many packagings of various SXML-ish
> tools (including his own), and then John Clements did more work to
> package Oleg's SXML-ish tools for Racket... But I don't know that anyone
> has had motivation to try to optimize Racket's SSAX port, using current
> Racket features, and tuning for current performance characteristics.
>
> Side note regarding performance comparison... FWIW, SSAX might be doing
> some things `read-xml` doesn't, such as namespace resolution, entity
> reference resolution, and some validation.

You used the phrase "might be doing...", does that mean that it might not do those things?

Alex.

 

WarGrey Gyoudmon Ju

Jun 29, 2020, 10:32:26 PM
to Alex Harsanyi, Racket Users
Hello, In my experience, (list->string (reverse chars)) is still the most efficient way to do parsing.
The attached file contains several functions to split the file itself by whitespaces, and the result is stable:

times: 4200 size: 4655: total:19551000
#<procedure:port->tokens:bprt>  : cpu time: 1145 real time: 1147 gc time: 9
#<procedure:port->tokens:peek>  : cpu time: 2135 real time: 2138 gc time: 7
#<procedure:port->tokens:char>  : cpu time: 811 real time: 814 gc time: 10
#<procedure:port->tokens:appd>  : cpu time: 1542 real time: 1547 gc time: 14
#<procedure:port->tokens:bf16>  : cpu time: 1285 real time: 1288 gc time: 7
#<procedure:port->tokens:bf32>  : cpu time: 1403 real time: 1405 gc time: 8

This screenshot also suggests turning off port line counting, since it significantly slows down the process.
Even with a progress bar, reading a 70MB csv file takes less than 3s (on a top-spec 2018 MBP; years ago it was 3.5s on a top-spec 2013 MBP).

Screen Shot 2020-06-30 at 10.08.07.png

Here are some tips for high performance parsing:
1. avoid (peek-char); instead, always return two values, where one of them is the char that should be pushed back into the port.
     I am not sure why Racket does not provide an `ungetc`-like API.

2. report the progress with Racket logging facilities as I found that it is the most efficient many-to-many message dispatch mechanism
    since Racket Virtual Machine itself makes heavy use of it (e.g. for reporting GC info).
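
A small sketch of the first tip, splitting input by whitespace without any peeking. The names `read-token` and `port->tokens` are hypothetical and written for this example; they are not the functions in the attached file.

```racket
#lang racket/base

;; Read one whitespace-delimited token without peeking. `pending` is the
;; character left over from the previous call (#f on the first call).
;; Returns two values: the token (eof when the input is exhausted) and
;; the character that ended it, to be handed to the next call.
(define (read-token in [pending #f])
  ;; skip leading whitespace, starting from the pending character if any
  (let skip ([c (or pending (read-char in))])
    (cond
      [(eof-object? c) (values eof eof)]
      [(char-whitespace? c) (skip (read-char in))]
      [else
       (let loop ([next (read-char in)] [acc (list c)])
         (if (or (eof-object? next) (char-whitespace? next))
             (values (list->string (reverse acc)) next)
             (loop (read-char in) (cons next acc))))])))

;; Thread the leftover character from one call into the next.
(define (port->tokens in)
  (let loop ([pending #f] [acc '()])
    (define-values (tok next) (read-token in pending))
    (if (eof-object? tok)
        (reverse acc)
        (loop next (cons tok acc)))))

(port->tokens (open-input-string "  split  me by\nwhitespace "))
;; => '("split" "me" "by" "whitespace")
```

With whitespace as the only delimiter the leftover character carries little information, but the same two-value protocol works when the delimiter is itself significant, such as `<` in XML.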
string-port.rkt

John Clements

Jul 1, 2020, 10:24:23 AM
to Ryan Culpepper, Neil Van Dyke, Racket Users
Ryan, I just tested your pull request, and… it doesn’t make much difference in my example.

One important thing that I realize that I *totally neglected* to mention is that I’m running CS racket here, not BC. Based on my experiments, it appears that

1) CS is much faster than BC for both xml (read-xml) and sxml (ssax:xml->sxml), and
2) CS speeds up sxml more dramatically.

Here are the results of running my tests with your PR:

pajaro2:/tmp clements> racketcs zz.rkt
cpu time: 12858 real time: 15642 gc time: 4242
ssax:warn: warning at position 150: DOCTYPE DECL plist http://www.apple.com/DTDs/PropertyList-1.0.dtd found and skipped
cpu time: 2157 real time: 2342 gc time: 332
pajaro2:/tmp clements> racketcs zz.rkt
cpu time: 10518 real time: 11248 gc time: 3544
ssax:warn: warning at position 150: DOCTYPE DECL plist http://www.apple.com/DTDs/PropertyList-1.0.dtd found and skipped
cpu time: 2183 real time: 2327 gc time: 305
pajaro2:/tmp clements> racketcs zz.rkt
cpu time: 10162 real time: 10706 gc time: 3363
ssax:warn: warning at position 150: DOCTYPE DECL plist http://www.apple.com/DTDs/PropertyList-1.0.dtd found and skipped
cpu time: 2188 real time: 2325 gc time: 328

(So actually, the first of these was pretty bad… I suspect that’s a rare occurrence.)

This broadly matches my first set of timings, which suggests that in racket CS, parsing an 18 Megabyte XML file generated by Apple Music “Export Library…” is about four times faster in sxml than in xml.

In BC, by the way, parsing using xml takes about 14 seconds, and parsing using sxml takes about seven.

So really, I think maybe the on-the-side takeaway from this is this: CS is much faster than BC in this case.

John


