Stu
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
Can you check and confirm? When I slurp System/in it is more than twice as fast as slurping *in*.
I believe the next-biggest perf issue is how StringBuilders grow. I suspect that the 4096 buffer is making them grow more efficiently.
Stu
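Stu's suspicion about StringBuilder growth can be made concrete in Java, where the behavior lives. This is an illustrative sketch (class name and counting loop are mine, not from the thread); it relies on the default growth policy, which roughly doubles the backing array each time it fills:

```java
// Hypothetical illustration: StringBuilder roughly doubles its backing array
// when it runs out of room, so appending in larger chunks triggers fewer
// (and more predictable) re-allocations than appending char by char.
public class SbGrowth {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(); // default capacity 16
        int resizes = 0;
        int cap = sb.capacity();
        for (int i = 0; i < 100_000; i++) {
            sb.append('x');
            // capacity() changes exactly when the backing array is reallocated
            if (sb.capacity() != cap) { resizes++; cap = sb.capacity(); }
        }
        // growth is geometric, so only on the order of a dozen resizes
        // for 100k chars; the cost is the copy done at each resize
        System.out.println("resizes: " + resizes);
    }
}
```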
Looking at core's slurp, the problem is that it reads one character at
a time from the reader. The underlying reader being buffered or not,
reading one character at a time is not good for performance. The
attached patch brings it back down to the speed of slurp2. (How do I
actually create a ticket on Assembla? I couldn't find a way to do
that, just browse individual tickets. I can't change tickets either;
perhaps editing is not publicly allowed?)
Anyway, some performance numbers for FreeBSD 8/x86-64:
openjdk6: 15 seconds slurp, 3.0 seconds slurp2
openjdk7 (fastdebug): 14.5 seconds slurp, 2.0 seconds slurp2
And slurp2 as a function of buffer size (single run each):
1: 17.8 seconds
128: 2.92 seconds
1024: 2.88 seconds
4096: 3.12 seconds
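For reference, the chunk-wise reading strategy described above can be sketched in Java roughly as follows (illustrative code, not the actual patch — names and buffer handling are assumptions):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Sketch of chunk-wise slurping: read into a fixed buffer and append whole
// chunks to a StringBuilder instead of one character at a time, so the
// per-call overhead of read() is paid once per chunk rather than per char.
public class ChunkedSlurp {
    static String slurp(Reader r, int bufSize) throws IOException {
        char[] buf = new char[bufSize];
        StringBuilder sb = new StringBuilder();
        int n;
        // read(char[]) fills as much of the buffer as is available,
        // returning -1 at end of stream
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String s = "hello, world".repeat(1000);
        String out = slurp(new StringReader(s), 1024);
        if (!out.equals(s)) throw new AssertionError("content mismatch");
        System.out.println("ok: " + out.length() + " chars");
    }
}
```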
--
/ Peter Schuller
> Has anyone else had a chance to try this? I'm surprised to see manual
> buffering behaving so much better than the BufferedReader
> implementation but it does seem to make quite a difference.
Not really that surprising, looking at the implementation. Each call to
read(), in addition to the obvious buffer arithmetic etc., results in:
* A lock being acquired.
* ensureOpen() being called, which checks whether the stream was already closed.
One might argue about whether streams should be thread-safe or not,
but in any case I think it is good default behavior to always strive
to do I/O chunk-wise, regardless of whether the expectation is that
the underlying stream is fast (for some definition of fast).
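One way to make that per-call cost concrete (a hypothetical helper, not from the thread): wrap a Reader and count how many read() invocations each style makes, since on a BufferedReader every invocation pays the lock acquisition and the ensureOpen() check.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative: count read() calls made by char-at-a-time vs chunked reading.
// The call count is a rough proxy for the fixed per-call overhead (lock
// acquisition, ensureOpen()) described above.
public class ReadCallCounter extends FilterReader {
    int calls = 0;

    ReadCallCounter(Reader in) { super(in); }

    @Override public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override public int read(char[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }

    public static void main(String[] args) throws IOException {
        String data = "x".repeat(10_000);

        ReadCallCounter oneAtATime = new ReadCallCounter(new StringReader(data));
        while (oneAtATime.read() != -1) { /* consume */ }

        ReadCallCounter chunked = new ReadCallCounter(new StringReader(data));
        char[] buf = new char[1024];
        while (chunked.read(buf, 0, buf.length) != -1) { /* consume */ }

        // 10001 calls vs 11: the fixed per-call cost is paid ~1000x less often
        System.out.println(oneAtATime.calls + " vs " + chunked.calls);
    }
}
```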
--
/ Peter Schuller
You are on the contributors list, so I just need to know your account name on Assembla to activate your ability to add tickets, patches, etc. Let me know your account name (which needs to be some permutation of your real name, not a nick).
Thanks,
Stu
<slurp-buffer-size.patch>
When I read up before submitting the contributor agreement I did see
the preference for real name derived account names, but I went with my
usual 'scode' anyway since I normally use the same name *everywhere*
and I didn't get the impression it was a strict requirement.
I can register another account (no problem), but what are the
implications of the fact that I wrote 'scode' on the contributor
agreement I mailed Rich?
--
/ Peter Schuller
I just registered "peterschuller".
--
/ Peter Schuller
Stu
Interesting. Why do you consider it recommended to read one character
at a time in a case like this? Maybe there is such a recommendation
that I don't know about, but in general I would consider it contrary
to expected practice when doing I/O if performance is a concern.
Even discounting the ensureOpen() call and lock acquisition that seem
to happen per read, as per my previous post, even the most efficient
implementation I can think of (do some index checking and bump a
positional pointer) would still generally be expected to be slower
when invoked one character/byte at a time than when larger chunks are
copied with e.g. System.arraycopy() (though as with all things, within
reason; cranking the buffer size up too much will of course have other
effects such as GC overhead, poorer cache locality, etc.).
(Note that I'm not arguing about whether or not it should be
committed before 1.2; I'm genuinely interested in why moving away from
reading one character (or byte) at a time would be a controversial change.)
--
/ Peter Schuller
On Sat 07/08/10 14:02, "Stuart Halloway" stuart....@gmail.com sent:
> No. We want to collect more information and do more comparisons before
> moving away from the recommended Java buffering.
> Stu
This isn't an issue with the buffering; it is an issue with the massive
overhead of doing character-at-a-time I/O, which is something you really
should never do. I'd say something somewhere doing character-at-a-time
I/O is probably the number one cause of crippling performance problems
I've seen in Java.
--
Dave
> This isn't an issue with the buffering; it is an issue with the massive
> overhead of doing character-at-a-time I/O, which is something you really
> should never do. I'd say something somewhere doing character-at-a-time
> I/O is probably the number one cause of crippling performance problems
> I've seen in Java.
Just to add, the implementation really ought to use the stream copy stuff from clojure.java.io, which copies streams using a 1k buffer by default.
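Assuming the copy loop works as described here (a fixed buffer, 1024 chars by default, with each filled portion passed whole to the destination), it would look roughly like this in Java; the class and method names are illustrative, not the actual clojure.java.io internals:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Sketch of a buffered Reader-to-Writer copy loop: read up to bufSize chars
// at a time and write out exactly the portion that was filled.
public class StreamCopy {
    static void copy(Reader in, Writer out, int bufSize) throws IOException {
        char[] buf = new char[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // only the first n chars are valid
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter w = new StringWriter();
        copy(new StringReader("copy me"), w, 1024);
        System.out.println(w); // prints "copy me"
    }
}
```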
--
Dave
In the past I've steered clear of slurp because it didn't handle character
encodings, so I had to write my own version (which did do block-at-a-time
copying). Now that the API has been fixed, I'd hope to be able to use the
version in core, but only if the implementation is usable.
--
Dave