Win7-64, 3.3.0b2 versus 3.2.3
print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
# .6 in 3.2, 1.2 in 3.3
Why is searching for a two-byte char in a two-bytes per char string so
much faster in 3.2? Is this worth a tracker issue (I searched and could
not find one) or is there a known and un-fixable cause?
print(timeit("a.encode()", "a = 'a'*1000"))
# 1.5 in 3.2, .26 in 3.3
print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
# 1.7 in 3.2, .51 in 3.3
This is one of the 3.3 improvements. But since the results are equal:
('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
and 3.3 should know that for an all-ascii string, I do not see why
adding the parameter should double the time. Another issue or known
and un-fixable?
--
Terry Jan Reedy
_______________________________________________
Python-Dev mailing list
I get opposite numbers:
$ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
1000000 loops, best of 3: 0.599 usec per loop
$ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
10000000 loops, best of 3: 0.119 usec per loop
However, in both cases the operation is blindingly fast (less than
1µs), which should make it pretty much a non-issue.
> Why is searching for a two-byte char in a two-bytes per char string so
> much faster in 3.2? Is this worth a tracker issue (I searched and could
> not find one) or is there a known and un-fixable cause?
I don't think it's worth a tracker issue. First, because as said above
it's practically a non-issue. Second, given the nature and depth of
changes brought by the switch to the PEP 393 implementation, an
individual micro-benchmark like this is not very useful; you'd need to
make a more extensive analysis of string performance (as a hint, we
have the stringbench benchmark in the Tools directory).
> This is one of the 3.3 improvements. But since the results are equal:
> ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
> and 3.3 should know that for an all-ascii string, I do not see why
> adding the parameter should double the time. Another issue or known
> and un-fixable?
When observing performance differences, you should ask yourself whether
they matter at all or not.
Regards
Antoine.
--
Software development and contracting: http://pro.pitrou.net
Just curious, what system?
>
> $ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
> 1000000 loops, best of 3: 0.599 usec per loop
> $ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
> 10000000 loops, best of 3: 0.119 usec per loop
>
> However, in both cases the operation is blindingly fast (less than
> 1µs), which should make it pretty much a non-issue.
The current default 'number' of 1000000 is higher than I remember. Good
to know.
>> Why is searching for a two-byte char in a two-bytes per char string so
>> much faster in 3.2? Is this worth a tracker issue (I searched and could
>> not find one) or is there a known and un-fixable cause?
>
> I don't think it's worth a tracker issue. First, because as said above
> it's practically a non-issue. Second, given the nature and depth of
> changes brought by the switch to the PEP 393 implementation, an
> individual micro-benchmark like this is not very useful; you'd need to
> make a more extensive analysis of string performance (as a hint, we
> have the stringbench benchmark in the Tools directory).
It is not in my 3.3.0b2 windows install, but I have heard of it. Another
good reminder. My main interest was in refuting '3.3 strings ops are
always slower'. Both points above are also good 'ammo'. I am sure this
discussion will re-occur after the release.
--
Terry Jan Reedy
On Sat, 18 Aug 2012 17:17:14 -0400
Terry Reedy <tjr...@udel.edu> wrote:
> The issue came up in python-list about string operations being slower in
> 3.3. (The categorical claim is false as some things are actually
> faster.) Some things I understand, this one I do not.
Yes, some operations are slower, but others are faster :-) There was
a significant effort to limit the overhead of PEP 393 (when the
branch was merged, most operations were slower). I tried to fix all
performance regressions. If you find cases where Python 3.3 is slower,
I can investigate and try to optimize it (in Python 3.4) or at least
explain why it is slower :-)
As Antoine said, use the stringbench tool if you would like to get
a first overview of string performance.
> Some things I understand, this one I do not.
>
> Win7-64, 3.3.0b2 versus 3.2.3
> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
> # .6 in 3.2, 1.2 in 3.3
On Linux with narrow build (UTF-16), I get:
$ python3.2 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
100000 loops, best of 3: 4.25 usec per loop
$ python3.3 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
100000 loops, best of 3: 3.21 usec per loop
Linux-2.6.30.10-105.2.23.fc11.i586-i686-with-fedora-11-Leonidas
Python 3.2.2+ (3.2:1453d2fe05bf, Aug 21 2012, 14:21:05)
Python 3.3.0b2+ (default:b36ce0a3a844, Aug 21 2012, 14:05:23)
I'm not sure that I read your benchmark correctly: you write c='...'
and then ord(c)=8230. The algorithms used to find a substring differ
depending on whether the substring is a single character or longer. For
1 character, Antoine Pitrou modified the code to use memchr() and
memrchr(), even if the string is not UCS1 (in this benchmark, the
string uses UCS2 storage): it may find false positives.
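To illustrate what a "false positive" means here, the following is a minimal Python sketch of the idea, not CPython's actual C code. It assumes the search scans raw bytes for the low byte of the character and then checks each candidate; the helper name find_ucs2_char is hypothetical, and UTF-16-LE bytes stand in for UCS2 storage (so it only applies to characters in the Basic Multilingual Plane).

```python
def find_ucs2_char(text: str, ch: str) -> int:
    """Return the index of ch in text, or -1, using a byte-level scan.

    Hypothetical illustration only: bytes.find() plays the role of
    memchr(), and UTF-16-LE bytes stand in for UCS2 storage.
    Assumes every character is in the BMP (2 bytes per character).
    """
    data = text.encode("utf-16-le")
    needle = ch.encode("utf-16-le")      # U+2026 becomes b'\x26\x20'
    low = bytes([needle[0]])             # the single byte we scan for
    pos = data.find(low)
    while pos != -1:
        # A byte hit is only a real match if it starts on a character
        # boundary and the second byte matches as well. Anything else
        # is a false positive and the scan has to continue.
        if pos % 2 == 0 and data[pos:pos + 2] == needle:
            return pos // 2
        pos = data.find(low, pos + 1)
    return -1


# '&' is U+0026, so its low byte collides with the low byte of U+2026.
print(find_ucs2_char("&" * 5 + chr(8230), chr(8230)))  # prints 5
```

With a haystack of plain 'a' characters, as in the benchmark, the byte 0x26 never appears before the real match, so the byte scan does no wasted work; a haystack full of '&' would force a rejected candidate at every character.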
> Why is searching for a two-byte char in a two-bytes per char string so much
> faster in 3.2?
Can you reproduce your benchmark on other Windows platforms? Do you
run the benchmark more than once? I always run a benchmark 3 times.
I don't like the timeit module for micro benchmarks, it is really
unstable (default settings are not written for micro benchmarks).
Example of 4 runs on the same platform:
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.79 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.61 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 3.16 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.76 usec per loop
I wrote my own benchmark tool, based on timeit, to have more stable
results on micro benchmarks:
https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py
Example of 4 runs:
3.18 us: c=chr(8230); a='a'*1000+c; c in a
3.18 us: c=chr(8230); a='a'*1000+c; c in a
3.21 us: c=chr(8230); a='a'*1000+c; c in a
3.18 us: c=chr(8230); a='a'*1000+c; c in a
My benchmark.py script automatically calibrates the number of loops so
that one run takes at least 100 ms, and then repeats the test for at
least 1.0 second.
Using a time budget instead of a fixed number of loops is more reliable
because the test is less dependent on system activity.
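The calibrate-then-repeat approach described above can be sketched with the standard timeit.Timer class. This is not the code of benchmark.py; the function name bench, its parameters, and the choice to report the best run are assumptions made for illustration.

```python
import timeit


def bench(stmt, setup="pass", min_loop_time=0.1, min_total_time=1.0):
    """Return (best seconds per loop, loops per run) for stmt.

    Hypothetical sketch: grow the loop count by powers of 10 until one
    run takes at least min_loop_time, then keep repeating runs until
    min_total_time has been spent, keeping the fastest run.
    """
    timer = timeit.Timer(stmt, setup)

    # Calibration phase: find a loop count that is long enough to time.
    number = 1
    while True:
        elapsed = timer.timeit(number)
        if elapsed >= min_loop_time:
            break
        number *= 10

    # Measurement phase: repeat for a time budget, not a fixed count.
    best = elapsed / number
    total = elapsed
    while total < min_total_time:
        elapsed = timer.timeit(number)
        total += elapsed
        best = min(best, elapsed / number)
    return best, number


if __name__ == "__main__":
    best, number = bench("c in a", "c = chr(8230); a = 'a'*1000 + c")
    print("%.2f us per loop (%d loops per run)" % (best * 1e6, number))
```

Reporting the minimum is one common choice for micro benchmarks, since noise from other processes can only make a run slower; a mean or median would be an equally easy change.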
> print(timeit("a.encode()", "a = 'a'*1000"))
> # 1.5 in 3.2, .26 in 3.3
>
> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
> # 1.7 in 3.2, .51 in 3.3
This test doesn't compare the performance of the UTF-8 encoder: "encoding"
an ASCII string to UTF-8 in Python 3.3 is practically a no-op, it just
copies the memory (ASCII is compatible with UTF-8)...
So your benchmark just measures the performance of
PyArg_ParseTupleAndKeywords()... Try also str.encode('utf-8').
If you want to benchmark the UTF-8 encoder, use at least one non-ASCII
character like "\x80".
At least your benchmark shows that Python 3.3 is *much* faster than
Python 3.2 at "encoding" pure ASCII strings to UTF-8 :-)
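A small script along these lines would separate the two effects: the cost of passing the argument (none, positional, keyword) and the cost of the encoder itself (pure ASCII versus a string containing "\x80"). The names SETUPS, STMTS and compare are made up for this sketch, and no particular timings are implied.

```python
import timeit

# Two inputs: pure ASCII (the copy-only path described above) and one
# with a non-ASCII character so that the UTF-8 encoder has real work.
SETUPS = {
    "ascii": "a = 'a'*1000",
    "non-ascii": "a = 'a'*1000 + '\\x80'",
}

# Three call styles: default argument, positional, keyword.
STMTS = [
    "a.encode()",
    "a.encode('utf-8')",
    "a.encode(encoding='utf-8')",
]


def compare(number=100000, repeat=3):
    """Return {(input label, statement): best total seconds}."""
    results = {}
    for label, setup in SETUPS.items():
        for stmt in STMTS:
            times = timeit.repeat(stmt, setup, repeat=repeat, number=number)
            results[(label, stmt)] = min(times)
    return results


if __name__ == "__main__":
    for (label, stmt), seconds in compare().items():
        print("%-10s %-28s %.4f s" % (label, stmt, seconds))
```

If the gap between the call styles is roughly the same for both inputs, it comes from argument handling rather than from the encoder.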
Victor
> I'm not sure that I read your benchmark correctly: you write c='...'
Apparently you didn't - or your MUA was not able to display it
correctly. He didn't say
'...' # U+002E U+002E U+002E, 3x FULL STOP
but
'…' # U+2026, HORIZONTAL ELLIPSIS
Regards,
Martin
And when invoked from the command line, it is already time-based: unless -n is specified, python guesstimates the number of iterations as a power of 10 that results in at least 0.2 s per test (the repeat count still defaults to 3, though).
As a side note, every time I use timeit programmatically, it annoys me that this behavior is not available and has to be implemented manually.
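For readers on newer interpreters: Python 3.6 added Timer.autorange(), which exposes this command-line behaviour programmatically. It did not exist in the 3.2 and 3.3 releases discussed in this thread, so the snippet below is only a pointer for current versions.

```python
import timeit

timer = timeit.Timer("c in a", setup="c = chr(8230); a = 'a'*1000 + c")

# autorange() keeps increasing the loop count until a single run takes
# at least 0.2 seconds, then returns that count and the time it took.
number, elapsed = timer.autorange()
print("%d loops, %.3f usec per loop" % (number, elapsed / number * 1e6))
```

It performs a single calibrated run; repeating the measurement (for example with timer.repeat(number=number)) is still left to the caller.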
> If it is as unstable as you suggest, and if you have an alternative
> which is more stable and accurate, I would love to see it in the
> standard library.
+100, sounds like someone should contribute a patch for this.
Stefan
Yes, that is what I wrote, showed, and posted to python-list :-)
I was and am posting here in response to a certain French writer who
dislikes the fact that 3.3 unicode favors text written with the first
256 code points, which do not include all the characters needed for
French, and do not include the euro symbol invented years after that set
was established. His opinion aside, his search for 'evidence' did turn
up a version of the example below.
> an important effort to limit the overhead of the PEP 393 (when the
> branch was merged, most operations were slower). I tried to fix all
> performance regressions.
Yes, I read and appreciated the speed-up patches by you and others.
> If you find cases where Python 3.3 is slower,
> I can investigate and try to optimize it (in Python 3.4) or at least
> explain why it is slower :-)
Replacement appears to be as much as 6.5 times slower on some Win 7
machines. (I factored out the setup part, which increased the ratio
since it takes the same time on both machines.)
ttr = timeit.repeat
# 3.2.3
>>> ttr("euroreplace('€', 'œ')", "euroreplace = ('€'*100).replace")
[0.385043233078477, 0.35294282203631155, 0.3468394370770511]
# 3.3.0b2
>>> ttr("euroreplace('€', 'œ')", "euroreplace = ('€'*100).replace")
[2.2624885911213823, 2.245330314124203, 2.2531118686461014]
How does this compare on *nix?
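For anyone who wants to run the same comparison on another system, here is a self-contained version of the test above. The function name time_replace is invented for this sketch; the statement and setup strings are the ones from the session above. The demo at the bottom uses fewer loops than the timeit default and prints per-call times so that runs with different loop counts stay comparable.

```python
import sys
import timeit


def time_replace(number=1000000, repeat=3):
    """Time ('€'*100).replace('€', 'œ') with the bound method looked up
    once in the setup, so only the replace call itself is measured.
    Returns the list of total times in seconds, one per repeat.
    """
    setup = "euroreplace = ('€'*100).replace"
    stmt = "euroreplace('€', 'œ')"
    return timeit.repeat(stmt, setup, repeat=repeat, number=number)


if __name__ == "__main__":
    loops = 100000
    best = min(time_replace(number=loops))
    print("Python %s: %.3f usec per call"
          % (sys.version.split()[0], best / loops * 1e6))
```

Running the script under each interpreter (for example python3.2 and python3.3) and comparing the printed per-call times gives the ratio in question.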
> As said by Antoine, use the stringbench tool if you would like to get
> a first overview of string performances.
I found it, ran it on 3.2 and 3.3, and posted to python-list that 3.3
unicode looks quite good. It is overall comparable to both byte
operations and 3.2 unicode operations. Replace operations were
relatively the slowest, though I do not remember any as bad as the
example above.
>> Some things I understand, this one I do not.
>>
>> Win7-64, 3.3.0b2 versus 3.2.3
>> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
>> # .6 in 3.2, 1.2 in 3.3
>
> On Linux with narrow build (UTF-16), I get:
>
> $ python3.2 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
> 100000 loops, best of 3: 4.25 usec per loop
> $ python3.3 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
> 100000 loops, best of 3: 3.21 usec per loop
The slowdown seems to be specific to (some?) Windows systems. Perhaps we
are hitting a difference between the VC2008 and VC2010 compilers or runtimes.
Someone on python-list wondered whether the 3.3.0 betas have the same
compile optimization settings as 3.2.3 final. Martin?
> Can you reproduce your benchmark on other Windows platforms? Do you
> run the benchmark more than once? I always run a benchmark 3 times.
Always, and now I see that repeat does this for me.
> I don't like the timeit module for micro benchmarks, it is really
> unstable (default settings are not written for micro benchmarks).
I am reporting rounded lowest times. As others said, make timeit better
if you can.
>> print(timeit("a.encode()", "a = 'a'*1000"))
>> # 1.5 in 3.2, .26 in 3.3
>>
>> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
>> # 1.7 in 3.2, .51 in 3.3
>
> This test doesn't compare performances of the UTF-8 encoder: "encode"
> an ASCII string to UTF-8 in Python 3.3 is a no-op, it just duplicates
> the memory (ASCII is compatible with UTF-8)...
That is what I thought, and why I was puzzled, ...
> So your benchmark just measures the performances of
> PyArg_ParseTupleAndKeywords()...,
having forgotten about arg processing. I should have factored out the
.encode lookup (as I did with .replace). The following suggests that you
are correct. The difference, about .3, is independent of the length of
the string being copied.
>>> ttr("aenc()", "aenc = ('a'*10000).encode")
[0.588499543029684, 0.5760222493490801, 0.5757037691037112]
>>> ttr("aenc(encoding='utf-8')", "aenc = ('a'*10000).encode")
[0.8973955632254729, 0.887000380270365, 0.884113153942053]
>>> ttr("aenc()", "aenc = ('a'*50000).encode")
[3.6618914099180984, 3.650091040467487, 3.6542183723140624]
>>> ttr("aenc(encoding='utf-8')", "aenc = ('a'*50000).encode")
[3.964849740958016, 3.9363826484832316, 3.937290440151628]
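The "about .3" figure can be checked directly from the lowest time in each of the four runs above. The dictionary layout and variable names are only for this illustration; the numbers are copied from the output shown.

```python
# Lowest of the three reported times for each run, in seconds per
# 1,000,000 calls (the timeit default), keyed by string length.
best = {
    10000: {"plain": 0.5757037691037112, "keyword": 0.884113153942053},
    50000: {"plain": 3.650091040467487, "keyword": 3.9363826484832316},
}

# Extra cost of passing encoding='utf-8' as a keyword argument.
overhead = {n: t["keyword"] - t["plain"] for n, t in best.items()}

for n, diff in sorted(overhead.items()):
    print("len=%d: keyword overhead = %.2f s per million calls" % (n, diff))
# prints:
# len=10000: keyword overhead = 0.31 s per million calls
# len=50000: keyword overhead = 0.29 s per million calls
```

That is roughly 0.3 microseconds per call in both cases, even though the copy itself takes about six times longer for the longer string.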
--
Terry Jan Reedy