Google 网上论坛不再支持新的 Usenet 帖子或订阅项。历史内容仍可供查看。

Possibly better stringobject allocator

已查看 1,977 次
跳至第一个未读帖子

Skip Montanaro

未读,
1997年2月7日 03:00:001997/2/7
收件人

We all know Python loves its objects. It really loves strings. The other
day I profiled Python and checked malloc calls in the CGI script that runs
on our site most frequently. With about a 3-second run-time it called
malloc over 66,000 times with half being from inside stringobject.c. (Lists
were the next most frequently allocated objects, accounting for about 10,000
malloc calls.) I figured it was a place that could use a little help.

I wrote a possibly better (read: not evaluated in a performance way yet)
string object allocator last night that treats stringobjects of <= 128 bytes
(including the stringobject header) special. The allocator rounds up
requests for small objects to the next higher power of two (<=32 -> 32, <=64
-> 64, and <=128 -> 128). No stringobject can contain less than 20 bytes by
default (stringobject headers are normally 20 bytes). 16kbyte blocks for
32-, 64-, and 128-byte stringobjects are then allocated as necessary, and a
free list is maintained, currently chained together through the ob_type
field.

The net effect is to cut way back on the number of malloc calls. I doubt
that by itself this is such a big performance win on my system, but malloc
needs to add some bookkeeping overhead of its own, so I figure while I might
only be saving a small amount in time, I'm also saving a small amount in
space, and it all adds up. (Tim Peters observed that ten 1% improvements
would probably net you a bigger gain than a single 10% improvement.) On
systems with less industrial-strength mallocs, cutting way back on malloc
calls may be a big win.

I then added some conditionally compiled counters to stringobject.c and a
Python-visible interface in stropmodule.c that counts the number of string
allocations and deallocations for each of the four categories, returning a
list of each. On that 3-second or so CGI script I got:

Alloc Dealloc
<= 32 30950 10824
<= 64 6440 3170
<= 128 1394 1076
> 128 2025 707

This was measured right at the end, just before sys.stdout was closed by the
script. I figure most of the large allocations were probably related to
imports. I haven't yet had a chance to try this with my database server.
It uses small strings as dict keys *heavily* however, so I expect more
skewed numbers.

I'll post a copy of the modified stringobject.c and stropmodule.c to my web
site:

http://www.automatrix.com/~skip/python/

I'm late for dinner, but I'll try and get them over there tonight or
tomorrow.

(By the way, there's nothing string-specific about this small object
allocator. I was thinking of chaining the free list through another field
and setting the type field once, but that would make it string-specific.
With a little work it could be the basis for a general small object
allocator as it now exists.)

Cheers,

--
Skip Montanaro | Python - it's not just for scripting anymore...
sk...@calendar.com | http://www.python.org/
(518)372-5583 | Musi-Cal ------> http://concerts.calendar.com/

Skip Montanaro

未读,
1997年2月8日 03:00:001997/2/8
收件人

I installed modified versions of stringobject.c and stropmodule.c on our web
server. They are accessible via

http://www.automatrix.com/~skip/python/

Warning: This is just a first attempt. I've done some testing, but not a
bunch. Use this on an experimental basis only. The code isn't yet properly
packaged, and there are some definite warts on it. Feedback is welcome.

One thing I haven't yet figured out is this: Given an arbitrary number, n,
return the power of two that is equal to or greater than n *without writing
a loop*. I can clearly do something like:

for (i = 1; i < n; i <<= 1);

Can it be done using a single bit-twiddling expression though?

Guido van Rossum

未读,
1997年2月8日 03:00:001997/2/8
收件人

> I installed modified versions of stringobject.c and stropmodule.c on our web
> server. They are accessible via
>
> http://www.automatrix.com/~skip/python/

Cool. I read your description and am very pleased with your
approach. Did you benchmark it with pystone yet? (I'm still waiting
for a better benchmark, but nobody has given me one yet... ;-)

> Warning: This is just a first attempt. I've done some testing, but not a
> bunch. Use this on an experimental basis only. The code isn't yet properly
> packaged, and there are some definite warts on it. Feedback is welcome.

I immediately went to resizestring (to check that you'd taken care of
it properly -- you have) and noticed that there's one easy
optimization there: when the new and old sizes fall in the same
bucket, you don't need to do anything except change the ob_size field.

> One thing I haven't yet figured out is this: Given an arbitrary number, n,
> return the power of two that is equal to or greater than n *without writing
> a loop*. I can clearly do something like:
>
> for (i = 1; i < n; i <<= 1);
>
> Can it be done using a single bit-twiddling expression though?

For numbers < 256 you can do it with a single table lookup, if you can
spare 256 bytes for the table. For larger numbers you can quickly
find the highest byte in the number that's non-zero and use that to
index the table and add 8* the byte number (if you count from the
right end ;_)

--Guido van Rossum (home page: http://www.python.org/~guido/)

Peter Hart

未读,
1997年2月9日 03:00:001997/2/9
收件人

Skip:

>> One thing I haven't yet figured out is this: Given an arbitrary number, n,
>> return the power of two that is equal to or greater than n *without writing
>> a loop*. I can clearly do something like:
>>
>> for (i = 1; i < n; i <<= 1);
>>
>> Can it be done using a single bit-twiddling expression though?
>
>For numbers < 256 you can do it with a single table lookup, if you can
>spare 256 bytes for the table. For larger numbers you can quickly
>find the highest byte in the number that's non-zero and use that to
>index the table and add 8* the byte number (if you count from the
>right end ;_)
>
>--Guido van Rossum (home page: http://www.python.org/~guido/)

* untested code*
It's not a single bittwiddle, but I think:

x |= x >> 1; // 00100100 -> 00100100 | 00010010 = 00110110
x |= x >> 2; // 00110110 -> 00110110 | 00001101 = 00111111
x |= x >> 4; // 00111111 -> 00111111 | 00000011 = 00111111
x = x+1; // 00111111 -> 01000000

almost works for 8 bits. Two more steps and you've got 32 bits. I
assume a decent compiler should be able to make this happen very
quickly.... Unfortunately it gives the power of two <greater than> x.
I think you'd need something like:

x = y;
(above code inserted)
if (x >> 1 == y) x = y;

To make it work for the power of two <greater than or equal to> x.

-Pete Hart.
'Waste Please' - Exhortation on bins in DisneyWorld....
------------------------------------------------------------------------------
Peter Hart: ze...@one.net, 10604...@compuserve.com. Voice: +1-513-579-9957
135, Garfield Place Apt#423, Cincinnati OH45202, USA.


William Lewis

未读,
1997年2月14日 03:00:001997/2/14
收件人

In article <3302147b...@news.one.net>, ze...@one.net (Peter Hart) writes:
[... nifty bit twiddle snipped ...]

> Unfortunately it gives the power of two <greater than> x.
>I think you'd need something like:
>
> x = y;
> (above code inserted)
> if (x >> 1 == y) x = y;
>
>To make it work for the power of two <greater than or equal to> x.

Well, you could always just decrement x before doing the twiddle.
Saves you a branch (and possible pipeline stall); and since a compare
is the same as a subtract, it shouldn't slow you down any...

--
Wim Lewis / wiml@{omnigroup.com|hhhh.org} / I do not speak for Omni
PGP 0x27F772C1: 0C 0D 10 D5 FC 73 D1 35 26 46 42 9E DC 6E 0A 88

anish198...@gmail.com

未读,
2014年2月16日 22:27:432014/2/16
收件人
On Saturday, February 8, 1997 12:00:00 AM UTC-8, Guido van Rossum wrote:
> > I installed modified versions of stringobject.c and stropmodule.c on our web
> > server. They are accessible via
> >
> > http://www.automatrix.com/~skip/python/
>
> Cool. I read your description and am very pleased with your
> approach. Did you benchmark it with pystone yet? (I'm still waiting
> for a better benchmark, but nobody has given me one yet... ;-)
>
> > Warning: This is just a first attempt. I've done some testing, but not a
> > bunch. Use this on an experimental basis only. The code isn't yet properly
> > packaged, and there are some definite warts on it. Feedback is welcome.
>
> I immediately went to resizestring (to check that you'd taken care of
> it properly -- you have) and noticed that there's one easy
> optimization there: when the new and old sizes fall in the same
> bucket, you don't need to do anything except change the ob_size field.
>
> > One thing I haven't yet figured out is this: Given an arbitrary number, n,
> > return the power of two that is equal to or greater than n *without writing
> > a loop*. I can clearly do something like:
> >
> > for (i = 1; i < n; i <<= 1);
> >
> > Can it be done using a single bit-twiddling expression though?
>
> For numbers < 256 you can do it with a single table lookup, if you can
> spare 256 bytes for the table. For larger numbers you can quickly
> find the highest byte in the number that's non-zero and use that to
> index the table and add 8* the byte number (if you count from the
> right end ;_)

Below is the code for this idea.

#include <stdio.h>
int log[256];
int next_power_of_two(int no)
{
int t, tt, r;
if(tt = no>> 16) {
r = (t = tt >> 8)?24+log[tt]:16+log[t];
} else {
r = (t = no >> 8)?8+log[t]:log[no];
}
return r;
}

void make_table()
{
int i;

log[0] = 0;
log[1] = 1;
for(i=2;i<256;i++) {
log[i] = 1 + log[i/2];
}
}

int main()
{
int no = 512;
make_table();
printf ("%d\n", next_power_of_two(no));
0 个新帖子