Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Possibly better stringobject allocator

1,990 views
Skip to first unread message

Skip Montanaro

unread,
Feb 7, 1997, 3:00:00 AM2/7/97
to

We all know Python loves its objects. It really loves strings. The other
day I profiled Python and checked malloc calls in the CGI script that runs
on our site most frequently. With about a 3-second run-time it called
malloc over 66,000 times with half being from inside stringobject.c. (Lists
were the next most frequently allocated objects, accounting for about 10,000
malloc calls.) I figured it was a place that could use a little help.

I wrote a possibly better (read: not evaluated in a performance way yet)
string object allocator last night that treats stringobjects of <= 128 bytes
(including the stringobject header) special. The allocator rounds up
requests for small objects to the next higher power of two (<=32 -> 32, <=64
-> 64, and <=128 -> 128). No stringobject can contain less than 20 bytes by
default (stringobject headers are normally 20 bytes). 16kbyte blocks for
32-, 64-, and 128-byte stringobjects are then allocated as necessary, and a
free list is maintained, currently chained together through the ob_type
field.

The net effect is to cut way back on the number of malloc calls. I doubt
that by itself this is such a big performance win on my system, but malloc
needs to add some bookkeeping overhead of its own, so I figure while I might
only be saving a small amount in time, I'm also saving a small amount in
space, and it all adds up. (Tim Peters observed that ten 1% improvements
would probably net you a bigger gain than a single 10% improvement.) On
systems with less industrial-strength mallocs, cutting way back on malloc
calls may be a big win.

I then added some conditionally compiled counters to stringobject.c and a
Python-visible interface in stropmodule.c that counts the number of string
allocations and deallocations for each of the four categories, returning a
list of each. On that 3-second or so CGI script I got:

Alloc Dealloc
<= 32 30950 10824
<= 64 6440 3170
<= 128 1394 1076
> 128 2025 707

This was measured right at the end, just before sys.stdout was closed by the
script. I figure most of the large allocations were probably related to
imports. I haven't yet had a chance to try this with my database server.
It uses small strings as dict keys *heavily* however, so I expect more
skewed numbers.

I'll post a copy of the modified stringobject.c and stropmodule.c to my web
site:

http://www.automatrix.com/~skip/python/

I'm late for dinner, but I'll try and get them over there tonight or
tomorrow.

(By the way, there's nothing string-specific about this small object
allocator. I was thinking of chaining the free list through another field
and setting the type field once, but that would make it string-specific.
With a little work it could be the basis for a general small object
allocator as it now exists.)

Cheers,

--
Skip Montanaro | Python - it's not just for scripting anymore...
sk...@calendar.com | http://www.python.org/
(518)372-5583 | Musi-Cal ------> http://concerts.calendar.com/

Skip Montanaro

unread,
Feb 8, 1997, 3:00:00 AM2/8/97
to

I installed modified versions of stringobject.c and stropmodule.c on our web
server. They are accessible via

http://www.automatrix.com/~skip/python/

Warning: This is just a first attempt. I've done some testing, but not a
bunch. Use this on an experimental basis only. The code isn't yet properly
packaged, and there are some definite warts on it. Feedback is welcome.

One thing I haven't yet figured out is this: Given an arbitrary number, n,
return the power of two that is equal to or greater than n *without writing
a loop*. I can clearly do something like:

for (i = 1; i < n; i <<= 1);

Can it be done using a single bit-twiddling expression though?

Guido van Rossum

unread,
Feb 8, 1997, 3:00:00 AM2/8/97
to

> I installed modified versions of stringobject.c and stropmodule.c on our web
> server. They are accessible via
>
> http://www.automatrix.com/~skip/python/

Cool. I read your description and am very pleased with your
approach. Did you benchmark it with pystone yet? (I'm still waiting
for a better benchmark, but nobody has given me one yet... ;-)

> Warning: This is just a first attempt. I've done some testing, but not a
> bunch. Use this on an experimental basis only. The code isn't yet properly
> packaged, and there are some definite warts on it. Feedback is welcome.

I immediately went to resizestring (to check that you'd taken care of
it properly -- you have) and noticed that there's one easy
optimization there: when the new and old sizes fall in the same
bucket, you don't need to do anything except change the ob_size field.

> One thing I haven't yet figured out is this: Given an arbitrary number, n,
> return the power of two that is equal to or greater than n *without writing
> a loop*. I can clearly do something like:
>
> for (i = 1; i < n; i <<= 1);
>
> Can it be done using a single bit-twiddling expression though?

For numbers < 256 you can do it with a single table lookup, if you can
spare 256 bytes for the table. For larger numbers you can quickly
find the highest byte in the number that's non-zero and use that to
index the table and add 8* the byte number (if you count from the
right end ;_)

--Guido van Rossum (home page: http://www.python.org/~guido/)

Peter Hart

unread,
Feb 9, 1997, 3:00:00 AM2/9/97
to

Skip:

>> One thing I haven't yet figured out is this: Given an arbitrary number, n,
>> return the power of two that is equal to or greater than n *without writing
>> a loop*. I can clearly do something like:
>>
>> for (i = 1; i < n; i <<= 1);
>>
>> Can it be done using a single bit-twiddling expression though?
>
>For numbers < 256 you can do it with a single table lookup, if you can
>spare 256 bytes for the table. For larger numbers you can quickly
>find the highest byte in the number that's non-zero and use that to
>index the table and add 8* the byte number (if you count from the
>right end ;_)
>
>--Guido van Rossum (home page: http://www.python.org/~guido/)

* untested code*
It's not a single bittwiddle, but I think:

x |= x >> 1; // 00100100 -> 00100100 | 00010010 = 00110110
x |= x >> 2; // 00110110 -> 00110110 | 00001101 = 00111111
x |= x >> 4; // 00111111 -> 00111111 | 00000011 = 00111111
x = x+1; // 00111111 -> 01000000

almost works for 8 bits. Two more steps and you've got 32 bits. I
assume a decent compiler should be able to make this happen very
quickly.... Unfortunately it gives the power of two <greater than> x.
I think you'd need something like:

x = y;
(above code inserted)
if (x >> 1 == y) x = y;

To make it work for the power of two <greater than or equal to> x.

-Pete Hart.
'Waste Please' - Exhortation on bins in DisneyWorld....
------------------------------------------------------------------------------
Peter Hart: ze...@one.net, 10604...@compuserve.com. Voice: +1-513-579-9957
135, Garfield Place Apt#423, Cincinnati OH45202, USA.


William Lewis

unread,
Feb 14, 1997, 3:00:00 AM2/14/97
to

In article <3302147b...@news.one.net>, ze...@one.net (Peter Hart) writes:
[... nifty bit twiddle snipped ...]

> Unfortunately it gives the power of two <greater than> x.
>I think you'd need something like:
>
> x = y;
> (above code inserted)
> if (x >> 1 == y) x = y;
>
>To make it work for the power of two <greater than or equal to> x.

Well, you could always just decrement x before doing the twiddle.
Saves you a branch (and possible pipeline stall); and since a compare
is the same as a subtract, it shouldn't slow you down any...

--
Wim Lewis / wiml@{omnigroup.com|hhhh.org} / I do not speak for Omni
PGP 0x27F772C1: 0C 0D 10 D5 FC 73 D1 35 26 46 42 9E DC 6E 0A 88

anish198...@gmail.com

unread,
Feb 16, 2014, 10:27:43 PM2/16/14
to
On Saturday, February 8, 1997 12:00:00 AM UTC-8, Guido van Rossum wrote:
> > I installed modified versions of stringobject.c and stropmodule.c on our web
> > server. They are accessible via
> >
> > http://www.automatrix.com/~skip/python/
>
> Cool. I read your description and am very pleased with your
> approach. Did you benchmark it with pystone yet? (I'm still waiting
> for a better benchmark, but nobody has given me one yet... ;-)
>
> > Warning: This is just a first attempt. I've done some testing, but not a
> > bunch. Use this on an experimental basis only. The code isn't yet properly
> > packaged, and there are some definite warts on it. Feedback is welcome.
>
> I immediately went to resizestring (to check that you'd taken care of
> it properly -- you have) and noticed that there's one easy
> optimization there: when the new and old sizes fall in the same
> bucket, you don't need to do anything except change the ob_size field.
>
> > One thing I haven't yet figured out is this: Given an arbitrary number, n,
> > return the power of two that is equal to or greater than n *without writing
> > a loop*. I can clearly do something like:
> >
> > for (i = 1; i < n; i <<= 1);
> >
> > Can it be done using a single bit-twiddling expression though?
>
> For numbers < 256 you can do it with a single table lookup, if you can
> spare 256 bytes for the table. For larger numbers you can quickly
> find the highest byte in the number that's non-zero and use that to
> index the table and add 8* the byte number (if you count from the
> right end ;_)

Below is the code for this idea.

#include <stdio.h>
int log[256];
int next_power_of_two(int no)
{
int t, tt, r;
if(tt = no>> 16) {
r = (t = tt >> 8)?24+log[tt]:16+log[t];
} else {
r = (t = no >> 8)?8+log[t]:log[no];
}
return r;
}

void make_table()
{
int i;

log[0] = 0;
log[1] = 1;
for(i=2;i<256;i++) {
log[i] = 1 + log[i/2];
}
}

int main()
{
int no = 512;
make_table();
printf ("%d\n", next_power_of_two(no));
0 new messages