How does Python implement "long integer"?

Pedram

Jul 5, 2009, 3:38:30 AM
Hello,
I'm reading about the implementation of long ints in Python. I downloaded
the CPython source code and plan to read longobject.c, but where
should I start reading this file? I mean, which function is the
first?
Can anyone help?
Thanks

Pedram

Mark Dickinson

Jul 5, 2009, 5:57:02 AM
On Jul 5, 8:38 am, Pedram <pm567...@gmail.com> wrote:
> Hello,
> I'm reading about implementation of long ints in Python. I downloaded
> the source code of CPython and will read the longobject.c, but from
> where I should start reading this file? I mean which function is the
> first?

I don't really understand the question: what do you mean by 'first'?
It might help if you tell us what your aims are.

In any case, you probably also want to look at the Include/
longintrepr.h and Include/longobject.h files.

Mark

Pedram

Jul 5, 2009, 8:09:49 AM

Thanks for the reply.
Sorry, I can't explain very clearly! I'm not English ;)
But I want to understand the implementation of the long int object in
Python: how does Python allocate memory for it, and how does it
implement operations on it?
I'm reading the source code (longobject.c and, as you said,
longintrepr.h and longobject.h), but if you can help me, I'd really
appreciate it.

Pedram

Mark Dickinson

Jul 5, 2009, 9:04:11 AM
On Jul 5, 1:09 pm, Pedram <pm567...@gmail.com> wrote:
> Thanks for reply,
> Sorry I can't explain too clear! I'm not English ;)

That's shocking. Everyone should be English. :-)

> But I want to understand the implementation of long int object in
> Python. How Python allocates memory and how it implements operations
> for this object?

I'd pick one operation (e.g., addition), and trace through the
relevant functions in longobject.c. Look at the long_as_number
table to see where to get started.

In the case of addition, that table shows that the nb_add slot is
given by long_add. long_add does any necessary type conversions
(CONVERT_BINOP) and then calls either x_sub or x_add to do the real
work.
x_add calls _PyLong_New to allocate space for a new PyLongObject, then
does the usual digit-by-digit-with-carry addition. Finally, it normalizes
the result (removes any unnecessary zeros) and returns.
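
To make that walk-through concrete, here is a minimal Python sketch of the schoolbook addition described above. It is not a translation of the C code: the names x_add_sketch and to_int are made up for illustration, digits are held in a plain little-endian list, and 15-bit digits are assumed.

```python
SHIFT = 15            # bits per digit (assumed, as in the base-2**15 layout)
MASK = (1 << SHIFT) - 1


def x_add_sketch(a, b):
    """Add two magnitudes given as little-endian lists of 15-bit digits."""
    if len(a) < len(b):
        a, b = b, a                      # make sure a is the longer operand
    result = []
    carry = 0
    for i in range(len(a)):
        carry += a[i] + (b[i] if i < len(b) else 0)
        result.append(carry & MASK)      # low 15 bits become the digit
        carry >>= SHIFT                  # whatever is left is the carry
    result.append(carry)
    # Normalize: strip unnecessary high-order zero digits.
    while result and result[-1] == 0:
        result.pop()
    return result


def to_int(digits):
    """Hypothetical helper: turn a digit list back into a Python int."""
    return sum(d << (SHIFT * i) for i, d in enumerate(digits))


print(x_add_sketch([32767], [1]))        # [0, 1], i.e. 2**15
```

The real x_add writes into a freshly allocated PyLongObject instead of a list, but the carry handling and the final normalization follow the same pattern.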

As far as memory allocation goes: almost all operations call
_PyLong_New at some point. (Except in py3k, where it's a bit more
complicated because small integers are cached.)
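
The small-integer cache mentioned here can be observed from Python itself. The snippet is only a sketch and relies on a CPython implementation detail (a preallocated range of small ints, commonly -5 to 256); other implementations may behave differently, so identity of ints should never be relied on in real code.

```python
# Build the same small value twice at run time, so the compiler cannot
# fold the two into a single constant.
a = int("100")
b = int("100")

same_small = a is b
print(same_small)   # True on CPython 3, where small ints are preallocated
```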

If you have more specific questions I'll have a go at answering them.

Mark

Pedram

Jul 5, 2009, 10:15:11 AM
On Jul 5, 5:04 pm, Mark Dickinson <dicki...@gmail.com> wrote:

> That's shocking.  Everyone should be English. :-)

Yes, I'm trying :)

> I'd pick one operation (e.g., addition), and trace through the
> relevant functions in longobject.c.  Look at the long_as_number
> table to see where to get started.
>
> In the case of addition, that table shows that the nb_add slot is
> given by long_add.  long_add does any necessary type conversions
> (CONVERT_BINOP) and then calls either x_sub or x_add to do the real
> work.
> x_add calls _PyLong_New to allocate space for a new PyLongObject, then
> does the usual digit-by-digit-with-carry addition.  Finally, it normalizes
> the result (removes any unnecessary zeros) and returns.
>
> As far as memory allocation goes: almost all operations call
> _PyLong_New at some point.  (Except in py3k, where it's a bit more
> complicated because small integers are cached.)

Oh, I didn't see long_as_number before. I'm reading it. That was very
helpful, thanks.

> If you have more specific questions I'll have a go at answering them.
>
> Mark

Thanks a million.
I will put your name in the "Special thanks to" section of my
article (in font size 72) ;)

Pedram

Pablo Torres N.

Jul 5, 2009, 10:30:47 AM
to pytho...@python.org
On Sun, Jul 5, 2009 at 04:57, Mark Dickinson <dick...@gmail.com> wrote:
> On Jul 5, 8:38 am, Pedram <pm567...@gmail.com> wrote:
>> Hello,
>> I'm reading about implementation of long ints in Python. I downloaded
>> the source code of CPython and will read the longobject.c, but from
>> where I should start reading this file? I mean which function is the
>> first?
>
> I don't really understand the question:  what do you mean by 'first'?
> It might help if you tell us what your aims are.

I think he means the entry point; the problem is that libraries have many.


--
Pablo Torres N.

Pedram

Jul 5, 2009, 12:01:30 PM
Hello again,
This time I have a simple C question!
As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
found PyObject_NEW_VAR in the objimpl.h header file, but I can't
understand the last line :( Here's the code:

#define PyObject_NEW_VAR(type, typeobj, n) \
( (type *) PyObject_InitVar( \
      (PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj), (n)) ), \
      (typeobj), (n)) )

I know the preprocessor will expand PyObject_NEW_VAR(type, typeobj, n)
to this everywhere in the code, but I can't understand the last line, which
is just 'typeobj' and 'n'! What do they do? Do they play any part in the
allocation process?

Aahz

Jul 5, 2009, 12:12:19 PM
In article <6f6be2b9-49f4-4db0...@l31g2000yqb.googlegroups.com>,
Pedram <pm567...@gmail.com> wrote:

Look in the code to find out what PyObject_InitVar() does -- and, more
importantly, what its signature is. The clue you're missing is the
trailing backslash on the third line, but that should not be required if
you're using an editor that shows you matching parentheses.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha

Pedram

Jul 5, 2009, 12:32:41 PM
On Jul 5, 8:12 pm, a...@pythoncraft.com (Aahz) wrote:
> In article <6f6be2b9-49f4-4db0-9c21-52062d8ea...@l31g2000yqb.googlegroups.com>,
> Pedram  <pm567...@gmail.com> wrote:
>
> >This time I have a simple C question!
> >As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
> >found PyObject_NEW_VAR in objimpl.h header file. But I can't
> >understand the last line :( Here's the code:
>
> >#define PyObject_NEW_VAR(type, typeobj, n) \
> >( (type *) PyObject_InitVar( \
> >      (PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj), (n)) ), \
> >      (typeobj), (n)) )
>
> >I know this will replace the PyObject_New_VAR(type, typeobj, n)
> >everywhere in the code and but I can't understand the last line, which
> >is just 'typeobj' and 'n'! What do they do? Are they make any sense in
> >allocation process?
>
> Look in the code to find out what PyObject_InitVar() does -- and, more
> importantly, what its signature is.  The clue you're missing is the
> trailing backslash on the third line, but that should not be required if
> you're using an editor that shows you matching parentheses.
> --
> Aahz (a...@pythoncraft.com)           <*>        http://www.pythoncraft.com/
>
> "as long as we like the same operating system, things are cool." --piranha

No, the 3rd line only got wrapped when I posted it!

Here's a screenshot of the code:
http://lh3.ggpht.com/_35nHfALLgC4/SlDVMEl6oOI/AAAAAAAAAKg/vPWA1gttvHM/s640/Screenshot.png

As you can see the PyObject_MALLOC has nothing to do with typeobj and
n in line 4.

Pedram

Jul 5, 2009, 12:34:37 PM
> I'll show you the code in picture below: http://lh3.ggpht.com/_35nHfALLgC4/SlDVMEl6oOI/AAAAAAAAAKg/vPWA1gttvHM...
>
> As you can see the PyObject_MALLOC has nothing to do with typeobj and
> n in line 4.

Oooooh! My mistake! I got it! They're PyObject_InitVar
parameters.
Sorry, and thanks!
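
For readers following along, the macro does two things in sequence: PyObject_MALLOC grabs a block whose size comes from _PyObject_VAR_SIZE(typeobj, n), and PyObject_InitVar then fills in the type, the item count and the initial reference count. The rough Python model here only mirrors that arithmetic and ordering. The function names, the dictionary layout and the example sizes (24-byte header, 2-byte items, 8-byte pointers) are assumptions made for illustration, not CPython API.

```python
def var_size_sketch(basicsize, itemsize, n, pointer_size=8):
    """Model of _PyObject_VAR_SIZE: header plus n items, rounded up
    to a multiple of the pointer size."""
    raw = basicsize + n * itemsize
    return (raw + pointer_size - 1) & ~(pointer_size - 1)


def new_var_sketch(type_name, basicsize, itemsize, n):
    """Model of PyObject_NEW_VAR(type, typeobj, n)."""
    # Step 1: the inner call, PyObject_MALLOC(_PyObject_VAR_SIZE(typeobj, n)).
    memory = bytearray(var_size_sketch(basicsize, itemsize, n))
    # Step 2: the outer call, PyObject_InitVar(memory, typeobj, n).
    return {"ob_refcnt": 1, "ob_type": type_name, "ob_size": n,
            "memory": memory}


obj = new_var_sketch("long", basicsize=24, itemsize=2, n=3)
print(obj["ob_size"], len(obj["memory"]))   # 3 32
```

So the trailing (typeobj), (n) on the last line of the macro are simply the second and third arguments of the outer PyObject_InitVar call.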

Pedram

Jul 6, 2009, 8:24:43 AM
OK, fine, I've read longobject.c at last! :)
I found that longobject is a structure like this:

struct _longobject {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    digit ob_digit[1];
};

And a digit is a 15-item array of C's unsigned short integers.
Am I right, or did I miss something? Is this structure the same in
all environments (Linux, Windows, mobiles, etc.)?

Mark Dickinson

Jul 6, 2009, 9:46:05 AM
On Jul 6, 1:24 pm, Pedram <pm567...@gmail.com> wrote:
> OK, fine, I read longobject.c at last! :)
> I found that longobject is a structure like this:
>
> struct _longobject {
>     struct _object *_ob_next;
>     struct _object *_ob_prev;

For current CPython, these two fields are only present in debug
builds; for a normal build they won't exist.

>     Py_ssize_t ob_refcnt;
>     struct _typeobject *ob_type;

You're missing an important field here (see the definition of
PyObject_VAR_HEAD):

Py_ssize_t ob_size; /* Number of items in variable part */

For the current implementation of Python longs, the absolute value of
this field gives the number of digits in the long; the sign gives the
sign of the long (0L is represented with zero digits).

>     digit ob_digit[1];

Right. This is an example of the so-called 'struct hack' in C; it
looks as though there's just a single digit, but what's intended here
is that there's an array of digits tacked onto the end of the struct;
for any given PyLongObject, the size of this array is determined at
runtime. (C99 allows you to write this as simply ob_digit[], but not
all compilers support this yet.)
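
The effect of that variable-length tail is visible from Python. The snippet is a sketch that assumes CPython, where sys.getsizeof reports the object header plus one C digit per stored digit; exact byte counts differ between versions and platforms, so only the step between consecutive sizes is of interest.

```python
import sys

bits = sys.int_info.bits_per_digit

# 1 << (bits * (k - 1)) needs exactly k digits in the internal array.
sizes = [sys.getsizeof(1 << (bits * (k - 1))) for k in range(1, 6)]
steps = [b - a for a, b in zip(sizes, sizes[1:])]

print(sizes)   # grows by a constant amount per extra digit
print(steps)   # each step is the size of one C digit
```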

> }

> And a digit is a 15-item array of C's unsigned short integers.

No: a digit is a single unsigned short, which is used to store 15 bits
of the Python long. Python longs are stored in sign-magnitude format,
in base 2**15. So each of the base 2**15 'digits' is an integer in
the range [0, 32768), i.e. from 0 to 32767 inclusive. The unsigned
short type is used to store those digits.
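
A short Python sketch of that sign-magnitude layout: the helper sign_magnitude is hypothetical and simply reproduces, for a given integer, the ob_size value and the little-endian list of base-2**15 digits described in this post.

```python
def sign_magnitude(n, shift=15):
    """Return (ob_size, digits) for n, least significant digit first."""
    mask = (1 << shift) - 1
    magnitude = abs(n)
    digits = []
    while magnitude:
        digits.append(magnitude & mask)   # next base-2**shift digit
        magnitude >>= shift
    # abs(ob_size) is the digit count; its sign is the sign of n.
    ob_size = len(digits) if n >= 0 else -len(digits)
    return ob_size, digits


print(sign_magnitude(0))        # (0, [])
print(sign_magnitude(32768))    # (2, [0, 1])
print(sign_magnitude(-32769))   # (-2, [1, 1])
```

Note that zero comes out with no digits at all, matching the remark that 0L is represented with zero digits.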

Exception: for Python 2.7+ or Python 3.1+, on 64-bit machines, Python
longs are stored in base 2**30 instead of base 2**15, using a 32-bit
unsigned integer type in place of unsigned short.
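
Which of the two layouts a given interpreter uses can be checked at run time through sys.int_info, which is available in Python 2.7 and 3.1 onwards. A minimal check:

```python
import sys

info = sys.int_info
print("bits per digit:", info.bits_per_digit)   # 15 or 30 on CPython
print("bytes per digit:", info.sizeof_digit)    # size of the C digit type
```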

> Is this structure is constant in
> all environments (Linux, Windows, Mobiles, etc.)?

I think it would be dangerous to rely on this struct staying constant,
even just for CPython. It's entirely possible that the representation
of Python longs could change in Python 2.8 or 3.2. You should use the
public, documented C-API whenever possible.

Mark

Pedram

Jul 6, 2009, 11:13:00 AM
Hello Mr. Dickinson. Glad to see you again :)

On Jul 6, 5:46 pm, Mark Dickinson <dicki...@gmail.com> wrote:
> On Jul 6, 1:24 pm, Pedram <pm567...@gmail.com> wrote:
>
> > OK, fine, I read longobject.c at last! :)
> > I found that longobject is a structure like this:
>
> > struct _longobject {
> >     struct _object *_ob_next;
> >     struct _object *_ob_prev;
>
> For current CPython, these two fields are only present in debug
> builds;  for a normal build they won't exist.

I couldn't understand the difference between them. What exactly are a
debug build and a normal build? And do you mean that in a debug build
PyLongObject is a doubly-linked list, but in a normal build it is just an
array (or if not, how is it stored in that mode)?

> >     Py_ssize_t ob_refcnt;
> >     struct _typeobject *ob_type;
>
> You're missing an important field here (see the definition of
> PyObject_VAR_HEAD):
>
>     Py_ssize_t ob_size; /* Number of items in variable part */
>
> For the current implementation of Python longs, the absolute value of
> this field gives the number of digits in the long;  the sign gives the
> sign of the long (0L is represented with zero digits).

Oh, you're right. I missed that. Thanks :)

> >     digit ob_digit[1];
>
> Right.  This is an example of the so-called 'struct hack' in C; it
> looks as though there's just a single digit, but what's intended here
> is that there's an array of digits tacked onto the end of the struct;
> for any given PyLongObject, the size of this array is determined at
> runtime.  (C99 allows you to write this as simply ob_digit[], but not
> all compilers support this yet.)

WOW! I didn't know anything about 'struct hacks'! I read about them
and they're wonderful. Thanks for pointing that out. :)

> > }
> > And a digit is a 15-item array of C's unsigned short integers.
>
> No: a digit is a single unsigned short, which is used to store 15 bits
> of the Python long.  Python longs are stored in sign-magnitude format,
> in base 2**15.  So each of the base 2**15 'digits' is an integer in
> the range [0, 32768).  The unsigned short type is used to store those
> digits.
>
> Exception: for Python 2.7+ or Python 3.1+, on 64-bit machines, Python
> longs are stored in base 2**30 instead of base 2**15, using a 32-bit
> unsigned integer type in place of unsigned short.
>
> > Is this structure is constant in
> > all environments (Linux, Windows, Mobiles, etc.)?
>
> I think it would be dangerous to rely on this struct staying constant,
> even just for CPython.  It's entirely possible that the representation
> of Python longs could change in Python 2.8 or 3.2.  You should use the
> public, documented C-API whenever possible.
>
> Mark

Thanks a lot, Mark :)

Bruno Desthuilliers

Jul 6, 2009, 1:56:00 PM
Mark Dickinson wrote:

> On Jul 5, 1:09 pm, Pedram <pm567...@gmail.com> wrote:
>> Thanks for reply,
>> Sorry I can't explain too clear! I'm not English ;)
>
> That's shocking. Everyone should be English. :-)
>
Mark, get out of here!

Eric Wong

Jul 7, 2009, 2:56:40 AM
Pedram wrote:

> Hello Mr. Dickinson. Glad to see you again :)
>
> On Jul 6, 5:46 pm, Mark Dickinson <dicki...@gmail.com> wrote:
>> On Jul 6, 1:24 pm, Pedram <pm567...@gmail.com> wrote:
>>
>> > OK, fine, I read longobject.c at last! :)
>> > I found that longobject is a structure like this:
>>
>> > struct _longobject {
>> > struct _object *_ob_next;
>> > struct _object *_ob_prev;
>>
>> For current CPython, these two fields are only present in debug
>> builds; for a normal build they won't exist.
>
> I couldn't understand the difference between them. What are debug
> build and normal build themselves? And You mean in debug build
> PyLongObject is a doubly-linked-list but in normal build it is just an
> array (Or if not how it'll store in this mode)?
>

The Py_TRACE_REFS macro is what distinguishes the two: the code compiled
for a debug build is actually *different* from the code compiled for a
normal build. In a debug build, not only PyLongObject but all objects
are linked together in a doubly-linked list, which can make debugging
less painful. But in a normal build, objects are separate! After an
object is created, it is never moved, so we can and should refer to an
object only by its address (pointer). There's no one-big-container
like a list or an array for objects.

Mark Dickinson

Jul 7, 2009, 5:10:31 AM
On Jul 6, 4:13 pm, Pedram <pm567...@gmail.com> wrote:
> On Jul 6, 5:46 pm, Mark Dickinson <dicki...@gmail.com> wrote:
> > On Jul 6, 1:24 pm, Pedram <pm567...@gmail.com> wrote:
>
> > > OK, fine, I read longobject.c at last! :)
> > > I found that longobject is a structure like this:
>
> > > struct _longobject {
> > >     struct _object *_ob_next;
> > >     struct _object *_ob_prev;
>
> > For current CPython, these two fields are only present in debug
> > builds;  for a normal build they won't exist.
>
> I couldn't understand the difference between them. What are debug
> build and normal build themselves? And You mean in debug build
> PyLongObject is a doubly-linked-list but in normal build it is just an
> array (Or if not how it'll store in this mode)?

No: a PyLongObject is stored the same way (ob_size giving sign and
number of digits, ob_digit giving the digits themselves) whether or
not a debug build is in use.

A debug build does various things (extra checks, extra information) to
make it easier to track down problems. On Unix-like systems, you can
get a debug build by configuring with the --with-pydebug flag.

The _ob_next and _ob_prev fields have nothing particularly to do with
Python longs; for a debug build, these two fields are added to *all*
Python objects, and provide a doubly-linked list that links all 'live'
Python objects together. I'm not really sure what, if anything, the
extra information is used for within Python---it might be used by some
external tools, I guess.
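
Whether the running interpreter is such a build can be checked from Python. The sketch relies on two CPython details: sys.gettotalrefcount is only defined in debug builds, and sys.getobjects, which as far as I know is the main consumer of the list of live objects, is only defined when the interpreter was compiled with Py_TRACE_REFS. The two helper names are made up.

```python
import sys


def is_debug_build():
    """True if this looks like a CPython debug (--with-pydebug) build."""
    return hasattr(sys, "gettotalrefcount")


def has_trace_refs():
    """True if the interpreter was compiled with Py_TRACE_REFS."""
    return hasattr(sys, "getobjects")


print("debug build:", is_debug_build())
print("trace refs: ", has_trace_refs())
```

On an ordinary release build both lines are expected to report False.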

Have you looked at the C-API documentation?

http://docs.python.org/c-api/index.html

_ob_next and _ob_prev are described here:

http://docs.python.org/c-api/typeobj.html#_ob_next

(These docs are for Python 2.6; I'm not sure what version you're
working with.)

Mark

Pedram

Jul 7, 2009, 7:06:07 AM

It seems there's an island named Python!
Thanks for the links, I'm reading them now.
