
how to extract source code from code objects ?


Michele Simionato

Jan 2, 2003, 11:14:52 AM
I can compile a string in a code object:

>>> code_object=compile('x=1','<string>','exec')

now the code_object.co_code attribute contains the bytecode corresponding
to the statement x=1:

>>> code_object.co_code
'\x7f\x00\x00\x7f\x01\x00d\x00\x00Z\x00\x00d\x01\x00S'

Is there a way to have back the source code, i.e. the string 'x=1'
instead of the bytecode ?
TIA,


--
Michele Simionato - Dept. of Physics and Astronomy
210 Allen Hall Pittsburgh PA 15260 U.S.A.
Phone: 001-412-624-9041 Fax: 001-412-624-9163
Home-page: http://www.phyast.pitt.edu/~micheles/
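A minimal sketch in modern Python 3 (not the Python 2.2 of this thread) of what a code object compiled from 'x=1' actually keeps. In Python 3, co_code is a bytes object, and the opcode names that dis prints differ between versions, so the loop's output is not reproduced here.

```python
import dis

# Same statement as in the post, compiled with Python 3.
code_object = compile('x=1', '<string>', 'exec')

# co_code holds raw bytecode (bytes in Python 3); dis decodes it into
# instructions, which is as readable as the code object itself gets.
for instr in dis.get_instructions(code_object):
    print(instr.opname, instr.argrepr)

# The code object records the names and constants the bytecode uses and
# the filename it was compiled under, but not the text 'x=1' itself.
print(code_object.co_names)
print(code_object.co_consts)
print(code_object.co_filename)
```

Getting 'x=1' back from this still requires either a decompiler or a copy of the original text.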

Skip Montanaro

Jan 2, 2003, 11:21:36 AM

Michele> Is there a way to have back the source code, i.e. the string
Michele> 'x=1' instead of the bytecode ?

I can think of two answers: decompilation and locating the source file.
Since strings aren't associated with files, that option is out. That leaves
decompilation. Google for "decompyle".

--
Skip Montanaro - sk...@pobox.com
http://www.musi-cal.com/
http://www.mojam.com/

Jp Calderone

Jan 2, 2003, 11:35:45 AM
On Thu, Jan 02, 2003 at 08:14:52AM -0800, Michele Simionato wrote:
> Is there a way to have back the source code, i.e. the string 'x=1'
> instead of the bytecode ?

>>> import decompyle
>>> decompyle.decompyle(compile('x=1', '<string>', 'exec'))
x = 1

http://www.crazy-compilers.com/decompyle/

Jp

Michele Simionato

Jan 3, 2003, 10:25:43 AM
Jp Calderone <exa...@intarweb.us> wrote in message news:<mailman.1041525363...@python.org>...

> >>> import decompyle
> >>> decompyle.decompyle(compile('x=1', '<string>', 'exec'))
> x = 1
>
> http://www.crazy-compilers.com/decompyle/

decompyle works and it is easy to install and to use. However, it
is not in the standard library and this implies two disadvantages:
i) it is not universally available;
ii) for any new version of Python I must download a new version of decompyle.

I was hoping for a more built-in solution, as for instance
an attribute .co_source in code objects. I guess this would be
inefficient for memory consumption or for other technical reasons.

My original problem was to convert back to source an AST object
generated with the parser module. Since parser.st objects can be converted
to code objects, I asked how to extract source code from code objects.
It seems that there is no simple built-in way of doing that (decompyle
is a few thousand lines of not-so-simple Python code).

I guess the reason why there is no standard way to extract the source
from AST objects is that their implementation (can) change with new
versions of Python, so there would be a maintenance issue
(as happens with decompyle). Am I correct?

Cheers,


Michele

Skip Montanaro

Jan 3, 2003, 10:47:17 AM

Michele> decompyle works and it is easy to install and to use. However,
Michele> it is not in the standard library and this implies two
Michele> disadvantages:
Michele> i) it is not universally available;
Michele> ii) for any new version of Python I must download a new version
Michele> of decompyle.

Michele> I was hoping for a more built-in solution, as for instance an
Michele> attribute .co_source in code objects. I guess this would be
Michele> inefficient for memory consumption or for other technical
Michele> reasons.

Inefficient, perhaps, but not hugely, I wouldn't think, since code objects
are relatively rare beasts in the Python object landscape. The source code
isn't normally needed, because in most cases it can be looked up in the
source file (check out the inspect.getsource function). If you can provide
a convincing use case why the source for manually compiled code should be
retained in a feature request on SourceForge, it's possible that someone
will add a co_source field to code objects. It would help if you supplied a
patch that implemented the feature request. (Note that simply extending
code objects with a co_source slot won't be enough. The compiler will have
to be modified to stuff the code in there.)

An alternative is to simply maintain a dictionary that maps code objects to
source code, e.g.:

>>> def f(): pass
...
>>> f
<function f at 0x3ba538>
>>> d = {}
>>> d[f.func_code] = 'pass'

Since code objects are immutable, they can be used as dictionary keys.
Would that be sufficient for your needs?

Skip
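Skip's dictionary idea, sketched as a small wrapper around compile() in modern Python 3 (where f.func_code is spelled f.__code__). The names compile_keeping_source, source_of and _source_registry are hypothetical. Two assumptions worth knowing: code objects are compared by value, so two code objects that compare equal share one entry, and the registry keeps every registered code object alive.

```python
# Hypothetical module-level registry: code object -> source text.
_source_registry = {}


def compile_keeping_source(source, filename='<string>', mode='exec'):
    """compile() wrapper that remembers the text each code object came from."""
    code = compile(source, filename, mode)
    # Code objects are hashable, so they can serve as dictionary keys.
    _source_registry[code] = source
    return code


def source_of(code):
    """Return the recorded source for `code`, or None if it was never registered."""
    return _source_registry.get(code)
```

Code compiled with the plain built-in compile() is simply not in the registry, so source_of returns None for it.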

"Martin v. Löwis"

Jan 3, 2003, 11:03:02 AM
to Michele Simionato
Michele Simionato wrote:
> I was hoping for a more built-in solution, as for instance
> an attribute .co_source in code objects. I guess this would be
> inefficient for memory consumption or for other technical reasons.

Can you please explain why you are hoping for a builtin solution?

> My original problem was to convert back to source an AST object
> generated with the parser module. Since parser.st objects can be converted
> to code objects, I asked how to extract source code from code objects.
> It seems that there is no simple built-in way of doing that (decompyle
> is a few thousand lines of not-so-simple Python code).

Generating source from an AST object is much simpler than generating
source from byte code. However, I still wonder where the need comes from
in the first place: If you have just created the AST, you surely still
have a copy of the source code around.

> I guess the reason why there is no standard way to extract the source
> from AST objects is that their implementation (can) change with new
> versions of Python, so there would be a maintenance issue
> (as happens with decompyle). Am I correct?

No. There is no way to do that because nobody ever wanted to do it (or
at least not frequently enough to contribute a solution to the standard
library). If such a thing were contributed, then you would be right: there
would be a maintenance issue. So any volunteer to contribute that
feature would also need to volunteer to maintain it.

In Python itself, this entire issue is solved more pragmatically: When
you need the source (e.g. for a traceback), you look at the line number,
locate the source file, and print the relevant lines.

Regards,
Martin
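Martin's pragmatic approach (line number plus source file) can be sketched in modern Python 3 with the standard linecache module, which the traceback module also uses to show source lines. first_source_line is a hypothetical helper; it returns only the line where the code object starts, and an empty string when there is no file to read.

```python
import linecache


def first_source_line(code):
    """Look up the line where `code` starts, the way a traceback does.

    Uses the code object's co_filename and co_firstlineno and reads that
    line through linecache. Returns '' when there is no file behind the
    code, e.g. for code compiled from a string under the name '<string>'.
    """
    return linecache.getline(code.co_filename, code.co_firstlineno)
```

For Michele's compile('x=1', '<string>', 'exec') this returns '', which is why the decompilation route came up in the first place.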

holger krekel

Jan 3, 2003, 11:08:01 AM
Skip Montanaro wrote:
> Inefficient, perhaps, but not hugely, I wouldn't think, since code objects
> are relatively rare beasts in the Python object landscape. The source code
> isn't normally needed, because in most cases it can be looked up in the
> source file (check out the inspect.getsource function). If you can provide
> a convincing use case why the source for manually compiled code should be
> retained in a feature request on SourceForge, it's possible that someone
> will add a co_source field to code objects. It would help if you supplied a
> patch that implemented the feature request. (Note that simply extending
> code objects with a co_source slot won't be enough. The compiler will have
> to be modified to stuff the code in there.)

Add the ability to *modify* the co_code object and then we could do
live-editing of objects: interactively modify a function object by
modifying its source and recompiling the co_code object. This would also
help with version control at the object level. inspect.getsource isn't
that exact, especially since you can't use it within the __main__ module.

I guess modifying the co_code object might get tricky, as it currently
is an immutable object. Or is this easily solved?

holger
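Code objects are still immutable in modern Python 3, but a function's __code__ attribute can be reassigned, which gets close to the live-editing idea without mutating any code object. A minimal sketch under that assumption; live_edit is a hypothetical helper, and it only handles a new definition with the same function name and the same number of closure (free) variables, the simplest case being none.

```python
def live_edit(func, new_source):
    """Recompile `new_source` (a def for a function with the same name)
    and install its code object into the existing function `func`.

    The old code object is not modified; the function simply gets a new
    __code__, so every existing reference to `func` sees the change.
    """
    scratch = {}
    # Compile against the function's own globals so global lookups still work;
    # the def statement binds the new function in the `scratch` locals dict.
    exec(compile(new_source, '<live-edit>', 'exec'), func.__globals__, scratch)
    func.__code__ = scratch[func.__name__].__code__
    return func


def greet():
    return 'hello'


alias = greet                      # another reference to the same function object
live_edit(greet, "def greet():\n    return 'bonjour'\n")
print(alias())                     # prints: bonjour
```

Because the new code is compiled under the pseudo-filename '<live-edit>', inspect.getsource can no longer find its text afterwards, which is the same gap discussed throughout this thread.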

Michele Simionato

Jan 3, 2003, 1:26:59 PM
Thanks for the kind replies. Here is my motivation.
I was experimenting with symbolic manipulations in Python (see a thread
on this subject I started a few weeks ago). In that thread I asked for a
function able to perform nontrivial substitutions in Python expressions.
For instance, I wanted a function that could transform the expression

"square(square(x+y)+z)+square(x+w)"

into

"((x+y)**2+z)**2+(x+w)**2"

when square is the function that sends x -> x**2.

For this simple problem Bengt Richter provided me with a nice solution
involving the tokenize module. However, that solution is not general.
In the case of complicated arguments (involving nested lists, for
instance) it is not enough to scan the Python code; one needs to
parse it. Therefore, I wanted to use the parser module to extract the
AST of the original expression, modify it, and go back to source code
for the modified AST. However, I haven't seen a built-in mechanism to
convert AST -> source. One can convert AST -> bytecode, and for this
reason I wanted to convert bytecode to source. It would be much better
for me to convert AST -> source directly. How easy or difficult is that?

I don't have a project involving symbolic manipulations in mind; I am
considering this problem only to better understand how Python works
under the hood (incidentally, notice that this kind of manipulation of
source code is very close to having macros in Python).

Michele
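The thread predates it, but modern Python (3.9 and later) ships ast.unparse, which makes the AST -> source step asked about here direct. A minimal sketch of the square example under that assumption; it uses the ast module rather than the parser module discussed in the thread (parser was removed in Python 3.10), and the names SquareToPower and expand_square are made up for illustration.

```python
import ast


class SquareToPower(ast.NodeTransformer):
    """Rewrite every call square(e) into the expression e ** 2."""

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls (the arguments) first
        if (isinstance(node.func, ast.Name) and node.func.id == 'square'
                and len(node.args) == 1 and not node.keywords):
            return ast.BinOp(left=node.args[0], op=ast.Pow(), right=ast.Constant(2))
        return node


def expand_square(source):
    """Parse an expression, apply SquareToPower, and turn the tree back into source."""
    tree = ast.parse(source, mode='eval')
    # Fills in line numbers for the new nodes; only needed if the tree
    # is later passed to compile(), harmless for unparse.
    tree = ast.fix_missing_locations(SquareToPower().visit(tree))
    return ast.unparse(tree)


print(expand_square("square(square(x+y)+z)+square(x+w)"))
# prints: ((x + y) ** 2 + z) ** 2 + (x + w) ** 2
```

ast.unparse inserts its own spacing and only the parentheses that precedence requires, so the result matches the target expression above up to whitespace.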

John Roth

Jan 3, 2003, 8:55:02 PM

"Michele Simionato" <mi...@pitt.edu> wrote in message
news:2259b0e2.03010...@posting.google.com...

> Is there a way to have back the source code, i.e. the string 'x=1'
> instead of the bytecode ?

Look at the "inspect" module; it's section 3.11 in the 2.2 Python Library
Reference. I think it may do exactly what you want. It does, of course,
require that you have the source modules available.

John Roth
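A small wrapper around inspect.getsource in modern Python 3 shows both sides of the caveat above: it works when the defining file is on disk, and raises OSError for code compiled from a string. try_getsource is a hypothetical helper name; it maps both OSError and TypeError (raised for objects such as plain integers) to None.

```python
import inspect


def try_getsource(obj):
    """Return the source text for `obj`, or None if Python cannot find it.

    inspect.getsource locates the file named in the code object and reads
    the relevant lines from it, so code compiled from a string (filename
    '<string>') has nothing to read and raises OSError.
    """
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        return None
```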

Pieter Nagel

Jan 4, 2003, 6:32:24 AM
to mi...@pitt.edu
In the short term, you could solve your problem by using the
compiler.visitor module to walk the AST.

Something like:

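import compiler            # Python 2 only; the compiler package was removed in Python 3
import compiler.visitor

class MathExpressionASTVisitor:
    def visitCallFunc(self, callFunc):
        # callFunc.node is the expression being called; for a plain call
        # such as square(...) it is a compiler.ast.Name with a .name attribute
        funcName = callFunc.node.name
        if funcName == 'square':
            print callFunc.args, '**2',
        else:
            print funcName, '(', callFunc.args, ')',

ast = compiler.parse('square(x+y)+z', 'eval')   # parse an expression into a compiler.ast tree
visitor = MathExpressionASTVisitor()
compiler.visitor.ASTVisitor().preorder(ast, visitor)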

But this is only crude pseudocode.

In the long term, I would suggest that you use your own type of abstract
syntax tree to represent the parsed expressions, instead of Python's
AST. When you do more complex manipulations, you are going to need more
and more operations on the tree that Python's AST does not offer.

Of course, as long as the syntax of your expressions is sufficiently
Pythonesque, you can use the Python compiler module to parse it for you;
and then use the visitor pattern above to transform the Python AST into
your AST.

--
,_
/_) /| /
/ i e t e r / |/ a g e l
http://www.nagel.co.za
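The long-term suggestion above (parse with Python, then translate into your own tree) can be sketched in modern Python 3 with the standard ast module, since the compiler package used in the post no longer exists in Python 3. The nested-tuple layout and the names ToMathTree and to_math_tree are hypothetical, chosen only to illustrate an "own AST" for a small math subset.

```python
import ast

# Hypothetical "own AST": nested tuples instead of Python's node classes.
#   ('sym', name) | ('num', value) | (op, left, right) | ('call', name, args)
_BINOPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/', ast.Pow: '**'}


class ToMathTree(ast.NodeVisitor):
    """Translate a small, math-only subset of Python's AST into tuples."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return ('sym', node.id)

    def visit_Constant(self, node):
        if not isinstance(node.value, (int, float, complex)):
            raise ValueError(f'unsupported constant: {node.value!r}')
        return ('num', node.value)

    def visit_BinOp(self, node):
        op = _BINOPS.get(type(node.op))
        if op is None:
            raise ValueError(f'unsupported operator: {type(node.op).__name__}')
        return (op, self.visit(node.left), self.visit(node.right))

    def visit_Call(self, node):
        if not isinstance(node.func, ast.Name) or node.keywords:
            raise ValueError('only simple calls such as f(x, y) are supported')
        return ('call', node.func.id, tuple(self.visit(arg) for arg in node.args))

    def generic_visit(self, node):
        # Everything outside the subset above is rejected instead of ignored.
        raise ValueError(f'unsupported syntax: {type(node).__name__}')


def to_math_tree(source):
    """Parse a Python expression and convert it into the tuple tree."""
    return ToMathTree().visit(ast.parse(source, mode='eval'))
```

Rejecting unknown syntax in generic_visit keeps the custom tree honest: anything it cannot represent fails loudly instead of being silently dropped.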
