New bytecode tracer merged into trunk

Michał Kwiatkowski

Sep 8, 2010, 11:31:20 AM
to pytho...@googlegroups.com
Hi!

For the last couple of months I've been working on a new bytecode
tracer that allows reliable tracing of side effects. A few days ago I
merged that new tracer into Pythoscope's trunk. What I need now is
more testing. All internal Pythoscope tests pass, but it's quite likely
that I have unintentionally broken something. The new tracer is plugged
in, but the information it gathers is not yet used. That's what I'll
focus on in the coming weeks. If you want to know more details, read on.

The idea that I naively thought would solve the whole problem was to
use sys.setprofile and its 'c_call' and 'c_return' events. All side
effects have a C function counterpart (or rather, that's what I
thought at first), so if I could intercept those calls I could track
all side effects. It sounded OK until I tried it. It turns out that
'c_call' and 'c_return' events are not as friendly as the 'call' and
'return' events. There is simply no way to get to the values passed to
and returned by the C function! All I knew was that the call happened
and which function was involved, and that was it.
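
To make the limitation concrete, here is a minimal sketch in modern
Python 3 (not Pythoscope's code; record_c_events and sample are
hypothetical names chosen for illustration). It installs a profile
function and records what the hook receives for C-level events.

```python
import sys

def record_c_events(func, *args):
    """Run func(*args) under a profile hook and collect C-level events."""
    events = []

    def profiler(frame, event, arg):
        # For 'c_call' and 'c_return', arg is the C function object itself,
        # not the arguments passed to it and not its return value.
        if event in ("c_call", "c_return"):
            events.append((event, arg))

    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)
    return result, events

def sample():
    # len is implemented in C, so calling it produces c_call/c_return events.
    return len([1, 2, 3])

result, events = record_c_events(sample)
```

Every recorded entry pairs an event name with a function object such as
len. Neither the list passed in nor the value returned is visible to the
hook. The list may also contain entries for sys.setprofile itself, since
uninstalling the hook is a C call too.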

I wrote a mail to comp.lang.python[1] but got no answer, which was to
be expected considering the narrow and low-level nature of this stuff.

I dug through the CPython sources, which gave me some other ideas about
how to approach this problem. I found a way to poke my nose into the
internal structure of a Python object using ctypes, and through that I
could access the value stack. You see, the Python stack works almost
like a C stack during execution: before a function is called, all its
arguments are put on the stack. The return value is also put on the
stack, so that the caller can pop it and either use it or discard it.
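
The stack discipline described above can be observed without touching
interpreter internals. The following is a small sketch using the
standard dis module under Python 3 (helper, caller and opnames are
hypothetical names; exact opcode names differ between CPython versions,
so the code only matches on name prefixes).

```python
import dis

def helper(a, b):
    return a + b

def caller(x, y):
    # Both arguments are loaded onto the value stack before the call
    # instruction runs; the result is left on the stack for the return.
    return helper(x, y)

def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]

names = opnames(caller)

# Position of the first call instruction (CALL_FUNCTION, CALL, ...).
call_index = next(i for i, n in enumerate(names) if n.startswith("CALL"))

# Argument loads that appear before the call instruction.
loads_before_call = [n for n in names[:call_index] if n.startswith("LOAD_FAST")]
```

Running dis.dis(caller) prints the full listing; the argument loads
always come before the call instruction, and a return instruction
follows it.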

I thought I was saved, until I learned that the return value is put on
the stack *after* the trace function for the 'c_return' event is called.
So I was in this weird situation where I was able to record the
arguments, but not the return value of a call. At this point I knew
setprofile was a dead end.

I tried a different approach and modified Ned Batchelder's bytecode
tracing hack[2] to rewrite functions on the fly during tracing. There
were some problems along the way, which needed further tweaking (the
most notable one being a custom import hook), but the general idea of a
bytecode tracer turned out to be good. So, instead of tracing by
events or lines, the new tracer goes bytecode-by-bytecode. If you're
interested in learning more about the development of this tracer, the
full history is on GitHub[3].
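
For readers who want to see what bytecode-by-bytecode tracing looks
like, here is a rough sketch. It does not use the rewriting technique
from the hack referenced above, and it is not Pythoscope's tracer:
it relies on the frame attribute f_trace_opcodes, which exists only in
Python 3.7 and later. The names trace_opcodes and add are hypothetical.

```python
import dis
import sys

def trace_opcodes(func, *args):
    """Call func(*args) and return (result, names of executed opcodes)."""
    executed = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # leave all other frames untraced
        # Ask the interpreter for an 'opcode' event before each instruction.
        frame.f_trace_opcodes = True
        if event == "opcode":
            # f_lasti is the byte offset of the instruction about to run.
            opcode = frame.f_code.co_code[frame.f_lasti]
            executed.append(dis.opname[opcode])
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, executed

def add(a, b):
    return a + b

result, executed = trace_opcodes(add, 2, 3)
```

The executed list holds one opcode name per instruction that ran inside
add, which is the granularity a bytecode tracer works at.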

There were more reasons for tracing bytecodes.

The first was that in Python 2.3 'c_call' and 'c_return' events are not
reported at all. So, if I wanted to keep supporting that version, I had
to use an approach that didn't rely on setprofile.

The second was that my initial assumption was false: there is one group
of bytecodes that *do* have side effects. I'm talking about the PRINT_*
family of bytecodes. Those don't generate any trace events, yet have a
very visible effect.

All in all, it seems to work pretty well, although as I mentioned at
the beginning, it still needs further testing. Also, I feel a bit
uneasy with the fact that I'm using a tandem of hacks here: bytecode
rewriting and using ctypes to get to the internals of a Python object.

The good side of this is that I understand much more about the ctypes
extension now, and I will be able to replace the C implementation of
_util with a Python version that uses ctypes. I think it will be a change
for the better, as from the user's point of view it's easier to install a
package than to compile a C extension. I'm still not sure if what I do
with ctypes is evil, but hey, it's there for a reason. ;)
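
As an illustration of why ctypes can stand in for a compiled extension,
the sketch below calls a CPython C API function straight from Python.
It is not the _util code; version_from_c_api is a hypothetical name,
and the example assumes a CPython build that exposes its C API through
ctypes.pythonapi.

```python
import ctypes
import sys

def version_from_c_api():
    """Ask the running interpreter for its version string via the C API."""
    get_version = ctypes.pythonapi.Py_GetVersion
    get_version.restype = ctypes.c_char_p  # C signature: const char *Py_GetVersion(void)
    get_version.argtypes = []
    return get_version().decode("ascii")

# Compare the version number reported by the C API with sys.version.
matches = version_from_c_api().split()[0] == sys.version.split()[0]
```

Nothing here needs a compiler at install time, which is the advantage
mentioned above.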

Anyway, thanks for reading and all comments are welcome! :-)

Cheers,
mk

[1] http://groups.google.com/group/comp.lang.python/browse_frm/thread/75e786bb6752e730
[2] http://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
[3] http://github.com/infrared/bytecode-hack

Paul Hildebrandt

Sep 8, 2010, 7:07:53 PM
to pytho...@googlegroups.com
This is awesome! Thanks for writing it up. I appreciate the journey as well as the result.

C. Titus Brown

Sep 12, 2010, 9:47:47 PM
to pytho...@googlegroups.com
this is very cool! blog post?

--
C. Titus Brown, c...@msu.edu
