Running Pythoscope against the stdlib


Ryan Freckleton

Oct 21, 2009, 11:25:19 PM
to Pythoscope
Yesterday I got an itch to play with pythoscope again, so I ran it
against the latest checkout of the python standard library.

Some things I noticed:

1. Pythoscope, by default, looks through all the directories in the
current working directory. This is an issue in the standard library
because tests are in the 'test' folder, which is traversed with
everything else when pythoscope is initialized.

Solution: Add a command line option or configuration option to exclude
directories from scanning.

2. Memory use during initialization needs to be optimized. The
pythoscope process was using on the order of at least 800 MB (probably
closer to 1.5 GB based on my memory and swap usage). My suspicion is
that parse trees or "store" objects are being created in memory and not
collected before the application ends. I'll have to run it through
Dowser or Guppy to verify.

Sadly, this prevented me from completing my run against the standard
library. (After I got to the "Tix.py" module, my machine started to
swap continuously. More RAM would help :)

Solution: Make the parsing during initialization lazier: open a file,
parse it, write the pickle, clean up memory, repeat. Don't let things
hang around in memory. Maybe generators or other lazy evaluation could
be used?
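A minimal sketch of that parse/pickle/release loop, assuming Python 3's ast module and a hypothetical summary format (top-level class and function names only). It is not Pythoscope's actual storage code. Because inspect_lazily is a generator, only one module's tree should be alive at a time.

```python
import ast
import os
import pickle


def summarize_module(path):
    """Parse one file and return a small, picklable summary.

    The summary format (top-level class and function names) is hypothetical.
    """
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    # Only this summary leaves the function; once it returns, nothing
    # references `tree`, so the full AST can be garbage-collected.
    return {
        "path": path,
        "classes": [n.name for n in tree.body if isinstance(n, ast.ClassDef)],
        "functions": [
            n.name
            for n in tree.body
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
        ],
    }


def inspect_lazily(paths, root, store_dir):
    """Parse, pickle, and release one module at a time (a generator)."""
    os.makedirs(store_dir, exist_ok=True)
    for path in paths:  # `paths` may itself be a lazy iterator
        summary = summarize_module(path)
        # Hypothetical naming scheme: relative path with separators flattened.
        name = os.path.relpath(path, root).replace(os.sep, "__") + ".pickle"
        out = os.path.join(store_dir, name)
        with open(out, "wb") as f:
            pickle.dump(summary, f)
        del summary  # drop the last reference before parsing the next file
        yield out
```

Nothing runs until the generator is consumed, e.g. list(inspect_lazily(paths, "Lib", ".store")).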

I also tried running it against Distribute and Tornado, but didn't
have enough time to write good entry points for them. All these tests
were done with the 0.4.1 release.

If other people agree that these are legitimate enhancement requests
I'll add tickets to launchpad.

Cheers,
Ryan

sste...@gmail.com

Oct 22, 2009, 9:28:43 AM
to pytho...@googlegroups.com, Ryan Freckleton

On Oct 21, 2009, at 11:25 PM, Ryan Freckleton wrote:

>
> Yesterday I got an itch to play with pythoscope again, so I ran it
> against the latest checkout of the python standard library.
>
> Some things I noticed:
>

> I also tried running it against Distribute and Tornado, but didn't
> have enough time to write good entry points for them. All these tests
> were done with the 0.4.1 release.
>
> If other people agree that these are legitimate enhancement requests
> I'll add tickets to launchpad.

Good points all; I'd say add'em to the tracker and at least they won't
get lost.

On a side note, would you be interested in helping set up entry points
for Distribute/distutils? I'm the QA manager for the product and
could use an experienced Pythoscope hand (I just started using it a
week ago). Feel free to contact me off-list.

Thanks,

S

Michał Kwiatkowski

Oct 22, 2009, 10:30:47 AM
to pytho...@googlegroups.com
On Thu, Oct 22, 2009 at 5:25 AM, Ryan Freckleton
<ryan.fr...@gmail.com> wrote:
> 1. Pythoscope, by default, looks through all the directories in the
> current working directory. This is an issue in the standard library
> because tests are in the 'test' folder, which is traversed with
> everything else when pythoscope is initialized.
>
> Solution: Add a command line option or configuration option to exclude
> directories from scanning.

Pythoscope scanning test directories is intentional. Pythoscope tries
to put test cases into appropriate test files, so it has to know all
of the application's source code, including tests.

Of course, I still see situations where an exclude option would be
helpful, and I would gladly accept patches implementing this
functionality (which, BTW, shouldn't be too hard to do).
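One way such an exclude option could work, sketched with a hypothetical iter_python_files helper that takes glob patterns for directory names (Pythoscope has no such option at this point). It prunes matching entries from os.walk's dirnames list in place, so excluded trees are never visited at all.

```python
import fnmatch
import os


def iter_python_files(root, exclude=()):
    """Yield paths of .py files under root, skipping excluded directories.

    `exclude` is a hypothetical option: glob patterns matched against
    directory names, e.g. ["test", "build*"].
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Modifying dirnames in place stops os.walk from descending into them.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, pattern) for pattern in exclude)
        ]
        for name in sorted(filenames):
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)
```

Since test directories are scanned on purpose (to place generated tests correctly), an option like this would be opt-in, with the default staying "scan everything".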

> 2. Memory use during initialization needs to be optimized. The
> pythoscope process was using on the order of at least 800 MB (probably
> closer to 1.5 GB based on my memory and swap usage). My suspicion is
> that parse trees or "store" objects being created in memory and not
> collected before the application ends. I'll have to run it through
> Dowser or Guppy to verify.
>
> Sadly, this prevented me from completing my run against the standard
> library. (After I got to the "Tix.py" module my machine started to
> swap continuously, More RAM would help :)
>
> Solution: Make the parsing during initialization more lazy: open file,
> parse it, write pickle, clean up memory, repeat. Don't let things hang
> around in memory. Maybe generators or other lazy evaluation could be
> used? My

That's exactly what I implemented for the 0.4.1 release; see
http://pythoscope.org/improve-information-storage-performance for
details (especially the "Conclusions" part). Sadly, due to an oversight
on my part, there was one case in which the reference to the AST
wouldn't be freed: when the module contained the "if __name__ ==
'__main__':" snippet. The Python stdlib contains quite a lot of those,
and that was probably the reason Pythoscope leaked memory during your
tests.

The good news is that I found this bug literally yesterday and managed
to fix it in r303. Please check if the problems with memory
consumption remain on current Pythoscope trunk.

> I also tried running it against Distribute and Tornado, but didn't
> have enough time to write good entry points for them. All these tests
> were done with the 0.4.1 release.

I encourage you to check out the development version. I tend to keep
the trunk quite stable, making bigger changes in branches.

Moreover, 0.4.2 will be out soon and will contain quite a few
important fixes. For a full list see
https://launchpad.net/pythoscope/+milestone/0.4.2-dependency-cleanup .

> If other people agree that these are legitimate enhancement requests
> I'll add tickets to launchpad.

I think an exclude option is a good idea; we can track that as a
blueprint: https://blueprints.launchpad.net/pythoscope/+addspec

Cheers,
mk

Michał Kwiatkowski

Oct 22, 2009, 10:39:28 AM
to pytho...@googlegroups.com, Ryan Freckleton
On Thu, Oct 22, 2009 at 3:28 PM, sste...@gmail.com
<sste...@gmail.com> wrote:
> On a side note, would you be interested in helping set up entry points
> for Distribute/distutils?  I'm the QA manager for the product and
> could use an experienced Pythoscope hand (I just started using it a
> week ago).  Feel free to contact me off-list.

If you need help using Pythoscope feel free to post your questions
here, I'll be glad to help.

Cheers,
mk

Ryan Freckleton

Oct 22, 2009, 11:29:21 AM
to pytho...@googlegroups.com
<snip>

>> Solution: Make the parsing during initialization more lazy: open file,
>> parse it, write pickle, clean up memory, repeat. Don't let things hang
>> around in memory. Maybe generators or other lazy evaluation could be
>> used? My
>
> That's exactly what I implemented for 0.4.1 release, see
> http://pythoscope.org/improve-information-storage-performance for
> details (especially the "Conclusions" part). Sadly, due to my
> oversight there was a case when the reference to AST wouldn't be freed
> - in particular, if the module contained the  "if __name__ ==
> '__main__':" snippet. Python stdlib contains quite a lot of those and
> that probably was the reason why Pythoscope leaked memory during your
> tests.
>
> The good news is that I found this bug literally yesterday and managed
> to fix it in r303. Please check if the problems with memory
> consumption remain on current Pythoscope trunk.
>
>> I also tried running it against Distribute and Tornado, but didn't
>> have enough time to write good entry points for them. All these tests
>> were done with the 0.4.1 release.
>
> I encourage you to check out the development version. I tend to keep
> the trunk quite stable, making bigger changes in branches.
>
> Moreover, 0.4.2 will be out soon and will contain quite a few
> important fixes. For a full list see
> https://launchpad.net/pythoscope/+milestone/0.4.2-dependency-cleanup.
>

Story of my life, "D'oh, that's a bug fixed in the latest version!"
:-) I'll repeat my tests using latest trunk of Pythoscope. Thanks
Michał!

<snip>


>
> I think exclude option is a good idea - we can track that as a
> blueprint: https://blueprints.launchpad.net/pythoscope/+addspec
>

I'll add it later today or tomorrow.
> Cheers,
> mk

Michał Kwiatkowski

Oct 22, 2009, 12:29:43 PM
to pytho...@googlegroups.com
On Thu, Oct 22, 2009 at 5:29 PM, Ryan Freckleton
<ryan.fr...@gmail.com> wrote:
> Story of my life, "D'oh, that's a bug fixed in the latest version!"
> :-) I'll repeat my tests using latest trunk of Pythoscope. Thanks
> Michał!

And thanks to you for testing Pythoscope! I really like your idea of
testing Pythoscope on Python's stdlib. Actually I think this is a
great candidate for one of the upcoming blueprints:

https://blueprints.launchpad.net/pythoscope/+spec/open-source-project-coverage

The idea is to use an open source project to track Pythoscope metrics
over time. It's still just a rough idea, but I think measuring memory
consumption and inspection time would be good starting points. I
already have some working code; if anyone's interested, look at the
benchmark scripts in Pythoscope's "tools" directory.
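As a rough illustration of tracking those two metrics per run, here is a sketch using tracemalloc and time.perf_counter from the modern stdlib. This is a swapped-in technique, not what the scripts in "tools" do, and measure is a hypothetical helper. Note that tracemalloc only counts memory allocated through Python's allocator, so it understates total process memory.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, stats) with wall time and peak memory.

    Peak memory is what tracemalloc saw, i.e. Python-level allocations only.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Collected even if func raises, so tracing is always stopped.
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {"seconds": elapsed, "peak_bytes": peak}
```

Recording these numbers for each trunk revision against the same project would give the over-time view described above.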

>> I think exclude option is a good idea - we can track that as a
>> blueprint: https://blueprints.launchpad.net/pythoscope/+addspec
>
> I'll add it later today or tomorrow.

Thanks again!

Cheers,
mk

Ryan Freckleton

Oct 24, 2009, 12:11:48 AM
to pytho...@googlegroups.com
2009/10/22 Michał Kwiatkowski <consta...@gmail.com>:

I ran it against the latest trunk (revision 304) and memory usage was much,
MUCH better. No memory leaks at all: pythoscope's memory usage stays at
about 30 MB throughout the run.

Sadly, I ran into an ERROR message after inspecting the urlparse.py module.

ERROR: Oops, it seems internal Pythoscope error occured. Please
file a bug report at https://bugs.launchpad.net/pythoscope

[Which should read "It seems +that an+ internal Pythoscope error occur+r+ed"...]

No stacktrace information or anything else. Is pythoscope logging
DEBUG or TRACE level output somewhere outside of the .pythoscope
folder?

This was about 30 minutes into initializing against the standard
library. I'm trying to run pythoscope against pydoc.py right now,
we'll see how it goes.

Having to re-inspect all the modules before generating tests
is... painful. Is pythoscope inspecting the modules again instead of
just looking at timestamps? It would probably be a good gain to just
compare the timestamp of the *.pickle file against that of the *.py
file before running it through the inspector.
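A minimal sketch of the suggested check, using a hypothetical needs_reinspection helper. Real code would also have to cope with coarse filesystem timestamp resolution.

```python
import os


def needs_reinspection(py_path, pickle_path):
    """Return True if py_path has to go through the inspector again.

    Hypothetical rule: re-inspect when no pickle exists yet, or when the
    source file was modified after the pickle was written.
    """
    if not os.path.exists(pickle_path):
        return True
    return os.path.getmtime(py_path) > os.path.getmtime(pickle_path)
```

Equal timestamps count as "up to date" here, which is a judgment call.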

Hope this helps,
=====
--Ryan E. Freckleton

Michał Kwiatkowski

Oct 24, 2009, 5:28:43 AM
to pytho...@googlegroups.com
On Sat, Oct 24, 2009 at 6:11 AM, Ryan Freckleton
<ryan.fr...@gmail.com> wrote:
> I ran against latest trunk (revision 304) and memory usage was much,
> MUCH better. No memory leaks at all. Pythoscope stands at about 30 MB
> usage throughout the run.

That's good to hear. :-)

> Sadly, I ran into an ERROR message after inspecting the urlparse.py module.
>
>    ERROR: Oops, it seems internal Pythoscope error occured. Please
> file a bug report at https://bugs.launchpad.net/pythoscope
>
> [Which should read "It seems +that an+ internal Pythoscope error occur+r+ed"...]

I'll fix the message, thanks!

> No stacktrace information or anything else. Is pythoscope logging
> DEBUG or TRACE level output somewhere outside of the .pythoscope
> folder?

That's bad; there should be a traceback. Pythoscope logs everything to
stderr by default. You may try the -v (--verbose) flag, but I doubt
it will give any additional info in this case.
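A generic pattern for making sure the traceback reaches stderr instead of being swallowed, sketched with the stdlib logging module. The run_safely helper and the "inspector" logger name are hypothetical, not Pythoscope's own code.

```python
import logging
import sys

log = logging.getLogger("inspector")  # hypothetical logger name


def run_safely(func, *args):
    """Call func; on failure, log the full traceback and return None."""
    try:
        return func(*args)
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback.
        log.exception("Oops, it seems that an internal error occurred.")
        return None


logging.basicConfig(stream=sys.stderr, level=logging.INFO)
```

The user still gets the friendly message, but the traceback comes with it, so a bug report can include it.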

I'm travelling right now, so I'll look into this issue once I get back
home on Sunday.

> This was about 30 minutes into initializing against the standard
> library. I'm trying to run pythoscope against pydoc.py right now,
> we'll see how it goes.
>
> Having to re-inspect all the modules before generating tests
> is...painful. Is pythoscope inspecting the modules again instead of
> just looking at timestamps? That would probably be a good gain to just
> do a timestamp compare of the *.pickle file versus the *.py file
> before running it through the inspector.

I assume that because the initialization failed, the pickle file wasn't
written. That triggers a full inspection, which will probably fail
again for the same reason.

In normal operation, if the initialization was successful, no
additional inspection is needed during test generation.

Cheers,
mk

Ryan Freckleton

Oct 24, 2009, 2:17:04 PM
to pytho...@googlegroups.com
2009/10/24 Michał Kwiatkowski <consta...@gmail.com>:
<snip>

>> No stacktrace information or anything else. Is pythoscope logging
>> DEBUG or TRACE level output somewhere outside of the .pythoscope
>> folder?
>
> That's bad, there should be a traceback. Pythoscope logs everything to
> stderr by default. You may try -v (--verbose) flag, but I have doubts
> if it will give any additional info in this case.
>
> I'm travelling right now, so I'll look into this issue once I get back
> home on Sunday.
>

I'll run it with the post-mortem debugger to see if I can get some better
information.

>> This was about 30 minutes into initializing against the standard
>> library. I'm trying to run pythoscope against pydoc.py right now,
>> we'll see how it goes.
>>
>> Having to re-inspect all the modules before generating tests
>> is...painful. Is pythoscope inspecting the modules again instead of
>> just looking at timestamps? That would probably be a good gain to just
>> do a timestamp compare of the *.pickle file versus the *.py file
>> before running it through the inspector.
>
> I assume that because the initialization failed the pickle file wasn't
> written. That causes the full inspection, which will probably fail
> again for the same reasons.
>
> In normal operation, if the initialization was successful, no
> additional inspection is needed during test generation.
>
> Cheers,
> mk

Ok, cool.

=====
--Ryan E. Freckleton

Ryan Freckleton

Oct 24, 2009, 10:12:23 PM
to pytho...@googlegroups.com
On Sat, Oct 24, 2009 at 12:17 PM, Ryan Freckleton
<ryan.fr...@gmail.com> wrote:
> 2009/10/24 Michał Kwiatkowski <consta...@gmail.com>:
> <snip>
>>> No stacktrace information or anything else. Is pythoscope logging
>>> DEBUG or TRACE level output somewhere outside of the .pythoscope
>>> folder?
>>
>> That's bad, there should be a traceback. Pythoscope logs everything to
>> stderr by default. You may try -v (--verbose) flag, but I have doubts
>> if it will give any additional info in this case.
>>
>> I'm travelling right now, so I'll look into this issue once I get back
>> home on Sunday.
>>
>
> I'll run it with the post-mortem debugger to see I can get some better
> information.
<snip>

Ran it through the debugger. The ERROR message is being caused by the
use of namedtuple as a class factory in urlparse.py.

astvisitor.derive_class_name() is the method that's throwing the
exception. The class being inspected is something like this (minimal
example that will reproduce the error):

from collections import namedtuple

class Example(namedtuple('Example', 'a')):
    pass

It's barfing on the unexpected call in the list of class bases.
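A sketch of how base classes could be named without choking on computed bases. It uses the stdlib ast module rather than Pythoscope's own parse tree, and base_names and _dotted_name are hypothetical helpers, not derive_class_name itself. Plain and dotted names resolve statically; anything computed, such as a namedtuple(...) call, comes back as None instead of raising.

```python
import ast


def base_names(class_source):
    """Return one entry per base of the first class in class_source.

    Statically resolvable bases give a (dotted) name; computed bases give None.
    """
    tree = ast.parse(class_source)
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    return [_dotted_name(base) for base in cls.bases]


def _dotted_name(node):
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = _dotted_name(node.value)
        return None if prefix is None else prefix + "." + node.attr
    # ast.Call, subscripts, etc.: not known until the code actually runs.
    return None
```

A None entry is a signal to defer that base to dynamic inspection rather than treating it as an error.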

I'll aggregate this into a bug report a bit later when I have more time.

=====
--Ryan E. Freckleton

Michał Kwiatkowski

Oct 25, 2009, 4:35:58 PM
to pytho...@googlegroups.com
2009/10/25 Ryan Freckleton <ryan.fr...@gmail.com>:

> Ran it through the debugger. The ERROR message is being caused by the
> use of namedtuple as a class factory in urlparse.py.
>
> astvisitor.derive_class_name() is the method that's throwing the
> exception. The class being inspected is something like this (minimal
> example that will reproduce the error):
>
> from collections import namedtuple
>
> class Example(namedtuple('Example', 'a')):
>    pass
>
> It's barfing on the unexpected call in the list of class bases.

The dynamic nature of Python shows up again. :-) A class can have its
superclasses defined through computation, so they are by no means
"static". I forgot about that. Well, I guess it's best to leave that
part to dynamic inspection.

> I'll aggregate this into a bug report a bit later when I have more time.

Thank you.

Cheers,
mk

Ryan Freckleton

Oct 25, 2009, 6:49:25 PM
to pytho...@googlegroups.com
>> I'll aggregate this into a bug report a bit later when I have more time.
>
> Thank you.
>
> Cheers,
> mk

Submitted https://bugs.launchpad.net/pythoscope/+bug/460715


=====
--Ryan E. Freckleton

Michał Kwiatkowski

Oct 27, 2009, 12:52:44 PM
to pytho...@googlegroups.com
On Sun, Oct 25, 2009 at 11:49 PM, Ryan Freckleton
<ryan.fr...@gmail.com> wrote:
> Submitted https://bugs.launchpad.net/pythoscope/+bug/460715

And now fixed in trunk. I ran it on Python's stdlib and it completed
without failing, yay! :-)

The parser gave warnings about a few files, and the resulting
.pythoscope/code-trees/ directory takes up around 200 MB, so the work
isn't completely done here yet. I'll get back to it later; right now I
have to focus on releasing 0.4.2.

Cheers,
mk

C. Titus Brown

Oct 27, 2009, 12:55:11 PM
to pytho...@googlegroups.com
On Tue, Oct 27, 2009 at 05:52:44PM +0100, Michał Kwiatkowski wrote:
>
> On Sun, Oct 25, 2009 at 11:49 PM, Ryan Freckleton
> <ryan.fr...@gmail.com> wrote:
> > Submitted https://bugs.launchpad.net/pythoscope/+bug/460715
>
> And now fixed in trunk. I ran it on Python's stdlib and it completed
> without failing, yay! :-)

very cool, congrats!

--titus

Ryan Freckleton

Oct 27, 2009, 12:57:31 PM
to pytho...@googlegroups.com

My sentiments as well, thanks Michał!

I should have some time later this week to do some more
profiling/performance testing (and hopefully increase the stdlib's
test coverage as well :).

=====
--Ryan E. Freckleton

sste...@gmail.com

Oct 27, 2009, 1:26:45 PM
to pytho...@googlegroups.com

On Oct 27, 2009, at 12:52 PM, Michał Kwiatkowski wrote:

>
> On Sun, Oct 25, 2009 at 11:49 PM, Ryan Freckleton
> <ryan.fr...@gmail.com> wrote:
>> Submitted https://bugs.launchpad.net/pythoscope/+bug/460715
>
> And now fixed in trunk. I ran it on Python's stdlib and it completed
> without failing, yay! :-)

That's a pretty awesome achievement. Congrats!

S

Paul Hildebrandt

Oct 27, 2009, 2:18:38 PM
to pytho...@googlegroups.com
Nice job! Congrats! Should we make a News post, or do you want to wait until you consider the work completely done?
--

Michał Kwiatkowski

Oct 27, 2009, 2:44:36 PM
to pytho...@googlegroups.com
On Tue, Oct 27, 2009 at 7:18 PM, Paul Hildebrandt
<Paul.Hil...@disneyanimation.com> wrote:
> Nice job!  Congrats! Should we make a News post or do you want to wait until you consider
> the work completely done?

Let's wait until the 0.4.2 release.

Cheers,
mk
