[Python-ideas] disabling .pyc and .pyo files

Kristján Valur Jónsson

Dec 8, 2009, 12:51:46 PM

to python...@python.org

Hello there.

We have a large project involving multiple perforce branches of hundreds of .py files each.

Although we employ our own import mechanism for the bulk of these files, we do use the regular import mechanism for an essential core of them.

Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files. This can happen for a variety of reasons, but most often it occurs when .py files are being removed, or moved in the hierarchy. The problem is that the application will happily load and import an orphaned .pyo file, even though the .py file has gone or moved.
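The failure mode described above can be reproduced in a few lines. This is only a minimal sketch, assuming a current CPython 3 interpreter, where a sourceless .pyc is still imported when it sits in the directory where the .py used to live. The module name `stale_mod` and the helper name are made up for illustration.

```python
import os
import py_compile
import subprocess
import sys
import tempfile


def import_orphaned_module():
    """Compile a module, delete its source, then import it in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "stale_mod.py")
        with open(source, "w") as f:
            f.write("VALUE = 'from stale bytecode'\n")

        # Write the bytecode next to the source (not into __pycache__).
        py_compile.compile(source, cfile=os.path.join(tmp, "stale_mod.pyc"), doraise=True)

        # Simulate the file being deleted or moved in the source tree.
        os.remove(source)

        code = (
            "import sys; sys.path.insert(0, {!r}); "
            "import stale_mod; print(stale_mod.VALUE)".format(tmp)
        )
        result = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, text=True
        )
        return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    print(import_orphaned_module())
```

The import succeeds even though no .py file exists any more, which is exactly why a test suite can keep passing after a module has been removed.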

I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files. I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified. This will ensure that the application will execute only the code represented by the checked-out .py files. But it occurred to me that this functionality might be of interest to people other than just us. I can imagine, for example, that buildbots running the Python regression test suite might be running into problems with stray .pyo files from time to time.

Do you think that such a command line option would be useful for Python at large?

Cheers,

Kristján

Jesse Noller

Dec 8, 2009, 1:58:41 PM

to Kristján Valur Jónsson, python...@python.org

2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>:

FWIW: I've been bitten by this more than once, especially on Django
projects, mainly during the development cycle.
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Todd Whiteman

Dec 8, 2009, 2:07:46 PM

to Kristján Valur Jónsson, python...@python.org

Kristján Valur Jónsson wrote:
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files. I am about to implement that patch
> for our purposes, thus forcing recompilation of the .py files on each
> run if so specified. This will ensure that the application will
> execute only the code represented by the checked-out .py files. But it
> occurred to me that this functionality might be of interest to other
> people than just us. I can imagine, for example, that buildbots running
> the python regression testsuite might be running into problems with
> stray .pyo files from time to time.
>
> Do you think that such a command line option would be useful for Python
> at large?

Yes, this is already implemented (as of Python 2.6), see -B option:
http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

Guido van Rossum

Dec 8, 2009, 2:11:32 PM

to Todd Whiteman, python...@python.org

-B only blocks *writing* of bytecode. I think the OP wants to block
*reading*, and only in the specific case where there is no
corresponding source code file.
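To make the distinction concrete: -B is reflected in `sys.dont_write_bytecode`, and there is no matching switch for reading. A small sketch, assuming a Python 3 interpreter for the `subprocess` call (the helper name is illustrative only):

```python
import subprocess
import sys


def dont_write_bytecode_under(*flags):
    """Return sys.dont_write_bytecode as seen by a child interpreter started with flags."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", "import sys; print(sys.dont_write_bytecode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # With -B the child interpreter will not write .pyc files,
    # but it will still read any that already exist.
    print(dont_write_bytecode_under("-B"))
```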

2009/12/8 Todd Whiteman <to...@activestate.com>:

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Dec 8, 2009, 2:10:00 PM

to Jesse Noller, python...@python.org

Agreed. I wonder if this functionality ought to be opt-in instead of
opt-out? The only use cases I am aware of are software vendors who
don't want to distribute their source (a near-extinct breed for
sure...) or people with absurdly small disks (ditto).

2009/12/8 Jesse Noller <jno...@gmail.com>:

--
--Guido van Rossum (python.org/~guido)

John Arbash Meinel

Dec 8, 2009, 2:27:21 PM

to Guido van Rossum, python...@python.org

Guido van Rossum wrote:
> -B only blocks *writing* of bytecode. I think the OP wants to block
> *reading*, and only in the specific case where there is no
> corresponding source code file.
>
> 2009/12/8 Todd Whiteman <to...@activestate.com>:
>> Kristján Valur Jónsson wrote:
>>> I looked at the import code and I found that it is trivial to block the
>>> reading and writing of .pyo files. I am about to implement that patch for
>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>> specified. This will ensure that the application will execute only the
>>> code represented by the checked-out .py files. But it occurred to me that
>>> this functionality might be of interest to other people than just us. I can
>>> imagine, for example, that buildbots running the python regression testsuite
>>> might be running into problems with stray .pyo files from time to time.
>>>
>>> Do you think that such a command line option would be useful for Python at
>>> large?
>> Yes, this is already implemented (as of Python 2.6), see -B option:
>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

This would be quite nice for us. In our case we have been bitten several
times during refactoring. You move one file, but your test suite still
passes because the .pyc is still around.
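Until an interpreter flag exists, one workaround is to scan the tree for bytecode that has lost its source before running the tests. The following is a rough sketch, not a vetted tool; the function name is hypothetical, and it assumes that "orphaned" simply means a .pyc or .pyo file with no .py file of the same name in the same directory.

```python
import os


def find_orphaned_bytecode(root):
    """Return sorted paths of .pyc/.pyo files under root with no sibling .py file."""
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip __pycache__ directories: files there use tagged names
        # (e.g. mod.cpython-311.pyc) and would all look orphaned to this check.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        present = set(filenames)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext in (".pyc", ".pyo") and stem + ".py" not in present:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)


if __name__ == "__main__":
    for path in find_orphaned_bytecode("."):
        print(path)
```

Run from the top of a checkout, this prints candidates for deletion; whether to remove them automatically is a separate decision.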

I think having it be opt-in would be nice.

I do think that the standard py2exe code generates a library.zip that
only has .pyc or .pyo files (and no .py files). It isn't that we would
care if they were present, but I suppose it makes the final .zip file
smaller and faster to load?

Whatever flag is available, though, I'm sure py2exe could be taught to
pass it.

John
=:->

Raymond Hettinger

Dec 8, 2009, 2:34:25 PM

to Guido van Rossum, Jesse Noller, python...@python.org

>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.

I've seen this same problem occur for a number of users.
It is a recurring opportunity to get tripped up.

Raymond

Brett Cannon

Dec 8, 2009, 2:51:04 PM

to Raymond Hettinger, python...@python.org

On Tue, Dec 8, 2009 at 11:34, Raymond Hettinger <pyt...@rcn.com> wrote:

>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.
>
> I've seen this same problem occur for a number of users.
> It is a recurring opportunity to get tripped up.

Another way that a sys.dont_read_bytecode flag would be helpful is for VMs that don't use Python bytecode (e.g. Jython). They could set this flag to True by default which allows code to introspect on the VM to see if it is using bytecode or not. Plus it would let importlib easily skip bytecode usage on VMs that don't support it instead of trying to come up with some heuristic to pick up on that fact (I have not figured that one out yet, but Jython folk were thinking about having marshal.loads() always throw an exception).

-Brett

Ben Finney

Dec 8, 2009, 4:44:01 PM

to python...@python.org

Kristján Valur Jónsson
<kris...@ccpgames.com> writes:

> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
> files. This can happen for a variety of reasons, but most often it
> occurs when .py files are being removed, or moved in the hierarchy.
> The problem is that the application will happily load and import an
> orphaned .pyo file, even though the .py file has gone or moved.

Yes, I think Python users would benefit from having the above behaviour
be opt-in.

I suggest:

* A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
  interpreter follows the current behaviour. If ‘False’, any bytecode
  file satisfies an import only if it has a corresponding source file
  (where “corresponding” means “this source file would, if compiled,
  result in a bytecode file replacing this one”).

  I suggest this attribute should be implemented as ‘True’ by default
  (to match current behaviour), then switched to ‘False’ by default as
  soon as feasible.

* The ‘PYTHONIMPORTORPHANEDBYTECODE’ environment variable, when set,
  causes the interpreter to set the above option ‘True’.

* The ‘-b’ option to the interpreter command-line sets the above option
  ‘True’.
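The rule in the first bullet could be sketched as a plain predicate. This is only an illustration of the proposal: the flag is passed in as an argument because no such `sys` attribute exists, the function name is hypothetical, and "corresponding source file" is assumed to mean a .py file with the same base name in the same directory.

```python
import os


def bytecode_satisfies_import(bytecode_path, import_orphaned_bytecode=True):
    """Decide whether a bytecode file may satisfy an import under the proposed flag."""
    if not os.path.isfile(bytecode_path):
        return False
    if import_orphaned_bytecode:
        # Current behaviour: any bytecode file is acceptable.
        return True
    # Proposed behaviour: require the source file that would, if compiled,
    # replace this bytecode file. Only existence is checked; the bytecode
    # file itself is never opened.
    source_path = os.path.splitext(bytecode_path)[0] + ".py"
    return os.path.isfile(source_path)
```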

--
\ “I have yet to see any problem, however complicated, which, |
`\ when you looked at it in the right way, did not become still |
_o__) more complicated.” —Paul Anderson |
Ben Finney

Collin Winter

Dec 8, 2009, 5:20:21 PM

to Brett Cannon, python...@python.org

On Tue, Dec 8, 2009 at 11:51 AM, Brett Cannon <br...@python.org> wrote:
> Another way that a sys.dont_read_bytecode flag would be helpful is for VMs
> that don't use Python bytecode (e.g. Jython). They could set this flag to
> True by default which allows code to introspect on the VM to see if it is
> using bytecode or not. Plus it would let importlib easily skip bytecode
> usage on VMs that don't support it instead of trying to come up with some
> heuristic to pick up on that fact (I have not figured that one out yet, but
> Jython folk were thinking about having marshal.loads() always throw an
> exception).

It would also be useful when benchmarking multiple iterations of the
same VM. I've considered implementing something like this for Unladen
Swallow so that we could more effectively isolate the running binary
from global state (with a sys.dont_read_bytecode command-line flag
doing for bytecode files what -E does for environment variables).

+1 for this in mainline.

Collin Winter

Greg Ewing

Dec 8, 2009, 5:24:15 PM

to python...@python.org

John Arbash Meinel wrote:

> Whatever flag is available, though, I'm sure py2exe could be taught to
> pass it.

I'm a bit worried about the idea of adding a flag that is
required to turn on functionality that was previously
available without any flag. It could make things awkward
for launcher scripts that are agnostic about the exact
version of Python being used.

--
Greg

Brett Cannon

Dec 8, 2009, 6:13:48 PM

to Kristján Valur Jónsson, python...@python.org

2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>

[SNIP]

> I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files. I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified. This will ensure that the application will execute only the code represented by the checked-out .py files. But it occurred to me that this functionality might be of interest to other people than just us. I can imagine, for example, that buildbots running the python regression testsuite might be running into problems with stray .pyo files from time to time.

Are you suggesting that the flag turn off reading *period*, or only if no source is available? I think you mean the former while Guido suggested the latter.

-Brett

Kristján Valur Jónsson

Dec 8, 2009, 6:23:20 PM

to Brett Cannon, python...@python.org

You are right, I was suggesting the former. From the cursory glance I had at the code, it seemed simpler to not look for a .pyo file at all than to add a special rule regarding its relation to a .py file. That would also help rule out any timestamp problems. But I'm happy with whatever way we agree on to solve the "orphaned bytecode" problem, and glad to see that I'm not the only one experiencing it.

Kristján

geremy condra

Dec 8, 2009, 7:07:35 PM

to John Arbash Meinel, python...@python.org

On Tue, Dec 8, 2009 at 2:27 PM, John Arbash Meinel
<john.arba...@gmail.com> wrote:
> Guido van Rossum wrote:
>> -B only blocks *writing* of bytecode. I think the OP wants to block
>> *reading*, and only in the specific case where there is no
>> corresponding source code file.
>>
>> 2009/12/8 Todd Whiteman <to...@activestate.com>:
>>> Kristján Valur Jónsson wrote:
>>>> I looked at the import code and I found that it is trivial to block the
>>>> reading and writing of .pyo files. I am about to implement that patch for
>>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>>> specified. This will ensure that the application will execute only the
>>>> code represented by the checked-out .py files. But it occurred to me that
>>>> this functionality might be of interest to other people than just us. I can
>>>> imagine, for example, that buildbots running the python regression testsuite
>>>> might be running into problems with stray .pyo files from time to time.
>>>>
>>>> Do you think that such a command line option would be useful for Python at
>>>> large?
>>> Yes, this is already implemented (as of Python 2.6), see -B option:
>>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
>
> This would be quite nice for us. In our case we have been bit several
> times during refactoring. You move one file, but your test suite still
> passes because .pyc is still around.

Same experience here.

Geremy Condra

Eric Smith

Dec 8, 2009, 7:04:04 PM

to python...@python.org

Ben Finney wrote:
> Kristján Valur Jónsson
> <kris...@ccpgames.com> writes:
>
>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
>> files. This can happen for a variety of reasons, but most often it
>> occurs when .py files are being removed, or moved in the hierarchy.
>> The problem is that the application will happily load and import an
>> orphaned .pyo file, even though the .py file has gone or moved.
>
> Yes, I think Python users would benefit from having the above behaviour
> be opt-in.

Agreed. This has bitten me, too. Often it's a permissions problem:
another user has created the .pyc file and I can't overwrite it
(this is on Windows).

> I suggest:
>
> * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
> interpreter follows the current behaviour. If ‘False’, any bytecode
> file satisfies an import only if it has a corresponding source file
> (where “corresponding” means “this source file would, if compiled,
> result in a bytecode file replacing this one”).

I agree with this in principle, but I don't see how you're going to
implement it. In order to actually check this condition, aren't you
going to have to compile the source code anyway? If so, just skip the
bytecode file. Although I guess you could store a hash of the source in
the compiled file, or other similar optimizations.

> I suggest this attribute should be implemented as ‘True’ by default
> (to match current behaviour), then switched to ‘False’ by default as
> soon as feasible.
>
> * The ‘PYTHONIMPORTORPHANEDBYTECODE’ environment variable, when set,
> causes the interpreter to set the above option ‘True’.
>
> * The ‘-b’ option to the interpreter command-line sets the above option
> ‘True’.

Sounds good to me.

Eric.

Brett Cannon

Dec 8, 2009, 7:45:42 PM

to Kristján Valur Jónsson, python...@python.org

2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>

> You are right, I was suggesting the former. From the cursory glance I had at the code, it seemed simpler to not look for a .pyo file at all than to add a special rule regarding its relation to a .py file. That would also help rule out any timestamp problems. But I'm happy with whatever way we agree on to solve the "orphaned bytecode" problem, and glad to see that I'm not the only one experiencing it.

I prefer the former as well (don't read any bytecode no matter if source is available or not); clear and simple semantics that are easy to implement.

Ben Finney

Dec 8, 2009, 9:28:01 PM

to python...@python.org

Eric Smith <er...@trueblade.com> writes:

> Ben Finney wrote:
> > I suggest:
> >
> > * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
> > interpreter follows the current behaviour. If ‘False’, any bytecode
> > file satisfies an import only if it has a corresponding source file
> > (where “corresponding” means “this source file would, if compiled,
> > result in a bytecode file replacing this one”).
>
> I agree with this in principle

Thanks.

> but I don't see how you're going to implement it. In order to actually
> check this condition, aren't you going to have to compile the source
> code anyway? If so, just skip the bytecode file. Although I guess you
> could store a hash of the source in the compiled file, or other
> similar optimizations.

You seem to be seeing something I was careful not to write. The check
is:

    this source file would, if compiled, result in a bytecode file
    replacing this one

Nowhere in there is there anything about the resulting bytecode files being
equivalent. I'm limiting the check only to whether the resulting
bytecode file would *replace* the existing bytecode file.

This doesn't require knowing anything at all about the contents of the
current bytecode file; indeed, my intention was to phrase it so that
it's checked before bothering to open the existing bytecode file.

Is there a better term for this? I'm not well-versed enough in the
Python import internals to know.

--
\ “Philosophy is questions that may never be answered. Religion |
`\ is answers that may never be questioned.” —anonymous |
_o__) |
Ben Finney

Guido van Rossum

Dec 8, 2009, 10:30:25 PM

to Ben Finney, python...@python.org

If there was a corresponding source file, it would have been found
first -- and the bytecode file would be used *if* it matches the
source file (by comparing a timestamp in the bytecode file's header to
the actual mtime of the source file).
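For readers who want to see that comparison, here is a rough sketch of it. It assumes the 16-byte .pyc header used by CPython 3.7 and later (magic number, flags, source mtime, source size) and handles only timestamp-based files; the function name is made up, and the real import system also compares the recorded source size.

```python
import importlib.util
import os
import struct


def bytecode_matches_source(source_path, bytecode_path):
    """Compare the mtime recorded in a .pyc header with the source file's mtime."""
    with open(bytecode_path, "rb") as f:
        header = f.read(16)
    # Reject truncated files and bytecode written by a different interpreter version.
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False
    flags, recorded_mtime = struct.unpack("<II", header[4:12])
    if flags != 0:
        # Non-zero flags mark a hash-based .pyc, which this sketch does not cover.
        return False
    source_mtime = int(os.stat(source_path).st_mtime) & 0xFFFFFFFF
    return recorded_mtime == source_mtime
```

Note that this check can only run when a source file exists, which is why it cannot help with a bytecode file whose source has gone.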

So I'm not sure what there is to do apart from *not* using "lone"
bytecode files. (The latter was actually added as a feature at some
point so I betcha it's easy to make it conditional on a flag.)

--
--Guido van Rossum (python.org/~guido)

Ben Finney

Dec 9, 2009, 12:38:32 AM

to python...@python.org

Guido van Rossum <gu...@python.org> writes:

> On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> > this source file would, if compiled, result in a bytecode file
> > replacing this one
> >
> > Nowhere there is there anything about the resulting bytecode files
> > being equivalent. I'm limiting the check only to whether the
> > resulting bytecode file would *replace* the existing bytecode file.
> >
> > This doesn't require knowing anything at all about the contents of
> > the current bytecode file; indeed, my intention was to phrase it so
> > that it's checked before bothering to open the existing bytecode
> > file.
> >
> > Is there a better term for this? I'm not well-versed enough in the
> > Python import internals to know.
>
> If there was a corresponding source file, it would have been found
> first -- and the bytecode file would be used *if* it matches the
> source file (by comparing a timestamp in the bytecode file's header to
> the actual mtime of the source file).

Right, that's what I thought. I was only looking for a way to say “only
use a bytecode file if the corresponding source code file exists”, and
then trying to define “corresponding source code file”.

It appears that all I'm doing is confusing the issue, probably because
my understanding of the terminology is fuzzy. I hope someone else can
word it better, so the question of “which file, exactly, are we saying
must exist?” is well answered.

> So I'm not sure what there is to do apart from *not* using "lone"
> bytecode files. (The latter was actually added as a feature at some
> point so I betcha it's easy to make it conditional on a flag.)

I hope your instinct is right, and I betcha it is too.

--
\ “Intellectual property is to the 21st century what the slave |
`\ trade was to the 16th.” —David Mertz |
_o__) |
Ben Finney

Eric Smith

Dec 9, 2009, 1:18:45 AM

to Ben Finney, python...@python.org

Sorry for top posting. My phone makes me!

You're right: I misread. Sorry about that.
--
Eric.

"Ben Finney" <ben+p...@benfinney.id.au> wrote:

>Eric Smith <er...@trueblade.com> writes:
>
>> Ben Finney wrote:
>> > I suggest:
>> >
>> > * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
>> > interpreter follows the current behaviour. If ‘False’, any bytecode
>> > file satisfies an import only if it has a corresponding source file
>> > (where “corresponding” means “this source file would, if compiled,
>> > result in a bytecode file replacing this one”).
>>
>> I agree with this in principle
>
>Thanks.
>
>> but I don't see how you're going to implement it. In order to actually
>> check this condition, aren't you going to have to compile the source
>> code anyway? If so, just skip the bytecode file. Although I guess you
>> could store a hash of the source in the compiled file, or other
>> similar optimizations.
>
>You seem to be seeing something I was careful not to write. The check
>is:
>

> this source file would, if compiled, result in a bytecode file
> replacing this one
>
>Nowhere there is there anything about the resulting bytecode files being
>equivalent. I'm limiting the check only to whether the resulting
>bytecode file would *replace* the existing bytecode file.
>
>This doesn't require knowing anything at all about the contents of the
>current bytecode file; indeed, my intention was to phrase it so that
>it's checked before bothering to open the existing bytecode file.
>
>Is there a better term for this? I'm not well-versed enough in the
>Python import internals to know.
>

>--
> \ “Philosophy is questions that may never be answered. Religion |
> `\ is answers that may never be questioned.” —anonymous |

Ben Finney

Dec 9, 2009, 1:28:19 AM

to python...@python.org

Eric Smith <er...@trueblade.com> writes:

> Sorry for top posting. My phone makes me!

No, it really doesn't. If you have a broken tool, please don't inflict
its brokenness on others, especially if you *know* it's broken when
you use it.

--
\ “Nothing so needs reforming as other people's habits.” —Mark |
`\ Twain, _Pudd'n'head Wilson_ |

Nick Coghlan

Dec 9, 2009, 5:22:35 AM

to Ben Finney, python...@python.org

Ben Finney wrote:
> Right, that's what I thought. I was only looking for a way to say “only
> use a bytecode file if the corresponding source code file exists”, and
> then trying to define “corresponding source code file”.

As Guido said, the check goes the other way: the interpreter looks for
source files first, and if it doesn't find one, only then does it look
for orphaned bytecode files (pyo/pyc).

The check for a corresponding bytecode file after a source file has
actually been found follows a different path through the import code.

Since the two features are somewhat orthogonal, slicing out the check
for orphaned bytecode files while keeping the check for a cached
bytecode file should be fairly straightforward.
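A toy model of that lookup order, with the orphan path made conditional, might look like the following. It is a simplified sketch and not the actual import code: the function and parameter names are invented, and it only searches a single directory.

```python
import os


def locate_module(directory, name, allow_orphaned_bytecode=True):
    """Mimic the lookup order: source first, orphaned bytecode only as a fallback."""
    source = os.path.join(directory, name + ".py")
    if os.path.isfile(source):
        # Source found: the cached-bytecode check would happen on this path
        # and is unaffected by the flag.
        return ("source", source)
    if allow_orphaned_bytecode:
        # No source: this is the separate path that picks up lone .pyc/.pyo files.
        for ext in (".pyc", ".pyo"):
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return ("orphaned-bytecode", candidate)
    return None
```

Because the flag is consulted only after the source lookup has failed, switching it off leaves the normal caching behaviour untouched.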

Fair warning to anyone that implements this - expect to be updating
quite a few parts of the test suite. The runpy, command line, import and
zipimport tests would all need to be updated to make sure they were
respecting the flag (and probably the importlib tests as well, at least
in Py3k).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Paul Moore

Dec 9, 2009, 7:40:53 AM

to Brett Cannon, python...@python.org

2009/12/9 Brett Cannon <br...@python.org>:

> I prefer the former as well (don't read any bytecode no matter if source is
> available or not); clear and simple semantics that are easy to implement.

If that's the rule, what is the point in writing bytecode at all?
It'll never be read...

Paul.