[Python-ideas] disabling .pyc and .pyo files

Kristján Valur Jónsson

Dec 8, 2009, 12:51:46 PM
to python...@python.org

Hello there.

We have a large project involving multiple Perforce branches of hundreds of .py files each.

Although we employ our own import mechanism for the bulk of these files, we do use the regular import mechanism for an essential core of them.

 

Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.  This can happen for a variety of reasons, but most often it occurs when .py files are being removed, or moved in the hierarchy.  The problem is that the application will happily load and import an orphaned .pyo file, even though the .py file has gone or moved.
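
A minimal sketch of how stray files like these could be located before they cause trouble. It assumes the layout described here, with foo.pyc or foo.pyo sitting next to foo.py; the function name find_orphaned_bytecode is hypothetical and not part of any Python tooling.

```python
import os

BYTECODE_SUFFIXES = (".pyc", ".pyo")


def find_orphaned_bytecode(root):
    """Yield paths of .pyc/.pyo files under root that have no sibling .py file.

    Assumes the legacy layout where foo.pyc sits next to foo.py.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # __pycache__ directories (PEP 3147) use a different naming scheme,
        # so they are skipped rather than misreported.
        if "__pycache__" in dirnames:
            dirnames.remove("__pycache__")
        present = set(filenames)
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext in BYTECODE_SUFFIXES and stem + ".py" not in present:
                yield os.path.join(dirpath, name)
```

Running something like this from a checkout hook or at the start of a test run would only report the orphans; whether to delete them is left to the caller.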

 

I looked at the import code and found that it is trivial to block the reading and writing of .pyo files. I am about to implement that patch for our purposes, forcing recompilation of the .py files on each run when the option is given. This will ensure that the application executes only the code represented by the checked-out .py files. But it occurred to me that this functionality might be of interest to people other than just us. I can imagine, for example, that buildbots running the Python regression test suite might run into problems with stray .pyo files from time to time.

 

Do you think that such a command line option would be useful for Python at large?

 

Cheers,

Kristján

Jesse Noller

Dec 8, 2009, 1:58:41 PM
to Kristján Valur Jónsson, python...@python.org
2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>:

FWIW: I've been bitten by this more than once, especially on Django
projects, mainly during the development cycle.
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Todd Whiteman

Dec 8, 2009, 2:07:46 PM
to Kristján Valur Jónsson, python...@python.org
Kristján Valur Jónsson wrote:
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files. I am about to implement that patch
> for our purposes, thus forcing recompilation of the .py files on each
> run if so specified. This will ensure that the application will
> execute only the code represented by the checked-out .py files. But it
> occurred to me that this functionality might be of interest to other
> people than just us. I can imagine, for example, that buildbots running
> the python regression testsuite might be running into problems with
> stray .pyo files from time to time.
>
> Do you think that such a command line option would be useful for Python
> at large?

Yes, this is already implemented (as of Python 2.6), see -B option:
http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

Guido van Rossum

Dec 8, 2009, 2:11:32 PM
to Todd Whiteman, python...@python.org
-B only blocks *writing* of bytecode. I think the OP wants to block
*reading*, and only in the specific case where there is no
corresponding source code file.
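
To illustrate the distinction, the sketch below imports a throwaway module with sys.dont_write_bytecode set, which is the runtime counterpart of -B, and checks that no bytecode file appears. The helper name and the module it creates are made up for the example. Nothing here stops the interpreter from reading bytecode that already exists, which is the gap under discussion.

```python
import importlib
import os
import sys
import tempfile


def import_without_writing_bytecode(module_name, source_text):
    """Import a throwaway module with bytecode writing disabled.

    Returns the list of bytecode files found afterwards (expected to be empty).
    """
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, module_name + ".py"), "w") as f:
        f.write(source_text)

    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # same effect as starting python with -B
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)
        sys.dont_write_bytecode = previous

    # Collect any bytecode files that were written despite the flag.
    return [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(directory)
        for name in names
        if name.endswith((".pyc", ".pyo"))
    ]
```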

2009/12/8 Todd Whiteman <to...@activestate.com>:

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Dec 8, 2009, 2:10:00 PM
to Jesse Noller, python...@python.org
Agreed. I wonder if this functionality ought to be opt-in instead of
opt-out? The only use cases I am aware of are software vendors who
don't want to distribute their source (a near-extinct breed for
sure...) or people with absurdly small disks (ditto).

2009/12/8 Jesse Noller <jno...@gmail.com>:

--
--Guido van Rossum (python.org/~guido)

John Arbash Meinel

Dec 8, 2009, 2:27:21 PM
to Guido van Rossum, python...@python.org
Guido van Rossum wrote:
> -B only blocks *writing* of bytecode. I think the OP wants to block
> *reading*, and only in the specific case where there is no
> corresponding source code file.
>
> 2009/12/8 Todd Whiteman <to...@activestate.com>:
>> Kristján Valur Jónsson wrote:
>>> I looked at the import code and I found that it is trivial to block the
>>> reading and writing of .pyo files. I am about to implement that patch for
>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>> specified. This will ensure that the application will execute only the
>>> code represented by the checked-out .py files. But it occurred to me that
>>> this functionality might be of interest to other people than just us. I can
>>> imagine, for example, that buildbots running the python regression testsuite
>>> might be running into problems with stray .pyo files from time to time.
>>>
>>> Do you think that such a command line option would be useful for Python at
>>> large?
>> Yes, this is already implemented (as of Python 2.6), see -B option:
>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

This would be quite nice for us. In our case we have been bitten several
times during refactoring. You move one file, but your test suite still
passes because the .pyc is still around.

I think having it be opt-in would be nice.

I do think that the standard py2exe code generates a library.zip that
only has .pyc or .pyo files (and no .py files). It isn't that we would
care if they were present, but I suppose it makes the final .zip file
smaller and faster to load?

Whatever flag is available, though, I'm sure py2exe could be taught to
pass it.

John
=:->

Raymond Hettinger

Dec 8, 2009, 2:34:25 PM
to Guido van Rossum, Jesse Noller, python...@python.org

>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.

I've seen this same problem occur for a number of users.
It is a recurring opportunity to get tripped up.


Raymond

Brett Cannon

Dec 8, 2009, 2:51:04 PM
to Raymond Hettinger, python...@python.org
On Tue, Dec 8, 2009 at 11:34, Raymond Hettinger <pyt...@rcn.com> wrote:

>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.
>
> I've seen this same problem occur for a number of users.
> It is a recurring opportunity to get tripped up.

Another way that a sys.dont_read_bytecode flag would be helpful is for VMs that don't use Python bytecode (e.g. Jython). They could set this flag to True by default which allows code to introspect on the VM to see if it is using bytecode or not. Plus it would let importlib easily skip bytecode usage on VMs that don't support it instead of trying to come up with some heuristic to pick up on that fact (I have not figured that one out yet, but Jython folk were thinking about having marshal.loads() always throw an exception).

-Brett 

Ben Finney

Dec 8, 2009, 4:44:01 PM
to python...@python.org
Kristján Valur Jónsson
<kris...@ccpgames.com> writes:

> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
> files. This can happen for a variety of reasons, but most often it
> occurs when .py files are being removed, or moved in the hierarchy.
> The problem is that the application will happily load and import an
> orphaned .pyo file, even though the .py file has gone or moved.

Yes, I think Python users would benefit from having the above behaviour
be opt-in.

I suggest:

* A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
interpreter follows the current behaviour. If ‘False’, any bytecode
file satisfies an import only if it has a corresponding source file
(where “corresponding” means “this source file would, if compiled,
result in a bytecode file replacing this one”).

I suggest this attribute should be implemented as ‘True’ by default
(to match current behaviour), then switched to ‘False’ by default as
soon as feasible.

* The ‘PYTHONIMPORTORPHANEDBYTECODE’ environment variable, when set,
causes the interpreter to set the above option ‘True’.

* The ‘-b’ option to the interpreter command-line sets the above option
‘True’.
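
A rough model of the lookup rule proposed above, written as a plain function so the effect of the flag is easy to see. Everything here is hypothetical: the function name, the keyword argument standing in for ‘sys.import_orphaned_bytecode’, and the simplified single-directory search. It is not CPython's import code.

```python
import os


def find_module_file(directory, name, import_orphaned_bytecode=True):
    """Return (kind, path) for module `name` in `directory`, or None.

    Hypothetical model of the proposed flag; this is not CPython's import code.
    """
    source = os.path.join(directory, name + ".py")
    if os.path.isfile(source):
        # Source wins; whether its cached bytecode is reused is a separate check.
        return ("source", source)
    if import_orphaned_bytecode:
        # Only reached when no source exists: the "orphaned" case.
        for suffix in (".pyc", ".pyo"):
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return ("orphaned bytecode", candidate)
    return None
```

With the flag off, a directory holding only gone.pyc yields None for "gone", so the import would fail instead of silently running stale code.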

--
\ “I have yet to see any problem, however complicated, which, |
`\ when you looked at it in the right way, did not become still |
_o__) more complicated.” —Paul Anderson |
Ben Finney

Collin Winter

Dec 8, 2009, 5:20:21 PM
to Brett Cannon, python...@python.org
On Tue, Dec 8, 2009 at 11:51 AM, Brett Cannon <br...@python.org> wrote:
> Another way that a sys.dont_read_bytecode flag would be helpful is for VMs
> that don't use Python bytecode (e.g. Jython). They could set this flag to
> True by default which allows code to introspect on the VM to see if it is
> using bytecode or not. Plus it would let importlib easily skip bytecode
> usage on VMs that don't support it instead of trying to come up with some
> heuristic to pick up on that fact (I have not figured that one out yet, but
> Jython folk were thinking about having marshal.loads() always throw an
> exception).

It would also be useful when benchmarking multiple iterations of the
same VM. I've considered implementing something like this for Unladen
Swallow so that we could more effectively isolate the running binary
from global state (with a sys.dont_read_bytecode command-line flag
doing for bytecode files what -E does for environment variables).

+1 for this in mainline.

Collin Winter

Greg Ewing

Dec 8, 2009, 5:24:15 PM
to python...@python.org
John Arbash Meinel wrote:

> Whatever flag is available, though, I'm sure py2exe could be taught to
> pass it.

I'm a bit worried about the idea of adding a flag that is
required to turn on functionality that was previously
available without any flag. It could make things awkward
for launcher scripts that are agnostic about the exact
version of Python being used.

--
Greg

Brett Cannon

Dec 8, 2009, 6:13:48 PM
to Kristján Valur Jónsson, python...@python.org
2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>

[SNIP] 

> I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files. I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified. This will ensure that the application will execute only the code represented by the checked-out .py files. But it occurred to me that this functionality might be of interest to other people than just us. I can imagine, for example, that buildbots running the python regression testsuite might be running into problems with stray .pyo files from time to time.


Are you suggesting that the flag turn off reading *period*, or only if no source is available? I think you mean the former while Guido suggested the latter.

-Brett

Kristján Valur Jónsson

Dec 8, 2009, 6:23:20 PM
to Brett Cannon, python...@python.org

You are right, I was suggesting the former. From the cursory glance I had at the code, it seemed simpler to not look for a .pyo file at all, rather than to add a special rule regarding its relation to a .py file. That would also help rule out any timestamp problems. But I'm happy with whatever way we agree on to solve the “orphaned bytecode” problem, and glad to see that I'm not the only one experiencing it.

 

Kristján

geremy condra

Dec 8, 2009, 7:07:35 PM
to John Arbash Meinel, python...@python.org
On Tue, Dec 8, 2009 at 2:27 PM, John Arbash Meinel
<john.arba...@gmail.com> wrote:
> Guido van Rossum wrote:
>> -B only blocks *writing* of bytecode. I think the OP wants to block
>> *reading*, and only in the specific case where there is no
>> corresponding source code file.
>>
>> 2009/12/8 Todd Whiteman <to...@activestate.com>:
>>> Kristján Valur Jónsson wrote:
>>>> I looked at the import code and I found that it is trivial to block the
>>>> reading and writing of .pyo files.  I am about to implement that patch for
>>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>>> specified.   This will ensure that the application will execute only the
>>>> code represented by the checked-out .py files.  But it occurred to me that
>>>> this functionality might be of interest to other people than just us.  I can
>>>> imagine, for example, that buildbots running the python regression testsuite
>>>> might be running into problems with stray .pyo files from time to time.
>>>>
>>>> Do you think that such a command line option would be useful for Python at
>>>> large?
>>> Yes, this is already implemented (as of Python 2.6), see -B option:
>>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
>
> This would be quite nice for us. In our case we have been bit several
> times during refactoring. You move one file, but your test suite still
> passes because .pyc is still around.

Same experience here.

Geremy Condra

Eric Smith

Dec 8, 2009, 7:04:04 PM
to python...@python.org
Ben Finney wrote:
> Kristján Valur Jónsson
> <kris...@ccpgames.com> writes:
>
>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
>> files. This can happen for a variety of reasons, but most often it
>> occurs when .py files are being removed, or moved in the hierarchy.
>> The problem is that the application will happily load and import an
>> orphaned .pyo file, even though the .py file has gone or moved.
>
> Yes, I think Python users would benefit from having the above behaviour
> be opt-in.

Agreed. This has bitten me, too. Often it's a permissions problem,
where another user has created the .pyc file and I can't overwrite it
(this is on Windows).

> I suggest:
>
> * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
> interpreter follows the current behaviour. If ‘False’, any bytecode
> file satisfies an import only if it has a corresponding source file
> (where “corresponding” means “this source file would, if compiled,
> result in a bytecode file replacing this one”).

I agree with this in principle, but I don't see how you're going to
implement it. In order to actually check this condition, aren't you
going to have to compile the source code anyway? If so, just skip the
bytecode file. Although I guess you could store a hash of the source in
the compiled file, or other similar optimizations.

> I suggest this attribute should be implemented as ‘True’ by default
> (to match current behaviour), then switched to ‘False’ by default as
> soon as feasible.
>
> * The ‘PYTHONIMPORTORPHANEDBYTECODE’ environment variable, when set,
> causes the interpreter to set the above option ‘True’.
>
> * The ‘-b’ option to the interpreter command-line sets the above option
> ‘True’.

Sounds good to me.

Eric.

Brett Cannon

Dec 8, 2009, 7:45:42 PM
to Kristján Valur Jónsson, python...@python.org
2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>

> You are right, I was suggesting the former. From the cursory glance I had at the code, it seemed simpler to not look for a .pyo file at all, rather than to add a special rule regarding its relation to a .py file. That would also help rule out any timestamp problems. But I'm happy with whatever way we agree on to solve the “orphaned bytecode” problem, and glad to see that I'm not the only one experiencing it.


I prefer the former as well (don't read any bytecode no matter if source is available or not); clear and simple semantics that are easy to implement.

Ben Finney

Dec 8, 2009, 9:28:01 PM
to python...@python.org
Eric Smith <er...@trueblade.com> writes:

> Ben Finney wrote:
> > I suggest:
> >
> > * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
> > interpreter follows the current behaviour. If ‘False’, any bytecode
> > file satisfies an import only if it has a corresponding source file
> > (where “corresponding” means “this source file would, if compiled,
> > result in a bytecode file replacing this one”).
>
> I agree with this in principle

Thanks.

> but I don't see how you're going to implement it. In order to actually
> check this condition, aren't you going to have to compile the source
> code anyway? If so, just skip the bytecode file. Although I guess you
> could store a hash of the source in the compiled file, or other
> similar optimizations.

You seem to be seeing something I was careful not to write. The check
is:

this source file would, if compiled, result in a bytecode file
replacing this one

Nowhere in there is there anything about the resulting bytecode files being
equivalent. I'm limiting the check only to whether the resulting
bytecode file would *replace* the existing bytecode file.

This doesn't require knowing anything at all about the contents of the
current bytecode file; indeed, my intention was to phrase it so that
it's checked before bothering to open the existing bytecode file.

Is there a better term for this? I'm not well-versed enough in the
Python import internals to know.

--
\ “Philosophy is questions that may never be answered. Religion |
`\ is answers that may never be questioned.” —anonymous |
_o__) |
Ben Finney

Guido van Rossum

Dec 8, 2009, 10:30:25 PM
to Ben Finney, python...@python.org

If there was a corresponding source file, it would have been found
first -- and the bytecode file would be used *if* it matches the
source file (by comparing a timestamp in the bytecode file's header to
the actual mtime of the source file).

So I'm not sure what there is to do apart from *not* using "lone"
bytecode files. (The latter was actually added as a feature at some
point so I betcha it's easy to make it conditional on a flag.)
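
The timestamp comparison mentioned above can be sketched as follows. This is an illustration against the .pyc header layout of current CPython (3.7 and later: 4 bytes of magic number, 4 bytes of flags, 4 bytes of source mtime, 4 bytes of source size), not the layout in use when this was written, where the header held only the magic number and the mtime. The function name is made up for the example.

```python
import importlib.util
import os
import struct


def bytecode_matches_source(source_path, pyc_path):
    """Return True if a timestamp-based .pyc was compiled from this source file."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False  # truncated file, or written by a different interpreter
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags & 0x1:
        # Hash-based .pyc files carry a source hash here instead of an mtime.
        raise ValueError("hash-based .pyc; no timestamp to compare")
    st = os.stat(source_path)
    # Both header fields are stored as unsigned 32-bit values.
    return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and size == (st.st_size & 0xFFFFFFFF))
```

Note that this check can only run when a source file exists, which is why a "lone" bytecode file is accepted with no validation at all.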

--
--Guido van Rossum (python.org/~guido)

Ben Finney

Dec 9, 2009, 12:38:32 AM
to python...@python.org
Guido van Rossum <gu...@python.org> writes:

> On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> >   this source file would, if compiled, result in a bytecode file
> >   replacing this one
> >
> > Nowhere there is there anything about the resulting bytecode files
> > being equivalent. I'm limiting the check only to whether the
> > resulting bytecode file would *replace* the existing bytecode file.
> >
> > This doesn't require knowing anything at all about the contents of
> > the current bytecode file; indeed, my intention was to phrase it so
> > that it's checked before bothering to open the existing bytecode
> > file.
> >
> > Is there a better term for this? I'm not well-versed enough in the
> > Python import internals to know.
>
> If there was a corresponding source file, it would have been found
> first -- and the bytecode file would be used *if* it matches the
> source file (by comparing a timestamp in the bytecode file's header to
> the actual mtime of the source file).

Right, that's what I thought. I was only looking for a way to say “only
use a bytecode file if the corresponding source code file exists”, and
then trying to define “corresponding source code file”.

It appears that all I'm doing is confusing the issue, probably because
my understanding of the terminology is fuzzy. I hope someone else can
word it better, so the question of “which file, exactly, are we saying
must exist?” is well answered.

> So I'm not sure what there is to do apart from *not* using "lone"
> bytecode files. (The latter was actually added as a feature at some
> point so I betcha it's easy to make it conditional on a flag.)

I hope your instinct is right, and I betcha it is too.

--
\ “Intellectual property is to the 21st century what the slave |
`\ trade was to the 16th.” —David Mertz |
_o__) |
Ben Finney

Eric Smith

Dec 9, 2009, 1:18:45 AM
to Ben Finney, python...@python.org
Sorry for top posting. My phone makes me!

You're right: I misread. Sorry about that.
--
Eric.

"Ben Finney" <ben+p...@benfinney.id.au> wrote:

>Eric Smith <er...@trueblade.com> writes:
>
>> Ben Finney wrote:
>> > I suggest:
>> >
>> > * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
>> > interpreter follows the current behaviour. If ‘False’, any bytecode
>> > file satisfies an import only if it has a corresponding source file
>> > (where “corresponding” means “this source file would, if compiled,
>> > result in a bytecode file replacing this one”).
>>
>> I agree with this in principle
>
>Thanks.
>
>> but I don't see how you're going to implement it. In order to actually
>> check this condition, aren't you going to have to compile the source
>> code anyway? If so, just skip the bytecode file. Although I guess you
>> could store a hash of the source in the compiled file, or other
>> similar optimizations.
>
>You seem to be seeing something I was careful not to write. The check
>is:
>

> this source file would, if compiled, result in a bytecode file
> replacing this one
>
>Nowhere there is there anything about the resulting bytecode files being
>equivalent. I'm limiting the check only to whether the resulting
>bytecode file would *replace* the existing bytecode file.
>
>This doesn't require knowing anything at all about the contents of the
>current bytecode file; indeed, my intention was to phrase it so that
>it's checked before bothering to open the existing bytecode file.
>
>Is there a better term for this? I'm not well-versed enough in the
>Python import internals to know.
>

>--
> \ “Philosophy is questions that may never be answered. Religion |
> `\ is answers that may never be questioned.” —anonymous |

Ben Finney

Dec 9, 2009, 1:28:19 AM
to python...@python.org
Eric Smith <er...@trueblade.com> writes:

> Sorry for top posting. My phone makes me!

No, it really doesn't. If you have a broken tool, please don't inflict
its brokenness on others, especially if you *know* it's broken when
you use it.

--
\ “Nothing so needs reforming as other people's habits.” —Mark |
`\ Twain, _Pudd'n'head Wilson_ |

Nick Coghlan

Dec 9, 2009, 5:22:35 AM
to Ben Finney, python...@python.org
Ben Finney wrote:
> Right, that's what I thought. I was only looking for a way to say “only
> use a bytecode file if the corresponding source code file exists”, and
> then trying to define “corresponding source code file”.

As Guido said, the check goes the other way: the interpreter looks for
source files first, and if it doesn't find one, only then does it look
for orphaned bytecode files (pyo/pyc).

The check for a corresponding bytecode file after a source file has
actually been found follows a different path through the import code.

Since the two features are somewhat orthogonal, slicing out the check
for orphaned bytecode files while keeping the check for a cached
bytecode file should be fairly straightforward.
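
One way this slicing might look with the importlib machinery of current Python 3, which postdates this thread: build a path hook whose FileFinder knows about source and extension modules but not about bytecode-only modules. The context manager name source_only_imports is hypothetical; the importlib.machinery classes it uses are real. Cached bytecode for modules that do have source is still used, since that is handled inside SourceFileLoader.

```python
import contextlib
import importlib.machinery
import sys


@contextlib.contextmanager
def source_only_imports():
    """Temporarily make directory imports ignore bytecode-only modules."""
    loader_details = [
        (importlib.machinery.ExtensionFileLoader,
         importlib.machinery.EXTENSION_SUFFIXES),
        (importlib.machinery.SourceFileLoader,
         importlib.machinery.SOURCE_SUFFIXES),
        # SourcelessFileLoader is deliberately left out.
    ]
    hook = importlib.machinery.FileFinder.path_hook(*loader_details)
    # Put the hook first; for non-directory entries (such as zip files) it
    # raises ImportError, so the remaining hooks still get their turn.
    sys.path_hooks.insert(0, hook)
    sys.path_importer_cache.clear()  # drop finders built by the old hooks
    try:
        yield
    finally:
        sys.path_hooks.remove(hook)
        sys.path_importer_cache.clear()
```

Inside the with block, a directory that contains only orphan.pyc no longer satisfies "import orphan"; outside it, the default behaviour is restored.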

Fair warning to anyone that implements this - expect to be updating
quite a few parts of the test suite. The runpy, command line, import and
zipimport tests would all need to be updated to make sure they were
respecting the flag (and probably the importlib tests as well, at least
in Py3k).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Paul Moore

Dec 9, 2009, 7:40:53 AM
to Brett Cannon, python...@python.org
2009/12/9 Brett Cannon <br...@python.org>:

> I prefer the former as well (don't read any bytecode no matter if source is
> available or not); clear and simple semantics that are easy to implement.

If that's the rule, what is the point in writing bytecode at all?
It'll never be read...

Paul.

Jesse Noller

Dec 9, 2009, 8:04:01 AM
to Ben Finney, python...@python.org
On Wed, Dec 9, 2009 at 1:28 AM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> Eric Smith <er...@trueblade.com> writes:
>
>> Sorry for top posting. My phone makes me!
>
> No, it really doesn't. If you have a broken tool, please don't inflict
> its brokenness on others, especially if you *know* it's broken when
> you use it.

Top posting isn't that big of an issue. Drop it, please.

Brett Cannon

Dec 9, 2009, 1:48:30 PM
to Paul Moore, python...@python.org


2009/12/9 Paul Moore <p.f....@gmail.com>

> 2009/12/9 Brett Cannon <br...@python.org>:
>> I prefer the former as well (don't read any bytecode no matter if source is
>> available or not); clear and simple semantics that are easy to implement.
>
> If that's the rule, what is the point in writing bytecode at all?
> It'll never be read...

This entire discussion is in the context of having a flag you need to set to turn off bytecode usage; the default behavior is not going to change.

-Brett 

Guido van Rossum

Dec 9, 2009, 1:52:20 PM
to Brett Cannon, python...@python.org
Could it be as simple as this:

-b don't read bytecode (new flag)
-B don't write bytecode (existing flag)

?


--
--Guido van Rossum (python.org/~guido)

Brett Cannon

Dec 9, 2009, 1:56:03 PM
to Nick Coghlan, Ben Finney, python...@python.org
On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncog...@gmail.com> wrote:
> Ben Finney wrote:
>> Right, that's what I thought. I was only looking for a way to say “only
>> use a bytecode file if the corresponding source code file exists”, and
>> then trying to define “corresponding source code file”.
>
> As Guido said, the check goes the other way: the interpreter looks for
> source files first, and if it doesn't find one, only then does it look
> for orphaned bytecode files (pyo/pyc).


Just a data point: I reversed that order in importlib to match mental semantics.
 
> The check for a corresponding bytecode file after a source file has
> actually been found follows a different path through the import code.
>
> Since the two features are somewhat orthogonal, slicing out the check
> for orphaned bytecode files while keeping the check for a cached
> bytecode file should be fairly straightforward.
>
> Fair warning to anyone that implements this - expect to be updating
> quite a few parts of the test suite. The runpy, command line, import and
> zipimport tests would all need to be updated to make sure they were
> respecting the flag (and probably the importlib tests as well, at least
> in Py3k).

Yep for importlib, but I already protect bytecode-writing tests with a decorator for sys.dont_write_bytecode, so doing this for tests that rely on reading bytecode could easily be decorated as well.
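
A decorator of this kind might look like the sketch below. The actual decorator used in the importlib tests is not shown in this thread, so the name writes_bytecode and its exact behaviour are assumptions; only sys.dont_write_bytecode and unittest.SkipTest are real. A read-side counterpart would check whatever flag ends up being added, which does not exist yet.

```python
import functools
import sys
import unittest


def writes_bytecode(test):
    """Skip the decorated test when bytecode writing is disabled."""
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        # Checked at call time, so the flag can change between test runs.
        if sys.dont_write_bytecode:
            raise unittest.SkipTest("bytecode writing is disabled")
        return test(*args, **kwargs)
    return wrapper
```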

-Brett 

Brett Cannon

Dec 9, 2009, 1:57:43 PM
to Guido van Rossum, python...@python.org
On Wed, Dec 9, 2009 at 10:52, Guido van Rossum <gu...@python.org> wrote:
> Could it be as simple as this:
>
> -b don't read bytecode (new flag)
> -B don't write bytecode (existing flag)

Unfortunately no: -b is "issue warnings about str(bytes_instance), str(bytearray_instance) and comparing bytes/bytearray with str. (-bb: issue errors)" under python3.
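
The existing meaning of -b can be observed through sys.flags.bytes_warning. The sketch below starts a child interpreter with the given flags and reports that value; the helper name is made up for the example, and the behaviour is that of the Python 3 releases that support -b as described above.

```python
import subprocess
import sys


def bytes_warning_level(*interpreter_flags):
    """Run a child interpreter with the given flags; return sys.flags.bytes_warning."""
    output = subprocess.check_output(
        [sys.executable, *interpreter_flags, "-c",
         "import sys; print(sys.flags.bytes_warning)"],
    )
    return int(output.decode().strip())
```

Calling it with no flags, with "-b", and with "-bb" shows the three levels, which is why the letter is not free for a new meaning.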

-Brett

Jared Grubb

Dec 9, 2009, 2:07:54 PM
to Ben Finney, python...@python.org

On 8 Dec 2009, at 13:44, Ben Finney wrote:

> * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
>  interpreter follows the current behaviour. If ‘False’, any bytecode
>  file satisfies an import only if it has a corresponding source file
>  (where “corresponding” means “this source file would, if compiled,
>  result in a bytecode file replacing this one”).

One problem with a sys flag is that it's a global setting. Suppose a package is distributed with only pyc/pyo files; the top-level __init__.py might then flip the switch so that its sub-files can be imported from the pyc/pyo files. But you wouldn't want that flag to persist beyond that.

Another idea is to use a new file extension, which isn't the best solution, but allows the creator to explicitly set what behavior they intended for their files:
  * if a foo.py file exists, then use the existing foo.pyc/pyo as is done today
  * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but foo.pyc/pyo is never used, unlike today)
(pyxxx is a placeholder for whatever would be a reasonable name)

Jared

Guido van Rossum

Dec 9, 2009, 2:11:58 PM
to Brett Cannon, python...@python.org, Ben Finney
On Wed, Dec 9, 2009 at 10:56 AM, Brett Cannon <br...@python.org> wrote:
>
>
> On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncog...@gmail.com> wrote:
>>
>> Ben Finney wrote:
>> > Right, that's what I thought. I was only looking for a way to say “only
>> > use a bytecode file if the corresponding source code file exists”, and
>> > then trying to define “corresponding source code file”.
>>
>> As Guido said, the check goes the other way: the interpreter looks for
>> source files first, and if it doesn't find one, only then does it look
>> for orphaned bytecode files (pyo/pyc).
>>
>
> Just a data point: I reversed that order in importlib to match mental
> semantics.

IIRC zipimport also reverses the order.

>> The check for a corresponding bytecode files after a source file has
>> actually been found follows a different path through the import code.
>>
>> Since the two features are somewhat orthogonal, slicing out the check
>> for orphaned bytecode files while keeping the check for a cached
>> bytecode file should be fairly straightforward.
>>
>> Fair warning to anyone that implements this - expect to be updating
>> quite a few parts of the test suite. The runpy, command line, import and
>> zipimport tests would all need to be updated to make sure they were
>> respecting the flag (and probably the importlib tests as well, at least
>> in Py3k).
>
> Yep for importlib, but I already protect bytecode-writing tests with a
> decorator for sys.dont_write_bytecode, so doing this for tests that rely on
> reading bytecode could easily be decorated as well.
> -Brett

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Dec 9, 2009, 2:27:00 PM
to Jared Grubb, Ben Finney, python...@python.org
On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared...@gmail.com> wrote:
>
> On 8 Dec 2009, at 13:44, Ben Finney wrote:
>
> * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
>  interpreter follows the current behaviour. If ‘False’, any bytecode
>  file satisfies an import only if it has a corresponding source file
>  (where “corresponding” means “this source file would, if compiled,
>  result in a bytecode file replacing this one”).
>
> One problem with a sys flag is that it's a global setting. Suppose a package
> is distributed with only pyc/pyo files, then the top-level __init__.py might
> flip the switch such that its sub-files can get imported from the pyc/pyo
> files. But you wouldnt want that flag to persist beyond that.

I'm not sure that there are any use cases that require using
conflicting values of this setting for different packages.

> Another idea is to use a new file extension, which isnt the best solution,
> but allows the creator to explicitly set what behavior they intended for
> their files:
>   * if a foo.py file exists, then use the existing foo.pyc/pyo as is done
> today
>   * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but
> file.pyc/pyo is never used, unlike today)
> (pyxxx is a placeholder for whatever would be a reasonable name)

It's a much bigger change, but using a different extension would
probably remove the need for a flag. It would also help with some
tools that hide .pyc/.pyo files from view (e.g. the typical
.svnignore).

--
--Guido van Rossum (python.org/~guido)

John Arbash Meinel

Dec 9, 2009, 2:34:41 PM
to Guido van Rossum, Ben Finney, python...@python.org
Guido van Rossum wrote:
> On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared...@gmail.com> wrote:
>> On 8 Dec 2009, at 13:44, Ben Finney wrote:
>>
>> * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
>> interpreter follows the current behaviour. If ‘False’, any bytecode
>> file satisfies an import only if it has a corresponding source file
>> (where “corresponding” means “this source file would, if compiled,
>> result in a bytecode file replacing this one”).
>>
>> One problem with a sys flag is that it's a global setting. Suppose a package
>> is distributed with only pyc/pyo files, then the top-level __init__.py might
>> flip the switch such that its sub-files can get imported from the pyc/pyo
>> files. But you wouldn't want that flag to persist beyond that.
>
> I'm not sure that there are any use cases that require using
> conflicting values of this setting for different packages.
>

Well, during development of your own codebase, where you would like to
not import stale .pyc files, but it depends on a 3rd-party library where
they only ship you .pyc files.

Now if the flag was somehow "for all modules under this namespace" that
would easily handle it.

Or just living with "if you want to use private 3rd-party libs, then you
don't get this support for your own development".

(I don't currently do this, but it certainly is *a* use case.)

John
=:->

Ben Finney

unread,
Dec 9, 2009, 5:18:21 PM12/9/09
to python...@python.org
Guido van Rossum <gu...@python.org> writes:

> Could it be as simple as this:
>
> -b don't read bytecode (new flag)
> -B don't write bytecode (existing flag)

Almost, but I think many in this discussion are agitating for “don't
read orphaned bytecode” to become the default.

--
\ “Visitors are expected to complain at the office between the |
`\ hours of 9 and 11 a.m. daily.” —hotel, Athens |
_o__) |
Ben Finney
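
For context, the existing -B behaviour quoted above can also be switched on from inside a program through sys.dont_write_bytecode. The sketch below is only an illustration: the helper name import_without_writing is made up here, and the read-side "-b" flag remains a proposal in this thread, so nothing in the sketch affects whether bytecode is read.

```python
import importlib
import sys


def import_without_writing(module_name):
    """Import module_name with bytecode writing turned off (hypothetical helper)."""
    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # same effect as starting the interpreter with -B
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore whatever the caller had configured.
        sys.dont_write_bytecode = previous
```

Existing bytecode files are still read while the flag is set; only the writing of new ones is suppressed.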

Brett Cannon

unread,
Dec 9, 2009, 5:43:05 PM12/9/09
to Guido van Rossum, Ben Finney, python...@python.org
On Wed, Dec 9, 2009 at 11:27, Guido van Rossum <gu...@python.org> wrote:
> On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared...@gmail.com> wrote:
>>
>> On 8 Dec 2009, at 13:44, Ben Finney wrote:
>>
>> * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the
>>  interpreter follows the current behaviour. If ‘False’, any bytecode
>>  file satisfies an import only if it has a corresponding source file
>>  (where “corresponding” means “this source file would, if compiled,
>>  result in a bytecode file replacing this one”).
>>
>> One problem with a sys flag is that it's a global setting. Suppose a package
>> is distributed with only pyc/pyo files, then the top-level __init__.py might
>> flip the switch such that its sub-files can get imported from the pyc/pyo
>> files. But you wouldn't want that flag to persist beyond that.
>
> I'm not sure that there are any use cases that require using
> conflicting values of this setting for different packages.


Same here. This is straying into optimizations for the sake of optimizing.
 
>> Another idea is to use a new file extension, which isn't the best solution,
>> but allows the creator to explicitly set what behavior they intended for
>> their files:
>>   * if a foo.py file exists, then use the existing foo.pyc/pyo as is done
>> today
>>   * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but
>> file.pyc/pyo is never used, unlike today)
>> (pyxxx is a placeholder for whatever would be a reasonable name)
>
> It's a much bigger change, but using a different extension would
> probably remove the need for a flag. It would also help with some
> tools that hide .pyc/.pyo files from view (e.g. the typical
> .svnignore).

From a Python VM perspective, the problem with this is it doesn't help improve the situation for other VMs that have no concept of bytecode. If we make pyc/pyo files purely an optimization for CPython (and other VMs that choose to support the format) and not a recognized executable format on its own (like it is now) then that would probably help prevent people from distributing pyc/pyo files only and thus locking out the use of other VMs.

I know some people seem to think pyc/pyo files are a good way to obfuscate code, but they honestly aren't, IMO. But these people stand the most to lose from us even considering changing default behavior.

In a perfect world I would make pyc/pyo files completely optional and only an optimization that could not work w/o the corresponding source. But in a backwards-compatible, paranoid world I would make it an opt-in flag to ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter.

-Brett

Ben Finney

unread,
Dec 9, 2009, 5:44:29 PM12/9/09
to python...@python.org
John Arbash Meinel
<john.arba...@gmail.com> writes:

> Or just living with "if you want to use private 3rd-party libs, then
> you don't get this support for your own development".

FWIW, that's the option I would advocate. The default is to develop and
distribute with source; choosing to omit source (or choosing to use such
software) is choosing an inferior option for many other reasons as well,
so I don't see it as a use case that needs explicit support.

--
\ “A learning experience is one of those things that say, “You |
`\ know that thing you just did? Don't do that.”” —Douglas Adams, |
_o__) 2000-04-05 |
Ben Finney

Ben Finney

unread,
Dec 9, 2009, 6:00:04 PM12/9/09
to python...@python.org
Jesse Noller <jno...@gmail.com> writes:

> On Wed, Dec 9, 2009 at 1:28 AM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> > Eric Smith <er...@trueblade.com> writes:
> >
> >> Sorry for top posting. My phone makes me!
> >
> > No, it really doesn't. If you have a broken tool, please don't
> > inflict its brokenness on others, especially if you *know* it's
> > broken when you use it.
>
> Top posting isn't that big of an issue. Drop it, please.

No bigger than other problems of poor human-to-human communication. I
agree with Eric that it deserves apology, even if you don't think it's a
big deal.

--
\ “In any great organization it is far, far safer to be wrong |
`\ with the majority than to be right alone.” —John Kenneth |
_o__) Galbraith, 1989-07-28 |
Ben Finney

geremy condra

unread,
Dec 9, 2009, 6:04:08 PM12/9/09
to Brett Cannon, Ben Finney, python...@python.org
> In a perfect world I would make pyc/pyo files completely optional and only
> an optimization that could not work w/o the corresponding source. But in a
> backwards-compatible, paranoid world I would make it an opt-in flag to
> ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter.
> -Brett

FWIW, I'm in about the same boat here.

As a somewhat tangential question, is anybody aware of any python3
projects for which requiring source would be an issue?

Geremy Condra

Daniel Fetchinson

unread,
Dec 9, 2009, 6:27:09 PM12/9/09
to python...@python.org
>> >> Sorry for top posting. My phone makes me!
>> >
>> > No, it really doesn't. If you have a broken tool, please don't
>> > inflict its brokenness on others, especially if you *know* it's
>> > broken when you use it.
>>
>> Top posting isn't that big of an issue. Drop it, please.
>
> No bigger than other problems of poor human-to-human communication. I
> agree with Eric that it deserves apology, even if you don't think it's a
> big deal.

Did you actually make a survey of c.l.p users to determine what
fraction finds top posting poor human-to-human communication? My guess
is that it is below 31%. Off the top of my head, only one name comes to mind
who thinks top posting is at least sometimes appropriate: GvR.

Note: you are free to install software that will automatically delete
any post that is top posted and voilà, you will never be bothered
again. Why not do that?

Cheers,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

Antoine Pitrou

unread,
Dec 9, 2009, 10:48:54 PM12/9/09
to python...@python.org
Ben Finney <ben+python@...> writes:

>
> Guido van Rossum <guido <at> python.org> writes:
>
> > Could it be as simple as this:
> >
> > -b don't read bytecode (new flag)
> > -B don't write bytecode (existing flag)
>
> Almost, but I think many in this discussion are agitating for “don't
> read orphaned bytecode” to become the default.

Either to become the default (which might require updates to things like
py2exe), or to have a dedicated flag.
On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a
use case. If you don't want to read any bytecode, don't produce/install it in
the first place.
Bytecode is useful, it reduces startup times. It's only annoying when the
original .py file has been deleted and the obsolete .pyc/.pyo is dangling on
disk.

cheers

Antoine.
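
For illustration, the "only read bytecode when the source is present" behaviour can be approximated with the importlib machinery that appeared in Python versions later than this thread. The sketch below is an assumption-laden example, not the patch under discussion: install_source_required_hook is a made-up name, and the approach simply registers a path hook that has no sourceless loader, so a lone foo.pyc in a directory is never picked up while bytecode cached alongside an existing source file is still used.

```python
import importlib.machinery as machinery
import sys


def install_source_required_hook():
    """Make path-based imports ignore bytecode that has no source file.

    Hypothetical helper: it registers a FileFinder hook that knows about
    extension modules and source files only.
    """
    hook = machinery.FileFinder.path_hook(
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        # SourcelessFileLoader is deliberately left out.
    )
    # Put the hook first so it is consulted before the default one,
    # then drop finders that were already cached for sys.path entries.
    sys.path_hooks.insert(0, hook)
    sys.path_importer_cache.clear()
    return hook
```

The setting is process-wide, which is the same limitation raised earlier in the thread for a sys flag: a third-party package shipped as bytecode only would stop importing.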

Collin Winter

unread,
Dec 9, 2009, 11:47:05 PM12/9/09
to Antoine Pitrou, python...@python.org
On Wed, Dec 9, 2009 at 7:48 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> Ben Finney <ben+python@...> writes:
>>
>> Guido van Rossum <guido <at> python.org> writes:
>>
>> > Could it be as simple as this:
>> >
>> > -b don't read bytecode (new flag)
>> > -B don't write bytecode (existing flag)
>>
>> Almost, but I think many in this discussion are agitating for “don't
>> read orphaned bytecode” to become the default.
>
> Either to become the default (which might require updates to things like
> py2exe), or to have a dedicated flag.
> On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a
> use case. If you don't want to read any bytecode, don't produce/install it in
> the first place.

I gave such a use-case earlier in this thread:

"""
It would also be useful when benchmarking multiple iterations of the
same VM. I've considered implementing something like this for Unladen
Swallow so that we could more effectively isolate the running binary
from global state (with a sys.dont_read_bytecode command-line flag
doing for bytecode files what -E does for environment variables).
"""

We currently handle this by deleting all .pyc/.pyo files in our
library tree, but that gets more expensive the more third-party
libraries we bring in for testing, and it's not foolproof.

Collin Winter
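
The cleanup step described above (deleting every .pyc/.pyo file in a library tree) could be sketched as follows. This is a minimal illustration rather than Unladen Swallow's actual tooling; purge_bytecode is a name invented for the example. It also shows why the cost grows with the size of the tree: every directory has to be walked on each run.

```python
import os


def purge_bytecode(root):
    """Delete every .pyc/.pyo file under root and return the removed paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".pyc", ".pyo")):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```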

Antoine Pitrou

unread,
Dec 9, 2009, 11:50:40 PM12/9/09
to python...@python.org

> I gave such a use-case earlier in this thread:
>
> """
> It would also be useful when benchmarking multiple iterations of the
> same VM. I've considered implementing something like this for Unladen
> Swallow so that we could more effectively isolate the running binary
> from global state (with a sys.dont_read_bytecode command-line flag
> doing for bytecode files what -E does for environment variables).
> """

I'm not sure I understand the point. Surely importing modules isn't in
the critical path (or even in the measured path) of your benchmark, is
it?

Collin Winter

unread,
Dec 10, 2009, 12:00:19 AM12/10/09
to Antoine Pitrou, python...@python.org
On Wed, Dec 9, 2009 at 8:50 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>
>> I gave such a use-case earlier in this thread:
>>
>> """
>> It would also be useful when benchmarking multiple iterations of the
>> same VM. I've considered implementing something like this for Unladen
>> Swallow so that we could more effectively isolate the running binary
>> from global state (with a sys.dont_read_bytecode command-line flag
>> doing for bytecode files what -E does for environment variables).
>> """
>
> I'm not sure I understand the point. Surely importing modules isn't in
> the critical path (or even in the measured path) of your benchmark, is
> it?

When changing the bytecode sequence produced by the CPython compiler,
it would be useful to make sure that a module is being compiled from
scratch (and hence using the new version of the compiler) instead of
reusing older bytecode from a .pyc file. You might say that we should
simply increase the magic number with each iteration, but I've never
found that having to change more code boosts my productivity
(especially in cases where changing the magic number is not necessary
for compatibility purposes). I understand this may be a fringe
use-case, but given the number of optimization projects based on
CPython (of which ours is but one), it may still be worth considering.

Collin

Antoine Pitrou

unread,
Dec 10, 2009, 12:04:31 AM12/10/09
to Python-Ideas

> When changing the bytecode sequence produced by the CPython compiler,
> it would be useful to make sure that a module is being compiled from
> scratch (and hence using the new version of the compiler) instead of
> reusing older bytecode from a .pyc file. You might say that we should
> simply increase the magic number with each iteration,

Or simply "rm -f `find -name *.pyc`" :-)

Collin Winter

unread,
Dec 10, 2009, 12:07:47 AM12/10/09
to Antoine Pitrou, Python-Ideas
On Wed, Dec 9, 2009 at 9:04 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>
>> When changing the bytecode sequence produced by the CPython compiler,
>> it would be useful to make sure that a module is being compiled from
>> scratch (and hence using the new version of the compiler) instead of
>> reusing older bytecode from a .pyc file. You might say that we should
>> simply increase the magic number with each iteration,
>
> Or simply "rm -f `find -name *.pyc`" :-)

As I said, "We currently handle this by deleting all .pyc/.pyo files
in our library tree, but that gets more expensive the more third-party
libraries we bring in for testing, and it's not foolproof."

I tire of quoting myself.

Collin

Nick Coghlan

unread,
Dec 10, 2009, 5:38:06 AM12/10/09
to Guido van Rossum, Ben Finney, python...@python.org
Guido van Rossum wrote:
> On Wed, Dec 9, 2009 at 10:56 AM, Brett Cannon <br...@python.org> wrote:
>>
>> On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncog...@gmail.com> wrote:
>>> Ben Finney wrote:
>>>> Right, that's what I thought. I was only looking for a way to say “only
>>>> use a bytecode file if the corresponding source code file exists”, and
>>>> then trying to define “corresponding source code file”.
>>> As Guido said, the check goes the other way: the interpreter looks for
>>> source files first, and if it doesn't find one, only then does it look
>>> for orphaned bytecode files (pyo/pyc).
>>>
>> Just a data point: I reversed that order in importlib to match mental
>> semantics.
>
> IIRC zipimport also reverses the order.

Hmm, not as orthogonal as I thought then :P

I guess it is a credit to the PEP 302 API that I've never needed to care
that zipimport might have the check the other way around :)

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Nick Coghlan

unread,
Dec 10, 2009, 5:43:04 AM12/10/09
to Brett Cannon, Ben Finney, python...@python.org
Brett Cannon wrote:
> I know some people seem to think pyc/pyo files are a good way to
> obfuscate code, but it honestly isn't, IMO. But these people stand the
> most to lose from us even considering changing default behavior.

People that think it is a good obfuscation trick often don't realise
just how powerful Python's introspection features make the disassembly
process. When decompiled software includes the original variable names
it is a lot easier to follow than the cryptic mass of symbols that is
decompiled machine code.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Nick Coghlan

unread,
Dec 10, 2009, 5:49:28 AM12/10/09
to Ben Finney, python...@python.org
Ben Finney wrote:
> No bigger than other problems of poor human-to-human communication. I
> agree with Eric that it deserves apology, even if you don't think it's a
> big deal.

I'd prefer what Eric did (making a valid post, but apologising for using
a poor tool to do so) over someone feeling they can't participate in the
list discussion just because they don't have a decent email client handy.

Now, if someone was to make a habit of it, then sure, they should be
encouraged to switch to a better client. But the occasional post while
away from your regular computer? Not a problem.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Greg Ewing

unread,
Dec 10, 2009, 6:17:12 PM12/10/09
to python...@python.org
Brett Cannon wrote:

> In a perfect world I would make pyc/pyo files completely optional and
> only an optimization that could not work w/o the corresponding source.

That wouldn't be a perfect world in every universe. For
example, consider an app installed in an embedded device
with limited memory -- the source is never going to be
seen by anyone, and all it would do is waste resources.

--
Greg

Terry Reedy

unread,
Dec 10, 2009, 6:25:13 PM12/10/09
to python...@python.org
Greg Ewing wrote:
> Brett Cannon wrote:
>
>> In a perfect world I would make pyc/pyo files completely optional and
>> only an optimization that could not work w/o the corresponding source.
>
> That wouldn't be a perfect world in every universe. For
> example, consider an app installed in an embedded device
> with limited memory -- the source is never going to be
> seen by anyone, and all it would do is waste resources.

In a perfect world, memory would not be limited ;-)

But valid point for this world.

Ben Finney

unread,
Dec 10, 2009, 8:03:08 PM12/10/09
to python...@python.org
Greg Ewing <greg....@canterbury.ac.nz> writes:

> Brett Cannon wrote:
>
> > In a perfect world I would make pyc/pyo files completely optional
> > and only an optimization that could not work w/o the corresponding
> > source.
>
> That wouldn't be a perfect world in every universe. For example,
> consider an app installed in an embedded device with limited memory --
> the source is never going to be seen by anyone, and all it would do is
> waste resources.

If we're positing a perfect world, then all embedded devices would have
the source code available and inspectable by any interested user.

--
\ “We can't depend for the long run on distinguishing one |
`\ bitstream from another in order to figure out which rules |
_o__) apply.” —Eben Moglen, _Anarchism Triumphant_, 1999 |
Ben Finney

Jesse Noller

unread,
Dec 10, 2009, 8:47:33 PM12/10/09
to Ben Finney, python...@python.org
On Thu, Dec 10, 2009 at 8:03 PM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> Greg Ewing <greg....@canterbury.ac.nz> writes:
>
>> Brett Cannon wrote:
>>
>> > In a perfect world I would make pyc/pyo files completely optional
>> > and only an optimization that could not work w/o the corresponding
>> > source.
>>
>> That wouldn't be a perfect world in every universe. For example,
>> consider an app installed in an embedded device with limited memory --
>> the source is never going to be seen by anyone, and all it would do is
>> waste resources.
>
> If we're positing a perfect world, then all embedded devices would have
> the source code available and inspectable by any interested user.

Please. Seriously, can we drop this and stop complaining about top
posting? I'm pretty sure "alt.general.python.chat" is someplace else.
No one cares.

Ben Finney

unread,
Dec 11, 2009, 12:16:44 AM12/11/09
to python...@python.org
Jesse Noller <jno...@gmail.com> writes:

> Please. Seriously, can we drop this and stop complaining about top
> posting? I'm pretty sure "alt.general.python.chat" is someplace else.
> No one cares.

Er, this discussion isn't related to top posting; and it's hardly
off-topic to discuss importing bytecode files here.

--
\ “I have had a perfectly wonderful evening, but this wasn't it.” |
`\ —Groucho Marx |
_o__) |
Ben Finney

Greg Ewing

unread,
Dec 11, 2009, 5:58:25 AM12/11/09
to python...@python.org
Ben Finney wrote:

> If we're positing a perfect world, then all embedded devices would have
> the source code available and inspectable by any interested user.

The source wouldn't have to be on the actual device
to make that possible, though.

--
Greg

Brett Cannon

unread,
Dec 11, 2009, 2:43:29 PM12/11/09
to Kristján Valur Jónsson, python...@python.org
I don't know about the rest of you, but I think it's PEP time as the conversation seems to have run its course. Looks like the popular options are a flag to not read any bytecode or to only read bytecode if the source is also available. And then whether the default behavior should change or not.

2009/12/8 Kristján Valur Jónsson <kris...@ccpgames.com>

Hello there.

We have a large project involving multiple perforce branches of hundreds of .py files each.

Although we employ our own import mechanism for the bulk of these files, we do use the regular import mechanism for an essential core of them.

 

Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.  This can happen for a variety of reasons, but most often it occurs when .py files are being removed, or moved in the hierarchy.  The problem is that the application will happily load and import an orphaned .pyo file, even though the .py file has gone or moved.

 

I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files.  I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified.   This will ensure that the application will execute only the code represented by the checked-out .py files.  But it occurred to me that this functionality might be of interest to other people than just us.  I can imagine, for example, that buildbots running the python regression testsuite might be running into problems with stray .pyo files from time to time.

 

Do you think that such a command line option would be useful for Python at large?

 

Cheers,

Kristján

Ron Adam

unread,
Dec 12, 2009, 11:59:47 AM12/12/09
to Brett Cannon, python...@python.org

Brett Cannon wrote:
> I don't know about the rest of you, but I think it's PEP time as the
> conversation seems to have run its course. Looks like the popular
> options are a flag to not read any bytecode or to only read bytecode if
> the source is also available. And then whether the default behavior
> should change or not.

A few additional thoughts...

Could the existing -B flag be extended to not read bytecode?
It might be considered a bug if bytecode is read when the -B option is used
to prevent writing of bytecode. Is there a use case for forcing the use of
old bytecode? What was the original intent of the -B flag?

Would adding a flag to force the writing of bytecode do what is needed? It
would generate a noisy fail if a source file is moved or missing and renew
old bytecode files.

These two together would give read_none and write_all bytecode modes, with
the default being the write-as-needed mode.


It may be good to have a utility script in the Python tools directory to
find and/or remove orphaned bytecode. I'm not sure that just deleting all
.pyc/.pyo files is always a good idea.

A more off the wall random thought ...

It might be nice in the future to have all bytecode in a single directory
or package combined into a single byte_cache.py(co) file. I think writing
all and reading no bytecode files makes good sense in this context.

Ron
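
A utility of the kind suggested above might look like the sketch below. It is only an illustration under stated assumptions: find_orphaned_bytecode is a made-up name, and the check assumes the layout in use at the time of this thread, where foo.pyc or foo.pyo sits in the same directory as foo.py. Directories named __pycache__, used by later Python versions, are skipped so their contents are not misreported.

```python
import os


def find_orphaned_bytecode(root):
    """Yield .pyc/.pyo files under root that have no sibling .py file."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into __pycache__; its files use a different naming scheme.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        present = set(filenames)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext in (".pyc", ".pyo") and stem + ".py" not in present:
                yield os.path.join(dirpath, name)
```

Because the function only reports paths, the caller decides whether to delete them, which keeps bytecode-only third-party packages safe from an automatic purge.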

Kevin Watters

unread,
Dec 15, 2009, 5:37:09 PM12/15/09
to python...@python.org
For what it's worth, I've got an entirely different use case than the
ones I've seen in this thread so far.

I'd like Python to read .pyo files, but not search for .py or .pyc
files. This is because we ship a py2exe app in its "exploded" form,
where there is an .exe and a lib/ folder full of .pyos. Purely as an
optimization, it'd be nice to not have Python stat for .py and then .pyc
for every new import.

I remember glancing at Python/import.c and thinking that this could
easily be accomplished by allowing the user to customize
_PyImport_StandardFiletab at runtime--in fact there is already a
PyImport_AppendInittab; it's just not exposed to Python. With a function
like imp.set_inittab, I could get what I want with something like

imp.set_inittab(['.pyo', 'rb', imp.PY_COMPILED])

And then of course to read just .py files, you could do

imp.set_inittab([".py", "U", imp.PY_SOURCE])

- Kevin

Kristján Valur Jónsson wrote:
> Hello there.
>
> We have a large project involving multiple perforce branches of hundreds
> of .py files each.
>
> Although we employ our own import mechanism for the bulk of these files,
> we do use the regular import mechanism for an essential core of them.
>
>
>
> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
> files. This can happen for a variety of reasons, but most often it
> occurs when .py files are being removed, or moved in the hierarchy. The
> problem is that the application will happily load and import an orphaned
> .pyo file, even though the .py file has gone or moved.
>
>
>
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files. I am about to implement that patch
> for our purposes, thus forcing recompilation of the .py files on each
> run if so specified. This will ensure that the application will
> execute only the code represented by the checked-out .py files. But it
> occurred to me that this functionality might be of interest to other
> people than just us. I can imagine, for example, that buildbots running
> the python regression testsuite might be running into problems with
> stray .pyo files from time to time.
>
>
>
> Do you think that such a command line option would be useful for Python
> at large?
>
>
>
> Cheers,
>
> Kristján
>
>

Brett Cannon

unread,
Dec 16, 2009, 2:20:42 PM12/16/09
to Kevin Watters, python...@python.org
On Tue, Dec 15, 2009 at 14:37, Kevin Watters <kevin....@gmail.com> wrote:
> For what it's worth, I've got an entirely different use case than the ones I've seen in this thread so far.
>
> I'd like Python to read .pyo files, but not search for .py or .pyc files. This is because we ship a py2exe app in its "exploded" form, where there is an .exe and a lib/ folder full of .pyos. Purely as an optimization, it'd be nice to not have Python stat for .py and then .pyc for every new import.
>
> I remember glancing at Python/import.c and thinking that this could easily be accomplished by allowing the user to customize _PyImport_StandardFiletab at runtime--in fact there is already a PyImport_AppendInittab; it's just not exposed to Python. With a function like imp.set_inittab, I could get what I want with something like
>
>    imp.set_inittab(['.pyo', 'rb', imp.PY_COMPILED])
>
> And then of course to read just .py files, you could do
>
>    imp.set_inittab([".py", "U", imp.PY_SOURCE])


The problem with this is I could easily see it leading to tons of people using custom file extensions which seems to just be asking for trouble. Restricting that ability to only people who recompile the interpreter has kept that in check.

As for avoiding the extra stat calls, your best bet is to either compile your own version of CPython or use a custom importer (I will be giving a talk on that at PyCon).

-Brett
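
As a rough idea of what the custom-importer route could look like, the sketch below uses the importlib.machinery classes from later Python versions to serve one directory tree from bytecode files only. Everything specific here is an assumption for illustration: install_bytecode_only_hook is a made-up name, the imp.set_inittab call quoted above was itself hypothetical, and current interpreters recognise only the .pyc suffix (exposed as BYTECODE_SUFFIXES) rather than .pyo. Whether this saves any stat calls in practice is not measured here.

```python
import importlib.machinery as machinery
import os
import sys


def install_bytecode_only_hook(root):
    """Serve imports from root (and below) from bytecode files only.

    Hypothetical helper: other sys.path entries keep their normal finders.
    """
    root = os.path.abspath(root)
    details = (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES)

    def hook(path):
        full = os.path.abspath(path)
        if full != root and not full.startswith(root + os.sep):
            # Decline, so the next path hook handles this entry.
            raise ImportError("path is outside the bytecode-only tree")
        return machinery.FileFinder(full, details)

    sys.path_hooks.insert(0, hook)
    sys.path_importer_cache.clear()
    return hook
```

Limiting the hook to one tree avoids breaking the standard library, which is normally installed as source files with cached bytecode kept elsewhere.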