Hi all
I've written a bunch of internal libraries for my company, and they
all use two space indents, and I'd like to be more consistent and
conform to PEP-8 as much as I can.
My problem is I would like to be certain that any changes do not alter
the logic of the libraries. When doing this in C, I would simply
compile each module to an object file, calculate the MD5 of the object
file, then make the whitespace changes, recompile the object file and
compare the checksums. If the checksums match, then the files are
equivalent.
Is there any way to do something semantically the same as this with python?
Cheers
Tom
Probably the logical thing would be to run your test suite against
it, but assuming that's not an option, you could run the whole
thing through dis and check that the bytecode is identical. There's
probably an easier way to do this though.
Geremy Condra
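A minimal sketch of the bytecode comparison suggested above, written for Python 3 rather than the Python 2 used in this thread. The helper names code_signature and same_logic are made up for illustration. It compares compiled code objects field by field and deliberately skips line and column information, which legitimately changes when code is re-indented. Docstrings are stored as constants, so a re-indented multi-line docstring may still show up as a difference, depending on the interpreter version.

```python
import types


def code_signature(code):
    """Describe a code object as nested tuples, ignoring line/column data."""
    consts = tuple(
        code_signature(c) if isinstance(c, types.CodeType)
        else (type(c).__name__, repr(c))  # keep 1, 1.0 and True distinct
        for c in code.co_consts
    )
    return (
        code.co_name,
        code.co_argcount,
        code.co_code,       # the raw bytecode
        code.co_names,
        code.co_varnames,
        consts,             # nested functions and classes live here
    )


def same_logic(source_a, source_b):
    """True if both sources compile to the same bytecode and constants."""
    a = compile(source_a, "<a>", "exec")
    b = compile(source_b, "<b>", "exec")
    return code_signature(a) == code_signature(b)
```

Calling same_logic(old_text, new_text) on the before and after versions of each module would play the role of the checksum comparison in the C workflow.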
Sorry. I post via gmane.org, so cc'ing you would require some extra
work, and I'm too lazy.
> I've written a bunch of internal libraries for my company, and they
> all use two space indents, and I'd like to be more consistent and
> conform to PEP-8 as much as I can.
>
> My problem is I would like to be certain that any changes do not
> alter the logic of the libraries. When doing this in C, I would
> simply compile each module to an object file, calculate the MD5 of
> the object file, then make the whitespace changes, recompile the
> object file and compare the checksums. If the checksums match, then
> the files are equivalent.
In my experience, that doesn't work. Whitespace changes can affect
line numbers, so object files containing debug info will differ. Many
object formats also contain other "meta-data" about date, time, path of
source file, etc. that can differ between semantically equivalent
files.
> Is there any way to do something semantically the same as this with python?
Have you tried compiling the Python files and comparing the resulting
.pyc files?
--
Grant Edwards grant.b.edwards Yow! I selected E5 ... but
at I didn't hear "Sam the Sham
gmail.com and the Pharoahs"!
> [ Please keep me cc'ed, I'm not subscribed ]
Sorry; you may read this at
http://groups.google.com/group/comp.lang.python/
> I've written a bunch of internal libraries for my company, and they
> all use two space indents, and I'd like to be more consistent and
> conform to PEP-8 as much as I can.
reindent.py (in the Tools directory of your Python installation) does
exactly that.
> My problem is I would like to be certain that any changes do not alter
> the logic of the libraries. When doing this in C, I would simply
> compile each module to an object file, calculate the MD5 of the object
> file, then make the whitespace changes, recompile the object file and
> compare the checksums. If the checksums match, then the files are
> equivalent.
If you only reindent the code (without adding/removing lines) then you can
compare the compiled .pyc files (excluding the first 8 bytes that contain
a magic number and the source file timestamp). Remember that code objects
contain line number information.
--
Gabriel Genellina
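A rough sketch of skipping the .pyc header before looking at the contents. The 8-byte header described above is the Python 2 layout; this sketch assumes CPython 3.7 or later, where the header is 16 bytes, and the names HEADER_SIZE, load_pyc_code and compile_to_code are hypothetical. Newer CPython releases also record column positions in code objects, so the raw bytes may differ after re-indenting even when no lines move; loading the code object and comparing selected fields is more forgiving than comparing bytes.

```python
import marshal
import os
import py_compile
import tempfile

# Assumption: CPython 3.7+ (magic, flags, mtime or hash, size = 16 bytes).
HEADER_SIZE = 16


def load_pyc_code(pyc_path, header_size=HEADER_SIZE):
    """Read a .pyc file, drop its header, and unmarshal the code object."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    return marshal.loads(data[header_size:])


def compile_to_code(source_text):
    """Write source to a temp file, byte-compile it, and load the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "w") as f:
            f.write(source_text)
        py_compile.compile(src, cfile=pyc, doraise=True)
        return load_pyc_code(pyc)
```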
dis looks like it may be interesting.
I had looked a little at the bytecode, but only enough to rule out md5
sums as a solution. Looking closer at the bytecode for a simple
module, it seems like only a few bytes change (see below for hexdumps
of the pyc).
So in this case, only bytes 5 and 6 changed, the rest of the file
remains exactly the same. Looks like I need to do some digging to find
out what those bytes mean.
Cheers
Tom
2 space indents:
00000000 d1 f2 0d 0a 51 a7 bc 4b 63 00 00 00 00 00 00 00 |....Q..Kc.......|
00000010 00 02 00 00 00 40 00 00 00 73 28 00 00 00 64 00 |.....@...s(...d.|
00000020 00 84 00 00 5a 00 00 65 01 00 64 01 00 6a 02 00 |....Z..e..d..j..|
00000030 6f 0e 00 01 65 00 00 65 02 00 83 01 00 01 6e 01 |o...e..e......n.|
00000040 00 01 64 02 00 53 28 03 00 00 00 63 01 00 00 00 |..d..S(....c....|
00000050 01 00 00 00 03 00 00 00 43 00 00 00 73 20 00 00 |........C...s ..|
00000060 00 64 01 00 47 48 7c 00 00 6f 10 00 01 68 01 00 |.d..GH|..o...h..|
00000070 64 02 00 64 01 00 36 47 48 6e 01 00 01 64 00 00 |d..d..6GHn...d..|
00000080 53 28 03 00 00 00 4e 74 05 00 00 00 68 65 6c 6c |S(....Nt....hell|
00000090 6f 74 05 00 00 00 77 6f 72 6c 64 28 00 00 00 00 |ot....world(....|
000000a0 28 01 00 00 00 74 03 00 00 00 62 61 72 28 00 00 |(....t....bar(..|
000000b0 00 00 28 00 00 00 00 73 0e 00 00 00 74 65 73 74 |..(....s....test|
000000c0 6c 69 62 2f 66 6f 6f 2e 70 79 74 03 00 00 00 66 |lib/foo.pyt....f|
000000d0 6f 6f 01 00 00 00 73 08 00 00 00 00 01 05 01 07 |oo....s.........|
000000e0 01 03 01 74 08 00 00 00 5f 5f 6d 61 69 6e 5f 5f |...t....__main__|
000000f0 4e 28 03 00 00 00 52 03 00 00 00 74 08 00 00 00 |N(....R....t....|
00000100 5f 5f 6e 61 6d 65 5f 5f 74 04 00 00 00 54 72 75 |__name__t....Tru|
00000110 65 28 00 00 00 00 28 00 00 00 00 28 00 00 00 00 |e(....(....(....|
00000120 73 0e 00 00 00 74 65 73 74 6c 69 62 2f 66 6f 6f |s....testlib/foo|
00000130 2e 70 79 74 08 00 00 00 3c 6d 6f 64 75 6c 65 3e |.pyt....<module>|
00000140 01 00 00 00 73 04 00 00 00 09 07 0d 01 |....s........|
0000014d
4 space indents:
00000000 d1 f2 0d 0a 51 a7 bc 4b 63 00 00 00 00 00 00 00 |....Q..Kc.......|
00000010 00 02 00 00 00 40 00 00 00 73 28 00 00 00 64 00 |.....@...s(...d.|
00000020 00 84 00 00 5a 00 00 65 01 00 64 01 00 6a 02 00 |....Z..e..d..j..|
00000030 6f 0e 00 01 65 00 00 65 02 00 83 01 00 01 6e 01 |o...e..e......n.|
00000040 00 01 64 02 00 53 28 03 00 00 00 63 01 00 00 00 |..d..S(....c....|
00000050 01 00 00 00 03 00 00 00 43 00 00 00 73 20 00 00 |........C...s ..|
00000060 00 64 01 00 47 48 7c 00 00 6f 10 00 01 68 01 00 |.d..GH|..o...h..|
00000070 64 02 00 64 01 00 36 47 48 6e 01 00 01 64 00 00 |d..d..6GHn...d..|
00000080 53 28 03 00 00 00 4e 74 05 00 00 00 68 65 6c 6c |S(....Nt....hell|
00000090 6f 74 05 00 00 00 77 6f 72 6c 64 28 00 00 00 00 |ot....world(....|
000000a0 28 01 00 00 00 74 03 00 00 00 62 61 72 28 00 00 |(....t....bar(..|
000000b0 00 00 28 00 00 00 00 73 0e 00 00 00 74 65 73 74 |..(....s....test|
000000c0 6c 69 62 2f 66 6f 6f 2e 70 79 74 03 00 00 00 66 |lib/foo.pyt....f|
000000d0 6f 6f 01 00 00 00 73 08 00 00 00 00 01 05 01 07 |oo....s.........|
000000e0 01 03 01 74 08 00 00 00 5f 5f 6d 61 69 6e 5f 5f |...t....__main__|
000000f0 4e 28 03 00 00 00 52 03 00 00 00 74 08 00 00 00 |N(....R....t....|
00000100 5f 5f 6e 61 6d 65 5f 5f 74 04 00 00 00 54 72 75 |__name__t....Tru|
00000110 65 28 00 00 00 00 28 00 00 00 00 28 00 00 00 00 |e(....(....(....|
00000120 73 0e 00 00 00 74 65 73 74 6c 69 62 2f 66 6f 6f |s....testlib/foo|
00000130 2e 70 79 74 08 00 00 00 3c 6d 6f 64 75 6c 65 3e |.pyt....<module>|
00000140 01 00 00 00 73 04 00 00 00 09 07 0d 01 |....s........|
0000014d
python code: testlib/foo.py
def foo(bar):
  print "hello"
  if bar:
    print {
      'hello': 'world'
    }

if __name__ == "__main__":
  foo(True)
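A small sketch that decodes the first 8 bytes of the hexdumps above, assuming the Python 2 era layout described earlier in the thread: a 2-byte magic number, CR LF, then a 4-byte little-endian source timestamp. The function name decode_pyc_header is made up for illustration. If that layout is right, bytes 5 and 6 sit inside the timestamp field, so a change there would reflect when the source file was saved rather than any change in logic.

```python
import struct
from datetime import datetime, timezone


def decode_pyc_header(header):
    """Split an 8-byte Python 2 style .pyc header into its fields."""
    # "<" = little-endian, no padding; H = magic, 2s = b"\r\n", I = mtime
    magic, newline, mtime = struct.unpack("<H2sI", header[:8])
    return magic, newline, datetime.fromtimestamp(mtime, tz=timezone.utc)


# First 8 bytes copied from the hexdump in this post.
HEADER = bytes.fromhex("d1f20d0a51a7bc4b")
magic, newline, stamp = decode_pyc_header(HEADER)
print(magic, newline, stamp.isoformat())
```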
You will also have to be careful about docstrings. If you are cleaning up for
style reasons, you will also end up indenting the triple-quoted docstrings and
thus change their contents. This will be reflected in the bytecode.
In [1]: def f():
   ...:   """This is
   ...:   a docstring.
   ...:   """
   ...:
   ...:
In [2]: def g():
   ...:     """This is
   ...:     a docstring.
   ...:     """
   ...:
   ...:
In [3]: f.__doc__
Out[3]: 'This is \n  a docstring.\n  '
In [4]: g.__doc__
Out[4]: 'This is\n    a docstring.\n    '
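One way to keep docstring indentation from triggering false alarms is to normalize docstrings before comparing them. A minimal sketch using inspect.cleandoc from the standard library; the helper docstring_of and the two sample sources are made up for illustration. As far as I know, recent CPython releases (3.13 onward) already strip the common indentation at compile time, so the raw docstrings may compare equal there without any help.

```python
import inspect

# The same function written with 2-space and 4-space indents.
TWO = 'def f():\n  """This is\n  a docstring.\n  """\n'
FOUR = 'def f():\n    """This is\n    a docstring.\n    """\n'


def docstring_of(source_text, name="f"):
    """Execute the source and return the named function's raw docstring."""
    namespace = {}
    exec(source_text, namespace)
    return namespace[name].__doc__


raw_two = docstring_of(TWO)
raw_four = docstring_of(FOUR)

# cleandoc removes the indentation that re-indenting the source changes.
print(inspect.cleandoc(raw_two) == inspect.cleandoc(raw_four))
```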
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
> I've written a bunch of internal libraries for my company, and they
> all use two space indents, and I'd like to be more consistent and
> conform to PEP-8 as much as I can.
“A foolish consistency is the hobgoblin of little minds”
— Ralph Waldo Emerson
> If you only reindent the code (without adding/removing lines) then you can
> compare the compiled .pyc files (excluding the first 8 bytes that contain
> a magic number and the source file timestamp). Remember that code objects
> contain line number information.
Anybody who ever creates another indentation-controlled language should be
beaten to death with a Guido van Rossum voodoo doll.
I'll go warn Don Syme. :P I wonder how Microsoft will react.
http://blogs.msdn.com/dsyme/archive/2006/08/24/715626.aspx
Cheers,
Chris
--
http://blog.rebertia.com/2010/01/24/of-braces-and-semicolons/
No, the dumb thing was shipping a Python implementation which accepted both
tabs and spaces for indentation in the same file. Tabs vs. spaces is
a religious issue, but mixing the two is unquestionably bad.
Python 3.x, finally, detects inconsistent tab/space indentation. But that
should have been in Python 0.001, so that no inconsistent code ever escaped
into the wild.
Check to see if your code will go through Python 3.x without indentation
complaints. If it won't, you need to fix it before re-indenting.
John Nagle
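A quick way to run that check without executing anything is to ask Python 3 to compile the text and watch for TabError. This is only a sketch; the function name has_inconsistent_tabs is made up, and any other syntax error (for example Python 2 print statements) is left to propagate so it does not get mistaken for a clean result.

```python
def has_inconsistent_tabs(source_text, filename="<string>"):
    """True if Python 3 rejects the source for mixing tabs and spaces."""
    try:
        compile(source_text, filename, "exec")
    except TabError:
        return True
    return False


# Second line is indented with a tab, third with eight spaces.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
CLEAN = "if True:\n    x = 1\n    y = 2\n"

print(has_inconsistent_tabs(MIXED), has_inconsistent_tabs(CLEAN))
```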
> Lawrence D'Oliveiro wrote:
>
>> In message <mailman.1610.1270655...@python.org>,
>> Gabriel Genellina wrote:
>>
>>> If you only reindent the code (without adding/removing lines) then you
>>> can compare the compiled .pyc files (excluding the first 8 bytes that
>>> contain a magic number and the source file timestamp). Remember that
>>> code objects contain line number information.
>>
>> Anybody who ever creates another indentation-controlled language should
>> be beaten to death with a Guido van Rossum voodoo doll.
>
> No ...
Yes, because otherwise you wouldn’t have stupid problems like the one which
is preoccupying this thread: how to make sure indentation is consistent
without introducing logic errors into the code.
Anybody who invents another brace-delimited language should be beaten.
You always end up with a big problem trying to make sure the braces
are consistent with the program logic.
--
Grant
> Anybody who invents another brace-delimited language should be beaten.
> You always end up with a big problem trying to make sure the braces
> are consistent with the program logic.
Would you prefer “begin” and “end” word symbols, then?
> Would you prefer “begin” and “end” word symbols, then?
Nope, I categorize those as nothing more than verbose "braces".
--
Grant
You go through life and you choose your stupid problems. Well, not
really, but you make your choices, and you wind up with stupid
problems. I use Linux, and have a different set of stupid problems
than I did when I used Windows.
But as Ian Bicking said:
I think there is a meme that Python people are close-minded to
suggestions for changes in the language. I think there is significant
truth to that. But sometimes everyone else is just completely wrong. I
want nothing to do with any programmer who would mis-indent their
code. If you want to mis-indent your code you are an idiot. If you
want idiotic code to be an option you are being absurd.
And as paraphrased by Markos Gaivo ( http://markos.gaivo.net/blog/?p=126
):
A: I don’t like Python because of significant whitespace.
B: Do you indent your code?
A: Yes, of course.
B: And the problem is?
Regards,
Pat
I don't see the problem here.
The OP has code which is already correctly indented. He wants to re-
indent it, from two spaces to four. As I see it, even a simple-minded
string replacement from 2 to 4 spaces should Just Work, provided you
don't care about extra spacing potentially being introduced into strings,
comments, etc.
E.g. if you have this (already ugly) code:
def f(a):
  x = 42
  if a < 0:
    return a # Return a untouched.
  else:
    for i in range( x ) : # Do pointless work
      pass
    this_is_a_very_long_line = ( 2, 4, 5,
                                 7, 9 )
    return a+1
and just do a simple string replacement, you get this:
def f(a):
    x = 42
    if a < 0:
        return a # Return a untouched.
    else:
        for i in range( x ) : # Do pointless work
            pass
        this_is_a_very_long_line = ( 2, 4, 5,
                                                                 7, 9 )
        return a+1
which is still ugly, but continues to work correctly. The only way to
break working code by re-indenting in such a simple-minded fashion is if
the layout of string literals is significant.
But of course no professional-quality re-indenter program would just do a
simple-minded replace('  ', '    ') on the source code, any more than a
professional-quality code beautifier for a brace language would just add
newlines and spaces around every brace it saw.
A less simple-minded re-indenter would only replace *indentation*, not
random whitespace. In that case, how could it break anything? Since the
indents are correct, you are mapping indents of 2, 4, 6, 8, ... spaces to
4, 8, 12, 16, .... What do you think will break?
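A minimal sketch of that kind of re-indenter, plus a check that nothing broke. It doubles only the leading spaces of each line and then compares the parsed syntax trees, which ignore line and column positions by default. The names double_indent and same_ast are made up, and this is not how reindent.py works. One known weakness: lines inside a multi-line string also get their leading spaces doubled, which changes the string, and the tree comparison would report that as a difference.

```python
import ast


def double_indent(source_text):
    """Map leading indents of 2, 4, 6, ... spaces to 4, 8, 12, ..."""
    out = []
    for line in source_text.splitlines(keepends=True):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        out.append(" " * (indent * 2) + stripped)  # inner spacing untouched
    return "".join(out)


def same_ast(source_a, source_b):
    """True if both sources parse to the same tree (positions ignored)."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))
```

Running same_ast(original, double_indent(original)) on each module gives a yes or no answer comparable to matching checksums.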
If you start with broken indentation, it is difficult to fix, but that's
not the OP's problem. In a brace language (whether you spell them { } or
BEGIN END or START-BLOCK-HERE and END-BLOCK-HERE) if you start with
inconsistent braces, it is equally difficult to fix. Invalid code is
invalid code no matter what language you are using, and in general can't
be mechanically fixed. If the nature of the breakage is such that the
code is inconsistent or ambiguous, you need to *read* and *understand* it
to fix it, no matter whether you have braces or indents.
--
Steven
> I want nothing to do with any programmer who would mis-indent their
> code.
But what happens when you’re trying to reconcile two different indentation
conventions? In Python, there can be problems doing that without introducing
logic errors. That’s what this thread is about.
But since those symbols already, by definition, directly correspond to
program logic, where exactly does the “big problem” arise trying to make
sure they are “consistent with the program logic”?
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
> But what happens when you’re trying to reconcile two different
> indentation conventions? In Python, there can be problems doing that
> without introducing logic errors.
What happens when you're trying to reconcile two different
block-delimiter conventions?
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
> But since [braces] already, by definition, directly correspond to
> program logic, where exactly does the “big problem” arise trying to
> make sure they are “consistent with the program logic”?
Since indentation already, by definition, directly corresponds to
program logic in Python, where exactly is the problem that leads you to
want to eradicate indentation as syntax?
In other words: I don't see why the problems you're asserting are
problematic for one kind of block delimitation syntax but not the other.
--
\ “[Freedom of speech] isn't something somebody else gives you. |
`\ That's something you give to yourself.” —_Hocus Pocus_, Kurt |
_o__) Vonnegut |
Ben Finney
The same goes for indentation. In python it's not possible to write a
program to correctly indent code that isn't already correctly indented.
In a brace delimited language it's not possible to write a program to
correctly place braces in an "incorrectly braced" program.
In either: if it is already correct, it's trivial to transform it into
another correct program with a different "style".
--
Well, I think Steven pointed out how this isn't usually, really, a
problem, and I tried to point out (in a section of my posting you
didn't quote) that you will have to deal with stupid little nits in
life no matter which way you go.
Personally, for me, this particular thing is a non-issue. If I'm
getting a code snippet from somewhere, fixing it up is not a problem,
and if I'm using a module or more from somewhere, I am not obsessive
about changing it to PEP 8.
Regards,
Pat
> What happens when you're trying to reconcile two different
> block-delimiter conventions?
For example?
> On 2010-04-10, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>
> wrote:
>
>> In message <hpokef$gvg$1...@reader1.panix.com>, Grant Edwards wrote:
>>
>>> On 2010-04-10, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>
>>> wrote:
>>>
>>>> In message <hpoh5j$35j$1...@reader1.panix.com>, Grant Edwards wrote:
>>>>
>>>>> Anybody who invents another brace-delimited language should be beaten.
>>>>> You always end up with a big problem trying to make sure the braces
>>>>> are consistent with the program logic.
>>>
>>>> Would you prefer “begin” and “end” word symbols, then?
>>>
>>> Nope, I categorize those as nothing more than verbose "braces".
>>
>> But since those symbols already, by definition, directly correspond to
>> program logic, where exactly does the “big problem” arise trying to make
>> sure they are “consistent with the program logic”?
>
> The same goes for indentation. In python it's not possible to write a
> program to correctly indent code that isn't already correctly indented.
The problem isn’t that it’s “incorrectly” indented, it’s two different
pieces of code (correctly) indented according to two different conventions,
and how you reconcile them without introducing logic errors into the code.
> In a brace delimited language it's not possible to write a program to
> correctly place braces in an "incorrectly braced" program.
But with braces it’s easy enough to reconcile different indentation
conventions without introducing logic errors into the code.
>> The same goes for indentation. In python it's not possible to write
>> a program to correctly indent code that isn't already correctly
>> indented.
>
> The problem isn’t that it’s “incorrectly” indented, it’s two
> different pieces of code (correctly) indented according to two
> different conventions, and how you reconcile them without introducing
> logic errors into the code.
You use 'reindent.py', which is a standard part of every Python release.
Except that you shouldn't, because it messes up the version history.
>
>> In a brace delimited language it's not possible to write a program to
>> correctly place braces in an "incorrectly braced" program.
>
> But with braces it’s easy enough to reconcile different indentation
> conventions without introducing logic errors into the code.
>
Except you shouldn't for the above mentioned reason.
> The problem isn’t that it’s “incorrectly” indented, it’s two
> different pieces of code (correctly) indented according to two
> different conventions, and how you reconcile them without introducing
> logic errors into the code.
I've never run into that problem, but it should be simple enough to do
that in an automated way as well. Don't reindent and pindent work as
advertised?
>> In a brace delimited language it's not possible to write a program to
>> correctly place braces in an "incorrectly braced" program.
> But with braces it’s easy enough to reconcile different indentation
> conventions without introducing logic errors into the code.
Reconciling indentation in an indentation-delimited language doesn't
correspond to reconciling indentation in a brace-delimited language,
so comparing them doesn't really make sense.
--
Grant
Anybody who invents another programming language should be beaten.
You always end up with a big problem trying to make sure the program
logic is consistent with the customer's needs :-(
-- HansM
The world still needs new programming languages. People in 2050 will
look back at 2010 and laugh at the fact that Py3K was considered
halfway innovative. Indentation based syntax will be so common that
no one will even remember squigglies, end tags, or asymmetric memory
allocation schemes.
>> Anybody who invents another brace-delimited language should be beaten.
>> You always end up with a big problem trying to make sure the braces are
>> consistent with the program logic.
>
> Anybody who invents another programming language should be beaten. You
> always end up with a big problem trying to make sure the program logic
> is consistent with the customer's needs :-(
Anyone who invents another program should be beaten. You always end up
with a big problem trying to make sure the program logic is consistent
with what the customer thinks they need.
--
Steven
Anyone who takes on new customers should be beaten. You always end up
with a big problem trying to make sure that what the customer thinks
they need is what they actually need.
--
Greg
Anyone who makes another joke about anyone who needs to be beaten needs
to be beaten. You always end up with a big problem with angry bruised
people wondering why they get beaten up.