
pylint woes


DFS

May 7, 2016, 12:51:18 PM
This more-anal-than-me program generated almost 2 warnings for every
line of code in my program. w t hey?


                                         DFS comments
+-------------------------+------------+ -------------------------------
|message id               |occurrences |
+=========================+============+
|mixed-indentation        |186         | I always use tab
+-------------------------+------------+
|invalid-name             |82          | every single variable name?!
+-------------------------+------------+
|bad-whitespace           |65          | mostly because I line up =
                                         signs:
                                           var1  = value
                                           var10 = value

+-------------------------+------------+
|trailing-whitespace      |59          | heh!
+-------------------------+------------+
|multiple-statements      |23          | do this to save lines.
                                         Will continue doing it.
+-------------------------+------------+
|no-member                |5           |

"Module 'pyodbc' has no 'connect' member"  Yes it does.
"Module 'pyodbc' has no 'Error' member"    Yes it does.

Issue with pylint, or pyodbc?

+-------------------------+------------+
|line-too-long            |5           | meh
+-------------------------+------------+
|wrong-import-order       |4           | does it matter?
+-------------------------+------------+
|missing-docstring        |4           | what's the difference between
                                         a docstring and a # comment?
+-------------------------+------------+
|superfluous-parens       |3           | I like to surround 'or'
                                         statements with parens
+-------------------------+------------+
|redefined-outer-name     |3           | fixed. changed local var names.
+-------------------------+------------+
|redefined-builtin        |2           | fixed. Was using 'zip' and 'id'
+-------------------------+------------+
|multiple-imports         |2           | doesn't everyone?
+-------------------------+------------+
|consider-using-enumerate |2           | see below [1]
+-------------------------+------------+
|bad-builtin              |2           | warning because I used filter?
+-------------------------+------------+
|unused-import            |1           | fixed
+-------------------------+------------+
|unnecessary-pass         |1           | fixed. left over from
                                         Try..Except
+-------------------------+------------+
|missing-final-newline    |1           | I'm using Notepad++, with
                                         EOL Conversion set to
                                         'Windows Format'. How
                                         or should I fix this?
+-------------------------+------------+
|fixme                    |1           | a TODO statement
+-------------------------+------------+

Global evaluation
-----------------
Your code has been rated at -7.64/10



I assume -7.64 is really bad?

Has anyone ever in history gotten 10/10 from pylint for a non-trivial
program?




After fixes and disabling various warnings:
"Your code has been rated at 8.37/10"


That's about as good as it's gonna get!



[1]
pylint says "Consider using enumerate instead of iterating with range
and len"

the offending code is:
for j in range(len(list1)):
    do something with list1[j], list2[j], list3[j], etc.

enumeration would be:
for j,item in enumerate(list1):
    do something with list1[j], list2[j], list3[j], etc.

Is there an advantage to using enumerate() here?






Chris Angelico

May 7, 2016, 1:01:34 PM
On Sun, May 8, 2016 at 2:51 AM, DFS <nos...@dfs.com> wrote:
> [1]
> pylint says "Consider using enumerate instead of iterating with range and
> len"
>
> the offending code is:
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.
>
> enumeration would be:
> for j,item in enumerate(list1):
>     do something with list1[j], list2[j], list3[j], etc.
>
> Is there an advantage to using enumerate() here?

The suggestion from a human would be to use zip(), or possibly to
change your data structures.

for item1, item2, item3 in zip(list1, list2, list3):
    do something with the items
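
A minimal sketch of the two loop styles side by side. The list names and values here are illustrative stand-ins, not taken from the original program:

```python
# Hypothetical parallel lists; all three have the same length.
names = ["Taco Bell", "Wendy's"]
cities = ["Tucker", "Atlanta"]
states = ["GA", "GA"]

# Index-based loop: j exists only so it can be used for lookups.
by_index = []
for j in range(len(names)):
    by_index.append((names[j], cities[j], states[j]))

# zip-based loop: each pass hands over the matching items directly.
by_zip = []
for name, city, state in zip(names, cities, states):
    by_zip.append((name, city, state))
```

Both loops build the same list of tuples; the zip version simply never mentions an index.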

ChrisA

Michael Selik

May 7, 2016, 2:43:06 PM
On Sat, May 7, 2016 at 12:56 PM DFS <nos...@dfs.com> wrote:

> |mixed-indentation |186 | I always use tab
>

Don't mix tabs and spaces. I suggest selecting all lines and using your
editor to convert spaces to tabs. Usually there's a feature to "tabify".


> +-------------------------+------------+
> |invalid-name |82 | every single variable name?!
>

Class names should be CamelCase
Everything else should be lowercase_with_underscores


> +-------------------------+------------+
> |bad-whitespace |65 | mostly because I line up =
> signs:
> var1 = value
> var10 = value
>

Sure, that's your style. But pylint likes a different style. It's good to
use a standard. If it's just you, I suggest conforming to pylint. If you're
already on a team, use your team's standard.

> +-------------------------+------------+
> |trailing-whitespace |59 | heh!
>

Get rid of it. Save some bytes.


> +-------------------------+------------+
> |multiple-statements |23 | do this to save lines.
> Will continue doing it.
>

If you want to share your code with others, you should conform to community
standards to make things easier for others to read. Further, if you think
the core contributors are expert programmers, you should probably take
their advice: "sparse is better than dense". Do your future-self a favor
and write one statement per line. Today you find it easy to read. Six
months from now you won't.


> +-------------------------+------------+
> |no-member |5 |
>
> "Module 'pyodbc' has no 'connect' member" Yes it does.
> "Module 'pyodbc' has no 'Error' member" Yes it does.
>
> Issue with pylint, or pyodbc?
>

Not sure. Maybe pyodbc is written in a way that pylint can't see its
connect or Error method/attribute.


> +-------------------------+------------+
> |line-too-long |5 | meh
>

Yeah, I think 80 characters can be somewhat tight. Still, 5 long lines in
200ish lines of code? Sounds like you might be doing too much in those
lines or have too many levels of indentation.
"Sparse is better than dense"
"Flat is better than nested"


> +-------------------------+------------+
> |wrong-import-order |4 | does it matter?
>

No. I think pylint likes to alphabetize. With only 4 imports, it doesn't
matter. Still, why not alphabetize?


> +-------------------------+------------+
> |missing-docstring |4 | what's the difference between
> a docstring and a # comment?
>

Docstrings are tools for introspection. Many things in Python access the
__doc__ attribute to help you. Comments are never seen by module users.
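
A small sketch of that difference. The function `parse_zip` is a made-up example, not something from DFS's program:

```python
def parse_zip(address):
    """Return the last five characters of an address string."""
    # This comment is visible only to someone reading the source file.
    return address[-5:]

# The docstring is stored on the function object; help(parse_zip) prints it.
doc = parse_zip.__doc__
```

The comment inside the function body leaves no trace at runtime, while the docstring can be read by help(), IDEs, and documentation tools.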


> +-------------------------+------------+
> |superfluous-parens |3 | I like to surround 'or'
> statements with parens
>

Ok. But over time you'll get used to not needing them. Edward Tufte says
you should have a high "information-to-ink" ratio.


> +-------------------------+------------+
> |redefined-outer-name |3 | fixed. changed local var names.
> +-------------------------+------------+
> |redefined-builtin |2 | fixed. Was using 'zip' and 'id'
> +-------------------------+------------+
> |multiple-imports |2 | doesn't everyone?
>

Yeah, I do that as well.


> +-------------------------+------------+
> |consider-using-enumerate |2 | see below [1]
>

As Chris explained.


> +-------------------------+------------+
> |bad-builtin |2 | warning because I used filter?
>

I think pylint likes comprehensions better. IMHO filter is OK. If you're
using a lambda, change to a comprehension.
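
For illustration, here is a filter-with-lambda call next to the comprehension that could replace it. The sample data is hypothetical:

```python
zip_codes = ["30043", "", "30303", ""]  # hypothetical scraped values

# filter() with a lambda: the style the thread reports pylint flagging.
non_empty_filter = list(filter(lambda z: z != "", zip_codes))

# The equivalent list comprehension.
non_empty_comp = [z for z in zip_codes if z != ""]
```

Both produce the same list; the comprehension avoids the extra function call per item and reads left to right.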


> +-------------------------+------------+
> |unused-import |1 | fixed
> +-------------------------+------------+
> |unnecessary-pass |1 | fixed. left over from
> Try..Except
> +-------------------------+------------+
> |missing-final-newline |1 | I'm using Notepad++, with
> EOL Conversion set to
> 'Windows Format'. How
> or should I fix this?
>

Add a few blank lines to the end of your file.


> +-------------------------+------------+
> |fixme |1 | a TODO statement
> +-------------------------+------------+
>
> Global evaluation
> -----------------
> Your code has been rated at -7.64/10
>
>
>
> I assume -7.64 is really bad?
>
> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
> program?
>

I'm certain of it.

Peter Pearson

May 7, 2016, 2:43:55 PM
On Sat, 7 May 2016 12:51:00 -0400, DFS <nos...@dfs.com> wrote:
> This more-anal-than-me program generated almost 2 warnings for every
> line of code in my program. w t hey?

Thank you for putting a sample of pylint output in front of my eyes;
you inspired me to install pylint and try it out. If it teaches me even
half as much as it's teaching you, I'll consider it a great blessing.

--
To email me, substitute nowhere->runbox, invalid->com.

Christopher Reimer

May 7, 2016, 2:52:45 PM
On 5/7/2016 9:51 AM, DFS wrote:
> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
> program?

I routinely get 10/10 for my code. While pylint isn't perfect and is
idiosyncratic at times, it's a useful tool to help break bad programming
habits. Since I came from a Java background, I had to unlearn everything
from Java before I could write Pythonic code. It might help to use an
IDE that offers PEP8-compliant code suggestions (I use PyCharm IDE).

> That's about as good as it's gonna get!

You can do better. You should strive for 10/10 whenever possible,
figure out why you fall short and ask for help on the parts that don't
make sense.

> pylint says "Consider using enumerate instead of iterating with range
> and len"
>
> the offending code is:
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.

This code is reeking with bad habits to be broken. Assigning a throwaway
variable to walk the index is unnecessary when Python can do it for you
behind the scenes. As Chris A. pointed out in his post, you should use
zip() to walk through the values of each list at the same time.

Thank you,

Chris R.

Stephen Hansen

May 7, 2016, 3:21:37 PM
Pylint is very opinionated. Feel free to adjust its configuration to
suit your opinions of style.

In particular, several of these might be related to PEP8 style issues.

On Sat, May 7, 2016, at 09:51 AM, DFS wrote:
> DFS comments
> +-------------------------+------------+ -------------------------------
> |message id |occurrences |
> +=========================+============+
> |mixed-indentation |186 | I always use tab

And yet, it appears there's some space indentation in there. In
Notepad++ enable View->Show Symbol->Show White Space and Tab and Show
Indent Guide.

> +-------------------------+------------+
> |invalid-name |82 | every single variable name?!

It probably defaults to PEP8 names, which are variables_like_this, not
variablesLikeThis.

> +-------------------------+------------+
> |bad-whitespace |65 | mostly because I line up =
> signs:
> var1 = value
> var10 = value

Yeah and PEP8 says don't do that. Adjust the configuration of pylint if
you want.

> +-------------------------+------------+
> |multiple-statements |23 | do this to save lines.
> Will continue doing it.

This you really shouldn't do, imho. Saving lines is not a virtue,
readability is -- dense code is by definition less readable.

> +-------------------------+------------+
> |no-member |5 |
>
> "Module 'pyodbc' has no 'connect' member" Yes it does.
> "Module 'pyodbc' has no 'Error' member" Yes it does.
>
> Issue with pylint, or pyodbc?

Pylint.

> +-------------------------+------------+
> |line-too-long |5 | meh

I'm largely meh on this too. But again it's a PEP8 thing.

> +-------------------------+------------+
> |wrong-import-order |4 | does it matter?

It's useful to have a standard so you can glance and tell what's what and
from where, but what that standard is, is debatable.

> +-------------------------+------------+
> |missing-docstring |4 | what's the difference between
> a docstring and a # comment?

A docstring is a docstring, a comment is a comment. Google python
docstrings :) Python prefers files to have a docstring on top, and
functions to have one directly beneath their definition line. Comments
should be used as little as possible, as they must be maintained: an
incorrect comment is worse than no comment.

Go for clear code that doesn't *need* commenting.

> +-------------------------+------------+
> |superfluous-parens |3 | I like to surround 'or'
> statements with parens

Why?

> +-------------------------+------------+
> |multiple-imports |2 | doesn't everyone?

I don't actually know what it's complaining about.

> +-------------------------+------------+
> |bad-builtin |2 | warning because I used filter?

Don't know what it's complaining about here either.

> +-------------------------+------------+
> |missing-final-newline |1 | I'm using Notepad++, with
> EOL Conversion set to
> 'Windows Format'. How
> or should I fix this?

Doesn't have anything to do with it. Just scroll to the bottom and press
enter. It wants to end on a newline, not code.

> Global evaluation
> -----------------
> Your code has been rated at -7.64/10
>
> I assume -7.64 is really bad?
>
> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
> program?

No clue, I don't use pylint at all.

> [1]
> pylint says "Consider using enumerate instead of iterating with range
> and len"
>
> the offending code is:
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.
>
> enumeration would be:
> for j,item in enumerate(list1):
>     do something with list1[j], list2[j], list3[j], etc.
>
> Is there an advantage to using enumerate() here?

It's cleaner, easier to read. In Python 2, where range() returns a list,
it's faster. (In Python 2, xrange returns a lazily evaluated range.)

Use the tools Python gives you. Why reinvent enumerate when it's built
in?
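
A short sketch of the case where the index really is wanted, which is where enumerate() earns its keep. The sample values are the street names quoted elsewhere in this thread; the variable names are illustrative:

```python
streets = ["928 Buford Dr", "4880 Sugarloaf Pkwy"]

# range(len(...)): the index is produced first, then used to fetch the item.
numbered_by_index = []
for j in range(len(streets)):
    numbered_by_index.append("%d: %s" % (j + 1, streets[j]))

# enumerate(): index and item arrive together; start=1 gives 1-based numbering.
numbered_by_enumerate = []
for position, street in enumerate(streets, start=1):
    numbered_by_enumerate.append("%d: %s" % (position, street))
```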

--
Stephen Hansen
m e @ i x o k a i . i o

Stephen Hansen

May 7, 2016, 3:23:33 PM
On Sat, May 7, 2016, at 11:52 AM, Christopher Reimer wrote:
> You can do better. You should strive for 10/10 whenever possible,
> figure out why you fall short and ask for help on the parts that don't
> make sense.

I think this is giving far too much weight to pylint's opinion on what
is "good" or "bad" programming habits.

Terry Reedy

May 7, 2016, 3:40:50 PM
On 5/7/2016 12:51 PM, DFS wrote:
> This more-anal-than-me program generated almost 2 warnings for every
> line of code in my program. w t hey?

If you don't like it, why do you use it?

I suppose the answer is that it did find a few things to check. You
might be happier with pychecker, which is much less aggressive. I
believe it will find the things you did fix.


> DFS comments
> +-------------------------+------------+ -------------------------------
> |message id |occurrences |
> +=========================+============+
> |mixed-indentation |186 | I always use tab
> +-------------------------+------------+
> |invalid-name |82 | every single variable name?!

I would need examples to comment.

> +-------------------------+------------+
> |trailing-whitespace |59 | heh!

Any code editor should have a command to fix this.
IDLE: Format => strip trailing whitespace
Notepad++: Macro => trim trailing and save, Alt-Shift-S
others ...

> +-------------------------+------------+
> |no-member |5 |
>
> "Module 'pyodbc' has no 'connect' member" Yes it does.
> "Module 'pyodbc' has no 'Error' member" Yes it does.
>
> Issue with pylint, or pyodbc?

Worth looking into. Could be a bug somewhere. But I don't have pyodbc
installed.

> +-------------------------+------------+
> |line-too-long |5 | meh

For following the PEP guideline when patching CPython, this is helpful.

> +-------------------------+------------+
> |wrong-import-order |4 | does it matter?

Consistency in imports ultimately makes easier reading.
Many idlelib files use this order: stdlib modules other than tkinter and
idlelib (alphabetically); tkinter (tkinter first, then submodules);
idlelib (alphabetically). When I edit files, I sometimes reorder
imports to conform.
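
As a rough illustration of a grouped import block: PEP 8 suggests standard library first, then third-party packages, then local modules, and as far as I know that grouping (rather than plain alphabetical order) is what pylint's wrong-import-order checks. The sketch below uses only standard-library modules so it runs anywhere; the comments mark where the other groups would go:

```python
# Group 1: standard library, alphabetized within the group.
import collections
import os
import sys

# Group 2: third-party packages would go here (pyodbc, in DFS's case).
# Group 3: the project's own modules would come last.

# One import per line also avoids pylint's multiple-imports message,
# which a line such as "import os, sys" triggers.
imported_names = sorted(
    name for name in ("collections", "os", "sys") if name in sys.modules
)
```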

> +-------------------------+------------+
> |missing-docstring |4 | what's the difference between
> a docstring and a # comment?

# Comments only appear in the source
'''Docstrings are copied to the compiled code object, are interactively
accessible, and are used for help(obj) output.'''


> +-------------------------+------------+
> |superfluous-parens |3 | I like to surround 'or'
> statements with parens

I would need examples to comment


> +-------------------------+------------+
> |bad-builtin |2 | warning because I used filter?

If they are still doing this in the latest release, it is an arrogance
and inconsistency bug on their part. Disable this check.

> +-------------------------+------------+
> |missing-final-newline |1 | I'm using Notepad++, with
> EOL Conversion set to
> 'Windows Format'.

That says to replace final '\n' with '\r\n'. It does not affect a
missing final newline ;-)

> How or should I fix this?

Fix by hitting 'Enter' at the end of the last line.
Should you? I think it a good habit.
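
The check itself is simple enough to sketch in a few lines. The helper name `ensure_final_newline` is made up for this illustration:

```python
def ensure_final_newline(text):
    """Return text unchanged if it ends with a newline, else append one."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```

Note that a file ending in '\r\n' also ends with '\n', so the line-ending style and the presence of a final newline are independent questions.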

> After fixes and disabling various warnings:
> "Your code has been rated at 8.37/10"

Being able to customize pylint by turning off warnings is its saving
feature.
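
Besides a configuration file, pylint also honors "# pylint: disable=..." comments in the source. A minimal sketch, with a made-up function, of silencing a message for a whole module and for a single line:

```python
# pylint: disable=invalid-name
# The module-level comment above switches off invalid-name for this file only.

def count_long_lines(lines, limit=79):
    """Count the lines longer than `limit` characters."""
    n = 0  # a short name that invalid-name might otherwise flag
    for line in lines:
        if len(line) > limit: n += 1  # pylint: disable=multiple-statements
    return n
```

Disabling a message locally like this keeps the decision visible next to the code it applies to, instead of hiding it in a global setting.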

--
Terry Jan Reedy


Christopher Reimer

May 7, 2016, 3:43:55 PM
On 5/7/2016 12:23 PM, Stephen Hansen wrote:
> On Sat, May 7, 2016, at 11:52 AM, Christopher Reimer wrote:
>> You can do better. You should strive for 10/10 whenever possible,
>> figure out why you fall short and ask for help on the parts that don't
>> make sense.
> I think this is giving far too much weight to pylint's opinion on what
> is "good" or "bad" programming habits.

I forgot to add the warning, "Use pylint with a dash of salt on a lemon
slice and a shot of tequila." :)

Thank you,

Chris R.

Ray Cote

May 7, 2016, 3:53:36 PM
On Sat, May 7, 2016 at 2:52 PM, Christopher Reimer <
christoph...@icloud.com> wrote:

> On 5/7/2016 9:51 AM, DFS wrote:
>
>> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
>> program?
>>
>
> I routinely get 10/10 for my code. While pylint isn't perfect and
> idiosyncratic at times, it's a useful tool to help break bad programming
> habits.
>

I’m impressed with 10/10.
My approach is to ensure flake8 (a combination of pyflakes and pep8
checking) does not report any warnings and then run pyLint as a final
check.
Code usually ends up in the 9.0 to 9.5 range, sometimes a bit higher.
Also find it useful to add some additional short names we use to the
allowed names list.

Biggest issue I have with pyLint is that it complains when function
parameters are indented twice vs. once. pyFlakes likes the twice.
Example:
def function_name(
        parm_1,
        long_parm_name,
        ….
        end_of_long_list_of_params):
    parm_1 = long_parm_name

—Ray




--
Raymond Cote, President
voice: +1.603.924.6079 email: rga...@AppropriateSolutions.com skype:
ray.cote

Christopher Reimer

May 7, 2016, 4:20:19 PM
On 5/7/2016 12:52 PM, Ray Cote wrote:

> I’m impressed with 10/10.
> My approach is to ensure flake8 (a combination of pyflakes and pep8
> checking) does not report any warnings and then run pyLint as a final
> check.

I just installed pyflakes and ran it against my 10/10 files. It's not
complaining about anything. So I ran it against my unit tests that I'm
still writing, haven't cleaned up and checked against pylint. I got
dinged for using a star import on a file with common variables, which
was an easy fix.

Thank you,

Chris R.

Chris Angelico

May 7, 2016, 5:56:51 PM
On Sun, May 8, 2016 at 4:42 AM, Michael Selik <michae...@gmail.com> wrote:
>
>> +-------------------------+------------+
>> |line-too-long |5 | meh
>>
>
> Yeah, I think 80 characters can be somewhat tight. Still, 5 long lines in
> 200ish lines of code? Sounds like you might be doing too much in those
> lines or have too many levels of indentation.
> "Sparse is better than dense"
> "Flat is better than nested"

Others have commented on this, but I'll weigh in with one point that
hasn't been mentioned yet. A lot of tools will complain when you
exceed 80 (or 79) characters per line; but it depends somewhat on *how
far* you exceeded it. Some people opt instead for a 100-character
limit, or even 120, but most programmers agree that a 200-character
line (or more!) is too long.

So if this is complaining about five lines out of your entire program
that just snuck over the 80-character limit (eg 86 characters long),
it's not a concern, and my recommendation would be to relax the
restriction. And if those few lines are ginormous hunks of data
(static list initialization, or something), you might consider dumping
them out to external files rather than wrapping them into big code
blocks. But if they're truly long lines of code, wrap or split them.
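
Two common ways to wrap a long line, sketched with a hypothetical helper (`format_address` is not from DFS's program; the SQL text is the INSERT statement quoted elsewhere in this thread):

```python
def format_address(street, city, state, zipcd):
    """Build a one-line address; arguments are plain strings."""
    # Parentheses let the expression continue across lines
    # without backslashes, keeping each line short.
    return (
        street + ", "
        + city + ", "
        + state + " "
        + zipcd
    )

# Adjacent string literals are joined by the parser, which is another
# way to split a long string constant.
SQL = (
    "INSERT INTO ADDRESSES "
    "VALUES (?,?,?,?,?)"
)
```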

ChrisA

DFS

May 7, 2016, 9:16:48 PM
On 5/7/2016 1:01 PM, Chris Angelico wrote:
> On Sun, May 8, 2016 at 2:51 AM, DFS <nos...@dfs.com> wrote:
>> [1]
>> pylint says "Consider using enumerate instead of iterating with range and
>> len"
>>
>> the offending code is:
>> for j in range(len(list1)):
>>     do something with list1[j], list2[j], list3[j], etc.
>>
>> enumeration would be:
>> for j,item in enumerate(list1):
>>     do something with list1[j], list2[j], list3[j], etc.
>>
>> Is there an advantage to using enumerate() here?
>
> The suggestion from a human would be to use zip(), or possibly to
> change your data structures.

Happens like this:

address data is scraped from a website:

names = tree.xpath()
addr = tree.xpath()

I want to store the data atomically, so I parse street, city, state, and
zip into their own lists.

"1250 Peachtree Rd, Atlanta, GA 30303"

street = [s.split(',')[0] for s in addr]
city = [c.split(',')[1].strip() for c in addr]
state = [s[-8:][:2] for s in addr]
zipcd = [z[-5:] for z in addr]

names = ["Taco Bell", "Wendy's"]
addr = ['928 Buford Dr, Tucker, GA 30043',
        '4880 Sugarloaf Pkwy, Atlanta, GA 30303']
street = ['928 Buford Dr', '4880 Sugarloaf Pkwy']
city = ['Tucker','Atlanta']
state = ['GA','GA']
zipcd = ['30043','30303']


When you say 'possibly change data structures'... to what?



> for item1, item2, item3 in zip(list1, list2, list3):
>     do something with the items

ziplists = zip(names,street,city,state,zipcd)
print ziplists

[('Taco Bell', '928 Buford Dr', 'Tucker', 'GA', '30043'),
("Wendy's", '4880 Sugarloaf Pkwy', 'Atlanta', 'GA', '30303')]



Why is it better to zip() them up and use:

for item1, item2, item3 in zip(list1, list2, list3):
    do something with the items

than

for j in range(len(list1)):
    do something with list1[j], list2[j], list3[j], etc.

Chris Angelico

May 7, 2016, 9:36:41 PM
On Sun, May 8, 2016 at 11:16 AM, DFS <nos...@dfs.com> wrote:
> On 5/7/2016 1:01 PM, Chris Angelico wrote:
>> The suggestion from a human would be to use zip(), or possibly to
>> change your data structures.
>
>
> Happens like this:
>
> address data is scraped from a website:
>
> names = tree.xpath()
> addr = tree.xpath()
>
> I want to store the data atomically, so I parse street, city, state, and zip
> into their own lists.
>
> "1250 Peachtree Rd, Atlanta, GA 30303"
>
> street = [s.split(',')[0] for s in addr]
> city = [c.split(',')[1].strip() for c in addr]
> state = [s[-8:][:2] for s in addr]
> zipcd = [z[-5:] for z in addr]

So you're iterating over addr lots of times, and building separate
lists. As an alternative, you could iterate over it *once*, and have a
single object representing an address.

> Why is it better to zip() them up and use:
>
> for item1, item2, item3 in zip(list1, list2, list3):
>     do something with the items
>
> than
>
>
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.

Because 'j' is insignificant here, as is the length of the list. What
you're doing is iterating over three parallel lists - not counting
numbers. Imagine that, instead of lists, you just have *sequences* -
ordered collections of things. You can follow a recipe without knowing
the numbers of the individual lines; you just need to know the
sequence. Here, iterate over this collection:

* Collect ingredients.
* Cream the butter and the sugar.
* Sift the salt into the flour.
* Fold the mixture into an origami crane.

These instructions work whether they're numbered or not.

ChrisA

Terry Reedy

May 7, 2016, 9:45:19 PM
On 5/7/2016 3:52 PM, Ray Cote wrote:

> Biggest issue I have with pyLint is that it complains when function
> parameters are indented twice vs. once. pyFlakes likes the twice.
> Example:
> def function_name(
>         parm_1,
>         long_parm_name,
>         ….
>         end_of_long_list_of_params):
>     parm_1 = long_parm_name

This is the recommendation in PEP 8. I would otherwise insert a blank
line before the body.

tjr



Stephen Hansen

May 7, 2016, 10:14:42 PM
On Sat, May 7, 2016, at 06:16 PM, DFS wrote:

> Why is it better to zip() them up and use:
>
> for item1, item2, item3 in zip(list1, list2, list3):
>     do something with the items
>
> than
>
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.

Although Chris has a perfectly good and valid answer why conceptually
the zip is better, let me put forth: the zip is simply clearer, more
readable and more maintainable.

This is a question of style and to a certain degree aesthetics, so is
somewhat subjective, but range(len(list1)) and list1[j] are all
indirection, when item1 is clearly (if given a better name than 'item1')
something distinct you're working on.

DFS

May 7, 2016, 10:15:54 PM
On 5/7/2016 9:36 PM, Chris Angelico wrote:
> On Sun, May 8, 2016 at 11:16 AM, DFS <nos...@dfs.com> wrote:
>> On 5/7/2016 1:01 PM, Chris Angelico wrote:
>>> The suggestion from a human would be to use zip(), or possibly to
>>> change your data structures.
>>
>>
>> Happens like this:
>>
>> address data is scraped from a website:
>>
>> names = tree.xpath()
>> addr = tree.xpath()
>>
>> I want to store the data atomically, so I parse street, city, state, and zip
>> into their own lists.
>>
>> "1250 Peachtree Rd, Atlanta, GA 30303"
>>
>> street = [s.split(',')[0] for s in addr]
>> city = [c.split(',')[1].strip() for c in addr]
>> state = [s[-8:][:2] for s in addr]
>> zipcd = [z[-5:] for z in addr]
>
> So you're iterating over addr lots of times, and building separate
> lists. As an alternative, you could iterate over it *once*, and have a
> single object representing an address.


I like the idea of one iteration, but how? (I'll be trying myself
before I check back in)

Remember, it's required to split the data up, to give flexibility in
sorting, searching, output, etc.

I saw a cool example where someone built a list and used it to do a bulk
INSERT. That probably won't work well here, because one of the options
I give the user is # of addresses to store. So I do individual INSERTs
using the 'for j in range()' method, which makes it easier to track how
many addresses have been stored.


>> Why is it better to zip() them up and use:
>>
>> for item1, item2, item3 in zip(list1, list2, list3):
>>     do something with the items
>>
>> than
>>
>>
>> for j in range(len(list1)):
>>     do something with list1[j], list2[j], list3[j], etc.
>
> Because 'j' is insignificant here, as is the length of the list.

Sorry, but I don't understand what you mean by insignificant. j keeps
track of the position in the list - regardless of the length of the list.


> What
> you're doing is iterating over three parallel lists - not counting
> numbers. Imagine that, instead of lists, you just have *sequences* -
> ordered collections of things. You can follow a recipe without knowing
> the numbers of the individual lines; you just need to know the
> sequence. Here, iterate over this collection:
>
> * Collect ingredients.
> * Cream the butter and the sugar.
> * Sift the salt into the flour.
> * Fold the mixture into an origami crane.
>
> These instructions work whether they're numbered or not.

Again, not following you.


The only reason

for j in range(len(list1)):
    do something with list1[j], list2[j], list3[j], etc.

or

for item1, item2, item3 in zip(list1, list2, list3):
    do something with the items

works is because each list has the same number of items.
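
The two loops do behave differently once the lengths stop matching, which a small sketch can show. The lists below are hypothetical, with one deliberately extra name:

```python
from itertools import zip_longest  # Python 3; Python 2 calls it izip_longest

names = ["Taco Bell", "Wendy's", "Arby's"]  # hypothetical: one extra name
cities = ["Tucker", "Atlanta"]

# zip() stops at the shortest input, so the unmatched name is dropped silently.
paired = list(zip(names, cities))

# zip_longest() pads the shorter input instead, making the mismatch visible.
padded = list(zip_longest(names, cities, fillvalue=None))

# The index-based loop fails loudly once j runs past the shorter list.
try:
    for j in range(len(names)):
        pair = (names[j], cities[j])
    index_error_raised = False
except IndexError:
    index_error_raised = True
```

Which behavior is preferable depends on whether a length mismatch should be treated as an error in the program at hand.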


MRAB

May 7, 2016, 10:21:55 PM
On 2016-05-08 03:14, Stephen Hansen wrote:
> On Sat, May 7, 2016, at 06:16 PM, DFS wrote:
>
>> Why is it better to zip() them up and use:
>>
>> for item1, item2, item3 in zip(list1, list2, list3):
>>     do something with the items
>>
>> than
>>
>> for j in range(len(list1)):
>>     do something with list1[j], list2[j], list3[j], etc.
>
> Although Chris has a perfectly good and valid answer why conceptually
> the zip is better, let me put forth: the zip is simply clearer, more
> readable and more maintainable.
>
> This is a question of style and to a certain degree aesthetics, so is
> somewhat subjective, but range(len(list1)) and list1[j] are all
> indirection, when item1 is clearly (if given a better name than 'item1')
> something distinct you're working on.
>
+1

If you're iterating through multiple sequences in parallel, zip is the
way to go.

Chris Angelico

May 7, 2016, 10:51:01 PM
On Sun, May 8, 2016 at 12:15 PM, DFS <nos...@dfs.com> wrote:
> On 5/7/2016 9:36 PM, Chris Angelico wrote:
>>
>> On Sun, May 8, 2016 at 11:16 AM, DFS <nos...@dfs.com> wrote:
>>>
>>> street = [s.split(',')[0] for s in addr]
>>> city = [c.split(',')[1].strip() for c in addr]
>>> state = [s[-8:][:2] for s in addr]
>>> zipcd = [z[-5:] for z in addr]
>>
>>
>> So you're iterating over addr lots of times, and building separate
>> lists. As an alternative, you could iterate over it *once*, and have a
>> single object representing an address.
>
> I like the idea of one iteration, but how? (I'll be trying myself before I
> check back in)
>
> Remember, it's required to split the data up, to give flexibility in
> sorting, searching, output, etc.

Start by unpacking the comprehensions into statement form.

street = []
for s in addr:
    street.append(s.split(',')[0])
city = []
for c in addr:
    city.append(c.split(',')[1].strip())
state = []
for s in addr:
    state.append(s[-8:][:2])
zipcd = []
for z in addr:
    zipcd.append(z[-5:])

Now see how you're doing the same thing four times? Let's start by
keeping it the way it is, but combine the loops.

street, city, state, zipcd = [], [], [], []
for a in addr:
    street.append(a.split(',')[0])
    city.append(a.split(',')[1].strip())
    state.append(a[-8:][:2])
    zipcd.append(a[-5:])

Side point: I prefer collections to be named in the plural, so these
would be "streets", and "addrs". This lets you follow a very simple
rule of iteration: "for item in collection" or "for singular in
plural". In this case, "for address in addresses" is classic
iteration.

So, now that you have a single loop picking up the different pieces,
it's easy to build up a simple object that represents an address.

# Either this
from collections import namedtuple
Address = namedtuple("Address", ["street", "city", "state", "zipcd"])
# or this
from types import SimpleNamespace
class Address(SimpleNamespace): pass

addresses = []
for a in addr:
    addresses.append(Address(
        street=a.split(',')[0],
        city=a.split(',')[1].strip(),
        state=a[-8:][:2],
        zipcd=a[-5:],
    ))

Voila! One iteration, and a single object representing an address.
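
For anyone who wants to run the idea end to end, here is a self-contained sketch of the namedtuple variant. It swaps the slicing for split(), which assumes every address has exactly two commas and a "ST 12345" tail; the helper name `parse_address` is made up for the illustration:

```python
from collections import namedtuple

Address = namedtuple("Address", ["street", "city", "state", "zipcd"])

def parse_address(text):
    """Split 'street, city, ST 12345' into an Address (format assumed)."""
    street, city, state_zip = [part.strip() for part in text.split(",")]
    state, zipcd = state_zip.split()
    return Address(street, city, state, zipcd)

# Sample addresses in the format shown earlier in the thread.
addrs = [
    "928 Buford Dr, Tucker, GA 30043",
    "4880 Sugarloaf Pkwy, Atlanta, GA 30303",
]
addresses = [parse_address(a) for a in addrs]
```

Each element of `addresses` can then be sorted or searched by field, for example with `sorted(addresses, key=lambda a: a.city)`.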

> I saw a cool example where someone built a list and used it to do a bulk
> INSERT. That probably won't work well here, because one of the options I
> give the user is # of addresses to store. So I do individual INSERTs using
> the 'for j in range()' method, which makes it easier to track how many
> addresses have been stored.

You could slice it if you actually want that.
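
A rough sketch of slicing plus a bulk insert. It uses the standard-library sqlite3 module as a stand-in for pyodbc so that it runs without a database server; executemany() is part of the DB-API that both modules follow. The table layout and the `max_to_store` option are assumptions for the illustration:

```python
import sqlite3

# Rows as (name, street, city, state, zipcd) tuples; sample data from the thread.
rows = [
    ("Taco Bell", "928 Buford Dr", "Tucker", "GA", "30043"),
    ("Wendy's", "4880 Sugarloaf Pkwy", "Atlanta", "GA", "30303"),
]
max_to_store = 1  # hypothetical user option: number of addresses to store

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ADDRESSES "
    "(name TEXT, street TEXT, city TEXT, state TEXT, zipcd TEXT)"
)

# Slicing keeps only the first max_to_store rows; executemany inserts them in bulk.
to_store = rows[:max_to_store]
conn.executemany("INSERT INTO ADDRESSES VALUES (?,?,?,?,?)", to_store)
conn.commit()

stored_count = conn.execute("SELECT COUNT(*) FROM ADDRESSES").fetchone()[0]
conn.close()
```

The number of stored addresses is simply `len(to_store)`, so no loop counter is needed to track it.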

>>> Why is it better to zip() them up and use:
>>>
>>> for item1, item2, item3 in zip(list1, list2, list3):
>>>     do something with the items
>>>
>>> than
>>>
>>>
>>> for j in range(len(list1)):
>>>     do something with list1[j], list2[j], list3[j], etc.
>>
>>
>> Because 'j' is insignificant here, as is the length of the list.
>
> Sorry, but I don't understand what you mean by insignificant. j keeps track
> of the position in the list - regardless of the length of the list.

Right, but *who cares* what the position is? All you want to do is the
"do something" bit. Don't think in terms of concrete and discrete
operations in a computer; think in the abstract (what are you trying
to accomplish?), and then represent that in code.

>> What
>> you're doing is iterating over three parallel lists - not counting
>> numbers. Imagine that, instead of lists, you just have *sequences* -
>> ordered collections of things. You can follow a recipe without knowing
>> the numbers of the individual lines; you just need to know the
>> sequence. Here, iterate over this collection:
>>
>> * Collect ingredients.
>> * Cream the butter and the sugar.
>> * Sift the salt into the flour.
>> * Fold the mixture into an origami crane.
>>
>> These instructions work whether they're numbered or not.
>
> Again, not following you.
>
>
> The only reason
>
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.
>
> or
>
> for item1, item2, item3 in zip(list1, list2, list3):
>     do something with the items
>
> works is because each list has the same number of items.

Sure, but who cares what each item's position is? All that matters is
that they have corresponding positions, which is what zip() does.

Imagine you don't even have the whole lists yet. Imagine someone's
still writing stuff to them as you work. Maybe they're infinite in
length. You can't iterate up to the length of list1, because it
doesn't HAVE a length. But you can still zip it up with other parallel
collections, and iterate over them all.
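
A small sketch of that situation, written for Python 3, where zip() is lazy (in the Python 2.7 used in this thread, zip() builds a full list, so itertools.izip would be the lazy equivalent). The generator name shout and the sample words are invented.

```python
from itertools import count


def shout(words):
    """Yield items one at a time; a generator has no len()."""
    for word in words:
        yield word.upper()


ids = count(1)                           # infinite: 1, 2, 3, ...
names = shout(["ham", "spam", "eggs"])   # produced on demand

# zip() stops as soon as the shortest input runs out
pairs = list(zip(ids, names))
print(pairs)
```

Neither input supports range(len(...)), yet the loop over corresponding items works unchanged.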

ChrisA

DFS

May 7, 2016, 11:04:29 PM
The lists I actually use are:

for j in range(len(nms)):
    cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
    vals = nms[j],street[j],city[j],state[j],zipcd[j]


The enumerated version would be:

ziplists = zip(nms,street,city,state,zipcd)
for nm,street,city,state,zipcd in ziplists:
    cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
    vals = nm,street,city,state,zipcd


I guess the enumeration() is a little nicer to look at. Why do you
think it's more maintainable?


Aside: I haven't tried, but is 'names' a bad idea or illegal for the
name of a python list or variable?


Thanks

Steven D'Aprano

May 7, 2016, 11:26:07 PM
On Sun, 8 May 2016 02:51 am, DFS wrote:

> This more-anal-than-me program generated almost 2 warnings for every
> line of code in my program. w t hey?
>
>
> DFS comments
> +-------------------------+------------+ -------------------------------
> |message id |occurrences |
> +=========================+============+
> |mixed-indentation |186 | I always use tab

Obviously not. There are 186 occurrences where you use mixed tabs and
spaces.

Try running Tabnanny on your file:

python -m tabnanny <path to your file>


> +-------------------------+------------+
> |invalid-name |82 | every single variable name?!

Maybe. What are they called?


> +-------------------------+------------+
> |bad-whitespace |65 | mostly because I line up =
> signs:
> var1 = value
> var10 = value

Yuck. How much time do you waste aligning assignments whenever you add or
delete or edit a variable?


> +-------------------------+------------+
> |trailing-whitespace |59 | heh!
> +-------------------------+------------+
> |multiple-statements |23 | do this to save lines.
> Will continue doing it.

Why? Do you think that there's a world shortage of newline characters? Is
the Enter key on your keyboard broken?


> +-------------------------+------------+
> |no-member |5 |
>
> "Module 'pyodbc' has no 'connect' member" Yes it does.
> "Module 'pyodbc' has no 'Error' member" Yes it does.
>
> Issue with pylint, or pyodbc?

*shrug* More likely with Pylint.


> +-------------------------+------------+
> |line-too-long |5 | meh
> +-------------------------+------------+
> |wrong-import-order |4 | does it matter?

Probably not. I'm curious what it thinks is the right import order.



> +-------------------------+------------+
> |missing-docstring |4 | what's the difference between
> a docstring and a # comment?

Comments exist only in the source code.

Docstrings are available for interactive use with help(), for runtime
introspection, and for doctests.

https://docs.python.org/2/library/doctest.html
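
To make the difference concrete, here is a small sketch; the function plus_one is just an illustration. The docstring is attached to the function object at runtime, while the comment is discarded when the source is compiled.

```python
def plus_one(n):
    """Return n + 1.

    >>> plus_one(41)
    42
    """
    # This comment exists only in the source file.
    return n + 1


# The docstring is available at runtime; help(plus_one) shows the same text.
doc = plus_one.__doc__
print(doc)
```

The ">>>" lines inside the docstring are in the form the doctest module looks for.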


> +-------------------------+------------+
> |multiple-imports |2 | doesn't everyone?

You mean something like this?

import spam, ham, eggs, cheese

*shrug* It's a style thing.


> +-------------------------+------------+
> |consider-using-enumerate |2 | see below [1]

Absolutely use enumerate.
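
A short sketch of what the message is asking for, with invented sample data. Both loops build the same result; the second lets Python hand over the index and the item together.

```python
colors = ["red", "green", "blue"]

# The pattern pylint flags: index first, then look the item up
by_index = []
for j in range(len(colors)):
    by_index.append((j, colors[j]))

# The enumerate() version: index and item arrive together
by_enumerate = []
for j, color in enumerate(colors):
    by_enumerate.append((j, color))

print(by_index == by_enumerate)
```

When the index is only used to reach into several parallel lists, zip() is usually the better fit than enumerate().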


> +-------------------------+------------+
> |bad-builtin |2 | warning because I used filter?

Well that's just stupid. Bad PyLint. This should absolutely not be turned on
by default.


--
Steven

DFS

May 7, 2016, 11:29:11 PM
On 5/7/2016 3:40 PM, Terry Reedy wrote:
> On 5/7/2016 12:51 PM, DFS wrote:
>> This more-anal-than-me program generated almost 2 warnings for every
>> line of code in my program. w t hey?
>
> If you don't like it, why do you use it?


I've never used it before last night. I was shocked at what it spewed
back at me.



> I suppose the answer is that it did find a few things to check. You
> might be happier with pychecker, which is much less aggressive.

I'll give it a shot.



> I believe will find the things you did fix.

I'm not parsing this statement. You mean pychecker will find the same
things pylint found, and that I fixed?

If it finds them after I fixed them... it's a magical program :)



DFS comments
>> +-------------------------+------------+ -------------------------------
>> |message id |occurrences |
>> +=========================+============+
>> |mixed-indentation |186 | I always use tab
>> +-------------------------+------------+
>> |invalid-name |82 | every single variable name?!
>
> I would need examples to comment.


Invalid constant name "cityzip" (invalid-name)
Invalid constant name "state" (invalid-name)
Invalid constant name "miles" (invalid-name)
Invalid constant name "store" (invalid-name)
Invalid variable name "rs" (invalid-name)



>> +-------------------------+------------+
>> |trailing-whitespace |59 | heh!
>
> Any code editor should have a command to fix this.
> IDLE: Format => strip trailing whitespace
> Notepad++: Macro => trim trailing and save, Alt-Shift-S
> others ...

That did it.


>> +-------------------------+------------+
>> |no-member |5 |
>>
>> "Module 'pyodbc' has no 'connect' member" Yes it does.
>> "Module 'pyodbc' has no 'Error' member" Yes it does.
>>
>> Issue with pylint, or pyodbc?
>
> Worth looking into. Could be a bug somewhere. But I don't have pyodbc
> installed.
>
>> +-------------------------+------------+
>> |line-too-long |5 | meh
>
> For following the PEP guideline when patching CPython, this is helpful.
>
>> +-------------------------+------------+
>> |wrong-import-order |4 | does it matter?
>
> Consistency in imports ultimately makes easier reading.
> Many idlelib files use this order: stdlib modules other than tkinter and
> idlelib (alphabetically); tkinter (tkinter first, then submodules);
> idlelib (alphabetically). When I edit files, I sometimes reorder
> imports to conform.


It complains 2x about this:

import os, sys, time, datetime
import pyodbc, sqlite3
import re, requests
from lxml import html


But I think there are some pylint bugs here:
-------------------------------------------------------------------------

standard import "import pyodbc, sqlite3" comes before "import pyodbc,
sqlite3" (wrong-import-order)

* complains that the line comes before itself?

-------------------------------------------------------------------------

standard import "import re, requests" comes before "import pyodbc,
sqlite3" (wrong-import-order)

* So I switched them, and then it complained about that:

standard import "import pyodbc, sqlite3" comes before "import re,
requests" (wrong-import-order)

-------------------------------------------------------------------------

You can't win with pylint...

And, the author probably isn't a native English-speaker, since when he
says 'comes before' I think he means 'should come before'.





>> +-------------------------+------------+
>> |missing-docstring |4 | what's the difference between
>> a docstring and a # comment?
>
> # Comments only appear in the source
> '''Docstrings are copied to the compiled code object, are interactively
> accessible, and are used for help(obj) output.'''
>
>
>> +-------------------------+------------+
>> |superfluous-parens |3 | I like to surround 'or'
>> statements with parens
>
> I would need examples to comment


if ("Please choose a state" in str(matches)):
if (var == "val" or var2 == "val2"):


>> +-------------------------+------------+
>> |bad-builtin |2 | warning because I used filter?
>
> If they are still doing this in the latest release, it is an arrogance
> and inconsistency bug on their part. Disable this check.

$ pylint --version
No config file found, using default configuration
pylint 1.5.5,
astroid 1.4.5
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:32:19) [MSC v.1500
32 bit (Intel)]


It says "Used builtin function 'filter'. Using a list comprehension can
be clearer. (bad-builtin)"
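
For comparison, a sketch of the two spellings pylint is contrasting, using made-up data. In Python 3 filter() returns an iterator, hence the list() call; in Python 2.7 it already returns a list.

```python
words = ["ham", "spam", "eggs", "sausage"]

# filter() with a predicate
with_filter = list(filter(lambda w: len(w) > 3, words))

# The list comprehension pylint suggests instead
with_comprehension = [w for w in words if len(w) > 3]

print(with_filter == with_comprehension)
```

Which one reads better is a matter of taste; the results are identical.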



>> +-------------------------+------------+
>> |missing-final-newline |1 | I'm using Notepad++, with
>> EOL Conversion set to
>> 'Windows Format'.
>
> That says to replace final '\n' with '\r\n'. It does not affect a
> missing final newline ;-)
>
>> How or should I fix this?
>
> Fix by hitting 'Enter' at the end of the last line.
> Should you? I think it a good habit.

Done


>> After fixes and disabling various warnings:
>> "Your code has been rated at 8.37/10"
>
> Being able to customize pylint by turning off warnings is its saving
> feature.

Yes. If I had to see 300-350 lines of output every time I wouldn't ever
use it again.

Overall, I do like a majority of the things it suggested.

DFS

May 7, 2016, 11:38:42 PM
On 5/7/2016 2:52 PM, Christopher Reimer wrote:
> On 5/7/2016 9:51 AM, DFS wrote:
>> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
>> program?
>
> I routinely get 10/10 for my code. While pylint isn't perfect and
> idiosyncratic at times, it's a useful tool to help break bad programming
> habits. Since I came from a Java background, I had to unlearn everything
> from Java before I could write Pythonic code. It might help to use an
> IDE that offers PEP8-compliant code suggestions (I use PyCharm IDE).
>
>> That's about as good as it's gonna get!
>
> You can do better.

10/10 on pylint isn't better. It's being robotic and conforming to the
opinions of the author of that app.

In fact, I think:

import os, sys, time, socket

is much more readable than, and preferable to,

import os
import sys
import time
import socket

but pylint complains about the former.




> You should strive for 10/10 whenever possible,

nah


> figure out why you fall short and ask for help on the parts that don't
> make sense.

I actually agree with ~3/4 of the suggestions it makes. My code ran
fine before pylint tore it a new one, and it doesn't appear to run any
better after making various fixes.

But between you clp guys and pylint, the code is definitely improving.



>> pylint says "Consider using enumerate instead of iterating with range
>> and len"
>>
>> the offending code is:
>> for j in range(len(list1)):
>>     do something with list1[j], list2[j], list3[j], etc.
>
> This code is reeking with bad habits to be broken. Assigning a throwaway
> variable to walk the index is unnecessary when Python can do it for you
> behind the scenes.

Don't you think python also allocates a throwaway variable for use with
zip and enumerate()?



> As Chris A. pointed out in his post, you should use
> zip() to walk through the values of each list at the same time.

Yeah, zip looks interesting. I just started using python a month ago,
and didn't know about zip until pylint pointed it out (it said I
redefined a builtin by using 'zip' as a list name).

Edit: I already put zip() in place. Only improvement I think is it
looks cleaner - got rid of a bunch of [j]s.





> Thank you,
>
> Chris R.


No, thank /you/,

DFS




Stephen Hansen

May 7, 2016, 11:47:15 PM
On Sat, May 7, 2016, at 08:04 PM, DFS wrote:
> The lists I actually use are:
>
> for j in range(len(nms)):
>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>     vals = nms[j],street[j],city[j],state[j],zipcd[j]
>
>
> The enumerated version would be:
>
> ziplists = zip(nms,street,city,state,zipcd)
> for nm,street,city,state,zipcd in ziplists:
>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>     vals = nm,street,city,state,zipcd
>
>
> I guess the enumeration() is a little nicer to look at. Why do you
> think it's more maintainable?

Code is read more than it's written.

That which is nicer to look at, therefore, is easier to read.

That which is easier to read is easier to maintain.

Beyond that, it's simpler, and more clearly articulates in the local
space what's going on.

> Aside: I haven't tried, but is 'names' a bad idea or illegal for the
> name of a python list or variable?

Nothing wrong with names. Or 'name', for that matter. Try to avoid
abbreviations.

Chris Angelico

May 7, 2016, 11:51:19 PM
On Sun, May 8, 2016 at 1:28 PM, DFS <nos...@dfs.com> wrote:
> Invalid constant name "cityzip" (invalid-name)
> Invalid constant name "state" (invalid-name)
> Invalid constant name "miles" (invalid-name)
> Invalid constant name "store" (invalid-name)
> Invalid variable name "rs" (invalid-name)

... huh?? The first four seem to have been incorrectly detected as
constants. How are they used?

The last one is probably "too short". Or something.

> standard import "import re, requests" comes before "import pyodbc, sqlite3"
> (wrong-import-order)
>
> * So I switched them, and then it complained about that:
>
> standard import "import pyodbc, sqlite3" comes before "import re, requests"
> (wrong-import-order)
>
> -------------------------------------------------------------------------
>
> You can't win with pylint...

Probably that means it got confused by the alphabetization - "pyodbc"
should come before "re" and "requests", but "sqlite3" should come
after. Either fix the first problem by splitting them onto separate
lines, or ignore this as a cascaded error.

My general principle is that things on one line should *belong* on one
line. So having "import re, requests" makes no sense, but I might have
something like "import os, sys" when the two modules are both used in
one single line of code and never again. Otherwise, splitting them out
is the easiest.


>>> +-------------------------+------------+
>>> |superfluous-parens |3 | I like to surround 'or'
>>> statements with parens
>>
>>
>> I would need examples to comment
>
>
>
> if ("Please choose a state" in str(matches)):
> if (var == "val" or var2 == "val2"):

Cut the parens. Easy!

> It says "Used builtin function 'filter'. Using a list comprehension can be
> clearer. (bad-builtin)"

Kill that message and keep using filter.

ChrisA

Stephen Hansen

May 7, 2016, 11:55:40 PM
On Sat, May 7, 2016, at 08:28 PM, DFS wrote:
> >> +-------------------------+------------+
> >> |superfluous-parens |3 | I like to surround 'or'
> >> statements with parens
> >
> > I would need examples to comment
>
>
> if ("Please choose a state" in str(matches)):
> if (var == "val" or var2 == "val2"):

Gah, don't do that. You're adding meaningless noise.

Especially in the first case.

Chris Angelico

May 7, 2016, 11:57:14 PM
On Sun, May 8, 2016 at 1:38 PM, DFS <nos...@dfs.com> wrote:
>> This code is reeking with bad habits to be broken. Assigning a throwaway
>> variable to walk the index is unnecessary when Python can do it for you
>> behind the scenes.
>
>
> Don't you think python also allocates a throwaway variable for use with zip
> and enumerate()?

Nope. But even if it did, it wouldn't matter. Concern yourself with
your code, and let the implementation take care of itself.

ChrisA

DFS

May 8, 2016, 12:10:49 AM
On 5/7/2016 11:25 PM, Steven D'Aprano wrote:
> On Sun, 8 May 2016 02:51 am, DFS wrote:
>
>> This more-anal-than-me program generated almost 2 warnings for every
>> line of code in my program. w t hey?
>>
>>
>> DFS comments
>> +-------------------------+------------+ -------------------------------
>> |message id |occurrences |
>> +=========================+============+
>> |mixed-indentation |186 | I always use tab
>
> Obviously not. There are 186 occurrences where you use mixed tabs and
> spaces.


I mean I always use tab after :

The program won't run otherwise. If I use spaces, 100% of the time it
throws:

IndentationError: unindent does not match any outer indentation level
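
For what it's worth, Python 3 is stricter about this than the 2.7 used here. A small sketch, assuming a Python 3 interpreter: a block whose first line is indented with a tab and whose second line uses eight spaces is rejected at compile time. The source string is made up for the demonstration.

```python
# One line indented with a tab, the next with eight spaces
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except IndentationError as err:
    # TabError is a subclass of IndentationError
    outcome = type(err).__name__

print(outcome)
```

Sticking to one kind of indentation throughout the file avoids the whole class of errors.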




> Try running Tabnanny on your file:
>
> python -m tabnanny <path to your file>


Didn't seem to do anything.



>> +-------------------------+------------+
>> |invalid-name |82 | every single variable name?!
>
> Maybe. What are they called?
>
>
>> +-------------------------+------------+
>> |bad-whitespace |65 | mostly because I line up =
>> signs:
>> var1 = value
>> var10 = value
>
> Yuck. How much time do you waste aligning assignments whenever you add or
> delete or edit a variable?

Lots. It takes hours to add or delete 3 whitespaces.



>> +-------------------------+------------+
>> |trailing-whitespace |59 | heh!
>> +-------------------------+------------+
>> |multiple-statements |23 | do this to save lines.
>> Will continue doing it.
>
> Why? Do you think that there's a world shortage of newline characters? Is
> the Enter key on your keyboard broken?

I do it because I like it.

if verbose: print var

python doesn't complain.





>> +-------------------------+------------+
>> |no-member |5 |
>>
>> "Module 'pyodbc' has no 'connect' member" Yes it does.
>> "Module 'pyodbc' has no 'Error' member" Yes it does.
>>
>> Issue with pylint, or pyodbc?
>
> *shrug* More likely with Pylint.



>> +-------------------------+------------+
>> |line-too-long |5 | meh
>> +-------------------------+------------+
>> |wrong-import-order |4 | does it matter?
>
> Probably not. I'm curious what it thinks is the right import order.


"wrong-import-order (C0411):
%s comes before %s Used when PEP8 import order is not respected
(standard imports first, then third-party libraries, then local imports)"

https://docs.pylint.org/features.html



I think there are some pylint bugs here:
-------------------------------------------------------------------------

standard import "import pyodbc, sqlite3" comes before "import pyodbc,
sqlite3" (wrong-import-order)

* complains that the line comes before itself?

-------------------------------------------------------------------------

standard import "import re, requests" comes before "import pyodbc,
sqlite3" (wrong-import-order)

* So I switched them, and then it complained about that:

standard import "import pyodbc, sqlite3" comes before "import re,
requests" (wrong-import-order)

-------------------------------------------------------------------------





>> +-------------------------+------------+
>> |missing-docstring |4 | what's the difference between
>> a docstring and a # comment?
>
> Comments exist only in the source code.
>
> Docstrings are available for interactive use with help(), for runtime
> introspection, and for doctests.
>
> https://docs.python.org/2/library/doctest.html

Thanks


>> +-------------------------+------------+
>> |multiple-imports |2 | doesn't everyone?
>
> You mean something like this?
>
> import spam, ham, eggs, cheese
>
> *shrug* It's a style thing.

pylint gives you demerits for that.



>> +-------------------------+------------+
>> |consider-using-enumerate |2 | see below [1]
>
> Absolutely use enumerate.


Everyone else says so, too. But other than cleaner-looking code, I'm
not understanding how it's a real advantage over:

for j in range(len(list)):




>> +-------------------------+------------+
>> |bad-builtin |2 | warning because I used filter?
>
> Well that's just stupid. Bad PyLint. This should absolutely not be turned on
> by default.


Chris Angelico

May 8, 2016, 12:22:10 AM
On Sun, May 8, 2016 at 2:10 PM, DFS <nos...@dfs.com> wrote:
>>> +-------------------------+------------+
>>> |trailing-whitespace |59 | heh!
>>> +-------------------------+------------+
>>> |multiple-statements |23 | do this to save lines.
>>> Will continue doing it.
>>
>>
>> Why? Do you think that there's a world shortage of newline characters? Is
>> the Enter key on your keyboard broken?
>
>
> I do it because I like it.
>
> if verbose: print var
>
> python doesn't complain.

That's a massively-debated point. In that specific example, I'd
support the one-liner; however, I'd also recommend this technique:

# for Python 2, you need to start with this line
from __future__ import print_function

if verbose:
    verbiage = print
else:
    def verbiage(*args): pass

Then, instead of "if verbose: print(var)", you would use
"verbiage(var)". Of course, you want something better than "verbiage"
as your name; the nature of your verbose output might give a clue as
to what name would work.

ChrisA

DFS

May 8, 2016, 12:41:13 AM
On 5/7/2016 11:51 PM, Chris Angelico wrote:
> On Sun, May 8, 2016 at 1:28 PM, DFS <nos...@dfs.com> wrote:
>> Invalid constant name "cityzip" (invalid-name)
>> Invalid constant name "state" (invalid-name)
>> Invalid constant name "miles" (invalid-name)
>> Invalid constant name "store" (invalid-name)
>> Invalid variable name "rs" (invalid-name)
>
> ... huh?? The first four seem to have been incorrectly detected as
> constants. How are they used?

The first four are set once and not changed. Probably that's why it
calls it a constant.




> The last one is probably "too short". Or something.

In this case, rs is a pyodbc row object.

rs = cursor.fetchone()



>> standard import "import re, requests" comes before "import pyodbc, sqlite3"
>> (wrong-import-order)
>>
>> * So I switched them, and then it complained about that:
>>
>> standard import "import pyodbc, sqlite3" comes before "import re, requests"
>> (wrong-import-order)
>>
>> -------------------------------------------------------------------------
>>
>> You can't win with pylint...
>
> Probably that means it got confused by the alphabetization - "pyodbc"
> should come before "re" and "requests", but "sqlite3" should come
> after. Either fix the first problem by splitting them onto separate
> lines, or ignore this as a cascaded error.
>
> My general principle is that things on one line should *belong* on one
> line. So having "import re, requests" makes no sense, but I might have
> something like "import os, sys" when the two modules are both used in
> one single line of code and never again. Otherwise, splitting them out
> is the easiest.


I like to put them on a related line. Didn't know where re belonged,
and I don't like putting them on single line each.



>>>> +-------------------------+------------+
>>>> |superfluous-parens |3 | I like to surround 'or'
>>>> statements with parens
>>>
>>>
>>> I would need examples to comment
>>
>>
>>
>> if ("Please choose a state" in str(matches)):
>> if (var == "val" or var2 == "val2"):
>
> Cut the parens. Easy!


Maybe. I actually like my 'or' parens. Habit maybe, because of this
situation:

if (var == "val" or var2 == "val2") and (var3 == val3 or var4 == val4):
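
A quick sketch of where the parentheses matter and where they don't. "and" binds tighter than "or", so the inner groups in a mixed condition change the result, while a single pair wrapped around a whole condition changes nothing. The values below are invented.

```python
a, b, c = True, False, False

# Without parentheses, "and" is evaluated first: a or (b and c)
unparenthesized = a or b and c

# Grouping the "or" changes the meaning
grouped = (a or b) and c

# A pair around the whole condition is the superfluous kind
wrapped = (a or b)
bare = a or b

print(unparenthesized, grouped, wrapped == bare)
```

So the parens in the mixed and/or case are doing real work; pylint's superfluous-parens message is about the outermost pair only.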




>> It says "Used builtin function 'filter'. Using a list comprehension can be
>> clearer. (bad-builtin)"
>
> Kill that message and keep using filter.


Unfortunately, 'bad-builtin' caught 2 truly bad uses of built-ins (zip()
and id()), so I'll leave that warning in.


2.7.11 built-ins:

abs()          divmod()     input()       open()       staticmethod()
all()          enumerate()  int()         ord()        str()
any()          eval()       isinstance()  pow()        sum()
basestring()   execfile()   issubclass()  print()      super()
bin()          file()       iter()        property()   tuple()
bool()         filter()     len()         range()      type()
bytearray()    float()      list()        raw_input()  unichr()
callable()     format()     locals()      reduce()     unicode()
chr()          frozenset()  long()        reload()     vars()
classmethod()  getattr()    map()         repr()       xrange()
cmp()          globals()    max()         reversed()   zip()
compile()      hasattr()    memoryview()  round()      __import__()
complex()      hash()       min()         set()
delattr()      help()       next()        setattr()
dict()         hex()        object()      slice()
dir()          id()         oct()         sorted()


I probably would've used dict as an object name at some point, too.


Chris Angelico

May 8, 2016, 12:56:10 AM
On Sun, May 8, 2016 at 2:40 PM, DFS <nos...@dfs.com> wrote:
>>> It says "Used builtin function 'filter'. Using a list comprehension can
>>> be
>>> clearer. (bad-builtin)"
>>
>>
>> Kill that message and keep using filter.
>
>
>
> Unfortunately, 'bad-builtin' caught 2 truly bad uses of built-ins (zip() and
> id()), so I'll leave that warning in.
>

Hrm, that would be called "shadowing" built-ins, not bad use of them.
Shadowing isn't usually a problem - unless you actually need id(),
there's nothing wrong with using the name id for a database key. Very
different from this message though.
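
A sketch of what shadowing looks like in practice; the function describe and the record layout are invented. Inside the function the name id refers to the local value, and the built-in is untouched everywhere else.

```python
import builtins


def describe(record):
    # "id" here is a local name that shadows the built-in inside this function only
    id = record["id"]
    return "row %d" % id


print(describe({"id": 7}))

# Outside the function, id is still the built-in
print(id is builtins.id)
```

The trouble only starts if the same scope later needs the built-in under its original name.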

ChrisA

Ian Kelly

May 8, 2016, 1:10:14 AM
On Sat, May 7, 2016 at 9:28 PM, DFS <nos...@dfs.com> wrote:
> But I think there are some pylint bugs here:
> -------------------------------------------------------------------------
>
> standard import "import pyodbc, sqlite3" comes before "import pyodbc,
> sqlite3" (wrong-import-order)
>
> * complains that the line comes before itself?

I think that it actually wants you to import sqlite3 (a standard
library module) before pyodbc (a third-party module). The message is
confusing because they happen to be on the same line. PEP8 has some
advice on import ordering, which this probably follows.

>
> -------------------------------------------------------------------------
>
> standard import "import re, requests" comes before "import pyodbc, sqlite3"
> (wrong-import-order)
>
> * So I switched them, and then it complained about that:
>
> standard import "import pyodbc, sqlite3" comes before "import re, requests"
> (wrong-import-order)

Same thing. It wants the re import to come before pyodbc, and it wants
sqlite3 to come before requests.
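
To illustrate the grouping, here is a sketch that sorts the module names from the post into the two PEP 8 groups and prints one import per line. The STDLIB set is written out by hand for these names only; it is an assumption of the example, not how pylint decides.

```python
# Modules from the post that ship with Python (hand-written for this example)
STDLIB = {"os", "sys", "time", "datetime", "sqlite3", "re"}

imported = ["os", "sys", "time", "datetime", "pyodbc", "sqlite3",
            "re", "requests", "lxml"]

stdlib_group = sorted(name for name in imported if name in STDLIB)
third_party_group = sorted(name for name in imported if name not in STDLIB)

# Standard library first, then a blank line, then third-party modules
lines = ["import %s" % name for name in stdlib_group]
lines.append("")
lines.extend("import %s" % name for name in third_party_group)
print("\n".join(lines))
```

In the real file the lxml line would stay "from lxml import html"; the point is only which group each module lands in.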

Jussi Piitulainen

May 8, 2016, 1:50:27 AM
DFS writes:

> The lists I actually use are:
>
> for j in range(len(nms)):
>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>     vals = nms[j],street[j],city[j],state[j],zipcd[j]
>
>
> The enumerated version would be:
>
> ziplists = zip(nms,street,city,state,zipcd)
> for nm,street,city,state,zipcd in ziplists:
>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>     vals = nm,street,city,state,zipcd
>
>
> I guess the enumeration() is a little nicer to look at. Why do you
> think it's more maintainable?

The following variations avoid the naming of the result of zip at all,
and also save a line or two, depending on what you actually do in the
loop, without introducing overly long lines. Judge for yourself.

You don't need to name the individual components, if you only actually
use vals:

for vals in zip(nms,street,city,state,zipcd):
    cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"


Or you can opt to name the tuple and its components the other way
around:

for vals in zip(nms,street,city,state,zipcd):
    nm,street,city,state,zipcd = vals

Steven D'Aprano

May 8, 2016, 7:36:40 AM
On Sun, 8 May 2016 11:16 am, DFS wrote:

> address data is scraped from a website:
>
> names = tree.xpath()
> addr = tree.xpath()

Why are you scraping the data twice?

names = addr = tree.xpath()

or if you prefer the old-fashioned:

names = tree.xpath()
addr = names

but that raises the question, how can you describe the same set of data as
both "names" and "addr[esses]" and have them both be accurate?


> I want to store the data atomically,

I'm not really sure what you mean by "atomically" here. I know what *I* mean
by "atomically", which is to describe an operation which either succeeds
entirely or fails. But I don't know what you mean by it.



> so I parse street, city, state, and
> zip into their own lists.

None of which is atomic.

> "1250 Peachtree Rd, Atlanta, GA 30303
>
> street = [s.split(',')[0] for s in addr]
> city = [c.split(',')[1].strip() for c in addr]
> state = [s[-8:][:2] for s in addr]
> zipcd = [z[-5:] for z in addr]

At this point, instead of iterating over the same list four times, doing the
same thing over and over again, you should do things the old-fashioned way:

streets, cities, states, zipcodes = [], [], [], []
for word in addr:
    items = word.split(',')
    streets.append(items[0])
    cities.append(items[1].strip())
    states.append(word[-8:-6])
    zipcodes.append(word[-5:])


Oh, and use better names. "street" is a single street, not a list of
streets, note plural.



--
Steven

D'Arcy J.M. Cain

May 8, 2016, 8:58:20 AM
On Sun, 8 May 2016 14:21:49 +1000
Chris Angelico <ros...@gmail.com> wrote:
> if verbose:
>     verbiage = print
> else:
>     def verbiage(*args): pass

I have never understood why the def couldn't start on the same line as
the else:

if verbose: verbiage = print
else: def verbiage(*args): pass

The colon effectively starts a block so why not allow it?

By the way, I think you meant "def verbiage(*args, **kws): pass"

> Then, instead of "if verbose: print(var)", you would use
> "verbiage(var)". Of course, you want something better than "verbiage"
> as your name; the nature of your verbose output might give a clue as
> to what name would work.

How about "print"?

if not verbose:
    def print(*args, **kws): pass

--
D'Arcy J.M. Cain
Vybe Networks Inc.
http://www.VybeNetworks.com/
IM:da...@Vex.Net VoIP: sip:da...@VybeNetworks.com

Chris Angelico

May 8, 2016, 9:02:16 AM
On Sun, May 8, 2016 at 10:50 PM, D'Arcy J.M. Cain
<da...@vybenetworks.com> wrote:
> On Sun, 8 May 2016 14:21:49 +1000
> Chris Angelico <ros...@gmail.com> wrote:
>> if verbose:
>>     verbiage = print
>> else:
>>     def verbiage(*args): pass
>
> I have never understood why the def couldn't start on the same line as
> the else:
>
> if verbose: verbiage = print
> else: def verbiage(*args): pass
>
> The colon effectively starts a block so why not allow it?

Having two colons makes it a bit messy, so I can fully accept that
this *shouldn't* be done. Whether or not it's reasonable that it
*can't* be done is a question for the parser; but even if the parser
permitted it, I would expect style guides to advise against it.

> By the way, I think you meant "def verbiage(*args, **kws): pass"

In the general case, yes. But in this specific case, I actually prefer
not to accept keyword args in a null function; maybe permit sep=" "
and end="\n", but if someone sets file or flush, it's probably a
mistake (you most likely don't want verbiage("message", file=logfile)
to silently not do it). YMMV; maybe you want that, so yeah, toss in
the kwargs absorber.
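
A sketch of that middle ground, for Python 3: the null function accepts sep and end but nothing else, so a stray file= argument fails loudly instead of being swallowed. The factory name make_verbiage is invented for the example.

```python
def make_verbiage(verbose):
    """Return print when verbose, otherwise a do-nothing stand-in."""
    if verbose:
        return print

    def verbiage(*args, sep=" ", end="\n"):
        # Accepts the formatting keywords, deliberately rejects file= and flush=
        pass

    return verbiage


quiet = make_verbiage(False)
quiet("hello", "world", sep=", ")    # silently ignored

try:
    quiet("message", file=None)      # probably a mistake, so it should fail
except TypeError:
    rejected = True
else:
    rejected = False

print(rejected)
```

Swapping the signature for (*args, **kwargs) gives the absorb-everything behaviour instead.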

>> Then, instead of "if verbose: print(var)", you would use
>> "verbiage(var)". Of course, you want something better than "verbiage"
>> as your name; the nature of your verbose output might give a clue as
>> to what name would work.
>
> How about "print"?
>
> if not verbose:
>     def print(*args, **kws): pass

The danger of that is that it's too general. I like to recommend a
little thing called "IIDPIO debugging" - If In Doubt, Print It Out.
That means: If you have no idea what a piece of code is doing, slap in
a print() call somewhere. It'll tell you that (a) the code is actually
being executed, and (b) whatever info you put between the parens
(ideally, some key variable or parameter). Part A is often the
important bit :) The trouble with a verbose flag controlling all
print() calls is that IIDPIO debugging suddenly doesn't work; plus,
it's easy to copy and paste code to some other module and not notice
that you don't have a verbosity check at the top, and then wonder why
disabling verbose doesn't fully work. Both problems are solved by
having a dedicated spam function, which will simply error out if you
didn't set it up properly.

But again, if you know what you're doing, go for it! This is exactly
why print became a function in Py3 - so that you *can* override it.

ChrisA

Peter Otten

May 8, 2016, 10:16:58 AM
Chris Angelico wrote:

> On Sun, May 8, 2016 at 1:28 PM, DFS <nos...@dfs.com> wrote:
>> Invalid constant name "cityzip" (invalid-name)
>> Invalid constant name "state" (invalid-name)
>> Invalid constant name "miles" (invalid-name)
>> Invalid constant name "store" (invalid-name)
>> Invalid variable name "rs" (invalid-name)
>
> ... huh?? The first four seem to have been incorrectly detected as
> constants. How are they used?

As globals. pylint doesn't like it when you put your normal code outside a
function, and I agree with the general idea. The problem is that it's not
smart enough to recognize the exceptions like

plus_one = make_adder(1)

where plus_one obeys the naming convention for a function, but the linter
asks you to change the line to

PLUS_ONE = make_adder(1)

In

Row = collections.namedtuple("Row", "alpha beta")

though pylint does recognize that collections.namedtuple() produces a class,
so there might also be a way to teach it how to handle the custom factory.

The OP should of course put the whole shebang into a main() function. That
also makes it easier to determine which values should be passed as arguments
when he breaks his blob into smaller parts.
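
A small sketch of that layout, assuming pylint's default naming rules (module-level assignments are checked as constants, locals as ordinary variables). make_adder is the hypothetical factory from the example above, and miles stands in for one of the OP's globals.

```python
def make_adder(n):
    """Factory that returns a function adding n to its argument."""
    def adder(x):
        return x + n
    return adder

def main():
    # Inside a function these are ordinary locals, so lower-case
    # names follow the usual convention.
    plus_one = make_adder(1)
    miles = 25
    return plus_one(miles)

if __name__ == "__main__":
    print(main())
```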

Peter Otten

May 8, 2016, 10:23:00 AM
DFS wrote:

> On 5/7/2016 2:52 PM, Christopher Reimer wrote:
>> On 5/7/2016 9:51 AM, DFS wrote:
>>> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
>>> program?
>>
>> I routinely get 10/10 for my code. While pylint isn't perfect and
>> idiosyncratic at times, it's a useful tool to help break bad programming
>> habits. Since I came from a Java background, I had to unlearn everything
>> from Java before I could write Pythonic code. It might help to use an
>> IDE that offers PEP8-compliant code suggestions (I use PyCharm IDE).
>>
>>> That's about as good as it's gonna get!
>>
>> You can do better.
>
> 10/10 on pylint isn't better.

Not always, but where you and pylint disagree I'm more likely to side with
the tool ;)

> It's being robotic and conforming to the
> opinions of the author of that app.

The problem is the tool's limitations, the "being robotic", rather than
following someone else's opinions.

> In fact, I think:
>
> import os, sys, time, socket
>
> is much more readable than, and preferable to,
>
> import os
> import sys
> import time
> import socket
>
> but pylint complains about the former.

Do you use version control?

>> You should strive for 10/10 whenever possible,
>
> nah
>
>
>> figure out why you fall short and ask for help on the parts that don't
>> make sense.
>
> I actually agree with ~3/4 of the suggestions it makes. My code ran
> fine before pylint tore it a new one, and it doesn't appear to run any
> better after making various fixes.

Do you write unit tests?


DFS

May 8, 2016, 10:26:20 AM
On 5/8/2016 1:50 AM, Jussi Piitulainen wrote:
> DFS writes:
>
>> The lists I actually use are:
>>
>> for j in range(len(nms)):
>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>> vals = nms[j],street[j],city[j],state[j],zipcd[j]
>>
>>
>> The enumerated version would be:
>>
>> ziplists = zip(nms,street,city,state,zipcd)
>> for nm,street,city,state,zipcd in ziplists:
>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>> vals = nm,street,city,state,zipcd
>>
>>
>> I guess the enumeration() is a little nicer to look at. Why do you
>> think it's more maintainable?
>
> The following variations avoid the naming of the result of zip at all,
> and also save a line or two, depending on what you actually do in the
> loop, without introducing overly long lines. Judge for yourself.


I tried:

for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):

but felt it was too long and wordy.



> You don't need to name the individual components, if you only actually
> use vals:
>
> for vals in zip(nms,street,city,state,zipcd):
> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"

I like that one. But I do one more thing (get a category ID) than just
use the vals.

--------------------------------------------------------------------
ziplists = zip(categories,names,streets,cities,states,zipcodes)
for category,name,street,city,state,zipcode in ziplists:
    dupeRow, pyodbcErr = False, False
    catID = getDataID("catID","CATEGORIES","catDesc",category)
    cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?,?,?,?,?)"
    vals = datasrcID,searchID,catID,name,street,city,state,zipcode,str(loaddt)
    try: db.execute(cSQL, vals)
    except (pyodbc.Error) as programError:
        if str(programError).find("UNIQUE constraint failed") > 0:
            dupeRow = True
            dupes +=1
            print " * duplicate address found: "+name+", "+street
        else:
            pyodbcErr = True
            print "ODBC error: %s " % programError
    addrReturned += 1
    if not dupeRow and not pyodbcErr:
        addrSaved += 1
    if addrWant != "all":
        if addrSaved >= addrWant: break
conn.commit()
--------------------------------------------------------------------

That's the 'post to db' routine


> Or you can opt to name the tuple and its components the other way
> around:
>
> for vals in zip(nms,street,city,state,zipcd):
> nm,street,city,state,zipcd = vals
> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"


I like the first one better. python is awesome, but too many options
for doing the same thing also makes it difficult. For me, anyway.

Thanks


DFS

May 8, 2016, 10:26:46 AM
On 5/7/2016 11:46 PM, Stephen Hansen wrote:
> On Sat, May 7, 2016, at 08:04 PM, DFS wrote:
>> The lists I actually use are:
>>
>> for j in range(len(nms)):
>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>> vals = nms[j],street[j],city[j],state[j],zipcd[j]
>>
>>
>> The enumerated version would be:
>>
>> ziplists = zip(nms,street,city,state,zipcd)
>> for nm,street,city,state,zipcd in ziplists:
>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>> vals = nm,street,city,state,zipcd
>>
>>
>> I guess the enumeration() is a little nicer to look at. Why do you
>> think it's more maintainable?
>
> Code is read more than it's written.
>
> That which is nicer to look at, therefore, is easier to read.
>
> That which is easier to read is easier to maintain.
>
> Beyond that, it's simpler, and more clearly articulates in the local
> space what's going on.


That last one sounds like an art critic trying to explain why Jackson
Pollock's work doesn't suck.



>> Aside: I haven't tried, but is 'names' a bad idea or illegal for the
>> name of a python list or variable?
>
> Nothing wrong with names. Or 'name', for that matter. Try to avoid
> abbreviations.

np





Chris Angelico

May 8, 2016, 10:36:20 AM
On Mon, May 9, 2016 at 12:25 AM, DFS <nos...@dfs.com> wrote:
> for category,name,street,city,state,zipcode in ziplists:
> try: db.execute(cSQL, vals)
> except (pyodbc.Error) as programError:
> if str(programError).find("UNIQUE constraint failed") > 0:
> dupeRow = True
> dupes +=1
> print " * duplicate address found: "+name+", "+street
> else:
> pyodbcErr = True
> print "ODBC error: %s " % programError
> conn.commit()
> --------------------------------------------------------------------
>

... and then you just commit???!?

ChrisA

DFS

May 8, 2016, 11:06:37 AM
That's what commit() does.




Stephen Hansen

May 8, 2016, 11:11:57 AM
On Sun, May 8, 2016, at 07:25 AM, DFS wrote:
> for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):

> > for vals in zip(nms,street,city,state,zipcd):
> > nm,street,city,state,zipcd = vals
> > cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>
>
> I like the first one better. python is awesome, but too many options
> for doing the same thing also makes it difficult. For me, anyway.

Eeh, now you're just making trouble for yourself.

for name, street, city, state, zipcd in zip(names, streets, cities,
                                            states, zipcds):
    ....

may be sorta vaguely long, but it's not that long. Just do it and move
on. Get over whatever makes you not like it.

Stephen Hansen

May 8, 2016, 11:15:51 AM
On Sun, May 8, 2016, at 08:06 AM, DFS wrote:
> On 5/8/2016 10:36 AM, Chris Angelico wrote:
> > ... and then you just commit???!?
> >
>
> That's what commit() does.
>

I assure you, he knows what commit does :)

The point is, you don't usually commit after an error happens. You
rollback. Or correct the data. Since the data didn't go in, there should
(in theory) be nothing TO commit if an error happens. Or, there should
be partial data in that needs a rollback before you decide to do
something else.
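
A minimal sketch of the commit-or-rollback pattern described here. The thread's code uses pyodbc; the standard-library sqlite3 module is swapped in only so the example runs on its own, and the table and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (name TEXT, street TEXT, UNIQUE (name, street))"
)
conn.commit()

# The third row duplicates the first and violates the UNIQUE constraint.
rows = [("Ann", "1 Main St"), ("Bob", "2 Oak Ave"), ("Ann", "1 Main St")]

try:
    conn.executemany("INSERT INTO addresses VALUES (?, ?)", rows)
    conn.commit()        # reached only if every insert succeeded
except sqlite3.IntegrityError:
    conn.rollback()      # discard the partial batch instead of committing it

saved = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
```

Whether a partial batch should be kept is a requirements question; the sketch only shows the all-or-nothing variant.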

Chris Angelico

May 8, 2016, 11:15:52 AM
Yes. Even if you got an error part way through, you just blithely commit. What?!

And yes, I am flat-out boggling at this.

ChrisA

Steven D'Aprano

May 8, 2016, 11:51:39 AM
On Mon, 9 May 2016 12:25 am, DFS wrote:

>>> for j in range(len(nms)):
>>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>>> vals = nms[j],street[j],city[j],state[j],zipcd[j]

Why are you assigning cSQL to the same string over and over again?

Sure, assignments are cheap, but they're not infinitely cheap. They still
have a cost. Instead of paying that cost once, you pay it over and over
again, which adds up.

Worse, it is misleading. I had to read that code snippet three or four times
before I realised that cSQL was exactly the same each time.


> I tried:
>
> for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):
>
> but felt it was too long and wordy.

It's long and wordy because you're doing something long and wordy. It is
*inherently* long and wordy to process five things, whether you write it
as:

for i in range(len(names)):
    name = names[i]
    street = streets[i]
    city = cities[i]
    state = states[i]
    zipcode = zipcodes[i]
    process(...)

or as:

for name, street, city, state, zipcode in zip(
        names, streets, cities, states, zipcodes
        ):
    process(...)



> I like the first one better. python is awesome, but too many options
> for doing the same thing also makes it difficult. For me, anyway.


That's the difference between a master and an apprentice. The apprentice
likes to follow fixed steps the same way each time. The master craftsman
knows her tools backwards, and can choose the right tool for the job, and
when the choice of tool really doesn't matter and you can use whatever
happens to be the closest to hand.




--
Steven

Steven D'Aprano

May 8, 2016, 1:25:28 PM
On Sun, 8 May 2016 02:10 pm, DFS wrote:

> I mean I always use tab after :
>
> The program won't run otherwise. If I use spaces, 100% of the time it
> throws:
>
> IndentationError: unindent does not match any outer indentation level

Then you should be more careful about your spaces. If you indent by four
spaces, you have to outdent by four -- not three, not five, but four.

The best way to do this is to use an editor that will count the spaces for
you. Any decent programmer's editor will allow you to set the TAB key to
indent by X spaces, and the Shift-TAB key to dedent by the same amount. If
you're counting spaces yourself, you're just making more work for yourself.

Or use tabs -- that's acceptable as well.

Just don't mix tabs and spaces in the same file.


>>> +-------------------------+------------+
>>> |bad-whitespace |65 | mostly because I line up =
>>> signs:
>>> var1 = value
>>> var10 = value
>>
>> Yuck. How much time do you waste aligning assignments whenever you add or
>> delete or edit a variable?
>
> Lots. It takes hours to add or delete 3 whitespaces.

Yes, you're right. It takes you five minutes to line everything up the first
time. Then you change the name of a variable, and now you have to realign
everything -- that's an extra minute gone. Then you add another line, and
have to realign again, another couple of minutes. Over the lifespan of the
program, you'll probably have spent multiple hours wasting time realigning
blocks of assignments.


>>> +-------------------------+------------+
>>> |trailing-whitespace |59 | heh!
>>> +-------------------------+------------+
>>> |multiple-statements |23 | do this to save lines.
>>> Will continue doing it.
>>
>> Why? Do you think that there's a world shortage of newline characters? Is
>> the Enter key on your keyboard broken?
>
> I do it because I like it.
>
> if verbose: print var
>
> python doesn't complain.

Hmmm. Well, that's not too bad. I thought you meant something like:

addr = getaddress(key); addr[2] = addr.upper(); print addr

which is just horrible.




--
Steven

Larry Hudson

May 8, 2016, 4:45:18 PM
On 05/08/2016 06:01 AM, Chris Angelico wrote:
[snip...]
> ... I like to recommend a
> little thing called "IIDPIO debugging" - If In Doubt, Print It Out.
> That means: If you have no idea what a piece of code is doing, slap in
> a print() call somewhere. It'll tell you that (a) the code is actually
> being executed, and (b) whatever info you put between the parens
> (ideally, some key variable or parameter)...

My personal variation of IIDPIO debugging is to use input() instead of print(). For example:

input('x = {}, y = {} --> '.format(x, y))

Then the program stops at this point so you can examine the values. <Enter> will continue the
program or ^C will abort (if you see what the problem is now). Of course this can't be used in
all situations, but it's handy where it can.

Note that my personal preference is to stick that "-->" as a prompt at the end, but obviously
this (or a similar marker) is optional.
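
The same idea can be wrapped in a small helper; this is only a sketch, and the name pause and its prompt_fn parameter are invented here so the function can also be exercised without a keyboard.

```python
def pause(prompt_fn=input, **values):
    """Show the given values as a prompt and wait for Enter before continuing."""
    shown = ", ".join("{} = {!r}".format(name, value)
                      for name, value in values.items())
    prompt_fn(shown + " --> ")
    return shown

# Typical use inside the code being debugged:
#     pause(x=x, y=y)
```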

Dan Sommers

May 8, 2016, 4:53:34 PM
On Sun, 08 May 2016 23:01:55 +1000, Chris Angelico wrote:

> ... I like to recommend a little thing called "IIDPIO debugging" - If
> In Doubt, Print It Out. That means: If you have no idea what a piece
> of code is doing, slap in a print() call somewhere. It'll tell you
> that (a) the code is actually being executed, and (b) whatever info
> you put between the parens (ideally, some key variable or
> parameter). Part A is often the important bit :) ...

Having spent a long time developing embedded systems, I wholeheartedly
agree. In spirit. Isn't that what the logging module is for? Fine
grained control, as centralized or distributed as is warranted, over
program output?

> ... The trouble with a verbose flag controlling all print() calls is
> that IIDPIO debugging suddenly doesn't work; plus, it's easy to copy
> and paste code to some other module and not notice that you don't have
> a verbosity check at the top, and then wonder why disabling verbose
> doesn't fully work. Both problems are solved by having a dedicated
> spam function, which will simply error out if you didn't set it up
> properly.

Hey! That sounds just like the logging module.... ;-)
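
A minimal sketch of what that could look like with the standard logging module. The logger name "addresses" and the load function are assumptions for illustration, not part of the OP's program.

```python
import logging

logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("addresses")
log.setLevel(logging.INFO)   # switch to logging.DEBUG for the chatty output

def load(rows):
    """Pretend to store rows, reporting progress through the logger."""
    for row in rows:
        log.debug("inserting %r", row)             # hidden at INFO level
    log.info("processed %d rows", len(rows))
    return len(rows)

count = load([("Ann", "1 Main St"), ("Bob", "2 Oak Ave")])
```

The debug calls can stay in the code permanently; only the level changes.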

Dan

DFS

May 8, 2016, 5:04:23 PM
On 5/8/2016 11:51 AM, Steven D'Aprano wrote:
> On Mon, 9 May 2016 12:25 am, DFS wrote:
>
>>>> for j in range(len(nms)):
>>>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>>>> vals = nms[j],street[j],city[j],state[j],zipcd[j]
>
> Why are you assigning cSQL to the same string over and over again?

I like it in close proximity to the vals statement.


> Sure, assignments are cheap, but they're not infinitely cheap. They still
> have a cost. Instead of paying that cost once, you pay it over and over
> again, which adds up.

Adds up to what?



> Worse, it is misleading. I had to read that code snippet three or four times
> before I realised that cSQL was exactly the same each time.

You had to read 5 words three or four times? Seriously?



>> I tried:
>>
>> for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):
>>
>> but felt it was too long and wordy.
>
> It's long and wordy because you're doing something long and wordy. It is
> *inherently* long and wordy to process five things, whether you write it
> as:
>
> for i in range(len(names)):
> name = names[i]
> street = streets[i]
> city = cities[i]
> state = states[i]
> zipcode = zipcodes[i]
> process(...)
>
> or as:
>
> for name, street, city, state, zipcode in zip(
> names, streets, cities, states, zipcodes
> ):
> process(...)


I like mine best of all:

ziplists = zip(names,streets,cities,states,zipcodes)
for name,street,city,state,zipcode in ziplists:



>> I like the first one better. python is awesome, but too many options
>> for doing the same thing also makes it difficult. For me, anyway.
>
>
> That's the difference between a master and an apprentice. The apprentice
> likes to follow fixed steps the same way each time. The master craftsman
> knows her tools backwards, and can choose the right tool for the job, and
> when the choice of tool really doesn't matter and you can use whatever
> happens to be the closest to hand.

"her tools"... you're a woman?



DFS

May 8, 2016, 5:05:42 PM
On 5/7/2016 2:43 PM, Peter Pearson wrote:
> On Sat, 7 May 2016 12:51:00 -0400, DFS <nos...@dfs.com> wrote:
>> This more-anal-than-me program generated almost 2 warnings for every
>> line of code in my program. w t hey?
>
> Thank you for putting a sample of pylint output in front of my eyes;
> you inspired me to install pylint and try it out. If it teaches me even
> half as much as it's teaching you, I'll consider it a great blessing.

Cool.

I don't agree with some of them, but there's no doubt adhering to them
will result in more well-formed code.


DFS

May 8, 2016, 5:06:54 PM
I'm boggling that you're boggling.




DFS

May 8, 2016, 5:17:17 PM
On 5/8/2016 1:25 PM, Steven D'Aprano wrote:
> On Sun, 8 May 2016 02:10 pm, DFS wrote:


>>>> +-------------------------+------------+
>>>> |bad-whitespace |65 | mostly because I line up =
>>>> signs:
>>>> var1 = value
>>>> var10 = value
>>>
>>> Yuck. How much time do you waste aligning assignments whenever you add or
>>> delete or edit a variable?
>>
>> Lots. It takes hours to add or delete 3 whitespaces.
>
> Yes, you're right. It takes you five minutes to line everything up the first
> time. Then you change the name of a variable, and now you have to realign
> everything -- that's an extra minute gone. Then you add another line, and
> have to realign again, another couple of minutes. Over the lifespan of the
> program, you'll probably have spent multiple hours wasting time realigning
> blocks of assignments.


Do you actually believe what you just wrote?

If yes, you should quit programming.

If not, why did you say it?







>>>> +-------------------------+------------+
>>>> |trailing-whitespace |59 | heh!
>>>> +-------------------------+------------+
>>>> |multiple-statements |23 | do this to save lines.
>>>> Will continue doing it.
>>>
>>> Why? Do you think that there's a world shortage of newline characters? Is
>>> the Enter key on your keyboard broken?
>>
>> I do it because I like it.
>>
>> if verbose: print var
>>
>> python doesn't complain.
>
> Hmmm. Well, that's not too bad. I thought you meant something like:
>
> addr = getaddress(key); addr[2] = addr.upper(); print addr
>
> which is just horrible.


I was surprised to see the PEP8 guide approve of:

"Yes: if x == 4: print x, y; x, y = y, x"

https://www.python.org/dev/peps/pep-0008/#pet-peeves







DFS

May 8, 2016, 5:24:30 PM
On 5/8/2016 7:36 AM, Steven D'Aprano wrote:
> On Sun, 8 May 2016 11:16 am, DFS wrote:
>
>> address data is scraped from a website:
>>
>> names = tree.xpath()
>> addr = tree.xpath()
>
> Why are you scraping the data twice?


Because it exists in 2 different sections of the document.

names = tree.xpath('//span[@class="header_text3"]/text()')
addresses = tree.xpath('//span[@class="text3"]/text()')


I thought you were a "master who knew her tools", and I was the
apprentice?

So why did "the master" think xpath() was magic?






> names = addr = tree.xpath()
>
> or if you prefer the old-fashioned:
>
> names = tree.xpath()
> addr = names
>
> but that raises the question, how can you describe the same set of data as
> both "names" and "addr[esses]" and have them both be accurate?
>
>
>> I want to store the data atomically,
>
> I'm not really sure what you mean by "atomically" here. I know what *I* mean
> by "atomically", which is to describe an operation which either succeeds
> entirely or fails.

That's atomicity.



> But I don't know what you mean by it.

http://www.databasedesign-resource.com/atomic-database-values.html



>> so I parse street, city, state, and
>> zip into their own lists.
>
> None of which is atomic.

All of which are atomic.



>> "1250 Peachtree Rd, Atlanta, GA 30303
>>
>> street = [s.split(',')[0] for s in addr]
>> city = [c.split(',')[1].strip() for c in addr]
>> state = [s[-8:][:2] for s in addr]
>> zipcd = [z[-5:] for z in addr]
>
> At this point, instead of iterating over the same list four times, doing the
> same thing over and over again, you should do things the old-fashioned way:
>
> streets, cities, states, zipcodes = [], [], [], []
> for word in addr:
>     items = word.split(',')
>     streets.append(items[0])
>     cities.append(items[1].strip())
>     states.append(word[-8:-6])
>     zipcodes.append(word[-5:])



That's a good one.

Chris Angelico mentioned something like that, too, and I already put it
in place.
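
For reference, a single-pass version can be sketched as follows (Python 3 syntax). The helper name split_address is hypothetical, and the slicing assumes every address ends in a two-letter state, a space and a five-digit zip, as in the sample address quoted above.

```python
def split_address(line):
    """Split 'street, city, ST 12345' into its four parts."""
    parts = line.split(",")
    street = parts[0].strip()
    city = parts[1].strip()
    state = line[-8:-6]     # same result as line[-8:][:2]
    zipcode = line[-5:]
    return street, city, state, zipcode

addr = ["1250 Peachtree Rd, Atlanta, GA 30303"]

streets, cities, states, zipcodes = [], [], [], []
for line in addr:
    street, city, state, zipcode = split_address(line)
    streets.append(street)
    cities.append(city)
    states.append(state)
    zipcodes.append(zipcode)
```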



> Oh, and use better names. "street" is a single street, not a list of
> streets, note plural.


I'll use whatever names I like.





Stephen Hansen

May 8, 2016, 5:38:43 PM
On Sun, May 8, 2016, at 02:16 PM, DFS wrote:
> I was surprised to see the PEP8 guide approve of:
>
> "Yes: if x == 4: print x, y; x, y = y, x"
>
> https://www.python.org/dev/peps/pep-0008/#pet-peeves

That is not approving of that line of code as something to mimic; it's
speaking *only* about *whitespace*.

ALL it's saying is, "don't put spaces before commas, colons or
semicolons". You can infer nothing else about it.

Joel Goldstick

May 8, 2016, 5:39:58 PM

Starting to look like trolling. Lots of good advice here. If you
ask, and don't like the advice, don't use it.
--
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays

DFS

May 8, 2016, 5:46:47 PM
I can infer that it's 100% approved of, since he used it as an example.




Stephen Hansen

May 8, 2016, 6:05:40 PM
Not if you don't want to be a fool.

And at this point I'm signing out from helping you.

Chris Angelico

May 8, 2016, 6:07:40 PM
Neat technique. Not something to use *every* time (and not always
sensible - eg you don't normally want to stall out a GUI thread), but
definitely worth keeping in the arsenal.

ChrisA

Chris Angelico

May 8, 2016, 6:11:00 PM
Absolutely. I say "print" in IIDPIO because it's a word that people
understand across languages, across frameworks, etc, etc, but when you
start doing more of it, the logging module is definitely superior - if
you need just one reason to use it, it would be to *leave those prints
in place* so the next person doesn't need to reach for IIDPIO at all.

(Also, I teach print() because it's one less module to explain. But
experienced programmers should get some familiarity with it.)

ChrisA

DFS

May 8, 2016, 6:24:59 PM
On 5/8/2016 6:05 PM, Stephen Hansen wrote:
> On Sun, May 8, 2016, at 02:46 PM, DFS wrote:
>> On 5/8/2016 5:38 PM, Stephen Hansen wrote:
>>> On Sun, May 8, 2016, at 02:16 PM, DFS wrote:
>>>> I was surprised to see the PEP8 guide approve of:
>>>>
>>>> "Yes: if x == 4: print x, y; x, y = y, x"
>>>>
>>>> https://www.python.org/dev/peps/pep-0008/#pet-peeves
>>>
>>> That is not approving of that line of code as something to mimic; it's
>>> speaking *only* about *whitespace*.
>>>
>>> ALL it's saying is, "don't put spaces before commas, colons or
>>> semicolons". You can infer nothing else about it.
>>
>> I can infer that it's 100% approved of, since he used it as an example.
>
> Not if you don't want to be a fool.

Why would you label the author of that style guide - Guido van Rossum -
a fool?




> And at this point I'm signing out from helping you.


A fool and a fool are soon parted.




Gregory Ewing

May 8, 2016, 9:18:00 PM
Stephen Hansen wrote:
> The point is, you don't usually commit after an error happens. You
> rollback.

He might want to commit the ones that *did* go in.
That's not necessarily wrong. It all depends on the
surrounding requirements and workflow.

--
Greg

Larry Hudson

May 8, 2016, 9:28:59 PM
Agreed. As I said in my post, it is certainly not a universally valid approach, but I do find
it useful in many cases.

-=- Larry -=-

Chris Angelico

May 8, 2016, 10:18:36 PM
If that's the case, they should probably be being committed part way.
Generally, once the transaction state is in error, further operations
can't be done. (At least, that's how it is with *good* database
backends. I don't know what MySQL does.)

ChrisA

DFS

May 8, 2016, 10:59:04 PM
Bingo.


Steven D'Aprano

May 8, 2016, 11:09:18 PM
On Mon, 9 May 2016 07:04 am, DFS wrote:

> On 5/8/2016 11:51 AM, Steven D'Aprano wrote:
>> On Mon, 9 May 2016 12:25 am, DFS wrote:
>>
>>>>> for j in range(len(nms)):
>>>>> cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>>>>> vals = nms[j],street[j],city[j],state[j],zipcd[j]
>>
>> Why are you assigning cSQL to the same string over and over again?
>
> I like it in close proximity to the vals statement.

The line immediately above the loop is in close proximity. Is that not close
enough?


>> Sure, assignments are cheap, but they're not infinitely cheap. They still
>> have a cost. Instead of paying that cost once, you pay it over and over
>> again, which adds up.
>
> Adds up to what?

Potentially a significant waste of time. You know what they say about
financial waste: "a million here, a million there, and soon we're talking
about real money".

The first point is that this is a micro-pessimisation: even though it
doesn't cost much individually, it's still a needless expense that your
program keeps paying. It's like friction on your code.

Python is neither the fastest nor the slowest language available. It is
often "fast enough", but it is also an easy language to write slow code in.
If you were programming in C, the compiler would almost surely see that the
assignment was to a constant, and automatically and silently hoist it
outside the loop. That's one reason why C is so fast: it aggressively
optimizes your code. (Too aggressively, in my opinion, but that's another
story.) But the Python compiler isn't that sophisticated. It's up to us,
the programmers, to be mindful of the friction we add to our code, because
if we aren't mindful of it, we can easily end up with needlessly slow code.

But as I said, that's not the major problem with doing the assignment in the
loop. It's more about readability and your reader's expectations than the
extra time it costs.
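
As a sketch of the alternative (Python 3 syntax): the statement text is bound once, on the line directly above the loop, so a reader can see at a glance that it never changes. INSERT_SQL and build_statements are names made up for this illustration.

```python
INSERT_SQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"

def build_statements(names, streets, cities, states, zipcodes):
    """Pair the one constant SQL string with each row of values."""
    statements = []
    for vals in zip(names, streets, cities, states, zipcodes):
        statements.append((INSERT_SQL, vals))   # only vals varies per iteration
    return statements

pending = build_statements(
    ["Ann"], ["1 Main St"], ["Atlanta"], ["GA"], ["30303"]
)
```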


>> Worse, it is misleading. I had to read that code snippet three or four
>> times before I realised that cSQL was exactly the same each time.
>
> You had to read 5 words three or four times? Seriously?

Yes, seriously, because when most people read code they skim it, speed
reading, looking for salient points of interest. They don't point their
finger under each word and read it aloud syllable by syllable like a
pre-schooler with reading difficulties. (At least I don't, I can't speak
for others.) So the first couple of times I glanced at it, it just looked
like any other assignment without the details registering.

Then the next couple of times I thought that it must be *me* making the
mistake, I must be reading it wrong. Maybe there's something I missed? I
read code with the default assumption that it is more or less sensible.

Over the various posts and replies to posts, I had probably glanced at that
line a dozen times, and then read it more carefully three or four times,
before I was sure I had read what I thought I had read.


[...]
> I like mine best of all:
>
> ziplists = zip(names,streets,cities,states,zipcodes)
> for name,street,city,state,zipcode in ziplists:

Using a temporary, single-use variable is okay, but it's often unnecessary.

But be aware that there is a particular risk with loops. You may be tempted
to think you can re-use ziplists to iterate over it twice:

data = zip(names, streets, cities, states, zipcodes)
for a, b, c, d, e in data:
    do_stuff()
...
# later
for v, w, x, y, z in data:
    do_something_else()

and that's perfectly fine, *but* there is a risk that if the for loop data
being iterated over is an iterator, it will have been exhausted by the
first loop and the second loop won't run at all.

In Python 2, zip() returns a list, but in Python 3, it returns an iterator.
So beware of using such temp variables unless you know what you're doing.
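
A short Python 3 demonstration of that exhaustion, with made-up sample data. Wrapping the call in list() is one way to get something that can be looped over more than once.

```python
names = ["Ann", "Bob"]
streets = ["1 Main St", "2 Oak Ave"]

data = zip(names, streets)                  # an iterator in Python 3
first_pass = [pair for pair in data]
second_pass = [pair for pair in data]       # nothing left to iterate over

reusable = list(zip(names, streets))        # a real list can be reused
third_pass = [pair for pair in reusable]
fourth_pass = [pair for pair in reusable]
```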


>>> I like the first one better. python is awesome, but too many options
>>> for doing the same thing also makes it difficult. For me, anyway.
>>
>>
>> That's the difference between a master and an apprentice. The apprentice
>> likes to follow fixed steps the same way each time. The master craftsman
>> knows her tools backwards, and can choose the right tool for the job, and
>> when the choice of tool really doesn't matter and you can use whatever
>> happens to be the closest to hand.
>
> "her tools"... you're a woman?

What makes you think I was talking about myself? I was talking about people
in general -- when we are beginners, it's only natural that (like most
beginners) we're more comfortable with a limited amount of choice. It's hard
to remember what option to use when there's only one, let alone when
there's ten. But as we progress to mastery of the language, we'll come to
understand the subtle differences between options, when they matter, and
when they don't.

If I had said "his tools", would you have thought I was talking specifically
about a man?

Does it matter if I'm a woman? Would that make my advice better or worse?




--
Steven

Steven D'Aprano

May 8, 2016, 11:46:39 PM
On Mon, 9 May 2016 07:24 am, DFS wrote:

> On 5/8/2016 7:36 AM, Steven D'Aprano wrote:
>> On Sun, 8 May 2016 11:16 am, DFS wrote:
>>
>>> address data is scraped from a website:
>>>
>>> names = tree.xpath()
>>> addr = tree.xpath()
>>
>> Why are you scraping the data twice?
>
>
> Because it exists in 2 different sections of the document.
>
> names = tree.xpath('//span[@class="header_text3"]/text()')
> addresses = tree.xpath('//span[@class="text3"]/text()')

How was I supposed to know that you were providing two different arguments
to the method? It looked like a pure-function call, which should always
return the same thing. Communication errors are on the sender, not the
receiver.

You didn't say what tree was, so I judged that xpath was some argument-less
method of tree that returned some attribute, sufficiently cleaned up.

It would be more obvious to pass a placeholder argument:

names = tree.xpath(this)
addr = tree.xpath(that)



>>> I want to store the data atomically,
>>
>> I'm not really sure what you mean by "atomically" here. I know what *I*
>> mean by "atomically", which is to describe an operation which either
>> succeeds entirely or fails.
>
> That's atomicity.

Right. And that doesn't apply to the portion of your code we're discussing.
There's no storage involved. The list comps do either succeed entirely or
fail, but until you actually get to writing to the database, there's nothing
that "store the data atomically" would apply to that I saw.


[...]
>>> so I parse street, city, state, and
>>> zip into their own lists.
>>
>> None of which is atomic.
>
> All of which are atomic.

Sorry, my poor choice of words. None of which are atomic *storage*.



>> Oh, and use better names. "street" is a single street, not a list of
>> streets, note plural.
>
> I'll use whatever names I like.

*shrug*

It's your code, you can name all your variables after "Mr. Meeseeks" if you
want.

for meeseeks, Meeseeks, meeSeeks, MEEseeks in zip(
        MESEEKS, meeeseeks, MeesEeks, MEEsEEkS):
    mEEsEeKSS.append(meeSeeks)
    ...


Just don't ask others to read it.



--
Steven

DFS

May 10, 2016, 6:36:57 PM5/10/16
On 5/7/2016 10:50 PM, Chris Angelico wrote:
> On Sun, May 8, 2016 at 12:15 PM, DFS <nos...@dfs.com> wrote:

>> The only reason
>>
>>     for j in range(len(list1)):
>>         do something with list1[j], list2[j], list3[j], etc.
>>
>> or
>>
>>     for item1, item2, item3 in zip(list1, list2, list3):
>>         do something with the items
>>
>> works is because each list has the same number of items.
>
> Sure, but who cares what each item's position is? All that matters is
> that they have corresponding positions, which is what zip() does.

They have corresponding positions because zip() possibly truncates data!



> Imagine you don't even have the whole lists yet. Imagine someone's
> still writing stuff to them as you work. Maybe they're infinite in
> length. You can't iterate up to the length of list1, because it
> doesn't HAVE a length. But you can still zip it up with other parallel
> collections, and iterate over them all.

Disregarding a list of infinite length.


If lists are still being created:

* at every moment in time, len(list1) returns a length that doesn't
change even if data is added to the list after the call to len().

Example: If the list has 100 items in it at the point len(list1) is called:

for i in range(len(list1)):

will never iterate more than 100x, no matter how large list1 grows to.

Caveat: since list1 may be bigger or smaller than the other lists at
that moment in time, an error may occur when using list2[i], list3[i].
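
A small sketch of both points, using made-up lists of 100 and 97 items:
range(len(list1)) is fixed when the loop starts, and indexing a shorter
parallel list with the same i raises IndexError.

```python
list1 = list(range(100))
list2 = list(range(97))  # shorter than list1

iterations = 0
for i in range(len(list1)):  # len() is evaluated once, right here
    list1.append(i)          # growing list1 does not extend the loop
    iterations += 1

print(iterations, len(list1))  # 100 200

# The caveat: the same index run against a shorter list fails.
failed_at = None
try:
    for i in range(len(list1)):
        _ = list2[i]
except IndexError:
    failed_at = i

print(failed_at)  # 97
```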


Is that all correct as you understand it?



* at every moment in time, zip(list1, list2, etc) will return a
fixed-length list of tuples, which doesn't change even if data is added
to any of the lists after the call to zip().

Example: if the lists have 100, 97 and 102 items in them at the point
zip(lists) is called:

for item1, item2, item3 in zip(list1, list2, list3):

will never iterate beyond 97x, even if the lists grow while the
enumeration is occurring.

Caveat: since zip() possibly truncates lists, the results - the usage of
the data - could be completely invalid.
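
The truncation itself is easy to see with made-up lists of those sizes.
This sketch only shows the length behaviour. itertools.zip_longest is the
standard-library alternative that pads the short lists instead of cutting
the long ones.

```python
from itertools import zip_longest

list1 = list(range(100))
list2 = list(range(97))
list3 = list(range(102))

zipped = list(zip(list1, list2, list3))          # stops at the shortest
padded = list(zip_longest(list1, list2, list3))  # pads with None instead

print(len(zipped))  # 97
print(len(padded))  # 102
print(padded[-1])   # (None, None, 101)
```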


Is that all correct as you understand it?


So, if that's all correct, it doesn't matter whether you use 'for i in
range(len(lists))' or 'for item in zip(lists)': neither will guarantee
data integrity. I haven't timed them, but as I see it, neither has a
definite advantage over the other.

So what I decided to do was build the lists, check that the lengths are
all the same (exit the program if not), zip() them, and use enumeration
because it looks a little less clunky. I see no advantage beyond
appearance.
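
That approach might look roughly like the sketch below. The helper name
zip_checked and the sample data are hypothetical, and it raises ValueError
rather than exiting so the caller can decide what to do. On Python 3.10 and
later, zip(..., strict=True) performs a similar check as it iterates.

```python
def zip_checked(*lists):
    """Return zip(*lists), but refuse lists of unequal length."""
    lengths = [len(lst) for lst in lists]
    if len(set(lengths)) > 1:
        raise ValueError(f"length mismatch: {lengths}")
    return zip(*lists)


# Hypothetical parsed data.
names = ["Alice", "Bob"]
streets = ["1 Main St", "2 Oak Ave"]
cities = ["Springfield", "Portland"]

records = []
for i, (name, street, city) in enumerate(zip_checked(names, streets, cities)):
    records.append((i, name, street, city))

print(records)
```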


MRAB

May 10, 2016, 9:02:25 PM5/10/16
On 2016-05-10 23:36, DFS wrote:
[snip]
>
> If lists are still being created:
>
> * at every moment in time, len(list1) returns a length that doesn't
> change even if data is added to the list after the call to len().
>
> Example: If the list has 100 items in it at the point len(list1) is called:
>
> for i in range(len(list1)):
>
> will never iterate more than 100x, no matter how large list1 grows to.
>
> Caveat: since list1 may be bigger or smaller than the other lists at
> that moment in time, an error may occur when using list2[i], list3[i].
>
>
> Is that all correct as you understand it?
>
Yes.
>
>
> * at every moment in time, zip(list1, list2, etc) will return a
> fixed-length list of tuples, which doesn't change even if data is added
> to any of the lists after the call to zip().
>
> Example: if the lists have 100, 97 and 102 items in them at the point
> zip(lists) is called:
>
> for item1, item2, item3 in zip(list1, list2, list3):
>
> will never iterate beyond 97x, even if the lists grow while the
> enumeration is occurring.
>
> Caveat: since zip() possibly truncates lists, the results - the usage of
> the data - could be completely invalid.
>
>
> Is that all correct as you understand it?
>
In Python 2, zip iterates through the arguments immediately and returns
a list of tuples, so the answer is yes.

In Python 3, zip returns a lazy iterator (like itertools.izip in Python
2) that gets the values from the arguments _on demand_, so the answer is no.
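
A short sketch of the Python 3 behaviour, with made-up lists: items appended
after the call to zip() are still picked up, unless the result is forced
into a list first.

```python
list1 = [1, 2, 3]
list2 = ["a", "b", "c"]

pairs = zip(list1, list2)  # Python 3: nothing has been read yet

list1.append(4)            # grow both lists after the call to zip()
list2.append("d")

lazy_result = list(pairs)  # the appended items are picked up here
print(lazy_result)         # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

# Forcing evaluation up front gives a Python 2 style snapshot.
list3 = [1, 2, 3]
list4 = ["a", "b", "c"]
snapshot = list(zip(list3, list4))
list3.append(4)
list4.append("d")
print(snapshot)            # [(1, 'a'), (2, 'b'), (3, 'c')]
```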

[snip]
