Re: [ccc-gistemp] Python 3


David Jones

Oct 29, 2010, 10:27:03 AM
to ccc-giste...@googlegroups.com
On 29 October 2010 15:16, John Keyes <john...@gmail.com> wrote:
>
> I tried with Python 3.2a2 initially but noticed it won't run due to
> syntax errors.

Ah yes, Python 3.

I haven't done any Python 3 development yet (though I do generally
read the release notes).

Have you any feel for how difficult it would be to convert to Python
3? This, by the way, is a long way off, it's just that it would be
nice to plan ahead a bit. I'm loath to do "from __future__ import
print_function" at the moment because that doesn't start working until
Python 2.6, and currently we still work with 2.4 and 2.5.

Cheers,
drj

John Keyes

Oct 29, 2010, 1:31:12 PM
to ccc-giste...@googlegroups.com
I've done some mucking about here with the code. Most of the changes
are to do with byte vs string operations, and casting some floats back
to ints (I haven't investigated why this is the case). The results are
incorrect: http://goo.gl/SSTu
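
As a rough illustration of the kind of byte-versus-string change described above (a sketch only: the function name parse_record and the record layout are invented for this example and are not taken from ccc-gistemp):

```python
def parse_record(raw):
    """Split one fixed-width record into (station id, integer values).

    The layout (11-character id followed by 5-character integer fields)
    is a made-up example, not the real GISTEMP input format.
    """
    # Data read from a socket or from a file opened in 'rb' mode is bytes
    # in Python 3; decode it before doing text operations on it.
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")
    station = raw[:11]
    fields = raw[11:].rstrip("\n")
    # Slice the remaining text into 5-character fields and convert each.
    values = [int(fields[i:i + 5]) for i in range(0, len(fields), 5)]
    return station, values


record = b"10160355000  123 -456 9999\n"
print(parse_record(record))
```

The same function accepts an already-decoded str, so callers that read in text mode do not need to change.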

I'll email the diff to show the changes later.

-John

John Keyes

Oct 29, 2010, 8:17:31 PM
to ccc-giste...@googlegroups.com
Actually, there was a small bug in fetch.py that meant the inputs were invalid.

So with that bug fixed here are the results.

Python 2.6.4 results - I changed the color to blue in the URL:

http://img.skitch.com/20101030-qtf8igej1ab8p1b4hd7ekys12j.png

Python 3.2a2 results:

http://img.skitch.com/20101030-khnf9nyp7xck1c2ksbsm3iuk7w.png

Laying the 2.6.4 results on top of the 3.2a2 results.

http://img.skitch.com/20101030-f6p4ap9aq3q1rqrrutbk636je6.png

Laying the 3.2a2 results on top of the 2.6.4 results.

http://img.skitch.com/20101030-bbhybj5nidpx3p73kprhpmejyc.png

So they are close but not identical. What will I do with the changes I've made?

-John

John Keyes

Oct 29, 2010, 10:20:13 PM
to ccc-giste...@googlegroups.com
What's the best way to compare the results? Checking graph vs graph
isn't very useful.

Thanks,
-John

David Jones

Oct 31, 2010, 3:20:27 PM
to ccc-giste...@googlegroups.com
On 30 October 2010 01:17, John Keyes <john...@gmail.com> wrote:
> Actually, there was a small bug in fetch.py that meant the inputs were invalid.
>
> So with that bug fixed here are the results.
>
> Python 2.6.4 results - I changed the color to blue in the URL:
>
> http://img.skitch.com/20101030-qtf8igej1ab8p1b4hd7ekys12j.png
>
> Python 3.2a2 results:
>
> http://img.skitch.com/20101030-khnf9nyp7xck1c2ksbsm3iuk7w.png
>
> Laying the 2.6.4 results on top of the 3.2a2 results.
>
> http://img.skitch.com/20101030-f6p4ap9aq3q1rqrrutbk636je6.png

[tiny differences]

Interesting. It's known (by me), that the exact results are sensitive
to orderings that are not well specified, for example, the order in
which items are returned by a dictionary iterator. Probably those
change between Python 2.6 and Python 3, and perhaps that is
responsible for the difference. I haven't done an investigation of
the sensitivity (yet).
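
A small sketch of why iteration order alone can move a floating-point result (the numbers are arbitrary and running_sum is a name made up for this example; nothing here is taken from the ccc-gistemp code):

```python
def running_sum(values):
    """Add values left to right, the way a plain accumulation loop does."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, visited in two different orders, as could happen
# if they came out of a dict or set whose iteration order changed.
one_order = [0.1, 0.2, 0.3]
other_order = [0.3, 0.2, 0.1]

a = running_sum(one_order)
b = running_sum(other_order)

# The two totals differ in the last bit, yet agree once rounded to the
# two decimal places used for the published series.
print(repr(a), repr(b), a == b, round(a, 2) == round(b, 2))
```

Differences this small usually vanish after rounding, but a value sitting close to a rounding boundary can tip either way, which would be consistent with results that are close but not identical.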

> Laying the 3.2a2 results on top of the 2.6.4 results.
>
> http://img.skitch.com/20101030-bbhybj5nidpx3p73kprhpmejyc.png
>
> So they are close but not identical. What will I do with the changes I've made?

Let's create a branch on Monday and we can put the changes there.

drj

David Jones

Oct 31, 2010, 3:22:00 PM
to ccc-giste...@googlegroups.com
On 30 October 2010 03:20, John Keyes <john...@gmail.com> wrote:
> What's the best way to compare the results? Checking graph vs graph
> isn't very useful.

tool/compare.py will provide a more detailed comparison of two result/
directories. It's buggy in the 0.6.1 release, but I fixed it just
after, in svn: http://code.google.com/p/ccc-gistemp/source/detail?r=596

drj

John Keyes

Oct 31, 2010, 3:27:22 PM
to ccc-giste...@googlegroups.com
> tool/compare.py will provide a more detailed comparison of two result/
> directories. It's buggy in the 0.6.1 release, but I fixed it just
> after, in svn: http://code.google.com/p/ccc-gistemp/source/detail?r=596

I'll give this a spin soon.

-John

John Keyes

Oct 31, 2010, 5:49:56 PM
to ccc-giste...@googlegroups.com
Results attached.

I tried to run this tool as Python 3 but there was a decode utf8 issue
which I didn't spend any time looking into. I just dropped the latest
version into the 0.6.1 release and ran with Python 2.6.4.

-John

3vs2.html

David Jones

Nov 1, 2010, 4:20:17 AM
to ccc-giste...@googlegroups.com
On 31 October 2010 21:49, John Keyes <john...@gmail.com> wrote:
> Results attached.

Ah splendid. Except that all the image URLs were shortened, so they don't
work (for me!); shame, because the images would've worked without
being shortened (they're all Google Chart Tools URLs).

However, we can see from the textual reports and tables that the
differences for hemispheres and at monthly resolution are also tiny.

>
> I tried to run this tool as Python 3 but there was a decode utf8 issue
> which I didn't spend any time looking into. I just dropped the latest
> version into the 0.6.1 release and ran with Python 2.6.4.

Yes, about 5 minutes after sending my previous e-mail I thought that
compare.py probably wouldn't work with Python 3.

drj

David Jones

Nov 1, 2010, 4:33:51 AM
to ccc-giste...@googlegroups.com

John, I've added you as a committer to the googlecode SVN repository.

Are you okay with creating a branch and committing your changes there?
Or would you like me to do it?

New policy: branches will be called: branch/YYYY-MM-DD/name/ (for what
it's worth, this is how we name branches in Ravenbrook).

Let's call this one branch/2010-11-01/python3/

This should create the branch:
    svn cp https://ccc-gistemp.googlecode.com/svn/trunk/ \
        https://ccc-gistemp.googlecode.com/svn/branch/2010-11-01/python3/
(use "creating development branch for Python 3" or somesuch as the
submit message)

and then you can check out that branch somewhere and copy your changes
onto it, then submit that.

Let me know if that's unclear or you need me to do something. Perhaps
we can IM?
drj

John Keyes

Nov 1, 2010, 5:33:36 AM
to ccc-giste...@googlegroups.com
I've created the branch. I'll get the changes in there later today.

-John

John Keyes

Nov 1, 2010, 7:27:36 PM
to ccc-giste...@googlegroups.com

John Keyes

Nov 1, 2010, 7:46:50 PM
to ccc-giste...@googlegroups.com
> Ah splendid. Except that all the image URLs were shortened, so they don't
> work (for me!); shame, because the images would've worked without
> being shortened (they're all Google Chart Tools URLs).

Must be a Gmail thing that.

I've written a short blog post on the branch and linked to the results
which I have hosted on my server if you want to see the charts too.

http://keyes.ie/ccc-gistemp-python-3-branch/

-John

David Jones

Nov 2, 2010, 5:17:08 AM
to ccc-giste...@googlegroups.com

Excellent, thanks for that.

I looked at the diff (tedious notes in the appendix), and it basically
looks fine. I'm encouraged by 2to3. Thanks for doing this little
investigation John.

We (I) should probably try and eliminate the things that require
fixing up by hand, so that we can "just" run 2to3 and at least that
will work. Judging from the diff, that's:
- not using "long" or "list" as a variable name;
- using // instead of / when I know that I want integer division;
- those things with strings.

Cheers,
drj

From code/eqarea.py:

  z = math.sin(lat)
  c = math.cos(lat)
  long = i[1]*math.pi/180
- x = math.cos(long) * c
- y = math.sin(long) * c
+ x = math.cos(int) * c
+ y = math.sin(int) * c
  return (x,y,z)

This is immediately alarming. Yes, it's not a good idea for me to
call a local variable "long", but 2to3 has changed my variable "long"
to "int". Can't see how this would ever work (it won't, it's in code
that isn't routinely called). Will probably change the variable
"long" to "lon" or "longitude". I guess it's one of those things
where it never occurred to me that "long" was a Python builtin. :)
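
A minimal sketch of the rename, assuming the surrounding function converts a latitude/longitude pair in degrees to a point on the unit sphere (the function name latlon_to_xyz and its signature are hypothetical; only the arithmetic follows the fragment quoted above):

```python
import math


def latlon_to_xyz(lat_deg, lon_deg):
    """Convert latitude and longitude in degrees to unit-sphere (x, y, z)."""
    lat = lat_deg * math.pi / 180
    # "lon" instead of "long": it does not collide with the Python 2
    # builtin, so 2to3 has no reason to rewrite it to "int".
    lon = lon_deg * math.pi / 180
    z = math.sin(lat)
    c = math.cos(lat)
    x = math.cos(lon) * c
    y = math.sin(lon) * c
    return (x, y, z)


print(latlon_to_xyz(0.0, 0.0))
```

With no builtin names used as locals, the source should pass through 2to3 without this hunk changing at all.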

There is a similar thing in fetch.py.

Obviously 2to3 changes map(...) to list(map(...)) and similarly for
zip. Some of these turn out to be necessary, some do not. I often
use map or zip when I could've used itertools.imap and itertools.izip
and I don't care which I get.
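
To make the "necessary or not" distinction concrete, here is a small sketch with invented data (none of these names come from ccc-gistemp). In Python 3, map and zip return one-shot iterators, so the list() wrapper matters only when the result is indexed, measured, or traversed more than once:

```python
months = ["Jan", "Feb", "Mar"]
temps = [4.1, 4.9, 7.2]

# Iterated exactly once: a lazy iterator (Python 3 zip) and a list
# (Python 2 zip) both work, so no list() wrapper is needed here.
labels = ["%s=%.1f" % (m, t) for m, t in zip(months, temps)]

# Indexed and measured: this needs a real list, so the list() that 2to3
# adds is required on Python 3.
scaled = list(map(lambda t: t * 10, temps))
first, count = scaled[0], len(scaled)

# A bare map object is exhausted after one pass on Python 3.
lazy = map(int, temps)
once = list(lazy)
again = list(lazy)  # empty on Python 3, where map returns an iterator

print(labels, count, once, again)
```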

From code/step2.py:

  annual_anoms = []
  first = None
- for y in range(len(series)/12):
+ for y in range(int(len(series)/12)):
      # Seasons are Dec-Feb, Mar-May, Jun-Aug, Sep-Nov.
      # (Dec from previous year).

I guess this was a case where you inserted "int()" by hand? I should
probably use "//12" instead of "/12". Then that will be identical in
Python 3. There are probably a few other cases where "I know" that
the division is integer (browsing through the code makes me wonder if
these are all of the form "/12", that would be amusing). Conversely
there are a few cases where I deliberately infect an arithmetic
operation with a float, and those won't be necessary in Python 3.
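
A short sketch of the "//12" form, using a made-up 30-month series (the helper name whole_years is hypothetical):

```python
def whole_years(series):
    """Number of complete 12-month years in a monthly series."""
    # // is floor division in both Python 2 and Python 3, so the result
    # is an int and range() accepts it in either version.
    return len(series) // 12


series = [0.0] * 30  # two full years plus six spare months
years = whole_years(series)
chunks = [series[y * 12:(y + 1) * 12] for y in range(years)]
print(years, [len(c) for c in chunks])
```

For non-negative lengths, // and int(.../12) give the same answer. They differ for negative operands, where // rounds down and int() truncates toward zero.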

I note that 2to3 does some aggressive replacement of map(lambda ...)
with list comprehension.
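
For reference, a sketch of the two equivalent spellings with invented anomaly values (the variable names are illustrative only):

```python
anomalies = [0.12, -0.05, 0.31]

# map(lambda ...) style, as it might appear in the 2.x source.
as_map = list(map(lambda a: int(round(a * 100)), anomalies))

# The list-comprehension form that 2to3 tends to produce instead.
as_comprehension = [int(round(a * 100)) for a in anomalies]

print(as_map == as_comprehension)
```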

John Keyes

Nov 2, 2010, 7:09:04 AM
to ccc-giste...@googlegroups.com
Comments inline.

> I looked at the diff (tedious notes in the appendix), and it basically
> looks fine.  I'm encouraged by 2to3.  Thanks for doing this little
> investigation John.

You're welcome.

> We (I) should probably try and eliminate the things that require
> fixing up by hand, so that we can "just" run 2to3 and at least that
> will work.  Judging from the diff, that's:
> - not using "long" or "list" as a variable name;
> - using // instead of / when I know that I want integer division;
> - those things with strings.

Yeah I was thinking about this last night when I was in bed (sad I
know). The best thing is to make the 2.x code easy to translate to
3.x. I didn't ask why I was making changes, I just wanted to get it
running and then let you have a look at the diff.

> From code/eqarea.py:
>
>   z = math.sin(lat)
>   c = math.cos(lat)
>   long = i[1]*math.pi/180
> - x = math.cos(long) * c
> - y = math.sin(long) * c
> + x = math.cos(int) * c
> + y = math.sin(int) * c
>   return (x,y,z)
>
> This is immediately alarming.  Yes, it's not a good idea for me to
> call a local variable "long", but 2to3 has changed my variable "long"
> to "int".  Can't see how this would ever work (it won't, it's in code
> that isn't routinely called).  Will probably change the variable
> "long" to "lon" or "longitude".  I guess it's one of those things
> where it never occurred to me that "long" was a Python builtin. :)

A simple change.

> There is a similar thing in fetch.py.
>
> Obviously 2to3 changes map(...) to list(map(...)) and similarly for
> zip.  Some of these turn out to be necessary, some do not.  I often
> use map or zip when I could've used itertools.imap and itertools.izip
> and I don't care which I get.
>
> From code/step2.py:
>
>   annual_anoms = []
>   first = None
> - for y in range(len(series)/12):
> + for y in range(int(len(series)/12)):
>       # Seasons are Dec-Feb, Mar-May, Jun-Aug, Sep-Nov.
>       # (Dec from previous year).
>
> I guess this was a case where you inserted "int()" by hand?  I should
> probably use "//12" instead of "/12".  Then that will be identical in
> Python 3.  There are probably a few other cases where "I know" that
> the division is integer (browsing through the code makes me wonder if
> these are all of the form "/12", that would be amusing).  Conversely
> there are a few cases where I deliberately infect an arithmetic
> operation with a float, and those won't be necessary in Python 3.

Yeap this was me. Again I just wanted it to work. Error message was
TypeError int needed.

> I note that 2to3 does some aggressive replacement of map(lambda ...)
> with list comprehension.

It does. I think list comprehension is clearer code than map(lambda
...) so this is something that could be changed in the 2.x code if you
agree with that.

I haven't examined the diff in detail yet, but I will.

What's the definitive way to check if a change has broken anything? I
think some validation tests are required.

-John

David Jones

Nov 2, 2010, 11:14:47 AM
to ccc-giste...@googlegroups.com
On 2 November 2010 11:09, John Keyes <john...@gmail.com> wrote:
>
>> We (I) should probably try and eliminate the things that require
>> fixing up by hand, so that we can "just" run 2to3 and at least that
>> will work.  Judging from the diff, that's:
>> - not using "long" or "list" as a variable name;
>> - using // instead of / when I know that I want integer division;
>> - those things with strings.
>
> Yeah I was thinking about this last night when I was in bed (sad I
> know). The best thing is to make the 2.x code easy to translate to
> 3.x. I didn't ask why I was making changes, I just wanted to get it
> running and then let you have a look at the diff.

Yes, I totally understand, and it's a good approach.

>>
>> I guess this was a case where you inserted "int()" by hand?  I should
>> probably use "//12" instead of "/12".  Then that will be identical in
>> Python 3.  There are probably a few other cases where "I know" that
>> the division is integer (browsing through the code makes me wonder if
>> these are all of the form "/12", that would be amusing).  Conversely
>> there are a few cases where I deliberately infect an arithmetic
>> operation with a float, and those won't be necessary in Python 3.
>
> Yeap this was me. Again I just wanted it to work. Error message was
> TypeError int needed.

Again, totally understand, for the experiment of getting it working in
2to3, it doesn't really matter which one you did.

>> I note that 2to3 does some aggressive replacement of map(lambda ...)
>> with list comprehension.
>
> It does. I think list comprehension is clearer code than map(lambda
> ...) so this is something that could be changed in the 2.x code if you
> agree with that.

Well, I guess Guido agrees with you. As for the output of 2to3, I
think sometimes the list comprehension is clearer, sometimes the map
is. But when we do switch over and use 2to3, I'm not going to cry
about all those maps turning into list comprehensions. Conversely,
I'm not going to switch them all in the 2.x code either.

>
> What's the definitive way to check if a change has broken anything? I
> think some validation tests are required.

There isn't really one, and this is a bug.

Conceptually it's actually quite a tricky problem. It depends if you
make a change that you expect to change the answer or not. The
published result, the two hemispheric and global temperature series
printed to 2 decimal places, depends on all sorts of things that
aren't particularly well specified (the exact result of trigonometric
functions, whether you used 32-bit or 64-bit floats, the order that
items came out of a dict iterator); it doesn't depend on these things
very much, but a little bit (for example, it's easy to show that
changing the order of items in get_longest() in step1.py will change
the result by a tiny bit).

So the published result can change for legitimate reasons: for example, we
changed a dict to a set. I would like to lock these down, but that's
a different matter.

Conversely, genuine bugs can change the result by less than some of
the above effects. For example, accidentally dropping a few stations
(see Issue 84) can go unnoticed. It's definitely a bug, but one that
hardly affects the result.

Investigating tiny changes in the result (such as the ones from 2to3)
is usually a matter of poring over log files and determining that
nothing suspicious is going on. It's not easy.
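
One possible starting point for an automated check, sketched under stated assumptions: this is not tool/compare.py, the function name compare_series is hypothetical, and the yearly values are made up. It flags years where two result series differ by more than a chosen tolerance, or where a year is missing from one side:

```python
def compare_series(a, b, tolerance=0.015):
    """Return (year, value_a, value_b) for years that disagree.

    a and b map year -> anomaly. A year is reported if it is missing from
    either series or if the values differ by more than tolerance.
    """
    diffs = []
    for year in sorted(set(a) | set(b)):
        va, vb = a.get(year), b.get(year)
        if va is None or vb is None:
            diffs.append((year, va, vb))  # present on one side only
        elif abs(va - vb) > tolerance:
            diffs.append((year, va, vb))
    return diffs


# Invented values: 2008 differs by one unit in the last published digit,
# 2009 by considerably more.
py2_result = {2007: 0.57, 2008: 0.43, 2009: 0.57}
py3_result = {2007: 0.57, 2008: 0.44, 2009: 0.62}
print(compare_series(py2_result, py3_result))
```

As the discussion above makes clear, a tolerance like this cannot separate a small genuine bug from legitimate numerical noise; it only narrows down where to start reading the logs.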

drj
