
[OT] Map of email origins to Python list


Claire McLister

Nov 7, 2005, 12:21:54 PM
to pytho...@python.org
We've been working with Google Maps, and have created a web service to
map origins of emails to a group. As a trial, we've developed a map of
emails to this group at:

http://www.zeesource.net/maps/map.do?group=668

This represents emails sent to the group since October 27.

Would like to hear what you think of it.

Thanks for listening.

Claire

--
Claire McLister                       mcli...@zeesource.net
1684 Nightingale Avenue     Suite 201
Sunnyvale, CA 94087        408-733-2737(fax)

http://www.zeemaps.com

Paul McGuire

Nov 7, 2005, 12:55:47 PM
"Claire McLister" <mcli...@zeesource.net> wrote in message
news:mailman.221.11313841...@python.org...

> We've been working with Google Maps, and have created a web service to
> map origins of emails to a group. As a trial, we've developed a map of
> emails to this group at:
>
> http://www.zeesource.net/maps/map.do?group=668
>
> This represents emails sent to the group since October 27.
>
> Would like to hear what you think of it.

------------------------------

<sigh>
Another sleepless camera pointed at the fishbowl that is my online life.

I guess it's a great way to find where there might be Python jobs to be
found, or at least kindred souls (or dissident Python posters in countries
where Internet activity is closely monitored...)

To me, it's either cool in a creepy sort of way, or creepy in a cool sort of
way.

-- Paul

Rocco Moretti

Nov 7, 2005, 1:23:15 PM
Paul McGuire wrote:
> "Claire McLister" <mcli...@zeesource.net> wrote in message
> news:mailman.221.11313841...@python.org...
> We've been working with Google Maps, and have created a web service to
> map origins of emails to a group. As a trial, we've developed a map of
> emails to this group at:
>
> http://www.zeesource.net/maps/map.do?group=668
>
> This represents emails sent to the group since October 27.
>
> Would like to hear what you think of it.
> ------------------------------
>
> <sigh>
> Another sleepless camera pointed at the fishbowl that is my online life.
>

It's also a testament to the limited value of physically locating people
by internet addresses - If you zoom in on the San Francisco bay area, and
click on the southernmost bubble (south of San Jose), you'll see the
entry for the Mountain View postal code (94043) - a massive list which
contains mostly gmail.com accounts, but also contains accounts with .de
.ca .uk .pl .it .tw and .za domains. I doubt all of the people in that
list live in sunny California, let alone in Mountain View proper.

Steve Holden

Nov 7, 2005, 1:55:29 PM
to pytho...@python.org
Claire McLister wrote:
> We've been working with Google Maps, and have created a web service to
> map origins of emails to a group. As a trial, we've developed a map of
> emails to this group at:
>
> http://www.zeesource.net/maps/map.do?group=668
>
> This represents emails sent to the group since October 27.
>
> Would like to hear what you think of it.
>
> Thanks for listening.
>
Mostly I wonder what the point is. For example, given my own somewhat
nomadic life I wondered what location has been used to map my own
contributions.

Examination of the map's source reveals you used the code

_m = createMarker(new GPoint(-82.775497, 40.3736),
'red', "<table width='300px'><tr><th colspan='2'
align='middle'>Ohio, United States</th></tr><tr><th colspan='2'
align='left'><a title='click to change'
href='editform.do?group=668&item=21364'>s...@holdenweb.com</a></th></tr></table>");
_m.city = '';
_m.country = 'United States';
_m.name = ' s...@holdenweb.com';
marar.push(_m);

to generate my reference. This has never been correct (I am not sure
I've ever been to Ohio) and it certainly isn't now (since I moved
continents recently).

Nonetheless I'm sure that before long these maps will be used to prove
some spurious facts about newsgroup readership to gullible members of
the business community.

If I've got you wrong then please forgive my slight hostility, but I am
particularly suspicious of the "click to change" functionality. You are
clearly expecting people to update their locations, and other
information that might be related to their domains, and I can't help
wondering what purposes that information is intended for.

Finally, considering email addresses from domains like "verizon.net",
"aol.com" and other large ISPs I can't see that you have a chance in
hell of extracting useful demographics. Which all leads me back to
"what's the point?" - just because you can?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

mensa...@aol.com

Nov 7, 2005, 2:06:05 PM

North of that bubble is a second massive list also labeled Mountain View
94043. I found my name on that list and I live in the Chicago area.
Mountain View is, perhaps, where aol.com is located? These bubbles are
showing the location of the server that's registered under the domain
name?

Rocco Moretti

Nov 7, 2005, 2:43:56 PM
mensa...@aol.com wrote:

> Rocco Moretti wrote:
>
>>It's also a testament to the limited value of physically locating people
>>by internet addresses - If you zoom in on the San Francisco bay area, and
>>click on the southernmost bubble (south of San Jose), you'll see the
>>entry for the Mountain View postal code (94043) - a massive list which
>>contains mostly gmail.com accounts, but also contains accounts with .de
>>.ca .uk .pl .it .tw and .za domains. I doubt all of the people in that
>>list live in sunny California, let alone in Mountain View proper.
>
>
> North of that bubble is a second massive list also labeled Mountain View
> 94043. I found my name on that list and I live in the Chicago area.
> Mountain View is, perhaps, where aol.com is located? These bubbles are
> showing the location of the server that's registered under the domain
> name?

Actually, it looks like they are the *same* list. I haven't gone through
all of the names, but I spot checked a few, and it looks like yours,
among others, are listed in both spots. (The southern one looks like it
is a mislocated duplicate, as it is nowhere close to Mountain View, and
is stuck in the middle of a golf course.)

Robert Kern

Nov 7, 2005, 3:21:55 PM
to pytho...@python.org
mensa...@aol.com wrote:

> North of that bubble is a second massive list also labeled Mountain View
> 94043. I found my name on that list and I live in the Chicago area.
> Mountain View is, perhaps, where aol.com is located? These bubbles are
> showing the location of the server that's registered under the domain
> name?

Most of AOL's offices are in Dulles, VA. Google's headquarters are in
Mountain View, CA.

--
Robert Kern
rk...@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

mensa...@aol.com

Nov 7, 2005, 3:56:03 PM

Robert Kern wrote:
> mensa...@aol.com wrote:
>
> > North of that bubble is a second massive list also labeled Mountain View
> > 94043. I found my name on that list and I live in the Chicago area.
> > Mountain View is, perhaps, where aol.com is located? These bubbles are
> > showing the location of the server that's registered under the domain
> > name?
>
> Most of AOL's offices are in Dulles, VA. Google's headquarters are in
> Mountain View, CA.

Aha, I post to the usenet through Google. Makes the map application
all the more stupid, doesn't it?

Alan Kennedy

Nov 7, 2005, 4:05:46 PM
[Robert Kern]

>>Most of AOL's offices are in Dulles, VA. Google's headquarters are in
>>Mountain View, CA.

[mensa...@aol.com]


> Aha, I post to the usenet through Google. Makes the map application
> all the more stupid, doesn't it?

Actually, no, because Google Groups sets the NNTP-Posting-Host header to
the IP address from which the user connected to Google. So your post to
which I'm replying came from IP address "68.73.244.37", which reverses
to "adsl-68-73-244-37.dsl.chcgil.ameritech.net".

http://groups.google.com/group/comp.lang.python/msg/ca06957210fe12ae?dmode=source

So presumably "chcgil" indicates you're in Chicago, Illinois?
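
For illustration, reading that header and reverse-resolving the address can be sketched in Python roughly as follows. The sample article and the function names are made up for this sketch; only the header name and the IP address come from the message above, and the reverse lookup needs network access, so its result is not shown.

```python
import socket
from email import message_from_string

# Hypothetical raw article; only the NNTP-Posting-Host value is from the thread.
RAW_ARTICLE = """\
From: poster@example.invalid
Newsgroups: comp.lang.python
Subject: Re: [OT] Map of email origins to Python list
NNTP-Posting-Host: 68.73.244.37

Body text.
"""


def posting_host(raw_article):
    """Return the NNTP-Posting-Host value, or None if the header is absent."""
    msg = message_from_string(raw_article)
    value = msg.get("NNTP-Posting-Host")
    return value.strip() if value else None


def reverse_name(ip):
    """Reverse-resolve an IP address; return None when no name is found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


if __name__ == "__main__":
    ip = posting_host(RAW_ARTICLE)
    print(ip, reverse_name(ip))
```

The hostname that comes back is only a hint about location; it reflects how the ISP names its address pool, not where the poster sits.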

Although I do have to point out that the map makes it appear as if I've
been busy posting from all over Dublin's Southside, which, as anyone who
has seen "The Commitments" can attest, is a deep insult to a born-and-bred
Northsider such as myself ;-)

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jorge Godoy

Nov 7, 2005, 4:11:11 PM
Claire McLister <mcli...@zeesource.net> writes:

> We've been working with Google Maps, and have created a web service to map
> origins of emails to a group. As a trial, we've developed a map of emails to
> this group at:
>
> http://www.zeesource.net/maps/map.do?group=668
>
> This represents emails sent to the group since October 27.
>
> Would like to hear what you think of it.

Hmmmm... I don't see mine listed there: I'm in South America, Brasil. More
specifically in Curitiba, Paraná, Brasil. :-)

--
Jorge Godoy <go...@ieee.org>

mensa...@aol.com

Nov 7, 2005, 4:53:59 PM

Alan Kennedy wrote:
> [Robert Kern]
> >>Most of AOL's offices are in Dulles, VA. Google's headquarters are in
> >>Mountain View, CA.
>
> [mensa...@aol.com]
> > Aha, I post to the usenet through Google. Makes the map application
> > all the more stupid, doesn't it?
>
> Actually, no, because Google Groups sets the NNTP-Posting-Host header to
> the IP address from which the user connected to Google. So your post to
> which I'm replying came from IP address "68.73.244.37", which reverses
> to "adsl-68-73-244-37.dsl.chcgil.ameritech.net".
>
> http://groups.google.com/group/comp.lang.python/msg/ca06957210fe12ae?dmode=source
>
> So presumably "chcgil" indicates you're in Chicago, Illinois?

Yes, but why, then, is my name logged into Mountain View, CA?

That justifies my claim of "all the more stupid", doesn't it?

George Sakkis

Nov 7, 2005, 5:20:05 PM
"Jorge Godoy" <go...@ieee.org>:

> Hmmmm... I don't see mine listed there: I'm in South America, Brasil. More
> specifically in Curitiba, Paraná, Brasil. :-)

That's funny; I was looking for mine and I stumbled across yours at
Piscataway, NJ, US. :-)

George

Claire McLister

Nov 7, 2005, 5:26:41 PM
to pytho...@python.org
On Nov 7, 2005, at 9:55 AM, Paul McGuire wrote:

> I guess it's a great way to find where there might be Python jobs to be
> found, or at least kindred souls (or dissident Python posters in
> countries
> where Internet activity is closely monitored...)

Possibly. But there are so many inaccuracies that this is a rough
guide at best.

>
> To me, it's either cool in a creepy sort of way, or creepy in a cool
> sort of
> way.
>

An interesting perspective. Not to increase your sense of 'creepy', but
a lot of big corporations now have access to this kind of information
and more.


Alan Kennedy

Nov 7, 2005, 5:32:51 PM
[Alan Kennedy]

>>So presumably "chcgil" indicates you're in Chicago, Illinois?

[mensa...@aol.com]


> Yes, but why, then, is my name logged into Mountain View, CA?

Presumably the creators of the map have chosen to use a mechanism other
than NNTP-Posting-Host IP address to geolocate posters.

Claire, what mechanism did you use?

> That justifies my claim of "all the more stupid", doesn't it?

Well, to me it just says that the map creation software has some bugs
that need fixing.

Claire McLister

Nov 7, 2005, 5:33:20 PM
to Rocco Moretti, pytho...@python.org
On Nov 7, 2005, at 10:23 AM, Rocco Moretti wrote:

> It's also a testament to the limited value of physically locating
> people
> by internet addresses - If you zoom in on the San Francisco bay area,
> and click on the southernmost bubble (south of San Jose), you'll see the
> entry for the Mountain View postal code (94043) - a massive list which
> contains mostly gmail.com accounts, but also contains accounts with .de
> .ca .uk .pl .it .tw and .za domains. I doubt all of the people in that
> list live in sunny California, let alone in Mountain View proper.
>

Indeed, locating people from IP is not that easy or correct. We are,
however, not trying to suggest that we can find people's locations this
way. We are just trying to pin-point the origins of emails to a group.

The flaw that you point out is due to problems in our approach of how
we find the 'origin' IP.

We try to get a best guess estimate of the originating IP and its
location. If we cannot find that, we fall back on the earliest server
that has location information. Clearly this marks quite a few email
origins in the wrong way. It doesn't do the collection of ALL gmail
addresses this way, however. If you do a filter on 'gmail' in the
'Name' filter, you'll see a lot of gmail addresses all over the world.
So, we need to do a better job of guessing the originating IP, and not
try to go too far forward.
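
As a rough illustration of the fallback described above (not the actual script, whose details are not given here), one way to walk a message's Received headers from the earliest hop forward and stop at the first address that has location data might look like this. The function names and the `locate` callback are hypothetical stand-ins; `locate` represents whatever IP-to-location service is used.

```python
import ipaddress
import re
from email import message_from_string

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def candidate_ips(raw_message):
    """Yield public IPv4 addresses from Received headers, earliest hop first."""
    msg = message_from_string(raw_message)
    # Each relay prepends its own Received header, so the last one in the
    # message is the earliest hop.
    for header in reversed(msg.get_all("Received", [])):
        for text in IPV4.findall(header):
            try:
                ip = ipaddress.ip_address(text)
            except ValueError:
                continue  # matched the pattern but is not a valid address
            if ip.is_global:  # skip private and reserved ranges
                yield str(ip)


def guess_origin(raw_message, locate):
    """Return (ip, location) for the earliest hop that `locate` knows about.

    `locate` is a hypothetical callable mapping an IP string to a location,
    or None when it has no data for that address.
    """
    for ip in candidate_ips(raw_message):
        place = locate(ip)
        if place is not None:
            return ip, place
    return None
```

The weakness discussed in this thread shows up directly in the loop: when the earliest hops have no location data, the result drifts toward a large relay, such as a mail provider's servers, rather than the sender.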

Jorge Godoy

Nov 7, 2005, 5:36:02 PM
"George Sakkis" <gsa...@rutgers.edu> writes:

Phew! Thanks for finding me. I was feeling a bit lost... :-)


Be seeing you,
--
Jorge Godoy <go...@ieee.org>

Claire McLister

Nov 7, 2005, 5:42:36 PM
to Steve Holden, pytho...@python.org
On Nov 7, 2005, at 10:55 AM, Steve Holden wrote:

> Mostly I wonder what the point is. For example, given my own somewhat
> nomadic life I wondered what location has been used to map my own
> contributions.

Just for fun, really. We try to do the best job of mapping the IP location
closest to the origins of your email. Again, we are not saying that
this is your location. All we are saying is that that particular email
seems to have originated from that location.

> Nonetheless I'm sure that before long these maps will be used to prove
> some spurious facts about newsgroup readership to gullible members of
> the business community.

Well, we really don't know what the maps tell us. If they say something
interesting, then it doesn't hurt to tell more people about it.

> If I've got you wrong then please forgive my slight hostility, but I am
> particularly suspicious of the "click to change" functionality. You are
> clearly expecting people to update their locations, and other
> information that might be related to their domains, and I can't help
> wondering what purposes that information is intended for.

Sorry, I should have made it clear why we are doing this. We have a
service that allows people to mark their group of people or places on a
Google Map. So, the 'click to change' is for those maps, and really
not meant to be used for these email list maps. It's just that as we
were building those maps we saw some people offering services of IP to
locations, and thought wouldn't it be interesting to find out where
emails are coming from to various Open Source projects.

> Finally, considering email address from domains like "verizon.net",
> "aol.com" and other large ISPs I can't see that you have a chance in
> hell of extracting useful demographics. Which all leads me back to
> "what's the point?" - just because you can?

See my previous response. Sometimes we still can get better origins
than just where the bulk server is. Yes, really, it was just an
exercise to see what comes out of it.

Claire McLister

Nov 7, 2005, 5:50:15 PM
to Alan Kennedy, pytho...@python.org
Thanks, Alan. You are absolutely right, we are not using the
NNTP-Posting-Host header for obtaining the IP address.

The Python list is unique among the lists that we have handled so far,
in that it has a cross-posting mechanism with a Usenet newsgroup. Hence, it
seems we are getting many more wrong locations here than on any other
email list maps we've done so far. We've done them for Linux kernel,
postgresql, apache, tomcat, etc. You can find them by searching their
names in the 'find' box. Not many people reported wrong locations on
those maps.

So, we'll have to go back and fix the script that is extracting the IP
address (which is written in Python, btw). Let me know if someone is
interested in taking a look at it and I can post it somewhere.


Neil Hodgson

Nov 7, 2005, 6:13:02 PM
Claire McLister:

> We try to get a best guess estimate of the originating IP and its
> location. If we cannot find that, we fall back on the earliest server
> that has location information. Clearly this marks quite a few email
> origins in the wrong way. It doesn't do the collection of ALL gmail
> addresses this way, however. If you do a filter on 'gmail' in the 'Name'
> filter, you'll see a lot of gmail addresses all over the world. So, we
> need to do a better job of guessing the originating IP, and not try to
> go too far forward.

The points are labelled with the email address which won't always be
the account posted from. I'm listed in both Sydney (correct) and
Melbourne with my gmail account (actually a subaddress,
nyamatong...@gmail.com, only used for news posting) but I post to
comp.lang.python through Thunderbird on my local machine through my
ISP's news server. I expect the marked locations are for the ISP's news
hubs. Gmail only comes into the picture when I'm sent spam in response
to a post.

Multiple locations for gmail doesn't imply discovery of real origins
of traffic through gmail.

Neil

Alan Kennedy

Nov 7, 2005, 6:26:50 PM
[Claire McLister]

> Thanks, Alan. You are absolutely right, we are not using the
> NNTP-Posting-Host header for obtaining the IP address.

Aha, that would explain the lack of precision in many cases. A lot of
posters in this list/group go through NNTP (either with an NNTP client
or through NNTP-aware services like Google Groups) which should give
very good results, when available.

> So, we'll have to go back and fix the script that is extracting the IP
> address (which is written in Python, btw).

What better language to write in :-)

> Let me know if someone is
> interested in taking a look at it and I can post it somewhere.

Sure, please do make it available, or at least the geolocation component
anyway. I'm sure you'll get lots of useful comments from the many clever
and experienced folk who frequent this group.

Don't be aggrieved at the negative comment you've received: I think what
you're doing is fascinating.

But don't forget that a lot of people are not aware that this kind of
geolocation can be done, along with the many other inferences that can
be drawn from message and browser headers. So don't be surprised if some
of them try to "shoot the messenger".

I look forward to the map with updated precision :-)

Mike Meyer

Nov 7, 2005, 6:40:32 PM
Claire McLister <mcli...@zeesource.net> writes:
> Thanks, Alan. You are absolutely right, we are not using the
> NNTP-Posting-Host header for obtaining the IP address.

Yes, but what are you using?

> The Python list is unique among the lists that we have handled so far,
> in that it has a cross-posting mechanism with a Usenet newsgroup. Hence, it
> seems we are getting many more wrong locations here than on any other
> email list maps we've done so far. We've done them for Linux kernel,
> postgresql, apache, tomcat, etc. You can find them by searching their
> names in the 'find' box. Not many people reported wrong locations on
> those maps.

Hmm. Are you using a different method than you used for the mail
lists? Because my mail and news follows the same path, using the same
host name. The only difference is that my ISP uses supernews.com news
servers, so my postings appear to go direct from my domain to
supernews - but the only place this shows up is in the Path: header.

For the record - I (and my servers) are in Virginia, the domain name I
use is registered to an address in Oklahoma, and everything is relayed
through my ISP in Berkeley. Your map has me in San Francisco. Ok, you
nearly got my ISP's hardware.

> So, we'll have to go back and fix the script that is extracting the IP
> address (which is written in Python, btw). Let me know if someone is
> interested in taking a look at it and I can post it somewhere.

What IP address is it extracting? Well, if you post it, I'll look at
it and figure it out from that.

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Mike Meyer

Nov 7, 2005, 6:41:48 PM
Claire McLister <mcli...@zeesource.net> writes:
> An interesting perspective. Not to increase your sense of 'creepy',
> but a lot of big corporations now have access to this kind of
> information and more.

You mean my creditors are going to be looking for me in San Francisco,
even though I'm in Virginia? Cool.

Claire McLister

Nov 7, 2005, 9:43:21 PM
to Alan Kennedy, pytho...@python.org
On Nov 7, 2005, at 3:26 PM, Alan Kennedy wrote:

> Sure, please do make it available, or at least the geolocation
> component
> anyway. I'm sure you'll get lots of useful comments from the many
> clever
> and experienced folk who frequent this group.
>

I've made the script available on our downloads page at:

http://www.zeesource.net/downloads/e2i

Let me know if you have any trouble accessing it. Sorry to disappoint,
but we actually use a commercial service to convert from IP to
location. There are several of them available on the net, and we picked
this one after some testing. I think it has some location problems
(like putting Mountain View south of San Jose in California), but
otherwise seemed to be one of the better ones available.

> Don't be aggrieved at the negative comment you've received: I think
> what
> you're doing is fascinating.

Thanks.

> I look forward to the map with updated precision :-)

Me too. Please let me know how we should modify the script.

Alan Kennedy

Nov 8, 2005, 3:52:28 PM
[Claire McLister]

> I've made the script available on our downloads page at:
>
> http://www.zeesource.net/downloads/e2i

[Alan Kennedy]


>> I look forward to the map with updated precision :-)

[Claire McLister]


> Me too. Please let me know how we should modify the script.

Having examined your script, I'm not entirely sure what your input
source is, so I'm assuming it's an mbox file of the archives from
python-list, e.g. as appears on this page

http://mail.python.org/pipermail/python-list/

or at this URL

http://mail.python.org/pipermail/python-list/2005-November.txt

Those messages are the email versions, so all of the NNTP headers, e.g.
NNTP-Posting-Host, will have been dropped. You will need these in order
to get the geographic location of posts that have been made through NNTP.
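
If the input really is an mbox file, a quick way to check how many messages still carry the header could look like the sketch below. The function name and the file path are hypothetical; the code assumes the archive parses as a standard mbox, which may not hold for every pipermail download.

```python
import mailbox


def count_with_posting_host(mbox_path):
    """Return (messages_with_header, total_messages) for an mbox file."""
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        total = 0
        with_header = 0
        for msg in box:
            total += 1
            if msg.get("NNTP-Posting-Host"):
                with_header += 1
        return with_header, total
    finally:
        box.close()
```

A result with zero headers across the archive would confirm that the NNTP information has to come from somewhere else.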

In order to be able to get those headers, you need somehow to get the
NNTP originals of messages that originated on UseNet. You can see an
example of the format, i.e. your message to which I am replying, at this URL

http://groups.google.com/group/comp.lang.python/msg/56e3baabcd4498f2?dmode=source

The NNTP-Posting-Host for that message is '194.109.207.14', which
reverses to 'bag.python.org', which is presumably the machine that
gatewayed the message from python-list onto comp.lang.python.

So there are a couple of different approaches

1. Get an archive of the UseNet postings to comp.lang.python (anybody
know where?)
A: messages sent through email will have the NNTP-Posting-Host as
a machine at python.org, so fall back to your original algorithm for
those messages
B: messages sent through UseNet, or a web gateway to same, will have an
NNTP-Posting-Host elsewhere than python.org, so do your geo-lookup
on that IP address.

2. Get the python-list archive
A: Figure out which messages came through the python.org NNTP gateway
(not sure offhand if this is possible). Automate a query to Google
groups to find the NNTP-Posting-Host (using a URL like the one
above). Requires being able to map the python-list message-id to the
google groups message-id. Do your geo-lookup on that
NNTP-Posting-Host value
B: Use your original algorithm for messages sent through email.

2A message-id lookup should be achievable through the advanced google
groups search, at this URL

http://groups.google.com/advanced_search?q=&

See the "Lookup the message with message ID" at the bottom.

Sorry I don't have time to supply code for any of this. Perhaps someone
can add more details, or better still some code?
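
A minimal sketch of the branching in approach 1, assuming the Usenet articles are already available as raw text. The function name, the `fallback` callback (standing in for the original algorithm), and the idea of recognising the gateway by a fixed set of addresses are all assumptions made for illustration; only the gateway address itself is taken from the message above.

```python
from email import message_from_string

# Address reported above for bag.python.org, the mail-to-news gateway.
# A real gateway may use more than one address; this set is an assumption.
GATEWAY_HOSTS = {"194.109.207.14"}


def ip_for_geolookup(raw_article, fallback):
    """Pick the IP address to geolocate for one Usenet article.

    `fallback` is a hypothetical callable that applies the original
    email-based algorithm to the parsed message.
    """
    msg = message_from_string(raw_article)
    host = (msg.get("NNTP-Posting-Host") or "").strip()
    if host and host not in GATEWAY_HOSTS:
        # Case B: posted through Usenet or a web gateway to it.
        return host
    # Case A: arrived by email through the python.org gateway.
    return fallback(msg)
```

Approach 2 would need the same decision plus a message-id lookup against the newsgroup archive, which is not sketched here.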

Tom Anderson

Nov 9, 2005, 1:42:44 PM
On Mon, 7 Nov 2005, Claire McLister wrote:

> We've been working with Google Maps, and have created a web service to map
> origins of emails to a group.

Top stuff! The misses are, if anything, more interesting than the hits!

I, apparently, am in Norwich. I have been to Norwich a few times, and, in
fact, i think i've walked along the very street where i'm supposedly
located, but i don't think i've ever posted news from there. I read this
group via an SSH connection from my office (in north central London) or
home (in north-east inner London), or elsewhere, to a shell account on
urchin.earth.li, a machine colocated at an ISP (in Docklands, London),
which peers at three POPs (probably also in Docklands, London).

The domain earth.li, in which the machine lives, however, was registered
by someone who gives their address as being in Norwich, which i guess is
where that comes from.

What it doesn't explain is why Sion Arrowsmith is also down as being in
Norwich - i don't know Sion from Eve, but based on the fact that she's a
chiark.greenend.org.uk user, i'd guess she's in Cambridge. Now, chiark has
no links to Norwich that i can see, but it is also colocated at the same
ISP as urchin (chiark and urchin are sort of mirror images of each other
in many ways) - is this a case of 'Norwich by association'?

tom

--
Exceptions say, there was a problem. Someone must deal with it. If you
won't deal with it, I'll find someone who will.

Michael

Nov 9, 2005, 12:37:05 PM
Paul McGuire wrote:

As long as it gets my location WRONG, I'm happy.

:-|


Michael.
