Quirks in address parser


Tall Steve

Apr 11, 2006, 2:41:25 PM
to byCycle
I was going through the addresses of Pedalpalooza events, trying to
make sure that as many as possible are parseable by
tripplanner.bycycle.org, and I found several addresses that can't be
resolved because the tripplanner finds multiple *IDENTICAL* matches, as
though there's some duplication in your map data:

10950 SE Division
2503 SE Division
Glisan and NE 39th Ave
1996 SE Ladd Ave
NE 22nd and Killingsworth

The following can't be resolved either. No mystery, but it would have
been nice. This is the official address of the Rose Garden.

One Center Court

Also, Google has some interesting quirks in its own map data. The map
shows the front straightaway at Portland International Raceway as "N.
Cottonwood St.", and the back straightaway as "N. Vanport Rd." I knew
that PIR was built on the abandoned streets of Vanport after the flood,
but it's still kind of amusing that Google would be using such old
names.

Wyatt Baldwin

Apr 11, 2006, 3:07:51 PM
to byC...@googlegroups.com
> I was going through the addresses of Pedalpalooza events, trying to
> make sure that as many as possible are parseable by
> tripplanner.bycycle.org, and I found several addresses that can't be
> resolved because the tripplanner finds multiple *IDENTICAL* matches, as
> though there's some duplication in your map data:

Funny that you mention it. We are working on this problem right now
(and some related issues).


> 10950 SE Division
> 2503 SE Division
> Glisan and NE 39th Ave
> 1996 SE Ladd Ave
> NE 22nd and Killingsworth

For intersection addresses, I would suggest using a street address
instead, for example, 2200 NE Killingsworth. The problem here is that the
street jogs, so there are technically two intersections.

For some of the postal addresses, you might try adding the zip code or
street type to help disambiguate. I found that 10950 SE Division could
be either 97216 or 97140.
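As an illustration of why a zip code helps, here is a minimal sketch of narrowing a candidate list by zip code. It is not byCycle's code; the `Candidate` structure and the `narrow_by_zip` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One geocoder match (hypothetical structure, not byCycle's)."""
    number: int
    street: str
    zip_code: str


def narrow_by_zip(candidates: List[Candidate],
                  zip_code: Optional[str]) -> List[Candidate]:
    """Keep only the candidates in the requested zip code.

    Falls back to the full list when no zip code was given or when
    nothing matches, so the user can still be asked to choose.
    """
    if not zip_code:
        return candidates
    narrowed = [c for c in candidates if c.zip_code == zip_code]
    return narrowed or candidates
```

With the two 10950 SE Division matches mentioned above (97216 and 97140), supplying 97216 would leave a single candidate. Falling back to the full list rather than returning nothing is a design choice for this sketch, so that a wrong zip code does not hide every result.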

Other than that, there are some cases of multiple matches that might
not be resolvable.


> The following can't be resolved either. No mystery, but it would have
> been nice. This is the official address of the Rose Garden.
>
> One Center Court

This is the other big problem with our address lookup. In this case,
the system thinks "court" is the street type instead of part of the
name. Also, it won't recognize "one" as a street number. This should
work: 1 center court st.
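Both problems can be seen in a toy parser. This is a minimal sketch assuming a simple "<number> <name> [<type>]" layout; the `parse_address` function and its tiny `NUMBER_WORDS` and `STREET_TYPES` tables are hypothetical and much smaller than anything a real parser would need. It accepts a spelled-out house number, but it still reads a trailing "Court" as the street type.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables; a real parser would need far larger ones.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
STREET_TYPES = {"st", "street", "ave", "avenue", "rd", "road",
                "ct", "court", "dr", "drive", "blvd", "ln", "way"}


def parse_address(text: str) -> Tuple[int, str, Optional[str]]:
    """Split '<number> <name> [<type>]' into (number, name, type)."""
    tokens = text.replace(",", " ").split()
    if not tokens:
        raise ValueError("empty address")

    # Accept either digits or a spelled-out word for the house number.
    first = tokens[0].lower()
    if first.isdigit():
        number = int(first)
    elif first in NUMBER_WORDS:
        number = NUMBER_WORDS[first]
    else:
        raise ValueError(f"no house number in {text!r}")

    rest = tokens[1:]
    street_type = None
    # Treat the last word as a street type only if a name word remains.
    if len(rest) >= 2 and rest[-1].lower().rstrip(".") in STREET_TYPES:
        street_type = rest[-1].lower().rstrip(".")
        rest = rest[:-1]
    if not rest:
        raise ValueError(f"no street name in {text!r}")
    return number, " ".join(rest), street_type
```

Here `parse_address("One Center Court")` returns `(1, "Center", "court")`: the number word is handled, but "Court" is still taken as the type. `parse_address("1 center court st")` returns `(1, "center court", "st")`, which is why adding an explicit street type works around the ambiguity.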

> Also, Google has some interesting quirks in its own map data. The map
> shows front straightaway at Portland International Raceway as "N.
> Cottonwood St.", and the back straightaway as "N. Vanport Rd." I knew
> that PIR was built on the abandoned streets of Vanport after the flood,
> but it's still kind of amusing that Google would be using such old
> names.

I've also noticed that Google's (NavTeq or whoever) data isn't
accurate in a lot of places. You can sometimes see this when a route
looks off on the map, but then you turn on the satellite imagery and it
actually lines up. Metro's data is just about
perfect.

~wyatt

Tall Steve

Apr 13, 2006, 11:09:55 PM
to byCycle
Whoa, the code to look up addresses is trickier than I imagined.

Do you suppose you could have a rule that, if multiple possible matches
are all within 30 meters of each other, then just use their average
location? The thing is, I found those five addresses while digging
through maybe 60 different addresses for events. That's about 8%
rejections, which seems kind of high considering *I* can find all of
them on a paper map without any significant ambiguity.
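The 30-meter rule proposed above could be sketched like this. It is a minimal illustration, not byCycle code; the (lat, lon) tuple representation, the function names, and the use of the haversine formula for distance are all assumptions made for the example.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in meters."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def collapse_nearby_matches(points: List[Point],
                            max_spread_m: float = 30.0) -> Optional[Point]:
    """Return the average location if every pair of matches lies within
    max_spread_m of each other; otherwise return None so the caller can
    fall back to asking the user to choose."""
    if not points:
        return None
    for p, q in combinations(points, 2):
        if haversine_m(p, q) > max_spread_m:
            return None
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)
```

Checking every pair matches the wording "all within 30 meters of each other". Averaging raw latitude and longitude is a reasonable shortcut over such short distances, though it would misbehave for points straddling the 180° meridian.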

Wyatt Baldwin

Apr 13, 2006, 11:56:05 PM
to byC...@googlegroups.com
On 4/13/06, Tall Steve <skirk...@dsl-only.net> wrote:
>
> Whoa, the code to look up addresses is trickier than I imagined.

It's about to get trickier. We're tricking it out right now. Well,
not right at this moment, but this week, and probably next.


> Do you suppose you could have a rule that, if multiple possible matches
> are all within 30 meters of each other, then just use their average
> location?

We are working on a different solution that should get rid of most of
these types of duplicate matches, but this is an interesting fallback
idea.


~wyatt

Jack Hirt

Apr 14, 2006, 9:05:39 AM
to byCycle

All,

I'm not sure if I missed a previous email about this or not, but I also have this problem in Milwaukee. Multiple addresses are found, especially when I try to enter an intersection. Then, when I pick one from the list of matches the application gives me, the whole cycle starts over again and I'm not able to use the intersection.

Is this part of the problem that is trying to be addressed here?

Jack Hirt
The Bicycle Federation of WI


uuellbee

Apr 24, 2006, 7:13:55 PM
to byCycle
We have uploaded some changes to the address parsing and geocoding code
that seem to have taken care of the issues discussed here. Please let
us know of any addresses that still aren't being parsed correctly.
