City Correction


tim.d...@seisan.com

May 12, 2009, 2:18:58 PM
to jgeocoder-users
I've run into an issue where the
AddressStandardizer.normalizeParsedAddress call is "correcting" the
city. The data behind it seems to be out of date or incorrect: Vine
Grove KY 40175 keeps being corrected to Custer KY 40175. Can this be
disabled or overridden? Or can we get access to the data source for
this information so we can update and correct it?

Ryan

May 13, 2009, 11:56:28 AM
to jgeocoder-users
I ran into this problem a while ago:

http://groups.google.com/group/jgeocoder-users/browse_thread/thread/6bf83657218be3c7?hl=en

Vine Grove is just a small example of the underlying issue.

This postal aliasing is actually very useful for geocoding, because it
allows non-city things (like a neighborhood name) to turn into
something that is found in the Tiger DB. However, it's hell if you
rely on the normalized version as your parse result. Either use the
city from the parsed version or, if you can write Java, it's pretty
easy to move the aliasing step out of the AddressStandardizer and into
the space between the normalization and the geocoding. Or at least I
think it is; my version is very hacked up right now. But that's one
of the first things I did.

BTW, you can actually modify the postal alias data file pretty easily,
it's a flat file in the resources directory, unlike the Berkeley DB or
the Tiger data. But that's a slippery slope that I would avoid.
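
For the "use the city from the parsed version" route, a minimal sketch follows. It assumes the parsed and normalized results are available as plain string maps; the class name, the `CITY` key, and the map layout are hypothetical stand-ins for illustration, not jgeocoder's actual types.

```java
import java.util.HashMap;
import java.util.Map;

final class KeepParsedCity {
    // Hypothetical field key; jgeocoder's real component type is not used here.
    static final String CITY = "CITY";

    /**
     * Returns a copy of the normalized fields in which the city is taken
     * from the parsed fields, so any alias "correction" is discarded.
     * Neither input map is modified.
     */
    static Map<String, String> withParsedCity(Map<String, String> parsed,
                                              Map<String, String> normalized) {
        Map<String, String> result = new HashMap<>(normalized);
        String city = parsed.get(CITY);
        if (city != null) {
            result.put(CITY, city);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        parsed.put(CITY, "VINE GROVE");
        parsed.put("STATE", "KY");

        // Stand-in for the output of the normalization step.
        Map<String, String> normalized = new HashMap<>(parsed);
        normalized.put(CITY, "CUSTER");

        System.out.println(withParsedCity(parsed, normalized).get(CITY));
    }
}
```

The aliased city can still be kept separately and handed to the geocoder, which is where the aliasing is useful.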

Ryan Levering


tim.d...@seisan.com

May 13, 2009, 12:07:13 PM
to jgeocoder-users
Yeah, I found the flat file. The problem is I'm only using jgeocoder
to parse the address string. I need to check an override database
before sending the geocode request to the MapQuest geocoder. I just
really wanted it to standardize things like: change Road to RD, Drive
to DR, West to W, etc. The major issue is that fixing cities is not
standardization, it's correction. The weird thing is that the file
doesn't follow the USPS standard. I know Tiger data is limited, but
it's usually close to the USPS zip results.
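
To illustrate the abbreviation-only kind of standardization described here, a rough sketch follows. The class and its three-entry table are hypothetical and contain only the examples named above; this is not how jgeocoder does it, and a whole-token replace like this would also rewrite a street that is actually named "West".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class StreetAbbreviator {
    // Only the examples from this thread; a real table would be much larger.
    static final Map<String, String> ABBREVIATIONS = new HashMap<>();
    static {
        ABBREVIATIONS.put("ROAD", "RD");
        ABBREVIATIONS.put("DRIVE", "DR");
        ABBREVIATIONS.put("WEST", "W");
    }

    /** Upper-cases the line and replaces whole tokens found in the table. */
    static String abbreviate(String streetLine) {
        StringBuilder out = new StringBuilder();
        for (String token : streetLine.trim().toUpperCase(Locale.ROOT).split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(ABBREVIATIONS.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("123 West Main Road"));
    }
}
```

Nothing in it touches the city, which is the point: abbreviation and city correction are separate steps.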

Jay Liang

May 13, 2009, 1:15:49 PM
to jgeocod...@googlegroups.com
I am sure at this point the data files are quite outdated. Unfortunately I don't have time to keep the data up to date any longer, but since you have the source code, you can see where the alias correction happens. It's at

http://jgeocoder.svn.sourceforge.net/viewvc/jgeocoder/jgeocoder/src/main/java/net/sourceforge/jgeocoder/us/AddressStandardizer.java?revision=234
    private static String resolveCityAlias(String city, String state){
        return AliasResolver.resolveCityAlias(city, state);
    }

You are free to update the data file; see http://bend-ing.blogspot.com/2008/06/us-city-alias-and-string-interning.html for how it was generated.
Or you can simply disable the alias name correction by making the function return the original city input (requires a recompile).
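
A standalone sketch of that second option is below. Only the method name and signature come from the snippet above; the class, the `aliasCorrectionEnabled` switch, and the one-entry alias table are hypothetical stand-ins for illustration, and the real lookup lives in AliasResolver and its data file.

```java
import java.util.HashMap;
import java.util.Map;

final class CityAliasToggle {
    // Hypothetical switch; not part of jgeocoder.
    static boolean aliasCorrectionEnabled = false;

    // Stand-in for the alias data file: "CITY|STATE" -> replacement city.
    static final Map<String, String> ALIASES = new HashMap<>();
    static {
        // Mirrors the behavior reported in this thread.
        ALIASES.put("VINE GROVE|KY", "CUSTER");
    }

    /** Returns the city unchanged unless alias correction is switched on. */
    static String resolveCityAlias(String city, String state) {
        if (!aliasCorrectionEnabled) {
            return city;
        }
        String alias = ALIASES.get(city + "|" + state);
        return alias != null ? alias : city;
    }

    public static void main(String[] args) {
        System.out.println(resolveCityAlias("VINE GROVE", "KY"));
    }
}
```

A switch rather than a hard-coded `return city;` keeps the aliasing available for the geocoding path, where it is still useful.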

Lastly, different applications have different address parsing requirements. jgeocoder is not at a point where you can plug and play; you may need to pick it apart and use the bits and pieces that are useful in your apps.

Ryan

May 14, 2009, 7:23:08 AM
to jgeocoder-users
The only "standardization" that it does on the city field is a St ->
Saint replace, so you can just use the parsed city field if you don't
want to mess with the code.

The postal file was also missing several hundred (maybe several
thousand? I forget exactly) zip codes when I ran it against a
different paid zip code DB that I have, if you want another reason to
avoid the aliasing. They are probably new zip codes or something,
since we didn't hit them in our data for a while.
