Awesome - you beat me to playing around with Placemaker. It looks to
be giving some good results. With the things I've been playing around
services over Google's, but I still need to do further work. Thanks for
> Hi,
> This is still a work in progress, but I thought others might be interested.
> I've been having a look at geoparsing a small portion of the corpus of
> DigitalNZ metadata to see whether relevant text can be extracted from it
> and geographic locations identified. I think I might have found a way
> forward...
> Initially (reusing code from a previous project a year ago) I tried
> parsing the text and throwing strings of consecutive words starting with
> capitals at the Google geocoder (e.g. "Wellington High School, Wellington").
> However, this produced noisy results, especially when the Google geocoder
> returned locations for the likes of "View", "Photographer" and other random
> strings, particularly at the start of sentences. The main problem I had was
> isolating a geographical region with reasonable precision without the
> noise getting in the way.
> Besides innocuous-looking random words producing geographic results, less
> innocuous ones also caused issues. For example, the string "Looking along
> Oriental Parade, Wellington. Taken by Sydney Charles Smith circa 1912"
> would end up matching Sydney in Australia, which is not exactly desirable.
> So it was time to look for other APIs to filter out non-geographic text
> prior to geoparsing.
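A minimal Ruby sketch of the capitalised-run extraction described above, to show where the noise comes from. The regex and the helper name `capitalised_runs` are hypothetical (not taken from the original scripts), and the pattern assumes plain ASCII capitals, so names with macrons or hyphens would need a broader pattern.

```ruby
# Hypothetical helper: pull out runs of consecutive capitalised words to use
# as candidate place names for a geocoder. A comma between capitalised words
# keeps them in one run ("Wellington High School, Wellington"); a lowercase
# word or any other punctuation ends the run. ASCII capitals only.
CAPITALISED_RUN = /\b[A-Z][\w']*(?:,?\s+[A-Z][\w']*)*/

def capitalised_runs(text)
  text.scan(CAPITALISED_RUN)
end

p capitalised_runs("Wellington High School, Wellington")
# => ["Wellington High School, Wellington"]
p capitalised_runs("Looking along Oriental Parade, Wellington. " \
                   "Taken by Sydney Charles Smith circa 1912")
# => ["Looking", "Oriental Parade, Wellington", "Taken", "Sydney Charles Smith"]
```

On the Sydney example, the sentence-initial words "Looking" and "Taken" and the photographer's name all come out as candidates alongside the real place, which is the kind of noise described.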
> Open Calais (http://www.opencalais.com/) was the first API to come to the
> rescue. It is provided by Reuters and attempts to extract metadata such as
> politicians, organisations, people and many other categories from submitted
> text alone. With it I was able to identify the likes of "Bank of New
> Zealand" as an organisation (thus avoiding 'Bank' geomatching somewhere in
> the USA) and our friend Sydney Charles Smith as a person, and so exclude
> them from geoparsing queries.
> Still not quite there yet, though :) A couple of nights ago I also came
> across Yahoo Placemaker (http://developer.yahoo.com/geo/placemaker/), and
> it seems to do a good job of identifying a region for a piece of text,
> which was the main issue I was having. It was still matching Sydney
> Charles Smith as Sydney, and indeed appears to be case-insensitive, as
> 'sydney charles smith' also matched the Australian city. I instead opted to
> remove such text completely using Open Calais and started to see much more
> accurate results.
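One way to produce the two text variants that appear in the example output (entity strings removed for Yahoo, lowercased for Google), sketched in Ruby. It assumes the Calais tags have already been parsed into a hash of entity type => strings, mirroring the CALAIS TAGS dumps. That hash shape and the helper names are hypothetical, not the Open Calais response format. Lowercasing is presumably enough for the Google pass because a capitalised-run extractor then skips those words, whereas Placemaker appears to match case-insensitively, so for Yahoo the strings have to go entirely.

```ruby
# Entity types treated as non-geographic (people and organisations, per the
# post; others may be worth adding).
NON_GEO_TYPES = %w[Person Organization].freeze

# tags: hypothetical hash of Calais entity type => array of matched strings.
def non_geo_strings(tags)
  NON_GEO_TYPES.flat_map { |type| tags.fetch(type, []) }
end

# For Placemaker: delete the strings outright (literal, not regex, matching).
def text_for_yahoo(text, tags)
  non_geo_strings(tags).reduce(text) { |t, s| t.gsub(s) { "" } }
end

# For the Google pass: lowercase the strings so a capitalised-run
# extractor no longer picks them up.
def text_for_google(text, tags)
  non_geo_strings(tags).reduce(text) { |t, s| t.gsub(s) { s.downcase } }
end

tags = {
  "Person" => ["Sydney Charles Smith"],
  "City" => ["Newtown", "Wellington"],
  "Organization" => ["Wellington Public Hospital"]
}
text = "Part 2 of a 2 part panorama of Wellington Public Hospital, Newtown, " \
       "Wellington, taken in 1910 by Sydney Charles Smith."
puts text_for_yahoo(text, tags)
puts text_for_google(text, tags)
```

On the description of record 63489 this reproduces the TEXT FOR YAHOO and TEXT FOR GOOGLE lines shown in the output.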
> Currently I have Ruby scripts that do the following:
> - Download a natlib record by id and squirrel the metadata away in a
> database on my laptop to avoid hitting the API a second time
> - Create a metadata string for geoparsing by concatenating the title,
> description, coverage, subject and placenames
> - Use Open Calais to remove organisation and people's names (there may be
> other entity types worth removing) and thus avoid noise
> - Use Yahoo Placemaker to get a list of regions / towns / suburbs, as well
> as the geographic extents (lat/lon bounding box)
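The first two steps in that list might look roughly like this in Ruby. A plain Hash stands in for the laptop database, and the field names (`:title`, `:description`, and so on) and `cached_record` are hypothetical. The real natlib/DigitalNZ record keys may differ.

```ruby
# Fields concatenated into the geoparsing string (hypothetical key names).
GEO_FIELDS = %i[title description coverage subject placenames].freeze

# Fetch a record once and keep it locally; the block does the real API call.
# A Hash stands in for the local database here.
def cached_record(cache, id)
  cache[id] ||= yield(id)
end

# Join the geo-relevant fields, one value per line, skipping blanks.
# Array() lets a field hold either a single string or a list of strings.
def metadata_string(record)
  GEO_FIELDS.flat_map { |field| Array(record[field]) }
            .map { |value| value.to_s.strip }
            .reject(&:empty?)
            .join("\n")
end

cache = {}
record = cached_record(cache, 75636) do |_id|
  # Placeholder for the real API call.
  { title: "Electric tram in Invercargill",
    coverage: ["Invercargill City", "1912"],
    subject: "Trams - New Zealand - Southland Region" }
end
puts metadata_string(record)
```

A second `cached_record(cache, 75636) { ... }` call returns the stored record without running the block again.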
> The next step (yet to be done) is to reuse my earlier Google Geocoder
> parsing, but with the regional information and bounding box provided by
> the Yahoo API used to filter out the noise.
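A sketch of that planned filtering step in Ruby: keep only geocoder hits whose point falls inside the Placemaker extents. `BoundingBox` and `within_extents` are hypothetical names, and the geocoder results are assumed to have already been reduced to [name, lat, lon] triples. The sample box is the EXTENTS of record 75636 in the output. The Sydney coordinates are approximate.

```ruby
# South-west / north-east corners, as in Placemaker's EXTENTS output.
BoundingBox = Struct.new(:sw_lat, :sw_lon, :ne_lat, :ne_lon) do
  def contains?(lat, lon)
    return false unless lat.between?(sw_lat, ne_lat)

    if sw_lon <= ne_lon
      lon.between?(sw_lon, ne_lon)
    else
      # Box crosses the 180th meridian (e.g. to take in the Chatham Islands).
      lon >= sw_lon || lon <= ne_lon
    end
  end
end

# results: [name, lat, lon] triples from the geocoder (hypothetical shape).
def within_extents(results, box)
  results.select { |_name, lat, lon| box.contains?(lat, lon) }
end

box = BoundingBox.new(-47.2911, 166.427, -44.2551, 169.279)
hits = [
  ["Invercargill", -46.4118, 168.352],
  ["Sydney, Australia", -33.87, 151.21] # approximate
]
p within_extents(hits, box)
# => [["Invercargill", -46.4118, 168.352]]
```

The meridian check matters for New Zealand in particular, since a box that includes the Chatham Islands has its south-west longitude greater than its north-east one.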
> Appended below is some example output from my scripts - comments welcome :)
> Cheers
> Gordon
> ORIGINAL METATEXT FOR RECORD:75636
> ====
> Electric tram in Invercargill
> Electric tram in Invercargill (destination: Georgetown), circa 1912. It
> bears an advertisement for Dominion Tread Tires. Photographer unidentified.
> Invercargill City
> 1912
> Trams - New Zealand - Southland Region
> FILTERING
> ========
> CALAIS TAGS:
> ---
> Position:
> - Photographer
> City:
> - Georgetown
> - Invercargill City
> Country:
> - New Zealand
> Relations:
> - ""
> TEXT FOR YAHOO
> ======
> Electric tram in Invercargill
> Electric tram in Invercargill (destination: Georgetown), circa 1912. It
> bears an advertisement for Dominion Tread Tires. Photographer unidentified.
> Invercargill City
> 1912
> Trams - New Zealand - Southland Region
> TEXT FOR GOOGLE
> =======
> Electric tram in Invercargill
> Electric tram in Invercargill (destination: Georgetown), circa 1912. It
> bears an advertisement for Dominion Tread Tires. Photographer unidentified.
> Invercargill City
> 1912
> Trams - New Zealand - Southland Region
> RESULTS
> LOC:2348887 Invercargill, Southland, NZ Town -46.4118 168.352 WEIGHT=1 CONFIDENCE=10
> LOC:15021750 Southland, NZ State -45.4649 167.853 WEIGHT=1 CONFIDENCE=10
> LOC:28644394 Georgetown, Invercargill, Southland, NZ Suburb -46.4201 168.369 WEIGHT=1 CONFIDENCE=10
> LOC:55875899 Invercargill City, Southland, NZ County -46.474 168.369 WEIGHT=1 CONFIDENCE=10
> EXTENTS
> SW: -47.2911, 166.427
> NE: -44.2551, 169.279
> GEO SCOPE
> Invercargill City, Southland, NZ
> ADMIN SCOPE
> Invercargill City, Southland, NZ
> ORIGINAL METATEXT FOR RECORD:63489
> ====
> Part 2 of a 2 part panorama of Wellington Public Hospital, Newtown,
> Wellington
> Part 2 of a 2 part panorama of Wellington Public Hospital, Newtown,
> Wellington, taken in 1910 by Sydney Charles Smith.
> Newtown
> 1910
> Hospitals - New Zealand - Wellington Region
> Wellington Hospital
> FILTERING
> ========
> CALAIS TAGS:
> ---
> Person:
> - Sydney Charles Smith
> City:
> - Newtown
> - Wellington
> Country:
> - New Zealand
> Relations:
> - ""
> Organization:
> - Wellington Public Hospital
> *REMOVING FROM GEO SEARCH:Sydney Charles Smith*
> *REMOVING FROM GEO SEARCH:Wellington Public Hospital*
> TEXT FOR YAHOO
> ======
> Part 2 of a 2 part panorama of , Newtown, Wellington
> Part 2 of a 2 part panorama of , Newtown, Wellington, taken in 1910 by .
> Newtown
> 1910
> Hospitals - New Zealand - Wellington Region
> Wellington Hospital
> TEXT FOR GOOGLE
> =======
> Part 2 of a 2 part panorama of wellington public hospital, Newtown,
> Wellington
> Part 2 of a 2 part panorama of wellington public hospital, Newtown,
> Wellington, taken in 1910 by sydney charles smith.
> Newtown
> 1910
> Hospitals - New Zealand - Wellington Region
> Wellington Hospital
> RESULTS
> LOC:2351310 Wellington, Wellington, NZ Town -41.2805 174.767 WEIGHT=1 CONFIDENCE=6
> LOC:22726472 Newtown, Wellington, Wellington, NZ Suburb -41.3142 174.779 WEIGHT=1 CONFIDENCE=10
> EXTENTS
> SW: -41.3491, 174.694
> NE: -41.1882, 174.847
> GEO SCOPE
> Wellington, Wellington, NZ
> ADMIN SCOPE
> Wellington, Wellington, NZ
> ORIGINAL METATEXT FOR RECORD:26868
> ====
> Bowling Green Invercargill F G R 6824
> Showing the exterior of the Invercargill Bowling Club in Yarrow Street,
> between Doon and Elles Road
> Southland Region (N.Z.)
> Southland Region (N.Z.)
> Invercargill
> Bowling Clubs
> Yarrow Street
> FILTERING
> ========
> CALAIS TAGS:
> ---
> Facility:
> - Elles Road
> - Yarrow Street
> Relations:
> - ""
> Organization:
> - Invercargill Bowling Club
> REMOVING FROM GEO SEARCH:Invercargill Bowling Club
> TEXT FOR YAHOO
> ======
> Bowling Green Invercargill F G R 6824
> Showing the exterior of the in Yarrow Street, between Doon and Elles Road
> Southland Region (N.Z.)
> Southland Region (N.Z.)
> Invercargill
> Bowling Clubs
> Yarrow Street
> TEXT FOR GOOGLE
> =======
> Bowling Green Invercargill F G R 6824
> Showing the exterior of the invercargill bowling club in Yarrow Street,
> between Doon and Elles Road
> Southland Region (N.Z.)
> Southland Region (N.Z.)
> Invercargill
> Bowling Clubs
> Yarrow Street
> RESULTS
> LOC:2348887 Invercargill, Southland, NZ Town -46.4118 168.352 WEIGHT=1 CONFIDENCE=7
> LOC:2367481 Bowling Green, KY, US Town 36.9946 -86.4456 WEIGHT=1 CONFIDENCE=3
> LOC:2392943 Doon, IA, US Town 43.2782 -96.2359 WEIGHT=1 CONFIDENCE=5
> LOC:15021750 Southland, NZ State -45.4649 167.853 WEIGHT=1 CONFIDENCE=7
> LOC:23424916 New Zealand Country -43.5877 170.367 WEIGHT=1 CONFIDENCE=7
> EXTENTS
> SW: -52.6171, -96.2436
> NE: 43.2865, 169.279
> GEO SCOPE
> Pacific/Auckland, ZZ
> ADMIN SCOPE
> New Zealand