Hi,
Better late than never... moving countries does take a while :)
I've finally set aside some time and deployed my geoparser online at
http://digitalnzgeoparser.tripodtravel.co.nz. The front page presents a
Google-like search screen. Entering a term runs a search against the
Digital NZ search API, and any metadata records in the results that are
not already cached are then retrieved one by one. This can make pages
slow to load: a search returning 10 uncached entries hits the API 11
times (once for the search plus once per record), because the search
API does not return the full metadata that I need in order to extract
geographical placenames.
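The fetch pattern described above can be sketched roughly as follows. This is only an illustration in Python, not the code running on the site: the names fetch_results, fake_search and fake_fetch_metadata are made up for the example, the cache is a plain dictionary, and the two stand-in functions replace the real Digital NZ calls so that the API hits can be counted without any network access.

```python
from typing import Callable, Dict, List


def fetch_results(
    term: str,
    search: Callable[[str], List[int]],
    fetch_metadata: Callable[[int], dict],
    cache: Dict[int, dict],
) -> List[dict]:
    """Return full metadata for every search hit, fetching only uncached records."""
    record_ids = search(term)  # one call to the search API
    records = []
    for record_id in record_ids:
        if record_id not in cache:
            # one extra API call per record that is not cached yet
            cache[record_id] = fetch_metadata(record_id)
        records.append(cache[record_id])
    return records


# Stand-ins for the real API (hypothetical), used only to count calls.
calls = {"count": 0}


def fake_search(term: str) -> List[int]:
    calls["count"] += 1
    return list(range(10))  # pretend the search returned 10 record ids


def fake_fetch_metadata(record_id: int) -> dict:
    calls["count"] += 1
    return {"id": record_id, "title": f"record {record_id}"}


cache: Dict[int, dict] = {}
fetch_results("railway station", fake_search, fake_fetch_metadata, cache)
print(calls["count"])  # 11: one search call plus ten metadata calls
```

In this sketch only the metadata records are cached, so repeating the same search would cost a single further call for the search itself.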
As an example of a cached search, try 'railway station'. The term is
independent of any geographical location, so it gives a decent
geographical spread of results (searching for Hataitai, by contrast,
would produce lots of results in the same area).
http://digitalnzgeoparser.tripodtravel.co.nz/archive_searches/search?q=railway+station
The initial page returns some images from Flickr and the Alexander
Turnbull collection. Clicking on a title of a result will redirect
the browser to the source web page, and clicking on 'Metadata' will
render a page containing the original image with all of the extracted
text I could find from the given metadata record. Where things get
more interesting is viewing the metadata for records that have been
geoparsed, so click on 'Map' on the first 2 results:
http://digitalnzgeoparser.tripodtravel.co.nz/natlib_metadatas/64803/map
http://digitalnzgeoparser.tripodtravel.co.nz/natlib_metadatas/86312/map
These metadata pages render a Google map using placenames extracted
from the text.
To see more results that have already been geoparsed, click on the following:
- 'Images' under Category (a search result page containing 20 images
will be rendered)
- 'Alexander Turnbull Library' under creator
The search results rendered are now 'Images from the Alexander
Turnbull Collection for the search term "railway station"' -
http://digitalnzgeoparser.tripodtravel.co.nz/archive_searches/search?q=railway%20station&page=1&f[]=21&f[]=10
The first couple of result pages have maps associated with the records.
Note that ultimately I'd like the facet URLs to look more like this:
http://digitalnzgeoparser.tripodtravel.co.nz/search/railway+station/category/images/provider/alexander+turnbull+collection
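As a rough idea of how such a path could be generated, here is a small Python sketch. It is not how the site builds its URLs today; the function name pretty_search_path is made up, and lowercasing the values and using '+' for spaces are assumptions taken from the example URL above.

```python
from typing import List, Tuple
from urllib.parse import quote_plus


def pretty_search_path(term: str, facets: List[Tuple[str, str]]) -> str:
    """Build a readable path from a search term and (facet name, value) pairs."""
    segments = ["search", quote_plus(term.lower())]
    for name, value in facets:
        # quote_plus turns spaces into '+' and escapes '/' so that a value
        # cannot accidentally add extra path segments
        segments.append(quote_plus(name.lower()))
        segments.append(quote_plus(value.lower()))
    return "/" + "/".join(segments)


path = pretty_search_path(
    "railway station",
    [("category", "Images"), ("provider", "Alexander Turnbull Collection")],
)
print(path)
# /search/railway+station/category/images/provider/alexander+turnbull+collection
```

Keeping the facets as ordered name/value pairs means the same selection always produces the same URL, which should also help with caching.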
Caveats:
======
1) Slow - uncached searches hit the search API multiple times, so
expect slow response times.
2) Geoparsing is too slow to run live during a web search. The
hosting provider also prevents RAM-hungry batch jobs and does not have
memcache, so I am running a cron job on my laptop that writes directly
to the production database. Geoparsing is taking roughly a minute per
National Library record at the moment, whereas 10 seconds is more the
norm locally (currently trying to fix). As such, if you search for a
place it may be a day before you see maps for it.
3) No check is made for the Digital NZ search API failing (i.e. the
Digital NZ search server being down or my key having gone over its
limit for the day).
4) I've allowed up to 50,000 facet results to be returned for any one
field. The main problem here is that the creator list can be very
long, see
http://digitalnzgeoparser.tripodtravel.co.nz/archive_searches/search?q=cricket&page=1&f[]=10
for an example. Also, see
http://groups.google.co.nz/group/digitalnz/browse_thread/thread/ae01eacfacfbd9a4?pli=1
for previous comments I have made about this.
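For caveat 3, a check could look something like the Python sketch below. Again this is only an illustration under assumptions: safe_search is a made-up name, the search function is passed in so the example runs without network access, and I am not assuming any particular status code from Digital NZ for an exhausted key (the 503 below is just a placeholder).

```python
from typing import Callable, List, Optional, Tuple
from urllib.error import HTTPError, URLError


def safe_search(
    term: str, search: Callable[[str], List[dict]]
) -> Tuple[List[dict], Optional[str]]:
    """Run search(term); return (results, None) or ([], error message)."""
    try:
        return search(term), None
    except HTTPError as err:
        # The server answered, but with an error status
        # (for example a server fault or a rejected key).
        return [], f"Digital NZ API returned HTTP {err.code}"
    except URLError as err:
        # The server could not be reached at all.
        return [], f"Digital NZ API unreachable: {err.reason}"


def broken_search(term: str) -> List[dict]:
    # Hypothetical failure used only to exercise the error path.
    raise HTTPError("http://example.invalid/search", 503, "Unavailable", None, None)


results, error = safe_search("railway station", broken_search)
print(results, error)
```

HTTPError is a subclass of URLError, so it has to be caught first. The page could then show the error message instead of failing outright.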
Please feel free to have a play around and search for the likes of
where you live, where you have been on holiday etc., as this will give
the cached metadata records a better geographical spread than I would
get by just typing in random search terms myself. I'll keep the
geoparsing running on my laptop in order to get more records mapped.
Suggestions / comments welcome
Cheers
Gordon