Hi all,
I've gone ahead and switched the script to use sws.geonames.org, so
the change should get picked up in tomorrow's run:
https://github.com/ryanfb/pleiades-plus/commit/00fe86995a40de72176f83951cd1d8304965ddc0
Leif also asked that I repost some information about the reusability
of the Pleiades Plus script on this thread:
In its current state it will probably work best for aligning
gazetteers which, like GeoNames, have machine-actionable geo
coordinates associated with names in some way. For some databases it
might be necessary to run a first pass that adds an approximate
location based on some other criteria (country, region, etc.) before
using the pleiades-plus logic to align them. This is mostly just to
filter results down to likely candidates, as otherwise any ambiguous
name will yield every permutation of name-match combinations. Thinking
about it now, a tool for doing such first-pass
processing of hierarchically-organized place resources might be
generally useful and a nice separation of concerns for the tooling (we
have other databases we'd like to align against Pleiades that have the
same problem of having their own project-specific geographic
hierarchical organization without explicit geo coordinates).
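
To make the filtering idea above concrete, here is a minimal sketch in
Python. It is not the actual pleiades-plus code. It assumes each
gazetteer can be reduced to (id, name, lat, lon) tuples, and the
function names, the 50 km cutoff, and the IDs in the usage example are
all made up for illustration:

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def align(source, target, max_km=50.0):
    """Pair records whose names match exactly and whose coordinates are close.

    source and target are iterables of (id, name, lat, lon) tuples.
    max_km is a hypothetical cutoff; pick one that suits the data.
    """
    # Index the target gazetteer by name so each lookup is cheap.
    by_name = defaultdict(list)
    for target_id, name, lat, lon in target:
        by_name[name].append((target_id, lat, lon))

    matches = []
    for source_id, name, lat, lon in source:
        # Without the distance check, every same-named place would match.
        for target_id, t_lat, t_lon in by_name.get(name, []):
            if haversine_km(lat, lon, t_lat, t_lon) <= max_km:
                matches.append((source_id, target_id))
    return matches


# Usage with hypothetical IDs: two places called "Athens", one of them nearby.
ancient = [("p1", "Athens", 37.97, 23.72)]
modern = [("g1", "Athens", 37.98, 23.73), ("g2", "Athens", 33.95, -83.38)]
print(align(ancient, modern))
```

For a database without coordinates, the first pass would have to supply
the lat/lon values (e.g. a centroid for the country or region) and use a
correspondingly larger cutoff.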
It would also be good to add some logic to pleiades-plus that goes
beyond exact string match for finding candidates, which would probably
help with GeoNames alignment as well. I'm actually at the Code4Lib
conference this week and attended a session on OpenRefine yesterday,
which got me thinking about the potential for both Pleiades and
GeoNames etc. reconciliation services for OpenRefine. One nice thing
about this is OpenRefine already has some string similarity and
clustering built in. I also wonder if there might be some potential
for a general geospatial processing extension for OpenRefine (for e.g.
spatial operations). I got a very rough first pass at a Pleiades
name reconciliation service working, which I've put up here:
https://github.com/ryanfb/reconciliation_service_skeleton
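
As a rough idea of what going beyond exact string match could look
like, here is a small Python sketch using only the standard library. It
is not OpenRefine's clustering and not what the reconciliation service
above does; the normalization steps, the function names, and the 0.8
threshold are assumptions chosen for illustration:

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Strip diacritics, case, and surrounding whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidates(name, names, threshold=0.8):
    """Return (name, score) pairs at or above the threshold, best first.

    The threshold is a hypothetical default and would need tuning.
    """
    scored = [(other, similarity(name, other)) for other in names]
    kept = [(other, score) for other, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Usage: a variant spelling is kept, an unrelated name is dropped.
print(candidates("Athenae", ["Athenai", "Roma"]))
```

Fuzzy matching widens the candidate set, so it would probably need to be
combined with some kind of location filter to keep false positives down.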
Best,
-Ryan
On Mon, Mar 24, 2014 at 4:54 PM, Leif Isaksen <lei...@googlemail.com> wrote:
> Fantastic, thanks Hugh!
>
> @Ryan, I think Hugh is right here. I guess this should be a two-letter tweak
> to your script and the cron-job will do the rest?
>
> @Hugh can sameAs.org be updated on a regular basis? Ryan's script is
> intended to run as a nightly job so that new entries to Pleiades and
> GeoNames are included. If that complicates things, are there things that
> would make it easier?
>
> @Rainer, if Hugh does this for similar alignments, I wonder if your
> gazetteer alignment tool could draw from SameAs.org directly, rather than
> locating and parsing individual alignment files?
>
> @Everyone else - as Tom suggests, it would be great to hear about similar
> alignment activity, or even just requests for specific alignments
> (PastPlace? TGN? TMGeo? Ordnance Survey?). In many cases, Ryan's work may
> get us most of the way there already.
>
> All the best
>
> L.
>
>
>
>
> On Mon, Mar 24, 2014 at 4:40 PM, Hugh Glaser <hugh....@gmail.com> wrote:
>>