Re: Gaz Data Model


Shekhar Krishnan

Oct 28, 2012, 12:31:45 AM
to nypl-ga...@googlegroups.com
Folks:

This is in reply to the request for the Gazetteer database and schema/model.

See the first database release for the Library of Congress gazetteer at:

http://topomancy.com/gazetteer/release/20121008.zip

The ZIP file is about 1.3 GB in size and contains data in compressed
JSON format for 12 million unconflated places drawn from OpenStreetMap
and GeoNames, including full geometries and alternate names.

The data is broken into 2,552 separate files for direct import into
ElasticSearch. The archive also includes the database schema, a
semi-automated import script, installation documentation, and sample
queries.
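
For anyone who wants a feel for what a per-file import involves before opening the archive, here is a minimal sketch in Python. It is not the import script shipped in the release. It assumes each file is gzip-compressed with one JSON place per line, and that each place has an "id" field; the index name "gazetteer" is also made up for illustration. The only part taken from ElasticSearch itself is the bulk format: an action line followed by the document, newline-delimited.

```python
import gzip
import json
from typing import Iterable, Iterator


def read_places(path: str) -> Iterator[dict]:
    """Yield one place per line from a gzip-compressed, line-delimited JSON file.

    Assumption: the release files use gzip and one JSON object per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def to_bulk_body(places: Iterable[dict], index: str = "gazetteer") -> str:
    """Build a bulk request body: an action line, then the document itself.

    Assumption: every place carries an "id" field (hypothetical name).
    """
    lines = []
    for place in places:
        action = {"index": {"_index": index, "_id": place["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(place))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The string returned by to_bulk_body(read_places(path)) would be POSTed to the cluster's _bulk endpoint, one request per file. The schema and import script in the archive remain the authority on actual field names and file layout.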

We'll be building the NYPL Gazetteer around this schema, but on an urban
scale for buildings and similar features. See this link from Matt for
the data sources that will make up the NYPL Gaz:

https://docs.google.com/a/topomancy.com/folder/d/0B-Ng8P9yTR2Ub3lpRFF3T1RjbU0/edit

We discussed the Labs team helping with ETL (extract, transform, load)
scripts to get these varied data sources into a preliminary NYC
Gazetteer db for our work this week.
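
As a rough illustration of the shape such an ETL script might take, here is a minimal Python sketch. It assumes a source delivered as CSV with "name", "lat", and "lon" columns; those column names and the output keys are hypothetical and are not taken from the gazetteer schema. The one fixed convention used is GeoJSON's coordinate order for points, which is longitude first.

```python
import csv
import io
from typing import Iterator


def extract(csv_text: str) -> Iterator[dict]:
    """Read rows from CSV text, one dict per row keyed by the header line."""
    return csv.DictReader(io.StringIO(csv_text))


def transform(row: dict, source: str) -> dict:
    """Map one source row onto a common place record.

    Assumption: the columns "name", "lat", and "lon" exist in the source;
    the output keys here are placeholders, not the real schema.
    """
    return {
        "name": row["name"].strip(),
        "source": source,
        "geometry": {
            "type": "Point",
            # GeoJSON points are ordered [longitude, latitude].
            "coordinates": [float(row["lon"]), float(row["lat"])],
        },
    }
```

Each source would get its own transform function, while the load step stays shared so that every source ends up in the same record shape.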

Best,


Shekhar


On Tuesday 23 October 2012 11:37 AM, David Riordan wrote:
> Hey guys - would it be possible for us to get a copy of the latest
> gazetteer data model?
>
>
> David Riordan | Product Manager, NYPL Labs | @NYPL_Labs
> <http://twitter.com/nypl_labs>
> davidr...@nypl.org <mailto:davidr...@nypl.org> | m 203.521.1222 |
> @riordan <http://twitter.com/riordan>
>

--

Shekhar Krishnan
Topomancy LLC
48, Whipple Street, #1B
Brooklyn, NY 11206, U.S.A.

http://shekhar.cc

Schuyler Erle

Oct 28, 2012, 2:51:13 AM
to Shekhar Krishnan, nypl-ga...@googlegroups.com

And we'll have a new version of this ready on Monday, along with access to the Git repo for everyone I have a key for.



SDE

--
Sent over a 300 baud modem from a rotary dial payphone.

Matt Knutzen

Oct 28, 2012, 6:22:39 PM
to Shekhar Krishnan, nypl-ga...@googlegroups.com
NYPL is officially closed tomorrow.

Schuyler Erle

Oct 28, 2012, 6:33:30 PM
to Matt Knutzen, Shekhar Krishnan, nypl-ga...@googlegroups.com

Nevertheless, if the power stays on, we still hope to have a ticket tracker up, give everyone access to the code repo, and have a new database release by Tuesday.

I'd still like to discuss in further detail what we hope to accomplish after the tide waters recede and the dove comes back with the olive branch.



