We're now preparing to put our place data in an exportable form, and I
wanted to get feedback from the tech community on which format would be
most preferred. The issue is that the data is fairly sizable: currently
in the 100GB range, and growing. By the nature of what we're doing,
we'll have real-time updates in the future, but we wanted to start a
little simpler for now. I was thinking of a weekly full dump, followed
by daily deltas. Also complicating the export is that we're always
adding to the types of structured data we collect, so we don't want our
changing schema to disrupt our partners.
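To make the dump-plus-deltas idea concrete, here is a rough sketch of how a
partner might apply one day's delta on top of a weekly full dump. Nothing in
it is decided: the one-JSON-object-per-line layout, the `op`, `id` and
`fields` keys, and the `apply_delta` helper are all assumptions made up for
illustration. A real consumer would apply changes to a database rather than
an in-memory dict at this data size.

```python
import json


def apply_delta(places, delta_lines):
    """Apply one day's delta to a copy of the full dump held in a dict.

    places: dict mapping place id -> record dict (from the weekly dump)
    delta_lines: iterable of JSON strings, one change per line
    (The "op" / "id" / "fields" layout is hypothetical.)
    """
    for line in delta_lines:
        change = json.loads(line)
        place_id = change["id"]
        if change["op"] == "delete":
            # The place closed or was removed; ignore ids we never had.
            places.pop(place_id, None)
        else:
            # "upsert": merge the changed fields into the existing record,
            # creating the record if it is new. Fields the partner has
            # never seen before are simply carried along.
            places.setdefault(place_id, {}).update(change["fields"])
    return places


# Hypothetical weekly dump containing two places.
places = {
    "p1": {"name": "Joe's Diner", "country": "US"},
    "p2": {"name": "Cafe Nord", "country": "NO"},
}

# Hypothetical daily delta: a new field on p1, p2 removed, p3 added.
delta = [
    '{"op": "upsert", "id": "p1", "fields": {"cuisine": "American"}}',
    '{"op": "delete", "id": "p2"}',
    '{"op": "upsert", "id": "p3", "fields": {"name": "Tea House", "country": "IN"}}',
]

result = apply_delta(places, delta)
print(sorted(result))
```

Because an upsert only merges the fields it carries, a newly added field
would not break a partner who ignores keys they don't recognize, which is
the property we're after with the changing schema.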
Formats we were considering include:
1) CSV - always a good standby
2) XML - flexible but you pay the cost in added size
3) Native DB formats - but which one?
4) AWS snapshots - we use AWS so always a possibility
5) Others?
I would really appreciate any feedback if you have a preference...
Thank you!
-Grant
+++++++++++++++++++++++
A bit about our data:
+ We'll start by publishing about 20 million places in over 90
countries, and we see this growing to around 100 million. We'll have
complete coverage of the US and Canada, and we're now working on the UK,
Australia, Scandinavia and India.
+ We're language agnostic and can publish data in our supported
languages (currently, we support English, Spanish, French, Finnish,
Norwegian, Dutch and Polish) -- and this will grow as more users
translate.
+ We collect the basics like place name, address, contact information,
website, lat / long
+ We also collect place-specific data that depends on the place type
(for a restaurant, this includes type of cuisine, payment types
accepted, etc.).
+ We're in the process of verifying the places and adding more
information. Some places are mere stubs, and others are well built
out.
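As a rough illustration of the record shape described in the points above,
a consumer could treat a fixed set of basic fields as the stable core and
everything else as type-specific extras. The field names, the
`CORE_FIELDS` tuple and the `split_record` helper are assumptions invented
for this sketch, not our actual schema.

```python
import json

# Basic fields every place might carry (names are assumed, not final).
CORE_FIELDS = ("name", "address", "phone", "website", "lat", "lon")


def split_record(record):
    """Separate the fixed core fields from type-specific attributes.

    Core fields missing from a stub record come back as None.
    """
    core = {key: record.get(key) for key in CORE_FIELDS}
    extras = {key: value for key, value in record.items()
              if key not in CORE_FIELDS}
    return core, extras


# Hypothetical restaurant record, serialized as one JSON line.
line = json.dumps({
    "name": "Example Bistro",
    "address": "1 Main St",
    "lat": 40.0,
    "lon": -75.0,
    "place_type": "restaurant",
    "cuisine": "French",
    "payment_types": ["cash", "visa"],
})

core, extras = split_record(json.loads(line))
print(core)
print(extras)
```

A stub place would simply have more `None` values in the core and an empty
or small set of extras, while a well-built-out place carries both.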
Our mission is to have the cleanest data on every physical place in
the world. We're building an army of users who compete against each
other to add places as soon as they open, remove places that have gone
out of business, and to add more data to places as we need.