Actually, the MediaWiki API isn't too bad. Lots of bots have been
written against it, so the routine merges could be done by a bot. I'm not
100% sure it'll work, but it's been fun to play around with the idea.
And the MediaWiki infoboxes provide at least some structure for the
per-store fields.
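The bot flow is basically two calls against api.php: fetch an edit token, then POST the edit. Here's a minimal sketch of just the payloads — the endpoint is a placeholder, the token-fetch parameters vary by MediaWiki version, and any HTTP client can send them:

```python
# Sketch of the payloads for a bot edit against api.php. The endpoint is a
# placeholder; only the request bodies are shown, not the HTTP calls.

API_URL = "https://wiki.example.org/api.php"  # placeholder endpoint

def token_request():
    """Query params for fetching an edit (CSRF) token."""
    return {"action": "query", "meta": "tokens", "format": "json"}

def edit_request(title, text, token, summary="bot: merged agency data"):
    """POST body for action=edit, replacing the page text wholesale."""
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "token": token,
        "bot": "1",       # ask the wiki to flag this as a bot edit
        "format": "json",
    }
```

The token-then-edit dance is what bot frameworks like Pywikibot wrap for you, so we likely wouldn't hand-roll this in practice.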
> I'm not sure if I understand your goals..
> Are you just trying to standardize the data to make it consistent?
> Or are you trying to track the changes and differences over time?
> Either way, how is the data getting updated? If it's really a person editing
> record by record, then a UI like Mediawiki might be useful. But at the end
> of the day, it's still one at a time and there's no API.
> I think the OSM angle could be useful, especially if you can connect to it
> to resolve the canonical address for your (probably mangled) addresses.
> Assuming you can match *good enough*, it would simplify a lot of things and
> lend itself to pushing back updates pretty easily. I think you could do the
> same read-based actions using Google's location searches.
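Matching "good enough" may not need much machinery — normalization plus a string-similarity ratio covers a surprising amount. A rough sketch of the idea (the abbreviation table and threshold are made up, not from any real matcher):

```python
import re
from difflib import SequenceMatcher

# Common abbreviation expansions; a real matcher would use a much fuller table.
ABBREV = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}

def normalize(addr):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", addr.lower()).split()
    return " ".join(ABBREV.get(t, t) for t in tokens)

def same_address(a, b, threshold=0.85):
    """Heuristic: treat two addresses as the same above a similarity ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold
```

Anything that clears the threshold could then be resolved against OSM (or a geocoder) for the canonical form; the stragglers go to a human.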
> If you're trying to track changes/diffs over time.. that's a nightmare. I'm
> diving into some of that with web2project (think: project baselining &
> drifting) and it makes my head hurt. :(
> Regardless, I think unless you have a compelling reason on why *not* to use
> a standard database, you should go for it. The API is a separate
> consideration from storage anyway. If you can convince the BreweryDB guys to
> share their mindset, there's probably a lot of overlap.. in concept, not in code.
> On 09/27/2012 05:01 PM, Tac Tacelosky wrote:
>> I have data from a bunch of different government agencies regulating
>> retail outlets. Most (but not all) of them have some sort of internal
>> identifier, but it's maddening to try to get reports with data where
>> the name or address is slightly different, and of course there's no
>> master id.
>> So I'm trying to put together a master database, with our own
>> identifier. I figure we'll go through some address standardization
>> for the first pass. It gets more complicated, though, when store
>> names change (e.g. when a business is sold, the address stays the same
>> but the name doesn't), or a store moves, or is added, and so on. We're
>> talking about opening the data up under the Open Database License, and
>> I'd like to use some sort of standard tool.
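The name-change/move cases are exactly why the master id can't just be a hash of the fields. A toy sketch of what the id assignment might look like — sequential ids keyed on a crudely normalized (name, address) pair, with an explicit alias step for renames and moves (all names here are hypothetical, not any existing scheme):

```python
class MasterRegistry:
    """Toy master-id registry: each new (name, address) pair gets the next
    sequential id; an alias records a rename or move against an existing id."""

    def __init__(self):
        self._ids = {}
        self._next = 1

    def _key(self, name, address):
        # crude normalization: lowercase and collapse whitespace
        return (" ".join(name.lower().split()), " ".join(address.lower().split()))

    def resolve(self, name, address):
        """Return the master id for a pair, minting a new id if unseen."""
        key = self._key(name, address)
        if key not in self._ids:
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]

    def alias(self, old, new):
        """old/new are (name, address) pairs; the new pair keeps the old id."""
        self._ids[self._key(*new)] = self.resolve(*old)
```

So a sale that keeps the address would be recorded as an alias, and both names resolve to the same store.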
>> MediaWiki came to mind first. It'd be a great UI, and would allow
>> people to update the site with specific information like if the store
>> was no longer in business. But the majority of the data updates would
>> come from merging lists from various agencies. On the huge plus side,
>> we'd have an API for access and updating, and a built-in history tool.
>> OpenStreetMap was my second idea. Many of the same benefits, but a
>> bit more awkward to work with. The huge advantage is that the data we
>> get from the agencies that's relevant to OSM could be pushed back to the
>> OSM database. But I'm not sure how to handle our project-specific
>> data (e.g. violations) which is clearly of no interest to OSM.
>> I'm leaning toward MediaWiki, and was wondering if anyone had any
>> experience with managing what is more often stored in a relational
>> database. I'm trying to avoid writing a full site for this; it's
>> tempting to start with just name, address, phone, etc., but then I'd
>> have to manage all the history and wiki nature of this myself. So I'm
>> looking for a combination of free-flow wiki-style data and a structured
>> schema.
>> I worked on something a while ago where we just enforced headers,
>> which roughly mapped to each field. But that felt a bit hackish.
>> It's more like I want a form within a MediaWiki page.
>> We will write some plugins for linking to data that comes directly
>> from a database, like the violations themselves, but the common data
>> is what I'm thinking about now.
>> Any pointers?
> D. Keith Casey, Jr.