Vladimir,
Mix'n'Match looks great, thanks for the heads-up! Just looking at it now, so I'm not sure how it works, or what the downloaded data looks like, but I'll be exploring that shortly, for sure. I'll get back to you re: schema:description in a separate response.
As for what (and how) I'm doing with photographer names:
I have, for about 12 years now, been collecting, editing, and researching photographers' biographies. I started with an old telnet database hosted by the George Eastman House. That db had ~93,000 biographies. I was able to copy and paste the Name, Nationality & Dates of all 93,000 (20 entries at a time!) into a spreadsheet very shortly before that database permanently went offline. There was a lot of duplication, and a lot of entries with too little info to trifle with, and I ended up with a core set of about 65,000 names. (The GEH data has since resurfaced in this db, run independently by the former editors from GEH. It's an invaluable set of info which I use daily, but it sits in a sadly outdated database.)
Over the years, I've continued to refine and grow my photographer biographies by checking my list (no longer a spreadsheet, but in our TMS database) against various authorities, both print and online, and by research in censuses, city directories, &c. While the scope covers the entire history (and some pre-history) of photography (ca. 1820s-present), the bulk are 19th to mid-20th century photographers. It's called PIC (the Photographers' Identities Catalog), and I hope to get it properly online this year, but I don't have any specific projections, as I'm at the mercy of our much-in-demand programmers. In the meantime, here's a map and a bit more background on a sad little Google site I made. I am now working with a programmer who is helping with a more sophisticated map, the goal being that you could ask for female daguerreotypists in Kansas, for instance, then click through those points to fuller biographical entries.
Our TMS database is shared between the Prints and Photographs departments and is used for cataloging our collections, so it contains many more names than just my PIC names (not just printmakers, but donors, subjects, etc.). So the risk of duplication is high. I use a lot of SQL queries to spot dupes and then merge or delete them. And frankly, I do a whole lot of spreadsheet work to clean and normalize data, and use the VLOOKUP function to compare & merge data from various sources. Not particularly high-tech, but it has served me well. But now I'm trying to reconcile about 220,000 names in our database to the entirety of ULAN. I've gotten about 20,000 exact matches, but now I need to do some fuzzy matching to find more. I've been trying to use OpenRefine with Reconcile-CSV, but it is (VERY SLOWLY) exploding my poor computer.
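For what it's worth, the exact-then-fuzzy pass I'm describing could be roughed out in a few lines of Python using only the standard library. This is just a sketch under my own assumptions, not how OpenRefine or Reconcile-CSV work internally: the function names (normalize, reconcile), the 0.85 cutoff, and the trick of only comparing names that share a first letter (to keep the number of comparisons down) are all made up for illustration, and that first-letter shortcut will miss names whose first character is itself misspelled.

```python
import difflib
import unicodedata
from collections import defaultdict


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def reconcile(local_names, authority_names, cutoff=0.85):
    """Map each local name to (authority name, score) for its best match.

    Hypothetical helper: exact matches on the normalized form score 1.0;
    everything else is compared only within its first-letter block.
    """
    exact = {}
    blocks = defaultdict(list)
    for auth in authority_names:
        key = normalize(auth)
        if not key:
            continue
        exact.setdefault(key, auth)          # keep the first authority form seen
        blocks[key[0]].append((key, auth))   # crude blocking on first character

    matches = {}
    for name in local_names:
        key = normalize(name)
        if not key:
            continue
        if key in exact:
            matches[name] = (exact[key], 1.0)
            continue
        best, best_score = None, cutoff
        for cand_key, cand in blocks.get(key[0], []):
            score = difflib.SequenceMatcher(None, key, cand_key).ratio()
            if score >= best_score:
                best, best_score = cand, score
        if best is not None:
            matches[name] = (best, best_score)
    return matches


if __name__ == "__main__":
    ours = ["Brady, Mathew B.", "Cameron, Julia Margret", "Zzyzx, Nobody"]
    authority = ["Brady, Mathew B.", "Cameron, Julia Margaret", "Nadar"]
    for name, (hit, score) in reconcile(ours, authority).items():
        print(f"{name!r} -> {hit!r} ({score:.2f})")
```

Anything that comes back with a score below 1.0 would still need a human look (dates and nationality are the obvious tie-breakers), and the cutoff would want tuning against a sample I've already matched by hand.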