This is just to let you know that I've refreshed the discogs dataset
based on the July data dump[1].
* I've fixed the missing VoID description, so the homepage now works:
http://discogs.dataincubator.org
* Fixed the rdfs:comment typo
* Fixed the broken dbpedia links
* Added links to bbc music based on simple co-reference via myspace links
* Reworked the conversion to use RDF.rb. Combined with what looks like
fixes in the dumps, I think this has addressed some of the previous
encoding issues. I suspect there are others. Bug reports welcome!
* I've also migrated the code to GitHub here:
http://github.com/ldodds/discogs. If you're interested in contributing
fixes, I'm happy to incorporate them.
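To illustrate the "simple co-reference via myspace links" idea mentioned above, here is a minimal Python sketch. It is not the code from the repository (the actual conversion uses RDF.rb); the function name, the URL normalisation rules and the sample URIs are all assumptions made for illustration. It pairs artist URIs from two datasets whenever both point at the same MySpace page, and leaves open which predicate is used to publish the resulting link.

```python
from collections import defaultdict


def normalise(url):
    # Assumed normalisation: ignore case, scheme, a leading "www." and a trailing slash.
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def coreference_links(discogs_myspace, bbc_myspace):
    """Return (discogs_artist, bbc_artist) pairs that share a MySpace URL.

    Both arguments are iterables of (artist_uri, myspace_url) tuples.
    """
    # Index one side by normalised MySpace URL.
    by_myspace = defaultdict(list)
    for artist, myspace in bbc_myspace:
        by_myspace[normalise(myspace)].append(artist)

    # Look up each artist from the other side in that index.
    links = []
    for artist, myspace in discogs_myspace:
        for other in by_myspace.get(normalise(myspace), []):
            links.append((artist, other))
    return links


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real dataset.
    discogs = [("http://discogs.dataincubator.org/artist/example-artist",
                "http://www.myspace.com/ExampleArtist/")]
    bbc = [("http://www.bbc.co.uk/music/artists/example-id#artist",
            "http://myspace.com/exampleartist")]
    for pair in coreference_links(discogs, bbc):
        print(pair)
```

Matching on a shared third-party URL is cheap but only as good as the normalisation, so the rules above would need checking against the real data.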
The latest data is around 129M triples, and links to dbtune, bbc and dbpedia:
6350 links to dbtune.org/myspace
1740 links to bbc.co.uk/music artists
5169 links to dbpedia
Next steps: I want to iron out the remaining encoding issues, and
also look at improving the modelling.
[1]. Yes, already out of date, but I was away for a chunk of last week.
Cheers,
L.
--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh...@talis.com
http://www.talis.com
great stuff!!!
after a quick glance, we still have some unicode issues somewhere in the chain:
http://discogs.dataincubator.org/artist/piotr-illitch-tcha%C3%AFkovsky.html
hopefully i'll have some time to work on bug hunting next week...
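as a starting point for narrowing this down, here is a minimal Python sketch (not part of the discogs conversion code; the function name is made up for illustration) that checks whether the percent-encoded slug in a URL like the one above decodes as valid UTF-8. if it decodes cleanly, the slug itself is probably fine and the problem is more likely further along the chain, though that is only a guess.

```python
from urllib.parse import unquote, urlsplit


def decode_slug(url):
    """Return the last path segment of `url`, percent-decoded as UTF-8.

    Raises UnicodeDecodeError if the escaped bytes are not valid UTF-8.
    """
    path = urlsplit(url).path
    slug = path.rsplit("/", 1)[-1]
    # errors="strict" makes invalid byte sequences fail loudly instead of
    # being silently replaced.
    return unquote(slug, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    url = ("http://discogs.dataincubator.org/artist/"
           "piotr-illitch-tcha%C3%AFkovsky.html")
    print(decode_slug(url))
```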
-kurt j