Discogs Refresh


Leigh Dodds

Aug 10, 2010, 11:21:08 AM
to dataincubator
Hi,

This is just to let you know that I've refreshed the discogs dataset
based on the July data dump[1].

* I've fixed the missing VoID description, so the homepage now works:
http://discogs.dataincubator.org
* Fixed the rdfs:comment typo
* Fixed the broken dbpedia links
* Added links to bbc music based on simple co-reference via myspace links
* Reworked the conversion to use RDF.rb. Combined with what looks like
fixes in the dumps, I think this has addressed some of the previous
encoding issues. I suspect there are others. Bug reports welcome!
* I've also migrated the code to GitHub here:
http://github.com/ldodds/discogs. If you're interested in contributing
changes, I'm happy to incorporate them.

The latest data is around 129M triples, and links to dbtune, bbc and dbpedia:

6350 links to dbtune.org/myspace
1740 links to bbc.co.uk/music artists
5169 links to dbpedia

Next steps: I want to iron out the remaining encoding issues, and
also look at improving the modelling.

[1]. Yes, already out of date, but I was away for a chunk of last week.

Cheers,

L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh...@talis.com
http://www.talis.com

Kurt J

Aug 10, 2010, 12:20:47 PM
to datain...@googlegroups.com

great stuff!!!

after a quick glance, we still have some unicode issues somewhere in the chain:

http://discogs.dataincubator.org/artist/piotr-illitch-tcha%C3%AFkovsky.html
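For anyone poking at this, here is a minimal sketch of how that path segment decodes. It only looks at the URL above and is not a diagnosis of where the conversion goes wrong; the Latin-1 step is an assumption about one common way this kind of issue shows up, and the variable names are just for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

# Path segment taken from the URL above (without the .html suffix).
segment = "piotr-illitch-tcha%C3%AFkovsky"

# %C3%AF is the UTF-8 encoding of U+00EF, so decoding as UTF-8
# (the default for unquote) gives the intended character.
decoded = unquote(segment)

# Assumption for illustration: if some step in the chain reads the same
# bytes as Latin-1 instead of UTF-8, each byte becomes its own character
# and the name comes out garbled.
raw_bytes = unquote_to_bytes(segment)
mojibake = raw_bytes.decode("latin-1")

print(decoded)
print(mojibake)
```

If a page shows the second form, the bytes are probably fine and the problem is a wrong charset assumption at some later step.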

hopefully i'll have some time to work on bug hunting next week...

-kurt j

Ian Davis

Aug 10, 2010, 1:32:51 PM
to datain...@googlegroups.com
Just a note to say that I had expected my musicbrainz conversion to be
reloaded last week (which includes sameas links to the discogs data).
Turns out I resubmitted the old data file for loading :( I will submit
the correct one as soon as I have a few spare minutes.