Musicbrainz Refresh

Ian Davis

Aug 13, 2010, 6:23:43 AM
to datain...@googlegroups.com
After a couple of false starts on my part, this dataset is now live:

http://musicbrainz.dataincubator.org/

This is an entirely new expression of MusicBrainz using the NGS dump
(although I notice that I forgot to update the source info in the VoID
description).

My Dipper browser gives a better view of backlinks....

http://api.talis.com/stores/iand-dev1/items/dipper.html#s=musicbrainz&q=http%3A%2F%2Fmusicbrainz.dataincubator.org%2Fartist%2F49b34ba3-46a2-40af-b6c0-6f1fab93af8b

I will post more details on the schema mapping next week.

Ian

On Tue, Aug 10, 2010 at 6:32 PM, Ian Davis <m...@iandavis.com> wrote:
> Just a note to say that I had expected my musicbrainz conversion to be
> reloaded last week (which includes sameas links to the discogs data).
> Turns out I resubmitted the old data file for loading :( I will submit
> the correct one as soon as I have a few spare minutes.
>
> On Tue, Aug 10, 2010 at 5:20 PM, Kurt J <kur...@gmail.com> wrote:
>> On Tue, Aug 10, 2010 at 10:21 AM, Leigh Dodds <leigh...@talis.com> wrote:
>>> Hi,
>>>
>>> This is just to let you know that I've refreshed the discogs dataset
>>> based on the July data dump[1].
>>>
>>> * I've fixed the missing void description, so the homepage now works:
>>> http://discogs.dataincubator.org
>>> * Fixed the rdfs:comment typo
>>> * Fixed the broken dbpedia links
>>> * Added links to bbc music based on simple co-reference via myspace links
>>> * Reworked the conversion to use RDF.rb. Combined with what looks like
>>> fixes in the dumps, I think this has addressed some of the previous
>>> encoding issues. I suspect there are others. Bug reports welcome!
>>> * I've also migrated the code to github here:
>>> http://github.com/ldodds/discogs. If you're interested in contributing
>>> changes, I'm happy to incorporate them.
>>>
>>> The latest data is around 129M triples, and links to dbtune, bbc and dbpedia:
>>>
>>> 6350 links to dbtune.org/myspace
>>> 1740 links to bbc.co.uk/music artists
>>> 5169 links to dbpedia
>>>
>>> Next steps are: I want to iron out the remaining encoding issues, and
>>> also look at improving the modelling.
>>>
>>> [1]. Yes, already out of date, but I was away for a chunk of last week.
>>>
>>> Cheers,
>>>
>>> L.
>>>
>>> --
>>> Leigh Dodds
>>> Programme Manager, Talis Platform
>>> Talis
>>> leigh...@talis.com
>>> http://www.talis.com
>>>
>>
>> great stuff!!!
>>
>> after a quick glance, we still have some unicode issues somewhere in the chain:
>>
>> http://discogs.dataincubator.org/artist/piotr-illitch-tcha%C3%AFkovsky.html
>>
>> hopefully i'll have some time to work on bug hunting next week...
>>
>> -kurt j
>>
>
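
The path segment in the Discogs URL quoted above, tcha%C3%AFkovsky, is the percent-encoded UTF-8 form of "tchaïkovsky". The following is a minimal Python sketch of how one might check where in a chain such a slug gets mangled. It assumes nothing about the actual Discogs conversion code; the helper name looks_like_mojibake is hypothetical, and it only detects one common failure mode (UTF-8 bytes misread as Latin-1).

```python
from urllib.parse import quote, unquote

# Path segment taken from the URL quoted in the thread.
slug = "piotr-illitch-tcha%C3%AFkovsky"

# Percent-decoding uses UTF-8 by default.
decoded = unquote(slug)

# Round trip: re-encoding the decoded text should give back the same slug.
round_trip_ok = quote(decoded) == slug


def looks_like_mojibake(text):
    """Heuristic (an assumption, not a general detector): True if the text
    looks like UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


# Simulate the failure mode: UTF-8 bytes read back as Latin-1.
mangled = decoded.encode("utf-8").decode("latin-1")

print(decoded, round_trip_ok)
print(mangled, looks_like_mojibake(mangled))
```

Running a check like this at each stage (dump, conversion, store, HTML rendering) is one way to narrow down which step introduces the damage.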

Richard Cyganiak

Aug 13, 2010, 9:08:54 AM
to datain...@googlegroups.com
Hi Ian,

Awesome! Do you have the total triple count, and details on the links
to other datasets (I see some DBpedia, discogs and BBC/music)?

Yeah, I'm still updating the LOD cloud ...

Best,
Richard

Bob Ferris

Aug 13, 2010, 9:23:17 AM
to datain...@googlegroups.com
Hi Richard,

you probably didn't follow the parallel discussion at the Music Ontology
mailing list. Ian wrote there:

"... Happy to produce dumps. I don't currently because dataincubator is
a temporary host for this musicbrainz data and I don't want the URIs to
become too deeply embedded in the ecosystem. The goal of dataincubator
is to show the original data owners how their data could look and work
and volunteering effort on modelling etc. In this case the URIs should
be prefixed with http://musicbrainz.org/

The current data is available from
http://s3.amazonaws.com/iand/datasets/musicbrainz-20100811.nt.tgz but
I don't think it's a good idea to load that into your LOD cache
because there are severe quality issues and modelling decisions to
change. ..."

So this dataset isn't intended to appear on the LOD cloud. Hopefully,
the outcome of the LinkedBrainz project[1] could be a part of the LOD cloud
in the future.
And again, the number of triples doesn't say anything about the quality of
a dataset. For me it's worth more or less nothing. Sorry to probably be
a bit harsh here, but I'm really a bit afraid of that triple counter
metric hype. We should invest our knowledge more in high-quality
datasets (based on a hopefully meaningful definition of quality).

Cheers,


Bob

[1] http://linkedbrainz.c4dmpresents.org/

Ian Davis

Aug 13, 2010, 9:43:09 AM
to datain...@googlegroups.com
There are 3 different expressions of the MB data to my knowledge. This
is only one of them and hopefully they'll converge because we are
sharing ideas (though it's important to remember that MB core data is
public domain so it could be used to produce lots of different
variants or mixed into new ones).

For this dataset I expect it to be subsumed by MB themselves fairly
soon because they are actively working on it. Discogs is in a
different position: I don't get the sense that they are going to adopt
linked data very quickly, so I expect the dataincubator project to be
the main expression for quite a while.

Richard Cyganiak

Aug 13, 2010, 10:11:25 AM
to datain...@googlegroups.com
Bob,

You jump to conclusions. I'm not asking for triple counts to pass
quality judgements. That is the last thing I want to get into. The LOD
cloud picture would look rather boring if all the datasets were drawn
in the same size, hence we draw them in different size depending on
triple count. That's why I ask for these numbers.

The LOD cloud diagram is a visual directory of linked RDF datasets
that exist out there on the web. Listing this dataset in the LOD cloud
diagram is a completely independent question from loading it into
someone's LOD cache.

I'm aware of the different MusicBrainz conversion efforts and have
been in contact with their respective publishers, submitting bug
reports and so on.

Ian: I still want those numbers.

Best,
Richard

Bob Ferris

Aug 13, 2010, 10:54:30 AM
to datain...@googlegroups.com
Hi Richard,

On 13.08.2010 16:11, Richard Cyganiak wrote:
> Bob,
>
> You jump to conclusions. I'm not asking for triple counts to pass
> quality judgements. That is the last thing I want to get into. The LOD
> cloud picture would look rather boring if all the datasets were drawn in
> the same size, hence we draw them in different size depending on triple
> count. That's why I ask for these numbers.
>
> The LOD cloud diagram is a visual directory of linked RDF datasets that
> exist out there on the web. Listing this dataset in the LOD cloud
> diagram is a completely independent question from loading it into
> someone's LOD cache.
>

Again, sorry for the harshness against the triple counter metric.
However, does this mean that you would represent all MusicBrainz
datasets in your LOD diagram? If yes, does this really make sense?

Cheers,


Bob

Tom Morris

Aug 13, 2010, 12:04:10 PM
to datain...@googlegroups.com
This looks cool. How are the sameAs links computed? Does the server
not know how to compute the closure for all the loaded data sets? I
ask because there seem to be sameAs links on the DBpedia entry which
aren't represented in the sameAs for the MusicBrainz entry.

On Fri, Aug 13, 2010 at 9:43 AM, Ian Davis <m...@iandavis.com> wrote:
> There are 3 different expressions of the MB data to my knowledge.

What are the three?

Kind of off-topic for the list, but I think it's an interesting
question. For example, a subset of the Freebase entries includes all
the MusicBrainz artists, releases, and tracks (e.g.
http://www.freebase.com/view/en/kevin_shields for Ian's example), but
a) they're not separate/distinguishable from other datasets (i.e. the
graphs are merged, not kept separate) and b) from a visual standpoint,
if you start trying to convert the LD cloud diagram into some type of
Venn diagram, the drawing and layout go from really hard to probably
impossible.

Tom
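
On the sameAs closure question above: one common way to compute it is to treat every sameAs link as an undirected edge and group the URIs into connected components with a union-find structure. The following is a minimal Python sketch under that assumption. It says nothing about how the Talis store actually behaves, and the short identifiers in the example are placeholders, not real URIs from the dataset.

```python
def sameas_closure(pairs):
    """Group identifiers into equivalence classes, given (a, b) sameAs pairs.

    Uses union-find with path halving. Returns a list of sets, one per class.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Placeholder identifiers, for illustration only.
links = [
    ("mb:artist-1", "dbpedia:artist-1"),
    ("dbpedia:artist-1", "freebase:artist-1"),
    ("mb:artist-2", "discogs:artist-2"),
]
classes = sameas_closure(links)
for members in classes:
    print(sorted(members))
```

With the closure computed this way, a link from the DBpedia entry to Freebase would also show up for the MusicBrainz entry, which is the behaviour the question is asking about.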

Kurt J

Aug 13, 2010, 5:56:13 PM
to datain...@googlegroups.com, music-ontology-sp...@googlegroups.com
Hi Ian,

Looks like great stuff. I've been doing family stuff today but
hopefully I'll have a closer look this weekend or Monday.

On Fri, Aug 13, 2010 at 8:43 AM, Ian Davis <m...@iandavis.com> wrote:
> There are 3 different expressions of the MB data to my knowledge. This
> is only one of them and hopefully they'll converge because we are
> sharing ideas (though it's important to remember that MB core data is
> public domain so it could be used to produce lots of different
> variants or mixed into new ones).
>
> For this dataset I expect it to be subsumed by MB themselves fairly
> soon because they are actively working on it. Discogs is in a
> different position: I don't get the sense that they are going to adopt
> linked data very quickly, so I expect the dataincubator project to be
> the main expression for quite a while.

We are actively working on the MusicBrainz linked data integration and
it's looking like the first iteration will only include RDFa tied to
the existing HTML pages. Ian's work seems to be quite comprehensive
and, after integrating some of Bob's comments, the mapping in MB will
probably be largely the same. Although don't hold me to that yet ;)

I also have the impression that Discogs is less interested (or
unresponsive) w.r.t. linked data integration and furthermore the
dataincubator is the only Discogs RDF mapping I am aware of.

Ian - have you published your source code for the mapping? Are these
Ruby scripts like the Discogs mapping? I would like to have a look :-)

-Kurt J

Richard Cyganiak

Aug 14, 2010, 4:04:52 PM
to datain...@googlegroups.com
Hi Bob,

On 13 Aug 2010, at 16:54, Bob Ferris wrote:
> However, does this mean that you would represent all MusicBrainz
> datasets in your LOD diagram?

Yes.

> If yes, does this really make sense?

It reflects reality. That's good enough for me.

And there is of course a story that explains why there are three
versions of MusicBrainz in RDF on the Web right now, with a fourth
coming.

The 2009 iteration of the diagram already has three conversions of
DBLP and two of Eurostat, so this won't be a first either.

Best,
Richard


>
> Cheers,
>
>
> Bob

Ian Davis

Aug 16, 2010, 6:12:28 AM
to Richard Cyganiak, datain...@googlegroups.com
On Fri, Aug 13, 2010 at 2:08 PM, Richard Cyganiak <ric...@cyganiak.de> wrote:
> Hi Ian,
>
> Awesome! Do you have the total triple number, and details on the links to
> other datasets (I see some DBPedia, discogs and BBC/music)?
>

178,775,789 triples
76,171 sameas links to dbpedia
543,325 to bbc music
236,258 to discogs.dataincubator.org


Ian
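
For anyone wanting to reproduce figures like these from an N-Triples dump, here is a minimal Python sketch. It is not the method used to produce the numbers above. It assumes one triple per line, uses a simple regular expression rather than a full N-Triples parser, and groups owl:sameAs links by the host of the target URI. The sample lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

# Simple case only: subject, <predicate>, object, terminating dot.
TRIPLE_RE = re.compile(r"^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.$")


def count_links(lines):
    """Return (total_triples, Counter of sameAs targets keyed by host)."""
    total = 0
    by_host = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if not match:
            continue  # skip lines this simple pattern cannot handle
        total += 1
        predicate, obj = match.group(2), match.group(3)
        if predicate == OWL_SAMEAS and obj.startswith("<") and obj.endswith(">"):
            by_host[urlparse(obj[1:-1]).netloc] += 1
    return total, by_host


# Made-up sample data, not taken from the real dump.
sample = [
    "<http://example.org/artist/1> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/Example> .",
    "<http://example.org/artist/1> <http://www.w3.org/2002/07/owl#sameAs> <http://www.bbc.co.uk/music/artists/1#artist> .",
    '<http://example.org/artist/1> <http://xmlns.com/foaf/0.1/name> "Example Artist" .',
]
total, by_host = count_links(sample)
print(total, dict(by_host))
```

For a real dump the same function can be fed an open file object, so the data is streamed line by line instead of being loaded into memory.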

Christophe Gueret

Aug 16, 2010, 8:20:22 AM
to Data Incubator
Hi Ian!

Could you turn this into a void description and upload it to
http://void.rkbexplorer.com ? I think that would be useful :)
And you can use Michael Hausenblas' nice editor to do that:
http://ld2sd.deri.org/ve2/

Christophe



Ian Davis

Aug 16, 2010, 11:17:47 AM
to datain...@googlegroups.com
The homepage _is_ a void description (as are all the dataincubator projects)

http://musicbrainz.dataincubator.org/

and in RDF/XML:

http://musicbrainz.dataincubator.org/.rdf

Happy for anyone to copy any of this to anywhere they like.
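
As a rough illustration of how the counts posted earlier in the thread could be expressed as void statistics, here is a small Python sketch that prints Turtle. It uses terms from the VoID vocabulary (void:Dataset, void:Linkset, void:triples, void:subjectsTarget, void:objectsTarget, void:linkPredicate). The dataset URIs passed in are assumptions made for the example, and the description actually published on the homepage may well be modelled differently.

```python
def void_turtle(dataset_uri, triples, linksets):
    """Build a small VoID description in Turtle.

    linksets is a list of (target_dataset_uri, link_count) tuples; every
    linkset is assumed to use owl:sameAs as its link predicate.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    void:triples {triples} .",
    ]
    for target_uri, count in linksets:
        lines += [
            "",
            "[] a void:Linkset ;",
            f"    void:subjectsTarget <{dataset_uri}> ;",
            f"    void:objectsTarget <{target_uri}> ;",
            "    void:linkPredicate owl:sameAs ;",
            f"    void:triples {count} .",
        ]
    return "\n".join(lines) + "\n"


# Counts are the ones Ian posted; the URIs are assumed for illustration.
description = void_turtle(
    "http://musicbrainz.dataincubator.org/",
    178775789,
    [
        ("http://dbpedia.org/", 76171),
        ("http://www.bbc.co.uk/music", 543325),
        ("http://discogs.dataincubator.org/", 236258),
    ],
)
print(description)
```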

Christophe Gueret

Aug 16, 2010, 3:20:45 PM
to Data Incubator
Woops! Sorry, I did not see that.
Thanks for pointing it out :)

Christophe


On Aug 16, 5:17 pm, Ian Davis <m...@iandavis.com> wrote:
> The homepage _is_ a void description (as are all the dataincubator projects)
>
> http://musicbrainz.dataincubator.org/
>
> and in RDF/XML:
>
> http://musicbrainz.dataincubator.org/.rdf
>
> Happy for anyone to copy any of this to anywhere they like.
>
> On Mon, Aug 16, 2010 at 1:20 PM, Christophe Gueret
>
> <christophe.gue...@gmail.com> wrote:
> > Hi Ian!
>
> > Could you turn this into a void description and upload it to
> > http://void.rkbexplorer.com ? I think that would be useful :)

Ian Davis

Aug 16, 2010, 5:58:57 PM
to datain...@googlegroups.com
On Mon, Aug 16, 2010 at 8:20 PM, Christophe Gueret
<christop...@gmail.com> wrote:
> Woops! Sorry, I did not see that.
> Thanks for pointing it out :)
>

No worries. I hope you find it helpful.

Ian
