Fwd: Musicbrainz Refresh


Ian Davis

Aug 13, 2010, 6:24:16 AM
to music-ontology-sp...@googlegroups.com
FYI


---------- Forwarded message ----------
From: Ian Davis <m...@iandavis.com>
Date: Fri, Aug 13, 2010 at 11:23 AM
Subject: Musicbrainz Refresh
To: datain...@googlegroups.com


After a couple of false starts on my part, this dataset is now live:

http://musicbrainz.dataincubator.org/

This is an entirely new expression of musicbrainz using the NGS dump
(although I notice that I forgot to update the source info in the void
description)

My Dipper browser gives a better view of backlinks....

http://api.talis.com/stores/iand-dev1/items/dipper.html#s=musicbrainz&q=http%3A%2F%2Fmusicbrainz.dataincubator.org%2Fartist%2F49b34ba3-46a2-40af-b6c0-6f1fab93af8b

I will post more details on the schema mapping next week.

Ian

On Tue, Aug 10, 2010 at 6:32 PM, Ian Davis <m...@iandavis.com> wrote:
> Just a note to say that I had expected my musicbrainz conversion to be
> reloaded last week (which includes sameas links to the discogs data).
> Turns out I resubmitted the old data file for loading :( I will submit
> the correct one as soon as I have a few spare minutes.
>
> On Tue, Aug 10, 2010 at 5:20 PM, Kurt J <kur...@gmail.com> wrote:
>> On Tue, Aug 10, 2010 at 10:21 AM, Leigh Dodds <leigh...@talis.com> wrote:
>>> Hi,
>>>
>>> This is just to let you know that I've refreshed the discogs dataset
>>> based on the July data dump[1].
>>>
>>> * I've fixed the missing void description, so homepage now works:
>>> http://discogs.dataincubator.org
>>> * Fixed the rdfs:comment typo
>>> * Fixed the broken dbpedia links
>>> * Added links to bbc music based on simple co-reference via myspace links
>>> * Reworked the conversion to use RDF.rb. Combined with what looks like
>>> fixes in the dumps, I think this has addressed some of the previous
>>> encoding issues. I suspect there are others. Bug reports welcome!
>>> * I've also migrated the code to github here:
>>> http://github.com/ldodds/discogs. If you're interested in contributing
>>> fixes, I'm happy to incorporate them.
>>>
>>> The latest data is around 129M triples, and links to dbtune, bbc and dbpedia:
>>>
>>> 6350 links to dbtune.org/myspace
>>> 1740 links to bbc.co.uk/music artists
>>> 5169 links to dbpedia
>>>
>>> Next steps are: I want to iron out the remaining encoding issues, and
>>> also look at improving the modelling.
>>>
>>> [1]. Yes, already out of date, but I was away for a chunk of last week.
>>>
>>> Cheers,
>>>
>>> L.
>>>
>>> --
>>> Leigh Dodds
>>> Programme Manager, Talis Platform
>>> Talis
>>> leigh...@talis.com
>>> http://www.talis.com
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Data Incubator" group.
>>> To post to this group, send email to datain...@googlegroups.com.
>>> To unsubscribe from this group, send email to dataincubato...@googlegroups.com.
>>> For more options, visit this group at http://groups.google.com/group/dataincubator?hl=en.
>>>
>>>
>>
>> great stuff!!!
>>
>> after a quick glance, we still have some unicode issues somewhere in the chain:
>>
>> http://discogs.dataincubator.org/artist/piotr-illitch-tcha%C3%AFkovsky.html
>>
>> hopefully i'll have some time to work on bug hunting next week...
>>
>> -kurt j
>>
>

Bob Ferris

Aug 13, 2010, 6:55:42 AM
to music-ontology-sp...@googlegroups.com
Hi Ian,

congrats, well done so far. It seems that we have to extend the
relationship part of MO a bit or build a new sub-ontology for different
music-related relationships. I would prefer a separate ontology for this
issue. Of course, there are already some relationships included in MO[1].
However, they are often a bit restricted in their range/domain.
Furthermore, I guess "dimb:artistList" can be covered by
mo:MusicGroup[2], mo:member[3] and mo:member_of[4], so no rdf:List is
needed here. If it is necessary to introduce a band leader, then we
should introduce a new property for this issue.
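
As a rough illustration of the membership modelling suggested above, the sketch below emits N-Triples that describe a group and its members with mo:MusicGroup, mo:member and mo:member_of instead of an rdf:List. The example.org URIs, the helper name `membership_triples` and the choice of N-Triples as output are assumptions made for illustration; none of them come from the published dataset.

```python
MO = "http://purl.org/ontology/mo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def membership_triples(group_uri, member_uris):
    """Return N-Triples lines linking a group to its members in both directions."""
    lines = [f"<{group_uri}> <{RDF_TYPE}> <{MO}MusicGroup> ."]
    for member in member_uris:
        lines.append(f"<{group_uri}> <{MO}member> <{member}> .")
        lines.append(f"<{member}> <{MO}member_of> <{group_uri}> .")
    return lines


# Hypothetical URIs, used only to show the shape of the output.
example = membership_triples(
    "http://example.org/artist/group-1",
    ["http://example.org/artist/person-1", "http://example.org/artist/person-2"],
)
print("\n".join(example))
```

Each member costs two plain triples, and no list nodes have to be walked to find out who is in the group.
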
For the links to the different music information services, I would
suggest using the Info Service Ontology, as is also proposed in this
example here[5].
Re. the release modelling, I missed the links to the mo:Record
instances, which should point to the mo:Track instances. I
noticed only the link to a mo:Medium instance, which links to a
"tracklist", but this link does seem to work.

Cheers,


Bob

[1]
http://wiki.musicontology.com/index.php/Classes_Schemas#mo:MusicArtist.2C_mo:MusicGroup.2C_mo:Label_and_mo:Membership_schemas_.28extended.29
[2] http://musicontology.com/#term_MusicGroup
[3] http://musicontology.com/#term_member
[4] http://musicontology.com/#term_member_of
[5] http://purl.org/ontology/is/infoservice.html#sec-example

Kingsley Idehen

Aug 13, 2010, 7:01:57 AM
to music-ontology-sp...@googlegroups.com
Ian Davis wrote:
> FYI
>
>
> ---------- Forwarded message ----------
> From: Ian Davis <m...@iandavis.com>
> Date: Fri, Aug 13, 2010 at 11:23 AM
> Subject: Musicbrainz Refresh
> To: datain...@googlegroups.com
>
>
> After a couple of false starts on my part, this dataset is now live:
>
> http://musicbrainz.dataincubator.org/
>
> This is an entirely new expression of musicbrainz using the NGS dump
> (although I notice that I forgot to update the source info in the void
> description)
>
> My Dipper browser gives a better view of backlinks....
>
> http://api.talis.com/stores/iand-dev1/items/dipper.html#s=musicbrainz&q=http%3A%2F%2Fmusicbrainz.dataincubator.org%2Fartist%2F49b34ba3-46a2-40af-b6c0-6f1fab93af8b
>
> I will post more details on the schema mapping next week.
>
> Ian
>

Ian,

Nice work!

A few questions:

re:
<http://api.talis.com/stores/iand-dev1/items/dipper.html#s=schema-cache&q=http%3A%2F%2Fpurl.org%2Fontology%2Fmo%2FMusicArtist>

Why can't you at least use some form of paging to handle large result
sets? I am referring to the condition that leads to the following
message: "Too many backlinks to display (2309)".

Also, will there be a dump of this data? When we put out MBZ Linked Data
dumps we always release the RDF data sets. It is important that RDF
dumps are released as part of LOD efforts. For instance if I could load
the RDF dump into Virtuoso, you would be able to see how backlinks are
handled via Virtuoso which at the very least gives you an example of how
the backlinks problem has been tackled.


Kingsley


--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Ian Davis

Aug 13, 2010, 7:21:04 AM
to music-ontology-sp...@googlegroups.com
>
> Few questions:
>
> re:
> <http://api.talis.com/stores/iand-dev1/items/dipper.html#s=schema-cache&q=http%3A%2F%2Fpurl.org%2Fontology%2Fmo%2FMusicArtist>
>
> Why can't you at least use some form of paging to handle large results sets?
> I am referring to the condition that leads to the following message: "Too
> many backlinks to display (2309)" .

I could. I just haven't had time to write that piece of code in
Dipper. I am slow at writing Javascript.
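
For reference, the paging asked about here amounts to slicing the backlink list into fixed-size pages. Dipper's internals are not shown in this thread, so the following is only a sketch of the idea in Python rather than Dipper's JavaScript; `paginate`, the page size of 100 and the placeholder backlink URIs are all hypothetical.

```python
def paginate(items, page, page_size=100):
    """Return (items on the 1-based page, total number of pages)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    total_pages = max(1, -(-len(items) // page_size))  # ceiling division
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")
    start = (page - 1) * page_size
    return items[start:start + page_size], total_pages


# 2309 placeholder backlinks, matching the count in the message quoted above.
backlinks = [f"http://example.org/backlink/{i}" for i in range(2309)]
first_page, total_pages = paginate(backlinks, page=1)
last_page, _ = paginate(backlinks, page=total_pages)
print(len(first_page), total_pages, len(last_page))
```
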

>
> Also, will there be a dump of this data? When we put out MBZ Linked Data
> dumps we always release the RDF data sets. It is important that RDF dumps
> are released as part of LOD efforts. For instance if I could load the RDF
> dump into Virtuoso, you would be able to see how backlinks are handled via
> Virtuoso which at the very least gives you an example of how the backlinks
> problem has been tackled etc..


Happy to produce dumps. I don't currently because dataincubator is a
temporary host for this musicbrainz data and I don't want the URIs to
become too deeply embedded in the ecosystem. The goal of dataincubator
is to show the original data owners how their data could look and work,
and to volunteer effort on modelling etc. In this case the URIs should
be prefixed with http://musicbrainz.org/
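
To illustrate the point about URI prefixes, here is a minimal sketch of rewriting the temporary dataincubator prefix to a musicbrainz.org one in N-Triples data. It assumes the target paths would mirror the current ones exactly, which this thread does not confirm; `rewrite_prefix` and the sample triple are hypothetical.

```python
OLD_PREFIX = "<http://musicbrainz.dataincubator.org/"
NEW_PREFIX = "<http://musicbrainz.org/"  # assumed target; path layout may differ


def rewrite_prefix(ntriples_line):
    """Swap the temporary host for the assumed final one in one N-Triples line."""
    # Plain string replacement: a literal that happens to contain the same
    # text would be rewritten too, so a real conversion should parse the terms.
    return ntriples_line.replace(OLD_PREFIX, NEW_PREFIX)


# Made-up triple built around the artist URI mentioned earlier in the thread.
sample = (
    "<http://musicbrainz.dataincubator.org/artist/"
    "49b34ba3-46a2-40af-b6c0-6f1fab93af8b> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://purl.org/ontology/mo/MusicArtist> ."
)
rewritten = rewrite_prefix(sample)
print(rewritten)
```
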

The current data is available from
http://s3.amazonaws.com/iand/datasets/musicbrainz-20100811.nt.tgz but
I don't think it's a good idea to load that into your LOD cache
because there are severe quality issues and modelling decisions to
change.


>
>
> Kingsley
>>

Ian

Bob Ferris

Aug 13, 2010, 7:44:04 AM
to music-ontology-sp...@googlegroups.com
Hello again,

I have now prepared a list of possibly important relationship properties
from MO in which people could be involved (each with its domain / range):

- mo:collaborated_with -> foaf:Agent / foaf:Agent
- mo:compiled -> foaf:Agent / mo:MusicalManifestation
- mo:compiler -> mo:MusicalManifestation / foaf:Agent
- mo:composer -> mo:Composition / foaf:Agent
- mo:conductor -> mo:Performance / foaf:Agent
- mo:djmixed -> mo:MusicArtist (Yeah, a DJ as music artist! I love it ;) ) / mo:Signal
- mo:djmixed_by -> mo:Signal / mo:MusicArtist
- mo:engineer -> mo:RecordingSession, mo:Recording, mo:Performance /
foaf:Agent
- mo:engineered -> foaf:Agent / mo:RecordingSession, mo:Recording,
mo:Performance
- mo:group -> mo:Membership / foaf:Group (this should probably be moved
completely to FOAF)
- mo:listened -> foaf:Agent / mo:Performance
- mo:listener -> mo:Performance / foaf:Agent
- mo:member -> mo:MusicGroup / foaf:Agent
- mo:member_of -> foaf:Person / foaf:Group (FOAF?)
- mo:membership -> foaf:Agent / mo:Membership
- mo:performed -> foaf:Agent / mo:Performance
- mo:performer -> mo:Performance / foaf:Agent
- mo:produced -> foaf:Agent / mo:MusicalManifestation
- mo:producer -> mo:MusicalManifestation / foaf:Agent
- mo:publisher -> mo:MusicalManifestation / foaf:Agent (this shouldn't
really be a part of MO -> better FRBR or something else)
- mo:remixer -> mo:Signal / mo:MusicArtist
- mo:remixed -> mo:MusicArtist / mo:Signal
- mo:sampler -> mo:Signal / mo:MusicArtist
- mo:singer -> mo:Performance / foaf:Agent
- mo:supporting_musician -> mo:MusicArtist / mo:MusicArtist
- mo:tribute_to -> mo:MusicalManifestation / mo:MusicArtist
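
A list like this lends itself to a simple lookup table. The sketch below copies a handful of the domain / range pairs from the list above into a Python dictionary and finds the properties whose domain is foaf:Agent. The dictionary is only an excerpt, its values are taken from this message rather than checked against the MO specification, and the helper name is hypothetical.

```python
# property -> (domains, ranges), copied from a few entries in the list above
PROPERTIES = {
    "mo:collaborated_with": (["foaf:Agent"], ["foaf:Agent"]),
    "mo:composer": (["mo:Composition"], ["foaf:Agent"]),
    "mo:engineered": (
        ["foaf:Agent"],
        ["mo:RecordingSession", "mo:Recording", "mo:Performance"],
    ),
    "mo:performer": (["mo:Performance"], ["foaf:Agent"]),
    "mo:produced": (["foaf:Agent"], ["mo:MusicalManifestation"]),
}


def properties_with_domain(class_name):
    """Return the sorted names of properties that list class_name as a domain."""
    return sorted(
        name
        for name, (domains, _ranges) in PROPERTIES.items()
        if class_name in domains
    )


agent_properties = properties_with_domain("foaf:Agent")
print(agent_properties)
```
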

I skipped the properties re. trading issues. These could probably be
handled by GoodRelations etc.
Furthermore, complex relationships can be handled by the Similarity
Ontology (I guess).
I hope that this overview will help a bit in finding out where further
(new) relationship properties are needed.

Cheers,


Bob

Kurt J

Aug 13, 2010, 5:56:13 PM
to datain...@googlegroups.com, music-ontology-sp...@googlegroups.com
Hi Ian,

Looks like great stuff, I've been doing family stuff today but
hopefully I'll have a closer look this WE or Monday.

On Fri, Aug 13, 2010 at 8:43 AM, Ian Davis <m...@iandavis.com> wrote:
> There are 3 different expressions of the MB data to my knowledge. This
> is only one of them and hopefully they'll converge because we are
> sharing ideas (though it's important to remember that MB core data is
> public domain so it could be used to produce lots of different
> variants or mixed into new ones)
>
> For this dataset I expect it to be subsumed by MB themselves fairly
> soon because they are actively working on it. Discogs is in a
> different position - I don't get the sense that they are going to adopt
> linked data very quickly so I expect the dataincubator project to be
> the main expression for quite a while.

We are actively working on the MusicBrainz linked data integration and
it's looking like the first iteration will only include RDFa tied to
the existing HTML pages. Ian's work seems to be quite comprehensive
and, after integrating some of Bob's comments, the mapping in MB will
probably be largely the same. Although don't hold me to that yet ;)

I also have the impression that Discogs is less interested (or
unresponsive) w.r.t. linked data integration and furthermore the
dataincubator is the only Discogs RDF mapping I am aware of.

Ian - have you published your source code for the mapping? are these
ruby scripts like the Discogs mapping? would like to have a look :-)

-Kurt J

Kingsley Idehen

Aug 15, 2010, 9:14:55 AM
to music-ontology-sp...@googlegroups.com

Ian,

We always put datasets in their own Named Graph. This makes their use
optional re. any of our Linked Data Spaces.
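
One way to picture the named-graph approach: every triple of a dump gets a fourth term naming the graph it belongs to, as in the N-Quads format. The sketch below is not how Virtuoso loads data (that is not described in this thread); it only shows the idea, and the graph IRI and function name are hypothetical.

```python
def to_nquad(ntriples_line, graph_iri):
    """Turn one N-Triples statement into an N-Quads statement in graph_iri."""
    stripped = ntriples_line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped  # blank lines and comments pass through unchanged
    if not stripped.endswith("."):
        raise ValueError("expected an N-Triples statement ending in '.'")
    statement = stripped[:-1].rstrip()  # drop the terminating '.'
    return f"{statement} <{graph_iri}> ."


triple = "<http://example.org/s> <http://example.org/p> <http://example.org/o> ."
quad = to_nquad(triple, "http://example.org/graphs/musicbrainz")
print(quad)
```

Because the graph IRI travels with each statement, a consumer can include or leave out the whole dataset by selecting on that one term.
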

Anyway, thanks for the dump URL. That's all I need.

Kingsley
>> Kingsley
>>
>
> Ian
