Discogs development

Mats

Jun 9, 2010, 6:48:19 PM
to Data Incubator
Here's my take on some things that could be done to improve the
Discogs RDFizer.

With the newest Music Ontology I believe, if I've understood it
correctly, a Discogs release should become roughly something like
this:

mo:Release
    mo:catalogue_number
    discogs:genre(?)
    discogs:style(?)
    mo:record_count
    mo:release_status (bootleg/unofficial, official, promotion)
    mo:release_type (compilation, album etc.)
    mo:record
        mo:Record
            mo:track_count
            foaf:maker
            mo:track
                mo:Track
                    dc:title
                    foaf:maker
                    mo:track_number
                    timeline:duration
    mo:release_event
        mo:ReleaseEvent
            mo:label
            country
            release date

A mo:Record is a tracklist in Discogs. Multi-record releases are numbered
in <track><position> in the Discogs XML. For example, a 2-disc release
will have positions 1-1, 1-2, ..., 2-1, ... where 1-1 is track 1 on CD 1 and
2-1 is track 1 on CD 2, etc. This should become two different mo:Records.
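A minimal sketch of that split, assuming positions follow the disc-track pattern described above. The function name `group_by_record` is hypothetical, and any position without a numeric disc prefix (for example vinyl sides such as "A1") is simply assigned to a single record here, which is an assumption rather than something the Discogs data guarantees.

```python
from collections import defaultdict


def group_by_record(positions):
    """Group Discogs <position> strings into records keyed by disc number."""
    records = defaultdict(list)
    for position in positions:
        disc, sep, track = position.partition("-")
        if sep and disc.isdigit():
            # "2-1" means disc 2, track 1
            records[int(disc)].append(track)
        else:
            # No numeric disc prefix: assume a single-record release
            records[1].append(position)
    return dict(records)


if __name__ == "__main__":
    print(group_by_record(["1-1", "1-2", "2-1"]))
```

Each key of the returned dictionary would then become one mo:Record, with the listed positions as its mo:track_number values.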

The format description in Discogs contains the medium, release_status,
release_type and other information, but every value is marked only by
<description>. This means the text itself has to be mapped to the correct
MO concept, e.g. <description>Promo</description> corresponds to
mo:ReleaseStatus promotion, while <description>Album</description>
corresponds to mo:ReleaseType album.
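The text lookup could be sketched as two small tables. Only the Promo and Album rows come from the description above; the other rows, the exact MO term names and the function name `classify` are assumptions for illustration and would need checking against the real Discogs values and the MO specification.

```python
# Lookup tables keyed by lower-cased <description> text.
# Only "promo" and "album" are taken from the thread; the rest are assumed.
RELEASE_STATUS = {
    "promo": "mo:promotion",
    "unofficial release": "mo:bootleg",
}
RELEASE_TYPE = {
    "album": "mo:album",
    "compilation": "mo:compilation",
}


def classify(descriptions):
    """Sort <description> texts into release status, release type and leftovers."""
    result = {"release_status": None, "release_type": None, "unmapped": []}
    for text in descriptions:
        key = text.strip().lower()
        if key in RELEASE_STATUS:
            result["release_status"] = RELEASE_STATUS[key]
        elif key in RELEASE_TYPE:
            result["release_type"] = RELEASE_TYPE[key]
        else:
            # Keep anything unknown so it can be reviewed by hand
            result["unmapped"].append(text)
    return result


if __name__ == "__main__":
    print(classify(["Promo", "Album", "Gatefold"]))
```

Keeping the unmapped values around makes it easy to see which descriptions still need a decision.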

When it comes to MusicBrainz links, there are some in the data, but
for many things there are no links. To link the Discogs releases to
MusicBrainz (NGS) I think it is best to have both databases in RDF
with MO 2.0, neither of which exist yet. Then, for each artist in
Discogs, query the MusicBrainz db with the artist name and select the
artist names with similarity over a certain threshold for further
inspection. For each possible hit, compare the album names and track
names in the same manner as the artist names and match the artists if
they pass a certain threshold score. Then continue to match the
Discogs releases for that artist to MusicBrainz releases with the same
title, catalog nr, release date, country or whatever is available.
After a release is matched, it's a simple matter to match the tracks
on the release.
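A rough sketch of the two threshold steps, using the string similarity from Python's standard difflib module. The function names, the 0.85 threshold and the plain lists of names are all assumptions; a real run would query the MusicBrainz database and would probably need a tuned or different similarity measure.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio for two names, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def candidate_artists(discogs_name, musicbrainz_names, threshold=0.85):
    """Keep MusicBrainz names similar enough to deserve further inspection."""
    scored = [(similarity(discogs_name, name), name) for name in musicbrainz_names]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


def title_overlap(discogs_titles, musicbrainz_titles, threshold=0.85):
    """Fraction of Discogs titles with a close match among MusicBrainz titles."""
    if not discogs_titles:
        return 0.0
    hits = sum(
        1
        for title in discogs_titles
        if any(similarity(title, other) >= threshold for other in musicbrainz_titles)
    )
    return hits / len(discogs_titles)


if __name__ == "__main__":
    print(candidate_artists("The Beatles", ["the beatles", "Zzz"]))
    print(title_overlap(["Abbey Road", "Help!"], ["Abbey Road"]))
```

A candidate artist would be accepted when `title_overlap` for its album (and track) titles passes a second threshold; release matching on title, catalogue number, date and country could follow the same pattern.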

One thing that the Discogs dumps do not have is mo:Signal and
mo:SignalGroup (recording and release group in MusicBrainz NGS). A
SignalGroup should be something like a master release in Discogs, but
they're not included in the dumps. A mo:Track is a mo:Signal on one
specific record, with the relation [Signal] mo:published_as [Track],
and has a track number on that particular record. This means that if
an album is released in Japan and USA with the same songs, these songs
on these two releases will have different mo:Tracks but the same
mo:Signals.

I think the way to get mo:Signal from tracks is to first link each
Discogs track to the corresponding MusicBrainz track and then for
those only use the mo:Signal from MusicBrainz. MusicBrainz (NGS) has
recordings that correspond to mo:Signal, while Discogs does not, so I
think this will lead to cleaner data. To get mo:Signals for releases
that do not exist in MusicBrainz, a mo:Signal could be created from
all the tracks from these releases. Each track that has the same
title, artist, length and release title (but can be from different
releases) can be the basis for a single mo:Signal, and all other
tracks can be the basis for their own mo:Signal.
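The grouping rule for tracks without a MusicBrainz match could look roughly like this. The dictionary field names (`id`, `title`, `artist`, `duration`, `release_title`) and the function name are hypothetical, and exact matching on a case-folded key is an assumption; real data would likely need some tolerance on the length.

```python
from collections import defaultdict


def group_into_signals(tracks):
    """Group tracks sharing title, artist, length and release title.

    Each returned list of track ids is the basis for one mo:Signal.
    """
    signals = defaultdict(list)
    for track in tracks:
        key = (
            track["title"].casefold(),
            track["artist"].casefold(),
            track["duration"],
            track["release_title"].casefold(),
        )
        signals[key].append(track["id"])
    return list(signals.values())


if __name__ == "__main__":
    tracks = [
        {"id": "us-1", "title": "Song", "artist": "Band", "duration": "3:30", "release_title": "Album"},
        {"id": "jp-1", "title": "Song", "artist": "Band", "duration": "3:30", "release_title": "Album"},
        {"id": "us-2", "title": "Other", "artist": "Band", "duration": "4:00", "release_title": "Album"},
    ]
    print(group_into_signals(tracks))
```

In the example the US and Japanese copies of the same song end up under one signal, while the remaining track forms a signal of its own.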

SignalGroups could be created by making one mo:SignalGroup for all
releases with the same artist and similar titles with approximately
the same tracks, but for SignalGroups that have one or more releases
in both MusicBrainz and Discogs I suppose the SignalGroup/ReleaseGroup
from MusicBrainz could be used.
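One possible reading of "similar titles with approximately the same tracks" is a pairwise test like the one below. It is only a sketch: the release dictionaries, the function name and both thresholds are assumptions, and the track comparison uses a simple set overlap (Jaccard index) that was not proposed in the thread.

```python
from difflib import SequenceMatcher


def same_signal_group(a, b, title_threshold=0.8, track_threshold=0.7):
    """Decide whether two releases could share one mo:SignalGroup."""
    if a["artist"].casefold() != b["artist"].casefold():
        return False
    title_score = SequenceMatcher(
        None, a["title"].casefold(), b["title"].casefold()
    ).ratio()
    tracks_a = {t.casefold() for t in a["tracks"]}
    tracks_b = {t.casefold() for t in b["tracks"]}
    union = tracks_a | tracks_b
    # Jaccard index: shared track titles divided by all track titles
    overlap = len(tracks_a & tracks_b) / len(union) if union else 0.0
    return title_score >= title_threshold and overlap >= track_threshold


if __name__ == "__main__":
    us = {"artist": "Band", "title": "Album", "tracks": ["a", "b", "c", "d"]}
    jp = {"artist": "Band", "title": "Album", "tracks": ["a", "b", "c", "d", "bonus"]}
    print(same_signal_group(us, jp))
```

Releases that pass the test pairwise could then be merged into groups, with the MusicBrainz release group taking precedence wherever one exists.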

Bob Ferris

Jun 10, 2010, 3:58:50 AM
to datain...@googlegroups.com, music-ontology-sp...@googlegroups.com
On 10.06.2010 00:48, Mats wrote:
> Here's my take on some things that could be done to improve the
> Discogs RDFizer.
>
> With the newest Music Ontology I believe, if I've understood it
> correctly, a Discogs release should become roughly something like
> this:
>
> mo:Release
> mo:catalogue_number
> discogs:genre(?)
> discogs:style(?)
=> it would probably make sense to create a Discogs genre taxonomy
that is hooked into mo:Genre (for that, the domains mo:Release,
mo:Record, mo:Track (or the full mo:MusicalManifestation) and mo:Signal
should be added to mo:genre and the isSubPropertyOf feature removed)
=> furthermore, it would be good to mark these genres and music styles
as coming from the info service Discogs (see [1] for the info
service concept)

> mo:record_count
> mo:release_status (bootleg/unofficial, official, promotion)
> mo:release_type (compilation, album etc.)
> mo:record
> mo:Record
> mo:track_count
> foaf:maker
> mo:track
> mo:Track
> dc:title
> foaf:maker
> mo:track_number
> timeline:duration
> mo:release_event
> mo:ReleaseEvent
> mo:label
> country
=> event:place
> release date
=> event:time

=> what about related artists (featured artists, remixers, ...) and people
from the production process (producer, writer, engineer)?

>
> mo:Record is a tracklist in discogs. Multirecord releases are numbered
> in <track><position> in the discogs xml. For example a 2-disc release
> will have positions 1-1,1-2,...,2-1,... where 1-1 is track 1 cd 1 and
> 2-1 is track 1 cd 2 etc, this should become two different mo:Records.
>
> The format description in Discogs contains the medium, release_status,
> release_type and other stuff, but is only marked by <description>.
> This means the text has to be used to map to the correct MO concept,
> e.g. <description>Promo</description> is equal to mo:ReleaseStatus
> promotion, while <description>Album</description> is equal to
> mo:ReleaseType album
>
> When it comes to MusicBrainz links, there are some in the data, but
> for many things there are no links. To link the Discogs releases to
> MusicBrainz (NGS) I think it is best to have both databases in RDF
> with MO 2.0, neither of which exist yet. Then, for each artist in
> Discogs, query the MusicBrainz db with the artist name and select the
> artist names with similarity over a certain threshold for further
> inspection. For each possible hit, compare the album names and track
> names in the same manner as the artist names and match the artists if
> they pass a certain threshold score. Then continue to match the
> Discogs releases for that artist to MusicBrainz releases with the same
> title, catalog nr, release date, country or whatever is available.
> After a release is matched, it's a simple matter to match the tracks
> on the release.

=> it might be better to do it the other way around (from MusicBrainz
to Discogs)

>
> One thing that the Discogs dumps do not have is mo:Signal and
> mo:SignalGroup (recording and release group in MusicBrainz NGS). A
> SignalGroup should be something like a master release in Discogs, but
> they're not included in the dumps.

=> That's a pity; master releases are one of the good features of Discogs

> A mo:Track is a mo:Signal on one
> specific record, with the relation [Signal] mo:published_as [Track],
> and has a track number on that particular record. This means that if
> an album is released in Japan and USA with the same songs, these songs
> on these two releases will have different mo:Tracks but the same
> mo:Signals.
>
> I think the way to get mo:Signal from tracks is to first link each
> Discogs track to the corresponding MusicBrainz track and then for
> those only use the mo:Signal from MusicBrainz. MusicBrainz (NGS) has
> recordings that correspond to mo:Signal, while Discogs does not, so I
> think this will lead to cleaner data. To get mo:Signals for releases
> that do not exist in MusicBrainz, a mo:Signal could be created from
> all the tracks from these releases. Each track that has the same
> title, artist, length and release title (but can be from different
> releases) can be the basis for a single mo:Signal, and all other
> tracks can be the basis for their own mo:Signal.

=> sounds good so far ;)

>
> SignalGroups could be created by making one mo:SignalGroup for all
> releases with the same artist and similar titles with approximately
> the same tracks, but for SignalGroups that have one or more releases
> in both MusicBrainz and Discogs I suppose the SignalGroup/ReleaseGroup
> from MusicBrainz could be used.
>

Overall thank you very much for your detailed proposal for a Discogs
mapping. We are on the right track!

Cheers,

Bob


[1] http://wiki.foaf-project.org/w/FOAF_and_InformationServices

Kurt J

Jun 10, 2010, 1:00:13 PM
to music-ontology-sp...@googlegroups.com, datain...@googlegroups.com
FWIW Queen Mary's Centre for Digital Music has just been awarded a
JISC grant for linked data music work. We will be collaborating with
MetaBrainz to make the MusicBrainz core a proper linked data resource
and to do a good mapping of NGS to MO. Depending on how quickly this goes,
there may be some more resources to spend on the Discogs
set. The grant hires one coder (hopefully me ;)

I will respond in more detail later. For now back to diapers...

-kurt j

mats...@gmail.com

Jun 10, 2010, 1:09:50 PM
to music-ontology-sp...@googlegroups.com, datain...@googlegroups.com
=> it would probably make sense to create a Discogs genre taxonomy that is hooked into mo:Genre (for that, the domains mo:Release, mo:Record, mo:Track (or the full mo:MusicalManifestation) and mo:Signal should be added to mo:genre and the isSubPropertyOf feature removed)
=> furthermore, it would be good to mark these genres and music styles as coming from the info service Discogs (see [1] for the info service concept)

I think that would be a good idea too.
 
=> what about related artists (featured artists, remixers, ...) and people from the production process (producer, writer, engineer)?

Features should perhaps be a reified statement to another artist as foaf:maker with "features" as a property on the reified statement?
Things connected to a Discogs track can be connected to the mo:Recording and mo:Performance of the track and for SignalGroups the RecordingSession of the SignalGroup I think.

mo:Recording
    mo:engineer (Engineer from Discogs)
    mo:produced_signal mo:Signal

mo:Performance
    mo:instrument (Instruments from Discogs track, but Discogs doesn't only have the instrument it also specifies who plays it, not sure how to model that in MO)
    mo:conductor (Conductor from Discogs track)
    mo:singer (In Discogs there are things like Soprano Voice and Rap though)
    mo:recorded_as mo:Signal        

mo:RecordingSession
    mo:engineer (Engineer from Discogs)
    mo:produced_signal_group mo:SignalGroup      

There are many other things in Discogs that I'm a bit unsure about, such as Arranged By, Artwork By, Directed By, Mastered By, Mixed By and Recorded By. Also, a track in Discogs can have a producer, while in MO it looks like only the Record can have a producer.
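As a rough illustration of the credit mapping discussed above, a lookup from Discogs credit roles to an MO class and property might start like this. Only the Engineer and Conductor rows follow the outline above; the table and function names are hypothetical, and roles such as Mixed By are deliberately left unmapped because their MO counterpart is still an open question.

```python
# Discogs credit role (lower-cased) -> (MO class, MO property).
# Only the roles already placed in the outline above are listed.
CREDIT_TARGETS = {
    "engineer": ("mo:Recording", "mo:engineer"),
    "conductor": ("mo:Performance", "mo:conductor"),
}


def map_credit(role):
    """Return the (class, property) pair for a credit role, or None if undecided."""
    return CREDIT_TARGETS.get(role.strip().casefold())


if __name__ == "__main__":
    for role in ["Engineer", "Conductor", "Mixed By"]:
        print(role, "->", map_credit(role))
```

Roles that return None can be collected and discussed one by one before they are added to the table.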

=> it might be better to do it the other way around (from MusicBrainz to Discogs)

Yeah, MusicBrainz NGS is closer to MO at the moment, and its updates are available more often than once a month, so that might be better.


Cheers,
Mats

Bob Ferris

Jun 10, 2010, 1:12:41 PM
to music-ontology-sp...@googlegroups.com, datain...@googlegroups.com
Whoa,

that is the news of the day. I'm very happy about that announcement. I
set up a MusicBrainz NGS dump locally today, so I'm ready for an NGS to
MO mapping ;)

Cheers,

Bob

Kurt J

Jun 11, 2010, 9:25:28 AM
to music-ontology-sp...@googlegroups.com, datain...@googlegroups.com
Hi Nicholas,


> I think Yves said something to me about RDFa the other day, is that the plan
> Kurt? Both RDFa and RDF/XML would be great ;-)

there's been talk of RDFa, but then there's also been some concerns
(by MusicBrainz developers) about bloating the HTML and favoring some
content negotiation. this is all early stages but we plan on using
the mo list and public-lod to solicit opinions.

> So far MusicBrainz has had to block access to the site by spiders in its
> robots.txt file. Do you know if this will be able to change? Otherwise
> well-behaved linked data clients won't be able to access the site...

I was not aware of this issue - thanks. i've never tried crawling
MusicBrainz - always just used the dumps or web service. i'm not sure
what the answer is - we'll have to see what the MusicBrainz guys are
willing to do.

-kurt j

> nick.

Bob Ferris

Jun 14, 2010, 5:36:45 AM
to music-ontology-sp...@googlegroups.com, datain...@googlegroups.com
Hi,

On 11.06.2010 15:25, Kurt J wrote:
> Hi Nicholas,
>
>
>> I think Yves said something to me about RDFa the other day, is that the plan
>> Kurt? Both RDFa and RDF/XML would be great ;-)
>
> there's been talk of RDFa, but then there's also been some concerns
> (by MusicBrainz developers) about bloating the HTML and favoring some
> content negotiation. this is all early stages but we plan on using
> the mo list and public-lod to solicit opinions.
>
>> So far MusicBrainz has had to block access to the site by spiders in its
>> robots.txt file. Do you know if this will be able to change? Otherwise
>> well-behaved linked data clients won't be able to access the site...
>
> I was not aware of this issue - thanks. i've never tried crawling
> MusicBrainz - always just used the dumps or web service. i'm not sure
> what the answer is - we'll have to see what the MusicBrainz guys are
> willing to do.

But hey, do we really need that? It should also be possible to run a
semantic, graph-based knowledge base of MusicBrainz side by side with the
existing one:

1. There is a dump of the MusicBrainz database in the NGS schema
2. There is the Live Data Feed feature from MusicBrainz [1]
3. There is the REST-based XML Web Service [2], through which updates
from the semantic, graph-based knowledge base could be submitted

I say this because I don't really expect their website structure
and background technology to change in the near future. The main development
effort of the MusicBrainz guys is now concentrated on the NGS release (I
guess ;) ).

Cheers,

Bob


PS: That's also the way the BBC did it, isn't it?

[1] http://musicbrainz.org/doc/Live_Data_Feed
[2] http://musicbrainz.org/doc/XML_Web_Service

Bob Ferris

Jun 17, 2010, 12:30:49 PM
to datain...@googlegroups.com
Hello,

I made some edits to the Music Ontology wiki [1] over the last few
days, which might be of interest for writing a mapping from the Discogs
database schema to MO.

My main goals were:
- to add new graphics to the class schema page and to extend and explain
the existing ones (nearly everything there now links to the Music Ontology
documentation) [2]
- to review (again) the extended release concept and add some notes (and
links) there [3]

I hope it is now a bit easier to understand the Music
Ontology and its extended release concept (as introduced in Music
Ontology version 2.0).
Please let me know which other relations would be interesting (or necessary)
to model and illustrate, and feel free to add comments, suggestions and
criticism (of course).
I know some graphics are quite big, but I did my best to
compress them as far as possible, although some could probably be compressed
a bit further (I think ;) ).
Last but not least, the development of the Music Ontology is an
ongoing process, so I also submitted some changes, which can be
reviewed in the mo21_proposal branch [4] of the motools SVN. I have now
also added the project files for the TopBraid Composer Maestro Edition [5]
ontology modelling tool, which is the Creative Suite or Mercedes of this
area of development tools ;)
With the project files one can also load the graphs behind some of the
graphics on the class schema page.

Cheers,

Bob

[1] http://wiki.musicontology.com/
[2] http://wiki.musicontology.com/index.php/Classes_Schemas
[3] http://wiki.musicontology.com/index.php/ProposalRevision14
[4] http://motools.svn.sourceforge.net/viewvc/motools/mo/branches/mo21_proposal/
[5] http://www.topquadrant.com/products/TB_Composer.html

On 15.06.2010 23:41, Nicholas J Humfrey wrote:


> On 14 Jun 2010, at 10:36, Bob Ferris wrote:
>
>> Hi,
>>
>> On 11.06.2010 15:25, Kurt J wrote:
>>> Hi Nicholas,
>>>
>>>
>>>> I think Yves said something to me about RDFa the other day, is that the plan
>>>> Kurt? Both RDFa and RDF/XML would be great ;-)
>>>
>>> there's been talk of RDFa, but then there's also been some concerns
>>> (by MusicBrainz developers) about bloating the HTML and favoring some
>>> content negotiation. this is all early stages but we plan on using
>>> the mo list and public-lod to solicit opinions.
>>>
>>>> So far MusicBrainz has had to block access to the site by spiders in its
>>>> robots.txt file. Do you know if this will be able to change? Otherwise
>>>> well-behaved linked data clients won't be able to access the site...
>>>
>>> I was not aware of this issue - thanks. i've never tried crawling
>>> MusicBrainz - always just used the dumps or web service. i'm not sure
>>> what the answer is - we'll have to see what the MusicBrainz guys are
>>> willing to do.
>>
>> But hey, do we really need that? It should also be possible to run a semantic graph based knowledge base of MusicBrainz side by side to the existing one.
>>
>> 1. There exists a dump of the MusicBrainz database in NGS schema
>> 2. There exists the LiveFeed feature from MusicBrainz [1]
>> 3. There exists the REST based XML Web Service [2], where it is possible to submit updates from the semantic graph based knowledge base
>>
>> Because, I don't really believe into a change of their website structure and background techniques in the near future. The main development power from the MusicBrainz guys is concentrated now into the NGS release (I guess ;) ).
>

> Hi Bob,
>
> I think both are very valuable. I have looked at adding RDF/XML (and/or RDFa) support to the main MusicBrainz codebase, but I have held off because of the ever-changing codebase due to the NGS work. I think it is important to have 'official', authoritative and dereferenceable MusicBrainz identifiers. And then we can put an end to the age-old question of 'which namespace for an artist URI do I use?'.
>
> But it would be great to have a separate SPARQL endpoint that can be queried too ;-)
>
>
> nick.
