MusicBrainz and pagination


Kurt J

Aug 4, 2010, 10:51:49 AM
to music-ontology-sp...@googlegroups.com
Hello,

In digging into the MusicBrainz world, it seems pretty clear we will
have to deal with some pagination in RDF. For example, certain
artists (e.g. Bach or Mozart) have several thousand items
associated with them. Creating one monster HTML or RDF doc for this
is too much strain on the DB server.

It seems we'll have to deal with pagination in the RDF. I've found a
vocab [1] and a Semantic Overflow post [2] about this, but that's the
extent of my knowledge. I also recall that Ian was having some
trouble with huge RDF docs timing out on his dataincubator instance.

Any thoughts?

Thanks,
Kurt J

[1] http://code.google.com/p/linked-data-api/wiki/API_Viewing_Resources#Page_Description
[2] http://www.semanticoverflow.com/questions/1398/are-there-terms-for-describing-pagination

Frederick Giasson

Aug 4, 2010, 11:07:34 AM
to music-ontology-sp...@googlegroups.com
Hi Kurt,

> In digging into the MusicBrainz world, it seems pretty clear we will
> have to deal with some pagination in RDF. For example, certain
> artists (e.g. Bach or Mozart) have several thousands of items
> associated with them. Creating one monster HTML or RDF doc for this
> is too much strain on the DB server.
>
> It seems we'll have to deal with pagination in the RDF. I've found a
> vocab [1] and a semantic overflow post [2] about this, but that's the
> extent of my knowledge. I also recall that Ian was having some
> trouble with huge RDF docs timing out in his dataincubator.

I guess it is no different from how you would handle this problem with a
traditional RDBMS. Depending on the description of a record, and the UI
where you are presenting that information, you will have to build
pagination at the UI level and craft your SPARQL queries so that you
get the subset of triples you want to display for the page the user is
currently on.
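The following is a minimal sketch, in Python, of the UI-side approach Fred describes. The client keeps track of the page number and page size, and each page is fetched with a separate SPARQL query using LIMIT/OFFSET. Two details are assumptions made for illustration only: foaf:made as the artist-to-item predicate, and the example artist URI. Neither is the actual MusicBrainz or Music Ontology mapping.

```python
def page_query(artist_uri: str, page: int, page_size: int = 10) -> str:
    """Build a SPARQL query returning one page of items linked to an artist.

    Pagination lives in the client; the data itself carries no page triples.
    foaf:made is an assumed predicate, used here only for illustration.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    # ORDER BY gives a deterministic ordering; without it, OFFSET-based
    # pages are not guaranteed to be stable between requests.
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?item WHERE {\n"
        f"  <{artist_uri}> foaf:made ?item .\n"
        "}\n"
        "ORDER BY ?item\n"
        f"LIMIT {page_size}\n"
        f"OFFSET {offset}"
    )


# One UI might show 10 items per page and another 100; the data is the same.
print(page_query("http://example.org/artist/bach", page=3))
```

Because the page size is only a query parameter, nothing about it needs to be stored in the RDF.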

I really don't think that integrating pagination at the level of the
description of your data is the way to go. Otherwise, what if your UI
expects 10 records, and mine 100? The UI logic should handle this from
the plain description of a record.

At least, that is my modus operandi.

Thanks!

Take care,


Fred

Ian Davis

Aug 4, 2010, 11:32:16 AM
to music-ontology-sp...@googlegroups.com
On Wed, Aug 4, 2010 at 3:51 PM, Kurt J <kur...@gmail.com> wrote:
> Hello,
>
> In digging into the MusicBrainz world, it seems pretty clear we will
> have to deal with some pagination in RDF.  For example, certain
> artists (e.g. Bach or Mozart) have several thousands of items
> associated with them.  Creating one monster HTML or RDF doc for this
> is too much strain on the DB server.
>
> It seems we'll have to deal with pagination in the RDF.  I've found a
> vocab [1] and a semantic overflow post [2] about this, but that's the
> extent of my knowledge.  I also recall that Ian was having some
> trouble with huge RDF docs timing out in his dataincubator.
>

My problem was in rendering them as an HTML page; the RDF itself spits
out quite nicely. The Linked Data API would be an option, and paging
was one of its primary use cases.

It's possible to include this kind of information in the model, but
I'm not convinced that's necessary.

BTW, I expect a refresh of http://musicbrainz.dataincubator.org/ to be
live tomorrow some time. I have started documenting the modelling I
have done at http://wiki.musicbrainz.org/NGS_to_RDF_mappings#proposal_1_.28dataincubator.29


Ian

Bob Ferris

Aug 4, 2010, 11:46:57 AM
to music-ontology-sp...@googlegroups.com
Hi all,

On 04.08.2010 17:32, Ian Davis wrote:
> My problem was in renderng them as an HTML page. the RDF itself spits
> out quite nicely. The linked data API would be an option and paging
> was one of its primary use cases.
>
> It's possible to include this kind of information in the model, but
> I'm not convinced that's necessary.

As Frederick and now also Ian said, it won't really be needed, because
(in my mind) it belongs on the UI side. However, if one embeds the
semantic graph directly in the HTML page as RDFa, this might become an
issue. In that case I would probably simply connect the individual
graphs somehow, e.g. by using the
http://www.w3.org/2006/link#listDocumentProperty property.

Cheers,


Bob

Frederick Giasson

Aug 4, 2010, 11:50:36 AM
to music-ontology-sp...@googlegroups.com
Hi Bob,

> As Frederick and now also Ian said, it won't be really needed because
> it is on the UI side (in my mind). However, if one includes the
> Semantic Graph directly into the HTML site as RDFa, this might become
> an issue. In that case I would probably simple connecting the single
> graphs somehow, e.g. by using the
> http://www.w3.org/2006/link#listDocumentProperty property.

Right, but then my next question would be: why not add only the RDFa
of the paginated result?

There are other (and better) methods to do the complete export, no?


Thanks,

Fred

Bob Ferris

Aug 4, 2010, 11:55:54 AM
to music-ontology-sp...@googlegroups.com

Yes, of course. I only raised it because of the discussion about which options
should be provided: XHTML+RDFa and/or content negotiation with different
representation formats (RDF/Turtle, RDF/JSON, RDF/XML, ...).

If we only model the paginated result as a semantic graph, then we
might add some hint that it is just part of a bigger one, no?

Cheers,


Bob

Kurt J

Aug 4, 2010, 2:00:32 PM
to music-ontology-sp...@googlegroups.com
Thnx all,

Again, the big issue here is that the MusicBrainz dev team isn't happy
about making huge SQL queries to build an all encompassing RDF page
for an artist that is associated with thousands and thousands of
works. This will make MusicBrainz fall over (they fear). We have to
work within the confines of what MB dev is willing to do.

I'm hoping to avoid pagination but...

Note that there is _lots_ of pagination in the new server. See

http://test.musicbrainz.org/artist/20ff3303-4fe2-4a47-a1b6-291e26aa3438/recordings

and with an RDFa approach - i'd reckon we would have to deal with
pagination in the model, otherwise how do you find all the data?

with a content negotiation approach, perhaps we can get away with some
optimized super queries.

/me goes to investigate further...

-Kurt J


Frederick Giasson

Aug 4, 2010, 2:17:18 PM
to music-ontology-sp...@googlegroups.com
Hi Kurt,

> Again, the big issue here is that the MusicBrainz dev team isn't happy
> about making huge SQL queries to build an all encompassing RDF page
> for an artist that is associated with thousands and thousands of
> works. This will make MusicBrainz fall over (they fear). We have to
> work within the confines of what MB dev is willing to do.
>
> I'm hoping to avoid pagination but...
>
> Note that the there is _lots_ of pagination in the new server. see
>
> http://test.musicbrainz.org/artist/20ff3303-4fe2-4a47-a1b6-291e26aa3438/recordings
>
> and with an RDFa approach - i'd reckon we would have to deal with
> pagination in the model, otherwise how do you find all the data?
>
> with a content negotiation approach, perhaps we can get away with some
> optimized super queries.
>
> /me goes to investigate further...

I'm coming into this one out of nowhere, and I am totally out of sync with
what MBZ is planning to do.

However, why are they thinking of generating RDF dynamically with huge
SQL queries?

I mean, there are many options out there: static dump, RDF views, etc.

Also, if they want to inject RDFa at mbz.org, why would they have to
send additional SQL queries if they theoretically already have in hand
all the data used to generate the content of the pages?

Thanks for helping me understand what is at stake here :)


Thanks,


Fred

Kurt J

Aug 4, 2010, 2:48:32 PM
to music-ontology-sp...@googlegroups.com
Hi Fred,

On Wed, Aug 4, 2010 at 1:17 PM, Frederick Giasson <fr...@fgiasson.com> wrote:
>  Hi Kurt,
>
> I really come out of nowhere on this one, and I am totally unsync with what
> MBZ is planning to do.
>
> However, why are they thinking to generating RDF dynamically with huge SQL
> queries?
> I mean, there are many options out there: static dump, rdf views, etc.

the idea is that

curl -H "Accept: application/rdf+xml" http://musicbrainz.org/artist/<mbid>

will give you RDF about the resource - totally up-to-date with the
most recent MB edits - no dump, no replication required.
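For a programmatic client, the same content-negotiated request might look like this in Python, using only the standard library. The function only builds the request. Whether musicbrainz.org would actually answer with RDF/XML depends on the server-side support under discussion in this thread, so read this as a sketch of the proposal rather than a working endpoint.

```python
import urllib.request


def rdf_request(mbid: str) -> urllib.request.Request:
    """Build a request for an artist resource, asking for RDF/XML.

    Mirrors: curl -H "Accept: application/rdf+xml" http://musicbrainz.org/artist/<mbid>
    """
    url = f"http://musicbrainz.org/artist/{mbid}"
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


req = rdf_request("24f1766e-9635-4d58-a4d4-9413f9f98a4c")
print(req.full_url, req.get_header("Accept"))
# To actually send it (requires network access and server-side support):
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Content-Type"))
```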

>
> Also, if they want to inject RDFa at mbz.org, when would they have to send
> additional SQL queries if they theorically already have all the data in
> hands that is used to generate the content from within the pages?
>
> Thanks for helping me understanding what is at sake here :)

not all the info is retrieved for the artist page in one go. each
page of the paginated results fires a new LIMIT/OFFSET SQL query, as i
understand it - still getting to grips with all the Perl code :/
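The following sketch shows a LIMIT/OFFSET query issued once per page, as Kurt describes. It runs against an in-memory SQLite database with a made-up `recording` table (columns `id`, `artist_id`, `name`), purely for illustration. The real MusicBrainz server uses its own PostgreSQL schema and Perl code, and neither is reproduced here.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, artist_id: int, page: int, page_size: int = 25) -> list[str]:
    """Return one page of recording names for an artist via LIMIT/OFFSET."""
    offset = (page - 1) * page_size
    cur = conn.execute(
        "SELECT name FROM recording WHERE artist_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (artist_id, page_size, offset),
    )
    return [row[0] for row in cur.fetchall()]


def demo_connection(n: int = 60) -> sqlite3.Connection:
    """Hypothetical table with n recordings, all credited to artist 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recording (id INTEGER PRIMARY KEY, artist_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO recording (id, artist_id, name) VALUES (?, 1, ?)",
        [(i, f"Recording {i}") for i in range(1, n + 1)],
    )
    return conn


conn = demo_connection()
# Three bounded queries (25 + 25 + 10 rows) instead of one query for all 60 rows.
for page in (1, 2, 3):
    print(page, len(fetch_page(conn, 1, page)))
```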

make sense?

-Kurt J

Yves Raimond

Aug 4, 2010, 3:52:54 PM
to music-ontology-sp...@googlegroups.com

Hello!

Just my 2c before diving back into wedding planning :) Pagination is absolutely fine in this case; don't try to avoid it! I had to put some very strong limits in place on the dbtune instance: some composers are associated with thousands of recordings, and that is enough to bring the db to its knees.

Another way is to split each doc according to some other criteria (personal info in one place, the list of recordings somewhere else, etc.), but that will get messy with the current mbz uris.

The simplest way to do it is to add an rdfs:seeAlso link from page i to page i+1 (that is also how we do it on BBC Programmes).
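Here is a rough sketch of Yves's suggestion. Each page document lists its slice of items and, unless it is the last page, points to the next page with rdfs:seeAlso. The `?page=N` URI pattern and the use of foaf:made are assumptions for illustration; the thread fixes neither.

```python
def page_turtle(artist_uri: str, doc_uri: str, page: int, last_page: int, items: list[str]) -> str:
    """Serialize one page as Turtle, linking page i to page i+1 with rdfs:seeAlso."""
    def page_uri(n: int) -> str:
        return f"{doc_uri}?page={n}"  # hypothetical paging URI scheme

    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
    ]
    for item in items:
        lines.append(f"<{artist_uri}> foaf:made <{item}> .")
    if page < last_page:
        # The last page simply has no seeAlso link, which ends the chain.
        lines.append(f"<{page_uri(page)}> rdfs:seeAlso <{page_uri(page + 1)}> .")
    return "\n".join(lines)


print(page_turtle("http://example.org/artist/bach#artist", "http://example.org/artist/bach",
                  page=1, last_page=3, items=["http://example.org/recording/1"]))
```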

y


Bob Ferris

Aug 4, 2010, 3:59:50 PM
to music-ontology-sp...@googlegroups.com
Hi all,

On 04.08.2010 21:52, Yves Raimond wrote:
> Hello!
>
> Just my 2c before diving back into wedding planning :) pagination is
> absolutely fine in that case, don't try to avoid it! I had to put some
> very strong limit into place on the dbtune instance: some composers are
> associated with thousands of recordings, and it is enough to bring the
> db to its knees.
>
> Another way is to split each doc according to some other criteria
> (personal info somewhere, list of recordings somewhere else etc.) but it
> will get messy with the current mbz uris.
>
> The simplest way you can do that is to add a rdfs:seeAlso link from page
> i to page i+1...(it is also the way we do it on bbc programmes)

Yeah, at first I also thought rdfs:seeAlso would be fine.
Later, however, I thought it might be too general ;)
rdfs:seeAlso is, and can be, applied to all kinds of more or less relevant
things. What will happen if there is another need to make use of it?
How would you filter/get the specific pagination link?
Maybe a more specific hint would do it, e.g. ex:continues or
ex:graph_continues, no?
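Bob's filtering concern can be shown with plain triples. If rdfs:seeAlso is also used for other links, a client looking for the next page gets several candidates, while a dedicated property yields exactly one. `ex:continues` is Bob's own hypothetical property. The `http://example.org/ns#` namespace and the Wikipedia link below are made up for illustration.

```python
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
EX_CONTINUES = "http://example.org/ns#continues"  # hypothetical, after Bob's ex:continues

Triple = tuple[str, str, str]


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


page1 = "http://example.org/artist/bach?page=1"
graph = [
    # seeAlso used for an unrelated, "more or less relevant" link...
    (page1, RDFS_SEEALSO, "http://en.wikipedia.org/wiki/Johann_Sebastian_Bach"),
    # ...and for the pagination link.
    (page1, RDFS_SEEALSO, "http://example.org/artist/bach?page=2"),
    (page1, EX_CONTINUES, "http://example.org/artist/bach?page=2"),
]

print(objects(graph, page1, RDFS_SEEALSO))  # two candidates: which is the next page?
print(objects(graph, page1, EX_CONTINUES))  # exactly one
```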

Cheers,


Bob

Kurt J

Aug 4, 2010, 5:45:42 PM
to music-ontology-sp...@googlegroups.com
On Wed, Aug 4, 2010 at 2:59 PM, Bob Ferris <za...@elbklang.net> wrote:
> Hi all,
>
> On 04.08.2010 21:52, Yves Raimond wrote:
>>
>> Hello!
>>
>> Just my 2c before diving back into wedding planning :) pagination is
>> absolutely fine in that case, don't try to avoid it! I had to put some
>> very strong limit into place on the dbtune instance: some composers are
>> associated with thousands of recordings, and it is enough to bring the
>> db to its knees.
>>
>> Another way is to split each doc according to some other criteria
>> (personal info somewhere, list of recordings somewhere else etc.) but it
>> will get messy with the current mbz uris.
>>
>> The simplest way you can do that is to add a rdfs:seeAlso link from page
>> i to page i+1...(it is also the way we do it on bbc programmes)

thnx Yves. Congrats and good luck!

> Yeah, at the beginning I also thought rdfs:seeAlso would be fine. However, I
> thought that it might be to general later ;)
> rdfs:seeAlso is applied and can be applied for all more or less relevant
> things. What will happen, if there is another need to make use of it? How
> would you filter/get the specific pagination link=
> Maybe a more specific hint would do it, e.g. ex:continues or
> ex:graph_continues, or?

Bob, i'm inclined to agree with you. rdfs:seeAlso seems a bit vague
for this. I think the xhv:next, xhv:first, xhv:last, etc. might be the
best way to handle this. See:

http://www.w3.org/1999/xhtml/vocab/#

Any thoughts on the xhv solution?
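Below is a sketch of what the xhv properties could look like for page i of n. The property URIs use the XHTML vocabulary namespace Kurt links (http://www.w3.org/1999/xhtml/vocab#), which defines first, last, next and prev among other terms. The `?page=N` document URIs are a hypothetical scheme.

```python
XHV = "http://www.w3.org/1999/xhtml/vocab#"


def nav_triples(doc_uri: str, page: int, last_page: int) -> list[tuple[str, str, str]]:
    """Navigation triples (xhv:first/prev/next/last) for one page of a paged document."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def page_uri(n: int) -> str:
        return f"{doc_uri}?page={n}"  # hypothetical paging URI scheme

    links = {"first": page_uri(1), "last": page_uri(last_page)}
    if page > 1:
        links["prev"] = page_uri(page - 1)
    if page < last_page:
        links["next"] = page_uri(page + 1)
    this = page_uri(page)
    return [(this, XHV + rel, target) for rel, target in links.items()]


# Print the middle page's links as N-Triples.
for s, p, o in nav_triples("http://example.org/artist/bach", page=2, last_page=3):
    print(f"<{s}> <{p}> <{o}> .")
```

Unlike a bare rdfs:seeAlso, each link carries its navigational role, so a client can follow only xhv:next.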

-Kurt J

Bob Ferris

Aug 4, 2010, 5:59:34 PM
to music-ontology-sp...@googlegroups.com

> Bob, i'm inclined to agree with you. rdfs:seeAlso seems a bit vague
> for this. I think the xhv:next, xhv:first, xhv:last, etc might be the
> best way to handle this. See:
>
> http://www.w3.org/1999/xhtml/vocab/#
>
> Any thoughts on the xhv solution?

Yeah, I wasn't aware that this namespace also serves an RDFS
description (whose design could be better, anyway :) ). It seems
to fit the pagination use case perfectly. I haven't seen it used in an
example yet, but I would vote for it ;)

Hence, +1

Cheers,


Bob

Michael Smethurst

Aug 4, 2010, 6:18:02 PM
to music-ontology-sp...@googlegroups.com
just a late night thought that's possibly off topic and armagnac influenced so feel free to ignore...

the problem as i understand it is that classical composers are currently linked to releases rather than to works (and then on to releases), which leads to lots and lots of very long list views. the musicbrainz ngs introduces works (and performances etc)

from a bog standard html user experience pov i'd say if an artist has associated works under the new schema then link to work list (and from there to release list) else link to release list (or a list of releases that aren't linked thru works). the problem is really that a list of releases for a composer is uninteresting. it's uninteresting for users (as we've found on bbc.co.uk/music) and i always tend to take that as a hint that it's probably uninteresting to machines (or the people sat behind them). or at least what you'd do for users should influence what you'd do in data views. one web and all that

again from a non-rdf ux pov i always tend to think that pagination is the option you take when there's no other route to slice the data

which is not to disagree with yves (i'd never dare!). sometimes pagination is the only option and the rdfs:seeAlso works for that. but if there is another way to cut the data then take that route. and the point of the ngs is to provide those other routes?!?


Kurt J

Aug 4, 2010, 6:30:55 PM
to music-ontology-sp...@googlegroups.com
Hi Michael,

On Wed, Aug 4, 2010 at 5:18 PM, Michael Smethurst
<Michael....@bbc.co.uk> wrote:
> just a late night thought that's possibly off topic and armagnac influenced so feel free to ignore...
>
> the problem as i understand it is that classical composers are currently linked to releases rather than works (then on to releases). which leads to lots and lots of very long list views. the musicbrainz ngs introduces works (and performances etc)
>
> from a bog standard html user experience pov i'd say if an artist has associated works under the new schema then link to work list (and from there to release list) else link to release list (or a list of releases that aren't linked thru works). the problem is really that a list of releases for a composer is uninteresting. it's uninteresting for users (as we've found on bbc.co.uk/music) and i always tend to take that as a hint that it's probably uninteresting to machines (or the people sat behind them). or at least what you'd do for users should influence what you'd do in data views. one web and all that
>
> again from a non-rdf ux pov i always tend to think that pagination is the option you take when there's no other route to slice the data
>
> which is not to disagree with yves (i'd never dare!). sometimes pagination is the only option and the rdfs:seeAlso works for that. but if there is another way to cut the data then take that route. and the point of the ngs is to provide those other routes?!?

a good point. this classical composer problem should be helped by
this. but i'm not sure it's implemented quite how you're imagining.

http://test.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c/recordings

Bach is associated with a helluva lotta recordings. In fact more
recordings than works. Given he died in 1750 - he obviously did not
_make_ these recordings so perhaps he should not be associated with
them.

however, this is not the current state of NGS as far as i can tell.

/me goes to prod on #musicbrainz irc

-kurt j

Kurt J

Aug 4, 2010, 6:38:19 PM
to music-ontology-sp...@googlegroups.com
Hello again,

nikki pointed out on IRC this was a decision by the
editors/contributors to musicbrainz. i guess they want to see an
overwhelmingly huge number of recordings associated with Bach.

but nikki also notes that bands with tons of bootlegs, like the
Grateful Dead, would still require some pagination

http://test.musicbrainz.org/artist/6faa7ca7-0d99-4a5e-bfa6-1fd5037520c6/recordings

^^ some busy and methodical DeadHeads on MusicBrainz ;-)

-Kurt J

Gregg Kellogg

Aug 4, 2010, 6:52:37 PM
to music-ontology-sp...@googlegroups.com
Borrowing from RDFa, using xhv:next, xhv:prev would make sense as a way to move between pages.

Gregg

Michael Smethurst

Aug 4, 2010, 7:39:04 PM
to music-ontology-sp...@googlegroups.com
Hi Kurt

apologies for repeated top posts; on webmail

i think the rather general point i was trying to make is that "how best to describe pagination in rdf" question felt like the wrong starting point and "how best to avoid pagination" felt like a better one

always tend to think that designing data views is as much a user experience issue as designing html views. partly cos i make my money as a user experience "professional" and partly cos part of the time i'm a punter clicking links on pages and part of the time i'm a punter consuming data views. and almost always the views i want as a user are the same views i want as a hacker

really think linked data has a very valuable role to play in bog standard user experience if only at the level of getting the data model to agree with users' mental models. imo it has to be about designing data on a human scale. or "the bones of the model should show thru in the views" as someone i can't remember once said

i'd like to think that if you play this right you'll not only end up adding rdf to brainz but also making brainz a better all-round website

given:
http://test.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c/recordings
i wonder whether ngs is not implemented in the way i imagine or whether there's a lot of legacy data that predates ngs. basically is this a model problem or a data migration / normalisation problem? given brainz is moving to a more expressive schema the natural assumption is that some data migration has to happen to fill that model out. the brainz community might be suspicious not because they don't trust the new model but because they don't quite trust some hackers they don't quite know to put their lovingly curated data in the right place...

anyway, if i were making a website out of this i'd have:

/artists/:artist - an artist
/artists/:artist/works - a list of works composed by the artist
/artists/:artist/releases - a list of releases associated with the artist that aren't linked thru works

/works/:work - a work
/works/:work/releases - a list of releases containing that work
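Michael's URL layout could be expressed as a small route table, sketched here in Python with regular expressions. The view names are invented for illustration. The point is that works and releases each get their own list resources, so an artist page need not enumerate everything at once.

```python
import re
from typing import Optional

# Route table after Michael's URL layout; view names are hypothetical.
ROUTES = [
    (re.compile(r"^/artists/(?P<artist>[^/]+)$"), "artist"),
    (re.compile(r"^/artists/(?P<artist>[^/]+)/works$"), "artist_works"),
    (re.compile(r"^/artists/(?P<artist>[^/]+)/releases$"), "artist_releases"),
    (re.compile(r"^/works/(?P<work>[^/]+)$"), "work"),
    (re.compile(r"^/works/(?P<work>[^/]+)/releases$"), "work_releases"),
]


def resolve(path: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (view name, captured parameters) for a path, or None if no route matches."""
    for pattern, view in ROUTES:
        match = pattern.match(path)
        if match:
            return view, match.groupdict()
    return None


print(resolve("/artists/bach/works"))  # ('artist_works', {'artist': 'bach'})
print(resolve("/works/mass-in-b-minor/releases"))
```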

none of this implies that you won't need pagination. sadly you always end up needing pagination (if only to list the members of The Fall). but just wondering if there's a way to make brainz easier for people and easier for hackers

also not meaning to make work for you. but where there's a more expressive schema usually someone's gotta migrate the data :-)

anyway /me throws confetti in the general direction of yves :-)

goodnight


Kurt J

Aug 4, 2010, 10:07:18 PM
to music-ontology-sp...@googlegroups.com
Hi Michael,

>
> apologies for repeated top posts; on webmail
>

i ain't mad.

> i think the rather general point i was trying to make is that "how best to describe pagination in rdf" question felt like the wrong starting point and "how best to avoid pagination" felt like a better one

point taken.

> always tend to think that designing data views is as much a user experience issue as designing html views. partly cos i make my money as a user experience ""professional"" and partly cos part of the time i'm a punter clicking links on pages and part of the time i'm a punter consuming data views. and almost always the views i want as a user are the same views i want as hacker
>
> really think linked data has a very valuable role to play in bog standard user experience if only at the level of getting the data model to agree with user's mental models. imo it has to be about designing data on a human scale. or "the bones of the model should show thru in the views" as someone i can't remember once said
>
> i'd like to think that if you play this right ull not only end up adding rdf to brainz but also making brainz a better all round website
>

let's hope :-)

> given:
> http://test.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c/recordings
> i wonder whether ngs is not implemented in the way i imagine or whether there's a lot of legacy data that predates ngs.

i think it was actually a deliberate decision to keep classical
composers as credited for recordings. that's what i was led to
believe on IRC. but the NGS model would allow them to just be
credited in works. maybe when people (the editors) become more
comfortable with NGS, they'll make that switch.

/me needs to dig thru the musicbrainz mailing list archives

> basically is this a model problem or a data migration / normalisation problem?
> given brainz is moving to a more expressive schema the natural assumption is
> that some data migration has to happen to fill that model out. the brainz community might be suspicious not because they don't trust the new model but because they don't quite trust some hackers they don't quite know to put their lovingly curated data in the right place...

yes that's it i think.

>
> anyway, if i were making a website out of this i'd have:
>
> /artists/:artist - an artist
> /artists/:artist/works - a list of works composed by the artist
> /artists/:artist/releases - a list of releases associated with the artist that aren't linked thru works
>

mostly test.MB.org works this way

> /works/:work - a work
> /works/:work/releases - a list of releases containing that work
>

it doesn't seem to quite do this. we need to start a thread on the
mbz-dev list perhaps. that'd seem a critical link to getting editors
to accept crediting works instead of recordings.

> none of this implies that you won't need pagination. sadly you always end up needing pagination (if only to list the members of the fall). but just wondering if there's a way to make brainz easier for people and easier for hackers
>
> also not meaning to make work for you. but where there's a more expressive schema usually someone's gotta migrate the data :-)

naw, we like work :-)

> anyway /me throws confetti in the general direction of yves :-)

yes confetti for Yves and his lovely bride!!

-kurt j

Joshan Mahmud

Aug 5, 2010, 4:19:18 AM
to music-ontology-sp...@googlegroups.com
Hi All
 
At the risk of sounding a bit silly, could I ask how you would handle pagination built into the RDF when the data (all the triples) change? Say you had Beethoven's first symphonies on page 1, but had forgotten a whole set of symphonies that are later added to the list (somewhere in the middle of the data rather than at the end): would the pagination triples have to shift? Or would you add triples to the end of a dataset and then add pagination triples as you go along?
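Joshan's concern can be shown with plain offset-based pages. Inserting an item in the middle of an ordered list moves later items onto different pages. Any triples recording which item sits on which page would therefore have to be regenerated. The symphony titles and the page size of 2 are purely illustrative.

```python
def offset_page(items: list[str], page: int, size: int) -> list[str]:
    """Items on a 1-based page when pages are cut by position (offset = (page-1)*size)."""
    start = (page - 1) * size
    return items[start:start + size]


before = ["Symphony No. 1", "Symphony No. 2", "Symphony No. 4", "Symphony No. 5"]
# A forgotten symphony is added in the middle of the ordering, not at the end.
after = sorted(before + ["Symphony No. 3"])

for page in (1, 2, 3):
    print(page, offset_page(before, page, 2), offset_page(after, page, 2))
```

Page 1 is unchanged, but No. 5 moves from page 2 to page 3. Pagination data embedded in the RDF would go stale after such an edit, whereas pages computed at query time would not.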
 
Cheers!
J



George Fazekas

Aug 5, 2010, 6:10:04 AM
to music-ontology-sp...@googlegroups.com

(not speaking from experience) I wonder what the arguments are against
keeping a native RDF store in sync instead? Wouldn't that be more scalable
and perhaps even more flexible in the future?


Frederick Giasson

Aug 5, 2010, 8:34:52 AM
to music-ontology-sp...@googlegroups.com
Hi!

>
> Just my 2c before diving back into wedding planning :) pagination is
> absolutely fine in that case, don't try to avoid it! I had to put some
> very strong limit into place on the dbtune instance: some composers
> are associated with thousands of recordings, and it is enough to bring
> the db to its knees.
>
> Another way is to split each doc according to some other criteria
> (personal info somewhere, list of recordings somewhere else etc.) but
> it will get messy with the current mbz uris.
>
> The simplest way you can do that is to add a rdfs:seeAlso link from
> page i to page i+1...(it is also the way we do it on bbc programmes)
>

Good, I buy this. Even if other systems could be used to handle the
amount of data we are currently talking about, there will always be a
usecase where it will be too much. Also you ensure that smaller systems
will be able to discover all the data (if wanted) given their limited
resources (think about smart phones agents, etc).

I prefer the idea of seeAlso to talking about "pagination". Even if
the end result is the same, the concepts are quite different (and
"pagination" is even misleading in an LD context). As Yves said, data is
more likely to be split by theme than into pages.

Thanks!


Fred

Frederick Giasson

Aug 5, 2010, 8:40:06 AM8/5/10
to music-ontology-sp...@googlegroups.com
Hi Kurt

> the idea is that
>
> curl -H "Accept: application/rdf+xml" http://musicbrainz.org/artist/<mbid>
>
> will give you RDF about the resource - totally up-to-date with the
> most recent MB edits - no dump, no replication required.
>
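The curl call above differs from a browser request only in its Accept header. A minimal Python equivalent using the standard library's urllib.request might look like this; it builds the request without sending it, and the MBID argument is a placeholder.

```python
from urllib.request import Request

def rdf_request(mbid):
    """Build a GET request that asks for RDF/XML instead of HTML, like the curl call."""
    return Request(
        f"http://musicbrainz.org/artist/{mbid}",
        headers={"Accept": "application/rdf+xml"},
    )

req = rdf_request("some-mbid")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it (needs network access) would be: urllib.request.urlopen(req).read()
```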

OK, good, thanks for the clarification. However, isn't this exactly what
RDF views are about? (Three or four years ago, we did a proof of concept
that worked quite well; we were just missing some usage testing.) With
their replication feature, standalone nodes could be 100% in sync,
delivering RDF without affecting their main network. Also, such a network
would *easily* be scalable, since simple standalone nodes could be
spawned and load-balanced on demand.

The other advantage is that you only have to maintain views and not code.


But there is probably a reason, since it never took off (on mbz's
side) in all this time :)


Thanks,


Fred

Kurt J

Aug 5, 2010, 10:14:33 AM8/5/10
to music-ontology-sp...@googlegroups.com
Hi Gregg,

> Borrowing from RDFa, using xhv:next, xhv:prev would make sense as a way to
> move between pages.
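A minimal sketch of what Gregg's suggestion could produce: N-Triples linking consecutive page documents with xhv:next and xhv:prev, where xhv is the XHTML vocabulary namespace used by RDFa. The page URIs are hypothetical placeholders.

```python
XHV = "http://www.w3.org/1999/xhtml/vocab#"

def next_prev_triples(page_uris):
    """Link consecutive page documents in both directions with xhv:next / xhv:prev."""
    triples = []
    for current, following in zip(page_uris, page_uris[1:]):
        triples.append(f"<{current}> <{XHV}next> <{following}> .")
        triples.append(f"<{following}> <{XHV}prev> <{current}> .")
    return triples

pages = [f"http://example.org/artist/some-mbid?page={n}" for n in (1, 2, 3)]
for triple in next_prev_triples(pages):
    print(triple)
```

Compared with a single rdfs:seeAlso link, the explicit next/prev pair states that the documents form an ordered sequence and lets a client walk backwards too.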

thnx for your input! vote noted.

-kurt j

Kurt J

Aug 5, 2010, 10:18:40 AM8/5/10
to music-ontology-sp...@googlegroups.com
Hi Fred

I've never seen views work that well for MusicBrainz. Maybe i'm
wrong. Do they deal with the heavy pagination in the HTML? Can you
point me to a working example? Also afaik, no modeling (other than
Ian's) has dealt with NGS or Advanced Relationships yet. And our work
will be integrated into the mb_server code base (at the very least, code
for de-refing MB URIs to get RDF). This is why we got a grant to do
the work :-)

-Kurt J

Kurt J

Aug 5, 2010, 10:29:39 AM8/5/10
to music-ontology-sp...@googlegroups.com
Hi George,

> (not talking of experience) I wonder what are the arguments against keeping
> a native RDF store in sync instead? Wouldn't that be more scalable
> and perhaps even more flexible in the future?

this might be done for the SPARQL endpoint. the problems with this
are mostly on the MusicBrainz side. they've got a shoe-string budget
to run this complicated write-heavy webapp. they hate replication.
and they're not too keen on running their own triple store alongside
postgres (altho i might talk'em into it yet).

one of our main goals is to get MB URIs to de-ref as linked data.
(hopefully) this then solves the issue of which MB mapping to use
(zitgist, dbtune, dataincubator). then MB really does become the
universal lingua franca for artists, albums and tracks in linked data.
imo, this is best done with some small code in mb_server.
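As a rough illustration of the "small code in mb_server" idea, the core decision when dereferencing an MB URI is which representation to return for a given Accept header. mb_server itself is Perl; the function below is a hypothetical Python sketch of that decision only, with deliberately simplified Accept parsing (q-values, exact types and */*, but not ranges such as text/*).

```python
def preferred_type(accept_header, available=("text/html", "application/rdf+xml")):
    """Return the best entry of `available` for an HTTP Accept header value.

    Simplified on purpose: handles exact media types, */* and q-values, but not
    ranges like text/*. Falls back to the first available type (HTML) when
    nothing matches; a stricter server might answer 406 Not Acceptable instead.
    """
    best, best_key = available[0], (0.0, -1)
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q <= 0:
            continue  # q=0 means "not acceptable"
        specificity = 0 if media == "*/*" else 1  # explicit types beat */*
        for candidate in available:
            if media in ("*/*", candidate) and (q, specificity) > best_key:
                best, best_key = candidate, (q, specificity)
    return best

# A browser-style header still gets HTML; an RDF client gets RDF/XML.
print(preferred_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(preferred_type("application/rdf+xml"))
```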

-kurt

Frederick Giasson

Aug 5, 2010, 10:29:54 AM8/5/10
to music-ontology-sp...@googlegroups.com
Hi Kurt!

> I've never seen views work that well for MusicBrainz. Maybe i'm
> wrong. Do they deal with the heavy pagination in the HTML? Can you
Years ago when Zitgist was operating, we put Virtuoso RDF Views in front
of a running PostgreSQL instance that was using mbz's syncing mechanism.
The only thing that needed maintenance was the RDF view, whenever MBZ
changed their RDB schemas.

> point me to a working example? Also afaik, no modeling (other than
I have the view somewhere; I am not sure if it is still up and running
at Zitgist (now maintained directly by OpenLink).

> Ian's) has dealt with NGS yet or Advanced Relationships. And our work
> will be integrated into the mb_server code base (at very least code
> for de-refing MB URIs to get RDF). This is why we got a grant to do
> the work :-)

Well, I am not in the grant business, but I guess it then makes sense for
MBZ to move in that direction (it probably has to go where the money is).

Thanks!


Fred

Kingsley Idehen

Aug 5, 2010, 7:11:11 PM8/5/10
to music-ontology-sp...@googlegroups.com, Patrick van Kleef
Frederick Giasson wrote:
> Hi Kurt!
>> I've never seen views work that well for MusicBrainz. Maybe i'm
>> wrong. Do they deal with the heavy pagination in the HTML? Can you
> Years ago when Zitgist was operating, we put Virtuoso RDF Views in
> front of a running PostgreSQL instance that was using mbz's syncing
> mechanism. The only thing that needed maintenance was the RDF
> view, whenever MBZ changed their RDB schemas.
Fred,

What you describe above is going to get even better, very soon.

We are syncing Virtuoso and PostgreSQL and then syncing RDF Views with
the Quad Store. Thus, you basically end up with the MBZ equivalent of
DBpedia-Live.

>> point me to a working example? Also afaik, no modeling (other than
> I have the view somewhere; I am not sure if it is still up and running
> at Zitgist (now maintained directly by OpenLink).

See comment above.

>> Ian's) has dealt with NGS or Advanced Relationships yet. And our work
>> will be integrated into the mb_server code base (at the very least, code
>> for de-refing MB URIs to get RDF). This is why we got a grant to do
>> the work :-)
>
> Well, I am not in the grant business, but I guess it then makes sense
> for MBZ to move in that direction (it probably has to go where the
> money is).

Anyway, we will soon release an update of what we've maintained, etc.

Kingsley
>
> Thanks!
>
>
> Fred
>


--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Kingsley Idehen

Aug 5, 2010, 7:18:38 PM8/5/10
to music-ontology-sp...@googlegroups.com

With MBZ as the publisher there shouldn't be any need for replication.
Basically, that should be an RDF View.

Now if you want to perform faceted browsing at scale, guess what, you
will have to replicate between the SQL based RDF View and the Quad Store
via Materialized RDF Views. Anyway, these are the sort of things we
will be unleashing re. MBZ since the RDF Views themselves are 3+ years
old (as per comments from Fred already).


>
>> Also, if they want to inject RDFa at mbz.org, why would they have to send
>> additional SQL queries if they theoretically already have in hand all the
>> data used to generate the content of the pages?
>>
>> Thanks for helping me understand what is at stake here :)
>>
>
> not all the info is retrieved for the artist page in one go. each
> page of the paginated results fires a new limit/offset SQL query as i
> understand - still getting to grips with all the Perl code :/
>
> make sense?
>

Hmmm.

You are not going to offer a scalable substrate for faceted browsing
down this path.

BTW - how are you planning to produce your RDF based Linked Data?
Logically, you have to use RDF Views, but I am not sensing views from
your comments.


Kingsley
> -Kurt J

Frederick Giasson

Aug 6, 2010, 1:44:22 PM8/6/10
to music-ontology-sp...@googlegroups.com
Hi Kingsley!

> What you describe above is going to get even better, very soon.
>
> We are syncing Virtuoso and PostgreSQL and then syncing RDF Views with
> the Quad Store. Thus, you basically end up with the MBZ equivalent of
> DBpedia-Live.

Good :)


Thanks,


Fred

Kurt J

Aug 6, 2010, 3:04:01 PM8/6/10
to music-ontology-sp...@googlegroups.com
Hi Kingsley,

Unfortunately, I think the MusicBrainz project doesn't have the
resources to run a Virtuoso instance alongside its existing stack.
At least they are not willing to commit such resources at this time.
Is this what you mean when you say "use RDF Views"? This is a Virtuoso
feature, am I misunderstanding?

> Now if you want to perform faceted browsing at scale, guess what, you will
> have to replicate between the SQL based RDF View and the Quad Store via
> Materialized RDF Views.  Anyway, these are the sort of things we will be
> unleashing re. MBZ since the RDF Views themselves are 3+ years old (as per
> comments from Fred already).


>>
>>
>>>
>>> Also, if they want to inject RDFa at mbz.org, why would they have to
>>> send additional SQL queries if they theoretically already have in hand
>>> all the data used to generate the content of the pages?
>>>
>>> Thanks for helping me understand what is at stake here :)
>>>
>>
>> not all the info is retrieved for the artist page in one go.  each
>> page of the paginated results fires a new limit/offset SQL query as i
>> understand - still getting to grips with all the Perl code :/
>>
>> make sense?
>>
>
> Hmmm.
>
> You are not going to offer a scalable substrate for faceted browsing
> down this path.
>
> BTW - how are you planning to produce your RDF based Linked Data? Logically,
> you have to use RDF Views, but I am not sensing views from your comments.
>

Again, we have to be very lightweight wrt the MusicBrainz server.
Can we use RDF Views w/o running the entire Virtuoso stack? Maybe
this is the best option? Plz help me understand :-)

-Kurt J

Bob Ferris

Aug 6, 2010, 3:28:36 PM8/6/10
to music-ontology-sp...@googlegroups.com
Hi,

On 06.08.2010 21:04, Kurt J wrote:

[snip]

>>
>> With MBZ as the publisher there shouldn't be any need for replication.
>> Basically, that should be an RDF View.
>
> Unfortunately, I think the MusicBrainz project doesn't have the
> resources to run a Virtuoso instance alongside its existing stack.
> At least they are not willing to commit such resources at this time.
> Is this what you mean when you say "use RDF Views"? This is a Virtuoso
> feature, am I misunderstanding?
>
>> Now if you want to perform faceted browsing at scale, guess what, you will
>> have to replicate between the SQL based RDF View and the Quad Store via
>> Materialized RDF Views. Anyway, these are the sort of things we will be
>> unleashing re. MBZ since the RDF Views themselves are 3+ years old (as per
>> comments from Fred already).
>
>

[snip]

>>
>> You are not going to offer a scalable substrate for faceted browsing
>> down this path.
>>
>> BTW - how are you planning to produce your RDF based Linked Data? Logically,
>> you have to use RDF Views, but I am not sensing views from your comments.
>>
>
> Again, we have to be very lightweight wrt the MusicBrainz server.
> Can we use RDF Views w/o running the entire Virtuoso stack? Maybe
> this is the best option? Plz help me understand :-)

I would also vote for Virtuoso or something similar as a proper Quad
Store. However, if I understand it correctly, the dereferenceable URIs
have the higher priority, right?
Furthermore, as far as I know, RDF Views are a Virtuoso feature, but if
someone does a mapping from NGS to RDF, then that also ends up as an RDF
View, doesn't it? So there are different methods to create Semantic Graph
Views on top of an RDBMS.
Re. server issues: I think a GDBMS server for a Quad Store should be
more or less financed by the customers who want to make heavy use of
this information resource, shouldn't it? That would mean MBZ pushes update
notifications to the GDBMS, which then retrieves the updates as needed.
I hope that there are resources to provide a proper GDBMS on top of MBZ.
Talking about resource issues is important, but it shouldn't block
development at all, should it?
Go future!

Cheers,


Bob

Kurt J

Aug 6, 2010, 3:34:17 PM8/6/10
to music-ontology-sp...@googlegroups.com

yes, the goal is to get MB URIs de-refing cheaply and quickly for the
NGS release. We're very late in the dev cycle so it can't require a
massive overhaul of what's already been coded for NGS.

But you're also correct - it is a mistake to get bogged down by constraints :-)


> Cheers,
>
>
> Bob

Kingsley Idehen

Aug 6, 2010, 8:01:21 PM8/6/10
to music-ontology-sp...@googlegroups.com
Kurt,

Virtuoso is a very lightweight product that smartly delivers a lot of
functionality.

What are your assumptions about Virtuoso resource requirements?

RDF Views are what you get via Virtuoso, D2RQ, and Triplify. Naturally,
the functionality of the end product varies re. performance,
scalability, and other factors.
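To illustrate what an RDF View does in general, here is a minimal sketch (not Virtuoso, D2RQ or Triplify code) that generates triples from a relational row at request time instead of storing them. The `artist` table, its columns and the example.org URI pattern are hypothetical; mo:MusicArtist and foaf:name are the real Music Ontology and FOAF terms.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO_MUSIC_ARTIST = "http://purl.org/ontology/mo/MusicArtist"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

def literal(text):
    """Serialize a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def artist_triples(conn, gid, base="http://example.org/artist/"):
    """Map one relational row to triples at request time; nothing is stored."""
    row = conn.execute("SELECT gid, name FROM artist WHERE gid = ?", (gid,)).fetchone()
    if row is None:
        return []
    subject = f"<{base}{row[0]}>"
    return [
        f"{subject} <{RDF_TYPE}> <{MO_MUSIC_ARTIST}> .",
        f"{subject} <{FOAF_NAME}> {literal(row[1])} .",
    ]

# Hypothetical table and row, standing in for the relational source data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (gid TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO artist VALUES (?, ?)", ("artist-1", "Wolfgang Amadeus Mozart"))

for triple in artist_triples(conn, "artist-1"):
    print(triple)
```

A materialized view, in the sense Kingsley describes, would instead write these triples into a quad store and keep them in sync as the rows change.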

Kingsley Idehen

Aug 6, 2010, 8:05:15 PM8/6/10
to music-ontology-sp...@googlegroups.com
Bob,

Virtuoso gives you a GDBMS, Web Server, and Virtual DBMS in one
offering. Ultimately you need all three aspects to make a scalable and
dynamic Linked Data space, as we currently demonstrate with DBpedia-Live.
What you see with DBpedia-Live will also be done with MBZ data. Also, via
the MBZ data space you will see the effect of RDF Views that sync with the
Quad Store, thereby enabling high-performance and scalable faceted
browsing, etc.

Links:

1. http://dbpedia-live.openlinksw.com/live -- DBpedia Live Edition.

Dan Brickley

Aug 7, 2010, 10:22:24 AM8/7/10
to music-ontology-sp...@googlegroups.com
On Wed, Aug 4, 2010 at 4:51 PM, Kurt J <kur...@gmail.com> wrote:
> Hello,
>
> In digging into the MusicBrainz world, it seems pretty clear we will
> have to deal with some pagination in RDF.  For example, certain
> artists (e.g. Bach or Mozart) have several thousands of items
> associated with them.  Creating one monster HTML or RDF doc for this
> is too much strain on the DB server.
>
> It seems we'll have to deal with pagination in the RDF.  I've found a
> vocab [1] and a semantic overflow post [2] about this, but that's the
> extent of my knowledge.  I also recall that Ian was having some
> trouble with huge RDF docs timing out in his dataincubator.
>
> Any thoughts?

Just a quick reply for now to note that this was also an issue mentioned
a few years back by SixApart folk, who then ran livejournal.com and
found themselves generating giant FOAF files for celebrities and large
groups. I'm not sure how (if at all) this was resolved, but it might be
worth digging there (or in the issue tracker for the open source
LiveJournal codebase). I'll have a hunt for old emails too...

Dan

> Thanks,
> Kurt J
>
> [1] http://code.google.com/p/linked-data-api/wiki/API_Viewing_Resources#Page_Description
> [2] http://www.semanticoverflow.com/questions/1398/are-there-terms-for-describing-pagination

Kurt J

Aug 7, 2010, 11:33:40 AM8/7/10
to music-ontology-sp...@googlegroups.com

Hi Kingsley,

>
> Virtuoso is a very lightweight product that smartly delivers a lot of
> functionality.
>
> What are your assumptions about Virtuoso resource requirements?

Yes, it is; I didn't mean to discount it. We will consider it more
thoroughly, but we still have to work within the constraints of what
MusicBrainz is willing to do.

> RDF Views are what you get via Virtuoso, D2RQ, and Triplify. Naturally, the
> functionality of the end product varies re. performance, scalability, and
> other factors.

And these too. Thanks so much for your input!

-Kurt J


Kurt J

Aug 7, 2010, 11:49:17 AM8/7/10
to music-ontology-sp...@googlegroups.com
On Sat, Aug 7, 2010 at 9:22 AM, Dan Brickley <dan...@danbri.org> wrote:
> On Wed, Aug 4, 2010 at 4:51 PM, Kurt J <kur...@gmail.com> wrote:
>> Hello,
>>
>> In digging into the MusicBrainz world, it seems pretty clear we will
>> have to deal with some pagination in RDF.  For example, certain
>> artists (e.g. Bach or Mozart) have several thousands of items
>> associated with them.  Creating one monster HTML or RDF doc for this
>> is too much strain on the DB server.
>>
>> It seems we'll have to deal with pagination in the RDF.  I've found a
>> vocab [1] and a semantic overflow post [2] about this, but that's the
>> extent of my knowledge.  I also recall that Ian was having some
>> trouble with huge RDF docs timing out in his dataincubator.
>>
>> Any thoughts?
>
> Just a quick reply for now to note that this was also an issue mentioned
> a few years back by SixApart folk, who then ran livejournal.com and
> found themselves generating giant FOAF files for celebrities and large
> groups. I'm not sure how (if at all) this was resolved, but it might be
> worth digging there (or in the issue tracker for the open source
> LiveJournal codebase). I'll have a hunt for old emails too...

Thanks Dan. Appreciate the help!
