Possibly minor issues with BBC and Cyc

Andreas Harth

Aug 25, 2011, 11:19:19 AM
to pedant...@googlegroups.com
Dear fellow pedants,

I came across two issues and I'm unsure whether I should push them.

1) BBC's content negotiation seems borken:

$ rapper -c "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
rapper: Parsing URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 with parser rdfxml
rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0:1 - Using property attribute 'lang' without a namespace is forbidden.
rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 - Resolving URI failed: Failed writing body (0 != 2736)
rapper: Failed to parse URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 rdfxml content
rapper: Parsing returned 1 triple
$

Adding an .rdf to the filename works, but then the thing-source correspondence
is lost.

2) OpenCyc returns text/xml as content-type (e.g., at [2]), and I would like
them to return application/rdf+xml so that I don't have to feed every text/xml
file that I get into an RDF/XML parser.

Best regards,
Andreas.

[1] http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0
[2] http://sw.opencyc.org/concept/Mx4r-T6OkHdRS-eUiqO5n8NA1g

Damian Steer

Aug 25, 2011, 11:37:18 AM
to pedant...@googlegroups.com

On 25 Aug 2011, at 16:19, Andreas Harth wrote:

> Dear fellow pedants,
>
> I came across two issues and I'm unsure whether I should push them.
>
> 1) BBC's content negotiation seems borken:
>
> $ rapper -c "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
> rapper: Parsing URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 with parser rdfxml
> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0:1 - Using property attribute 'lang' without a namespace is forbidden.
> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 - Resolving URI failed: Failed writing body (0 != 2736)
> rapper: Failed to parse URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 rdfxml content
> rapper: Parsing returned 1 triple
> $

Without more information -- namely which Accept header rapper is sending -- it's not clear that it's broken. My rapper has RDFa parsing built in, so it might well be asking for HTML.

I'll see if I can find out. In the meantime try:

$ rapper -c -g "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
rapper: Parsing URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 with parser guess
raptor_guess.c:113:raptor_guess_parse_content_type_handler: Got content type 'text/html'
rapper: Guessed parser name 'rdfa'
rapper: Error - - XML parser error: AttValue: " or ' expected
rapper: Error - - XML parser error: attributes construct error
rapper: Error - - XML parser error: Specification mandate value for attribute og:image
rapper: Error - - XML parser error: attributes construct error
rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 - Resolving URI failed: Failed writing body (0 != 2896)
rapper: Failed to parse URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 guess content
rapper: Parsing returned 13 triples

Damian

Damian Steer

Aug 25, 2011, 11:50:31 AM
to pedant...@googlegroups.com

On 25 Aug 2011, at 16:37, Damian Steer wrote:

>
> On 25 Aug 2011, at 16:19, Andreas Harth wrote:
>
>> Dear fellow pedants,
>>
>> I came across two issues and I'm unsure whether I should push them.
>>
>> 1) BBC's content negotiation seems borken:
>>
>> $ rapper -c "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
>> rapper: Parsing URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 with parser rdfxml
>> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0:1 - Using property attribute 'lang' without a namespace is forbidden.
>> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 - Resolving URI failed: Failed writing body (0 != 2736)
>> rapper: Failed to parse URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 rdfxml content
>> rapper: Parsing returned 1 triple
>> $
>
> Without more information -- namely what accept rapper is sending -- it's not clear that it's broken. My rapper has rdfa parsing built in, so it might well be asking for html.

(Apologies for the formatting, via ngrep)

GET /music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 HTTP/1.1..Host: www
.bbc.co.uk..Accept: application/rdf+xml, text/rdf;q=0.6, text/plain;q=0.1,
text/turtle, application/x-turtle, application/turtle, text/n3;q=0.3, text/
rdf+n3;q=0.3, application/rdf+n3;q=0.3, application/x-trig, application/rss
;q=0.8, application/rss+xml;q=0.8, text/rss;q=0.8, application/xml;q=0.3, t
ext/xml;q=0.3, application/atom+xml;q=0.3, text/html;q=0.2, application/xht
ml+xml;q=0.4, text/html;q=0.6, application/xhtml+xml;q=0.8, text/x-nquads,
*/*;q=0.1

which looks fine to me. (x)html is well down the list.
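
To check the ordering without reading the q-values by eye, here is a minimal Python sketch that ranks the media types in the captured header. The function name `rank_accept` is made up for illustration, and the parsing is simplified (it only looks at the `q` parameter and keeps the highest value when a type is listed twice, as text/html and application/xhtml+xml are here).

```python
def rank_accept(header: str) -> list:
    """Return (media type, q) pairs from an Accept header, highest q first."""
    best = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        # Assumption: when a type appears twice, keep the higher q.
        best[media_type] = max(q, best.get(media_type, 0.0))
    # sorted() is stable, so ties keep the order they had in the header.
    return sorted(best.items(), key=lambda pair: -pair[1])


# The Accept header from the ngrep capture, with the line wrapping undone.
ACCEPT = (
    "application/rdf+xml, text/rdf;q=0.6, text/plain;q=0.1, text/turtle, "
    "application/x-turtle, application/turtle, text/n3;q=0.3, "
    "text/rdf+n3;q=0.3, application/rdf+n3;q=0.3, application/x-trig, "
    "application/rss;q=0.8, application/rss+xml;q=0.8, text/rss;q=0.8, "
    "application/xml;q=0.3, text/xml;q=0.3, application/atom+xml;q=0.3, "
    "text/html;q=0.2, application/xhtml+xml;q=0.4, text/html;q=0.6, "
    "application/xhtml+xml;q=0.8, text/x-nquads, */*;q=0.1"
)

for media_type, q in rank_accept(ACCEPT):
    print(f"{q:.1f}  {media_type}")
```

With this header, application/rdf+xml comes out at q=1.0 and text/html at q=0.6 at best, so an HTML response is not what the client ranked highest.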

And, indeed:

$ curl -I -H 'Accept: application/rdf+xml' http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0
HTTP/1.1 200 OK
Date: Thu, 25 Aug 2011 15:46:24 GMT
Server: Apache
Vary: Accept
Cache-Control: no-cache
Content-Type: text/html
Transfer-Encoding: chunked
Set-Cookie: BBC-UID=84ce357636bee5c0429f8949b050c91a7aecb2ec108040bc72c9830fa9824cbf0curl%2f7%2e21%2e6%20%28x86%5f64%2dapple%2ddarwin10%2e7%2e0%29%20libcurl%2f7%2e21%2e6%20OpenSSL%2f1%2e0%2e0d%20zlib%2f1%2e2%2e5%20libidn%2f1%2e22; expires=Mon, 24-Aug-15 15:46:24 GMT; path=/; domain=bbc.co.uk;
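
The same check can be scripted for a list of URIs. This is only a sketch using Python's standard `urllib`; the helper names `media_type`, `honours_request` and `head_content_type` are invented here, and the comparison only handles the simple case of asking for exactly one media type.

```python
import urllib.request


def media_type(content_type: str) -> str:
    """Drop parameters such as charset and normalise the case."""
    return content_type.split(";", 1)[0].strip().lower()


def honours_request(requested: str, content_type: str) -> bool:
    """True if the response media type is the single type that was requested."""
    return media_type(content_type) == media_type(requested)


def head_content_type(url: str, accept: str) -> str:
    """Send a HEAD request with the given Accept header, return Content-Type."""
    req = urllib.request.Request(url, method="HEAD", headers={"Accept": accept})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Type", "")


# Usage (needs network access, so it is left commented out):
# url = "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
# served = head_content_type(url, "application/rdf+xml")
# print(served, honours_request("application/rdf+xml", served))
```

For the response shown above, `honours_request("application/rdf+xml", "text/html")` is false, which is the mismatch in question.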

Damian

Richard Cyganiak

Aug 25, 2011, 2:39:03 PM
to pedant...@googlegroups.com
On 25 Aug 2011, at 16:19, Andreas Harth wrote:
> 1) BBC's content negotiation seems borken:
>
> $ rapper -c "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
> rapper: Parsing URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 with parser rdfxml
> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0:1 - Using property attribute 'lang' without a namespace is forbidden.
> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 - Resolving URI failed: Failed writing body (0 != 2736)
> rapper: Failed to parse URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 rdfxml content
> rapper: Parsing returned 1 triple
> $
>
> Adding an .rdf to the filename works, but then the thing-source correspondence
> is lost.

As Damian said, the server is returning HTML, so it's not clear that anything is really wrong.

> 2) OpenCyc returns text/xml as content-type (e.g., at [2]), and I would like
> them to return application/rdf+xml that I don't have to feed all text/xml
> files that I get into an RDF/XML parser.

Yeah, it's worth mentioning to them. See also:
http://pedantic-web.org/fops.html#contenttype

As a heuristic, you can scan the first 5k or so for the RDF namespace URI. If it occurs, it's worth throwing an RDF/XML parser at it.

(Any23 has these heuristics built-in for all its supported RDF syntaxes, and it makes life much easier.)
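
A minimal sketch of that heuristic in Python, assuming the body is available as bytes in an ASCII-compatible encoding such as UTF-8 (a UTF-16 document would not match). The function name and the exact 5 KiB window are illustrative choices, not taken from Any23.

```python
RDF_NS = b"http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_rdfxml(body: bytes, window: int = 5 * 1024) -> bool:
    """Guess whether a text/xml body is RDF/XML by looking for the RDF
    namespace URI within the first `window` bytes."""
    return RDF_NS in body[:window]


sample = (
    b'<?xml version="1.0"?>\n'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
)
if looks_like_rdfxml(sample):
    print("worth throwing an RDF/XML parser at it")
```

It is only a heuristic: any XML or HTML document that happens to mention the namespace near the top will also match, so the parser still has to cope with failures.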

Best,
Richard

Andreas Harth

Aug 25, 2011, 3:50:08 PM
to jo...@cycfoundation.org, pedant...@googlegroups.com
Dear John,

many thanks for making Cyc available on the Semantic Web!

There is one small issue though: would it be possible to serve the RDF/XML
files with an "application/rdf+xml" content type rather than the more generic
"text/xml" one? That way, systems that use your data (and other people's)
can directly use the right parser.

If you need direction regarding the Apache configuration let me know.

Cheers,
Andreas.

aidan.hogan

Aug 25, 2011, 8:22:53 PM
to pedant...@googlegroups.com
On 25/08/2011 19:39, Richard Cyganiak wrote:
> On 25 Aug 2011, at 16:19, Andreas Harth wrote:
>> 1) BBC's content negotiation seems borken:
>>
>> $ rapper -c "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
>> rapper: Parsing URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 with parser rdfxml
>> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0:1 - Using property attribute 'lang' without a namespace is forbidden.
>> rapper: Error - URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 - Resolving URI failed: Failed writing body (0 != 2736)
>> rapper: Failed to parse URI http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 rdfxml content
>> rapper: Parsing returned 1 triple
>> $
>>
>> Adding an .rdf to the filename works, but then the thing-source correspondence
>> is lost.
>
> As Damian said, the server is returning HTML, so it's not clear that anything is really wrong.

I'm a bit confused about this... they already have an RDF/XML
description [1], which describes the artist [2]. It seems a waste not to
have [2] dereference to [1] when "Accept: application/rdf+xml" is
specified... or am I missing something?
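
To make concrete what I would expect on the server side, here is a rough Python sketch. It is not how the BBC implements anything: `choose_variant` and the `OFFERED` table are made up for illustration, and the Accept parsing is simplified (q-values and wildcards only).

```python
def parse_accept(header):
    """Map each media range in an Accept header to its q-value."""
    prefs = {}
    for item in header.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # q defaults to 1 when it is omitted
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[media_range.lower()] = q
    return prefs


def quality(media_type, prefs):
    """q for an offered type: exact match first, then type/*, then */*."""
    major = media_type.split("/")[0]
    for pattern in (media_type, major + "/*", "*/*"):
        if pattern in prefs:
            return prefs[pattern]
    return 0.0


def choose_variant(accept, offered):
    """Return the document for the best-matching offered type, or None."""
    prefs = parse_accept(accept)
    best = max(offered, key=lambda m: quality(m, prefs))
    return offered[best] if quality(best, prefs) > 0 else None


# Hypothetical table of representations for the artist below.
BASE = "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
OFFERED = {
    "application/rdf+xml": BASE + ".rdf",
    "text/html": BASE,
}

print(choose_variant("application/rdf+xml", OFFERED))
```

A client that sends "Accept: application/rdf+xml" would then be pointed at the .rdf document instead of the HTML page.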

Cheers,
Aidan

[1]
http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0.rdf
[2]
http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0#artist

Damian Steer

Aug 26, 2011, 5:23:56 AM
to pedant...@googlegroups.com
On 26/08/11 01:22, aidan.hogan wrote:

> I'm a bit confused about this... they already have an RDF/XML
> description [1], which describes the artist [2]. It seems a waste not to
> have [2] dereference to [1] when "Accept: application/rdf+xml" is
> specified... or am I missing something?

A brain dump in lieu of actual thought on my part:

a) Content negotiating from [1] to the content of [2] would be ideal, I
agree. Client's wishes are fulfilled.

b) It's a shame that there seems to be no way to find [2] from [1]. A
link alternate would be useful.

c) [1] and [2] appear to contain the same triples (give or take some
rdfa artifacts).

So I agree, it is a waste. However I have mixed feelings about content
negotiation as a way to discover [2] and consider b) more problematic.

Damian

Bob Ferris

Aug 26, 2011, 6:16:05 AM
to Nicholas Humfrey, pedant...@googlegroups.com
Hi,

here is some news regarding the BBC's content negotiation issue.

Cheers,

Bob

On 8/26/2011 12:07 PM, Nicholas Humfrey wrote:
> Hi Bob,
>
> Yes, content negotiation is currently broken. Hope to have it fixed (and
> actually doing proper content negotiation, rather than string matching) in
> the next release.
>
> nick.
>
>
> On 26/08/2011 09:04, "Bob Ferris"<za...@smiy.org> wrote:
>
>> Hi Nic,
>>
>> I don't know whether you are already aware of the following ongoing
>> discussion on the pedantic web list. If not, please feel free to
>> join in there. AFAIK, you are dealing with these issues at the BBC, right?
>>
>> Cheers,
>>
>>
>> Bob


>>
>>
>> -------- Original Message --------
>> Subject: [pedantic-web] Possibly minor issues with BBC and Cyc
>> Date: Thu, 25 Aug 2011 17:19:19 +0200
>> From: Andreas Harth<and...@harth.org>
>> Reply-To: pedant...@googlegroups.com
>> To: pedant...@googlegroups.com
>>
>> Dear fellow pedants,
>>
>> I came across two issues and I'm unsure whether I should push them.
>>

>> 1) BBC's content negotiation seems borken:
>>
>> $ rapper -c
>> "http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0"
>> rapper: Parsing URI
>> http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0
>> with
>> parser rdfxml
>> rapper: Error - URI
>> http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0:1 -
>> Using property attribute 'lang' without a namespace is forbidden.
>> rapper: Error - URI
>> http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0 -
>> Resolving URI failed: Failed writing body (0 != 2736)
>> rapper: Failed to parse URI
>> http://www.bbc.co.uk/music/artists/4cf1eab6-0a14-4ab0-8c11-38d4157f91e0
>> rdfxml
>> content
>> rapper: Parsing returned 1 triple
>> $
>>
>> Adding an .rdf to the filename works, but then the thing-source
>> correspondence
>> is lost.
>>

>> 2) OpenCyc returns text/xml as content-type (e.g., at [2]), and I would like
>> them to return application/rdf+xml that I don't have to feed all text/xml
>> files that I get into an RDF/XML parser.
>>

> nick.
