Writing to Triple Store

Ron Koron

Nov 3, 2011, 4:08:26 PM
to LDSpider
Hello,

I am new to this group and to linked data crawlers, so bear with me.

I am trying to get LDSpider (v. 1.1) to write to a local Virtuoso
triple store through its SPARQL/Update endpoint, but it is not working.
I keep getting response code 400, "Bad Request". I assume that it is
making a valid HTTP connection to the SPARQL endpoint
(http://localhost:8890/sparql) because I am not getting a "Connection
Refused" response.

In the public class SinkSparul, I see that it fails in the method
endSparql() after _writer.close(). It looks to me as though the SPARUL
request will look like this:
"request=INSERT+DATA+INTO+<graphUri>+INSERT+DATA+INTO+<graphUri>+
{subject predicate object .\n}".

Is this a valid SPARQL/Update request for a Virtuoso endpoint? Is there
anything obvious that I am doing wrong?

From the Virtuoso documentation
(http://docs.openlinksw.com/virtuoso/rdfsparql.html), it looks to me
like the SPARUL request is valid, but it does mention a Data
Manipulation Language (DML) extension to SPARUL. Is there another
library that I need to download for DML extensions?

Any help will be greatly appreciated. Thanks.

Ron Koron

Robert Isele

Nov 7, 2011, 6:13:40 AM
to ldsp...@googlegroups.com
Hi Ron,

I checked the code and it seems that LDSpider used the wrong parameter
to submit the query ('request' instead of 'query'). Most triple stores
and the version of Virtuoso that I tested with understood the request
parameter, but it seems yours does not. I changed that and committed
the fix [1]. As the query looks correct to me this should already fix
the problem. We are using the same code in the Silk Link Discovery
Framework, so there shouldn't be anything fundamentally wrong.
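
To illustrate what the request body looks like after this change, here is a minimal Python sketch of the form encoding. It is not LDSpider's Java code; `build_body` is a hypothetical helper, and the graph URI and triple are placeholders. It only shows how the update string ends up under the 'query' parameter, with spaces written as "+" and "<", ">" percent-encoded as %3C and %3E.

```python
from urllib.parse import urlencode

def build_body(update, param="query"):
    """Form-encode a SPARQL/Update string under the given parameter name."""
    # urlencode() percent-encodes reserved characters ("<" -> %3C, ">" -> %3E)
    # and writes every space as "+".
    return urlencode({param: update})

# Placeholder graph and triple, for illustration only.
update = (
    "INSERT DATA INTO <http://example.org/g> "
    "{ <http://example.org/s> <http://example.org/p> <http://example.org/o> . }"
)
body = build_body(update)
print(body)
```

Passing param="request" reproduces the parameter name used before the fix, which may help when comparing how a given endpoint reacts to the two variants.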

Could you retest please? Let me know in case you are not building from
source and need an updated binary. And don't hesitate to contact me in
case it's still not working.

Cheers,
Robert

[1] http://code.google.com/p/ldspider/source/detail?r=287

Robert Isele

Nov 20, 2011, 9:31:47 AM
to Ron Koron, ldsp...@googlegroups.com
Hi Ron,

I just ran a simple example with LDSpider, and writing to a local
Virtuoso [1] worked fine for me. Did you set the user roles, i.e. did
you allow write access [2]? I'm not sure about this as I didn't
configure Virtuoso myself, but you might need to explicitly allow
write access.

As the error message you get doesn't help very much, I improved the
error reporting in the current trunk [3]. Could you update and retest?
LDSpider should now return the complete error message from Virtuoso,
which typically includes an error code that gives more clues about the
concrete problem.

Cheers,
Robert

[1] Version 06.01.3127, on Linux (x86_64-pc-linux-gnu), Single Edition
[2] http://docs.openlinksw.com/virtuoso/rdfgraphsecurity.html
[3] http://code.google.com/p/ldspider/source/detail?r=288

On Wed, Nov 9, 2011 at 8:56 PM, Ron Koron <rdk...@gmail.com> wrote:
> Robert,
>
> Thanks for the response to my problem.  I am building from source, and
> I incorporated your change from "request=" to "query=".  However, I am
> still getting the same results (Bad Request).  &:-(
>
> I am wondering about all the "+" and "%3C" and "%3E" in the
> _writer.write commands.  My guess is that the %3C and %3E are for
> UTF-8 encoding of "<" and ">", but what is the "+" doing?
>
> Also, I added a few _log.info statements to log what was being
> written to the HTTP connection and it looks just like the write
> commands from above. I did notice that it is reaching the maximum 200
> statements per request.  That's a lot of SPARUL code!
>
> I asked the same question in the Virtuoso forum and all I got back
> was a link to their documentation.  From their documentation, it looks
> like a statement should look something like this:
>
>    "SPARQL INSERT INTO GRAPH <http://mygraph.com> {
>        <http://myopenlink.net/dataspace/Kingsley#this>
>        <http://rdfs.org/sioc/ns#id>
>        <Kingsley> };
>    callret-0
>    VARCHAR"
>
> That looks a bit different.  Should I make all the SPARUL requests
> look like that?
>
> BTW, I am using LDspider open source v.1.1 and open source Virtuoso
> 6.0.1.
>
> Thanks again for the help...
>
> Ron

Ron Koron

Nov 21, 2011, 11:30:17 PM
to LDSpider
Robert,

Thanks again for all your help. Your error reporting code has
confirmed my suspicions. I believe the problem is that LDSpider is
outputting blank nodes without the encompassing "<", ">" characters.
Since we last communicated, I have been trying to submit some of the
statements directly to the Virtuoso endpoint from my browser's URL bar,
and I found that there is header information that LDSpider is storing
as blank nodes ("_:header-xxxxxxxxx") in the object position.

I figured that the header information was the provenance data that is
set to true during instantiation of SinkSparul in Main.java. I tried
changing the includeProvenance parameter from "true" to "false" when
instantiating, but it still failed for the first two batches of 200
statements. However, the last batch of 197 statements loaded fine.
Yippee! With your help I have found that there are other statements
that have blank nodes for the object. The following is a sample error
message and the corresponding statement from the quad output file:

CONSOLE

WARNING: SPARQL/Update query on http://localhost:8890/sparql failed.
Error Message: 'Virtuoso 37000 Error SP031: SPARQL compiler: Blank node
'_:httpx3Ax2Fx2Fharthx2Eorgx2Fandreasx2Ffoafx2Erdfxxbnode2' is not
allowed in a constant clause
SPARQL query:
define sql:big-data-const 0
INSERT DATA INTO <http://harth.org/andreas/foaf.rdf>
{_:httpx3Ax2Fx2Fharthx2Eorgx2Fandreasx2Ffoafx2Erdfxxbnode2
<http://www.w3.org/2002/07/owl#sameAs>
<http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=pinheiro> .

QUAD FILE

<http://harth.org/andreas/foaf#ah> <http://xmlns.com/foaf/0.1/knows>
_:httpx3Ax2Fx2Fharthx2Eorgx2Fandreasx2Ffoafx2Erdfxxbnode0
<http://harth.org/andreas/foaf.rdf> .
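
To see in advance which statements in a quad file are likely to trigger this error, a small Python sketch like the following could scan each line for blank node labels. It is a naive illustration under stated assumptions: `blank_nodes` is a hypothetical helper, it assumes one statement per line, and it is not a full N-Quads parser.

```python
import re

# Matches a double-quoted literal, including escaped quotes inside it.
_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')

def blank_nodes(statement):
    """Return the blank node labels found in one N-Triples/N-Quads line."""
    # Blank out literals first so text such as "_:x" inside a string is ignored.
    without_literals = _LITERAL.sub('""', statement)
    return [tok for tok in without_literals.split() if tok.startswith("_:")]

quad = (
    "<http://harth.org/andreas/foaf#ah> <http://xmlns.com/foaf/0.1/knows> "
    "_:httpx3Ax2Fx2Fharthx2Eorgx2Fandreasx2Ffoafx2Erdfxxbnode0 "
    "<http://harth.org/andreas/foaf.rdf> ."
)
print(blank_nodes(quad))
```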

Note the following:

[1] Version 06.01.3127-pthreads for Darwin as of Nov 13 2011 (MacOS X)
[2] Seed URI: http://harth.org/andreas/foaf.rdf
[3] I re-installed Virtuoso from scratch; all I did was give the sparql
account "sparql update" group privileges as per the Virtuoso
documentation.
[4] I have been able to query the <http://harth.org/andreas/foaf.rdf>
graph from SQL.

Thanks again for all your help. I will try to see if I can figure out
where to modify the code to add "<",">" to N3 nodes.

Later,

Ron

Robert Isele

Nov 25, 2011, 5:14:02 AM
to ldsp...@googlegroups.com
Hi Ron,

As I understand the SPARQL specification [1], blank nodes don't have
to be enclosed in angled brackets. I'm also happy to test the exact
crawling job you are executing. Could you give me the exact parameters
and seed list of the crawl job that doesn't work for you, so I can
reproduce it locally?

Cheers,
Robert

[1] http://www.w3.org/TR/rdf-sparql-query/#BlankNodesInResults

Ron Koron

Nov 29, 2011, 12:02:05 AM
to LDSpider
Robert,

Here are the parameters that I am using:

-c 1 -s seed.txt -oe http://localhost:8890/sparql

seed.txt:

http://harth.org/andreas/foaf.rdf

My understanding was that blank nodes don't have angled brackets, but
it does seem that the Virtuoso endpoint does not accept them without
the brackets. If I manually put in brackets and use the endpoint URL,
it works. However, I could not find anything in the Virtuoso
documentation that specified that blank nodes need angled brackets.

Cheers,
Ron

Robert Isele

Dec 2, 2011, 7:23:34 AM
to ldsp...@googlegroups.com
Hi Ron,

I guess if you enclose blank nodes in angled brackets, Virtuoso just
interprets it as a URI instead of a blank node. So if we change this
in LDSpider this will also change the semantics of the written data as
two independent documents using the same blank node identifier
suddenly use the same URI.
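
The concern can be made concrete with a small Python sketch. It assumes, as guessed above, that a bracketed label is treated as a plain URI; `count_nodes` and the example document URIs are hypothetical. Two documents that both use the label `_:b0` denote two different nodes as blank nodes, but collapse into a single node once the label is wrapped in brackets.

```python
def count_nodes(occurrences):
    """Count distinct nodes under both readings of the same labels.

    occurrences is a list of (document_uri, blank_node_label) pairs.
    """
    # As blank nodes, a label is only meaningful within its own document.
    as_blank_nodes = {(doc, label) for doc, label in occurrences}
    # Wrapped in brackets, the label is read as a URI shared by all documents.
    as_uris = {f"<{label}>" for _, label in occurrences}
    return len(as_blank_nodes), len(as_uris)

occurrences = [
    ("http://example.org/a.rdf", "_:b0"),
    ("http://example.org/b.rdf", "_:b0"),
]
print(count_nodes(occurrences))
```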

There might be a general problem with the blank node support in
SPARQL/Update queries in Virtuoso. Did you try to issue a query
containing a blank node manually (that is without enclosing brackets)?

Cheers,
Robert

Ron Koron

Dec 5, 2011, 4:40:02 PM
to LDSpider
Hi Robert,

Yes, I tried it exactly as written by LDSpider. The only way to get
it to work is by encasing the blank nodes in angled brackets.
One interesting note is that when I query the graph using SQL in
Virtuoso, it displays all the nodes in each triple with no angled
brackets. I will try to give a sample when I get home.

Cheers,
Ron

Benedikt Kämpgen

Nov 16, 2012, 4:34:27 AM
to ldsp...@googlegroups.com
Hello,

Any progress on this issue of crawling blank nodes?

When I crawl using the current trunk version of LDSpider with the parameters "-c 1 -s seed.txt -oe http://localhost:8890/sparql", I get the error: "Error Message: 'Virtuoso 37000 Error SP031: SPARQL compiler: Blank node '_:httpx3Ax2Fx2Freferencex2Edatax2Egovx2Eukx2Fdocx2Fmonthx2F0001x2D12xxA0' is not allowed in a constant clause".

Thanks

Benedikt

Florian Kleedorfer

May 7, 2013, 9:25:39 AM
to ldsp...@googlegroups.com
Hi,

I think I found a solution: It seems virtuoso doesn't like 'INSERT DATA INTO [graph] ..' but accepts 'INSERT INTO GRAPH [graph] ..'.

I've forked ldspider on github (as I need to experiment with it), and I've added a branch for the workaround:
https://github.com/researchstudio-sat/ldspider4won/tree/bugfix-virtuoso-blanknodes
here's the diff for SinkSparul.java
https://github.com/researchstudio-sat/ldspider4won/commit/7a777f6612968012256b00a25d1164714dac8769

Caution: the workaround is enabled by default. If it is to be included in ldspider, it would have to be controlled via a command line switch (and an additional constructor argument to SinkSparul).
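
As a rough illustration of such a switch, here is a Python sketch that builds either form of the update string. It is not the Java diff linked above: `build_update` and its `virtuoso_workaround` argument are hypothetical names, and the graph URI and triple are placeholders.

```python
def build_update(graph_uri, triples, virtuoso_workaround=False):
    """Build a SPARQL/Update string for a batch of (s, p, o) term strings."""
    # Only the leading keyword differs between the two variants.
    keyword = "INSERT INTO GRAPH" if virtuoso_workaround else "INSERT DATA INTO"
    body = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
    return f"{keyword} <{graph_uri}> {{\n{body}\n}}"

# Placeholder batch containing one blank node object.
triples = [("<http://example.org/s>", "<http://example.org/p>", "_:b0")]
print(build_update("http://example.org/g", triples))
print(build_update("http://example.org/g", triples, virtuoso_workaround=True))
```

Keeping the default at False preserves the existing behaviour for other triple stores, and the workaround applies only when it is requested explicitly.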

best
Florian


