query behaviour change from 2.2.[34] -> 3.0

Conrad Leonard

Apr 2, 2015, 8:08:14 AM
to sta...@clarkparsia.com
I have a fairly simple query (no reasoning) that uses a property path to find all ancestors of an entity identified by its type and label:

SELECT
        ?result ?parent
WHERE {
        ?result a :ReadGroup ;
                rdfs:label '''110620_SN0798_0069_BBBBBBBBXX.lane_6.ATCACG'''^^xsd:string ;
                :wasDerivedFrom + ?parent
}

In 2.2.3 and 2.2.4 this correctly returns the following 3 results on my dataset:

+------------------------------------------------+--------------------------------------------------------+
|                     result                     |                         parent                         |
+------------------------------------------------+--------------------------------------------------------+
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | sequencinglibrary:a1fb96a4-4869-4a6e-936b-1ca988b1c098 |
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | collectedsample:600c2634-83dc-42eb-b685-0481f5187200   |
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | donor:8b52f6e9-55b3-4370-a97d-8242fecf23c3             |
+------------------------------------------------+--------------------------------------------------------+

With exactly the same data loaded into a 3.0 server, and using the 3.0 client to query it, I get just 1 result:

+------------------------------------------------+--------------------------------------------------------+
|                     result                     |                         parent                         |
+------------------------------------------------+--------------------------------------------------------+
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | sequencinglibrary:a1fb96a4-4869-4a6e-936b-1ca988b1c098 |
+------------------------------------------------+--------------------------------------------------------+

which is only the immediate ancestor of the :ReadGroup instance.
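
For reference, `+` in a SPARQL 1.1 property path matches one or more steps, so the expected answer is the whole chain of ancestors rather than only the direct parent. A minimal shell sketch of that semantics, assuming a made-up edge list with shortened names in place of the real IRIs (the `edges` variable and the `ancestors` function are illustrative only and have nothing to do with how Stardog evaluates paths):

```shell
#!/bin/sh
# Hypothetical "child parent" pairs standing in for :wasDerivedFrom triples.
edges='readgroup sequencinglibrary
sequencinglibrary collectedsample
collectedsample donor'

# ancestors NODE: print every node reachable from NODE in one or more steps,
# which is what ":wasDerivedFrom + ?parent" asks for.
ancestors() {
    printf '%s\n' "$edges" | awk -v start="$1" '
        { parents[$1] = parents[$1] " " $2 }        # build adjacency lists
        END {
            queue[1] = start; head = 1; tail = 1
            while (head <= tail) {                   # breadth-first walk
                n = split(parents[queue[head]], p, " ")
                head++
                for (i = 1; i <= n; i++)
                    if (!(p[i] in seen)) {
                        seen[p[i]] = 1
                        print p[i]
                        queue[++tail] = p[i]
                    }
            }
        }'
}

ancestors readgroup
```

On this toy chain the walk visits all three ancestors, matching the 3 rows from 2.2.x; returning only the first hop corresponds to the plain `:wasDerivedFrom ?parent` pattern without the `+`.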

To investigate further, I pruned the large initial dataset in 3.0 down to just the three entities related by :wasDerivedFrom and, weirdest of all, I get 3 results back! So is it something to do with the presence of the other data? Here is the CLI script I use to reproduce this 100% reliably:

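#!/bin/sh
# Stop the running server, switch to the other Stardog version,
# reload the data into a fresh database and run the query against it.
toggle() {
        /opt/stardog/$CURRENT/bin/stardog-admin server stop
        if [ "$CURRENT" = "$NEW" ]
        then
                CURRENT=$OLD
        else
                CURRENT=$NEW
        fi
        export STARDOG_HOME=./$CURRENT
        # Database name is the version string with '-' and '.' removed
        DBNAME=$(echo "$CURRENT" | tr -d '\-.')
        BINDIR=/opt/stardog/$CURRENT/bin
        "$BINDIR"/stardog-admin server start
        # The drop fails harmlessly if the database does not exist yet
        "$BINDIR"/stardog-admin db drop "$DBNAME"
        "$BINDIR"/stardog-admin db create -n "$DBNAME" "$DATA"
        echo "querying $CURRENT"
        "$BINDIR"/stardog query "$DBNAME" "
        SELECT
                ?result ?parent
        WHERE {
                ?result a :ReadGroup ;
                        rdfs:label '''110620_SN0798_0069_BBBBBBBBXX.lane_6.ATCACG'''^^xsd:string ;
                        :wasDerivedFrom + ?parent
        }"
}

NEW=stardog-3.0
OLD=stardog-2.2.3
CURRENT=$OLD
DATA=dump.ttl
# Starting from OLD, the first call queries NEW and the second queries OLD
toggle
toggle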


I can provide an obfuscated version of dump.ttl if you wish.

On an unrelated note, I had trouble running the query against the obfuscated data, getting the error:
com.complexible.common.rdf.query.parser.sparql.ast.ASTQName cannot be cast to com.complexible.common.rdf.query.parser.sparql.ast.ASTIRI


Fernando Hernandez

Apr 2, 2015, 10:07:13 AM
to sta...@clarkparsia.com
Hi Conrad,

Can you send us both obfuscated and non-obfuscated versions of the data? We'd like to look into both issues.

You can share it privately if you prefer.

Cheers,
Fernando

Conrad Leonard

Apr 4, 2015, 9:10:03 AM
to sta...@clarkparsia.com
Hi Fernando;

I'm happy to send the data if you give me an off-list means for doing so.

I've done some more testing and can show the following:

[conradL@slithytove ~]$ cat query.sparql 
SELECT
        ?result ?parent
WHERE {
        ?result a :ReadGroup ;
                rdfs:label '''110620_SN0798_0069_BBBBBBBBXX.lane_6.ATCACG'''^^xsd:string ;
                :wasDerivedFrom + ?parent
}
[conradL@slithytove ~]$ stardog-admin db drop test > /dev/null 2>&1; stardog-admin db create -n test 1.2-10433.ttl > /dev/null 2>&1; stardog query test query.sparql 2>/dev/null
+------------------------------------------------+--------------------------------------------------------+
|                     result                     |                         parent                         |
+------------------------------------------------+--------------------------------------------------------+
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | sequencinglibrary:a1fb96a4-4869-4a6e-936b-1ca988b1c098 |
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | collectedsample:600c2634-83dc-42eb-b685-0481f5187200   |
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | donor:8b52f6e9-55b3-4370-a97d-8242fecf23c3             |
+------------------------------------------------+--------------------------------------------------------+

Query returned 3 results in 00:00:00.072
[conradL@slithytove ~]$ stardog-admin db drop test > /dev/null 2>&1; stardog-admin db create -n test 1.2-10434.ttl > /dev/null 2>&1; stardog query test query.sparql 2>/dev/null
+------------------------------------------------+--------------------------------------------------------+
|                     result                     |                         parent                         |
+------------------------------------------------+--------------------------------------------------------+
| readgroup:9ade3f7f-833c-4fd1-b1bc-c702eb55b755 | sequencinglibrary:a1fb96a4-4869-4a6e-936b-1ca988b1c098 |
+------------------------------------------------+--------------------------------------------------------+

Query returned 1 results in 00:00:00.096
[conradL@slithytove ~]$ diff 1.2-10433.ttl 1.2-10434.ttl
10432a10433
> :wasDerivedFrom sequencinglibrary:d250259c-3da7-4d27-acd1-79926e2accf1 ;


In summary, I can create a dataset '1.2-10433.ttl' of about 9k triples on which the query performs as expected, and a second dataset with just a single additional triple (not involved in the query at all) on which the query does not work as expected. I don't think there's anything special about this triple, though, because adding it to the minimal dataset of four entities connected by :wasDerivedFrom still gives a dataset on which the query performs correctly. My best guess is that some indexing hash collision isn't being handled properly.
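
One way to narrow down which added data flips the result is a binary search over the dataset size. The sketch below is only an illustration under stated assumptions: `first_bad` and `returns_three_rows` are hypothetical helpers, `prefix.ttl` is a scratch file name, and the Stardog commands are the ones used in the session above. It also assumes the data has one complete statement per line and that the query stays broken once it breaks, neither of which necessarily holds for a real Turtle dump.

```shell
#!/bin/sh
# first_bad LO HI CHECK: print the smallest n in (LO, HI] for which
# "CHECK n" fails, assuming CHECK succeeds at LO, fails at HI and
# never recovers in between.
first_bad() {
    lo=$1
    hi=$2
    check=$3
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "$mid"; then
            lo=$mid     # still good: the break happens later
        else
            hi=$mid     # already bad: the break is here or earlier
        fi
    done
    echo "$hi"
}

# Hypothetical check: succeeds while the query still returns 3 rows
# on the first N lines of dump.ttl (assumes one statement per line).
returns_three_rows() {
    head -n "$1" dump.ttl > prefix.ttl
    stardog-admin db drop test > /dev/null 2>&1
    stardog-admin db create -n test prefix.ttl > /dev/null 2>&1
    stardog query test query.sparql 2>/dev/null | grep -q 'returned 3 results'
}

# Example invocation (left commented out because it needs a running server):
# first_bad 1 "$(wc -l < dump.ttl)" returns_three_rows
```

Keeping the check as a separate function means the search itself can be tried out with any stand-in predicate before pointing it at a real server.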

C

Fernando Hernandez

Apr 4, 2015, 12:36:42 PM
to sta...@clarkparsia.com
Conrad,

On Sat, Apr 4, 2015 at 9:10 AM, Conrad Leonard <conrad....@hotmail.com> wrote:
> Hi Fernando;
>
> I'm happy to send the data if you give me an off-list means for doing so.

I sent you a link to a private folder where you can share the data with us.
We will look into this.

Cheers,
Fernando

Michael Grove

Apr 7, 2015, 2:21:06 PM
to stardog
Conrad,

Thanks for the detailed report. We were able to track down the issue, and the fix will be included in the next release.

Cheers,

Mike
