Re: [stardog-users] Aligning individuals across feeds


Héctor Pérez-Urbina

Jun 10, 2013, 12:07:54 PM
to stardog
Dear Ben,

This is indeed the right direction as far as OWL is concerned. Note, however, that key axioms produce owl:sameAs assertions, which are only meaningful if the reasoner supports equality reasoning.

Stardog currently doesn't support equality reasoning. Moreover, this particular type of reasoning is known to be problematic in terms of performance. I would recommend a different approach: I would take the value of the has_longStemName property and deterministically create the URI of the corresponding individual from it. In this way, individuals with the same value (coming from different sources) would share the same URI and no equality reasoning would be necessary.
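
A minimal sketch of the deterministic-URI idea, not a definitive implementation. The base namespace `http://example.org/product/` and the normalization step (trim, collapse whitespace, lowercase) are assumptions that do not come from this thread; if the feeds must match on the exact string, drop the normalization.

```python
import hashlib
import re

# Hypothetical namespace for canonical product individuals (an assumption, not from the thread).
BASE = "http://example.org/product/"

def normalize(long_stem_name: str) -> str:
    """Trim, collapse internal whitespace, and lowercase the key value (assumed policy)."""
    return re.sub(r"\s+", " ", long_stem_name.strip()).lower()

def product_uri(long_stem_name: str) -> str:
    """Derive the URI from the has_longStemName value alone, so every feed mints the same one."""
    digest = hashlib.sha1(normalize(long_stem_name).encode("utf-8")).hexdigest()
    return BASE + digest

# The same product arriving from two feeds with slightly different spelling.
uri_from_spec_feed = product_uri("Acme  TV 42X")
uri_from_price_feed = product_uri("acme tv 42x ")
```

Because the URI depends only on the key value, loading each feed with `product_uri` applied to its subjects merges the individuals at load time, and no owl:sameAs assertions are needed.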


On Fri, Jun 7, 2013 at 11:56 AM, Ben Whittam Smith <bened...@gmail.com> wrote:
I'm trying to align individual products across three data feeds. I'm looking for some advice on how best to do this.

One feed provides specification data on products; one provides test data; and the third provides pricing data. 

For each feed I've created a property on product called 'has_longStemName'. If the long stem name of a product in one feed matches that in another feed then they are the same product.

I'd like to use this property as a 'canonical ID' to enable me to:

1. Query across the products as if they were one data set;

2. Avoid duplicates when listing products (by grouping on long stem name).

I've assumed I can do this by:

1. Stating the product classes to be identical across feeds so:

<rdf:Description rdf:about="http://www.a.org/tvs2#Product">
        <owl:equivalentClass rdf:resource="http://www.b.org/tvr2#Product"/>
        <owl:equivalentClass rdf:resource="http://www.c.org/tvt2#Product"/>
</rdf:Description>

2. Stating for each feed that the property 'has_longStemName' can be used as a key. Here's the example from one feed:

<owl:Class rdf:about="&tvt2;Product">
        <owl:hasKey rdf:parseType="Collection">
            <rdf:Description rdf:about="http://www.c.org/tvt2#has_longStemName"/>
        </owl:hasKey>
</owl:Class>

3. And then stating that the property is identical across feeds so:

<rdf:Description rdf:about="http://www.a.org/tvs2#has_longStemName">
        <owl:equivalentProperty rdf:resource="http://www.b.org/tvr2#has_longStemName"/>
        <owl:equivalentProperty rdf:resource="http://www.c.org/tvt2#has_longStemName"/>
</rdf:Description>

Am I going in the right direction? Is there a better way of doing this?

I'd much appreciate some advice.

--
Best,
Héctor

Ben Whittam Smith

Jun 11, 2013, 5:07:24 PM
to sta...@clarkparsia.com
Hi Hector,

Many thanks for the response. Your solution meets my requirements and is performant. But I feared you would advise against equality reasoning!

Although what you suggest fits well with my data, it doesn't seem too hard to imagine scenarios where deterministically creating the URI will not match individuals across data sets.

Imagine one feed has two keys: has_longStemName and has_EAN. Each key can then be matched to a different data feed, one on has_longStemName and the other on has_EAN, thus linking individuals across three feeds with two keys. Wouldn't I then be forced to rely on equality reasoning? Or am I missing something?

Ben

Héctor Pérez-Urbina

Jun 12, 2013, 11:31:44 AM
to stardog
Dear Ben,

It's true; in some scenarios my solution would not work. In such cases, we might need to deal with equality in some other way. 

This does not mean, however, that we would be *forced* to use equality reasoning. One can imagine having some sort of ad hoc preprocessing to do entity resolution, or even managing the equality relation by some other means and simply asserting it explicitly in the ontology. I'm not saying this is necessarily better; my point is simply that using a reasoner is not the only solution.
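
One way such ad hoc preprocessing could look for the two-key scenario is a union-find pass over the records: any two individuals sharing a key value are merged, and the merge is transitive. This is only a sketch under stated assumptions. The individual URIs (`#p1`, `#p7`, `#p9`, `#p10`), the key values, and the use of bare property names instead of per-feed property URIs are all hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def resolve(records):
    """Map each URI to a representative; URIs sharing any (property, value) key are merged."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # keep the lexicographically smaller URI as the root
                ra, rb = rb, ra
            parent[rb] = ra

    first_seen = {}  # (property, value) -> first URI seen with that key value
    for uri, keys in records:
        find(uri)  # register the URI even if it never matches anything
        for key in keys.items():
            if key in first_seen:
                union(uri, first_seen[key])
            else:
                first_seen[key] = uri
    return {uri: find(uri) for uri in parent}

def same_as_triples(canonical):
    """Explicit owl:sameAs assertions that could be added to the data."""
    return [(u, OWL_SAME_AS, rep) for u, rep in sorted(canonical.items()) if u != rep]

# Hypothetical individuals: feed A carries both keys, B and C carry one each.
A = "http://www.a.org/tvs2#p1"
B = "http://www.b.org/tvr2#p7"
C = "http://www.c.org/tvt2#p9"
D = "http://www.c.org/tvt2#p10"
records = [
    (A, {"has_longStemName": "Acme TV 42X", "has_EAN": "0000000000017"}),
    (B, {"has_longStemName": "Acme TV 42X"}),
    (C, {"has_EAN": "0000000000017"}),
    (D, {"has_EAN": "0000000000024"}),
]
canonical = resolve(records)
triples = same_as_triples(canonical)
```

Here B and C share no key with each other but both end up in A's group, which is the case a single deterministic URI per key cannot cover. The resulting owl:sameAs triples could be asserted explicitly, or the mapping could be used to rewrite URIs before loading.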

That being said, equality reasoning is something we have talked about and we might address it in a future release.

Kendall Clark

Jun 12, 2013, 11:34:44 AM
to stardog
To take Hector's point a bit further, we've talked about integrating https://code.google.com/p/duke/ into Stardog in some way... In many cases what you want to do for entity resolution is something more statistical and then represent the outcomes of that process in OWL in some logically crisp way.

But you could always pass data through Duke as a preprocess step before putting it into Stardog. In that case you'd just "smush" all the properties into canonical individuals and not have to represent the equality explicitly. That might be better, depending on yr use cases.
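
A rough sketch of the "smushing" step, assuming a mapping from each feed-specific URI to a canonical URI is already available from whatever entity-resolution process was used. The mapping, the canonical URI, the individual URIs, and the property names below are all hypothetical, and nothing here reflects Duke's actual output format.

```python
def smush(triples, canonical):
    """Rewrite subjects and objects found in the mapping to their canonical URI, dropping duplicates."""
    def canon(term):
        # Terms that are not in the mapping (literals, properties' values, classes) pass through unchanged.
        return canonical.get(term, term)
    return sorted({(canon(s), p, canon(o)) for s, p, o in triples})

# Hypothetical canonical individual and feed-specific individuals.
CANON = "http://example.org/product/acme-tv-42x"
SPEC = "http://www.a.org/tvs2#p1"
TEST = "http://www.b.org/tvr2#p7"
PRICE = "http://www.c.org/tvt2#p9"
mapping = {SPEC: CANON, TEST: CANON, PRICE: CANON}

feed_triples = [
    (SPEC, "has_longStemName", "Acme TV 42X"),
    (TEST, "has_longStemName", "Acme TV 42X"),  # collapses into the triple above
    (SPEC, "has_screenSize", "42"),
    (TEST, "has_testScore", "71"),
    (PRICE, "has_price", "399.00"),
]
merged = smush(feed_triples, mapping)
```

After this pass every property hangs off one individual per product, so queries across feeds and duplicate-free listings work without any equality statements in the store.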

Cheers,
Kendall

benedict.wh...@which.co.uk

Jul 10, 2013, 6:32:16 PM
to sta...@clarkparsia.com
Thanks guys. Duke does look an option for murky data.