Excel Import Question - Disambiguation


Fan Li

Aug 11, 2020, 10:57:50 AM
to TopBraid Suite Users
Suppose I am dealing with financial data related to two retail companies "W" and "T" and each of them has a set of departments.

"W":
  • Apparel
  • Electronics
  • Sports
"T":
  • Clothes
  • Electronics
  • Sports & Recreation
As you see, some departments from different companies may have the same name.

I have modeled the companies and the departments as instances of Organizations and OrganizationalUnits and their relations similar to the W3C Organization Ontology (the specific ontology choice probably doesn't matter here).

Now I have the following financial data to import into EDG:
[Attached image: Annotation 2020-08-11 105105.png]
I am using the "pattern-based" import strategy and trying to match the company and department entities based on their label. The challenge I am facing is that some department names are ambiguous (e.g. "Electronics") and EDG would, understandably, get it wrong sometimes.

What is the best practice for solving this problem? Thank you!



Irene Polikoff

Aug 11, 2020, 11:04:20 AM
to topbrai...@googlegroups.com
A property you use to find matches for related resources must have unique values across all instances of the related class (e.g., Department) in the asset collection you are importing into.

If no property is unique, then property-based matching will not work. Instead, you need to populate the Department column with the actual URIs of these resources.
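
To make the uniqueness requirement concrete, here is a minimal Python sketch (not EDG code) that checks whether department labels are unique across both companies. The department names come from the question; the URIs and the `ambiguous_labels` helper are hypothetical and only serve the illustration.

```python
from collections import Counter

# Hypothetical department records: (URI, company, label).
# The labels are from the thread; the URIs are invented for this sketch.
departments = [
    ("urn:ex:W-Apparel", "W", "Apparel"),
    ("urn:ex:W-Electronics", "W", "Electronics"),
    ("urn:ex:W-Sports", "W", "Sports"),
    ("urn:ex:T-Clothes", "T", "Clothes"),
    ("urn:ex:T-Electronics", "T", "Electronics"),
    ("urn:ex:T-SportsRecreation", "T", "Sports & Recreation"),
]


def ambiguous_labels(records):
    """Return the labels carried by more than one department, sorted."""
    counts = Counter(label for _uri, _company, label in records)
    return sorted(label for label, n in counts.items() if n > 1)


ambiguous = ambiguous_labels(departments)
print(ambiguous)  # → ['Electronics']
```

Any label reported here cannot be used on its own for matching, which is why the label-based match goes wrong for "Electronics".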

Hope this helps,

Irene


Fan Li

Aug 11, 2020, 12:05:50 PM
to TopBraid Suite Users
Hi Irene. Thanks for the confirmation. I was wondering whether I should create a new property for Department and use a SHACL rule to populate it with the concatenated company and department names. I could then use this new property for matching purposes only. Do you think that would work?

Irene Polikoff

Aug 11, 2020, 12:32:14 PM
to topbrai...@googlegroups.com
For this to work, the spreadsheet will need to have a column with these concatenated values.

As for the data in EDG, you can use a SHACL rule to “infer” property values by concatenating the company name with the department name. While the rule could be defined as a property value rule that infers a value on demand, I don’t know if such dynamic inference would work with the import matching logic. You could confirm it by trying, but I suspect that for the match to work, you will need to execute the rules and store the results prior to running the import. Use the Transform tab to do this.
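
As a rough illustration of the concatenation idea (plain Python, not a SHACL rule and not EDG's import logic), the same key function has to be applied on both sides: to the stored departments and to the spreadsheet rows. The `match_key` helper, the `" | "` separator, and the column names `Company`, `Department`, and `DepartmentKey` are assumptions made for this sketch.

```python
def match_key(company: str, department: str) -> str:
    """Concatenate company and department names into one matching value."""
    return f"{company.strip()} | {department.strip()}"


# (company, department) pairs as they would exist in the asset collection.
stored_pairs = [
    ("W", "Apparel"), ("W", "Electronics"), ("W", "Sports"),
    ("T", "Clothes"), ("T", "Electronics"), ("T", "Sports & Recreation"),
]
# These are the values that would be stored on each department beforehand.
stored_keys = [match_key(c, d) for c, d in stored_pairs]

# Hypothetical spreadsheet rows: each one gets an extra column with the same key.
rows = [
    {"Company": "W", "Department": "Electronics"},
    {"Company": "T", "Department": "Electronics"},
]
for row in rows:
    row["DepartmentKey"] = match_key(row["Company"], row["Department"])

# The concatenated value is unique even though "Electronics" alone is not.
assert len(set(stored_keys)) == len(stored_keys)
print([row["DepartmentKey"] for row in rows])
```

The separator should be a string that cannot appear in a company or department name, otherwise two different pairs could still produce the same key.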

Holger Knublauch

Aug 16, 2020, 7:45:03 PM
to topbrai...@googlegroups.com

The out-of-the-box spreadsheet importer of EDG will not use inferred values for matching (doing so might also be slow). As Irene said, you would first need to materialize the inferences.

If you are somewhat familiar with JavaScript, then version 6.4 offers more flexibility for spreadsheet importing, including the ability to use inferred values and to build helper data structures for efficient matching. See

http://www.datashapes.org/active/import.html

and in particular the demo video linked from that page

https://www.youtube.com/watch?v=Dn7O8siZpTc&feature=youtu.be
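
The "helper data structure" idea can be sketched independently of EDG. The following is plain Python rather than the JavaScript API described on the page above, and the URIs, column names, and function names are all hypothetical: it builds a lookup table keyed by (company, department) once, then resolves each row to a department URI with a single dictionary lookup.

```python
# Hypothetical department records: (URI, company, label).
departments = [
    ("urn:ex:W-Apparel", "W", "Apparel"),
    ("urn:ex:W-Electronics", "W", "Electronics"),
    ("urn:ex:W-Sports", "W", "Sports"),
    ("urn:ex:T-Clothes", "T", "Clothes"),
    ("urn:ex:T-Electronics", "T", "Electronics"),
    ("urn:ex:T-SportsRecreation", "T", "Sports & Recreation"),
]


def build_index(records):
    """Map (company, label) to URI, failing loudly if a key is not unique."""
    index = {}
    for uri, company, label in records:
        key = (company, label)
        if key in index:
            raise ValueError(f"duplicate matching key: {key}")
        index[key] = uri
    return index


def resolve(rows, index):
    """Return the department URI for each row, or None when nothing matches."""
    return [index.get((row["Company"], row["Department"])) for row in rows]


index = build_index(departments)
rows = [
    {"Company": "W", "Department": "Electronics"},
    {"Company": "T", "Department": "Electronics"},
    {"Company": "T", "Department": "Apparel"},  # "T" has no such department
]
resolved = resolve(rows, index)
print(resolved)
# → ['urn:ex:W-Electronics', 'urn:ex:T-Electronics', None]
```

Building the index once keeps the per-row cost constant, and the `None` entries show which rows would need manual attention before import.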

Holger

