Importing spreadsheet and creating consistent URI/ID

43 views
Skip to first unread message

Kevin Guest

unread,
Apr 2, 2020, 5:57:47 PM4/2/20
to topbrai...@googlegroups.com
Hello TopBraid community,

I am relatively new to this tool, and am using it mostly for a glossary of terms and data warehouse metadata related to K-12 education analytics and reporting.  I am basically a solo librarian in a government office in British Columbia, Canada.
 

I don't understand how to have the ID/URI created for each glossary entry when I import using Excel.


I imported a set of terms successfully, but was leery about the URI part because it didn't have the namespace or term in the string. It looked like it could be overwritten by the next import, and it was.


For example, the glossary term Workforce Growth was imported from an Excel spreadsheet and it was assigned the URI urn:x-tb:evnimport#Row-0. There was no ID to toggle to. I was hoping it would look like this string below, as for the term Teacher:
URI https://educ.gov.bc.ca/glossary/business/Teacher
and
ID edw_business_glossary:Teacher


When I imported a new term from another Excel spreadsheet, that term was also assigned the URI urn:x-tb:evnimport#Row-0 and of course I was alerted to the error as it was adding the new data to the properties of the term Workforce Growth that had the same URI. So I deleted all the new data from that asset.


I had a suggestion from the TQ help desk that I can explore in our DEV environment, but thought I could get some insight from peers here.


Suggestion:

 

As a suggestion, you should take a look at how you have the URI construction rules set up for your collection (present when creating the collection; and also on the Manage tab for the collection), as well as the Unique Identifiers section in the spreadsheet importer.

In particular the URIs you have listed seem like you just need to set your URI constructions rules to reflect how you want your data organized.


Thanks,

Kevin



Irene Polikoff

unread,
Apr 2, 2020, 7:55:57 PM4/2/20
to topbrai...@googlegroups.com
Hi Kevin,

You should go to the Manage tab and set the URI construction method. For example, you can set it to “label”. I suspect that you have it set to “default” which may cause this issue on Import. 


With the “default”, when you edit in EDG, URIs are constructed using the label, but the spreadsheet Import dialog for the No Hierarchy pattern, does not offer you the option you need to control URI creation on import. This is inconsistent and we will address it in the next release.

If you make this change on the Manage tab, you would then be able to select a spreadsheet column to be used by EDG to generate a URI on import. 

For example, let’s say you have a column called “Term Name”. And in that column you “Teacher”.

The Unique Identifiers section of the import page will let you you pick values in the Term Name as the basis for the URIs. Just use the "Id column#1" drop down to select the column that contains the label e.g., Term Name - as shown below.

The section also lets you change the namespace used for the start of the URI. EDG will pre-populate the “Start of URIs’ field with the default namespace of your glossary. You could over-ride the default pre-population, but it sounds like you do not want to over-ride. If so, just keep it as is. 


This selection will ensure that new information for the same term is always added to existing information - because the URI will be generated based on the label/name column you select. You will get a URI http://itil.example.org/glossaries/Teacher for the row that has Teacher in the Term Name column.

This selection is independent from whatever selection you make in the Column Mapping section.

If the URI construction is not label based, EDG offers you another option to match to existing resource.

For example, let’s say it is UUID based. And let’s say a term for Teacher (with label = Teacher) was already in the glossary and had a URI such as  http://itil.example.org/glossaries/76eb499a-2c3e-11b2-8086-aa19931194c8. And you wanted to add some info to it e.g., definition.

You would then, on Import be able to select a column to use to match the imported data to already existing resources. 

In the Column Mapping section, you would need to map the "Term Name" column to the label property. 

The Unique identifiers section will look differently - see below. You would pick "Term Name" as the column to match on.



 

This will let EDG know that it should use data in the Term Name column and match it against the labels of already existing resources. If a match is found, EDG would re-use the existing URI. For the new terms that do not yet exist, it will create the URI using the generated unique identifiers.

There is no “Start of the URI” to select in this case. For new terms, the default namespace will be used. For existing ones, the match will determine the start of the URI.


Since you had the issue you are describing and got these “unhelpful” URIs with Row-o, etc., I would recommend clearing the data you imported. Then, selecting the URI construction method and following the approaches described here. This will ensure that you get the consistent URIs that can be used for matching.

I hope this helped.

Regards,

Irene

--
You received this message because you are subscribed to the Google Groups "TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to topbraid-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/topbraid-users/CAPQ9gY-XOxxu%2B_OEM9RTLjQGOLKUOnP2gFay9u%2BTBHj_qHqgZQ%40mail.gmail.com.

Reply all
Reply to author
Forward
0 new messages