Using Stardog for the EDG repository

Daniel Lavoie

Jul 30, 2019, 9:02:05 AM
to TopBraid Suite Users
Hi,

We want to use our triple store database as the EDG repository. We are using Stardog version 6.1.0. I have tried to find out how to set up an external triple store as EDG's repository, but without much success.

I have seen the Application data storage section in the documentation, but found nothing that does not refer to either TDB or MarkLogic databases. What about the others, and Stardog specifically?

Could you please provide the installation and configuration steps, or point me to the right documentation?

Thanks
Daniel

Irene Polikoff

Jul 30, 2019, 9:36:54 AM
to topbrai...@googlegroups.com
Hi Daniel,

TopBraid EDG needs to apply rules for permission management, support workflows (virtual sandboxes), maintain audit trails, infer new values, validate data, dynamically combine graphs using owl:imports, generate lineage presentations and perform other types of business logic. It is not practical to implement this functionality using plain SPARQL as the only API to a database.

Thus, for EDG to work on top of multiple triple stores, a different solution would need to be implemented for each one, taking into account its specific capabilities, supported functions, APIs, etc. At the moment we support only two options for the EDG repository: Apache Jena TDB (two flavors) or a triple store based on an RDBMS of your choice. The latter is for smaller deployments.

The current RDF database landscape has many options and no clear market leader. From a roadmap perspective, we are monitoring market developments. If a leader emerges that captures over 50% of the market, we plan to support it as a repository option in EDG.

Today, EDG can offer a solution where data is located in its repository as well as in another triple store. Using EDG Data Platform (replication server - https://www.topquadrant.com/technology/topbraid-data-platform/), changes in the data in EDG will be automatically synchronized (pushed) to an external triple store as micro transactions. 

If changes to data relevant to EDG are also being made directly in the remote triple store, outside of EDG, and real-time replication is important, then the store would need to provide some change services to plug in to the replication framework, or have some other means of updating EDG.


--
You received this message because you are subscribed to the Google Groups "TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to topbraid-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/topbraid-users/f7386d6a-6dde-45bb-8a34-12650d4bab41%40googlegroups.com.

Daniel Lavoie

Jul 31, 2019, 8:40:00 AM
to TopBraid Suite Users
Hi Irene,

Thanks for your fast answer yesterday. It helped me clarify the situation, and I would like to explain in more detail what we are trying to do.

The overall goal is to make our design and governance environment (TBC and EDG) and our implementation environment (Stardog and Semaphore) communicate with each other. First, let's talk about going from EDG to Stardog.

For example, we want to propagate the ontologies finalized in EDG to our Stardog triple store, and we want to use Git in the middle to version-control the ontologies and package them with the other objects in releases. In summary, we want to go from EDG to Git and finally to Stardog.

In EDG, there is an Export function that allows us to get a Turtle file for each ontology. Is there an API we can use to execute the export function? Or is there a way to export directly to Git?

Also, we are going to use named graphs to store the ontologies in Stardog. We are not going to create a different graph for each ontology; instead, we will group them logically. Is there a way to indicate in EDG which named graph an ontology should be loaded into in the target triple store?

Second, the other way around, we would like to make Stardog data viewable in EDG. For example, to allow our governance team to profile the information. Should we use data graphs for that? Or what else would you suggest?

Thanks


Irene Polikoff

Jul 31, 2019, 9:29:33 AM
to topbrai...@googlegroups.com
Please see responses below

On Jul 31, 2019, at 8:40 AM, Daniel Lavoie <daniel.j...@gmail.com> wrote:

> Hi Irene,
>
> Thanks for your fast answer yesterday. It helped me clarify the situation, and I would like to explain in more detail what we are trying to do.
>
> The overall goal is to make our design and governance environment (TBC and EDG) and our implementation environment (Stardog and Semaphore) communicate with each other.

OK. As an aside, TopBraid EDG offers a module for content tagging: https://www.topquadrant.com/products/topbraid-tagger-autoclassifier/

> First, let's talk about going from EDG to Stardog.
>
> For example, we want to propagate the ontologies finalized in EDG to our Stardog triple store, and we want to use Git in the middle to version-control the ontologies and package them with the other objects in releases. In summary, we want to go from EDG to Git and finally to Stardog.
>
> In EDG, there is an Export function that allows us to get a Turtle file for each ontology. Is there an API we can use to execute the export function? Or is there a way to export directly to Git?

Pretty much everything you can do in EDG UI is available as a service. When you click on Export -> Turtle, note the URI in the browser. It will be something like:


This is the API you can use.

The beginning is the server address; I used localhost.

Next come the URIs of the asset collection (graph) you are exporting, as the values of the base and projectGraph parameters. I used the example geo ontology that is available in the samples project.
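
As a rough illustration of calling the export service from a script, the sketch below takes an export URL copied from the browser, swaps in the URI of another asset collection for the base and projectGraph parameters, and downloads the Turtle. The example URL, its path, its port and its format parameter are hypothetical placeholders (the real URL is whatever your browser shows after Export -> Turtle), and authentication is not handled.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def retarget_export_url(export_url: str, graph_uri: str) -> str:
    """Return export_url with base and projectGraph pointing at graph_uri.

    All other query parameters are kept as they were copied from the browser.
    """
    parts = urlsplit(export_url)
    params = [
        (key, graph_uri if key in ("base", "projectGraph") else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


def download_turtle(export_url: str, target_path: str) -> None:
    """Fetch the export and write it to a file (no authentication handled)."""
    with urlopen(export_url) as response, open(target_path, "wb") as out:
        out.write(response.read())


# Hypothetical URL standing in for the one copied from the browser.
copied = (
    "http://localhost:8080/hypothetical/export"
    "?format=turtle&base=urn%3Ax-example%3Ageo&projectGraph=urn%3Ax-example%3Ageo"
)
print(retarget_export_url(copied, "urn:x-example:other"))
```

Writing the downloaded file into a Git working tree (for example with download_turtle(url, "ontologies/geo.ttl")) and committing it would cover the EDG-to-Git step.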


> Also, we are going to use named graphs to store the ontologies in Stardog. We are not going to create a different graph for each ontology; instead, we will group them logically. Is there a way to indicate in EDG which named graph an ontology should be loaded into in the target triple store?

I would create a new property to capture this for an ontology in EDG.
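
One way such a property could be used downstream, sketched under assumptions: once the value of the custom property says which named graph an ontology belongs to (represented here by a plain dictionary, since the property name is up to you), the exported Turtle can be sent to that graph with the standard SPARQL 1.1 Graph Store HTTP Protocol. The endpoint URL, database name and graph URIs are hypothetical, and whether your Stardog version exposes the protocol at such an address should be checked against its documentation. POST merges the payload into the named graph, which suits several ontologies sharing one graph.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Stand-in for the custom EDG property: ontology graph -> target named graph.
# All URIs here are hypothetical.
TARGET_GRAPH = {
    "urn:x-example:geo": "urn:x-example:graphs:reference-ontologies",
    "urn:x-example:org": "urn:x-example:graphs:reference-ontologies",
}


def build_load_request(endpoint: str, ontology: str, turtle: bytes) -> Request:
    """Build a Graph Store Protocol POST that merges turtle into the target graph."""
    target = TARGET_GRAPH[ontology]
    url = f"{endpoint}?{urlencode({'graph': target})}"
    return Request(
        url,
        data=turtle,
        method="POST",
        headers={"Content-Type": "text/turtle"},
    )


# Hypothetical graph store endpoint; the request is built here but not sent.
request = build_load_request(
    "http://localhost:5820/mydb",
    "urn:x-example:geo",
    b"<urn:x-example:a> <urn:x-example:b> <urn:x-example:c> .",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; credentials would need to be added first.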


> Second, the other way around, we would like to make Stardog data viewable in EDG. For example, to allow our governance team to profile the information. Should we use data graphs for that? Or what else would you suggest?

If the data is in SKOS and follows the assumptions EDG makes about the use of SKOS, then you could use Taxonomies. The assumptions are: 1. there is a concept scheme and it has some top concepts; 2. skos:broader (as opposed to skos:narrower) is used for the hierarchical relationship.

If the data is reference data and follows the assumptions EDG makes about reference data, then you could use Reference Datasets. The assumption is that each resource has a locally unique property (a code) and that the URI of the resource is derived from its value.

Otherwise, Data Graphs is the most general type of asset collection that can be used for arbitrary data.
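
To decide between these collection types for a given dataset, the two SKOS assumptions listed above can be pre-checked roughly as follows. This is only a sketch over plain (subject, predicate, object) string triples: how EDG itself detects top concepts is not stated in this thread, so accepting either skos:hasTopConcept or skos:topConceptOf is an assumption, and the example URIs are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def check_skos_assumptions(triples):
    """Return a list of problems found in an iterable of (s, p, o) string triples."""
    triples = list(triples)
    schemes = {s for s, p, o in triples
               if p == RDF_TYPE and o == SKOS + "ConceptScheme"}
    # Top concepts may be stated from either side of the relationship.
    schemes_with_top = {s for s, p, o in triples if p == SKOS + "hasTopConcept"}
    schemes_with_top |= {o for s, p, o in triples if p == SKOS + "topConceptOf"}
    predicates = {p for _, p, _ in triples}

    problems = []
    if not schemes:
        problems.append("no skos:ConceptScheme found")
    elif not (schemes & schemes_with_top):
        problems.append("no concept scheme has top concepts")
    if SKOS + "narrower" in predicates and SKOS + "broader" not in predicates:
        problems.append("hierarchy uses skos:narrower without skos:broader")
    return problems


# Hypothetical example data.
EX = "urn:x-example:"
good = [
    (EX + "scheme", RDF_TYPE, SKOS + "ConceptScheme"),
    (EX + "scheme", SKOS + "hasTopConcept", EX + "animals"),
    (EX + "cats", SKOS + "broader", EX + "animals"),
]
narrower_only = [
    (EX + "scheme", RDF_TYPE, SKOS + "ConceptScheme"),
    (EX + "animals", SKOS + "narrower", EX + "cats"),
]
print(check_skos_assumptions(good))
print(check_skos_assumptions(narrower_only))
```

Data that fails such a check can still be loaded into a Data Graph.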



Daniel Lavoie

Jul 31, 2019, 3:23:39 PM
to TopBraid Suite Users
Thanks again Irene,

I have tested the export service and it worked just fine. I would just like to clarify one last thing. When I referred to viewing information stored in Stardog, I was thinking of assertions, not schemas. From your answer, I believe we should use Data Graphs for that and nothing else. Right?

Regards
Daniel



Irene Polikoff

Jul 31, 2019, 4:00:42 PM
to topbrai...@googlegroups.com
Hi Daniel,

I assume that by assertions you mean data, as opposed to schema. Correct?

I am clarifying because the traditional use of “assertion” is in the context of “asserted statements vs. inferred statements”, irrespective of whether these statements capture schema-level information or data.

Schemas should be stored in ontologies. Data can be stored in any of the other types of collections (graphs).

EDG displays information based on schemas: they determine what fields to show on the forms and how, what columns are available for tables, what properties can be used as filters for search criteria, etc. So, yes, data should go into one of the other types of collections, but the schema for the data should be available to EDG.
