Multiple working copies in EDG 7.0 - no conflict resolution?


Tim Smith

Feb 18, 2022, 3:51:39 PM
to topbrai...@googlegroups.com
Hi,

I am exploring the robustness of change management within the EDG working copy process.  

When I open two workflows at the same time from the same production graph and make changes that are intended to conflict, I don't see any conflict detection or resolution.

For example:
In WF #1, I delete instance :inst1.  
In WF #2, I add the triple {:inst1 :getsDataFrom :inst2.}

I commit WF #1 first, deleting :inst1 and removing all triples about :inst1.

When I commit WF #2, the {:inst1 :getsDataFrom :inst2 } triple is added but is essentially a "hanging" triple because everything else about :inst1 has been deleted by WF #1.

I was thinking that WF #2 would check to see if the production graph had been changed and surface all the changes between the WF #2 working copy and the current production graph, not the production graph WF #2 originated from.

I also noticed that after committing WF #1, when I ran the "See Changes/Comparison Report" in WF #2, the full URL for :inst1 was displayed instead of the label, indicating that this report is "sort of" running against the new production graph created by committing WF #1 (i.e. the :inst1 rdfs:label triple is gone).  HOWEVER, I see the new triple as the only change even though other changes have occurred to the production graph via WF #1.

Is my understanding of how EDG handles simultaneous workflows correct?  If so, should multiple working copies be permitted?  Can this process be changed so conflicts can be detected and resolved between multiple working copies (more like how GitHub works)?

This is critical functionality to a large use case that I am exploring.

Thanks,

Tim

David Price

Feb 21, 2022, 4:19:25 AM
to topbrai...@googlegroups.com
A working copy cannot “see” another in EDG. The word “copy” is misleading in that it is not a copy at all but is a set of changes to be applied. The UI makes the changes appear as real triples but they are not. So, not really comparable to git at all. 
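
To make the "set of changes" idea concrete, here is a minimal Python sketch of the scenario from the first message. It is only an illustration under stated assumptions, not EDG's implementation: the ChangeSet class, its view and commit methods, and the sample triples are all made up for this example. A graph is modeled as a plain set of (subject, predicate, object) tuples.

```python
from dataclasses import dataclass, field

# Hypothetical model: a graph is a set of (subject, predicate, object) tuples.
Triple = tuple[str, str, str]


@dataclass
class ChangeSet:
    """A 'working copy' modeled as pending additions and deletions only."""
    additions: set[Triple] = field(default_factory=set)
    deletions: set[Triple] = field(default_factory=set)

    def view(self, production: set[Triple]) -> set[Triple]:
        # What a user would see: the current production graph with the
        # pending changes overlaid on top of it.
        return (production - self.deletions) | self.additions

    def commit(self, production: set[Triple]) -> set[Triple]:
        # Committing applies the changes to production as it is right now.
        return self.view(production)


production = {
    (":inst1", "rdf:type", ":System"),
    (":inst1", "rdfs:label", "Instance 1"),
    (":inst2", "rdf:type", ":System"),
}

# WF #1 deletes every triple about :inst1.
wf1 = ChangeSet(deletions={t for t in production if t[0] == ":inst1"})
# WF #2 adds one triple that mentions :inst1.
wf2 = ChangeSet(additions={(":inst1", ":getsDataFrom", ":inst2")})

production = wf1.commit(production)
production = wf2.commit(production)

# Only the newly added triple still has :inst1 as its subject.
about_inst1 = {t for t in production if t[0] == ":inst1"}
print(about_inst1)
```

Because neither change set records the state of production it was created against, nothing in this model can notice that WF #2 refers to a resource WF #1 removed.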

Customers today handle this kind of issue by being reviewers or editors on both workflows and by business process management of work tasks (e.g. daily scrums and using chat tools).

There is work on this topic for the future, so if you have a detailed use case you can share, it would be great to submit a feature request through Support so it can be considered as part of that effort.

Cheers,
David


Tim Smith

Feb 22, 2022, 8:42:09 AM
to topbrai...@googlegroups.com
Hi David,

Thank you for the clarification.  My impression of TeamWorks was different.  This feels like "open loop control" of a closed loop process given that one workflow can overwrite another unless strict human processes are followed.  My use case requires branching and versioning with "github-like" functionality where multiple people/roles can edit the same graph with conflicting edits reconciled at commit time.  We have a need to have versions of both ontologies and instance data with compatible versions linked together.

Since all edits can be controlled through EDG (vs Git where the individual changes have to be detected and reconciled), maybe an extension of the TBC Diff engine might help here?

Anyway, I was hoping to use versioning/diff reconciliation to help justify the cost of EDG.  I do not have a TSM at the moment.  What is the best way to submit a feature request?

Thanks,

Tim

David Price

Feb 22, 2022, 9:57:00 AM
to 'Felix Sasaki' via TopBraid Suite Users
Hi Tim,

A few further opinions below.

On 22 Feb 2022, at 13:41, Tim Smith <smith...@gmail.com> wrote:

Hi David,

Thank you for the clarification.  My impression of TeamWorks was different.  This feels like "open loop control" of a closed loop process given that one workflow can overwrite another unless strict human processes are followed.  My use case requires branching and versioning with "github-like" functionality where multiple people/roles can edit the same graph with conflicting edits reconciled at commit time. 

IMO thinking of ontologies as software artefacts is a good approach. However, assuming that ontologies (RDF graphs) can be versioned and diff'ed using the same approach as, say, Java source code (sequences of lines of text) does not seem to me like a good way to start. In RDF-land, even ontologies are actually data. The definition of “conflicting edits” is also a big one, as things that conflict in RDF-land can be completely unrelated as far as a normal diff tool is concerned.

We have a need to have versions of both ontologies and instance data with compatible versions linked together.

Of course, everyone has that requirement who’s ever used an ontology at all. However, a tool cannot do all the heavy lifting. For example, if your ontology changes to make existing data invalid then some sort of data migration/transformation is needed. 

The simple addition of "<x> sh:minCount 1" in a SHACL property shape can make millions of data instances invalid and no diff tool will find that.

Only good business processes outside anything in EDG, or any other tool, can properly support this in the general case.
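
The sh:minCount point can be illustrated with a small Python sketch. This is a hand-rolled check and not a SHACL engine; the function name min_count_violations and the sample data are assumptions made for the example. The data set is identical in both calls, so a diff of the data finds nothing, yet a one-value change on the constraint side makes two instances invalid.

```python
# Illustrative only: a hand-rolled minimum-cardinality check, not a SHACL engine.
# Graphs are sets of (subject, predicate, object) tuples; all names are made up.

def min_count_violations(data, target_class, path, min_count):
    """Return instances of target_class with fewer than min_count values for path."""
    instances = {s for s, p, o in data if p == "rdf:type" and o == target_class}
    violations = []
    for inst in sorted(instances):
        count = sum(1 for s, p, o in data if s == inst and p == path)
        if count < min_count:
            violations.append(inst)
    return violations


data = {
    (":a", "rdf:type", ":System"),
    (":a", ":owner", ":alice"),
    (":b", "rdf:type", ":System"),
    (":c", "rdf:type", ":System"),
}

# Before the ontology edit there is no minimum, so nothing is invalid.
before = min_count_violations(data, ":System", ":owner", 0)
# After the equivalent of adding "sh:minCount 1": same data, new violations.
after = min_count_violations(data, ":System", ":owner", 1)
print(before, after)
```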


Since all edits can be controlled through EDG (vs Git where the individual changes have to be detected and reconciled), maybe an extension of the TBC Diff engine might help here?

As I mentioned, in a working copy you are not actually changing a graph. A working copy is a layer of changes (literally additions and deletions) to be applied over the production copy, which the UI makes look like normal triples when viewed through the lens that is the working copy. That said, we are looking at potential improvements.


Anyway, I was hoping to use versioning/diff reconciliation to help justify the cost of EDG.  I do not have a TSM at the moment.  What is the best way to submit a feature request?

I’ll talk to you off forum.

Cheers,
David


Tim Smith

Feb 22, 2022, 12:32:49 PM
to topbrai...@googlegroups.com
Hi David,

A few thoughts below.

On Tue, Feb 22, 2022 at 9:57 AM David Price <dpr...@topquadrant.com> wrote:
Hi Tim,

A few further opinions below.

On 22 Feb 2022, at 13:41, Tim Smith <smith...@gmail.com> wrote:

Hi David,

Thank you for the clarification.  My impression of TeamWorks was different.  This feels like "open loop control" of a closed loop process given that one workflow can overwrite another unless strict human processes are followed.  My use case requires branching and versioning with "github-like" functionality where multiple people/roles can edit the same graph with conflicting edits reconciled at commit time. 

IMO thinking of ontologies as software artefacts is a good approach. However, assuming that ontologies (RDF graphs) can be versioned and diff'ed using the same approach as, say, Java source code (sequences of lines of text) does not seem to me like a good way to start. In RDF-land, even ontologies are actually data. The definition of “conflicting edits” is also a big one, as things that conflict in RDF-land can be completely unrelated as far as a normal diff tool is concerned.


I was definitely not thinking of diff as in a textual difference operation.  My reference to Github was intended to be an analogy not a literal interpretation.  I believe the concepts of branching, diff'ing and merging apply both to code and graphs with the implementation being wildly different.  Comparing graphs at the triple level, including handling b-nodes properly was what I was thinking, thus my reference to the TBC Diff engine. 

 
We have a need to have versions of both ontologies and instance data with compatible versions linked together.

Of course, everyone has that requirement who’s ever used an ontology at all. However, a tool cannot do all the heavy lifting. For example, if your ontology changes to make existing data invalid then some sort of data migration/transformation is needed. 

The simple addition of "<x> sh:minCount 1" in a SHACL property shape can make millions of data instances invalid and no diff tool will find that.

Only good business processes outside anything in EDG, or any other tool, can properly support this in the general case.


I wasn't thinking of a tool to manage the transformation of instance data from one ontology to another.  I agree, that would be a real challenge.  I was thinking of simply ensuring an instance graph can know what version of an ontology it was populating.  This requires defining versions of ontologies beyond putting a "v<X.x>" in the name of the ontology graph.  This could be inferred from the import statements, or a specific predicate could be used to point to the defining ontology.  While I'm not looking for this capability here, I am a fan of using a transformation ontology to define the transformations and a SPARQL/SPIN/SHACL engine to execute them.  I have built such an ontology and have used it successfully a number of times.  I believe this was also the strategy behind the SPINMap mapping technology in TBC.
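
One way to read the "inferred from the import statements" idea is sketched below in Python. This is an assumption-laden illustration, not something EDG is stated to do in this thread: it uses the standard OWL terms owl:imports and owl:versionIRI, models graphs as sets of tuples, and the example.org IRIs and the function names are hypothetical.

```python
# Illustrative sketch: graphs are sets of (subject, predicate, object) tuples.
# The example.org IRIs and all function names are hypothetical.

def declared_versions(ontology_graph):
    """Version IRIs the ontology graph declares for itself."""
    return {o for s, p, o in ontology_graph if p == "owl:versionIRI"}


def imported_iris(instance_graph):
    """IRIs the instance graph imports."""
    return {o for s, p, o in instance_graph if p == "owl:imports"}


def populates(instance_graph, ontology_graph):
    """True if the instance graph imports a version IRI this ontology declares."""
    return bool(imported_iris(instance_graph) & declared_versions(ontology_graph))


ontology_v1_2 = {
    ("http://example.org/ont", "rdf:type", "owl:Ontology"),
    ("http://example.org/ont", "owl:versionIRI", "http://example.org/ont/1.2"),
}
ontology_v1_3 = {
    ("http://example.org/ont", "rdf:type", "owl:Ontology"),
    ("http://example.org/ont", "owl:versionIRI", "http://example.org/ont/1.3"),
}
instance_graph = {
    ("http://example.org/data", "owl:imports", "http://example.org/ont/1.2"),
    (":inst1", "rdf:type", ":System"),
}

print(populates(instance_graph, ontology_v1_2))
print(populates(instance_graph, ontology_v1_3))
```

Importing the version IRI rather than the bare ontology IRI pins the instance graph to one release, which is the property being asked for here.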
 

Since all edits can be controlled through EDG (vs Git where the individual changes have to be detected and reconciled), maybe an extension of the TBC Diff engine might help here?

As I mentioned, in a working copy you are not actually changing a graph. A working copy is a layer of changes (literally additions and deletions) to be applied over the production copy, which the UI makes look like normal triples when viewed through the lens that is the working copy. That said, we are looking at potential improvements.


I understand.  Fundamentally, I think of a "working copy" as originating from the production graph at a specific point in time (i.e. a "branch" in the version tree), even if the graph isn't duplicated.

I think the risk in the current design is that the layer of changes is currently defined as anything the user has entered as a change since the working copy was created.  As you said, working copies do not know what changes other working copies have committed to the production graph.  So if you make changes, and I make changes, if I commit first, your changes can overwrite mine and vice versa.  What's also confusing, currently, if this happens, your changes will show up in my working copy (because it's just loading the latest production graph).  BUT, when I run the Comparison Report in my working copy, which explicitly says "Compares this working copy with the production copy", it only shows the changes I have entered even though the production copy has changed since my workflow was initiated.  Maybe I worked with version control systems for too long, LOL???

While this can be managed at the work process level, it would be better to prevent the occurrence within EDG.  For example, when a Working Copy is displaying the changes, it could show all proposed changes (from my working copy) as well as all committed changes that have occurred since the working copy was initiated (EDG should have the change history but maybe there are edge cases?).  Changes that conflict could be highlighted.  Conflict definitions could be defined in an ontology, much like the TBC Diff engine.

From TBC help:

The Diff Engine

TopBraid's Diff engine is entirely declarative and can be modified or extended to get customized output. The various diff classes have SPARQL queries attached to them via diff:rule. These queries are called by the engine to create the instances of this class. The GRAPH keyword of SPARQL is used to query the old graph versus the new graph. If you want to add your own kinds of diff outputs, then simply add such diff:rules.

The second step of the Diff engine is to call all spin:rules for the constructed instances of the diff classes. These rules can be used to post-process the raw output from the first step, e.g. to create human-readable labels. The same mechanism can be used to create higher-level diff objects from the lower-level triple change objects.

You can modify the diff.ttl file in your workspace to adjust the behavior of the diff engine for your needs. 
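
The two steps described in the help text (raw triple changes first, then higher-level diff objects built from them) can be mimicked in a few lines of Python. This is a sketch of the general idea only: the real Diff engine is driven by SPARQL rules as quoted above, while the functions diff_graphs and summarize here are hypothetical, ignore blank nodes entirely, and assume each subject has at most one value per predicate.

```python
# Illustrative sketch of a two-step, triple-level diff. Not the TBC Diff engine.
# Graphs are sets of (subject, predicate, object) tuples; blank nodes are ignored.

def diff_graphs(old, new):
    """Step 1: raw triple-level changes between an old and a new graph."""
    return {"added": new - old, "deleted": old - new}


def summarize(old, new):
    """Step 2: pair a deleted and an added triple that share subject and
    predicate into one 'value changed' record (assumes single-valued properties)."""
    delta = diff_graphs(old, new)
    deleted = {(s, p): o for s, p, o in delta["deleted"]}
    changes = []
    for s, p, o in sorted(delta["added"]):
        if (s, p) in deleted:
            changes.append((s, p, deleted[(s, p)], o))
    return changes


old = {
    (":inst1", "rdf:type", ":System"),
    (":inst1", "rdfs:label", "Instance 1"),
}
new = {
    (":inst1", "rdf:type", ":System"),
    (":inst1", "rdfs:label", "Instance One"),
    (":inst2", "rdf:type", ":System"),
}

print(diff_graphs(old, new))
print(summarize(old, new))
```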

 

David Price

Feb 22, 2022, 2:41:55 PM
to 'Felix Sasaki' via TopBraid Suite Users

On 22 Feb 2022, at 17:32, Tim Smith <smith...@gmail.com> wrote:


I understand.  Fundamentally, I think of a "working copy" as originating from the production graph at a specific point in time (i.e. a "branch" in the version tree), even if the graph isn't duplicated.

No, that’s not how it works. Any changes that happen within the production copy after the working copy is created become visible in the working copy too. 

That’s one of the reasons why I suggest that this topic really does require a change in mindset. A working copy is *not* a branch. The major, major difference is that git branches a whole repo which might contain a set of related files in multiple folders. Working copies do nothing like that today. We have done some work on multi-graph working copies but it’s not in the product yet and we’re exploring what makes sense for the general case.


I think the risk in the current design is that the layer of changes is currently defined as anything the user has entered as a change since the working copy was created.  As you said, working copies do not know what changes other working copies have committed to the production graph. 

I’ve not explained things clearly enough - "working copies do not know what changes other working copies have committed to the production graph" is not true. Working copies do not see other working copies' content that is not yet committed to the production copy. They always see the latest changes to the production copy.


So if you make changes, and I make changes, if I commit first, your changes can overwrite mine and vice versa. 

Git cannot stop that happening either.

What's also confusing, currently, if this happens, your changes will show up in my working copy (because it's just loading the latest production graph).  BUT, when I run the Comparison Report in my working copy, which explicitly says "Compares this working copy with the production copy", it only shows the changes I have entered even though the production copy has changed since my workflow was initiated. 

Yes, it is only doing what the name says.

Maybe I worked with version control systems for too long, LOL???

Yep :-). There is no real comparable approach for graphs. As things work today, a single named graph and working copies (i.e. change sets with no visibility into other uncommitted change sets) over that graph are the level of “control” possible in EDG.



While this can be managed at the work process level, it would be better to prevent the occurrence within EDG.  For example, when a Working Copy is displaying the changes, it could show all proposed changes (from my working copy) as well as all committed changes that have occurred since the working copy was initiated (EDG should have the change history but maybe there are edge cases?).  Changes that conflict could be highlighted.  Conflict definitions could be defined in an ontology, much like the TBC Diff engine.


There is enough data in the production copy change history to figure out what changed since the working copy was created, so that kind of compare is indeed possible - can do it with SPARQL today actually.
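
A rough Python sketch of that kind of compare follows. It is not the SPARQL approach mentioned here and does not use EDG's actual change-history vocabulary, which this thread does not show: the HistoryEntry structure, the integer timestamps, and the "same resource touched" heuristic are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical change-history record; EDG's real history model is not shown here.
@dataclass(frozen=True)
class HistoryEntry:
    committed_at: int      # illustrative logical timestamp
    added: frozenset       # triples added to production by this commit
    deleted: frozenset     # triples deleted from production by this commit


def resources(triples):
    """Subjects and objects mentioned in a collection of triples."""
    return {term for s, p, o in triples for term in (s, o)}


def potential_conflicts(history, created_at, pending_added, pending_deleted):
    """Resources touched both by commits made after the working copy was
    created and by the working copy's own pending changes."""
    touched_since = set()
    for entry in history:
        if entry.committed_at > created_at:
            touched_since |= resources(entry.added | entry.deleted)
    return touched_since & resources(pending_added | pending_deleted)


# WF #1 was committed at time 2 and deleted everything about :inst1.
history = [
    HistoryEntry(
        committed_at=2,
        added=frozenset(),
        deleted=frozenset({
            (":inst1", "rdf:type", ":System"),
            (":inst1", "rdfs:label", "Instance 1"),
        }),
    )
]

# WF #2 was created at time 1 and wants to add a triple that mentions :inst1.
pending_added = {(":inst1", ":getsDataFrom", ":inst2")}
conflicts = potential_conflicts(history, 1, pending_added, set())
print(conflicts)
```

Matching on shared resources is a deliberately crude heuristic. As noted earlier in the thread, what counts as a conflict in RDF can be far less direct than two edits touching the same resource.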



I can see how this idea is useful and might possibly be done in EDG since you could do this today with Stored SPARQL queries. Of course, would probably be better to have a nicer UI to run them and to display the results.

Anyway - good input for our internal discussions on this topic wrt new features to put into future releases of EDG.

Cheers,
David

Ralph Hodgson

Feb 22, 2022, 2:58:02 PM
to topbrai...@googlegroups.com
Telling Tim about it off the list is fine. I can write the email.
