tbc ttl file questions

Bohms, H.M. (Michel)

Jun 22, 2017, 2:19:33 AM

to topbrai...@googlegroups.com

Dear,

2 simple/short technical questions:

Can the generated comments at the start of a turtle file be avoided somehow?
Can the order of the items in the file be preserved (related to # comments)?

Thx Michel

Dr. ir. H.M. (Michel) Böhms, Senior Data Scientist, T +31888663107, M +31630381220, E michel...@tno.nl

This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message. TNO accepts no liability for the content of this e-mail, for the manner in which you use it and for damage of any kind resulting from the risks inherent to the electronic transmission of messages.

Holger Knublauch

Jun 22, 2017, 6:47:46 PM

to topbrai...@googlegroups.com

On 22/06/2017 16:19, Bohms, H.M. (Michel) wrote:

Can the generated comments at the start of a turtle file be avoided somehow?

When you just use the Save feature to produce a TTL file, the # baseURI lines will always be there. If you want to avoid them, you'd need to give up on the order of items in the file, which is currently preserved by default.

Can the order of the items in the file be preserved (related to # comments)?

What problem are you trying to solve?

Holger


--
You received this message because you are subscribed to the Google Groups "TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to topbraid-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bohms, H.M. (Michel)

Jul 12, 2017, 7:11:12 AM

to topbrai...@googlegroups.com

Sorry for the late reply…

When I save, all comments (edited/added manually) are gone and the order is changed.

(I can’t remember changing a setting for that behaviour, so I guess this is the default.)

After saving (in TTL), the order seems to be:

Comments baseURI/prefix/baseuri
Prefixes
Changed external entities
Ontology declaration
Classes (alphabetical)
Properties (datatype/object mixed) (alphabetical)

Is there a way to override this, keeping a manually determined order and my own comments?

Gr Michel

Holger Knublauch

Jul 12, 2017, 7:34:02 PM

to topbrai...@googlegroups.com

On 12/07/2017 21:11, Bohms, H.M. (Michel) wrote:


Is there a way to override this, keeping a manually determined order and my own comments?

Yes, the only way to keep these is to not use TBC at all.

What you are asking for (here and in the parallel thread) is almost impossible for us to implement. You are asking for a system that not only parses Turtle but also preserves the details of the formatting (e.g. commas vs semicolon) and # comments (which parsers usually throw away).

Seriously, if these low-level details of the TTL syntax are relevant to you, just use text editors.

Holger

Bohms, H.M. (Michel)

Jul 13, 2017, 4:13:35 AM

to topbrai...@googlegroups.com

Hi Holger, see my replies inline:

On 12/07/2017 21:11, Bohms, H.M. (Michel) wrote:


Is there a way to override this, keeping a manually determined order and my own comments?

Yes, the only way to keep these is to not use TBC at all.

Maybe 😊, but I’ll first try the following scenario:
- When the ontology is fully stable, add comments to the file
- From that point on, only read the file and never write/save it again, so that I don’t lose my order and comments

To explain why this is important: we have this concept modelling ontology (CMO) supporting different modelling styles (decomposition, qudt2.0 etc.). I would like to group the mechanisms for the different modelling styles together and introduce each group with a comment. An alternative is to introduce an annotated clone of the file for information, but I do not like that. Yet another alternative is to annotate all items separately (“supports modelling style x”).

What you are asking for (here and in the parallel thread) is almost impossible for us to implement. You are asking for a system that not only parses Turtle but also preserves the details of the formatting (e.g. commas vs semicolon) and # comments (which parsers usually throw away).

That parallel issue is a completely different one, not in the same league as the above. A writer that only supports 2 out of 3 key (Turtle) language features is simply not complete, even if its output is fully valid Turtle. Following your reasoning, you could actually write Turtle without semicolons, commas and []. This would result in valid Turtle, but the end result would simply be triples, and that is not what you would expect from a Turtle writer, since Turtle was meant precisely to add all this syntactic sugar to triples.
You say “e.g. commas vs semicolon”. This is not the case! It’s about commas AND semicolons. These are fully orthogonal language features! So the issue is not about unimportant syntax alternatives for the same concept; it’s about 2 different concepts (2 different ways to abbreviate your directed graph), of which one is supported and the other is not.

Seriously, if these low-level details of the TTL syntax are relevant to you, just use text editors.

Yes, low-level syntax issues ARE very relevant. In the end, they are the foundation of everything we do. When we try to convince our clients to move from SPFF or XML to RDF and its serializations, they expect implementations that support these specs 100%. If a comment is a feature of that spec, and a comma is a feature of that spec, they do not expect a parser and/or writer to ignore or even delete them. Anyway, as said before, let’s agree to disagree (although I must say your views on these matters surprise me greatly).

Greetings Michel

David Price

Jul 13, 2017, 5:55:35 AM

to topbrai...@googlegroups.com

Hi Michel,

First, a bit of clarity seems to be required wrt understanding some basics of RDF/OWL. RDF/XML, Turtle, etc. encode graphs, which by definition have no order or hierarchy. There is no top, bottom or middle of a graph. Therefore, ordering is not important in any Turtle or RDF/XML encoding. Contrast that with XML, which encodes a strictly ordered set of XML elements and hierarchy/containment relationships plus attributes on the XML elements. Graphs and XML are nothing alike, and mixing requirements between them is not a sensible thing to do.

It does not make sense to decide whether to use IFC EXPRESS and STEP file format vs IFC OWL and Turtle for data exchange based on a manually controlled Turtle file format. If that is important in your argument, IMO you’ve got entirely the wrong argument :-) If you have further concerns, I suggest you take them up with me directly rather than on the user forum, since I understand the IFC STEP issues and most Composer developers and users don’t. I’m happy to help make the argument to IFC STEP users. How about: For real-world uses of RDF/OWL, it is very likely that the data will end up in an RDF database anyway, and so the fact that it was ever encoded as Turtle disappears entirely.

Cheers,

David

UK +44 7788 561308

US +1 336 283 0606

Bohms, H.M. (Michel)

Jul 13, 2017, 8:02:47 AM

to topbrai...@googlegroups.com

On Jul 13, 2017, David Price wrote:

First, a bit of clarity seems to be required wrt understanding some basics of RDF/OWL. RDF/XML, Turtle, etc. encode graphs, which by definition have no order or hierarchy. There is no top, bottom or middle of a graph. Therefore, ordering is not important in any Turtle or RDF/XML encoding. Contrast that with XML, which encodes a strictly ordered set of XML elements and hierarchy/containment relationships plus attributes on the XML elements. Graphs and XML are nothing alike, and mixing requirements between them is not a sensible thing to do.

>The situation is IMHO subtly different:

Any RDF Document (in whatever encoding, aka concrete RDF syntax) DOES have an order. Of course this order has no semantic meaning (or, as you say, ‘the encoded graph has no order’). Likewise, all concrete RDF syntaxes have a concept of ‘Comment’, again without any machine-interpreted semantic meaning. This ordering is also what makes comments meaningful: a comment is typically location-sensitive, so without an order, comments would only make sense at the document level.
The next, separate question is: when a document is imported into and exported from a tool (say TBC), should the comments/order be preserved somehow or not? When the data is read into a platform once, the platform becomes the primary source and the data is in the end only processed via SPARQL, you do not care. The originating Turtle doc can be deleted or just left there for human information only.
Since in practice a lot of RDF data (let’s focus on ontologies here) is still imported, edited and exported by tools as files/documents in concrete RDF syntaxes (and not only via direct SPARQL/SOH access etc.), I would say the processing of order/comments IS an issue.

It does not make sense to decide whether to use IFC EXPRESS and STEP file format vs IFC OWL and Turtle for data exchange based on a manually controlled Turtle file format. If that is important in your argument, IMO you’ve got entirely the wrong argument :-)

Sorry David, you completely missed my point. Let me put it differently: I have seen end users read in Turtle ontologies, edit them with TBC and then say “Hey, TBC deleted all my comments! It did not even warn me about that when I saved.”
Anyway, the issue only arises when editing. If files are read-only, there is of course no issue. Likewise, when RDF documents are not used as the primary source but are only there for data exchange, there is again no issue.
To be clear, for me the whole issue is only relevant for ontologies, not data files (I don’t care much about order and comments in data files).

If you have further concerns, I suggest you take them up with me directly rather than on the user forum, since I understand the IFC STEP issues and most Composer developers and users don’t. I’m happy to help make the argument to IFC STEP users. How about: For real-world uses of RDF/OWL, it is very likely that the data will end up in an RDF database anyway, and so the fact that it was ever encoded as Turtle disappears entirely.

>Finally, here we totally agree! If Turtle (or any concrete RDF syntax) becomes completely irrelevant in the future because all instantiation and changing is done via a direct interface like SPARQL, all issues that arise from combining RDF Documents and RDF Datasets also become fully irrelevant! So let’s work hard to make Turtle a formal W3C Condemnation (is that the opposite of a Recommendation?).

>Gr Michel

Irene Polikoff

Jul 13, 2017, 2:32:02 PM

to topbrai...@googlegroups.com

Michel,

Serialization and deserialization provide a way to translate data into a format that can be used for transmission, interchange, storage in a file system, etc., with the ability to later reconstruct a semantically identical clone of the data.

The goal of RDF serializations and tool interoperability is to ensure that if tool A produces a serialization of a graph X, tool B can read it in and understand it as graph X. Tool B can then, in its turn, produce a serialization of graph X, and tool A can import it and it is still the same graph. The serialization output of A may not look exactly the same as the serialization output of B, but their semantic interpretation is always the same.

The serialization/deserialization process is not intended to ensure that the sequence of bytes in a file will be exactly the same. In the case of both the RDF/XML and Turtle formats, there are several syntactic variations for representing the same information. The simplest RDF serialization is N-Triples. There is little room in it for syntactic variations, as it just contains triple statements. However, even with that simplicity, there are variants, since the order of statements may vary. The bottom line is that if you are using serializations for interchange and parse them to deserialize for use in some target system, you need a parser that understands what the serialization means semantically and does not rely purely on the byte sequence.

If the TBC parser were ignoring something that captured the semantics of data, this would be a bug. I do not think that is the case. A comma is not ignored; it is correctly understood during deserialization when data is imported into TBC. “Deleting it” is not even a concept, because once data is deserialized, the comma no longer exists. We now have a graph. When you save it, it is serialized anew, without any memory or consideration of how its serialization looked when it came in. As long as the serialization still represents a semantically identical object, it is correct.
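To illustrate this point, here is a toy sketch in Python. It is neither TBC's parser nor a conforming Turtle parser: it assumes a tiny subset of Turtle (whitespace-separated terms, `;`, `,` and `.` written as separate tokens, no @prefix lines, literals or blank nodes). Two documents that differ in comments, statement order and comma usage parse to the same set of triples, which is all a writer has left when it saves.

```python
def parse_toy_turtle(text: str) -> set[tuple[str, str, str]]:
    """Parse a toy Turtle subset into a set of (subject, predicate, object) triples.

    Assumptions (toy only): terms contain no whitespace, the punctuation
    ';', ',' and '.' is separated from terms by whitespace, and a token
    starting with '#' begins a comment that runs to the end of the line.
    """
    triples = set()
    subject = predicate = None
    state = "subject"
    for line in text.splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # the comment is dropped here: it never reaches the graph
            if state == "subject":
                subject, state = token, "predicate"
            elif state == "predicate":
                predicate, state = token, "object"
            elif state == "object":
                triples.add((subject, predicate, token))
                state = "punctuation"
            elif token == ",":
                state = "object"  # same subject and predicate follow
            elif token == ";":
                state = "predicate"  # same subject follows
            elif token == ".":
                state = "subject"  # statement complete
            else:
                raise ValueError(f"unexpected token {token!r}")
    return triples


doc_a = """
# Modelling style: decomposition
ex:a1 ex:b1 ex:c1 ;
      ex:b2 ex:c2 , ex:c3 .
"""

doc_b = """
ex:a1 ex:b2 ex:c3 .
ex:a1 ex:b1 ex:c1 .
ex:a1 ex:b2 ex:c2 .
"""

# Same graph: the comment, the statement order and the comma are gone after parsing.
assert parse_toy_turtle(doc_a) == parse_toy_turtle(doc_b)
```

A writer that serializes this set afresh could produce either document (or a third one); nothing in the set records which one was read.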

Regards,

Irene Polikoff

Tim Smith

Jul 13, 2017, 2:48:03 PM

to topbrai...@googlegroups.com

Ok, now we have the "reason" for needing this functionality:

Michel wrote:
"To explain why this is important: we have this concept modelling ontology (CMO) supporting different modelling styles (decomposition, qudt2.0 etc.). I would like to group the mechanisms for the different modelling styles together and introduce each group with a comment. An alternative is to introduce an annotated clone of the file for information, but I do not like that. Yet another alternative is to annotate all items separately (“supports modelling style x”)."

It's interesting to me that you have used an ontology to capture the knowledge in your domain and then want to use a "document" (i.e. comments and proper ordering) to capture additional knowledge about the objects in your CMO.

Could you not create another ontology with classes like "Modeling Style Mechanism" and "Modeling Style Group", then create Modeling Style Group instances and link the various mechanism instances to them using an appropriate property? Then you have a fully queryable representation of your modeling mechanisms, making the information easily discoverable, displayable, etc. Ontologies are just triples, and unless you care about strict inferencing, you can use a class interchangeably as an instance or as a class. I use this all the time to capture knowledge and data using the same ontologies.
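A minimal sketch of this suggestion in plain Python, with a set of tuples standing in for the RDF graph (in TBC this would be real triples queried with SPARQL). All `ex:` names, including the property `ex:supportsModellingStyle`, are hypothetical placeholders. Because the grouping is data, a grouped listing with a `#` header per style can be regenerated from the graph at any time instead of living only in a hand-edited file.

```python
from collections import defaultdict

# Hypothetical CMO-style data: every ex: name here is an illustrative placeholder.
graph = {
    ("ex:PartOf", "ex:supportsModellingStyle", "ex:DecompositionStyle"),
    ("ex:HasPart", "ex:supportsModellingStyle", "ex:DecompositionStyle"),
    ("ex:QuantityValue", "ex:supportsModellingStyle", "ex:QudtStyle"),
    ("ex:DecompositionStyle", "rdfs:label", '"Decomposition"'),
    ("ex:QudtStyle", "rdfs:label", '"QUDT 2.0"'),
}


def members_by_style(g):
    """Query-like lookup: style -> sorted list of mechanisms that support it."""
    groups = defaultdict(list)
    for s, p, o in g:
        if p == "ex:supportsModellingStyle":
            groups[o].append(s)
    return {style: sorted(members) for style, members in groups.items()}


def label(g, node):
    """Return the rdfs:label of a node without quotes, or the node itself."""
    for s, p, o in g:
        if s == node and p == "rdfs:label":
            return o.strip('"')
    return node


def grouped_listing(g):
    """Write the grouping as text, generating one '#' header per style."""
    lines = []
    for style, members in sorted(members_by_style(g).items()):
        lines.append(f"# Modelling style: {label(g, style)}")
        lines.extend(f"{m} ex:supportsModellingStyle {style} ." for m in members)
    return "\n".join(lines)


print(grouped_listing(graph))
```

The listing starts with `# Modelling style: Decomposition`, followed by the two decomposition mechanisms; since the header text comes from `rdfs:label` data, it is not lost when the file is re-saved.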

Just a thought,

Tim


Irene Polikoff

Jul 13, 2017, 3:35:53 PM

to topbrai...@googlegroups.com

Yes, I agree with Tim.

If comments are about an entire graph/ontology, then use rdfs:comment to record them, with the identifier/name of the model as the subject of the comment statement. If comments pertain to a subset of the resources described in a model, then identify the subset and associate the comments with it.

Irene Polikoff


Bohms, H.M. (Michel)

Jul 13, 2017, 4:35:17 PM

to topbrai...@googlegroups.com

Dear Irene

Under the assumption:

“rdf data is always in an RDF database (accessible by SPARQL) and RDF Documents are ONLY for semantic exchange between such systems”

I agree 100% with all your statements (and those of your colleagues).

So the issue is about the “assumption”.

I observe many situations where the primary/reference data (typically an ontology) is in an RDF Document. You might say “this is not good” or “this won’t be the case in future”, etc., but it is a situation I observe a lot (and the current RDF 1.1 specifications certainly do not make clear that this shouldn’t be the case either).

An example (other than my own cmo.ttl): https://www.w3.org/ns/prov.ttl

This is a reference specification published on the web as an RDF document (in various dereferenceable serialisations). It can be imported into any RDF DB you like, but the reference spec IS an RDF Document (so it has a different status than “just for exchange”).

In this ontology you find many comments like:

“## Definitions from other ontologies”

Or

“# The following was imported from http://www.w3.org/ns/prov-dc#”

Now imagine you are the owner of such a reference ontology as an RDF Document.

You should be aware that when editing your ontology, most tools (other than text editors) will delete both order and comments (in general: all non-semantic aspects of that document).

When I say “delete”, I of course mean the sequence of not parsing/recording the comments and then writing the file, which effectively deletes the comments and the specific order after editing.

The current formal specifications do not tell us much about which of the assumptions above is right. Turtle specifies a comment mechanism but does not say, e.g., “be careful, comments are only relevant when writing files; after parsing, they will all be lost” or something similar.

I also agree (with Holger) that if the second interpretation (RDF doc as primary/reference) is assumed, it is quite a job to record the non-semantic info and reuse it when writing out again. (In the ISO STEP world the same issue is relevant, and some tools actually retain the non-semantic data in STEP Physical Files to support more deterministic documents. In some situations this simplifies model comparisons; I am not saying this is the right approach, only that it happens.)

I hope I have made the issue at least clearer now.

Greetings, Michel


Holger Knublauch

Jul 13, 2017, 6:04:57 PM

to topbrai...@googlegroups.com

FWIW some formats such as JSON (and thus JSON-LD) don't even support comments. The philosophy behind that is that every piece of information should become data and not be hidden in a specific serialization.
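A quick check of this point using Python's standard `json` module, which implements strict JSON: a `//` comment makes the document unparseable, so in JSON-LD a grouping remark has to become data, here an `rdfs:comment` value. The example IRI and comment text are hypothetical.

```python
import json

WITH_COMMENT = """{
  "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
  "@id": "http://example.org/cmo#PartOf",
  // part of the decomposition modelling style
  "rdfs:label": "part of"
}"""

AS_DATA = """{
  "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
  "@id": "http://example.org/cmo#PartOf",
  "rdfs:comment": "Part of the decomposition modelling style"
}"""


def is_valid_json(text: str) -> bool:
    """Return True if text parses as strict JSON (which has no comment syntax)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(WITH_COMMENT))  # False: the // line is a syntax error
print(is_valid_json(AS_DATA))       # True: the remark is now a property value
```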

Holger

Irene Polikoff

Jul 13, 2017, 6:16:13 PM

to topbrai...@googlegroups.com

Dear Michel,

I do not have an assumption that RDF data is always in some database, but I do assume that the files get ingested for processing into some tool/system that is capable of deserializing them. With this, I do not really understand the issue with respect to commas and ordering.

I do understand the issue with losing comments recorded using ##. As a solution, I recommend capturing them directly in RDF, or using other, more specific properties such as prov:wasDerivedFrom or rdfs:seeAlso.

Irene


Bohms, H.M. (Michel)

Jul 14, 2017, 3:05:01 AM

to topbrai...@googlegroups.com

Right, and I am actually in favour of that.

The fact is that not all serialisations follow that idea. Turtle and RDF/XML do have comments, and people ARE using them in practice (cmo, prov, …).

So let’s conclude:

“Despite the possibility being offered, better not use comments in your RDF Documents (or assume any order).”

In a sense these things are one step worse than annotations: annotations are processed but not interpreted, whereas comments are not even processed.

So we agree on “what should be”, but differ on the use in practice. That’s fine with me. Maybe you could consider adding a warning on save to your system: “When saving, not all features of your original input will be retained.”

Thx for the discussion,

Michel


Bohms, H.M. (Michel)

Jul 14, 2017, 4:13:49 AM

to topbrai...@googlegroups.com

See my previous reply.

The ordering issue is really the same as the comment issue (both are non-semantic aspects of an RDF document).

The comma issue is something very different, and (as you may have noticed 😊) I have a very strong opinion on it.

I will try to reformulate one more time and then shut up.

A directed graph when represented as triples has bad space-complexity.
Turtle improves the space-complexity by systematically shortcutting the three components of a triple:
a. Reuse of a subject > ;
b. Reuse of a predicate > ,
c. Reuse of an object as a subject > […]
All these things have nothing to do with semantics, only with syntactic sugar and the associated space and time complexities.
a, b and c are fully orthogonal mechanisms, each of which independently improves the space complexity of any directed graph.
Maximum improvement of space complexity is obtained when all three mechanisms are applied.
Applying no mechanism means the output is still Turtle, but just triples (no gain, no syntactic sugar).
Applying only a and c (as your writer does) is also Turtle, but does not make maximal use of the advantages of b, i.e. the commas.

Note 1: time complexity typically works the other way round (space-time complexity is roughly constant), but that wasn’t the issue here. If time complexity IS the issue, you had better forget Turtle and stick with plain triples. (This is true in general; in theory it depends on the balance between the time to read an item and the time to process it: better space complexity means less to read, but the extra processing normally costs more than you gain.)

Note 2: a, b and c all replace repeated strings by predefined characters (; , []), so the end result is always shorter than the original.

In my earlier example:

Triples:

a1 b1 c1.

a1 b2 c2.

a1 b2 c3.

a1 b3 c4.

c4 b4 c5.

Applying a b and c:

a1 b1 c1; b2 c2, c3; b3 [c4 b4 c5].

Applying only a and c:

a1 b1 c1; b2 c2; b2 c3; b3 [c4 b4 c5].

As can be seen, the second version is longer than the first one.
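A small Python sketch of the space argument, covering only mechanisms a (`;`) and b (`,`) and leaving out c (`[]`), so the nested c4 b4 c5 part of the example is dropped. The serializer is hypothetical, not TBC's writer: it sorts the triples, groups them by subject and, optionally, by predicate, and writes one statement per subject.

```python
from itertools import groupby

Triple = tuple[str, str, str]

# The example triples, minus the nested c4 b4 c5 part (mechanism c).
TRIPLES: list[Triple] = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c2"),
    ("a1", "b2", "c3"),
    ("a1", "b3", "c4"),
]


def to_turtle(triples: list[Triple], use_commas: bool) -> str:
    """Write one statement per subject.

    ';' (mechanism a) always joins the predicate-object pairs of a subject;
    ',' (mechanism b) joins objects of a repeated predicate only if use_commas.
    """
    statements = []
    for subject, by_subject in groupby(sorted(triples), key=lambda t: t[0]):
        parts = []
        for predicate, by_predicate in groupby(by_subject, key=lambda t: t[1]):
            objects = [o for _, _, o in by_predicate]
            if use_commas:
                parts.append(f"{predicate} {', '.join(objects)}")
            else:
                parts.extend(f"{predicate} {o}" for o in objects)
        statements.append(f"{subject} {'; '.join(parts)} .")
    return "\n".join(statements)


print(to_turtle(TRIPLES, use_commas=True))   # a1 b1 c1; b2 c2, c3; b3 c4 .
print(to_turtle(TRIPLES, use_commas=False))  # a1 b1 c1; b2 c2; b2 c3; b3 c4 .
```

Both strings denote the same four triples; the comma version is three characters shorter because `, c3` replaces `; b2 c3`.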

David said earlier: “efficient” is in the eye of the beholder.

This is of course related: “efficient” only makes sense if you specify space or time. Here we are talking about space complexity/efficiency.

It could of course be that, for some reason, you optimized for space and time complexity together by just leaving out “b”.

Gr Michel


Richard Cyganiak

Jul 14, 2017, 5:34:13 AM

to topbrai...@googlegroups.com

Michel,

On 14 Jul 2017, at 09:13, Bohms, H.M. (Michel) <michel...@tno.nl> wrote:

A directed graph when represented as triples has bad space-complexity.
Turtle improves the space-complexity by systematically shortcutting the three components of a triple:
a. Reuse of a subject > ;
b. Reuse of a predicate > ,
c. Reuse of an object as a subject > […]

d. Make almost all whitespace optional, including line breaks

Why are you selectively advocating the use of a, b and c, but not d, despite the clearly superior space efficiency of omitting all indentation and writing all triples on a single line?