long turtle format and Git


Nicholas Car

Mar 12, 2022, 6:28:32 AM
to rdfli...@googlegroups.com
Dear RDFLib community,

Please note the addition of the long turtle RDF serialization option (g.serialize(format="longturtle")) in the 6.1.1 release of RDFLib (current).

This format is just turtle - parsable with an un-altered turtle parser - but with more linebreaks and indentations than normal. The purpose of the format is to better present RDF text data to version control systems such as Git.
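
A minimal sketch of the call, for anyone who wants to try it. The sample data and the helper names (`SAMPLE_TURTLE`, `to_longturtle`, `triple_count`) are made up for illustration, and it assumes rdflib 6.1.1 or later is installed; the demo is skipped if rdflib is not importable.

```python
import importlib.util

# Hypothetical sample data, compactly written Turtle.
SAMPLE_TURTLE = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:knows ex:bob, ex:carol ; ex:name "Alice" .
"""


def to_longturtle(turtle_text: str) -> str:
    """Parse Turtle text and re-serialize it in the long turtle layout."""
    from rdflib import Graph  # third-party; assumes rdflib >= 6.1.1

    graph = Graph()
    graph.parse(data=turtle_text, format="turtle")
    return graph.serialize(format="longturtle")


def triple_count(turtle_text: str) -> int:
    """Count triples using the ordinary, un-altered Turtle parser."""
    from rdflib import Graph

    return len(Graph().parse(data=turtle_text, format="turtle"))


# Only run the demo when rdflib is actually available.
if importlib.util.find_spec("rdflib") is not None:
    long_text = to_longturtle(SAMPLE_TURTLE)
    print(long_text)
    # The long form is still plain Turtle, so it round-trips to the same graph size.
    assert triple_count(long_text) == triple_count(SAMPLE_TURTLE)
```

The round-trip check at the end is the point of the sketch: the output is read back with `format="turtle"`, not with a special parser.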

The motivation for the format is the realisation that many of us are using Git and systems like BitBucket, GitHub and GitLab for RDF data version control and that we also all use turtle as the main RDF text serialization.

An example of a long turtle serialized ontology is the TERN Ontology, made by frequent RDFLib contributor Edmond.


This ontology is under active development and you can see longturtle doing its thing in GitHub diffs such as https://github.com/ternaustralia/ontology_tern/pull/137/files. It looks like some enhancements could be made though, such as more determinism in the sorting of the serialization.

Any feedback on this format or on RDF text files and version control in general would be great.

Thanks,

Nick
rdflib co-maintainer

Graham Higgins

Mar 12, 2022, 10:39:30 AM
to rdflib-dev
On Saturday, March 12, 2022 at 11:28:32 AM UTC Nicholas Car wrote:
Any feedback on this format or on RDF text files and version control in general would be great.

Although it’s not advertised as such, the architecture of the BerkeleyDB RDFLib Store implementation preserves the order in which triples are added (as a side-effect of key indexing) and so all of the RDFLib Stores based on key-value back-ends (BerkeleyDB, LevelDB, SQLiteLSM) provide reliably repeatable re-serialization suitable for unambitious efforts. I've not yet checked but I imagine that the corresponding (again, index-using) AbstractSQL-based RDFLib Stores (atm, only rdflib-sqlalchemy) also exhibit the same repeatable serialization property.

Otherwise, if you're just interested in manageable diffs of large but straightforward graphs, serializing as ntriples/nquads and sorting the serialization is also a viable strategy, I've found. However, this approach is unsuitable for graphs containing blank nodes, as BNode serialization will differ and, afaik, the only mooted solution to this is Digital Bazaar's URDNA2015 RDF Dataset Canonicalization proposal, also now a topic in the RDFLib GitHub discussions section.
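
A rough sketch of the sort-then-diff idea, using only the standard library. It works on N-Triples text that has already been produced (for example by an RDF library's N-Triples serializer); the function name `sorted_ntriples` and the sample data are made up for illustration, and it assumes the graph has no blank nodes, for the reason given above.

```python
import difflib


def sorted_ntriples(nt_text: str) -> str:
    """Return N-Triples text with one statement per line, de-duplicated and sorted.

    Assumes no blank nodes: their labels may change between serializations,
    which would defeat a plain line sort.
    """
    lines = {line.strip() for line in nt_text.splitlines()}
    statements = {line for line in lines if line and not line.startswith("#")}
    return "\n".join(sorted(statements)) + "\n"


# Two hypothetical serializations of nearly the same graph, in different orders.
OLD = """\
<http://example.org/b> <http://example.org/p> "2" .
<http://example.org/a> <http://example.org/p> "1" .
"""
NEW = """\
<http://example.org/a> <http://example.org/p> "1" .
<http://example.org/c> <http://example.org/p> "3" .
<http://example.org/b> <http://example.org/p> "2" .
"""

diff = difflib.unified_diff(
    sorted_ntriples(OLD).splitlines(),
    sorted_ntriples(NEW).splitlines(),
    lineterm="",
    n=0,
)
# Keep only added/removed statements, dropping the "---"/"+++" file headers.
changes = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changes))
```

Once both versions are sorted, the reordering disappears and only the one added statement shows up in the diff.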


Graham Higgins

Mar 12, 2022, 5:13:36 PM
to rdflib-dev
On Saturday, March 12, 2022 at 11:28:32 AM UTC Nicholas Car wrote:
Any feedback on this format or on RDF text files and version control in general would be great.

On the topic of managed changes to Graphs: SOLID has a scheme whereby changes are specified using N3 "Patches"; see "6.3.1 Modifying Resources Using N3 Patches":

Example: Applying an N3 patch.
_:rename a solid:InsertDeletePatch;
  solid:where { ?person ex:familyName "Garcia". };
  solid:inserts { ?person ex:givenName "Alex". };
  solid:deletes { ?person ex:givenName "Claudia"} .

This has the advantage of providing a tractable change history.
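
To make the where/inserts/deletes idea concrete, here is a toy sketch in plain Python. It is not Solid's implementation and not an N3 parser: triples are modelled as tuples of strings, the `where` clause is limited to a single pattern, and the names (`apply_patch`, `match`, `substitute`) as well as the "must match exactly once" and "deleted triples must be present" checks are simplifying assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return variable bindings if the triple fits the pattern, else None."""
    binding: Dict[str, str] = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A repeated variable must bind to the same term each time.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def substitute(pattern: Triple, binding: Dict[str, str]) -> Triple:
    """Replace variables in the pattern with their bound terms."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


def apply_patch(
    graph: Set[Triple],
    where: Triple,
    inserts: Iterable[Triple],
    deletes: Iterable[Triple],
) -> Set[Triple]:
    """Apply a simplified insert/delete patch and return the new graph."""
    bindings = [b for t in graph if (b := match(where, t)) is not None]
    if len(bindings) != 1:
        raise ValueError("where pattern must match exactly once")
    binding = bindings[0]
    to_delete = {substitute(p, binding) for p in deletes}
    if not to_delete <= graph:
        raise ValueError("triples to delete are not all present")
    to_insert = {substitute(p, binding) for p in inserts}
    return (graph - to_delete) | to_insert


# The rename example from above, with a hypothetical person ex:p1.
graph = {
    ("ex:p1", "ex:familyName", '"Garcia"'),
    ("ex:p1", "ex:givenName", '"Claudia"'),
}
patched = apply_patch(
    graph,
    where=("?person", "ex:familyName", '"Garcia"'),
    inserts=[("?person", "ex:givenName", '"Alex"')],
    deletes=[("?person", "ex:givenName", '"Claudia"')],
)
print(sorted(patched))
```

Each patch is a small, self-describing record of one change, which is what makes a sequence of them readable as a history.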

Donny Winston

May 31, 2022, 1:00:36 PM
to rdflib-dev
I was just introduced to a nifty RDF Serializer published by the Enterprise Data Management Council (http://www.edmcouncil.org/): <https://github.com/edmcouncil/rdf-toolkit>. It is intended to be used in a git hook to automatically rewrite RDF for readable diffs.

Here's a toy example of the hook in action: <https://github.com/KGConf/Bookclub-ontology/commit/ecc392924f88efc738e0ad776048b12c48f745bc>.

My method (I'm on macOS with x86 architecture):
1. I installed https://github.com/edmcouncil/rdf-toolkit/blob/master/etc/git-hook/pre-commit and https://jenkins.edmcouncil.org/view/rdf-toolkit/job/rdf-toolkit-build/lastSuccessfulBuild/artifact/target/rdf-toolkit.jar (link in the https://github.com/edmcouncil/rdf-toolkit README.md).
2. I changed two lines in the pre-commit file:
```
    case ${extension} in
        rdf)
```
to
```
    case ${extension} in
        ttl)
```
and
```
      --target-format rdf-xml \
```
to
```
      --target-format turtle \
```

You do need Java. For me, I installed a pre-built binary for OpenJDK 11 from <https://adoptopenjdk.net/index.html> a while ago, which is now <https://adoptium.net/temurin/releases/?version=11>. So e.g. for me, `echo $JAVA_HOME` is `/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home`.

Donny Winston

May 31, 2022, 1:06:14 PM
to rdflib-dev
Ooh, forgot one detail: I put the rdf-toolkit.jar and pre-commit files in my .git/hooks directory. And another note about the scope of this project: it's designed so that if folks are using different tools that, say, export turtle differently, this hook will hopefully make all output look the same. So if you e.g. made a one-line change in a tool like Protege to a file that someone else had last edited using another tool, the saved ttl might normally produce a multi-line diff, but the hook tries to get everything into the same canonical form so that the final diff is just the one line.