[Discussed] On integrating sparql-gremlin 0.2 plugin in tinkerpop codebase | Seek guidance and support


Harsh Thakkar

Dec 7, 2017, 10:46:49 AM
to Gremlin-users
Hello, dear Gremlin people!

Apologies for raising this topic a bit late. I had planned to start this thread much earlier but was unable to.

======= short ==================================================================================================
I seek your guidance and help in polishing and integrating the sparql-gremlin 0.2 plugin (https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin) into the Apache TinkerPop codebase, succeeding its predecessor developed by Daniel Kuppitz (https://github.com/dkuppitz/sparql-gremlin). The new plugin supports a wide range of SPARQL 1.0 query features.


============ long =============================================================================================

I am a Ph.D. student at the University of Bonn and work at the intersection of semantic web and graph databases. My thesis is focused on bridging the gap between these two domains by enabling support for SPARQL querying of Property Graph databases. Thus, working on the SPARQL-Gremlin interoperability was an obvious idea given the wide popularity of Gremlin amongst the Graph DB vendors. 

The sparql-gremlin 0.1 plugin (link - https://github.com/dkuppitz/sparql-gremlin) was developed by Daniel Kuppitz. We have extended it to support various features of the SPARQL 1.0 specification and have tested it using various synthetic datasets (such as the Northwind dataset and the Berlin SPARQL Benchmark [BSBM] dataset) and a wide range of SPARQL queries.

The extended version of the plugin (sparql-gremlin 0.2, link - https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin) supports a variety of query modifiers (group-by, order-by, counts, etc.) and complex query features such as union and aggregation. It does not yet support SPARQL OPTIONAL queries, though; that needs a minor fix.

I wish to integrate this updated version into the Apache TinkerPop codebase and see it roll out as a functional plugin (like the old one, which it would replace) in the next version of TinkerPop (or even before, however it works out).

===============================================================================================================

I am not very familiar with how to do this or what steps I need to follow, so I am seeking input from you all. I started this thread as suggested by Stephen Mallette, and I have already discussed the idea with Marko Rodriguez during Graph Day SF 2017 and in other informal communications.

Please guide me through the process and let me know what I will need to do and/or what you will need to get this done. I am happy to collaborate and be a part of this awesome project :)

**I HAVE ALREADY STARTED A THREAD ON https://lists.apache.org/list.html?d...@tinkerpop.apache.org (AS SUGGESTED BY STEPHEN MALLETTE, THANKS A TON FOR THAT!) ABOUT THIS. IT SHOULD APPEAR THERE SOON**

Cheers,
Harsh

Marko Rodriguez

Dec 7, 2017, 1:48:19 PM
to gremli...@googlegroups.com
Hello Harsh,

I think that it is important that TinkerPop support various “reference implementations.”

* Regarding graph systems, TinkerPop (out of the box) supports Neo4j, Spark, Giraph, and TinkerGraph.
* Regarding Gremlin language variants, TinkerPop (out of the box) supports Java, Groovy, Python, .NET.
* Regarding Gremlin distinct languages, TinkerPop (out of the box) only supports Gremlin!
- This is bad.

http://tinkerpop.apache.org/providers.html (see the section on Query Language Providers)

I think we should make a move to support another Gremlin distinct language and SPARQL is an obvious choice. Another choice would be the SQL-Gremlin work by Ted Wilmes (https://github.com/twilmes/sql-gremlin).

I believe that, going forward, TinkerPop should provide reference implementations as follows:

1. Three data systems: Neo4j, Spark, TinkerGraph.
2. Three TraversalEngines: Standard, Computer, and Actors.
3. As many language variants as possible (e.g. Ruby, Python, JS, .NET, etc.).
4. Two distinct query languages: SPARQL, SQL.

From there, we have all the primary bases covered, enabling providers to learn the necessary mechanisms to integrate with the Gremlin VM for their particular project.

Take care,
Marko.
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/68c9fa39-e7bc-4f8c-b4a8-c0489418117b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephen Mallette

Dec 7, 2017, 1:59:49 PM
to Gremlin-users
I definitely don't want to discourage any comments about this topic, but let's try to have the core of this discussion on the dev list if possible. There seems to be some kind of delay on that post coming through. Keep an eye out for it.


Harsh Thakkar

Dec 7, 2017, 2:15:43 PM
to Gremlin-users
Hi Marko,

Thanks for your quick response. I do agree with your plan, as I personally wish to see Gremlin become the focal point for the graph and semantic web communities. I have been working on this for quite some time now. The sparql-gremlin plugin aims to cover this gap by enabling interoperability and ease of access for the many people who wish to leverage the advantages of graphs!

I also think TinkerPop, as an organization or a team, should make efforts to standardize Gremlin as the *de facto* graph query language for Property Graph databases, just as SPARQL is a W3C standard for RDF databases/stores. This will, however, require a substantial push from the theoretical perspective of defining the formal semantics of the Gremlin language, and also a little influence ;) AFAIK, the Neo4j team is already working on Cypher (with the openCypher specification) in that direction. However, I do not know the specifics. I firmly believe that, given the popularity of Gremlin and TinkerPop, it is worth a fight.

In short, the plan you described looks great!

====================================================

Hi Stephen,

I totally get your point. Thanks for the effort in pushing the thread through. I will post it here as well so that people can debate it and contribute criticism.




**I HAVE ALREADY STARTED A THREAD ON https://lists.apache.org/list.html?dev@tinkerpop.apache.org (AS SUGGESTED BY STEPHEN MALLETTE, THANKS A TON FOR THAT!) ABOUT THIS. IT SHOULD APPEAR THERE SOON**

Cheers,
Harsh

