SQL-Gremlin compiler

Ted Wilmes

unread,

Dec 14, 2015, 9:33:39 AM12/14/15

to Gremlin-users

Hello,

I've released the first version of a SQL-Gremlin compiler: https://github.com/twilmes/sql-gremlin. This Gremlin shell plugin allows you to run standard SQL against your TinkerPop enabled graph db (Titan, Neo4j, TinkerGraph, etc...) The next release will include a JDBC driver so that you can access it from off the shelf JDBC enabled tools. My main motivation for this project is to allow folks to hook their reporting tools (Pentaho, Jasper, Crystal Reports) up to their graph dbs without necessarily having to ETL into a relational database first. If you give SQL-Gremlin a try and run into any problems, feel free to create an issue that includes a brief description of your data model and the failure.

Thanks,

Ted

Stephen Mallette

unread,

Dec 14, 2015, 9:58:40 AM12/14/15

to Gremlin-users

Very nice. The JDBC driver thing sounds cool too - your motivation for that piece sounds spot-on. Opening up access to all that tooling out there that relies on jdbc drivers would be a welcome feature to a lot of folks.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/7bbf68b4-00f4-46b8-a533-03c092a050f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ted Wilmes

unread,

Dec 14, 2015, 10:35:43 AM12/14/15

to Gremlin-users

Thanks, Stephen. Yeah, I see this sort of thing as a bridge until the tooling can speak "graph" natively. Until then, this can take your machine generated SQL and perform the translation. There are a variety of great options for graph visualization, but when it comes to plain old report and dashboard type applications, the end user isn't left with much. Many of the vendors provide integration options for NoSQL stores but they require some development and/or provide little "pushdown" support into the underlying datastore. This can be problematic from a performance and maintenance standpoint. I'll hit you up when I have the core JDBC bits working. I'd like to get your thoughts on how to expose/package that.

--Ted

--
You received this message because you are subscribed to a topic in the Google Groups "Gremlin-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gremlin-users/npncDyVQJSU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/CAA-H439bCGWGosmSPDK7mzWi2hTtyYhV9Eib52UJ-%3DaKdV6S7w%40mail.gmail.com.

Sebastian Good

unread,

Dec 15, 2015, 8:11:00 AM12/15/15

to Gremlin-users

This is very cool stuff.

Given the awkwardness in expressing some native graph queries in SQL (e.g. friends-of-friends up to distance 3), it strikes me you could get a lot of mileage by allowing the graph developers to define custom Gremlin functions and expose them as SQL functions, e.g. "SELECT * FROM Person a JOIN Person b ON MyGremlinFunction(a,b)=true" or some such thing, so that reporting users could take advantage of their existing tooling, but get some advantages of graph query execution.

Looking forward to following this repository!

Ted Wilmes

unread,

Dec 15, 2015, 10:08:35 AM12/15/15

to Gremlin-users

Hi Sebastian,

That's a cool idea. I have a precursor to SQL-Gremlin that we're using with Titan 0.4.4 that goes SQL to essentially a bunch of MultiQuery calls. It has the concept of these primitive "vertex" tables but also two extensions that provide another level of abstraction.

Taking your friends-of-friends up to distance 3 example, we have a special type of bridge/associations table with the following columns: out_id, in_id, depth, direction. To get the friends within distance 3, you'd do something like this:

select * from person as p1

inner join friends as f on f.out_id = p1.person_id

inner join person as p2 on f.in_id = p2.person_id

where direction = 'OUT' and depth <= 3

The 2nd graph "view" that we expose is based upon the regular old star schema fact table. In this case, we have a bunch of time series data across many different time series that we need to make reporting friendly. Our fact tables are setup via some simple json configuration that specifies fact table names, their columns, and how to preprocess each column (eg. do we want an average, raw value, most recent value in time period?) You end up with tables that have 2 dimensions ( a foreign key to a dimension table like company, store, etc. and a time dimension). Then the rest of your columns are your time series values. Say we were collecting sensor data from a bunch of grocery stores, it'd look something like this:

store_id, tstamp, temp_sensor, humidity_sensor, temp_sensor_deli, temp_sensor_freezer

This is all sitting over a time series model that would be very verbose to query at the vertex level so the tool provides that nice translation that makes it easy to plug into Jasper, Pentaho, etc.

Long story short, I'd like to add these sorts of facilities into SQL-Gremlin and I'm looking forward to hearing further ideas from folks on how expose the more graph-type operations via SQL.