[TinkerPop3] Neo4j-Gremlin is now available in master/


Marko Rodriguez

Jun 9, 2015, 10:21:48 AM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
Hello everyone,

With Michael Hunger's help, NeoTechnology has published the following two artifacts to Maven Central:


Note that TinkerPop3's Neo4j-Gremlin only <depends/> on the Apache2 licensed neo4j-tinkerpop-api artifact. When a user chooses to :install Neo4j-Gremlin (e.g. into GremlinServer or GremlinConsole), the AGPL binaries are downloaded from Maven Central to the user's machine.

Neo4j-Gremlin was originally in TinkerPop3 M1 through M7. However, once TinkerPop joined the Apache Software Foundation, we had to gut it until the above ASF-permitted model was implemented. Note that the current Neo4jGraph implementation has seen a lot of updates since M7, so please give it a test drive and report any problems.


A collection of notes:

1. Neo4j 2.2+ has done a lot to ensure transaction consistency between indices and global graph operations. 
- All the "isDeleted()" checks in Neo4jGraph are gone.
2. The entire TinkerPop3 test suite passes except for a TransactionTest test around graph.close() semantics.
- Michael Hunger is looking into the problem. Right now it's OPT_OUT as it's not a critical test.
3. Neo4jGraph no longer supports legacy indices -- only schema indices.
- This greatly simplified the code and ensured no @Deprecated references.
- Creating indices in Neo4jGraph is done via Cypher: graph.cypher("CREATE INDEX on :person(name)").
4. Neo4jGraph supports Neo4j multi-labels both at the Neo4jVertex API level and at the Neo4jGraphStep index lookup level.
- We now have LabelP.of() which allows for g.V.has(label,of('person')).
5. Neo4jGraph supports multi/meta-properties, though this is considered an experimental feature until it is more fully tested at scale and under high concurrency.
- The fear is that multi/meta-properties diverge from the native Neo4j representation and, until we are comfortable with the embedding, it's not safe for production use.
6. And of course, you can still go Cypher->Gremlin which is really cool.

Thanks again to NeoTechnology for working with TinkerPop and releasing an Apache2 licensed version of their API so the community can enjoy Neo4j-Gremlin.

Take care,
Marko.

Michael Pollmeier

Jun 9, 2015, 8:20:23 PM
to d...@tinkerpop.incubator.apache.org, gremli...@googlegroups.com
I have tested a few basic things with gremlin-scala and Neo4jGraph, and
it works fine. Thanks everybody!

I have tested it with locally built snapshots as neo4j-gremlin isn't
part of M9-incubating. Will there be a separate release of tinkerpop3
that includes neo4j-gremlin, so that I can share this in my examples
project?

Marko Rodriguez

Jun 9, 2015, 9:16:05 PM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
Hi Michael,

> I have tested a few basic things with gremlin-scala and Neo4jGraph, and it works fine. Thanks everybody!

Great.

> I have tested it with locally built snapshots as neo4j-gremlin isn't part of M9-incubating.

Yep.

> Will there be a separate release of tinkerpop3 that includes neo4j-gremlin, so that I can share this in my examples project?

The next official release of TinkerPop3 will be 3.0.0-incubating (i.e. GA).

We will go through 1 or 2 RCs that are basically tags in Apache Git.

HTH,
Marko.

http://markorodriguez.com

Michael Pollmeier

Jun 10, 2015, 8:02:03 PM
to d...@tinkerpop.incubator.apache.org, gremli...@googlegroups.com
What's the best way to insert many elements at the same time? Is there
some sort of bulk mode that doesn't check for constraints? I didn't find
anything fitting in the code/documentation in TP3.

Creating 25k vertices takes around 40s, and about 50s if I wrap it in a
transaction - that's a bit slow.

Stephen Mallette

Jun 11, 2015, 8:37:33 AM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
General support for bulk loading is still under development and won't be available for GA. Here are some related issues:


As for just straight loading, I'm not sure we've evaluated performance of neo4j-gremlin at this point.  Not sure what improvements are to be had, if any.




Marko Rodriguez

Jun 11, 2015, 9:20:04 AM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
Michael,

If there are any areas that can be sped up in the Neo4jGraph codebase, please identify them. There hasn't been much manual testing of Neo4jGraph, so any help/tickets/PRs you could provide would be greatly appreciated.

Also, could you share your data loading code via a gist?

Thanks,
Marko.

Michael Pollmeier

Jun 11, 2015, 6:08:01 PM
to d...@tinkerpop.incubator.apache.org, gremli...@googlegroups.com
Here you go - all pretty straightforward:
https://gist.github.com/mpollmeier/108ab8998e3b0321f020

Without a bulk api for neo4j it takes nearly 70s to create 30k vertices.

There is currently no publicly available build artifact of tinkerpop3
that contains neo4j-gremlin, so this all depends on custom local builds.

The main thing is the obvious stuff I guess: disable indexes and
integrity constraints. Should be the same as the good old
Neo4jBatchGraph, no?
https://github.com/tinkerpop/blueprints/tree/master/blueprints-neo4j-graph/src/main/java/com/tinkerpop/blueprints/impls/neo4j/batch
I guess for license reasons we can't just copy that but have to
reimplement it?

Cheers
Michael

Marko Rodriguez

Jun 11, 2015, 6:54:19 PM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
Hey Michael,

I'll let Stephen confirm the path forward, but I bet you could use a Neo4jTrait implementation to do the TinkerPop2-style Neo4jBatchGraph. Check out Neo4jTrait in TinkerPop3.

Marko.

Michael Pollmeier

Jun 11, 2015, 7:22:30 PM
to d...@tinkerpop.incubator.apache.org, gremli...@googlegroups.com
Yup that's what I meant.

Stephen Mallette

Jun 12, 2015, 6:06:44 AM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
Not sure how ScalaGraph works, but in reference to your code if this:

sg.addVertex("some label").setProperty(s"property ${math.random}", math.random)

is not doing this

sg.addVertex(label, "some label", "property ${math.random}", math.random)

you're getting some extra transactional checks that aren't necessary if you know all the properties up-front at the time the vertex is added. changing that probably isn't going to help your load times all that much, but i thought i'd mention it.
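
To make the point about up-front properties concrete, here is a toy sketch in plain Java. It is not the TinkerPop or Neo4j API: ToyGraph, checkTx, addVertex and setProperty are hypothetical stand-ins, and the assumption that every mutating call pays exactly one transaction check is a simplification made only for illustration. It just counts how many such checks each loading style would trigger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a transactional graph; NOT the TinkerPop or Neo4j API.
final class ToyGraph {
    int txChecks = 0; // number of simulated transaction checks performed
    final List<Map<String, Object>> vertices = new ArrayList<>();

    // Assumption: every mutating call verifies the transaction once.
    private void checkTx() {
        txChecks++;
    }

    // Creates a vertex and applies all key/value pairs in the same call.
    Map<String, Object> addVertex(String label, Object... keyValues) {
        checkTx();
        Map<String, Object> vertex = new LinkedHashMap<>();
        vertex.put("label", label);
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            vertex.put((String) keyValues[i], keyValues[i + 1]);
        }
        vertices.add(vertex);
        return vertex;
    }

    // Sets one property on an existing vertex; a separate mutation.
    void setProperty(Map<String, Object> vertex, String key, Object value) {
        checkTx();
        vertex.put(key, value);
    }
}

final class UpFrontProperties {
    // Style 1: add the vertex, then set the property in a second call.
    static int checksSeparate(int n) {
        ToyGraph g = new ToyGraph();
        for (int i = 0; i < n; i++) {
            g.setProperty(g.addVertex("some label"), "property", i);
        }
        return g.txChecks;
    }

    // Style 2: pass the property along when the vertex is created.
    static int checksUpFront(int n) {
        ToyGraph g = new ToyGraph();
        for (int i = 0; i < n; i++) {
            g.addVertex("some label", "property", i);
        }
        return g.txChecks;
    }

    public static void main(String[] args) {
        System.out.println(checksSeparate(25_000)); // prints 50000
        System.out.println(checksUpFront(25_000));  // prints 25000
    }
}
```

In this toy model the up-front style halves the number of checks for one property per vertex; how much that matters against a real Neo4jGraph depends on what each check actually costs there.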

After a quick review, I think a "batch trait" might be possible. Recall that Neo4jBatchGraph used the Neo4j BatchInserter and not a Neo4jGraphAPI instance - again unsure of how that fits here at the moment. I'm guessing that BatchInserter is not exposed via Apache-licensed neo4j interfaces at this time, so that would be another problem.



Michael Pollmeier

Jun 12, 2015, 8:25:40 PM
to d...@tinkerpop.incubator.apache.org, gremli...@googlegroups.com
That's a good tip! I just tried with a different example in neo4j and
the graph creation time went from 50s using
`sg.addVertex("some label").setProperty(s"property ${math.random}", math.random)`
to 40s using
`sg.addVertex("some label", Map(s"property ${math.random}" -> math.random))`.
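
For reference, the arithmetic behind the 50s and 40s figures above can be sketched as follows. LoadTimes and its two helper methods are names made up for this illustration; only the two timings come from the measurement reported here.

```java
final class LoadTimes {
    // Fraction of wall-clock time saved when going from `before` to `after` seconds.
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    // How many times faster the `after` run is compared to the `before` run.
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        System.out.printf("reduction: %.0f%%%n", 100 * reduction(50, 40)); // reduction: 20%
        System.out.printf("speedup: %.2fx%n", speedup(50, 40));            // speedup: 1.25x
    }
}
```

So passing the properties at vertex creation cut the load time by about a fifth in this one measurement, which is a single data point rather than a benchmark.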

Stephen Mallette

Jun 12, 2015, 9:45:54 PM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
wow - i didn't expect that to make such a difference. i may need to look at the code again to see why the difference is so big, considering how easily i dismissed it as having little impact on performance.
