[DISCUSS] Splitting janusgraph-cassandra

312 views
Skip to first unread message

Samant Maharaj

unread,
Mar 26, 2017, 10:11:48 PM3/26/17
to JanusGraph developer list
Hi,

As part of implementing the janusgraph-cql backend, we initially looked into adding the new CQL backend into the existing janusgraph-cassandra project.

We ultimately ended up not taking this approach because it would mean that any code depending on the CQL backend would still end up pulling cassandra-all, astyanax and all their dependencies which is quite undesirable.

We'd like to know what everyone thinks of the idea of splitting janusgraph-cassandra into core and implementation specific modules. If we implemented this, CQL could be moved in as another backend without any unrelated dependencies.

I've implemented this change in a branch in our repository here: https://github.com/orionhealth/janusgraph/tree/feature/cassandra-refactor

Some notes:
  • This covers all the existing backends but not the port of the CQL backend to the new structure.
  • We've simplified the testing setup code around bootstrapping Cassandra which enables running the tests easily from your IDE. It also removes the current dependency on inter-project structure where the other projects need to know how to use the resources in janusgraph-cassandra to bootstrap Cassandra.
  • We've introduced a cassandra-test project that is included in test scope similar to janusgraph-test that contains the shared testing code.
Samant Maharaj & Paul Kendall

Jason Plurad

unread,
Apr 4, 2017, 10:26:05 AM4/4/17
to JanusGraph developer list
Bump. I think the breakdown is fine. It looks similar to what's going on in the janusgraph-hbase directory.

I'd suggest that the root remains `janusgraph-cassandra` rather than `cassandra` as your repo has it.

Samant Maharaj

unread,
Apr 4, 2017, 4:03:20 PM4/4/17
to JanusGraph developer list
The reason for the simplified naming was to avoid nested modules having quite long names.

For example, the nested astyanax module would be called 'janusgraph-cassandra-astyanax'.

The other issue is that following that convention, the janusgraph-cassandra artifactId would now actually refer to the parent POM of all the cassandra modules which would conflict with its earlier incarnation. The other way to resolve this would be to call it 'janugraph-cassandra-parent' similar to hbase and hadoop. However this is quite an unconventional approach.

Personally I'm in favour of a less verbose naming scheme but it's not something I'm too concerned about.

Alexander Patrikalakis

unread,
Apr 4, 2017, 9:04:56 PM4/4/17
to JanusGraph developer list
Yeah, we can't reuse the janusgraph-cassandra artifact id.

sjudeng

unread,
Apr 5, 2017, 8:55:22 PM4/5/17
to JanusGraph developer list
Samant & Paul,

Thanks for your work on this. I like what you've done to clean up the project configs in the other modules in favor of the cassandra-test module.

I do agree that for consistency it would be best to include the janusgraph- prefix in the artifact id, at least for the base cassandra module. Whether it can be janusgraph-cassandra or needs to be janusgraph-cassandra-parent, I don't know. Since presumably you'll retarget to 0.2.0-SNAPSHOT, as long as there are no 0.2.0 snapshot deployments before your PR gets merged, wouldn't it be fine to reuse janusgraph-cassandra in this new context?

Also what about making the groupId org.janusgraph.cassandra for the sub-modules (core, embedded, etc.)? I'm just thinking of the eventual repository layout (see https://oss.sonatype.org/content/repositories/snapshots/org/janusgraph/) and wondering if it would be better organization to group the sub-modules under the additional cassandra group element.

Samant Maharaj

unread,
Apr 6, 2017, 4:47:40 PM4/6/17
to JanusGraph developer list
I really don't think it's advisable to reuse the janusgraph-cassandra artifactId. It's extremely unexpected for the meaning of an artifactId to change after a version bump.

Essentially janusgraph-cassandra would cease to mean the jar containing the JanusGraph Cassandra backend implementation and would become the parent POM for the Cassandra modules. This would have strange effects on anyone depending on that artifactId after the version change.

In regards to using different groupIds for sub-modules, that's not a bad solution but given the relatively small number of modules overall might be a bit unnecessary.

I favour continuing to use the org.janusgraph groupId for all modules and dropping the janusgraph prefix both in the directories and in the artifactIds as it seems a bit redundant since all the modules are scoped under the org.janusgraph groupId.

Another possibility to consider is to introduce an intermediate level for backend and index implementations:

/
├── docs/
├── dist/
├── janusgraph/
│   ├── core/
│   └── test/
├── backend/ │ ├── berkeleyje/ │ ├── cassandra/ │ | ├── core/ │ | ├── test/ │ | ├── astyanax/ │ | └── cql/ │ └── hbase/ │ ├── core/ │ ├── test/ │ ├── hbase-098/ │ └── hbase-10/ ├── index/ │ ├── elasticsearch/ │ ├── lucene/ │ └── solr/ └── hadoop/

This would allow for more easily creating shared test support projects such as for all index implementations as there would be a natural place to place such projects. Additionally the parent POMs at each group would also be a great place to manage group level build configuration. This would also simplify the root POM and make maintenance easier as configuration specific to each group could be moved into the group level poms.

In terms of artifact naming, one approach could be to use the project path to create the artifactId, so the Cassandra Astyanax module would be 'backend-cassandra-astyanax'. This would be consistent with the directory structure and would communicate the purpose of the artifact as well. Such an approach would also naturally prevent any name collisions for future modules.

Sorry for the long post but I thought it would be good to put up a more detailed straw man so we can continue the discussion.

Regards,
Samant

sjudeng

unread,
Apr 10, 2017, 9:00:36 PM4/10/17
to JanusGraph developer list
The broader reorganization sounds interesting but I think it would probably need a separate discussion. How about keeping the janusgraph-cassandra prefix (janusgraph-cassandra-parent, etc.) for now just for consistency with the rest of the project and then pursing the broader project reorg (dropping janusgraph prefix and/or intermediate levels, etc.) later?

Samant Maharaj

unread,
Apr 10, 2017, 11:54:55 PM4/10/17
to JanusGraph developer list
I think that's probably a good way to go forward. I'll change to janusgraph-cassandra-parent and if there are no objections I can create a pull request for the changes.

sjudeng

unread,
Apr 12, 2017, 3:14:17 PM4/12/17
to JanusGraph developer list
You might start a new VOTE thread with your proposal since it is pretty significant. I'd specifically list out the final module removals/additions (e.g. removing org.janusgraph:janusgraph-cassandra, adding janusgraph-cassandra-parent, cassandra-thrift, cassandra-embedded, etc.) that are in the PR in case anyone hasn't looked at the repo and/or followed all the discussion here.

Ted Wilmes

unread,
Apr 17, 2017, 9:40:25 AM4/17/17
to JanusGraph developer list
This discussion got me thinking, with the addition of the CQL backend, should we be deprecating the current Cassandra thrift-based backends at some point?  We wouldn't want to do that immediately because those have the benefit of being around for a long time and going through production use & bug fixes, but it seems like that would be the natural progression in the not too distant future. This would require an entirely different discussion, but I bring it up here because I wonder how/if doing that would change this proposal.

Thanks,
Ted

sjudeng

unread,
Jun 17, 2017, 1:28:41 PM6/17/17
to JanusGraph developer list
Hi Samant,

Are you still able to move forward with getting your work on this submitted? If so do you want to call a vote on the proposed refactoring or do you want me to? One way or another I think the refactoring is definitely needed. It came up in a recent PR where configuration needed to be duplicated in janusgraph-cassandra and janusgraph-cql because of the current state of things.

Thanks!

sjudeng

unread,
Jun 23, 2017, 11:45:50 PM6/23/17
to JanusGraph developer list
All,

Samant has said he should be able to continue working on this effort. The current proposal is to refactor into the following structure.

├─ janusgraph-cassandra-parent/
│   ├── astyanax
│   ├── cql
│   ├── core
│   ├── embedded
│   ├── test
│   ├── thrift

+1 for me. I think it's a step in the right direction and I'd hate to see the work already done on this go to waste.

Jason, is this something that needs a separate vote thread or can the work/PR just move forward?

Samant, What do you think about Ted's question above?

sjudeng

unread,
Jun 29, 2017, 10:38:03 PM6/29/17
to JanusGraph developer list
Samant, It's been 5 days and no one has yelled too loud so I think it's reasonable to move forward with your PR on this if you're still up for it.

Ted, I don't think the reorganization here would cause any additional complexities with the eventual deprecation of Thrift. On the surface it seems to me it might make it simpler to carry out the deprecation/removal if the Thrift components are separated into a standalone module like here.

Ted Wilmes

unread,
Jun 30, 2017, 9:08:44 AM6/30/17
to sjudeng, JanusGraph developer list
Yeah, seems like it would be easier to tease apart the pieces we wanted to deprecate after this restructure so I think it makes good sense.

--Ted

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph developer list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-dev/01ft3Fcoht4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Samant Maharaj

unread,
Jul 3, 2017, 7:58:47 PM7/3/17
to JanusGraph developer list
I'm now working on bringing the branch up to date against master and will raise the PR as soon as it's ready.

There've been a few changes since I did that initial work so it might take a short while to get it all squared away.

Regards,
Samant 

Samant Maharaj

unread,
Jul 20, 2017, 5:18:07 PM7/20/17
to JanusGraph developer list
I’ve been putting a significant amount of time into the proposed cassandra split and have a branch here: https://github.com/orionhealth/janusgraph/tree/feature/cassandra-split with the core changes.

As part of this work I’ve improved the unit testing setup so that it’s not driven by Maven. This has had some flow on effects as the existing build system has the vast majority of maven config in the root POM. I have another branch based on the cassandra-split branch with the more significant maven changes here: https://github.com/orionhealth/janusgraph/tree/feature/maven-refactoring

In my opinion the existing maven setup does not follow best practices in a number of areas and my refactoring branch’s goal is to simplify the build system and to make it easier to work with and modify. I’d like to get your opinions on whether this is a worthwhile change and I’d be very happy to discuss the merits of my approach with you. In any case my overall goal is to complete the cassandra-split to the point that it can be merged and it is up for discussion as to whether that includes the additional maven refactoring.

It is probably a good idea for the maven refactoring discussion to be moved to a different thread or alternatively I could create a room on Gitter for discussion.

Thoughts?

--
You received this message because you are subscribed to the Google Groups "JanusGraph developer list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages