Add support to AWS Keyspaces (or DynamoDB) as storage backend #2154

Nicolò Marchi

Jul 2, 2020, 9:42:02 AM

to JanusGraph developers

Hi all,

I'm reporting here a feature request that I added as an issue on the JanusGraph GitHub project.

Describe the feature:
Add the ability to use AWS Keyspaces as a storage backend, since it is a SaaS offering compatible with Cassandra. Alternatively, add the ability to use AWS DynamoDB as a storage backend via the old connector made by AWS Labs.
These could be good features, since developers could use managed services and avoid the difficulty of managing an "important storage like Cassandra".

Describe a specific use case for the feature:
In a completely SaaS architecture it is common to have databases and other technologies provided as a service. Being able to connect JanusGraph to an already existing Keyspaces cluster or DynamoDB endpoint (one used for other purposes) makes it possible to keep all data in a single service.

Is this feasible? Is it something that someone else could use or need?

What do you think about it?

Cheers,

Nicolò

Mick Delaney

Aug 24, 2020, 3:36:00 PM

to JanusGraph developers

Yes, seems compelling given the pricing of Keyspaces

Pat Rice

Nov 29, 2020, 4:45:54 PM

to JanusGraph developers

FYI, I've been poking around at this, and it won't work right now due to limitations in AWS's Cassandra implementation. Upsetting indeed, as the MCS pricing (especially for smaller dev environments) is very attractive compared with using something like EMR to host HBase.

To prototype what it would look like, I did the following:

I enabled SSL. This is obvious and well documented, but it is required for MCS: AWS only exposes SSL traffic.
AWS MCS uses a proprietary partitioner. JanusGraph expects one of a set of hard-coded partitioners, which are listed in the CQLStoreManager class. I added AWS's partitioner to this block and tested with several different ordering configurations. Adding the partitioner there gets us past that issue, but exposes a new one:
When you resolve the DNS name for AWS's MCS, you connect to a single Cassandra node, which then publishes a set of peers. By default, DataStax's driver then attempts to connect to those peer IPs, which AWS doesn't expose, so the driver cannot reach them. To resolve this, you add an AddressTranslator to the cluster builder (also in CQLStoreManager) that translates all those private peer IP addresses into the reported AWS endpoint. Essentially you have one main contact point (the one you pass in via the hostname attribute in the properties file), and you overwrite all the peers with that contact point. This allows you to successfully connect to MCS and start to initialize the cluster.
Finally, when JanusGraph starts initializing, it runs queries that require the TOKEN keyword from Cassandra; these are needed any time you do a large table-scan-style query. TOKEN is not currently supported in AWS MCS (https://forums.aws.amazon.com/thread.jspa?messageID=943452 among other help articles from AWS), so the query throws a "Token not yet supported" exception. This comes from initializing the CQLKeyColumnValueStore class.
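
For readers who want to see the peer-rewriting step concretely, here is a minimal sketch of the translation logic, written against the JDK only so it compiles without the driver. It is not the code from the prototype above. The class name SingleEndpointTranslator is hypothetical. In the DataStax 3.x driver the same translate() body would sit in a class implementing com.datastax.driver.core.policies.AddressTranslator (which also declares init and close), and whether that matches the driver version in your JanusGraph build is an assumption to check.

```java
import java.net.InetSocketAddress;

/**
 * Hypothetical sketch: collapse every advertised peer onto one reachable endpoint.
 * Uses only the JDK; with the DataStax 3.x driver this logic would live in an
 * AddressTranslator implementation instead (assumption, not taken from the thread).
 */
final class SingleEndpointTranslator {
    // The single contact point, i.e. the host configured in the properties file.
    private final InetSocketAddress endpoint;

    SingleEndpointTranslator(InetSocketAddress endpoint) {
        if (endpoint == null) {
            throw new IllegalArgumentException("endpoint must not be null");
        }
        this.endpoint = endpoint;
    }

    /**
     * Ignores the address the node advertised for a peer and returns the
     * configured endpoint, so the driver never dials an unexposed private IP.
     */
    InetSocketAddress translate(InetSocketAddress advertisedPeer) {
        return endpoint;
    }
}
```

With the 3.x driver, an implementation like this would be registered through Cluster.builder().withAddressTranslator(...). Rewriting every peer unconditionally is the simplest choice and mirrors the description above; it also means the driver sees all peers as the same host, so any load-balancing across nodes is left entirely to the service endpoint.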

If anyone has ideas of where to go from here, I'm certainly open to doing some additional testing, but this seems like a limitation that may be unworkable, based on my admittedly limited understanding of how JanusGraph uses Cassandra behind the scenes.
