JanusGraph as Primary Database/Source of Truth


Raghavendar T S

May 8, 2020, 8:09:56 AM
to JanusGraph users
Hi. We are a charging station company based in India, and we are building an IoT platform to manage our charging stations. We plan to use JanusGraph (Cassandra for storage, Elasticsearch as the index store) as our primary database and source of truth for real-world entities and relationships. Clients, including the web and mobile applications, call our APIs, and each request is first persisted to the graph and to a message queue. Stream processors running in the background build denormalized views in Elasticsearch: they run multi-level traversals on the graph and persist the results in ES. Is JanusGraph recommended as a primary database? We are concerned about production issues, should we face any, and about the support we can get from the community. I am pretty sure that a lot of companies are using JanusGraph in production, and I just want to gain some confidence. Are any companies using JanusGraph for real-time, client-facing applications rather than only for analytics? Your valuable inputs would help us make better decisions.
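
For readers following the thread, the write path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the graph, the message queue, and the Elasticsearch index are replaced by in-memory stand-ins, and every name here (`handle_api_write`, `reachable`, `process_events`, the vertex ids) is hypothetical and not taken from the poster's system. It also assumes the graph write happens before the event is published.

```python
import queue

# In-memory stand-ins (all hypothetical): an adjacency map plays the graph,
# a Queue plays the message queue, and a dict plays the search index.
graph = {}            # vertex id -> set of neighbour ids
events = queue.Queue()
search_index = {}     # document id -> denormalized document


def handle_api_write(src, dst):
    """Persist the relationship in the graph first, then publish an event."""
    graph.setdefault(src, set()).add(dst)
    graph.setdefault(dst, set())
    events.put(src)


def reachable(start, max_depth):
    """Multi-level traversal: ids reachable from start within max_depth hops."""
    seen, frontier = set(), {start}
    for _ in range(max_depth):
        frontier = {n for v in frontier for n in graph.get(v, ())} - seen - {start}
        seen |= frontier
    return seen


def process_events(max_depth=2):
    """Stream-processor step: rebuild one denormalized view per queued event."""
    while not events.empty():
        vid = events.get()
        search_index[vid] = {"id": vid, "related": sorted(reachable(vid, max_depth))}


if __name__ == "__main__":
    handle_api_write("site-1", "station-1")
    handle_api_write("station-1", "connector-1")
    process_events()
    print(search_index["site-1"])
```

In this sketch the graph stays the source of truth and the search index can always be rebuilt from it by replaying traversals.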

Thanks & Regards
Raghavendar T S

Oleksandr Porunov

May 8, 2020, 7:44:40 PM
to JanusGraph users
Hi,

In short, it depends on your data. Some data is really well suited to JanusGraph and some isn't.
JanusGraph is a graph database layer on top of other databases, so you should understand your data to know which data store to use (I suggest reasoning with the CAP theorem).
Cassandra's consistency is configurable, but there is no transaction isolation.
Elasticsearch consistency depends on the index refresh configuration and on indexing time.
If your entities are added or updated rarely, you may configure Cassandra for write consistency ALL and use the Elasticsearch refresh API to make sure all your data is consistent; then you "may" use read consistency level ONE. Write QUORUM combined with read QUORUM also guarantees consistent responses, but you should check the tradeoffs for your own project.
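
The reasoning behind those combinations is the usual replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. A minimal sketch of that arithmetic, assuming a replication factor of 3 and modelling only the levels ONE, QUORUM and ALL (the function names are illustrative, not part of any Cassandra or JanusGraph API):

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge an operation at the given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return w + r > rf


if __name__ == "__main__":
    rf = 3  # assumed replication factor
    for w, r in [("ALL", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
        print(w, r, reads_see_latest_write(w, r, rf))
```

This covers only the Cassandra side. It says nothing about transaction isolation or about when Elasticsearch makes newly indexed documents searchable.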
JanusGraph is very well suited to data with many relations. I guess that, since your project is IoT, your data will be highly connected and will therefore fit JanusGraph well.
Also, I find that ScyllaDB has better overall performance than Cassandra, so it may be a good choice for real-time data, but you should check it against your own scenarios, as ScyllaDB uses more CPU time (to reduce latency) than Cassandra.
That said, JanusGraph is well suited to real-time workloads as well as analytics if it is well configured (storage, index storage, and JanusGraph itself).
As a small suggestion: if you are using a JVM-based language and your real-time traversals are not very complex, I would recommend using JanusGraph in embedded mode (https://docs.janusgraph.org/basics/configuration/#janusgraph-embedded), as it lets you benefit from the DataStax Cassandra driver's query routing optimizations and eliminates an additional hop. That said, if your queries are complex enough, it may be better to use JanusGraph servers located closer to the storage servers. Again, this should be checked for your specific scenarios.
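
As a rough illustration of the configuration such an embedded setup reads, the sketch below renders a JanusGraph properties file for the CQL storage backend with an Elasticsearch index backend. The hostnames and the helper name `render_janusgraph_properties` are assumptions made up for this example; the property keys are standard JanusGraph configuration options, but check them against the configuration reference for your version. A JVM application would typically pass the resulting file to `JanusGraphFactory.open`.

```python
def render_janusgraph_properties(cassandra_hosts, es_hosts,
                                 write_cl="QUORUM", read_cl="QUORUM"):
    """Return the text of a JanusGraph properties file (hypothetical helper)."""
    settings = {
        "storage.backend": "cql",
        "storage.hostname": ",".join(cassandra_hosts),
        "storage.cql.write-consistency-level": write_cl,
        "storage.cql.read-consistency-level": read_cl,
        "index.search.backend": "elasticsearch",
        "index.search.hostname": ",".join(es_hosts),
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    # Hostnames below are placeholders, not real endpoints.
    print(render_janusgraph_properties(
        ["cassandra-1.internal", "cassandra-2.internal"],
        ["es-1.internal"],
        write_cl="ALL",
        read_cl="ONE",
    ))
```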
The above doesn't answer your questions precisely, but maybe it helps somehow.

Raghavendar T S

May 9, 2020, 3:35:47 AM
to JanusGraph users
Hi Oleksandr

It is a very detailed and helpful explanation. I have pretty good experience with DataStax Graph from my earlier organisation, and we are sure that our use cases will fit JanusGraph. Since we are a startup, we are not ready to use DataStax Graph because of the licensing cost. Can you also give some information on how production issues are generally resolved, in case we face any? Backup/restore of the Cassandra database is one option. We do not know which issues we are going to face; we are only concerned about production support.

Thanks & Regards
Raghavendar T S

Oleksandr Porunov

May 9, 2020, 1:42:56 PM
to JanusGraph users
If your startup doesn't want to spend money on production support (i.e. licensing), the only support you can get is either to hire an expert in the specific field (which pays off in the long term but not in the short term) or to rely on community support (which is free, but the level of support is lower than with a paid license).
Both Cassandra and ScyllaDB have quite good community support. You can subscribe to the Cassandra mailing lists here:
https://cassandra.apache.org/community/
Or use the ScyllaDB Google group here:
https://groups.google.com/forum/#!forum/scylladb-users

Moreover, many companies have special pricing plans for startups which might be quite good, so I would suggest contacting them directly and asking whether they have special offers or discounts for startups.

My advice is not to worry about production support at such an early stage of a startup. If your startup becomes profitable, you will most likely have the money to buy production licenses or hire a specialist in the specific field. If your startup doesn't take off, then it doesn't matter whether any support is available, because you won't need it.
Of course, choosing the right data store is critical for the business, but it is very rare for a startup to fail because of the data store or the technologies it chose.

Getting back to your original question: it depends on your size and the specific use case. Backup and restore is good, of course, but restoring a large data set takes time, which is sometimes critical. I would suggest configuring appropriate replication, running compaction regularly, using monitoring, and trying to prevent scenarios in which you need to restore your data from backup.
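
To get a feel for why restore time matters, a back-of-the-envelope estimate is simply data size divided by sustained restore throughput. The figures below (2 TB per node, 200 MB/s) are purely hypothetical inputs, not measurements from any real cluster, and the function name is illustrative:

```python
def estimated_restore_hours(data_tb_per_node: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream one node's data back at a sustained rate."""
    megabytes = data_tb_per_node * 1_000_000  # decimal terabytes to megabytes
    return megabytes / throughput_mb_per_s / 3600


if __name__ == "__main__":
    # Hypothetical inputs: 2 TB per node restored at 200 MB/s,
    # which works out to roughly 2.8 hours.
    print(round(estimated_restore_hours(2, 200), 2))
```

The estimate ignores everything after the copy itself, such as index rebuilds or repairs, so a real restore is likely to take longer.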

I wish luck to your startup!

Raghavendar T S

May 10, 2020, 2:19:18 AM
to janusgra...@googlegroups.com
Hi Oleksandr

Thank you very much for your valuable information. Just for your information, we contacted DataStax: there are no startup programs, and we are supposed to purchase a license if we are in production.

Thanks & Regards
Raghavendar T S
--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/9a642f9b-ba51-4742-a982-d3e5283c9e3e%40googlegroups.com.


--
Raghavendar T S

Samik Raychaudhuri

May 15, 2020, 8:08:30 AM
to janusgra...@googlegroups.com
Based on our experience, and on the talks I have heard at various meetups, I wouldn't recommend using JanusGraph + Cassandra + ES as the back-end for OLTP-type queries. Consider using a different database (even plain Cassandra), or a caching layer in between, for real-time purposes. The best performance you can get is as Oleksandr suggests: use JanusGraph in embedded mode, but I suspect that will not satisfy your requirements completely.

Best.
-Samik

--
Samik Raychaudhuri, Ph.D.
http://in.linkedin.com/in/samikr/

Oleksandr Porunov

May 15, 2020, 12:03:09 PM
to JanusGraph users
Hi Samik,

Would you mind sharing your experience and the talks you heard? What led you to that conclusion?

Here are some good experience reports from teams using JanusGraph + ScyllaDB + Elasticsearch in production:

Also, a tutorial which might be helpful:

It would be really helpful if you could share your experience / research as well, even if that experience is negative.

Best regards,
Oleksandr