Is it possible to choose on which server the vertices have to be placed?

Alexandr Porunov

Oct 31, 2017, 9:08:21 AM
to JanusGraph users
Hello,

I need to choose, on the application side, where vertices are stored.

For example:
I have 3 servers: server1, server2, server3
I need to store vertices: user1, user2, user3, user4, user5

I store them randomly:
server1: user1, user2
server2: user3, user4
server3: user5

Then I add edges:
user1 and user3 are friends
user1 and user5 are friends
user3 and user5 are friends
user2 and user4 are friends

After some time (maybe once a week) I want to read the graph and optimize graph traversal by minimizing edge cuts.

I want to move user vertices to be stored like this:
server1: user2, user4
server2: user1, user3, user5
server3: empty
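
To make the goal concrete, here is a minimal Python sketch that counts the edge cuts for the two placements above. It is plain Python, not JanusGraph code, and `count_edge_cuts` is a hypothetical helper name used only for illustration.

```python
# Hypothetical helper (not a JanusGraph API): count the edges whose two
# endpoints are stored on different servers.
def count_edge_cuts(placement, edges):
    # Invert the server -> vertices mapping into vertex -> server.
    server_of = {v: s for s, vs in placement.items() for v in vs}
    return sum(1 for a, b in edges if server_of[a] != server_of[b])


# Friendship edges from the example.
edges = [
    ("user1", "user3"),
    ("user1", "user5"),
    ("user3", "user5"),
    ("user2", "user4"),
]

# Initial (random) placement.
random_placement = {
    "server1": ["user1", "user2"],
    "server2": ["user3", "user4"],
    "server3": ["user5"],
}

# Placement after the desired optimization.
optimized_placement = {
    "server1": ["user2", "user4"],
    "server2": ["user1", "user3", "user5"],
    "server3": [],
}

print(count_edge_cuts(random_placement, edges))     # 4
print(count_edge_cuts(optimized_placement, edges))  # 0
```

With the random placement every friendship crosses servers; with the optimized placement none does.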

Is this possible to achieve with JanusGraph?

Will traversals still work if I place the vertices on my own?

Best regards,
Alexandr

Misha Brukman

Oct 31, 2017, 7:41:42 PM
to Alexandr Porunov, JanusGraph users
What are "server1", "server2" and "server3" in your example?

JanusGraph does not store data itself; it uses a storage backend, either an embedded, in-process one such as BerkeleyDB or a distributed one such as HBase, Cassandra, or Bigtable.

As such, JanusGraph does not manage the storage or location of data; it defers that to its storage backend. I don't believe JanusGraph has any influence over where the data is stored, since from JanusGraph's perspective that is behind an abstraction layer.

If your concern is about performance, the general recommendation is to benchmark a storage engine or a few with a representative workload and see if the performance matches your requirements. If not, you may need to either tune that storage backend appropriately or choose a different storage backend.

David Pitera

Oct 31, 2017, 8:20:41 PM
to Misha Brukman, Alexandr Porunov, JanusGraph users
I am not sure _why_ you need different "servers" in your example; however, it seems that what you might want are different _graphs_. You can open different graphs using the new ConfiguredGraphFactory: http://docs.janusgraph.org/latest/configuredgraphfactory.html

As Misha said, you use backends to store your data. A graph just points to a backend, which is basically defined by a location/port and directory/keyspace/table. Therefore, if you want three separate representations to hold data, you might want to create a Template Configuration pointing to a specific Cassandra location for example, and then use the ConfiguredGraphFactory#create(graphName) to create 3 separate graphs, and this will make each graph store the data in its own keyspace in Cassandra. The documentation will even lead you through other examples using the ConfigurationManagementGraph and ConfiguredGraphFactory APIs.

If you do not want to use the ConfiguredGraphFactory APIs, but storing data in separate representations is your goal, then you can use the JanusGraphFactory#open(Configuration) or JanusGraphFactory#open(File location) method to open a new graph by supplying an entire configuration for your graph, where you would configure each to point to a separate backend or table/keyspace/directory.

Robert Dale

Oct 31, 2017, 9:04:09 PM
to David Pitera, Misha Brukman, Alexandr Porunov, JanusGraph users
I believe Alexandr is referring to Graph Partitioning strategies - http://docs.janusgraph.org/latest/graph-partitioning.html

Robert Dale

master...@gmail.com

Nov 1, 2017, 8:22:08 AM
to JanusGraph users
Sorry that I didn't mention graph partitioning. As Robert Dale noticed, that is exactly what I am referring to.

As we can see from the documentation, it is recommended to use twice as many partitions as there are servers (or as many servers as you expect to have in the foreseeable future). This means that each server will hold two or more partitions. In this case we can indirectly say on which server the vertices are placed.

We can change the partitioning algorithm, but the placement will still be predefined.
The thing is that in most cases we can't predict where to place vertices so that edge cuts stay minimal in the future. I showed an example where the users are not friends at the beginning and become friends after some time, and now there are a lot of edge cuts. A query like "get a user's friends" will have to ask many nodes (in a large graph) to get the list of friends. This is a problem that Facebook solved by using the Kernighan–Lin algorithm.
They store vertices randomly and then use Apache Giraph to process the graph (incrementally) and move vertices from server to server to minimize edge cuts.
Here is the link:

So, my question is:
Can we decide where a vertex is placed at the application level? Can we move vertices from server to server (or from partition to partition)? Or does the partitioning have to be predefined?
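
As a toy illustration of the kind of application-level vertex movement asked about here, the following Python sketch greedily moves each vertex to the server that holds most of its neighbors, subject to a per-server capacity. This is an assumption-laden simplification: it is not the Kernighan–Lin algorithm proper, not Facebook's Giraph job, and not a JanusGraph API. The function name `refine_placement` and the `capacity` parameter are made up for the example.

```python
from collections import Counter, defaultdict


def refine_placement(server_of, edges, servers, capacity, max_passes=10):
    """Greedy local refinement (hypothetical sketch): move a vertex to the
    server holding more of its neighbors than its current server does,
    as long as the target server still has room."""
    server_of = dict(server_of)  # work on a copy, leave the input untouched
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    load = Counter(server_of.values())  # number of vertices per server

    for _ in range(max_passes):
        moved = False
        for v in sorted(server_of):
            # How many neighbors of v live on each server right now.
            counts = Counter(server_of[n] for n in neighbors[v])
            current = server_of[v]
            # Candidate targets: other servers with free capacity.
            candidates = [s for s in servers
                          if s != current and load[s] < capacity]
            if not candidates:
                continue
            best = max(candidates, key=lambda s: counts[s])
            # Move only if it strictly reduces the number of cut edges.
            if counts[best] > counts[current]:
                load[current] -= 1
                load[best] += 1
                server_of[v] = best
                moved = True
        if not moved:
            break
    return server_of


edges = [("user1", "user3"), ("user1", "user5"),
         ("user3", "user5"), ("user2", "user4")]
initial = {"user1": "server1", "user2": "server1",
           "user3": "server2", "user4": "server2", "user5": "server3"}

result = refine_placement(initial, edges,
                          ["server1", "server2", "server3"], capacity=3)
print(result)
```

On the five-user example from my first message, with a capacity of 3 vertices per server, this ends with user2 and user4 on server1 and user1, user3, and user5 on server2, i.e. no edge cuts. Whether such a computed placement can actually be applied to the stored vertices is exactly the open question.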

Best regards,
Alexandr
