I would suggest that this is more of a data modeling issue than a technology one. I saw something similar at my day job a couple of weeks ago.
tl;dr: design your keys and properties to avoid conflicts, so that all of the instance graphs can live in one graph database.
Guessing at your data model, I suspect that each instance is some permutation of highly similar data, so that it looks something like this:
instance 1:
- node1: {label: A, id: 1, str: 'def'}
- node2: {label: B, id: 2, str: 'ghi'}
- edge1: {label: C, src_id: 1, dst_id: 2, mag: 0.50}
instance 2:
- node1: {label: A, id: 1, str: 'def'}
- node2: {label: B, id: 2, str: 'ghi'}
- edge1: {label: C, src_id: 1, dst_id: 2, mag: 0.25}
I'm sure that your data is a lot more interesting than my trivial example, but it should be sufficient to illustrate the point.
I'm expecting that the workflow is something like this:
1. extract a specific instance into memory - this will be most if not all of the instance
2. perform some set of analyses or computations that have specific expectations around the ids in order to compare results
3. save results somewhere for further analysis
I suggest that changing the data model to something like the following will support your workflow and let you store multiple instances in one database:
instance 1:
- vertex1: {label: A, id: '1:1', str: 'def', instance_id: 1, vertex_id: 1}
- vertex2: {label: B, id: '1:2', str: 'ghi', instance_id: 1, vertex_id: 2}
- edge1: {label: C, src_id: '1:1', dst_id: '1:2', mag: 0.50}
instance 2:
- vertex1: {label: A, id: '2:1', str: 'efd', instance_id: 2, vertex_id: 1}
- vertex2: {label: B, id: '2:2', str: 'igh', instance_id: 2, vertex_id: 2}
- edge1: {label: C, src_id: '2:1', dst_id: '2:2', mag: 0.25}
The trick here is to make the vertex's primary key, in this case the id property, a composite of the instance designation (instance_id) and the vertex designation (vertex_id). I've done that as a string concatenation, but there are other ways. For example, if each instance has fewer than 1,000 vertices and every vertex_id is below 1000, then you can multiply the instance_id by 1000 and then add the vertex_id value. For instance 1, the IDs would be vertex1: 1001 and vertex2: 1002.
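To make the two encodings concrete, here is a minimal Python sketch of both. The function names and the MAX_VERTICES constant are made up for illustration and are not part of JanusGraph or TinkerPop; the integer variant assumes every vertex_id falls in the range 0 to 999.

```python
from typing import Tuple

# Hypothetical upper bound on vertices per instance (an assumption).
MAX_VERTICES = 1000


def make_string_id(instance_id: int, vertex_id: int) -> str:
    """Build the composite key as 'instance_id:vertex_id'."""
    return f"{instance_id}:{vertex_id}"


def split_string_id(composite: str) -> Tuple[int, int]:
    """Recover (instance_id, vertex_id) from a string composite key."""
    instance_part, vertex_part = composite.split(":")
    return int(instance_part), int(vertex_part)


def make_int_id(instance_id: int, vertex_id: int) -> int:
    """Build the composite key as instance_id * MAX_VERTICES + vertex_id."""
    if not 0 <= vertex_id < MAX_VERTICES:
        raise ValueError("vertex_id does not fit in the reserved range")
    return instance_id * MAX_VERTICES + vertex_id


def split_int_id(composite: int) -> Tuple[int, int]:
    """Recover (instance_id, vertex_id) from an integer composite key."""
    return divmod(composite, MAX_VERTICES)
```

Either way the key can be decoded again, so keeping instance_id and vertex_id as separate properties is a convenience for querying rather than a strict necessity.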
I do have a slight preference for integers over strings for the id values for performance reasons. But there are good "easily, quickly deciphered by humans" reasons for using strings, especially when the id performance costs are overshadowed by the cost/time of the other computations.
Note that this changes your analytics computations slightly, since they must use the vertex_id values for doing comparisons between instances, and not the id value itself.
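As a rough illustration of comparing on the per-instance ids, the Python sketch below lines up the edges from the example above by their local (vertex_id) endpoints. It works on plain dictionaries already pulled into memory, assumes the 'instance_id:vertex_id' string form, and uses helper names I made up for the example.

```python
# Edges from the example above, as plain dicts already extracted into memory.
edges = [
    {"label": "C", "src_id": "1:1", "dst_id": "1:2", "mag": 0.50},
    {"label": "C", "src_id": "2:1", "dst_id": "2:2", "mag": 0.25},
]


def local_key(edge):
    """Identify an edge by label and vertex_id endpoints, ignoring the instance."""
    src_vertex = int(edge["src_id"].split(":")[1])
    dst_vertex = int(edge["dst_id"].split(":")[1])
    return (edge["label"], src_vertex, dst_vertex)


def mags_by_instance(edge_list):
    """Group 'mag' values as {local edge key: {instance_id: mag}}."""
    table = {}
    for edge in edge_list:
        instance_id = int(edge["src_id"].split(":")[0])
        table.setdefault(local_key(edge), {})[instance_id] = edge["mag"]
    return table


comparison = mags_by_instance(edges)
```

Here the edge from vertex 1 to vertex 2 ends up under a single key with one mag value per instance, which is the shape you would want for comparing results across instances.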
It should also simplify things in that you can get all of the vertices for a single instance with something like:
g.V().has('instance_id', 1)
I'm unsure of the JanusGraph indexing capabilities, but there should be some index in place on instance_id to avoid scanning all of the data in the graph every time a single instance is queried.
Best,
Josh