Schema and Data Migration

Stardust Nick

Jan 27, 2021, 12:35:38 PM

to Gremlin-users

Hello,

I would like to ask whether there is a recommended way to do schema and data migration.

I am using JanusGraph as the db.

Thanks in advance,
Nick.

Stephen Mallette

Feb 1, 2021, 6:59:54 AM

to gremli...@googlegroups.com

I'm afraid I'm not aware of any tools that help with this process. You typically just need to write Gremlin scripts to modify schemas or do data migrations. If you have an especially large graph, you will likely want to use Spark or some other approach to process your data in parallel.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/f325e4bb-6c7e-4434-91d5-e7c7bba88588n%40googlegroups.com.

Ryan Wisnesky

Feb 1, 2021, 2:52:22 PM

to gremli...@googlegroups.com

We have some prototype technology that allows users to express graph schemas, write down schema mappings, and then emit Gremlin code to restructure the graph from one schema to another. If you have a particular use case, we could try it out and see what happens. Please contact me offline.

Ryan

Joshua Shinavier

Feb 1, 2021, 7:19:52 PM

to gremli...@googlegroups.com

FWIW, we used JanusGraph in the early days of Uber's Risk & Safety knowledge graph, and there was a schema migration utility I wrote and briefly described in a Graph Day talk. Open-sourcing the code may be more trouble than it is worth, but you might consider following a similar approach:

1. Define schemas in a high-level language. We used the simple schema language described in the slide show, with a YAML-based serialization format. This was a precursor to the Dragon language I've mentioned a few times on this list.
2. Compute diffs between adjacent versions of a schema. A diff is an ordered list of basic changes with types like the ones here.
3. Translate diffs into calls to JanusGraph's schema API. Our tool mapped basic changes into instances of classes like AddPropertyKey, AddEdgeIndex, etc., which encapsulated the JG calls appropriate to each change.
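
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the utility described in the talk: the dictionary layout of a schema, the `Change` class, the change kinds `AddVertexLabel` and `AddEdgeLabel`, and the function names are all assumptions made for the example. The emitted text uses JanusGraph management calls (`openManagement`, `makePropertyKey`, `makeVertexLabel`, `makeEdgeLabel`, `commit`) and is meant to be pasted into a Gremlin console where `graph` is already bound; indexes are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One basic change in a diff (hypothetical representation)."""
    kind: str         # "AddPropertyKey", "AddVertexLabel" or "AddEdgeLabel"
    name: str         # name of the schema element being added
    detail: str = ""  # Java class name of the data type, for property keys


def diff_schemas(old, new):
    """Return the ordered list of additive changes that take `old` to `new`."""
    changes = []
    # Property keys come first so that later changes can refer to them.
    for key, dtype in new["property_keys"].items():
        if key not in old["property_keys"]:
            changes.append(Change("AddPropertyKey", key, dtype))
    for label in new["vertex_labels"]:
        if label not in old["vertex_labels"]:
            changes.append(Change("AddVertexLabel", label))
    for label in new["edge_labels"]:
        if label not in old["edge_labels"]:
            changes.append(Change("AddEdgeLabel", label))
    return changes


def to_management_script(changes):
    """Translate a diff into Groovy text that calls JanusGraph's management API."""
    lines = ["mgmt = graph.openManagement()"]
    for change in changes:
        if change.kind == "AddPropertyKey":
            lines.append(
                f"mgmt.makePropertyKey('{change.name}')"
                f".dataType({change.detail}.class).make()"
            )
        elif change.kind == "AddVertexLabel":
            lines.append(f"mgmt.makeVertexLabel('{change.name}').make()")
        elif change.kind == "AddEdgeLabel":
            lines.append(f"mgmt.makeEdgeLabel('{change.name}').make()")
        else:
            raise ValueError(f"unsupported change kind: {change.kind}")
    lines.append("mgmt.commit()")
    return "\n".join(lines)


# Two adjacent schema versions (made-up example data).
V1 = {
    "property_keys": {"name": "String"},
    "vertex_labels": ["person"],
    "edge_labels": [],
}
V2 = {
    "property_keys": {"name": "String", "age": "Integer"},
    "vertex_labels": ["person"],
    "edge_labels": ["knows"],
}
```

Calling `to_management_script(diff_schemas(V1, V2))` yields a script that adds only the `age` property key and the `knows` edge label, since everything else already exists in version 1.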

Initialization of a new database is the same as applying a diff between the empty schema and the first version of the schema you intend to use to constrain your data. When you are ready to advance the graph to version 2 of your schema, apply the version1-version2 diff. Note that only backward-compatible changes can be (safely) made in this way.
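
A minimal sketch of that versioning scheme, in Python, is shown below. It treats a fresh database as being at version 0 (the empty schema) and refuses any step that is not backward-compatible. The schema layout, the function names, and the definition of "backward-compatible" used here (nothing removed, no property key changing type) are assumptions for illustration only, not a description of the original tool.

```python
# The empty schema: what a freshly created database starts from.
EMPTY = {"property_keys": {}, "vertex_labels": [], "edge_labels": []}


def breaking_changes(old, new):
    """List the reasons why moving from `old` to `new` is not purely additive."""
    problems = []
    for key, dtype in old["property_keys"].items():
        if key not in new["property_keys"]:
            problems.append(f"property key removed: {key}")
        elif new["property_keys"][key] != dtype:
            problems.append(f"property key retyped: {key}")
    for section in ("vertex_labels", "edge_labels"):
        for label in old[section]:
            if label not in new[section]:
                problems.append(f"{section[:-1]} removed: {label}")
    return problems


def plan_migration(versions, current=0):
    """Return the (from_version, to_version) steps still to be applied.

    `versions[0]` is version 1 of the schema, `versions[1]` is version 2,
    and so on. `current` is the version the database is at; 0 means a new,
    empty database. Raises ValueError on a step that is not additive.
    """
    steps = []
    previous = EMPTY if current == 0 else versions[current - 1]
    for number in range(current + 1, len(versions) + 1):
        target = versions[number - 1]
        problems = breaking_changes(previous, target)
        if problems:
            raise ValueError(
                f"version {number - 1} -> {number}: " + "; ".join(problems)
            )
        steps.append((number - 1, number))
        previous = target
    return steps


# Made-up example data: version 2 only adds to version 1.
V1 = {
    "property_keys": {"name": "String"},
    "vertex_labels": ["person"],
    "edge_labels": [],
}
V2 = {
    "property_keys": {"name": "String", "age": "Integer"},
    "vertex_labels": ["person"],
    "edge_labels": ["knows"],
}
```

With these definitions, `plan_migration([V1, V2])` plans both the initialization step and the version 1 to version 2 step, while `plan_migration([V1, V2], current=1)` plans only the latter. Passing the versions in reverse order raises an error, because going from `V2` to `V1` would remove schema elements.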

Josh
