Currently, the schema for JanusGraph is basically just a list of allowed labels (for vertices and edges) and available property keys. What's missing, in my opinion, is the option to specify which property keys each vertex and edge label can have, and which edge labels can connect which vertex labels.
Just to give an idea of what I mean, here are two examples from the Graph of the Gods:
This is of course only a toy graph, but I suspect that most real-world data models contain similar constraints.
If we allow users to enforce these constraints inside JanusGraph, they can be sure that no user of their database can insert data that violates them (e.g., a brother edge that connects a god with a location). So a strict schema ensures that the graph stays consistent with respect to those constraints.[1]
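To make the idea more concrete, here is a minimal Python sketch of such a strict schema. It assumes a simple in-memory model in which each vertex label maps to the property keys it may carry and each edge label maps to the (out-vertex label, in-vertex label) pairs it may connect. The names StrictSchema, SchemaViolation, check_property and check_edge are hypothetical and not part of JanusGraph's API; inside JanusGraph, checks like these would presumably run whenever a transaction adds a property or an edge.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class SchemaViolation(Exception):
    """Raised when a mutation does not comply with the strict schema (hypothetical)."""


@dataclass
class StrictSchema:
    # vertex label -> property keys that vertices with this label may carry
    vertex_properties: dict[str, set[str]] = field(default_factory=dict)
    # edge label -> allowed (out-vertex label, in-vertex label) pairs
    connections: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def allow_properties(self, vertex_label: str, *keys: str) -> None:
        self.vertex_properties.setdefault(vertex_label, set()).update(keys)

    def allow_connection(self, edge_label: str, out_label: str, in_label: str) -> None:
        self.connections.setdefault(edge_label, set()).add((out_label, in_label))

    def check_property(self, vertex_label: str, key: str) -> None:
        # Reject keys that were not explicitly allowed for this vertex label.
        if key not in self.vertex_properties.get(vertex_label, set()):
            raise SchemaViolation(f"'{vertex_label}' vertices cannot have property '{key}'")

    def check_edge(self, edge_label: str, out_label: str, in_label: str) -> None:
        # Reject edges whose endpoint labels were not explicitly allowed.
        if (out_label, in_label) not in self.connections.get(edge_label, set()):
            raise SchemaViolation(
                f"'{edge_label}' edges cannot connect '{out_label}' to '{in_label}'"
            )


# A small, illustrative subset of the Graph of the Gods data model.
schema = StrictSchema()
schema.allow_properties("god", "name", "age")
schema.allow_properties("location", "name")
schema.allow_connection("brother", "god", "god")
schema.allow_connection("lives", "god", "location")

schema.check_property("god", "age")         # passes: gods may have an age
schema.check_edge("brother", "god", "god")  # passes: brother connects two gods

try:
    schema.check_property("location", "age")
except SchemaViolation as err:
    print(err)  # 'location' vertices cannot have property 'age'

try:
    schema.check_edge("brother", "god", "location")
except SchemaViolation as err:
    print(err)  # 'brother' edges cannot connect 'god' to 'location'
```

The sketch works as a whitelist: anything not explicitly allowed is rejected. An optional, backwards-compatible mode could, for example, skip these checks for labels that have no constraints defined.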
In schema-less databases, this schema is often embedded implicitly in the client applications, since those applications need to know how to access the data. So even if the database is schema-less, there is still an implicit schema. This means that keeping the schema out of the database doesn't actually make it easier to update, because it still has to be changed in every client application.
Having this schema explicitly defined in JanusGraph also makes it easy to tell new users what kind of data they can expect, e.g., they know that a location can't have an age, but a god can. It would also allow tools to fetch the schema from a JanusGraph instance and visualize it. Such a visualization makes the schema much easier to reason about, as it provides an easy-to-understand representation of it.
Finally, an explicit schema would also allow OGM (object graph mapper) tools to fetch the schema from JanusGraph and translate it into entity classes, so the schema only has to be defined in one place (DRY principle).
So, in short, I propose that JanusGraph gets a strict schema, either as the only option or as an additional option for backwards-compatibility with existing deployments and their data models.
Regards,
Florian
[1] We actually ran into this problem: our JanusGraph database contained data that shouldn't have been possible. Our schema models the network traffic of malware samples, so we have edge labels like SampleToDomain or SampleToIp that connect samples with the domains or IP addresses they contacted. At some point we found edges in our graph that connected samples with domains but had the edge label SampleToIp. This is a problem because our applications naturally expect an IP address when they follow a SampleToIp edge.
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/69428dda-baa3-489c-99a1-c316e0728e09%40googlegroups.com.
The "name" property on a "Person" vertex could be of a different type than the "name" property on a "Building" vertex.
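A minimal Python sketch of that idea, assuming property data types are keyed by the pair (vertex label, property key) instead of by the property key alone. LabelScopedPropertyTypes, SchemaViolation and the concrete types chosen for Person and Building (a single str versus a list of names) are hypothetical illustrations, not something JanusGraph provides.

```python
from __future__ import annotations


class SchemaViolation(Exception):
    """Raised when a property value does not match its label-scoped data type."""


class LabelScopedPropertyTypes:
    """Stores property data types per (vertex label, property key) pair (hypothetical)."""

    def __init__(self) -> None:
        self._types: dict[tuple[str, str], type] = {}

    def define(self, vertex_label: str, key: str, data_type: type) -> None:
        self._types[(vertex_label, key)] = data_type

    def validate(self, vertex_label: str, key: str, value: object) -> None:
        # Look up the type for this label/key combination, not for the key globally.
        expected = self._types.get((vertex_label, key))
        if expected is None:
            raise SchemaViolation(f"'{key}' is not defined for '{vertex_label}' vertices")
        if not isinstance(value, expected):
            raise SchemaViolation(
                f"'{vertex_label}.{key}' expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )


# Hypothetical types: Person.name is a single string, Building.name a list of names.
property_types = LabelScopedPropertyTypes()
property_types.define("Person", "name", str)
property_types.define("Building", "name", list)

property_types.validate("Person", "name", "Alice")
property_types.validate("Building", "name", ["Empire State Building", "ESB"])
try:
    property_types.validate("Person", "name", ["Alice", "Ali"])
except SchemaViolation as err:
    print(err)  # 'Person.name' expects str, got list
```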