Managing node properties translations with Neo4j


Christophe Willemsen

Jul 28, 2012, 3:06:42 PM
to ne...@googlegroups.com
Hello,

I wrote a short blog post explaining how I sketched a design for translating node properties in Neo4j.

I ended up with the following schema:

I asked for feedback there, but I think comments are disabled on Tumblr unless you are one of my followers, and I didn't want to write English posts on my personal blog, which is in French. So I'm opening this thread to collect feedback and comments.

I received an awesome answer from Jim Webber on Twitter:

what about (post)-[:EN-uk]->(english), (post)-[:FR-fr]->(french) Where post contains metadata and the other nodes contain text?

I was really amazed by his answer, as it is completely different from mine. He did warn me that it only works if I don't care who added the translation.

I have just two problems with this design, and maybe someone can correct me if my reasoning is wrong:

  1. The available languages are now managed through relationship types. AFAIK you cannot categorize relationship types to say "these types are for languages", so there is no clean structure for managing the languages available on the website.
  2. You have to traverse all nodes to calculate the translation percentage of the web application (I'm thinking more of nodes containing user interface translations than of blog posts).

On the other hand, what I really like about Jim's proposal is that the original text is also a node related to the blogPost node. This opens up possibilities such as publishing or unpublishing a blogPost in its original language by moving the publish property onto each translated node instead of keeping it on the blogPost node itself.

Maybe I'm wrong? I'd be really thankful to anyone who can clarify these points or propose other designs.


Thank you very much,

Regards,

Christophe


Jim Webber

Jul 28, 2012, 4:32:39 PM
to ne...@googlegroups.com
Hi Christophe,

This is way more expressive than 140 characters :-)

> • The available languages are now managed through relationship types. AFAIK you cannot categorize relationship types to say "these types are for languages", so there is no clean structure for managing the languages available on the website.

True. But just as you can use a node for en-UK and fr-FR, you could add an index for those languages too.

> • You have to traverse all nodes to calculate the translation percentage of the web application (I'm thinking more of nodes containing user interface translations than of blog posts).

Agreed. A language node helps there, as does an index of posts for each language.

> On the other hand, what I really like about Jim's proposal is that the original text is also a node related to the blogPost node. This opens up possibilities such as publishing or unpublishing a blogPost in its original language by moving the publish property onto each translated node instead of keeping it on the blogPost node itself.


If you keep the suggestion I made:

(french)<-[:fr-FR]-(post)-[:en-UK]->(english)

and then index the relationships, you can calculate percentages easily.
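
As a rough illustration of that idea, here is a minimal in-memory sketch in Python. It is not the Neo4j index API: the `TranslationGraph` class, its method names, and the post ids are all hypothetical, and the "index" is just a dictionary keyed by relationship type. It only shows why an index keyed by relationship type makes the percentage a lookup rather than a traversal.

```python
from collections import defaultdict


class TranslationGraph:
    """Toy stand-in for the graph (hypothetical, not a Neo4j API)."""

    def __init__(self):
        self.posts = set()
        # Relationship type (locale) -> ids of posts that have that relationship.
        self.rel_index = defaultdict(set)

    def add_post(self, post_id):
        self.posts.add(post_id)

    def add_translation(self, post_id, locale):
        # Stands for creating (post)-[:<locale>]->(text) and indexing it.
        if post_id not in self.posts:
            raise KeyError(post_id)
        self.rel_index[locale].add(post_id)

    def coverage(self, locale):
        """Percentage of posts that have a text node for `locale`."""
        if not self.posts:
            return 0.0
        # One index lookup; no walk over the posts themselves.
        return 100.0 * len(self.rel_index.get(locale, ())) / len(self.posts)


graph = TranslationGraph()
for pid in ("p1", "p2", "p3", "p4"):
    graph.add_post(pid)
graph.add_translation("p1", "en-UK")
graph.add_translation("p2", "en-UK")
graph.add_translation("p1", "fr-FR")

print(graph.coverage("en-UK"))  # 2 of 4 posts
print(graph.coverage("fr-FR"))  # 1 of 4 posts
```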

Or you could extend the model such that:

(fr-FR { count : 4567 } )<-[:WRITTEN_IN]-(french)<-[:fr-FR]-(post)-[:en-UK]->(english)-[:WRITTEN_IN]->(en-UK { count : 1234 } )

And then in the "language" nodes you can easily keep a count of incoming WRITTEN_IN relationships (don't bother counting them each time for a large number of relationships, it's quicker to have even that information duplicated so we don't have to trawl through the relationship store).
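
A small Python sketch of that bookkeeping, under the assumption that the count is updated in the same write that creates or removes a WRITTEN_IN relationship. The `LanguageCounts` class and its method names are made up for illustration; in a real application this would be done through whatever Neo4j API you use, inside one transaction.

```python
class LanguageCounts:
    """Toy model of language nodes carrying a denormalised `count` property."""

    def __init__(self):
        self.count = {}       # locale -> value of the `count` property
        self.written_in = {}  # text node id -> locale it is WRITTEN_IN

    def add_text(self, text_id, locale):
        if text_id in self.written_in:
            raise ValueError(f"{text_id} already has a WRITTEN_IN relationship")
        # Create the relationship and bump the counter together.
        self.written_in[text_id] = locale
        self.count[locale] = self.count.get(locale, 0) + 1

    def remove_text(self, text_id):
        # Drop the relationship and decrement the counter together.
        locale = self.written_in.pop(text_id)
        self.count[locale] -= 1

    def share(self, locale):
        """Percentage of text nodes written in `locale`."""
        # Sums over the few language nodes, never over the relationships.
        total = sum(self.count.values())
        if total == 0:
            return 0.0
        return 100.0 * self.count.get(locale, 0) / total


counts = LanguageCounts()
for text_id in ("e1", "e2", "e3"):
    counts.add_text(text_id, "en-UK")
counts.add_text("f1", "fr-FR")

print(counts.count["en-UK"], counts.share("en-UK"))
print(counts.count["fr-FR"], counts.share("fr-FR"))
```

The trade-off is the usual one for denormalised data: reads become cheap, but every code path that adds or removes a text node has to keep the counter in step.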

Now you could argue that there's duplication in the graph. But in response I'd argue that the nodes and relationships serve different purposes. The "language nodes" are there to provide high-level metadata - the percentage of nodes in English, French, German, Japanese, etc.

The relationships are there to help the user when they're looking at an article and would prefer a different language. For example, I think there's an easy mapping between these relationships and the HTTP Accept-Language header.
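
To make that mapping concrete, here is a hedged Python sketch. The function names and the fallback rule are assumptions for illustration; the parser handles only the common `tag;q=value` form and is not a complete implementation of the header grammar. It ranks the languages the client asked for and returns the relationship type to follow from the post.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language header, best first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        ranges.append((quality, position, tag.strip().lower()))
    # Higher q first; ties keep the order used in the header.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [tag for quality, _, tag in ranges if quality > 0]


def pick_relationship(header, available_types, default):
    """Choose which outgoing relationship type to follow from a post."""
    by_lower = {t.lower(): t for t in available_types}
    for wanted in parse_accept_language(header):
        if wanted == "*":
            return default
        if wanted in by_lower:
            return by_lower[wanted]
        # A bare language such as "fr" is allowed to match "fr-FR".
        for lower, original in by_lower.items():
            if lower.startswith(wanted + "-"):
                return original
    return default


# Relationship types as written in this thread.
types_on_post = ["en-UK", "fr-FR"]
print(pick_relationship("fr-FR, en;q=0.7", types_on_post, "en-UK"))
print(pick_relationship("de, en;q=0.7", types_on_post, "en-UK"))
print(pick_relationship("ja", types_on_post, "en-UK"))
```

Matching is case-insensitive so that a header value like `fr-fr` still finds the `fr-FR` relationship type.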

Just my thoughts. I think really the answer is model whatever is in your domain faithfully and don't worry if it seems denormalised compared to a relational model. As long as you're faithful to the domain, the graph will be faithful to you.

Jim

Christophe Willemsen

Jul 29, 2012, 5:46:04 AM
to ne...@googlegroups.com
Hi Jim,

Thank you very much for your answer.

First, I'm glad you agree with my concerns, so I'm not that far off :)

Second, your explanation is a really nice example of a use case for relationship indexes, something that can be mysterious when you start with graph databases.

The mapping between relationship indexes and HTTP Accept-Language headers is really interesting and I will definitely look deeper into that.

I'll try your implementation with an initial set of nodes and report back here.

BTW, I think such use cases should be included as examples in the Neo4j docs, not as strict models but as examples of possible implementations.

Thanks and have a nice Sunday!

Regards,

Christophe

Niels Hoogeveen

Jul 30, 2012, 7:02:17 AM
to ne...@googlegroups.com
In multispective.com, we have taken another approach to translations, which is certainly not as impressive as the examples given in the blog post.
 
Let me start by saying that multispective doesn't use regular property names. Instead the property name is a String representation of the Id of an associated property-type. This makes it possible to change property names and frees up the namespace of properties.
 
Translations are added directly as properties on the original node, where the property name consists of the id of the property-type concatenated with a separator and a locale code.
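
A minimal Python sketch of such a key scheme, with a plain dictionary standing in for a node's properties. The separator character, the property-type id `42`, the function names, and the fallback to the untranslated value are all assumptions made for illustration; the post does not specify them.

```python
SEPARATOR = ":"  # assumed; the actual separator is not stated


def property_key(property_type_id, locale=None):
    """Build the property name: '<type id>' or '<type id><sep><locale>'."""
    key = str(property_type_id)
    return key if locale is None else f"{key}{SEPARATOR}{locale}"


def get_value(node_properties, property_type_id, locale):
    """Return the value for `locale`, else the untranslated value (assumed rule)."""
    localized = property_key(property_type_id, locale)
    if localized in node_properties:
        return node_properties[localized]
    return node_properties.get(property_key(property_type_id))


def missing_locales(node_properties, property_type_id, locales):
    """List the locales that have no translation stored on this node."""
    return [
        locale
        for locale in locales
        if property_key(property_type_id, locale) not in node_properties
    ]


# Hypothetical node where property-type 42 is something like a title.
node = {"42": "Hello", "42:fr-FR": "Bonjour"}

print(get_value(node, 42, "fr-FR"))
print(get_value(node, 42, "de-DE"))
print(missing_locales(node, 42, ["fr-FR", "de-DE"]))
```

Finding untranslated entries this way means inspecting the keys of every node, which is the statistics cost mentioned in the next paragraph.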
 
It may not be the most elegant solution, especially if you want to extract statistics about translation use or want a list of untranslated entries, but it is a solution with very little overhead.
 
Niels

Christophe Willemsen

Jul 30, 2012, 7:58:33 AM
to ne...@googlegroups.com
Hi Niels,

Thanks for your feedback.

If I understand correctly, you are defining the schema of entities in the database, as Drupal does with Content Types? It's interesting, but I have some requirements, and the fact that translations are not stored separately could cause problems when I want to assign translations to certain privileged users.

Thanks again for your £0.02

Grtz