Hi Christophe,
This is way more expressive than 140 characters :-)
> • The available languages are now managed with relationship_types. AFAIK you can not categorize the relationship types to say these relations types are for languages, so you can not really have a good structure for managing the available languages for the website.
True. But in the same way that you can use a node for en-UK and fr-FR, you could add an index for those languages too.
> • You have to traverse all nodes to calculate the translation percentage of the web application ( I'm more thinking here to nodes containing user interface translations rather than blog posts)
Agreed. A language node helps there as does an index of posts for each language.
> On the other side, what I really like with what Jim proposed to me, is that the initial data is also a related node to the blogPost node, it can gives possibilities like publish/unpublish blogPost in initial language by transfering the publish property in all translated nodes rather than in the initial blogPost node.
If you keep the suggestion I made:
(french)<-[:fr-FR]-(post)-[:en-UK]->(english)
and index the relationships, then you can calculate percentages easily.
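To make the percentage idea concrete, here is a rough sketch in plain Python rather than against the Neo4j API. The dictionary stands in for a relationship index, and the names relationship_index, add_translation and translation_percentage are made up for illustration; the only point is that once relationships are indexed by type, a percentage is one lookup and a division.

```python
from collections import defaultdict

# Hypothetical stand-in for a relationship index:
# relationship type (e.g. "fr-FR") -> set of post ids that have that relationship.
relationship_index = defaultdict(set)


def add_translation(post_id, language):
    """Record that post_id has an outgoing relationship of type `language`."""
    relationship_index[language].add(post_id)


def translation_percentage(language, total_posts):
    """Percentage of posts that have a translation in `language`."""
    if total_posts == 0:
        return 0.0
    return 100.0 * len(relationship_index[language]) / total_posts


# Four posts, all available in English, two of them also in French.
for post_id in (1, 2, 3, 4):
    add_translation(post_id, "en-UK")
for post_id in (1, 2):
    add_translation(post_id, "fr-FR")

print(translation_percentage("fr-FR", total_posts=4))  # 50.0
```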
Or you could extend the model such that:
(fr-FR { count : 4567 } )<-[:WRITTEN_IN]-(french)<-[:fr-FR]-(post)-[:en-UK]->(english)-[:WRITTEN_IN]->(en-UK { count : 1234 } )
And then in the "language" nodes you can easily keep a count of incoming WRITTEN_IN relationships. Don't bother counting them each time when there is a large number of relationships; it's quicker to keep that information duplicated on the node so we don't have to trawl through the relationship store.
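A minimal sketch of that counter, again in plain Python as a toy in-memory model and not the Neo4j API. LanguageNode, Graph and add_written_in are hypothetical names; the assumption is simply that every write of a WRITTEN_IN relationship also bumps the count on the language node, so reading the percentages never touches the relationships themselves.

```python
class LanguageNode:
    """A language node such as (en-UK { count : ... })."""

    def __init__(self, code):
        self.code = code
        self.count = 0  # duplicated count of incoming WRITTEN_IN relationships


class Graph:
    """Toy in-memory graph holding language nodes and WRITTEN_IN relationships."""

    def __init__(self):
        self.languages = {}   # language code -> LanguageNode
        self.written_in = []  # (document_id, language code) pairs

    def add_written_in(self, document_id, code):
        """Create the relationship and update the counter in the same step."""
        if code not in self.languages:
            self.languages[code] = LanguageNode(code)
        self.written_in.append((document_id, code))
        self.languages[code].count += 1

    def percentages(self):
        """Share of documents per language, read from the counters only."""
        total = sum(node.count for node in self.languages.values())
        if total == 0:
            return {}
        return {
            code: 100.0 * node.count / total
            for code, node in self.languages.items()
        }


graph = Graph()
for doc in ("english-1", "english-2", "english-3"):
    graph.add_written_in(doc, "en-UK")
graph.add_written_in("french-1", "fr-FR")

print(graph.percentages())
```

The trade-off is the usual one for duplicated data: writes do a little more work, and the counter has to be updated in the same transaction as the relationship, or it will drift.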
Now you could argue that there's duplication in the graph. But in response I'd argue that the nodes and relationships serve different purposes. The "language nodes" are there to provide high-level metadata - the percentage of nodes in English, French, German, Japanese, etc.
The relationships are there to help the reader when they're looking at an article and they'd prefer a different language. For example, I think there's an easy mapping between these relationships and the HTTP Accept-Language header.
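Here is roughly what I mean by that mapping, as a hedged Python sketch. parse_accept_language and pick_translation are hypothetical helpers, and the matching is deliberately simple: exact, case-insensitive tag matches only, with no wildcard or prefix fallback. It assumes the relationship types on a post are named after language tags, as in the pattern above.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value defaults to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag.strip()))
    # Sort by descending quality, then by original position in the header.
    return [tag for _, _, tag in sorted(weighted)]


def pick_translation(header, available):
    """Pick the relationship type to follow, or None if nothing matches.

    `available` is the set of relationship types leaving the post,
    e.g. {"fr-FR", "en-UK"}.
    """
    by_lower = {rel.lower(): rel for rel in available}
    for tag in parse_accept_language(header):
        match = by_lower.get(tag.lower())
        if match is not None:
            return match
    return None


print(pick_translation("fr-FR;q=0.8, en-UK", {"fr-FR", "en-UK"}))  # en-UK
```

One caveat: the registered tag for British English is en-GB, so real browsers will send that rather than en-UK, and the relationship names (or a small lookup table) would need to account for it.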
Just my thoughts. I think really the answer is to model whatever is in your domain faithfully and not worry if it seems denormalised compared to a relational model. As long as you're faithful to the domain, the graph will be faithful to you.
Jim