What are the best options to save and retrieve multilingual content with RavenDB?


Alex

Oct 10, 2011, 4:21:53 PM
to ravendb
Hello,

I use RavenDB on my site.
Soon I'm going to make a site in different languages.

What approaches and data structures can you suggest for storing
multilingual content?

Thank you,
Alex

Ayende Rahien

Oct 11, 2011, 4:14:09 AM
to rav...@googlegroups.com
This was discussed before, take a look at:

http://groups.google.com/group/ravendb/browse_thread/thread/720089ac6...

http://groups.google.com/group/ravendb/browse_thread/thread/160e1fa6d...

Dody Gunawinata

Oct 11, 2011, 5:22:06 AM
to rav...@googlegroups.com
Pretty much use the TranslatedString class (listed in the first thread) as
a replacement for the string type and call the appropriate methods depending
on the situation. You have to take care of these scenarios: create default,
create in a new language, edit default, edit a specific language, view
default, and view a specific language. The thread covers all of them.
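
The TranslatedString class itself lives in the linked thread and is not reproduced here. As a rough, language-neutral illustration of the idea only, here is a minimal Python sketch of a string type that carries a default value plus per-language values. The class layout and the `set`/`get` method names are assumptions made for this sketch, not the original class's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TranslatedString:
    """Hypothetical stand-in: a default value plus per-language overrides."""
    default: str
    translations: Dict[str, str] = field(default_factory=dict)

    def set(self, value: str, language: Optional[str] = None) -> None:
        # No language given: create/edit the default value.
        # Language given: create/edit the value for that language.
        if language is None:
            self.default = value
        else:
            self.translations[language] = value

    def get(self, language: Optional[str] = None) -> str:
        # View the default, or view a specific language, falling back
        # to the default when that language has no value yet.
        if language is None:
            return self.default
        return self.translations.get(language, self.default)


name = TranslatedString("ball")
name.set("bola", "es")
print(name.get("es"))  # bola
print(name.get("fr"))  # ball (no French value, so the default is used)
```

Falling back to the default for a missing language is a design choice of this sketch; the original class may behave differently.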

It's probably a good idea to write a couple of pages of documentation
to deal with this issue.

--
nomadlife.org

Itamar Syn-Hershko

Oct 11, 2011, 5:24:25 AM
to rav...@googlegroups.com
Dody, we will be launching our new site in the next few days. We would really like it if you could write up an article there about this, discussing the problem and the different approaches.

We have an area on the site especially for that type of thing.

Itamar Syn-Hershko

Oct 11, 2011, 5:24:55 AM
to rav...@googlegroups.com
launching in beta that is....

Dody Gunawinata

Oct 11, 2011, 5:38:12 AM
to rav...@googlegroups.com
I'll come up with a draft this or next weekend depending on the situation.
I am based in Cairo, Egypt and this weekend looks 'interesting' after
the Copts vs Military debacle last Sunday.


Itamar Syn-Hershko

Oct 11, 2011, 5:44:51 AM
to rav...@googlegroups.com
Cool, thanks, and be safe

ZNS

Oct 22, 2011, 4:41:52 PM
to ravendb

I think documentation on this topic would be great. I have implemented
Dody's solution myself and it works great for me. I have also extended
it with a custom list and interface for managing more complex objects.
I'd also like to see best practices for indexing multilingual
documents, considering collation and stop words for example. Let me
know if I can help in any way, and take care in Egypt.

On Oct 11, 11:38 am, Dody Gunawinata <empirebuil...@gmail.com> wrote:
> I'll come up with a draft this or next weekend depending on situation.
> I am based in Cairo, Egypt and this weekend looks 'interesting' after
> the Copts vs Military debacle last Sunday.
>
> On Tue, Oct 11, 2011 at 11:24 AM, Itamar Syn-Hershko <ita...@hibernatingrhinos.com> wrote:
> > Dody, we will be launching our new site in the next days or so, we would
> > really like it if you could write up an article there about that, discussing
> > the problem and the different approaches.
> > We have an area in the site especially for those type of things.
>
> > On Tue, Oct 11, 2011 at 11:22 AM, Dody Gunawinata <empirebuil...@gmail.com>
> > wrote:
>
> >> Pretty much use TranslatedString class (listed on the first thread) as
> >> replacement for string type and call the appropriate methods depending
> >> the situation. You have to take care of create default, create in new
> >> language, edit default,edit a specific language, view default, view
> >> specific language scenarios - the thread has all of it.
>
> >> It's probably a good idea to write a couple of pages of documentation
> >> to deal with this issue.
>
> >> On Tue, Oct 11, 2011 at 10:14 AM, Ayende Rahien <aye...@ayende.com> wrote:
> >> > This was discussed before, take a look at :
>
> >> >http://groups.google.com/group/ravendb/browse_thread/thread/720089ac6...
>
> >> >http://groups.google.com/group/ravendb/browse_thread/thread/160e1fa6d...

Dody Gunawinata

Oct 23, 2011, 12:33:59 PM
to rav...@googlegroups.com
I am almost done with the article; hopefully it'll be ready by
Tuesday. I am just addressing the basic issue of multi-language
support, and there is room for more discussion of this
topic; it would be amazing if you could discuss indexing. Multi-language
value support is a PITA in any storage system.


Itamar Syn-Hershko

Dec 4, 2011, 12:40:49 PM
to rav...@googlegroups.com
Dody, any luck with this?

Karl Cassar

Feb 5, 2013, 9:20:20 AM
to rav...@googlegroups.com, ita...@hibernatingrhinos.com
Any updates on RavenDB best practices for multilingual content?

Regards,
Karl

Troy

Feb 5, 2013, 10:39:07 AM
to rav...@googlegroups.com, ita...@hibernatingrhinos.com
I am curious as well, especially when it comes to indexing and having a client search for items in a specific language.

Itamar Syn-Hershko

Feb 6, 2013, 3:35:05 AM
to rav...@googlegroups.com
There's more than one correct way of doing that. We just built something similar with another technology, and were having quite a bit of a discussion about it.

The main problem you are going to face is not on the indexing side, where you usually have prior knowledge about the language or can use language detection, but on the query side. If your UIs are completely separate, to the point where the query language is easily identifiable, then you can probably use multi-tenancy or semantic IDs ("products/123/language").

Basically what I'm saying is there are quite a lot of factors that can influence such a design, and definitely no one magic solution.



Message has been deleted

Troy

Feb 6, 2013, 9:01:41 AM
to rav...@googlegroups.com
"We just built something similar with another technology, and were having quite a bit of a discussion about it."

Would you share where this discussion might be? I saw another thread about it IIRC, but if this is an open discussion would love to peek in.

Thanks Itamar!

Kijana Woodard

Feb 6, 2013, 10:25:15 AM
to rav...@googlegroups.com
I haven't tried translation with Raven, but from experience, I would translate the entire document as devmondo suggests rather than translating each field. Then have some fallback strategy for missing translations, such as
es-mx -> es -> en.
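
The fallback order above can be sketched as a small helper. This is only an illustration in plain Python: `fallback_chain`, `pick_translation`, and the `translations` dictionary keyed by locale tag are hypothetical names for this sketch, and nothing here is a RavenDB API.

```python
def fallback_chain(locale, default="en"):
    """Return locale tags from most to least specific, ending with the default."""
    parts = locale.lower().split("-")
    # "es-mx" -> ["es-mx", "es"]
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def pick_translation(translations, locale, default="en"):
    """Return the first whole-document translation found along the chain."""
    for tag in fallback_chain(locale, default):
        if tag in translations:
            return translations[tag]
    return None


translations = {
    "en": {"Name": "ball"},
    "es": {"Name": "bola"},
}
print(fallback_chain("es-mx"))                   # ['es-mx', 'es', 'en']
print(pick_translation(translations, "es-mx"))   # {'Name': 'bola'}
```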


On Wed, Feb 6, 2013 at 7:26 AM, devm...@hotmail.com <devm...@hotmail.com> wrote:
We have tried Dictionary and TranslatedString, and each has problems.

My humble advice is to add a Language property to each model, then force the user in the administration panel to select one when adding or editing a record. If they want a new translation, give them the ability in the UI to duplicate the record and select the new language. Users like it, because sometimes they don't need to translate all the properties of a model: after duplicating, they can either modify only the properties that need translating, which saves them from re-entering all the information, or start a fresh record.

On the front end, you just give a search-by-language option and query on the records with the chosen language, or get the current culture and query on records whose language matches it.

The way I store the language is like this:
Language = "en-US"
or Language = "ar-SA"

Of course, sometimes you need a main record with data in a specific language and others with translated data. You can do this by creating a property on the translated record that points to the main record's Id, and pull the essential properties from there.

hope it helps
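
A minimal sketch of that record-per-language layout, using plain Python dictionaries in place of stored documents. The property names `Language` and `MainRecordId`, the sample records, and the helper functions are assumptions for illustration; the post does not show actual model code.

```python
records = [
    {"Id": "products/1", "Language": "en-US", "Name": "ball", "Price": 10.00},
    # Translated copy: only the translated property plus a pointer to the main record.
    {"Id": "products/2", "Language": "ar-SA", "Name": "كرة",
     "MainRecordId": "products/1"},
]


def by_language(records, language):
    """Front-end query: keep only records stored in the chosen language."""
    return [r for r in records if r["Language"] == language]


def with_main_fields(record, records_by_id, shared=("Price",)):
    """Copy the essential (untranslated) properties from the main record."""
    merged = dict(record)
    main = records_by_id.get(record.get("MainRecordId"))
    if main is not None:
        for key in shared:
            if key in main:
                merged.setdefault(key, main[key])
    return merged


records_by_id = {r["Id"]: r for r in records}
arabic = by_language(records, "ar-SA")[0]
print(with_main_fields(arabic, records_by_id)["Price"])  # 10.0
```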

Itamar Syn-Hershko

Feb 6, 2013, 11:12:48 AM
to rav...@googlegroups.com
It was an internal discussion about a very specific implementation of a search engine. My point is tons of business rules and decisions will influence everything about such a system, and there are no rules. You just have to be familiar with the tech you use to be able to provide solutions. That said, if you have a specific scenario you are trying to solve, you can just raise it here to get opinions and advice.

Karl Cassar

Feb 6, 2013, 11:28:42 AM
to rav...@googlegroups.com
Yes the issue is that some fields might not be multilingual in the document, and you would end up with a lot of duplicated content which I would like to avoid.

Apart from that, I would need to 'analyse' the text using Lucene Analysers, based on the language.  As far as I know, you can only specify one Analyser per field. Is it possible for the analyser to change, based on the language code?

For example:

  • Document.Subject 
    • LanguageCode = en : SimpleAnalyzer
    • LanguageCode = fr : FrenchAnalyzer
    • ...
The analyser works by analysing the field's value. In this case, the field value can be in English, French, German, etc., depending on a separate field's value (the language code).
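
One workaround often used with Lucene-style indexes, not something proposed in this thread, is to keep the one-analyser-per-field rule but write the value into an index field whose name carries the language code, so each language gets its own field and therefore its own analyser. The Python sketch below only illustrates that mapping. The `Subject_<lang>` naming scheme, the helper functions, and the default analyser are assumptions, and whether a given RavenDB index definition supports this is not confirmed here.

```python
# Hypothetical mapping from language code to a Lucene analyser class name.
ANALYZER_BY_LANGUAGE = {
    "en": "SimpleAnalyzer",
    "fr": "FrenchAnalyzer",
}


def index_entry(document):
    """Emit the subject under a language-specific field name, e.g. Subject_fr."""
    lang = document["LanguageCode"]
    return {f"Subject_{lang}": document["Subject"]}


def analyzer_for_field(field_name, default="StandardAnalyzer"):
    """Pick the analyser from the language suffix of the field name."""
    _, _, lang = field_name.rpartition("_")
    return ANALYZER_BY_LANGUAGE.get(lang, default)


entry = index_entry({"Subject": "Bonjour le monde", "LanguageCode": "fr"})
for field_name in entry:
    print(field_name, analyzer_for_field(field_name))  # Subject_fr FrenchAnalyzer
```

The cost of this approach moves to the query side: a search has to target the field for the language being queried.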

Regards,
Karl


Matt Johnson

Feb 6, 2013, 5:04:02 PM
to rav...@googlegroups.com
The questions raised in this thread seem to be around choosing between one document like this:

products/1
{
  "Name" : {
    "en" : "ball",
    "es" : "bola",
    "fr" : "balle"
  }
}

Or multiple documents like these:

products/1/en
{
  "Name" : "ball"
}

products/1/es
{
  "Name" : "bola"
}

products/1/en
{
  "Name" : "balle"
}

Now I'm no expert in globalization techniques, but I can tell you that this is the exact same problem I had when figuring out how to handle temporal data.

The first example of a single document is like the "Temporal Property" pattern, that I first tried.  It worked just fine as far as basic CRUD tasks, but failed miserably when it came to querying.

The second example with multiple documents is what I ended up with after creating the Temporal Versioning Bundle.  It moves the concern out of the domain, and into the infrastructure.  It was very difficult to get it right, but now that it's done - it is very easy and very powerful.

I can clearly see that content localization is very similar in its concerns, especially when it comes to the side effect that the entire document is localized (or versioned, in my case). It might be possible to work around this by storing the non-localized properties in a root document and only the localized ones in separate docs. A bundle could easily coordinate this. In other words, the docs may look like:

products/1
{
  "Price" : 10.00
}

products/1/en
{
  "Name" : "ball"
}

products/1/es
{
  "Name" : "bola"
}

products/1/en
{
  "Name" : "balle"
}

When retrieved for a specific language, you would get back a merged document:

session.LocalizedFor("en").Load<Product>("products/1")

products/1
{
  "Price" : 10.00,
  "Name" : "ball"
}

The "LocalizedFor()" would be implemented similarly to my temporal "Effective()" method.
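
The merge described above can be sketched outside RavenDB with a plain dictionary standing in for the document store. This is an illustration only: `load_localized` is a hypothetical helper, not the proposed bundle or any RavenDB API, and the French document uses the ID products/1/fr as corrected in the follow-up post.

```python
# In-memory stand-in for the document store, keyed by document ID.
store = {
    "products/1": {"Price": 10.00},
    "products/1/en": {"Name": "ball"},
    "products/1/es": {"Name": "bola"},
    "products/1/fr": {"Name": "balle"},
}


def load_localized(store, doc_id, language):
    """Merge the root document with the localized document for one language."""
    merged = dict(store[doc_id])                           # non-localized properties
    merged.update(store.get(f"{doc_id}/{language}", {}))   # localized properties
    return merged


print(load_localized(store, "products/1", "en"))  # {'Price': 10.0, 'Name': 'ball'}
```

If no document exists for the requested language, this sketch simply returns the root document's properties.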

The patterns are similar enough - I could probably create this bundle fairly easily.  Does this sound useful to you guys?  Or am I making a whole lot of nonsense? :)

Matt Johnson

Feb 6, 2013, 5:07:57 PM
to rav...@googlegroups.com
Sorry, fudged the samples.  That third doc was supposed to be products/1/fr  (french)

Troy

Feb 6, 2013, 5:10:37 PM
to rav...@googlegroups.com
Matt, this sounds tremendously useful. It totally makes sense that your work on the TV Bundle would carry over here. Given the interest in this domain, I think it would be a great addition to the Contrib project. Easy for me to say since you offered up the bundle! ;-)

I have a future project that would use this exact implementation. It is super clear and very easy to use. I like it!

Thanks again!

Troy

Feb 6, 2013, 5:12:39 PM
to rav...@googlegroups.com
If you work on it, one more thought... Could products/1 contain all fields, as the default, if the language does not exist for that document? Or would there be something like LocalizeFor("es","en-baseLanguageForFallback").Load<>?



Matt Johnson

Feb 6, 2013, 6:02:21 PM
to rav...@googlegroups.com
I was thinking that the base document would have that.  I could do multiple levels like LocalizeFor("en-US") first looking at products/1/en/US then falling back to products/1/en and ultimately back to products/1.

But if you're saying that I call LocalizeFor("es-MX") and I don't have ANY Spanish docs, so it should go pull from an English one instead, then there would have to be a second parameter.  Maybe make it optional.  Perhaps a signature like:

  LocalizeFor(this IDocumentSession session, string locale, string fallbackLocale = null)

Just thinking aloud...
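
To make the lookup order concrete, here is a small Python sketch of how the candidate document IDs could be derived from a locale and an optional fallback locale. It is not the bundle's implementation: `candidate_ids` and `localize_for` are hypothetical helpers, and merging property by property (rather than picking one whole document) is an assumption of this sketch.

```python
def candidate_ids(doc_id, locale, fallback_locale=None):
    """List document IDs from most to least specific, ending with the root."""
    ids = []
    for loc in (locale, fallback_locale):
        if not loc:
            continue
        parts = loc.split("-")
        # "en-US" -> products/1/en/US, then products/1/en
        for i in range(len(parts), 0, -1):
            candidate = doc_id + "/" + "/".join(parts[:i])
            if candidate not in ids:
                ids.append(candidate)
    ids.append(doc_id)
    return ids


def localize_for(store, doc_id, locale, fallback_locale=None):
    """Merge documents so that more specific ones override less specific ones."""
    merged = {}
    for candidate in reversed(candidate_ids(doc_id, locale, fallback_locale)):
        merged.update(store.get(candidate, {}))
    return merged


docs = {
    "products/1": {"Price": 10.00},
    "products/1/en": {"Name": "ball"},
}
print(candidate_ids("products/1", "en-US"))
# ['products/1/en/US', 'products/1/en', 'products/1']
print(localize_for(docs, "products/1", "es-MX", "en"))
# {'Price': 10.0, 'Name': 'ball'}
```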

Troy

Feb 6, 2013, 6:34:28 PM
to rav...@googlegroups.com
Optional parameter works for me!

Felipe Leusin

Feb 6, 2013, 6:58:01 PM
to rav...@googlegroups.com
I actually liked the design where you have the document in the default language in /products/1 and keep different documents for the properties that are localized. If you guys need help implementing this, let me know.


Troy

Feb 6, 2013, 7:06:19 PM
to rav...@googlegroups.com
After thinking about it: Matt, your first design might make more sense. Having the default properties in products/1 makes better sense than a fallback.

The products/1/en-US, then products/1/en, then products/1 lookup order is also nice.

Karl Cassar

Feb 6, 2013, 7:29:42 PM
to rav...@googlegroups.com
This makes a lot of sense to me too, and I would love to see it as a globalization bundle for RavenDB :)

Regards,
Karl

Ryan Heath

Feb 7, 2013, 12:23:02 AM
to rav...@googlegroups.com
Great idea, Matt!

Querying is now solved :)
But care to explain how indexing would work?

// Ryan