[rails-i18n] Database Backends

dylanz

Aug 26, 2009, 12:33:18 PM

to rails-i18n

Hello Sven!

First off, thank you very much for the great work on the I18n front!

A colleague of mine and I wrote a Database Backend for the I18n
implementation early this year
(http://github.com/dylanz/i18n_backend_database/). It's currently in
production under heavy use, and is working great (for some things!).
We wrote a Database Backend because of the following use case, which I
think would be pretty common:

"The site is live, translators are hired and pointed to
/admin/translations, where they can start going down the list of
untranslated items and translating them. Their updates are real-time,
and immediately propagated to the array of machines."

Our backend is pretty straightforward as well. On any given lookup,
this is how it goes:
Key -> Cache -> DB -> Cache -> Value

We're using a distributed cache store like Memcached to front the
lookups, so we don't have to hit the database on every request, and to
keep the result set consistent across an array of machines. This
results in very fast lookup times, and is consistent... which is just
what the doctor ordered :)
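
The Key -> Cache -> DB -> Cache -> Value flow described above can be sketched roughly as follows. This is only an illustration, not the code from i18n_backend_database: the class name ReadThroughLookup is hypothetical, and plain Hashes stand in for Memcached and for the translations table.

```ruby
# Hypothetical sketch: plain Hashes stand in for the cache store and the DB.
class ReadThroughLookup
  attr_reader :db_hits

  def initialize(cache, db)
    @cache = cache
    @db = db
    @db_hits = 0
  end

  def lookup(locale, key)
    cache_key = "#{locale}:#{key}"
    # Key -> Cache: return early on a cache hit.
    return @cache[cache_key] if @cache.key?(cache_key)

    # Cache miss -> DB
    @db_hits += 1
    value = @db[cache_key]

    # DB -> Cache -> Value
    @cache[cache_key] = value
    value
  end
end

db = { "en:greeting" => "Hello" }
reader = ReadThroughLookup.new({}, db)
first  = reader.lookup(:en, "greeting")  # goes to the DB
second = reader.lookup(:en, "greeting")  # served from the cache
```

Only the first lookup for a given key touches the database; later lookups are answered by the cache until the entry is overwritten or removed.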

Now, we started implementing it on another project, and are using some
things we didn't use on the first project... like the number and
currency helpers. Needless to say, things broke. We hadn't come
across the use case of lookups where the result is a Hash
(like :"number.format") in any of the other operations, so this was
new to us. Implementing this isn't going to be that straightforward,
so we're re-thinking our approach.

A few questions for you!

1. If I had to look up :"number.format", I wouldn't necessarily know
that it requires a Hash as a return value, unless I had some sort of
tree in the database, where every child of "number.format" had an
entry naming "number.format" as its parent. So... would backends
need to fall through to YAML/File support if they couldn't return the
correct value... or (I'm assuming) would backends need to support
tree/hierarchy lookups like this?

2. I noticed the "active_record" and "active_record2" branches.
-> What are the differences between the two?
-> Are either in use in any production environments?
-> How are they going to limit the database lookups on heavily
translated pages?

Thanks Sven!
==
Dylan

Sven Fuchs

Aug 26, 2009, 1:09:28 PM

to rails...@googlegroups.com

Hi Dylan,

welcome to the list :)

On 26.08.2009, at 18:33, dylanz wrote:
> 1. If I had to lookup :"number.format", I wouldn't necessarily know
> that it requires a Hash as a return value, unless I had some sort of
> tree in the database, where all the children of "number.format" had
> entries that "number.format" was their parent.

IIRC the shipped AR backend does that without an explicit
tree/hierarchy DB model:

http://github.com/svenfuchs/i18n/blob/df498763cd1968c58900d66a322325d9db8b0d06/test/backend/active_record/active_record_test.rb#L17

It basically joins the key to "foo.bar.baz" and then looks for records
that either have exactly that key or start with that key with the
separator following.

http://github.com/svenfuchs/i18n/blob/df498763cd1968c58900d66a322325d9db8b0d06/lib/i18n/backend/active_record/translation.rb#L18

Then you can look at the result set and return either a single value
or a Hash from it.

http://github.com/svenfuchs/i18n/blob/df498763cd1968c58900d66a322325d9db8b0d06/lib/i18n/backend/active_record.rb#L35
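
A rough sketch of the idea described above (match the exact key, or keys that start with the key plus the separator, then build a single value or a Hash from the matches). This is not the code behind the links; the ROWS data and the lookup_flat helper are made up for illustration, and a Hash stands in for the translations table.

```ruby
SEPARATOR = "."

# Hypothetical flat rows, as they might sit in a translations table.
ROWS = {
  "number.format.separator" => ".",
  "number.format.delimiter" => ",",
  "number.format.precision" => 3,
  "number.formatting"       => "unrelated",
  "greeting"                => "Hello"
}

def lookup_flat(rows, key)
  # Exact match: the result is a single value.
  return rows[key] if rows.key?(key)

  # Otherwise collect rows whose key starts with "key" + separator.
  prefix = key + SEPARATOR
  matches = rows.select { |k, _| k.start_with?(prefix) }
  return nil if matches.empty?

  # Rebuild the nested Hash from the remaining key segments.
  result = {}
  matches.each do |k, value|
    parts = k[prefix.length..-1].split(SEPARATOR)
    leaf = parts.pop
    node = parts.inject(result) { |hash, part| hash[part.to_sym] ||= {} }
    node[leaf.to_sym] = value
  end
  result
end

number_format = lookup_flat(ROWS, "number.format")
greeting      = lookup_flat(ROWS, "greeting")
```

Note that "number.formatting" is not picked up for "number.format", because the match requires the separator to follow the key.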

> So... would backends
> need to adhere to a fall-through of supporting YAML/File if their
> backend didn't return the correct value...

That's an option. You could use the Chain backend to put an AR backend
in front of the default Simple backend. The Simple backend could
provide all the default Rails translations, while your translators
could work on the app translations using the AR backend.
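
To illustrate the fall-through, here is a simplified stand-in for the chaining idea. It does not use the i18n gem's own classes or API; HashBackend and ChainSketch are hypothetical names, and each backend is just a Hash lookup.

```ruby
# Hypothetical minimal backend: translations held in a nested Hash.
class HashBackend
  def initialize(translations)
    @translations = translations
  end

  def lookup(locale, key)
    @translations.fetch(locale, {})[key]
  end
end

# Asks each backend in order and returns the first non-nil answer.
class ChainSketch
  def initialize(*backends)
    @backends = backends
  end

  def lookup(locale, key)
    @backends.each do |backend|
      value = backend.lookup(locale, key)
      return value unless value.nil?
    end
    nil
  end
end

# Stand-in for the database-backed app translations.
db_backend = HashBackend.new(en: { "site.title" => "My Shop" })

# Stand-in for the file-based defaults (e.g. Rails' number formats).
defaults = HashBackend.new(
  en: {
    "site.title"    => "Default title",
    "number.format" => { separator: ".", delimiter: "," }
  }
)

chain = ChainSketch.new(db_backend, defaults)
title  = chain.lookup(:en, "site.title")     # answered by db_backend
format = chain.lookup(:en, "number.format")  # falls through to defaults
```

With this ordering the translators' entries win wherever they exist, and anything they have not touched still resolves from the defaults.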

> or, (I'm assuming) would
> backends need to support tree/hierarchy lookups like this?

I'm not sure you need that. Maybe we're still missing something with
the shipped implementation (there hasn't been much feedback). But in
theory it should do what you need.

> 2. I noticed the "active_record" and "active_record2" branches.
> -> What are the differences between the two?

Interesting question :) Looking at the commit history there might not
be any difference. Maybe I just forgot to delete active_record2.

> -> Are either in use in any production environments?

I don't know of any.

> -> How are they going to limit the database lookups on heavily
> translated pages?

You can put the Cache module in front of it. It uses
ActiveSupport::Cache, so it gives you a bunch of options.

Beware that the Cache module does not (yet) take care of the cache
expiration that will probably be necessary for your use case. Please
let me know if there are any issues with this.

Thanks :)

Sven

Iain Hecker

Aug 26, 2009, 1:16:44 PM

to rails...@googlegroups.com

Hi Dylan,

I don't know about the alternative i18n branches, but on the other point: Rails depends on datatypes other than strings for certain functions. You should probably serialize/deserialize the values and display them in some other way, or make some sort of fallback to YAML for these things and exclude them from your admin view. To be honest, I don't know how to represent a nested hash in HTML.

Iain

Dan Coutu

Aug 26, 2009, 3:00:37 PM

to rails...@googlegroups.com

Dylan, I would suggest that you investigate the code within the
Globalize2 plugin. It uses a database backend for holding translations.

They may have already figured out a solution for you.

Dan

dylanz

Aug 26, 2009, 3:35:06 PM

to rails-i18n

Awesome, thanks for the thorough answers!

> It basically joins the key to "foo.bar.baz" and then looks for records
> that either have exactly that key or start with that key with the
> separator following.
> http://github.com/svenfuchs/i18n/blob/df498763cd1968c58900d66a322325d9db8b0d06/lib/i18n/backend/active_record/translation.rb#L18

In regard to that Hash/Key lookup, that's a great idea, but I don't
think it would work very well if you wanted to put a Cache store in
front... as you would be doing wildcard lookups. For what it's worth,
those are extremely slow in Memcached, and Memcached doesn't support
deletes via wildcard, so that operation would be much, much slower.

> Beware that the Cache module does not (yet) take care of any cache
> expiration that probably will be necessary for your use case. Please
> let me know if there are any issues with this.

Cool, and no problem! Since they are simple translations, we don't
really want to expire them. We can just remove the I18n.t() tag from
the text if we don't need the text translated, delete the key, or,
just update the value if the translation value needs to change. This
way, we don't need to worry about cache expiration at all.

In our implementation, I did use the ActiveSupport::Cache, so you can
declare something like this (I18n.backend.cache_store
= :memory_store), and it will use whatever you provide. Of course, it
would be wise to use something distributed like Memcached or Ehcache
(plug: http://github.com/dylanz/ehcache/), so you don't have
inconsistent translations in local memory across your array.

So without the Cache in front, I think the ActiveRecord Backend would
be pretty expensive, in terms of IO/CPU. For example, on an initial
request to our application (without a warm cache store), it fires off
hundreds of database requests to get the translation records and set
them in the cache. Granted, they are all pretty quick requests, but
it's still a pretty hefty hit to the database, and not something
you'd want to do on every request on a heavily trafficked site. (I
think that Globalize2 uses an in-memory Cache as an intermediate layer,
but I think you'd still run into consistency issues if you were using
an array of machines.) Throwing a distributed Cache layer like
Memcached in between is great, as even though you're still doing
hundreds of cache requests per request, they are much, much cheaper.

What do you think? I'm all ears, as I'd love to help get a scalable
backend in place :)
Thanks!
==
Dylan

Lawrence Pit

Aug 26, 2009, 11:13:00 PM

to rails...@googlegroups.com

> http://github.com/svenfuchs/i18n/blob/df498763cd1968c58900d66a322325d9db8b0d06/lib/i18n/backend/active_record/translation.rb#L18
>
> In regard to that Hash/Key lookup, that's a great idea, but I don't
> think it would work very well if you wanted to put a Cache store in
> front... as you would be doing wildcard lookups.

First you get a cache miss, you use that db lookup method, you get a hash in return, store the hash in memcache.

> So without the Cache in front, I think the ActiveRecord Backend would
> be pretty expensive, in terms of IO/CPU. For example, on an initial
> request to our application (without a warm cache store), it fires off
> 100's of database requests to get the translation records, and set
> them in the cache.

We fire one query per locale, loading all translations in one go.

> Granted they are all pretty quick requests, it's
> still a pretty hefty hit to the database, and I don't think anything
> you'd want to do on a heavily trafficked site, on every request. (I
> think that Globalize2 uses an in-memory Cache as an intermediate layer,
> but I think you'd still run into consistency issues if you were using
> an array of machines.) Throwing a distributed Cache layer like
> Memcached in between is great, as even though you're still doing
> 100's of cache requests per request, they are much, much cheaper.

And much, much more expensive than in-memory lookups.

Have a look at fast_gettext for an indication of how fast things can be.

With regards to syncing an array of machines: if you use a db / in-memory combo, then you need some kind of release mechanism for your translations instead of immediately propagating a single translation string when it's modified. Usually this release would coincide with a new code release, so you simply piggyback on what you already have wrt deployments.
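
A minimal sketch of the db / in-memory combo with an explicit release step, under stated assumptions: LocaleStore and release! are hypothetical names, and the block passed to the constructor stands in for the single per-locale database query.

```ruby
# Hypothetical in-memory store: one "query" per locale, reloaded on release.
class LocaleStore
  attr_reader :queries

  # The block stands in for one query that loads all rows for a locale.
  def initialize(&loader)
    @loader = loader
    @locales = {}
    @queries = 0
  end

  def lookup(locale, key)
    translations = @locales[locale] ||= load_locale(locale)
    translations[key]
  end

  # Called as part of a deployment or a translation release.
  def release!
    @locales.clear
  end

  private

  def load_locale(locale)
    @queries += 1
    (@loader.call(locale) || {}).dup.freeze
  end
end

table = { en: { "greeting" => "Hello" } }   # stand-in for the DB
store = LocaleStore.new { |locale| table[locale] }

before = store.lookup(:en, "greeting")
table[:en] = { "greeting" => "Hi" }         # a translator edits the DB
stale  = store.lookup(:en, "greeting")      # still the loaded snapshot
store.release!
fresh  = store.lookup(:en, "greeting")      # reloaded after the release
```

Each process keeps serving its loaded snapshot until release! is called, which is why the release has to be triggered on every machine, for example from the deployment scripts.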

Lawrence

Sven Fuchs

Aug 27, 2009, 3:18:29 AM

to rails...@googlegroups.com

Hi Dylan,

On 26.08.2009, at 21:35, dylanz wrote:
>> It basically joins the key to "foo.bar.baz" and then looks for
>> records
>> that either have exactly that key or start with that key with the
>> separator following.
>> http://github.com/svenfuchs/i18n/blob/df498763cd1968c58900d66a322325d9db8b0d06/lib/i18n/backend/active_record/translation.rb#L18
>
> In regard to that Hash/Key lookup, that's a great idea, but I don't
> think it would work very well if you wanted to put a Cache store in
> front... as you would be doing wildcard lookups. For what it's worth,
> those are extremely slow in Memcached, and Memcached doesn't support
> deletes via wildcard, so that operation would be much, much slower.

Maybe I'm missing something but I don't see a problem here. The Cache
just assumes that the backend is idempotent in the sense that for a
given set of arguments it always returns the same value. The arguments
can then be used as a cache key. When the backend returns a Hash of
translations for a given key, the Cache caches that. Same when the
backend just returns a single translation. Also, the Cache is built so
that when the backend raises an exception (e.g. translation missing)
for a given key, the Cache will do the same.
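
As a sketch of that behaviour (not the actual Cache module): the arguments form the cache key, a Hash result is cached as one entry just like a single string, and a missing translation is remembered and raised again on later lookups. CachingWrapper, CountingBackend and MissingTranslation are hypothetical names, and a plain Hash stands in for the cache store.

```ruby
class MissingTranslation < StandardError; end

# Hypothetical backend that counts how often it is actually asked.
class CountingBackend
  attr_reader :calls

  def initialize(data)
    @data = data
    @calls = 0
  end

  def translate(locale, key)
    @calls += 1
    @data.fetch([locale, key]) { raise MissingTranslation, "#{locale}.#{key}" }
  end
end

class CachingWrapper
  def initialize(backend, store = {})
    @backend = backend
    @store = store
  end

  def translate(locale, key)
    # The arguments themselves form the cache key; no wildcards involved.
    cache_key = [locale, key].join("/")
    entry = @store[cache_key] ||= begin
      [:ok, @backend.translate(locale, key)]
    rescue MissingTranslation => e
      [:missing, e.message]   # remember the miss as well
    end
    raise MissingTranslation, entry[1] if entry[0] == :missing
    entry[1]
  end
end

backend = CountingBackend.new({ [:en, "number.format"] => { delimiter: "," } })
cached  = CachingWrapper.new(backend)
format_a = cached.translate(:en, "number.format")
format_b = cached.translate(:en, "number.format")  # no second backend call
```

Because the whole Hash is stored under the one key that was asked for, the cache store never needs a wildcard read.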

> Cool, and no problem! Since they are simple translations, we don't
> really want to expire them. We can just remove the I18n.t() tag from
> the text if we don't need the text translated, delete the key, or,
> just update the value if the translation value needs to change. This
> way, we don't need to worry about cache expiration at all.

Not sure I follow. But it sounds great. :)

> In our implementation, I did use the ActiveSupport::Cache, so you can
> declare something like this (I18n.backend.cache_store
> = :memory_store), and it will use whatever you provide. Of course, it
> would be wise to use something distributed like Memcached or Ehcache
> (plug: http://github.com/dylanz/ehcache/), so you don't have
> inconsistent translations in local memory across your array.

I'll have to have a look at ehcache. Yeah, the Cache module uses
ActiveSupport::Cache, too.

> So without the Cache in front, I think the ActiveRecord Backend would
> be pretty expensive, in terms of IO/CPU.

I guess no matter how an AR backend is implemented, it always should
have some kind of cache in front of it.

> For example, on an initial
> request to our application (without a warm cache store), it fires off
> 100's of database requests to get the translation records, and set
> them in the cache. Granted they are all pretty quick requests, it's
> still a pretty hefty hit to the database, and I don't think anything
> you'd want to do on a heavily trafficked site, on every request.

On every request? Certainly not.

> Throwing a distributed Cache layer like
> Memcached in between is great, as even though you're still doing
> 100's of cache requests per request, they are much, much cheaper.

Also, a cache warmup task could be an interesting idea.

> What do you think? I'm all ears, as I'd love to help get a scalable
> backend in place :)

Great! Any help is highly appreciated. Could you give the backend in
i18n/active_record a try in any of your apps?

Thanks

Sven

dylanz

Aug 27, 2009, 3:49:34 PM

to rails-i18n

Ok, it all makes a lot more sense now...

> First you get a cache miss, you use that db lookup method, you get a
> hash in return, store the hash in memcache.

Heh... yeah, good point :)

> We fire one query per locale, loading all translations in one go.

Great idea, and when using a local memory store, that makes perfect
sense. Everything you both described makes perfect sense if you're
going to make your translation changes coincide with your
deployments... which isn't necessarily a bad idea at all. In larger
organizations, deployments sometimes don't happen as frequently as one
would like, so we try to make everything independent of the deployment
process. Of course, there isn't any reason why a suite of independent
translation tasks couldn't be whipped up for the case when you want to
push translation changes across your array frequently.

Using a cache store like Ehcache would be interesting, as it could
handle the cache locally, as well as replicating real-time changes
across to the other nodes. That way you'd get the dynamic features of
an external cache instance like Memcached, and the benefit of loading
all the translations up at once as you would with a local memory store.

I'm definitely going to give the i18n/active_record branch a shot on a
project. I'll try and whip up some documentation on the architecture
of the database/cache approach, just to make it clear to newcomers to
this branch (and the database approach in general).

BTW... fast_gettext looks fantastic. I didn't know it existed.
Excellent slides that Grosser published here: http://www.slideshare.net/grosser/fast-gettext

Thanks again guys!
==
Dylan
