data in multiple languages


Christian Schilling

May 15, 2009, 8:47:47 AM
to django-haystack
hi,

is there a preferred way to handle multilingual data in haystack?
i can't find any information on this in the docs or on this list.

there are a few database-content translation solutions out there (i am
using transdb)
and when using something like this, model fields can have different
values based on the current locale.
if i set up my index as shown in the docs, haystack will only index one
version of the data (in the default language).

so my ideas would be:
1) add translated objects to the same SearchIndex multiple times, once
for each language and later filter the
results based on a "language" field.
as far as i can tell this is currently not possible in haystack.

2) create an IndexSite for each language
should be doable but may be problematic if someone supports many
languages.

any thoughts on this?
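
to make idea 1 concrete, here is a rough plain-python sketch (no haystack involved; `expand_translations`, `search` and the document layout are made up for illustration). each translated object becomes one flat document per language, and the query is restricted by the "language" value:

```python
# Hypothetical sketch: none of these names are Haystack API.
LANGUAGES = ["en", "de", "fr"]


def expand_translations(obj_id, translations, languages=LANGUAGES):
    """Return one flat search document per language that has a translation."""
    docs = []
    for lang in languages:
        text = translations.get(lang)
        if text is None:
            continue  # no translation for this language, nothing to index
        docs.append({
            "id": "%s.%s" % (obj_id, lang),  # unique per object and language
            "language": lang,
            "text": text,
        })
    return docs


def search(docs, term, language):
    """Naive substring search restricted to a single language."""
    term = term.lower()
    return [d for d in docs
            if d["language"] == language and term in d["text"].lower()]


docs = expand_translations("entry.1", {"en": "Hello world", "de": "Hallo Welt"})
print(search(docs, "hallo", "de"))
```

the per-language id matters here: without it, the copies of one object would overwrite each other in the index.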

Daniel Lindsley

May 15, 2009, 9:53:46 AM
to django-...@googlegroups.com
Christian,


Ooo, now that's a difficult one. I'm guilty of not having thought
about storing multiple languages at once. I guess I'd imagine that,
for a given model instance that has translated content, you'd make a
SearchIndex that includes fields for each translation. So something
like:

from haystack import indexes

class EntrySearchIndex(indexes.SearchIndex):
    title = indexes.CharField(model_attr='title')
    author = indexes.CharField(model_attr='author')
    # Combine all the languages' content into a single template for
    # best searching maybe?
    content = indexes.CharField(document=True, use_template=True)
    # One field per translation.
    content_en = indexes.CharField(use_template=True)
    content_es = indexes.CharField(use_template=True)
    content_de = indexes.CharField(use_template=True)
    content_fr = indexes.CharField(use_template=True)
    ...

This is (maybe?) painful for many languages or many fields to
translate. You'd also likely need a custom form that searches the
right fields based on that user's language preference. Not sure what
to say beyond this and I'm definitely open to suggestions.
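
As a rough sketch of the "search the right field" part, assuming the content_<lang> naming used in the index above: the helper below is hypothetical (not part of Haystack) and only picks a field name from the user's locale, falling back to the combined content field.

```python
# Hypothetical helper, not part of Haystack. Field names follow the
# content_<lang> convention assumed in this thread.
INDEXED_LANGUAGES = ("en", "es", "de", "fr")


def content_field_for(language_code, indexed=INDEXED_LANGUAGES, default="content"):
    """Map a locale such as 'de-AT' to 'content_de', else the combined field."""
    # Normalise 'de_AT' / 'de-AT' / 'DE' down to the base language 'de'.
    base = (language_code or "").lower().replace("_", "-").split("-")[0]
    return "content_%s" % base if base in indexed else default


print(content_field_for("de-AT"))  # content_de
print(content_field_for("ja"))     # content
```

A custom form could then pass the chosen name to the query set as a keyword filter, for example filter(**{field_name: query}).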


Daniel

Patryk Zawadzki

May 15, 2009, 10:13:02 AM
to django-...@googlegroups.com
On Fri, May 15, 2009 at 3:53 PM, Daniel Lindsley <pola...@gmail.com> wrote:
> Christian,
>
>
>   Ooo, now that a difficult one. I'm guilty of not having thought
> about storing multiple languages at once. I guess I'd imagine that,
> for a given model instance that has translated content, you'd make a
> SearchIndex that includes fields for each translation. So something
> like:
>
> class EntrySearchIndex(indexes.SearchIndex):
>    title = indexes.CharField(model_attr='title')
>    author = indexes.CharField(model_attr='author')
>    # Combine all the languages' content into a single template for
> best searching maybe?
>    content = indexes.CharField(document=True, use_template=True)
>    content_en = indexes.CharField(use_template=True)
>    content_es = indexes.CharField(use_template=True)
>    content_de = indexes.CharField(use_template=True)
>    content_fr = indexes.CharField(use_template=True)
>    ...

Why not create an indexes.TranslatableCharField class that does this
internally? Take the list of languages from settings and upon indexing
loop over the langs, (1) set locale to current iterator value, (2)
render the template, (3) store result as a subfield.

Subfields could be done using a separate SearchIndex where each
language is one field or by dynamically creating fields in the current
SearchIndex.

Then when searching default to using active locale and allow
overriding it with .languages() similarly to how .models() works.
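
The indexing loop described above might look roughly like this. It is only a sketch: prepare_translated is a made-up name, and activate/render are stand-ins passed in by the caller (in a Django project they would presumably wrap django.utils.translation.activate and the field's template rendering).

```python
def prepare_translated(field_name, obj, languages, render, activate):
    """Return {'<field>_<lang>': rendered text} for every language."""
    prepared = {}
    for lang in languages:
        activate(lang)      # (1) set the locale to the current language
        text = render(obj)  # (2) render the template under that locale
        prepared["%s_%s" % (field_name, lang)] = text  # (3) store as subfield
    return prepared


# Stand-ins so the sketch runs without Django.
_state = {"lang": "en"}
TITLES = {"en": "Hello", "de": "Hallo"}


def fake_activate(lang):
    _state["lang"] = lang


def fake_render(obj):
    return obj[_state["lang"]]


fields = prepare_translated("content", TITLES, ["en", "de"],
                            fake_render, fake_activate)
print(fields)  # {'content_en': 'Hello', 'content_de': 'Hallo'}
```

Real code would also want to restore the previously active locale once the loop finishes.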

--
Patryk Zawadzki

christian schilling

May 16, 2009, 5:48:39 AM
to django-...@googlegroups.com


2009/5/15 Patryk Zawadzki <pat...@pld-linux.org>


> Why not create an indexes.TranslatableCharField class that does this
> internally? Take the list of languages from settings and upon indexing
> loop over the langs, (1) set locale to current iterator value, (2)
> render the template, (3) store result as a subfield.

i'm now working on something like this, but instead of a new field type i introduce a "class Meta" on
SearchIndex to declare which fields need translation.
however, i am currently unable to get the test suite to work :-(
i get lots of:


 .Problem installing fixture '/home/christian/code/etrub/packages/repos/haystack/tests/core/fixtures/initial_data.json': Traceback (most recent call last):
  File "/var/lib/python-support/python2.6/django/core/management/commands/loaddata.py", line 119, in handle
    obj.save()
  File "/var/lib/python-support/python2.6/django/core/serializers/base.py", line 163, in save
    models.Model.save_base(self.object, raw=True)
  File "/var/lib/python-support/python2.6/django/db/models/base.py", line 394, in save_base
    created=(not record_exists), raw=raw)
  File "/var/lib/python-support/python2.6/django/dispatch/dispatcher.py", line 148, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/christian/code/etrub/packages/haystack/indexes.py", line 105, in update_object
    self.backend.update(self, [instance])
  File "/home/christian/code/etrub/packages/haystack/backends/whoosh_backend.py", line 104, in update
    self.setup()
  File "/home/christian/code/etrub/packages/haystack/backends/whoosh_backend.py", line 59, in setup
    self.schema = self.build_schema(fields)
  File "/home/christian/code/etrub/packages/haystack/backends/whoosh_backend.py", line 98, in build_schema
    raise SearchBackendError("No fields were found in any search_indexes. Please correct this before attempting to search.")
SearchBackendError: No fields were found in any search_indexes. Please correct this before attempting to search.


anyone seen this error before?

Patryk Zawadzki

May 16, 2009, 6:57:16 AM
to django-...@googlegroups.com
On Sat, May 16, 2009 at 11:48 AM, christian schilling
<init...@googlemail.com> wrote:
> i'm now working on something like this, but instead of a new field type
> introduce a "class Meta" to
> SearchIndex to tell what fields need translation.

I think a new field type is a perfect solution as only the character
fields are translatable per definition so by introducing a new field
type you don't have to deal with people marking ints and pickles as
translatable.

--
Patryk Zawadzki

Christian Schilling

May 16, 2009, 12:26:06 PM
to django-haystack


On May 16, 12:57 pm, Patryk Zawadzki <pat...@pld-linux.org> wrote:
>
> I think a new field type is a perfect solution as only the character
> fields are translatable per definition so by introducing a new field
> type you don't have to deal with people marking ints and pickles as
> translatable.
>

i think we can never know what kind of fields people might want to
translate,
the "class Meta" approach will not fail when someone marks pointless
translations,
it will just waste space.

also, i don't see how to cleanly make a field class add other fields
to the SearchIndex.
the "class Meta" thing on the other hand is quite easy to implement:
http://github.com/initcrash/django-haystack/tree/master
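
as a plain-python illustration of the general idea (this is not the code in the branch above; the `translate` attribute name, `Field` and `expand_translated_fields` are assumptions made up for this sketch), a Meta declaration can be expanded into per-language fields like so:

```python
class Field(object):
    """Minimal stand-in for an index field."""

    def __init__(self, model_attr=None):
        self.model_attr = model_attr


def expand_translated_fields(index_cls, languages):
    """Return {name: Field}, adding <name>_<lang> copies for Meta.translate."""
    fields = dict((name, attr) for name, attr in vars(index_cls).items()
                  if isinstance(attr, Field))
    meta = getattr(index_cls, "Meta", None)
    for name in getattr(meta, "translate", ()):
        base = fields[name]
        for lang in languages:
            # Each copy reads the same model attribute; the value differs
            # only because of the locale active at indexing time.
            fields["%s_%s" % (name, lang)] = Field(model_attr=base.model_attr)
    return fields


class EntryIndex(object):
    title = Field(model_attr="title")
    pub_date = Field(model_attr="pub_date")

    class Meta:
        translate = ("title",)  # hypothetical option name


fields = expand_translated_fields(EntryIndex, ["en", "de"])
print(sorted(fields))  # ['pub_date', 'title', 'title_de', 'title_en']
```

marking pub_date as translatable here would simply create extra copies, which matches the "wastes space but does not fail" behaviour.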

Daniel Lindsley

May 17, 2009, 11:15:51 PM
to django-...@googlegroups.com
Christian,


That's my fault. I had a stale .pyc hanging around that showed
Whoosh's tests as passing. Will fix and commit.


Daniel

Andréas Kündig

Jun 18, 2009, 5:52:02 AM
to django-haystack
One thing to consider is that fields might need to be indexed
differently according to different languages.

I don't know much about haystack, for now I am just looking for a way
to add search to a multilingual site. I noticed the template solr.xml
which is presumably used to generate configurations for solr. The
analyzer defined for text uses the EnglishPorterFilterFactory. I
guess for another language you'd have to define a new field type by
hand, and then find a way to get haystack to use it.

notanumber

Jun 18, 2009, 1:18:41 PM
to django-haystack
Maybe this doesn't apply, but when we had to index translatable
content I simply added a field with the language code to the index and
overrode the default search form with my own that filters search
results based on the current language.

In my index:

language = indexes.CharField(model_attr='language')


In my forms.py:

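from haystack.forms import SearchForm

class SearchLangForm(SearchForm):
    def __init__(self, language, *args, **kwargs):
        super(SearchLangForm, self).__init__(*args, **kwargs)
        # Language code used to restrict results, e.g. 'en'.
        self.language = language

    def search(self):
        # Run the user's query, then keep only documents whose
        # indexed language matches.
        results = self.searchqueryset.auto_query(
            self.cleaned_data['q']
        ).filter(
            language=self.language
        )
        return results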

Greg Brown

Aug 17, 2009, 7:08:30 PM
to django-...@googlegroups.com
Hi everyone,

I'm working on a multi-language index with Simplified Chinese and
English content, and my idea was to just index everything (both
languages) in the one index - so I have a template that looks like

{% for localised_info in object.localised_content_set.all %}
{{ localised_info.title }}
{{ localised_info.blurb }}
{{ localised_info.content }}
{% endfor %}

(localised_content_set is the related_name for the localised content
model's foreignkey to the object)

I'd have thought this would just work, albeit in not the most
sophisticated fashion, but it doesn't seem to like the Chinese
characters. When I put English words in the Chinese content, it finds
them with no problems, but searching for Chinese characters doesn't
work.

Has anyone experienced this, and can anyone point me in the right direction?

Cheers,
Greg





--
http://gregbrown.co.nz/

Daniel Lindsley

Aug 17, 2009, 10:32:18 PM
to django-...@googlegroups.com
Greg,


So the problem is likely that the search engine you're using uses
an English stemmer. Because of that, it doesn't recognize the Chinese
words and can't effectively search on them. Allowing the user to
dynamically swap out a stemmer is going to be very difficult, which is
why it is not in Haystack yet. For now, unless you need the data
together in the same index, it may be best to use two separate indexes
with proper stemmers set up for each.

There is a ticket out there for this (#73) but I think it will
likely be a Haystack 1.1 feature. I don't see solving this problem
well within the next week or so, so I think it will have to get pushed
off.
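
   Besides stemming, tokenization is probably part of it too. Chinese text has no spaces between words, so an analyzer that splits on whitespace sees a whole sentence as one token. The plain-Python sketch below (hypothetical helper names, not tied to any particular backend) contrasts that with overlapping two-character tokens, one common way of handling CJK text:

```python
# -*- coding: utf-8 -*-
import re


def whitespace_tokens(text):
    """Split on whitespace, as a simple English-oriented analyzer might."""
    return text.split()


def cjk_bigrams(text):
    """Return overlapping two-character tokens for runs of CJK ideographs."""
    tokens = []
    for run in re.findall(u"[\u4e00-\u9fff]+", text):
        if len(run) == 1:
            tokens.append(run)  # a lone character is kept as its own token
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


text = u"我喜欢搜索引擎"
print(len(whitespace_tokens(text)))  # 1
print(len(cjk_bigrams(text)))        # 6
```

   With whitespace splitting, a query for 搜索 cannot match because the only indexed token is the full sentence. With bigrams, 搜索 is one of the indexed tokens.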


Daniel