How to perform a simple search ignoring characters with accents (á, é, í, ó, ú)


Joni Bekenstein

Jun 1, 2012, 5:03:53 PM
to django...@googlegroups.com
I need to do a simple search ignoring some characters with accents. The idea would be that "hola" matches the search term "holá".

What I'm currently doing is adding a CharField to my model to store the searchable term. For example:

from django.db import models

class Book(models.Model):
    title = models.CharField(max_length=100)
    # Accent-stripped copy of title, used only for searching
    searchable_title = models.CharField(max_length=100)

When I save a Book I'll replace the accented characters in title with their non-accented counterparts and store the result in searchable_title. Then, I'll also do the same thing with the search term before actually doing the query.
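
A minimal sketch of the accent-stripping step described above, using only the Python standard library. The helper names `strip_accents` and `make_searchable` are made up for illustration and are not from the original post; the same function would be applied both to the title on save and to the search term before querying.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed."""
    # NFD decomposition splits a character such as 'á' into the base
    # letter 'a' followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_searchable(text):
    """Hypothetical helper: accent-stripped, lowercased form for matching."""
    return strip_accents(text).lower()


print(strip_accents("holá"))
print(make_searchable("The book of Café"))
```

Note that this approach also turns "ñ" into "n", which may or may not be what you want for Spanish text.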

I don't like this approach because if I need to add more searchable fields things start getting noisy in my models.

Any ideas?

Kurtis Mullins

Jun 1, 2012, 5:41:44 PM
to django...@googlegroups.com
Maybe you could just build a simple index? It'd basically be a set of keywords, each with a set of matching books.
So in your example, you'd have two keywords:

hola (with accent) -> book1, book2, etc.
hola (without accent) -> (same as previous)

Then just write a routine that runs through your data and indexes it into a table. I can't vouch that this is the best approach - but I think it'd at least solve the problem.
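
A rough in-memory sketch of that keyword index, assuming titles are split on whitespace and that each word is stored under both its original and its accent-stripped form. The names `build_index`, `search` and `normalize` are hypothetical, and a real version would persist the index to a database table rather than a dict.

```python
import unicodedata
from collections import defaultdict


def normalize(word):
    """Lowercase a word and strip its combining accent marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def build_index(titles_by_id):
    """Map each keyword (accented and unaccented) to a set of book ids."""
    index = defaultdict(set)
    for book_id, title in titles_by_id.items():
        for word in title.split():
            index[word.lower()].add(book_id)      # keyword as written
            index[normalize(word)].add(book_id)   # keyword without accents
    return index


def search(index, term):
    """Return the ids of books matching the term, ignoring accents."""
    return index.get(term.lower(), set()) | index.get(normalize(term), set())


books = {1: "Hola mundo", 2: "Holá amigos", 3: "Adiós"}
index = build_index(books)
print(search(index, "hola"))
```

Here both "hola" and "holá" return books 1 and 2, because every word is also indexed under its unaccented form.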


djcoin [Simon Thépot]

Jun 4, 2012, 7:33:45 AM
to Django users
Hi, if you are using PostgreSQL, I released a library a week ago that does
just what you need: https://github.com/djcoin/django-unaccent/

Once you have the `unaccent` function set up in your PostgreSQL database, you
can run unaccented searches from the Django ORM as usual.
E.g., given a book titled "The book of Café",
`Book.objects.filter(title__icontains_unaccent='Cafe')` will match it.

Cheers,
Simon

Peter of the Norse

Jun 4, 2012, 1:20:53 PM
to django...@googlegroups.com
One possibility is to use MySQL. By default it indexes things so that a, á, and À are the same thing. There are some gotchas though: you have to make sure that it’s using an appropriate character set for the languages you’re using. (UTF-8 is a good choice.) There isn’t a comparably good solution for PostgreSQL. While it is possible to write a function and create an index on that function, I haven’t found a way of searching on that index in Django. If anyone knows of a way to do it, I’d love to hear it.
Peter of the Norse



bruno desthuilliers

Jun 5, 2012, 2:31:20 PM
to Django users
Since no one seems to have mentioned it so far: what about using a real
full-text search engine?
(Hint: django-haystack provides a Django-ish, unified API over quite
a few well-known search engines.)

akaariai

Jun 5, 2012, 8:28:12 PM
to Django users
On Jun 4, 4:20 pm, Peter of the Norse <RahmC...@Radio1190.org> wrote:
> One possibility is to use MySQL. By default it indexes things so that a, á, and À are the same thing. There are some gotchas though: you have to make sure that it’s using an appropriate character set for the languages you’re using. (UTF-8 is a good choice.) There’s not a good similar solution for PostgreSQL. While it is possible to write a function, and create an index on that function, I haven’t found a way of searching on that index in Django. If anyone knows of a way to do it, I’d love to hear it.

.extra() will likely allow what you are looking for. But
using .extra() tends to get ugly fast.

- Anssi

Joni Bekenstein

Jun 6, 2012, 1:37:35 PM
to django...@googlegroups.com
django-haystack seemed overkill for me.

django-unaccent looks pretty good, since I'm using PostgreSQL. I'll look into that, thanks!