Proposal: provide postgresql powered full-text search in djangoproject.com


Paolo Melchiorre

May 6, 2017, 6:50:02 PM
to Django developers (Contributions to Django itself)
Hello,

on djangoproject.com, search is powered by Elasticsearch.

Since the site uses PostgreSQL as its database backend, I want to propose using the full-text search functionality provided by the django.contrib.postgres.search module.
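For readers unfamiliar with the module: django.contrib.postgres.search provides SearchVector, SearchQuery and SearchRank, which compile down to PostgreSQL's to_tsvector, plainto_tsquery and ts_rank functions and the @@ match operator. The sketch below hand-builds roughly that SQL as a string so the moving parts are visible. It is not the ORM's exact output, and the table name docs_document and the columns title/content in the example are hypothetical.

```python
def build_search_sql(table, fields, text, config="english"):
    """Return (sql, params) for a ranked full-text search over `table`.

    `table` and `fields` are interpolated into the SQL, so they must be
    trusted identifiers; the search text and config are bound parameters
    using psycopg2-style named placeholders.
    """
    if not fields:
        raise ValueError("at least one field is required")
    # Roughly what SearchVector(*fields, config=config) builds: the columns,
    # NULL-safe, joined with spaces and parsed into a tsvector.
    document = " || ' ' || ".join(f"COALESCE({name}, '')" for name in fields)
    vector = f"to_tsvector(%(config)s::regconfig, {document})"
    # Roughly SearchQuery(text, config=config), which uses plainto_tsquery by default.
    query = "plainto_tsquery(%(config)s::regconfig, %(text)s)"
    sql = (
        # ts_rank is the function SearchRank(vector, query) wraps.
        f"SELECT id, ts_rank({vector}, {query}) AS rank "
        f"FROM {table} "
        # @@ is PostgreSQL's text-search match operator.
        f"WHERE {vector} @@ {query} "
        "ORDER BY rank DESC"
    )
    return sql, {"config": config, "text": text}
```

For example, build_search_sql("docs_document", ["title", "content"], "model field") returns the query together with {"config": "english", "text": "model field"}, ready to pass to a psycopg2 cursor.execute(sql, params).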

I presented a talk "Full-Text Search in Django with PostgreSQL" at the last PyConIT 2017 Conference in Florence
https://twitter.com/pauloxnet/status/850766131338117120
and I proposed a similar talk for the next EuroPython 2017 
https://ep2017.europython.eu/conference/voting/#ord128

If you're interested in this proposal, it would be nice to organize a related sprint at the next EuroPython 2017
https://ep2017.europython.eu/en/events/sprints

-- 
Paolo

Florian Apolloner

May 7, 2017, 3:16:48 AM
to Django developers (Contributions to Django itself)
What would be the benefit of using django.contrib.postgres aside from a lot of work?

Paolo Melchiorre

May 7, 2017, 4:53:27 AM
to django-d...@googlegroups.com
On Sun, May 7, 2017 at 9:16 AM, Florian Apolloner <f.apo...@gmail.com> wrote:
> On Sunday, May 7, 2017 at 12:50:02 AM UTC+2, Paolo Melchiorre wrote:
>>
>> Hello,
>>
>> in the djangoproject.com the search is powered by elasticsearch.
>>
>> Since the site uses postgresql as database backend I want propose to use
>> the Full-Text Search function provided by django.contrib.postgres.search
>> module.
>>
>> I presented a talk "Full-Text Search in Django with PostgreSQL" at the
>> last PyConIT 2017 Conference in Florence
>> https://twitter.com/pauloxnet/status/850766131338117120
>> and I proposed a similar talk for the next EuroPython 2017
>> https://ep2017.europython.eu/conference/voting/#ord128
>>
>> If you're interested in this proposal it will be nice to organize a
>> related sprints at the next EuroPython 2017
>> https://ep2017.europython.eu/en/events/sprints
>
> What would be the benefit of using django.contrib.postgres aside from a lot of
> work?

The benefit would be having djangoproject.com built with the
technologies the site talks about, and demonstrating what you can
build using only officially documented Django modules.

As for the work, I volunteer to do it myself!

--
Paolo

Adam Johnson

May 7, 2017, 6:45:27 AM
to django-d...@googlegroups.com
I guess we'd also have the benefit of not having to keep elasticsearch running.

But I'm afraid I'm not familiar with Postgres. Is the FTS in Postgres mostly equivalent to ES, or will some kinds of search queries be affected?


--
Paolo

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAKFO%2Bx59T7ykkk9WT%3DoAyvG9xX-vxp4gSN0gLRAmSmz%2Byo1bqQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Adam

Florian Apolloner

May 7, 2017, 8:22:49 AM
to Django developers (Contributions to Django itself)
On Sunday, May 7, 2017 at 12:45:27 PM UTC+2, Adam Johnson wrote:
> I guess we'd also have the benefit of not having to keep elasticsearch running.

On the contrary, putting it into Postgres means we have to take care of it. Putting it into Elasticsearch means we can let our hoster take care of that.

> But I'm afraid I'm not familiar with Postgres. Is the FTS in Postgres mostly equivalent to ES, or will some kinds of search queries be affected?

For the queries we use, I think we can get pretty much equivalent results.

Marc Tamlyn

May 8, 2017, 9:00:13 AM
to django-d...@googlegroups.com
I'm not sure I see the benefit here. The strength and purpose of postgres FTS is that you can combine some FTS behaviour with some relational queries easily at the same time. I'm pretty sure our search requirements on dp.com need that, so using a dedicated search provider is a better option.
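To make the point about mixing full-text and relational conditions concrete, here is a sketch of a single PostgreSQL query that filters on ordinary columns and on a text match at the same time. The documents table and its lang and release columns are hypothetical, not the actual djangoproject.com schema; in the ORM this would roughly be a .filter() on the regular fields combined with a SearchQuery.

```python
def build_filtered_search_sql(text, lang, release, config="english", limit=20):
    """Return (sql, params): a full-text match plus ordinary column filters.

    Assumes a hypothetical table documents(id, title, body, lang, release).
    """
    vector = "to_tsvector(%(config)s::regconfig, body)"
    query = "plainto_tsquery(%(config)s::regconfig, %(text)s)"
    sql = (
        f"SELECT id, title, ts_rank({vector}, {query}) AS rank "
        "FROM documents "
        # Relational predicates and the text match share one WHERE clause,
        # so the database evaluates them together in a single query.
        f"WHERE lang = %(lang)s AND release = %(release)s AND {vector} @@ {query} "
        "ORDER BY rank DESC "
        "LIMIT %(limit)s"
    )
    params = {
        "config": config,
        "text": text,
        "lang": lang,
        "release": release,
        "limit": limit,
    }
    return sql, params
```

With a dedicated search service, the same request needs the language and release stored as filterable fields in the search index as well.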


Adam Johnson

May 8, 2017, 9:40:48 AM
to django-d...@googlegroups.com
> I'm pretty sure our search requirements on dp.com need that,

s/need/don't need/ ? 😉 





--
Adam

Marc Tamlyn

May 8, 2017, 10:07:58 AM
to django-d...@googlegroups.com

Tobias McNulty

May 8, 2017, 11:14:58 AM
to django-developers
I'm no FTS expert, but based just on the facts raised in this thread, if using Postgres FTS
  1. would break neither existing nor potential search needs (in fact it might expand the functionality available) and
  2. would allow eliminating an entire service from the infrastructure
that seems like a net win to me and as such at least worth exploring further. That is not to say I think we should commit to switching, but if we have volunteers who are excited to flesh out this proposal with some code and understand there's no guarantee it will actually get merged, I don't (yet) see a reason to say no.

Tobias

Tobias McNulty
Chief Executive Officer

tob...@caktusgroup.com
www.caktusgroup.com


Tim Graham

May 8, 2017, 12:28:06 PM
to Django developers (Contributions to Django itself)
I agree that eliminating elasticsearch could be a simplification win from a maintenance perspective. For example, I spent some hours a few months ago debugging a problem with a new version of elasticsearch that caused our cluster to run out of memory and lock up every ~24 hours. Also, not having to set up elasticsearch to contribute to the docs.djangoproject.com search is nice. On the other hand, I wonder how moving the search load to PostgreSQL will affect server load, disk usage, etc.
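The load and disk-usage question depends largely on how the search vector is stored. A common PostgreSQL pattern (which Django exposes through SearchVectorField and django.contrib.postgres.indexes.GinIndex) is to keep a precomputed tsvector column and index it with GIN, trading extra disk space for cheaper queries. The sketch below only generates the DDL as strings; the table docs_document and the columns title/content are hypothetical, and a real deployment would also need a trigger or application code to keep the column current.

```python
def search_index_ddl(table, fields, config="english"):
    """Return DDL that precomputes a tsvector column and indexes it with GIN.

    Identifiers and `config` are interpolated directly, so they must be
    trusted constants, never user input.
    """
    if not fields:
        raise ValueError("at least one field is required")
    # Same NULL-safe, space-joined document a per-query to_tsvector call would parse.
    document = " || ' ' || ".join(f"COALESCE({name}, '')" for name in fields)
    return [
        # Extra disk: one stored tsvector per row.
        f"ALTER TABLE {table} ADD COLUMN search_vector tsvector",
        # One-off backfill; new and edited rows need a trigger or app code.
        f"UPDATE {table} SET search_vector = "
        f"to_tsvector('{config}'::regconfig, {document})",
        # More disk for the index, but @@ queries can use it instead of
        # computing to_tsvector for every row at query time.
        f"CREATE INDEX {table}_search_idx ON {table} USING GIN (search_vector)",
    ]
```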

Curtis Maloney

May 12, 2017, 4:13:11 AM
to django-d...@googlegroups.com, Paolo Melchiorre
Dogfooding is a fairly strong argument, IMHO.

Especially when there's a volunteer to do the work.

--
C
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Aymeric Augustin

May 12, 2017, 5:42:51 AM
to django-d...@googlegroups.com, Paolo Melchiorre
On 7 May 2017, at 11:32, Curtis Maloney <cur...@tinbrain.net> wrote:
> Dogfooding is a fairly strong argument, IMHO.
>
> Especially when there's a volunteer to do the work.
>
> --
> C

I agree.

I was mildly concerned about the effect on relevance, but the current search isn't all that good.

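A quick test shows that it doesn't handle normalization:
https://docs.djangoproject.com/fr/1.11/search/?q=mod%C3%A8le
https://docs.djangoproject.com/fr/1.11/search/?q=modele

or stemming:
https://docs.djangoproject.com/fr/1.11/search/?q=documentation
https://docs.djangoproject.com/fr/1.11/search/?q=documentations

or stopwords:
https://docs.djangoproject.com/fr/1.11/search/?q=de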

These features are desirable and easy to configure with ES, but that wasn't done, perhaps for lack of familiarity with ES.

Apparently they're also doable with Postgres FTS. If we can remove ES from the stack and improve relevance, that would be great.

-- 
Aymeric.
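The three gaps Aymeric lists (accents, plurals, stopwords) can be illustrated with a toy text-processing pipeline. This is not how PostgreSQL or Elasticsearch implement them: PostgreSQL's built-in french configuration uses a Snowball stemmer and stopword list, and accent folding needs the unaccent extension. The stopword set and the plural-stripping rule below are deliberately tiny, hypothetical stand-ins.

```python
import re
import unicodedata

# Tiny, hypothetical stopword subset; real French stopword lists are much longer.
STOPWORDS = {"de", "des", "du", "la", "le", "les", "et"}


def normalize(token):
    """Lowercase and strip accents: NFKD splits "è" into "e" plus a combining grave."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Naive plural stripping; a real (e.g. Snowball) stemmer does far more."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def lexemes(text):
    """Turn text into the list of terms that would be indexed or searched."""
    terms = []
    for word in re.findall(r"\w+", text):
        word = normalize(word)
        if word in STOPWORDS:
            continue  # stopwords are dropped entirely
        terms.append(stem(word))
    return terms
```

With this pipeline, lexemes("La documentation des modèles") returns ["documentation", "modele"], so queries for "modele" or "documentations" reach the same terms, and a query for "de" alone matches nothing.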


Paolo Melchiorre

Jul 8, 2017, 2:10:26 PM
to django-d...@googlegroups.com
Hi all,

I'm going to start a personal branch with PostgreSQL full-text search functionality for the djangoproject.com website.

I would like to sprint on it during the next EuroPython 2017 in Rimini, and I've added the sprint proposal to the wiki:
https://wiki.python.org/moin/EuroPython2017/Sprints

I'll be there on Saturday 15/07/2017, and if anyone would like to join me I would be happy.

-- 
Paolo



Jannis Leidel

Jul 15, 2017, 12:37:50 PM
to Django developers, Paolo Melchiorre

> On 12. May 2017, at 11:42, Aymeric Augustin <aymeric....@polytechnique.org> wrote:
>
>> On 7 May 2017, at 11:32, Curtis Maloney <cur...@tinbrain.net> wrote:
>>
>> Dogfooding is a fairly strong argument, IMHO.
>>
>> Especially when there's a volunteer to do the work.
>>
>> --
>> C
>
> I agree.
>
> I was mildly concerned about the effect on relevance, but the current search isn't all that good.
>
> A quick test shows that it doesn't handle normalization:
> https://docs.djangoproject.com/fr/1.11/search/?q=mod%C3%A8le
> https://docs.djangoproject.com/fr/1.11/search/?q=modele
>
> or stemming:
> https://docs.djangoproject.com/fr/1.11/search/?q=documentation
> https://docs.djangoproject.com/fr/1.11/search/?q=documentations
>
> or stopwords:
> https://docs.djangoproject.com/fr/1.11/search/?q=de
>
> These features are desirable and easy to configure with ES, but that wasn't done, perhaps for lack of familiarity with ES.

It was not for lack of familiarity with ES but for lack of time when we worked on the redesign and first fundraiser round, two years ago. Gratis. In our spare time (https://github.com/django/djangoproject.com/pull/303 et al).

You of all people should know that comments like that are hurtful as they imply we just didn’t care enough to learn ES. But in fact both Tim and Honza (an elastic employee) reviewed the code and made lots of suggestions.

Since Paolo volunteered to work on an improved version, what does that say about the expectations you’ve set for him now?

Jannis

Aymeric Augustin

Jul 24, 2017, 2:33:47 AM
to django-d...@googlegroups.com, Paolo Melchiorre
Yes, I should know better. Please accept my apologies.

In my own experience with ES, the learning curve is rough. It can take hours or days to figure out how to optimize a single line of configuration. That's what I had in mind when I said "familiarity". To be honest, the bigger problem in that sentence is "easy to configure"... Anyway: I'm sorry.

Best regards,

-- 
Aymeric.

Paolo Melchiorre

Nov 12, 2017, 11:49:26 AM
to django-d...@googlegroups.com
Hi all.

Thanks for your feedback on my proposal to use PostgreSQL FTS on djangoproject.com.

After my talk "Full-Text Search with PostgreSQL in Django" at EuroPython 2017, I organized a sprint to work on a new branch of djangoproject.com, as I had proposed.
https://twitter.com/i/moments/886314684588208128

I've been a little busy in recent months, but in the last few days I found the time to complete the work I started. I've created a PR against the latest version of djangoproject.com with FTS working on PostgreSQL. My PR replaces all the features of the Elasticsearch FTS, and you can test it locally.
https://github.com/django/djangoproject.com/pull/797

Could you send me some suggestions on it? Thanks.

Regards,
Paolo

P.S. I've developed the multilingual search too, but it needs Django 2.0 because of this commit:
https://github.com/django/django/commit/19b2dfd1bfe7fd716dd3d8bfa5f972070d83b42f
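Multilingual search implies choosing a PostgreSQL text search configuration per documentation language. The sketch below is not taken from the PR: the SEARCH_CONFIGS mapping and the search_config_for() helper are hypothetical, while the configuration names are PostgreSQL built-ins. Languages without a built-in configuration fall back to simple, which lowercases words but does no stemming.

```python
# Hypothetical mapping from docs language codes to built-in PostgreSQL
# text search configurations.
SEARCH_CONFIGS = {
    "en": "english",
    "fr": "french",
    "es": "spanish",
    "it": "italian",
    "pt": "portuguese",
    "de": "german",
    "nl": "dutch",
    "ru": "russian",
}


def search_config_for(language_code):
    """Return the text search configuration name for a language code."""
    code = language_code.lower()
    # Try the full code first ("pt-br"), then its base language ("pt"),
    # and finally fall back to the language-agnostic "simple" configuration.
    base = code.split("-")[0]
    return SEARCH_CONFIGS.get(code) or SEARCH_CONFIGS.get(base, "simple")
```

The returned name could then be passed as the config argument that SearchVector and SearchQuery accept.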

