ElasticSearch and non-wagtail content


Scot Hacker

Mar 3, 2015, 1:29:41 PM
to wag...@googlegroups.com
My project involves a mixture of wagtail and non-wagtail content - most of our content is being pulled in from various external campus databases and systems and is not human-editable.

I was able to get ElasticSearch working with my wagtail models easily. But users of course expect that search field to search all content on the site. So the question now is whether I can somehow configure my non-wagtail models to populate the elastic index, and to be returned in search results.

Any best practices or suggestions on this?

Thanks.

./s

Karl Hobley

Mar 4, 2015, 6:08:49 AM
to wag...@googlegroups.com
Hi Scot,

Getting your custom models into the Elasticsearch index is documented here: http://docs.wagtail.io/en/stable/core_components/search/indexing.html#indexing-non-page-models


Wagtail's search mechanism is built around Django querysets, which, unfortunately, can only be used on one model at a time.


On a recent project, we were able to search across multiple models with a single query by modifying Wagtail's ElasticSearchQuery class. This was a bit fiddly, but it seems to work well. (code: https://gist.github.com/kaedroho/3f7d21c4495b8ef1010a)

This was made a bit easier for us as all the things we were searching on were pages (and we only needed to filter on fields in the specific page models). In order to mix pages with non-pages, it may require some tweaking of this bit of code (https://github.com/torchbox/wagtail/blob/master/wagtail/wagtailsearch/backends/elasticsearch.py#L287) to make it return each item in its correct model.
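Whatever the query mechanism, "return each item in its correct model" comes down to three steps: group the hits per model, load each model in bulk, and put the objects back in rank order. A minimal sketch, assuming you have already extracted `(model_label, pk)` pairs from the raw hits (how Wagtail stores those in its index is not covered here); `resolve_hits` and the `loaders` mapping are hypothetical names:

```python
from collections import OrderedDict


def resolve_hits(hits, loaders):
    """Turn ranked search hits into model instances without losing the rank.

    hits    -- list of (model_label, pk) tuples, best match first.
    loaders -- dict mapping each model_label to a function that takes a list
               of pks and returns a dict {pk: instance}.
    """
    # Group primary keys per model so each model is fetched in one query.
    pks_by_model = OrderedDict()
    for label, pk in hits:
        pks_by_model.setdefault(label, []).append(pk)

    # Load every model's objects in bulk, keyed by (label, pk).
    loaded = {}
    for label, pks in pks_by_model.items():
        for pk, obj in loaders[label](pks).items():
            loaded[(label, pk)] = obj

    # Rebuild the original order; hits whose object no longer exists in the
    # database (stale index entries) are skipped.
    return [loaded[hit] for hit in hits if hit in loaded]
```

With Django, a loader can simply be `Profile.objects.in_bulk` (it takes a list of pks and returns a `{pk: instance}` dict). Make sure the pk types match, since ids read back from the index may be strings.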

This is something that we could create some helpers within Wagtail for. I don't think this is a very uncommon thing to want to do with search.


Another possible solution may be to perform a query for each model and combine them in the app. But I think the above would run much faster, especially if you have many models to search.

Hope this helps,

Karl
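Karl's alternative, one query per model combined in the app, amounts to merging several already-ranked lists. A minimal sketch, assuming each per-model query gives you `(score, result)` pairs sorted best-first; the standard library can merge them lazily without re-sorting everything (the `key` and `reverse` arguments need Python 3.5+). `merge_ranked` is a hypothetical name:

```python
import heapq
from operator import itemgetter


def merge_ranked(*result_lists):
    """Merge several lists of (score, result) pairs, each already sorted
    best-first, into a single best-first list of results."""
    # heapq.merge walks the inputs lazily instead of re-sorting everything;
    # reverse=True because every input is in descending score order.
    merged = heapq.merge(*result_lists, key=itemgetter(0), reverse=True)
    return [result for _score, result in merged]
```

This still runs one query per model, so Karl's point about speed with many models still applies.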

Scot Hacker

Mar 5, 2015, 2:29:10 PM
to wag...@googlegroups.com


On Wednesday, March 4, 2015 at 3:08:49 AM UTC-8, Karl Hobley wrote:
> Hi Scot,
>
> Getting your custom models into the Elasticsearch index is documented here: http://docs.wagtail.io/en/stable/core_components/search/indexing.html#indexing-non-page-models

Ah, thanks - I missed that somehow.
 

> Wagtail's search mechanism is built around Django querysets, which, unfortunately, can only be used on one model at a time.

Hrrm, true. 
 


> On a recent project, we were able to search across multiple models with a single query by modifying Wagtail's ElasticSearchQuery class. This was a bit fiddly, but it seems to work well. (code: https://gist.github.com/kaedroho/3f7d21c4495b8ef1010a)
>
> This was made a bit easier for us as all the things we were searching on were pages (and we only needed to filter on fields in the specific page models). In order to mix pages with non-pages, it may require some tweaking of this bit of code (https://github.com/torchbox/wagtail/blob/master/wagtail/wagtailsearch/backends/elasticsearch.py#L287) to make it return each item in its correct model.

I hate to modify core components and wonder if it might be more straightforward to use Django Haystack instead, and configure it to index both wagtail and non-wagtail models? Anyone have experience with that approach?

 

> This is something that we could create some helpers within Wagtail for. I don't think this is a very uncommon thing to want to do with search.

Agreed - it would be a really nice addition. 
 


> Another possible solution may be to perform a query for each model and combine them in the app. But I think the above would run much faster, especially if you have many models to search.

Thanks for the feedback!

Scot Hacker

Aug 28, 2015, 8:24:23 PM
to Wagtail support
Coming back around on this now, and needing to get a solution in place fairly quickly, just wanted to clarify some thoughts on this.

Problem: Need to provide universal search for a site that mixes wagtail and non-wagtail content. Wagtail makes it easy to get data from non-wagtail models into ElasticSearch, and it makes it easy to get data out one model at a time. But for truly integrated search across models, it seems like the only current options are:

1) Convert all of your non-wagtail models so they inherit from Page.
Cons: Doesn't feel right to modify internal data structures for this - lots of our data is definitely not page-oriented. And some of our data will come from read-only external data sources (reflected internally as Django models, but not modifiable). Not to mention that native wagtail content that inherits from Image or Document should be unified too.

2) Fork the wagtailsearch app and modify it so that it queries multiple models in turn and aggregates results.
Cons: Hate to fork :)

Cons: Tricky to implement and get right

4) Use Django Haystack instead
Pro: Works with unlimited model types out of the box
Con: Giving up the sweet ElasticSearch hooks already provided by Wagtail. Additional complexity in our project.

I'm going to go with #2 temporarily, but since Karl mentioned:
"This is something that we could create some helpers within Wagtail for. I don't think this is a very uncommon thing to want to do with search."

Yes please! Is this on the current roadmap by chance?

Matthew Westcott

Aug 29, 2015, 3:11:36 AM
to wag...@googlegroups.com
Just a quick thought - if Wagtail's queryset-oriented helpers don't fit your use case, then there ought to be nothing stopping you from running queries directly on the ElasticSearch index, using the elasticsearch python library that Wagtail uses behind the scenes. That way, there's no need to monkeypatch anything, and no need to reimplement the indexing side of things in Haystack.

Cheers,
- Matt
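Querying the index directly means writing the Elasticsearch request and reading the raw response yourself. A minimal sketch of those two pieces: a request body that runs one text query against several fields at once (Elasticsearch's `multi_match` query), and a helper that pulls type, id and score out of each hit. `build_body` and `extract_hits` are hypothetical names; field names like `title` and `about` are placeholders, since the real names depend on how Wagtail's mapping stores each field (inspect the index mapping first); and reading `_type` assumes an Elasticsearch version that still returns it on each hit, as the versions of this era did.

```python
def build_body(query_string, fields, size=20):
    """Request body that runs one text query against several fields at once."""
    return {
        "query": {
            "multi_match": {
                "query": query_string,
                "fields": list(fields),
            }
        },
        "size": size,
    }


def extract_hits(response):
    """Pull (doc_type, doc_id, score) out of a raw search response."""
    # Assumes each hit still carries "_type" (pre-mapping-type-removal Elasticsearch).
    return [
        (hit["_type"], hit["_id"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]
```

With the elasticsearch-py client the body would be sent with something like `Elasticsearch(['http://localhost:9200']).search(index='wagtail', body=body)`, where `wagtail` is Wagtail's default index name.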

Brett Grace

Aug 31, 2015, 10:13:14 PM
to Wagtail support
I've done what Matt suggests—I wanted to return highlighted search results, so I just created my own copy of the ElasticSearch backend and made whatever changes I needed.  No need to fork—you can configure multiple backends so you can keep the default one if you like. Yes, I stole liberally from the default backend—however, the default backend is still in place, so everything that depends on it still works (such as admin area search).

Assuming you have a way to get this data into ElasticSearch (... and a way to associate each possible search result with a page result), and you know how to write ElasticSearch queries, then the problem is no more difficult than it would be without Wagtail.

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.wagtailsearch.backends.elasticsearch.ElasticSearch',
        'URLS': ['http://localhost:9200'],
        'INDEX': 'wagtail',
        'TIMEOUT': 5,
    },
    'highlighting': {
        'BACKEND': 'core.core_elasticsearch.HighlightingSearch',
        'URLS': ['http://localhost:9200'],
        'INDEX': 'wagtail',
        'TIMEOUT': 5,
    },
}
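The `HighlightingSearch` class itself isn't shown. On the Elasticsearch side, highlighting amounts to adding a `highlight` section to the request and reading the `highlight` key of each hit, which maps field names to lists of fragments. A minimal sketch of just those two steps, independent of Wagtail's backend classes; `add_highlighting` and `snippets` are hypothetical helpers and the field names are placeholders:

```python
import copy


def add_highlighting(body, fields, pre_tag="<mark>", post_tag="</mark>"):
    """Return a copy of a request body that also asks Elasticsearch for
    highlighted fragments of the given fields."""
    highlighted = copy.deepcopy(body)
    highlighted["highlight"] = {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        "fields": {field: {} for field in fields},
    }
    return highlighted


def snippets(hit):
    """Flatten one hit's highlight section ({field: [fragment, ...]})."""
    fragments = []
    for field_fragments in hit.get("highlight", {}).values():
        fragments.extend(field_fragments)
    return fragments
```

Code that wants the second backend can fetch it by name with `get_search_backend('highlighting')`, while everything else (including the admin) keeps using `'default'`.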

Scot Hacker

Sep 1, 2015, 6:28:40 PM
to Wagtail support
Thanks everyone. Querying ElasticSearch directly and/or writing a custom backend are both great ideas. I realized quickly that forking the wagtailsearch app was a fool's errand. There are a lot of ways to skin this cat, but here's the technique I came up with (creating a custom search app that utilizes/modifies a small amount of wagtail core). I can now not only return mixed results from wagtail and non-wagtail models, but have custom sub-templates for each content type in the results, e.g. for a Page-derived result we can display seo_description, while in a Profile result we can show "about" text and a user avatar. Here's a little recipe/sketch in case it's useful to anyone in the future.

On your non-wagtail model, inherit from index.Indexed:

from wagtail.wagtailsearch import index
...
class Profile(models.Model, index.Indexed):


and define the search fields you want indexed:

    # Wagtail/elastic search
    search_fields = (
        index.SearchField('user', partial_match=True, boost=10),
        index.SearchField('about', partial_match=True, boost=10),
        index.SearchField('site_org_title', partial_match=True),
        index.SearchField('site_personal_title', partial_match=True),
    )


Also make sure your custom model has defined `get_absolute_url()` (you'll need it in the search results):

    # needs: from django.core.urlresolvers import reverse (django.urls in Django 1.10+)
    def get_absolute_url(self):
        return reverse('people_profile_detail', args=[str(self.user.username)])



Edit a profile and test that the content is being added to the elastic index:


from wagtail.wagtailsearch.backends import get_search_backend
from people.models import Profile

s = get_search_backend()
s.search("foobar", Profile)


Create a new app in your project called "search"

./manage.py startapp search


and add it to INSTALLED_APPS.

In your main urls.py, override the default wagtail search with your own:

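from django.conf.urls import include, url
from search import urls as search_urls

urlpatterns = [
    ...
    # must come before Wagtail's catch-all include(wagtail_urls)
    url(r'^search/', include(search_urls)),
    ...
]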


Copy wagtail's default search code from https://github.com/torchbox/wagtail/blob/master/wagtail/wagtailsearch/views/frontend.py and paste into your search app's views.py.

Modify your search/views.py to get results from other models as well, then chain them together. We use itertools' "chain" function to combine multiple queries into a single result set. Note that we also add an "appname_modelname" attribute to each instance in the results, which we'll use to pick a customized display template for each content type.

from itertools import chain
from wagtail.wagtailsearch.backends import get_search_backend
from people.models import Profile


...


    # Search
    if query_string != '':
        page_results = models.Page.search(
            query_string,
            show_unpublished=show_unpublished,
            search_title_only=search_title_only,
            extra_filters=extra_filters,
            path=path if path else request.site.root_page.path
        )

        # Also query non-wagtail models
        search_backend = get_search_backend()
        profile_results = search_backend.search(query_string, Profile)

        # Page-derived results first, then profiles
        search_results = list(chain(page_results, profile_results))

        # Append a template name to each element so we can render content types differently
        for result in search_results:
            instance = result.get_indexed_instance()
            result.template_name = '{app}_{model}'.format(
                app=instance._meta.app_label,
                model=instance._meta.model_name
            )


Now in your search_results.html, call different templates per content type in the results loop:

{% for result in search_results %}
    <div class="panel panel-default">
        <div class="panel-body">
            {% with template_name=result.template_name|stringformat:"s"|add:".html" %}
                {% include "search/includes/"|add:template_name %}
            {% endwith %}
            <p><small>page type: {{result.template_name}}</small></p>
        </div>
    </div>
{% empty %}
    <p>No results found</p>
{% endfor %}
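One thing to watch with the per-type `{% include %}`: if a result's type has no matching include template, the include either errors or renders nothing, depending on the Django version and settings. A minimal sketch of a fallback computed in the view instead; `result_template` and `search/includes/default.html` are hypothetical names, and `available` stands in for whatever existence check you use (in Django, `django.template.loader.select_template()` returns the first template in a list that can be found):

```python
def result_template(app_label, model_name, available,
                    default="search/includes/default.html"):
    """Pick the include template for one search result.

    Uses the same '{app}_{model}' naming as the view, but falls back to a
    generic template when no type-specific one exists.
    """
    candidate = "search/includes/{app}_{model}.html".format(
        app=app_label, model=model_name)
    return candidate if candidate in available else default
```

The view would then store the full path on each result, and the template can use `{% include result.template_path %}` directly.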


The individual result templates are standard stuff.

This approach gives you a lot of control and flexibility, but it does raise interesting questions about how to combine results. In this example we end up showing Page-derived content first, then all Profile results. If you wanted to interleave them, how would you order them, given that Profiles (probably) don't have a publish date and may not be considered as important? Every use case is different and depends on which fields are or are not available (this in turn explains why wagtail only provides Page-based search by default: there are too many open questions once you venture outside of that).
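One possible answer to the ordering question, when there is no shared date field, is to sort on relevance score but scale it by a hand-tuned weight per content type, so that profiles can be demoted relative to pages. A sketch under those assumptions: each result is a `(content_type, score, item)` triple, the weights are arbitrary numbers you would tune yourself, and `weighted_order` is a hypothetical helper:

```python
def weighted_order(results, weights, default_weight=1.0):
    """Sort mixed results by relevance score scaled by a per-type weight.

    results -- iterable of (content_type, score, item) triples
    weights -- dict such as {"people_profile": 0.5} to demote a type
    """
    def weighted_score(entry):
        content_type, score, _item = entry
        return score * weights.get(content_type, default_weight)

    # sorted() is stable even with reverse=True, so results with equal
    # weighted scores keep their input order.
    ordered = sorted(results, key=weighted_score, reverse=True)
    return [item for _type, _score, item in ordered]
```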

I may go for a more refined approach in the future but this works and is fairly clean. 

./s


Realtor Services

Feb 20, 2018, 2:38:28 PM
to Wagtail support
Not sure if the .annotate_score() method existed in 2015, but this is what I threw together. It's easy to extend if you need to search an arbitrary number of models, and it keeps the combined results ordered by score.

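from itertools import chain
from operator import attrgetter

from django.contrib.auth import get_user_model
# Wagtail 1.x module paths; Wagtail 2.x uses wagtail.core / wagtail.search
from wagtail.wagtailcore.models import Page
from wagtail.wagtailsearch.backends import get_search_backend
from wagtail.wagtailsearch.models import Query

User = get_user_model()  # assumed to inherit from index.Indexed so it is in the index


def multi_model_query(search_query):
    # Get backend
    search_backend = get_search_backend()

    # Whatever models you want searched
    page_results = Page.objects.live().search(search_query).annotate_score('score')
    user_results = search_backend.search(search_query, User).annotate_score('score')

    # Record hit
    query = Query.get(search_query)
    query.add_hit()

    # Combine results, highest score first
    sorted_results = sorted(
        chain(page_results, user_results),
        key=attrgetter('score'),
        reverse=True)

    return sorted_results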
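Sorting on `score` across two separate searches assumes the scores are on the same scale, but Elasticsearch relevance scores are relative to the query and index statistics that produced them. One simple, admittedly crude, option is to rescale each query's scores so that its best hit is 1.0 before merging; the trade-off is that every model's top hit then counts as equally relevant. A sketch in plain Python over `(score, item)` pairs, which you could build from the `score` attribute set by `annotate_score('score')`; `normalize_scores` and `combine` are hypothetical helpers:

```python
def normalize_scores(results):
    """Rescale (score, item) pairs so the best hit of this query scores 1.0."""
    results = list(results)
    if not results:
        return []
    top = max(score for score, _item in results)
    if top <= 0:
        # Nothing meaningful to scale against; treat all hits as equal.
        return [(0.0, item) for _score, item in results]
    return [(score / top, item) for score, item in results]


def combine(*per_model_results):
    """Normalize each query's scores separately, then sort everything together."""
    combined = []
    for results in per_model_results:
        combined.extend(normalize_scores(results))
    # Stable sort: ties keep the order in which the result lists were passed.
    combined.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _score, item in combined]
```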