model inheritance problem (haystack + whoosh)


Sebastian Quiles

Mar 22, 2017, 3:16:39 PM

to django-haystack

Hi,

First of all, I want to congratulate you on the excellent job done on this project. A couple of months ago we needed to prototype a document repository with search capabilities very quickly, and we discovered Whoosh and Haystack as a very simple way to make it work. Using Haystack also seemed a good way to be able to switch to another engine easily.

During development we could use Haystack for the indexing process, but when we tried to use it for searching we faced some problems and decided to search directly with Whoosh, skipping the Haystack layer. Now that we have some more time to spend, we are trying to search with Haystack but are still having problems.

Our Django models use inheritance:

 
class Indexable(PolymorphicModel, TimeStampedModel):
    name = models.CharField(max_length=500, null=True) 
    text = models.TextField(null=True, blank=True, default="") 
    library = models.ForeignKey(Library, null=True) [...] 

class File(Indexable, unicode_name):
    FILE_TYPE = Choices('Image', 'WordDocument', 'PlainText', 'DataSet')

class Row(Indexable, unicode_name):
    def __str__(self):

We have several indexes (libraries) and we store Files or Rows in them.

We have only one Haystack index class:

class FileIndex(indexes.SearchIndex, indexes.Indexable): 
    text = FullLanguageCharField(document=True, model_attr='text') 
    [...] 
    def get_model(self): 
        return Indexable
    [...]

The search is performed with this code in Whoosh:

        ix = open_dir(settings.HAYSTACK_CONNECTIONS[library]['PATH'], readonly=True)
        searcher = ix.searcher()
        pagenum = int(request.query_params.get('pagenum', 1))
        pagelen = 10
        parser = MultifieldParser(['text'], ix.schema).parse('hello')

        results = searcher.search_page(parser, pagenum, pagelen=pagelen)

The same query performed in Haystack:

c = (SearchQuerySet().using(library).filter(text = 'hello').count())

I've also added the following line to settings.py:

HAYSTACK_LIMIT_TO_REGISTERED_MODELS = False

The first thing I tried was to count the results and check whether both methods return the same number, but the Haystack method returns one fewer (and it always seems to be exactly one).

Debugging, I end up at the following lines in whoosh_backend:

            try:
                raw_page = searcher.search_page(
                    parsed_query,
                    page_num,
                    **search_kwargs
                )
            except ValueError:

and after executing it, raw_page has the following values:
offset    int: 0
pagecount    int: 14    <<< correct size
pagelen    int: 1
pagenum    int: 1

but then the following line in whoosh_backend is executed:

            results = self._process_results(raw_page, highlight=highlight, query_string=query_string, spelling_query=spelling_query, result_class=result_class)

and then in the _process_results method:



    def _process_results(self, raw_page, highlight=False, query_string='', spelling_query=None, result_class=None):
        from haystack import connections
        results = []
        hits = len(raw_page)
        if result_class is None:
            result_class = SearchResult
        facets = {}
        spelling_suggestion = None
        unified_index = connections[self.connection_alias].get_unified_index()
        indexed_models = unified_index.get_indexed_models()  # <<< here indexed_models is [Indexable]
        for doc_offset, raw_result in enumerate(raw_page):   
            score = raw_page.score(doc_offset) or 0
            app_label, model_name = raw_result[DJANGO_CT].split('.')
            additional_fields = {}
            model = haystack_get_model(app_label, model_name)  # <<< here model is the File class
            if model and model in indexed_models:  # <<< this branch is not taken: model is File, which is not in [Indexable]
               [...]
            else:
                hits -= 1

When the for loop runs, the "if model and model in indexed_models" check fails and hits is decreased by 1. The loop then finishes (it runs only once, maybe because pagelen = 1), and the final result of the function is:

results = []
hits = 13
facets = []
spelling_suggestion = [] (the whole process is very fast, BUT the line where spelling_suggestion is calculated takes several seconds)
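
The failing membership test can be reproduced without Django or Haystack. This is only a minimal sketch: the classes are empty stand-ins for the real models, and the starting value of hits is taken from the debugger output above.

```python
# Empty stand-ins for the real models (no Django required).
class Indexable:
    pass


class File(Indexable):
    pass


indexed_models = [Indexable]  # value seen for get_indexed_models() in the debugger
model = File                  # value seen for haystack_get_model(...) in the debugger

hits = 14
if model and model in indexed_models:
    matched = True
else:
    # Same bookkeeping as the else branch of _process_results.
    matched = False
    hits -= 1

# List membership compares the class objects themselves, so a subclass
# of a listed class does not count as a member.
print(matched, hits)
```

Because `in` only looks for the class itself, File is rejected even though it inherits from Indexable.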

I've tried to change this line:

            if model and model in indexed_models:


to this one:

            if model and issubclass(model, tuple(indexed_models)):

but it then fails on the following line:

                    index = unified_index.get_index(model)

saying there is no registered index for File.

Can anyone help? Thanks!
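
One idea, sketched here outside of Haystack, would be to resolve the subclass to its nearest registered ancestor before asking for the index. Everything below is hypothetical: the `registered` dict and the `find_registered_model` helper are illustrations only, not part of Haystack's API, and this has not been tried against the real backend.

```python
# Empty stand-ins for the real models (no Django required).
class Indexable:
    pass


class File(Indexable):
    pass


class Row(Indexable):
    pass


# Hypothetical stand-in for the mapping "model class -> search index".
registered = {Indexable: "FileIndex"}


def find_registered_model(model, registry):
    """Return the nearest class in the model's MRO that has an entry in registry, or None."""
    for klass in model.__mro__:
        if klass in registry:
            return klass
    return None


# File itself is not registered, but its parent Indexable is.
base = find_registered_model(File, registered)
index = registered[base] if base is not None else None
print(base.__name__, index)
```

With a lookup like this, the subclass check and the index lookup would both be based on the same registered parent class instead of on File directly.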

--
Sebastian Quiles
ARGENTINA
