AssertionError in PY38 on update_document

Waylan Limberg

Jan 14, 2020, 10:57:07 AM
to Whoosh
I have just updated a project from Python 2.7 to Python 3.8. Everything (searching the existing index, etc.) is working fine except for `update_document`. I've confirmed that all input is Unicode, but every time I get the following error:

  File "server3.py", line 93, in update_index
    w.update_document(**idata)
  File "venv\lib\site-packages\whoosh\writing.py", line 210, in __exit__
    self.commit()
  File "venv\lib\site-packages\whoosh\writing.py", line 1037, in commit
    self.writer.commit(*args, **kwargs)
  File "venv\lib\site-packages\whoosh\writing.py", line 922, in commit
    finalsegments = self._merge_segments(mergetype, optimize, merge)
  File "venv\lib\site-packages\whoosh\writing.py", line 827, in _merge_segments
    return mergetype(self, self.segments)
  File "venv\lib\site-packages\whoosh\writing.py", line 101, in MERGE_SMALL
    writer.add_reader(reader)
  File "venv\lib\site-packages\whoosh\writing.py", line 710, in add_reader
    self.add_postings_to_pool(reader, basedoc, docmap)
  File "venv\lib\site-packages\whoosh\writing.py", line 647, in add_postings_to_pool
    for item in items:
  File "venv\lib\site-packages\whoosh\writing.py", line 583, in _process_posts
    for fieldname, text, docnum, weight, vbytes in items:
  File "venv\lib\site-packages\whoosh\reading.py", line 429, in iter_postings
    yield (fieldname, btext, m.id(), m.weight(), m.value())
  File "venv\lib\site-packages\whoosh\codec\whoosh3.py", line 998, in value
    self._read_values()
  File "venv\lib\site-packages\whoosh\codec\whoosh3.py", line 1119, in _read_values
    assert isinstance(vs, bytes_type)
AssertionError

It appears to be complaining that something is not `bytes_type`, but everything I'm passing in is (and should be) Unicode, so this doesn't make sense to me. And `vs` is not one of my fields, so it seems to be an internal variable anyway. I'm using Whoosh 2.7.4 under both Python 2.7 and 3.8, but the error occurs only under Python 3.8. I do not have any other Python versions installed, so I have not tested them.

As an aside, the way my app works, a new `document` is first added to the index with only its one unique field, and that works fine. The error occurs when I later try to update the `document` with data in the other fields. The additional data consists of three fields: two are indexed and the third is stored. The two indexed fields contain Unicode strings, while the stored field contains a dict of Unicode strings.

Any help is appreciated.

Thanks,
Waylan

Waylan Limberg

Jan 14, 2020, 1:14:57 PM
to Whoosh
I went spelunking through the code, and it appears that the error occurs when the `update_document` method attempts to build a collection of existing documents. It is caused not by anything I pass in, but by what Whoosh reads from the existing index. In other words, this appears to me to be an internal bug.

In whoosh3.py, line 1109 calls `self._read_data()`, and a few lines later (line 1119) the value of `_data[2]` is tested to be a byte string. That is where it fails, as apparently it is not a byte string. Looking at `self._read_data`, it would appear that `_data` is just an unpickled object read from the index on the file system (see line 1077). This is deep in the code and way outside userland. ;(
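One plausible explanation, which I have not confirmed against Whoosh's actual on-disk format, is the way Python 3 unpickles data written by Python 2. A Python 2 `str` (a byte string) comes back as a Python 3 `str` (text) by default, which would fail a check like `isinstance(vs, bytes_type)`. A minimal sketch using only the standard library; the pickled value `'abc'` is an illustration, not data taken from a real index:

```python
import pickle

# The bytes Python 2's pickle module produces for pickle.dumps('abc', 2),
# where 'abc' is a Python 2 str, i.e. a byte string. (Illustrative value,
# not read from a Whoosh index.)
PY2_PICKLE = b"\x80\x02U\x03abcq\x00."

# Default Python 3 behaviour: the Python 2 byte string is decoded to text.
default_value = pickle.loads(PY2_PICKLE)

# Requesting encoding="bytes" keeps the original byte string instead.
bytes_value = pickle.loads(PY2_PICKLE, encoding="bytes")

print(type(default_value).__name__, type(bytes_value).__name__)  # str bytes
```

If that is what happens inside `_read_data`, the index files themselves are fine under Python 2.7 but cannot be read back as-is under Python 3.8.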

Florian Schulze

Jan 15, 2020, 2:58:57 AM
to 'Waylan Limberg' via Whoosh

Are you trying to reuse the index created by Python 2.7? In my experience that doesn't work; you have to recreate the index with Python 3.8.
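A rough sketch of what recreating the index could look like, assuming the original records are still available from some other source. The schema, the field names (`path`, `title`, `body`, `meta`), and the helpers `as_text` and `rebuild_index` are all hypothetical and need to be adapted to the real project:

```python
import os
import shutil


def as_text(value, encoding="utf-8"):
    """Return value as str, decoding any bytes left over from Python 2 data."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def rebuild_index(index_dir, records):
    """Discard the old index and write a fresh one with the running Python."""
    # Imported here so the helper above can be used without Whoosh installed.
    from whoosh.fields import Schema, ID, TEXT, STORED
    from whoosh.index import create_in

    # Hypothetical schema: one unique key, two indexed fields, one stored field.
    schema = Schema(
        path=ID(unique=True, stored=True),
        title=TEXT,
        body=TEXT,
        meta=STORED,
    )

    # Remove the files written by the old interpreter, then start empty.
    shutil.rmtree(index_dir, ignore_errors=True)
    os.makedirs(index_dir)
    ix = create_in(index_dir, schema)

    # The writer commits when the with-block exits without an error.
    with ix.writer() as writer:
        for record in records:
            fields = {name: as_text(value) for name, value in record.items()}
            writer.update_document(**fields)
    return ix
```

Calling something like `rebuild_index("indexdir", records)` once under Python 3.8, with `records` being an iterable of dicts keyed by field name, should leave an index that later `update_document` calls can merge with.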

Regards,
Florian Schulze

--
You received this message because you are subscribed to the Google Groups "Whoosh" group.
To unsubscribe from this group and stop receiving emails from it, send an email to whoosh+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/whoosh/6c5b6e62-7fb0-4753-ac6b-9aa853f0692b%40googlegroups.com.

Waylan Limberg

Jan 17, 2020, 2:49:32 PM
to Whoosh
That resolved the issue. Thanks.