Hi, I'm following up on this StackOverflow question:
https://stackoverflow.com/questions/53937669/integer-too-large-error-with-vectoring-during-whoosh-indexing
I was advised to ask here because some people on this forum have managed to index a large number of documents.
This is the error:
Traceback (most recent call last):
  File "...", line 256, in <module>
    ...
  File "/home/nlp/*/anaconda3/envs/riken/lib/python3.6/site-packages/whoosh/writing.py", line 771, in add_document
    perdocwriter.add_vector_items(fieldname, field, vitems)
  File "/home/nlp/*/anaconda3/envs/riken/lib/python3.6/site-packages/whoosh/codec/whoosh3.py", line 244, in add_vector_items
    self.add_column_value(vecfield, VECTOR_COLUMN, offset)
  File "/home/nlp/*/anaconda3/envs/riken/lib/python3.6/site-packages/whoosh/codec/base.py", line 821, in add_column_value
    self._get_column(fieldname).add(self._docnum, value)
  File "/home/nlp/*/anaconda3/envs/riken/lib/python3.6/site-packages/whoosh/columns.py", line 678, in add
    self._dbfile.write(self._pack(v))
struct.error: 'I' format requires 0 <= number <= 4294967295
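From the StackOverflow question:
"It looks like the field that is used as a document index is only designed to be a 32-bit unsigned int, which gives you a limit of roughly 4M documents."
(Note that 4294967295 is about 4.29 billion, not 4 million. Also, in the traceback above the value being packed is named offset, so the limit may apply to the position of the vector data rather than to the number of documents.)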
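The range check itself can be reproduced outside Whoosh with Python's standard struct module. This is only a minimal sketch: the "!I" and "!Q" format strings and the fits_uint32 helper are my own choices for illustration and are not taken from the Whoosh source.

```python
import struct

# Upper bound quoted in the error message: 2**32 - 1
UINT32_MAX = 2**32 - 1


def fits_uint32(value):
    """Return True if value can be packed as a 32-bit unsigned int.

    Hypothetical helper for illustration only; not part of Whoosh.
    """
    try:
        struct.pack("!I", value)
        return True
    except struct.error:
        return False


at_limit = fits_uint32(UINT32_MAX)        # largest value that still fits
over_limit = fits_uint32(UINT32_MAX + 1)  # one past the limit

# A 64-bit unsigned format ("Q") accepts the same value and uses 8 bytes.
packed_64 = struct.pack("!Q", UINT32_MAX + 1)

print(at_limit, over_limit, len(packed_64))
```

So any value above 4294967295 written through a 32-bit unsigned field will fail in this way, whatever that value represents inside the index.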
If you know a way to work around or fix this problem, any help would be appreciated.