By default, the "dictionary" the spell-checker uses is the set of words already in the index. In this case your index is tiny: it contains only a handful of words, and it doesn't include the words you're trying to correct to ("from", "India", etc.).
Whoosh includes a way to check against a list of words instead, but I ran into two problems testing this:
* I just tried it and found two bugs in the implementation (fixed in commit a16ebac).
After I fixed the bugs, I ran into the second, much worse problem:
* If you use a long list of words (for example, the /usr/share/dict/words file on Unix/macOS), it will give weird suggestions, because the list isn't weighted by word usage and may contain some very obscure words.
As an example, after I fixed the bugs, the suggested correction I got using /usr/share/dict/words was:
frim idea broking for campana
Not very useful! "Frim", "broking", and "campana" might technically be words in English (I've sure never heard of them), but they're obviously not the best suggestions!
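To show why the missing usage weighting matters, here is a rough, self-contained sketch. It is not how Whoosh ranks suggestions: the edit_distance and suggest helpers are hypothetical, and the usage counts in FREQS are invented for illustration. It only compares breaking ties alphabetically against breaking them by word frequency.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, freqs, maxdist=2, weighted=True):
    """Return candidates within maxdist edits of word, best first."""
    scored = []
    for cand, freq in freqs.items():
        d = edit_distance(word, cand)
        if d <= maxdist:
            # Weighted: closest first, then most frequently used first.
            # Unweighted: closest first, then plain alphabetical order.
            key = (d, -freq, cand) if weighted else (d, cand)
            scored.append((key, cand))
    return [cand for _, cand in sorted(scored)]


# Made-up usage counts, purely for illustration.
FREQS = {"from": 1000, "frim": 1, "frame": 300, "prom": 20}

print(suggest("frm", FREQS, weighted=False))  # "frim" sorts ahead of "from"
print(suggest("frm", FREQS, weighted=True))   # "from" wins on usage
```

Both "frim" and "from" are one edit away from "frm", so without some notion of usage there is nothing to prefer the common word over the obscure one.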
I want to do better with this in upcoming versions, allowing import of actual spelling dictionary formats with usage weightings, and making smarter suggestions using psycho-acoustics like a real spell checker.
import os.path

from whoosh import fields
from whoosh.qparser import QueryParser
from whoosh.spelling import ListCorrector
from whoosh.util.testing import TempIndex

# Make a corrector that pulls from a list of words
assert os.path.exists("/usr/share/dict/words")
with open("/usr/share/dict/words", encoding="utf8") as wordfile:
    wordlist = sorted(wordfile.read().strip().split())
corrector = ListCorrector(wordlist)

schema = fields.Schema(title=fields.TEXT(stored=True),
                       path=fields.ID(stored=True),
                       content=fields.TEXT)

# Build a small throwaway index to search against
with TempIndex(schema) as ix:
    with ix.writer() as w:
        w.add_document(title=u"First document", path=u"/a",
                       content=u"This is the first document we've added!")
        w.add_document(title=u"Second document", path=u"/b",
                       content=u"The second one is even more interesting!")

    with ix.searcher() as searcher:
        qstring = "frm indea wroking for campany"
        qp = QueryParser("content", schema)
        q = qp.parse(qstring)

        # Make a dictionary associating fields with any custom corrector you
        # want to use to check that field
        field_correctors = {
            "title": corrector,
            "content": corrector
        }
        corrected = searcher.correct_query(q, qstring, prefix=1,
                                           correctors=field_correctors)
        print(corrected.string)