How to perform spell checking using the Whoosh Python library

Chowdam Praveen

Jun 29, 2017, 9:37:54 AM
to Whoosh
How do I perform spell checking using the Whoosh library? I have added some code taken from the documentation, but it is not correcting the words. Please find my code below.

from whoosh import qparser
from whoosh.fields import Schema, TEXT, ID
from whoosh.index import create_in


def main():
    print(" Hi")
    schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
    ix = create_in("/home/praveen/Downloads/who", schema)

    # Index two small documents
    writer = ix.writer()
    writer.add_document(title=u"First document", path=u"/a",
                        content=u"This is the first document we've added!")
    writer.add_document(title=u"Second document", path=u"/b",
                        content=u"The second one is even more interesting!")
    writer.commit()

    qstring = "frm indea wroking for campany"
    qp = qparser.QueryParser("content", ix.schema)
    q = qp.parse(qstring)

    # Try correcting the query
    with ix.searcher() as s:
        corrected = s.correct_query(q, qstring)
        print(corrected)
        print(corrected.query)
        if corrected.query != q:
            print("Did you mean:", corrected.string)


if __name__ == "__main__":
    main()



My output is:

 Hi
Correction(And([Term('content', u'frm'), Term('content', u'indea'), Term('content', u'wroking'), Term('content', u'campany')]), 'frm indea wroking for campany')
(content:frm AND content:indea AND content:wroking AND content:campany)

Here the input is qstring = "frm indea wroking for campany".

I am expecting output like "from India working for company", but the output is identical to the input string. The query is not being corrected, and I never get the "Did you mean:" message.

Thanks,

Matt Chaput

Jun 30, 2017, 1:40:08 PM
to who...@googlegroups.com
By default, the "dictionary" the spell-checker uses is the set of other words in the index. In this case your index is tiny: it contains only a handful of words, and none of them are the words you're trying to correct to ("from", "India", etc.).
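To illustrate why a tiny index yields no corrections, here is a minimal pure-Python sketch. It is not Whoosh's implementation: `edit_distance` and `suggest` are hypothetical helpers, and the lookup is assumed to be a plain Levenshtein search over a vocabulary. The point is only that a corrector cannot suggest a word that is not in its vocabulary.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, vocabulary, maxdist=2):
    """Return the closest vocabulary word within maxdist edits, or None."""
    scored = [(edit_distance(word, v), v) for v in vocabulary]
    scored = [s for s in scored if s[0] <= maxdist]
    return min(scored)[1] if scored else None


# Roughly the words contained in the two documents from the question
index_words = {"this", "is", "the", "first", "document", "we've", "added",
               "second", "one", "even", "more", "interesting"}
# The same vocabulary, extended with the words the user actually meant
bigger_vocab = index_words | {"from", "india", "working", "company"}

query = ["frm", "indea", "wroking", "campany"]
tiny_result = [suggest(w, index_words) for w in query]
big_result = [suggest(w, bigger_vocab) for w in query]
print(tiny_result)  # [None, None, None, None]
print(big_result)   # ['from', 'india', 'working', 'company']
```

With the small vocabulary nothing lies within two edits of any misspelled word, so every term is left unchanged, which matches the behaviour seen in the question.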

Whoosh includes a way to check against a list of words instead, but I ran into two problems testing this:

* I just tried it and found two bugs in the implementation (fixed in commit a16ebac).

After I fixed the bugs, I ran into the second, much worse problem:

* If you use a long list of words (for example, the /usr/share/dict/words file on Unix/macOS), it will give weird suggestions because the list isn't weighted by word usage, and because the list may contain some very obscure words.

As an example, after I fixed the bugs, the suggested correction I got using /usr/share/dict/words was:

frim idea broking for campana

Not very useful! "Frim", "broking", and "campana" might technically be words in English (I've sure never heard of them), but they're obviously not the best suggestions!
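The effect of missing usage weights can be sketched in plain Python. This is only an illustration under stated assumptions, not how Whoosh ranks suggestions: the counts in `word_counts` are invented, `suggest_unweighted` and `suggest_weighted` are hypothetical helpers, and the scoring formula (log of the count minus the edit distance) is just one simple way to trade closeness against frequency.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical usage counts, invented purely for illustration
word_counts = {"from": 5000, "frim": 1,
               "working": 900, "broking": 2,
               "company": 1200, "campana": 1}


def suggest_unweighted(word, maxdist=2):
    """Pick the closest word; ties are broken alphabetically."""
    scored = [(edit_distance(word, w), w) for w in word_counts]
    scored = [s for s in scored if s[0] <= maxdist]
    return min(scored)[1] if scored else None


def suggest_weighted(word, maxdist=2):
    """Pick the word with the best log10(count) - distance score."""
    best, best_score = None, None
    for w, count in word_counts.items():
        dist = edit_distance(word, w)
        if dist > maxdist:
            continue
        score = math.log10(count) - dist
        if best_score is None or score > best_score:
            best, best_score = w, score
    return best


query = ["frm", "wroking", "campany"]
unweighted = [suggest_unweighted(w) for w in query]
weighted = [suggest_weighted(w) for w in query]
print(unweighted)  # ['frim', 'broking', 'campana']
print(weighted)    # ['from', 'working', 'company']
```

Without weights, an obscure word that happens to be one edit away wins over a common word that is equally close or one edit further. With even a rough frequency signal the common words win.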

I want to do better with this in upcoming versions by allowing import of actual spelling dictionary formats with usage weightings, and by making smarter suggestions with psycho-acoustics like a real spell checker.




import os.path

from whoosh import fields
from whoosh.qparser import QueryParser
from whoosh.spelling import ListCorrector
from whoosh.util.testing import TempIndex

# Make a corrector that pulls from a list of words
if not os.path.exists("/usr/share/dict/words"):
    assert False

# ListCorrector expects a sorted list of words
with open("/usr/share/dict/words", encoding="utf8") as wordfile:
    wordlist = sorted(wordfile.read().strip().split())
corrector = ListCorrector(wordlist)

schema = fields.Schema(title=fields.TEXT(stored=True),
                       path=fields.ID(stored=True),
                       content=fields.TEXT)

with TempIndex(schema) as ix:
    with ix.writer() as w:
        w.add_document(title=u"First document", path=u"/a",
                       content=u"This is the first document we've added!")
        w.add_document(title=u"Second document", path=u"/b",
                       content=u"The second one is even more interesting!")

    with ix.searcher() as searcher:
        qstring = "frm indea wroking for campany"
        qp = QueryParser("content", schema)
        q = qp.parse(qstring)

        # Make a dictionary associating fields with any custom corrector you
        # want to use to check that field
        field_correctors = {
            "title": corrector,
            "content": corrector
        }

        corrected = searcher.correct_query(q, qstring, prefix=1,
                                           correctors=field_correctors)
        print(corrected.string)