Peter Norvig's spelling corrector in Racket

5 views
Skip to first unread message

Brian Adkins

unread,
Oct 12, 2015, 5:18:34 PM10/12/15
to TriFunc
Here's a port of Peter Norvig's Python spelling corrector in Racket. Got a few helpful tips from some #racket folks on IRC. It's still an early draft, so I may be able to make it a bit more concise later, but I was fairly pleased with Racket's expressiveness. 

A couple areas where the Python code is nicer is the list comprehensions and the fact that an empty list/set is considered "false". I had to write a little helper function (nes) to return false for an empty set or the set if not empty.

Norvig's version is here:

http://norvig.com/spell-correct.html

Brian

--- snip ---
#lang racket

; Thanks to Vincent St-Amour and Sam Tobin-Hochstadt for their tips on #racket

(define (words text)
  (regexp-match* #rx"[a-z]+" (string-downcase text)))

(define (train features)
  (define model (make-hash))
  (for ([f features])
    (hash-update! model f add1 1))
  model)

(define nwords
  (train (words (file->string "spelling-words.txt"))))

(define alphabet "abcdefghijklmnopqrstuvwxyz")

(define (edits1 word)
  (let* ([splits (for/list ([i (in-range (+ (string-length word) 1))])
                   (cons (substring word 0 i) (substring word i)))]
         [deletes (for/set ([(a b) (in-dict splits)]
                             #:when (> (string-length b) 0))
                    (string-append a (substring b 1)))]
         [transposes (for/set ([(a b) (in-dict splits)]
                                #:when (> (string-length b) 1))
                       (string-append a (substring b 1 2) (substring b 0 1) (substring b 2)))]
         [replaces (for/set ([(a b) (in-dict splits)]
                              #:when (> (string-length b) 0)
                              [c alphabet])
                     (string-append a (string c) (substring b 1)))]
         [inserts (for*/set ([(a b) (in-dict splits)]
                             [c alphabet])
                    (string-append a (string c) b))])
    (set-union deletes transposes replaces inserts)))

(define (known-edits2 word)
  (for*/set ([e1 (edits1 word)]
              [e2 (edits1 e1)]
              #:when (hash-has-key? nwords e2))
    e2))

(define (known words)
  (for/set ([w words] #:when (hash-has-key? nwords w))
    w))

; If the set argument is non-empty, return it; otherwise, return #f
; nes = non-empty set
(define (nes set)
  (if (set-empty? set)
      #f
      set))

(define (correct word)
  (let ([candidates (or (nes (known (list word)))
                        (nes (known (edits1 word)))
                        (nes (known-edits2 word))
                        (set word))])
    (argmax (λ (w) (hash-ref nwords w 1)) (set->list candidates))))
--- snip ---

Reply all
Reply to author
Forward
0 new messages