Willem Rein Oudshoorn
Sep 9, 2012, 3:14:24 PM
to montez...@googlegroups.com
I think I found a small bug in the standard analyzer.
If I evaluate
(montezuma:all-tokens * nil "a is bee there")
with * bound to the standard analyzer, I expect only one token, 'bee', but
I get a token for every word.
As far as I can see, the reason is that the method
(token-stream ((self standard-analyzer) ...))
ignores the stop words stored in the STOP-WORDS slot of 'standard-analyzer'
(which is a subclass of 'stop-analyzer').
By contrast, the method
(token-stream ((self stop-analyzer) ...))
does take the stop words into account.
(See the full transcript below for where I think it goes wrong.)
I have made a patch to fix this.
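To make the suspected problem concrete, here is a minimal, self-contained Common Lisp sketch. It does not use Montezuma at all: the classes toy-stop-analyzer, toy-buggy-standard-analyzer and toy-fixed-standard-analyzer, the generic function toy-token-stream, and the helpers split-words and remove-stop-words are hypothetical stand-ins. Tokens are plain strings, and the stop-word list is an abbreviated subset of the one shown in the transcript. It only models how a subclass method that bypasses the inherited stop words would produce the behavior described above; it is not Montezuma's code and not the patch itself.

```lisp
;;; Toy model of the suspected bug. All names here are hypothetical
;;; stand-ins, not Montezuma's real classes or functions.

(defclass toy-stop-analyzer ()
  ;; Abbreviated subset of the stop-word list from the transcript.
  ((stop-words :initarg :stop-words
               :initform (list "a" "an" "is" "the" "there")
               :reader stop-words)))

;; Both subclasses inherit the STOP-WORDS slot.
(defclass toy-buggy-standard-analyzer (toy-stop-analyzer) ())
(defclass toy-fixed-standard-analyzer (toy-stop-analyzer) ())

(defun split-words (string)
  "Split STRING on spaces into a list of non-empty, lowercased words."
  (let ((string (string-downcase string)))
    (loop with start = 0
          for pos = (position #\Space string :start start)
          for word = (subseq string start pos)
          unless (string= word "") collect word
          while pos
          do (setf start (1+ pos)))))

(defun remove-stop-words (analyzer words)
  "Return WORDS without any word listed in ANALYZER's STOP-WORDS."
  (remove-if (lambda (word)
               (member word (stop-words analyzer) :test #'string=))
             words))

(defgeneric toy-token-stream (analyzer string)
  (:documentation "Return the list of token strings ANALYZER produces for STRING."))

(defmethod toy-token-stream ((self toy-stop-analyzer) string)
  ;; The parent method honors the stop words.
  (remove-stop-words self (split-words string)))

(defmethod toy-token-stream ((self toy-buggy-standard-analyzer) string)
  ;; Suspected bug: the subclass tokenizes on its own and never
  ;; consults the inherited STOP-WORDS slot.
  (split-words string))

(defmethod toy-token-stream ((self toy-fixed-standard-analyzer) string)
  ;; Fix: apply the inherited stop words to the tokens,
  ;; just as the parent method does.
  (remove-stop-words self (split-words string)))
```

In this sketch, (toy-token-stream (make-instance 'toy-fixed-standard-analyzer) "a is bee there") returns ("bee"), while the buggy class returns all four words. Filtering the subclass's tokens through the inherited stop words is one possible design; simply not overriding the parent method would be another.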
Shall I send a pull request?
(Or maybe this isn't a problem at all and I am misunderstanding what the
intention is.)
Kind regards
Wim Oudshoorn.
PS. Transcript of my confusion
im-xml> (montezuma:analyzer *index*)
#<MONTEZUMA:STANDARD-ANALYZER {100BFFB503}>
im-xml> (inspect *)
The object is a STANDARD-OBJECT of type MONTEZUMA:STANDARD-ANALYZER.
0. STOP-WORDS: ("a" "an" "and" "are" "as" "at" "be" "but" "by" "for" "if"
"in" "into" "is" "it" "no" "not" "of" "on" "or" "s" "such"
"t" "that" "the" "their" "then" "there" "these" "they"
"this" "to" "was" "will" "with")
> (montezuma:all-tokens * nil "a is bee there")
(#S(MONTEZUMA::TOKEN :IMAGE "a" :START 0 :END 1 :INCREMENT 1 :TYPE :WORD)
#S(MONTEZUMA::TOKEN :IMAGE "is" :START 2 :END 4 :INCREMENT 1 :TYPE :WORD)
#S(MONTEZUMA::TOKEN
:IMAGE "bee"
:START 5
:END 8
:INCREMENT 1
:TYPE :WORD)
#S(MONTEZUMA::TOKEN
:IMAGE "there"
:START 9
:END 14
:INCREMENT 1
:TYPE :WORD))