I want to add support for Danish to Lucene's HTMLParser.
Workflow:
1) In HTMLParser.jj I added the Danish letters to the #LET token: < #LET:
["A"-"Z","a"-"z","0"-"9","æ","å","Ø","ø","Å","Æ"] >
2) In StandardTokenizer.jj I added "\u0080"-"\u00ff",
"\u0100"-"\u017f", "\u0180"-"\u024f", "\u00c0"-"\u00d6" to the
#LETTER token.
When I search (after compiling and indexing) for words that contain
these special characters, the search engine can't find them. For
example, a search for "civilingeniør" returns no results. When I print
the comment string of a hit (found by searching for a nearby word
instead), I see this in my browser: "civilingeniør". So somehow the
parser is mapping the characters wrongly. What am I doing wrong?
Thx..
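One way characters can end up "mapped wrong" like this is a charset mismatch between the bytes on disk and the decoder used to read them. The sketch below is only an illustration of that effect, not a diagnosis of the setup above: it assumes, purely for the example, that the text is stored as UTF-8 and read back as ISO-8859-1. The class and method names are made up for this example and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Lucene): shows how one Danish letter
// turns into two characters when UTF-8 bytes are decoded as ISO-8859-1.
class CharsetMismatchDemo {

    // Encode the text as UTF-8, then decode those bytes with the wrong charset,
    // as a reader opened with a mismatched encoding would do.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Same bytes, decoded with the matching charset: the text survives unchanged.
    static String decodeUtf8BytesAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00f8 is the letter o-slash; the escape keeps this source file pure ASCII.
        String word = "civilingeni\u00f8r";
        String wrong = decodeUtf8BytesAsLatin1(word);
        String right = decodeUtf8BytesAsUtf8(word);

        System.out.println("original length:      " + word.length());
        System.out.println("mis-decoded length:   " + wrong.length());
        System.out.println("mis-decoded == word?  " + wrong.equals(word));
        System.out.println("round-trip == word?   " + right.equals(word));
    }
}
```

If something like this happens while indexing, the term stored in the index is the mis-decoded one, so a query typed with the real letter never matches it. Whether that applies here depends on how the HTML files are encoded and how the reader passed to the parser is constructed, which the post does not say.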
Hope it helps
Arnaud
"Code Master" <gal...@x-cago.com> wrote in message
news:be49b900.02092...@posting.google.com...