
AI::Categorizer and Umlauts?


Robert Barta

Jun 4, 2007, 12:08:45 PM
to per...@perl.org
Hi,

I seem to have problems with umlauts, as in the word

Präsentation

When a document is added with

return AI::Categorizer::Document->new(name    => $filename,
                                      content => $content);

to the collection, then after loading and calling finish, the feature vector
contains only fragments of such words, for example

pr => 1
sentation => 1

Setting the locale in the shell or in Perl with

use locale;

has no effect, not even when de_AT is turned on explicitly.

--

Aaaaaah, lib/AI/Categorizer/Document.pm is NOT using locale and use locale
is very, uhm, local %-)

Patching the file does not seem to break the test cases.
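
For anyone hitting the same thing, here is a minimal sketch of why a "use locale" in the calling script does not reach the module. The pragma is lexically scoped, so it only affects code compiled inside the block or file that contains it. The two subs below are hypothetical stand-ins, not the actual tokenizer from Document.pm, and the locale name is an assumption that may not be installed on a given system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Latin-1 bytes for "Präsentation"; \xE4 is "ä" in ISO-8859-1.
my $bytes = "Pr\xE4sentation";

# Hypothetical stand-in for a tokenizer compiled WITHOUT "use locale".
# For a plain byte string, \w only matches ASCII word characters here,
# so the word is split at the umlaut.
sub tokenize_plain {
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

{
    # Lexical pragma: only code compiled inside this block sees it.
    use locale;

    # Hypothetical stand-in for a tokenizer compiled WITH "use locale".
    # What \w matches here depends on the current LC_CTYPE locale.
    sub tokenize_locale {
        my ($text) = @_;
        return $text =~ /(\w+)/g;
    }
}

# Assumption: a Latin-1 de_AT locale exists under this name.
my $set = setlocale(LC_CTYPE, "de_AT.ISO-8859-1");
print "locale: ", (defined $set ? $set : "not available"), "\n";

print "plain:  ", join(" | ", tokenize_plain($bytes)), "\n";
print "locale: ", join(" | ", tokenize_locale($bytes)), "\n";
```

The "plain" line shows the word split in two regardless of the environment. The "locale" line depends on which locales are installed, so no output is claimed for it here.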

\rho
