UDF for libstemmer

Arnold Daniels

Jul 17, 2008, 12:38:36 AM
to MySQL UDF Repository
Hi All,

I've just released a UDF lib to stem words using the Snowball stemmer.
It contains only one function, stem_word. Please have a look at
http://www.mysqludf.com/lib_mysqludf_stem.

Also, there are 3 bugs for the sys, str and preg UDF libs. Please take a
look at http://bugs.mysqludf.com.

Best regards,
Arnold


Roland Bouman

Jul 17, 2008, 3:24:10 AM
to The UDF Repository for MySQL
Hi Arnold,

> I've just released an UDF lib to stem words using the snowball stemmer.
> It only contains 1 function stem_word. Please have a look at
> http://www.mysqludf.com/lib_mysqludf_stem.

Nice job! I was able to install and run stem_word without issue.

A few comments on miscellaneous things:

#1
"The location of the plugin dir defaults to: <mysql-home>/lib/mysql/
but can be configured to a custom location in the my.cnf."

It's not a big deal, because you properly linked to the documentation
page. However, I always suggest that people run:

SHOW VARIABLES LIKE 'plugin_dir'

so they know immediately where to move the lib.

#2
In the API section, it would be cool if you could document the return
type for the functions.

#3
I was wondering what value to use for the language argument. I think
both 'en' and 'English' should work, but is it possible to document a
list of valid languages, or better, ask the stemmer which languages
are supported?

If I input some obvious nonsense, like stem_word('~', 'bla'), I get
NULL. I am not sure what the best approach is, although I expected to
get an error informing me that '~' is not a supported language.

#4
I tried passing a non-constant for the language argument:

mysql> select stem_word(language, language) from world.countrylanguage limit 1;
+-------------------------------+
| stem_word(language, language) |
+-------------------------------+
| NULL                          |
+-------------------------------+
1 row in set (0.00 sec)

Is this because you handle the error in the row-level function rather
than in the init function?
Personally, I would prefer to do this check in the init function:

/* in the init function, args->args[i] is NULL when argument i is not a constant */
if (args->args[0] == NULL) {
    strcpy(message, "Invalid argument value: language argument must be a constant value.");
    return 1;
}

(or something like this)
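
To make the idea a bit more concrete, here is a minimal, self-contained sketch of such a check. It does not use the real MySQL headers: udf_args_sketch is a simplified stand-in for UDF_ARGS that keeps only two fields, and check_language_is_constant is a hypothetical helper name, not something taken from lib_mysqludf_stem. It relies on the documented behaviour that, during the init call, args->args[i] is NULL for any argument that is not a constant, and it assumes the language is the first argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for UDF_ARGS from mysql.h (assumption: only the
   fields needed for this check are reproduced here). */
typedef struct {
    unsigned int arg_count; /* number of arguments passed to the UDF */
    char **args;            /* during init: non-NULL only for constant arguments */
} udf_args_sketch;

/* Hypothetical helper following the init-function convention:
   returns 0 on success, or 1 after copying an error text into message. */
static int check_language_is_constant(const udf_args_sketch *args, char *message)
{
    assert(args != NULL && message != NULL);

    if (args->arg_count < 1) {
        strcpy(message, "Missing language argument.");
        return 1;
    }
    /* A NULL pointer here means the argument is a column or expression,
       not a constant known at init time. */
    if (args->args[0] == NULL) {
        strcpy(message, "Invalid argument value: language argument must be a constant value.");
        return 1;
    }
    return 0;
}
```

With a check like this in the init function, the query above would fail with an error message instead of quietly returning NULL for every row.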

#5
I got the impression that stem_word coerces non-string arguments to
strings, is that correct? I was wondering if it would make more sense
to fail with an error in case a non-string is passed. (Not really
sure, just a thought)

(I tried: mysql> select stem_word(0,10) limit 1;
+-----------------+
| stem_word(0,10) |
+-----------------+
| NULL            |
+-----------------+
)
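
In case it helps the discussion, this is roughly what a strict type check could look like. It is only a sketch: sketch_item_result stands in for MySQL's Item_result enum (the names and values here are illustrative, not the real ones), udf_arg_types_sketch stands in for the arg_count and arg_type fields of UDF_ARGS, and require_string_args is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for MySQL's Item_result enum (assumption: illustrative names). */
typedef enum {
    SKETCH_STRING,
    SKETCH_REAL,
    SKETCH_INT,
    SKETCH_DECIMAL
} sketch_item_result;

/* Stand-in for the arg_count / arg_type fields of UDF_ARGS. */
typedef struct {
    unsigned int arg_count;
    const sketch_item_result *arg_type;
} udf_arg_types_sketch;

/* Hypothetical helper: returns 0 if every argument is a string,
   otherwise copies an error text into message and returns 1. */
static int require_string_args(const udf_arg_types_sketch *args, char *message)
{
    unsigned int i;

    assert(args != NULL && message != NULL);

    for (i = 0; i < args->arg_count; i++) {
        if (args->arg_type[i] != SKETCH_STRING) {
            strcpy(message, "stem_word only accepts string arguments.");
            return 1;
        }
    }
    return 0;
}
```

Whether failing is friendlier than coercing is a design choice. The sketch only shows that the strict variant is cheap to do once, at init time.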

#6
Finally, I was wondering if you anticipate that more functions will be
added to this lib. It seemed to me that it may make sense to move
stem_word to lib_mysqludf_str, but this is just a thought - maybe you
have some reasons to want to keep it in a separate lib.

> Also, there a 3 bugs for the sys, str and preg udf libs. Please take a
> look at http://bugs.mysqludf.com.

I added a comment to the bug for the sys lib. I hope it will be picked
up so I get the feedback.
I wasn't really aware that a bug was assigned to me; I guess I should check
on a regular basis. Would it be possible to send automatic email
notifications? If something like that is already in place,
great, I just don't recall getting an email.

(Of course an RSS/Atom feed would work just as well for me. Apologies if
that is in place already; I haven't seen it.)

kind regards,

Roland