Add hyphen to regular expression used in default token definition settings


theres...@gmail.com

Jan 12, 2016, 3:19:30 AM
to AntWordProfiler-Discussion
Hi everyone,

I've been using AntWordProfiler with the default token definition settings so far. Now I have a few texts in which words are split by hyphens (e.g., "unfortun-ately"). So that this doesn't distort the token count, I'd like to treat the hyphen as a token character too. However, since I've used the default settings so far and want the analyses to stay as comparable as possible, I'd like to add the hyphen to the regular expression used for the default settings: (?<![\p{N}\p{L}])\p{L}+[\p{N}]*
How would I do that? Unfortunately, that exceeds my very basic knowledge of regular expressions.
Thanks a lot in advance!

Kind regards,
Theresa

Laurence Anthony

Jan 12, 2016, 3:23:18 AM
to antword...@googlegroups.com
I think you just need to replace the middle \p{L}+ with the following:

[-\p{L}]+

Regards,

Laurence.




###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################
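
For anyone who wants to check the effect before changing their settings: substituting the suggested class into the default definition gives (?<![\p{N}\p{L}])[-\p{L}]+[\p{N}]*. Below is a rough sketch in Python comparing the two definitions. It is not how AntWordProfiler itself tokenizes. Python's built-in re module does not support \p{L} or \p{N}, so the sketch assumes [^\W\d_] as an approximate stand-in for \p{L} and \d for \p{N}, and it writes [-\p{L}] as the alternation (?:-|[^\W\d_]). The names DEFAULT and HYPHENATED are made up for the illustration.

```python
import re

# Approximation of the default definition (?<![\p{N}\p{L}])\p{L}+[\p{N}]*
# Assumption: [^\W\d_] stands in for \p{L}, \d for \p{N},
# and [^\W_] (letter or digit) for [\p{N}\p{L}] in the lookbehind.
DEFAULT = re.compile(r"(?<![^\W_])[^\W\d_]+\d*")

# Same definition with the middle \p{L}+ replaced by [-\p{L}]+,
# written here as an alternation between a hyphen and a letter.
HYPHENATED = re.compile(r"(?<![^\W_])(?:-|[^\W\d_])+\d*")

text = "It was unfortun-ately late"

# The default definition splits the word at the hyphen.
print(DEFAULT.findall(text))
# ['It', 'was', 'unfortun', 'ately', 'late']

# With the hyphen allowed, the word stays a single token.
print(HYPHENATED.findall(text))
# ['It', 'was', 'unfortun-ately', 'late']

# A hyphen standing on its own is also picked up as a token.
print(HYPHENATED.findall("a - b"))
# ['a', '-', 'b']
```

One thing the last line shows: with the hyphen inside the class, a free-standing hyphen (for example one used as a dash between spaces) would also count as a token, so it may be worth checking whether the texts contain any.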


theres...@gmail.com

Jan 12, 2016, 4:00:26 AM
to AntWordProfiler-Discussion
Thanks a lot for the fast response! I'll give that a try.

Best,
Theresa