Add hyphen to regular expression used in default token definition settings

theres...@gmail.com

Jan 12, 2016, 3:19:30 AM

to AntWordProfiler-Discussion

Hi everyone,

I've been using AntWordProfiler with the default token definition settings so far. Now I have a few texts in which words are split by hyphens (e.g., "unfortun-ately"). So that the token count isn't thrown off, I'd like the hyphen to be treated as a token character too. However, since I've used the default settings so far and want the analyses to stay as comparable as possible, I'd like to add the hyphen to the regular expression used for the default settings: (?<![\p{N}\p{L}])\p{L}+[\p{N}]*

How would I do that? Unfortunately, that exceeds my very basic knowledge of regular expressions.

Thanks a lot in advance!

Kind regards,

Theresa

Laurence Anthony

Jan 12, 2016, 3:23:18 AM

to antword...@googlegroups.com

I think you just need to replace the middle \p{L}+ with the following:

[-\p{L}]+
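
For anyone who wants to check the effect outside AntWordProfiler, here is a rough sketch in Python. It is an approximation, not the tool's own code: Python's built-in `re` module has no `\p{L}` or `\p{N}`, so the sketch assumes `[^\W\d_]` as a stand-in for letters and `\d` as a stand-in for numbers. The names `DEFAULT_TOKEN`, `HYPHEN_TOKEN`, and `sample` are made up for the illustration.

```python
import re

# Stand-ins (assumptions, not exact equivalents):
#   [^\W\d_]  ~ \p{L}   (letters)
#   \d        ~ \p{N}   (numbers)
#   [^\W_]    ~ [\p{N}\p{L}]

# Default definition from the thread: (?<![\p{N}\p{L}])\p{L}+[\p{N}]*
DEFAULT_TOKEN = re.compile(r"(?<![^\W_])[^\W\d_]+\d*")

# Suggested change: \p{L}+ becomes [-\p{L}]+ (a run of letters and/or hyphens)
HYPHEN_TOKEN = re.compile(r"(?<![^\W_])(?:[^\W\d_]|-)+\d*")

sample = "unfortun-ately the word2 was split - oddly"

print(DEFAULT_TOKEN.findall(sample))
# ['unfortun', 'ately', 'the', 'word2', 'was', 'split', 'oddly']

print(HYPHEN_TOKEN.findall(sample))
# ['unfortun-ately', 'the', 'word2', 'was', 'split', '-', 'oddly']
```

With the hyphen added, "unfortun-ately" comes out as one token instead of two. Note that a free-standing hyphen (as in "split - oddly") is also matched as a token under this pattern, so it may be worth checking whether the texts contain hyphens used as dashes.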

Regards,

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


theres...@gmail.com

Jan 12, 2016, 4:00:26 AM

to AntWordProfiler-Discussion

Thanks a lot for the fast response! I'll give that a try.

Best,

Theresa
