inception tokenizer


Anya Gajjar

Dec 14, 2020, 5:50:47 AM
to webanno-user
Hi all,

Was wondering what tokenizer inception uses for the English language?

Cheers,
Anya

Ute Winchenbach

Dec 15, 2020, 12:44:25 AM
to webann...@googlegroups.com

Hi Anya,

INCEpTION uses the Java BreakIterator for tokenization.

Best,

Ute


Richard Eckart de Castilho

Dec 15, 2020, 4:00:45 AM
to Anya Gajjar, webanno-user
Hi,

> On 14. Dec 2020, at 11:50, Anya Gajjar <anjaniin...@gmail.com> wrote:
>
> Was wondering what tokenizer inception uses for the English language?

WebAnno uses the Java BreakIterator. The exact rules are determined by the Java runtime you are using, since this is a platform class. You can look at [1] for what the rules *may* look like.
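
For readers who want to see what the platform class produces on their own runtime, here is a minimal sketch using the standard java.text.BreakIterator API. It is not taken from the WebAnno or INCEpTION source code: the class name BreakIteratorTokenizerSketch, the tokenize helper, and the choice to drop whitespace-only segments are all assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of WebAnno or INCEpTION.
class BreakIteratorTokenizerSketch {

    // Splits text at the word boundaries reported by the JDK's BreakIterator.
    // Dropping whitespace-only segments is an assumption of this sketch.
    static List<String> tokenize(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (!segment.trim().isEmpty()) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line; the exact segmentation of punctuation
        // and special characters may differ between Java runtimes.
        String sample = "Was wondering what tokenizer inception uses for the English language?";
        for (String token : tokenize(sample, Locale.ENGLISH)) {
            System.out.println(token);
        }
    }
}
```

Running this on the Java version that hosts your installation is probably the most direct way to check how a particular input gets split, since the rules come from the runtime rather than from the application.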

Cheers,

-- Richard

[1] http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/text/resources/BreakIteratorRules.java#l289

Cf: https://groups.google.com/g/webanno-user/c/tgSLYTblvtE/m/Yo4DxdTCDQAJ

Anji

Dec 15, 2020, 5:07:44 AM
to Richard Eckart de Castilho, webanno-user
Thank you very much Ute and Richard. Also thanks for the link. I wanted to look at these details.
--
Anjani K Dhrangadhariya