Wordbreak and word extraction in Go?


Ingo Oeser

Sep 20, 2016, 5:34:29 PM
to golang-nuts
Hi all,

I am pretty sure I am overlooking something in the repository https://godoc.org/golang.org/x/text, but I cannot find anything that splits text into words according to the Unicode word-breaking algorithm.

Does anyone have examples, or can someone point me in the right direction? Can anyone confirm that this is missing? If it is, I would like to file an issue against the text repository for it.

Shawn Milochik

Sep 20, 2016, 5:35:57 PM
to golang-nuts
How about strings.Fields?
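For context, `strings.Fields` splits a string around runs of white space as defined by `unicode.IsSpace`, so punctuation stays attached to the neighbouring word. A minimal illustration (the helper name `fieldsDemo` is hypothetical, added only for this sketch):

```go
package main

import (
	"fmt"
	"strings"
)

// fieldsDemo splits s exactly as strings.Fields does: around runs of
// white space (unicode.IsSpace). Punctuation is not treated as a separator.
func fieldsDemo(s string) []string {
	return strings.Fields(s)
}

func main() {
	// Prints: ["Hello," "world!" "It's" "2016."]
	fmt.Printf("%q\n", fieldsDemo("Hello, world! It's 2016."))
}
```

This is why `strings.Fields` is not a substitute for word segmentation: "Hello," and "world!" come back with their punctuation.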



Ingo Oeser

Sep 20, 2016, 6:06:29 PM
to golang-nuts
Thanks for the suggestion, but I am looking for an implementation of http://unicode.org/reports/tr29/
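UAX #29 defines word boundaries through a list of rules (WB1 to WB999) evaluated over each character's Word_Break property, so a faithful implementation needs the Unicode property tables. The stdlib-only sketch below is NOT UAX #29; it only approximates a few of its ideas, assuming that letters, digits, nonspacing marks and connector punctuation form words, that an apostrophe between two letters joins them (cf. WB6/WB7), and that '.' or ',' between two digits joins them (cf. WB11/WB12). Unlike UAX #29 it drops non-word segments and does no per-ideograph or Katakana handling. The names `approxWords`, `isWordRune` and `joins` are hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordRune reports whether r may appear inside a word in this rough
// approximation (not the UAX #29 Word_Break property).
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) ||
		unicode.Is(unicode.Mn, r) || unicode.Is(unicode.Pc, r)
}

// joins reports whether mid, sitting between prev and next, keeps them in
// one word: apostrophes between letters, '.' or ',' between digits.
func joins(prev, mid, next rune) bool {
	switch mid {
	case '\'', '’':
		return unicode.IsLetter(prev) && unicode.IsLetter(next)
	case '.', ',':
		return unicode.IsDigit(prev) && unicode.IsDigit(next)
	}
	return false
}

// approxWords returns the word-like segments of s, skipping spaces,
// punctuation and symbols that do not join two word runes.
func approxWords(s string) []string {
	rs := []rune(s)
	var words []string
	i := 0
	for i < len(rs) {
		if !isWordRune(rs[i]) {
			i++
			continue
		}
		j := i + 1
		for j < len(rs) {
			if isWordRune(rs[j]) {
				j++
				continue
			}
			// A joiner followed by a word rune extends the current word.
			if j+1 < len(rs) && joins(rs[j-1], rs[j], rs[j+1]) {
				j += 2
				continue
			}
			break
		}
		words = append(words, string(rs[i:j]))
		i = j
	}
	return words
}

func main() {
	// Prints: ["Hello" "world" "It's" "3.14" "or" "1,000" "items"]
	fmt.Printf("%q\n", approxWords("Hello, world! It's 3.14 or 1,000 items."))
}
```

For real UAX #29 conformance, a table-driven implementation tested against the Unicode WordBreakTest.txt data is the safer route; this sketch only shows the shape of rule-based boundary detection.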

Nigel Tao

Sep 20, 2016, 11:42:29 PM
to Ingo Oeser, Marcel van Lohuizen, golang-nuts
I'd ask mpvl (CC'ed).

mp...@golang.org

Sep 22, 2016, 11:51:48 AM
to Nigel Tao, Ingo Oeser, Marcel van Lohuizen, golang-nuts
Hi Ingo,

Thanks for your interest in x/text! Text segmentation is high on the priority list for x/text, but it is not yet implemented. x/text/cases does implement a close approximation of Annex #29, optimized for title casing, but it is not the full thing.

For now, if your main interest is word segmentation, your best bet is github.com/blevesearch/segment. It is a decent implementation of Annex #29 for word breaking. I've been talking with Marty about whether it could even be integrated into x/text.

That said, it would help if you filed an issue describing exactly what you need.

Please let me know if you have any other questions.

Best regards,

Marcel

Ingo Oeser

Sep 29, 2016, 4:07:28 PM
to golang-nuts, nige...@golang.org, night...@googlemail.com, mp...@golang.org
thx Marcel,

If you could outline a possible design there, that would be great.