Count words

John Otto Knoke

Dec 6, 2015, 8:58:13 AM
to openr...@googlegroups.com
Hello,

I need help creating an expression that counts how many words are in each cell. For example, the value in the screenshot has 4 words. Thanks in advance!


Inline image 1

Joe Wicentowski

Dec 6, 2015, 9:12:48 AM
to openr...@googlegroups.com
Hi John,

> I need help creating an expression that counts how many words are in each cell. For example, the value in the screenshot has 4 words. Thanks in advance!

This should do the trick:

value.split(" ").length()

If you want to account for multiple spaces, you can use the regular
expression for "whitespace characters", \s, with the "one or more"
quantifier, +:

value.split(/\s+/).length()
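
To illustrate why the two forms can give different counts, here is a rough Python analogue. It is not GREL: the function names are made up for this sketch, and Python's str.split and re.split are only stand-ins, so how GREL itself treats consecutive spaces may differ from what is shown here.

```python
import re


def count_words_single_space(text):
    # Split on exactly one space. In Python, two spaces in a row
    # produce an empty string between them, which inflates the count.
    return len(text.split(" "))


def count_words_whitespace(text):
    # Split on runs of one or more whitespace characters (\s+).
    # Strip first so leading/trailing whitespace adds no empty items.
    return len(re.split(r"\s+", text.strip()))


sample = "one  two three four"  # note the two spaces after "one"
print(count_words_single_space(sample))
print(count_words_whitespace(sample))
```

In this Python version the single-space split counts an extra empty item for the doubled space, while the \s+ split counts only the four words.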

Joe

John Otto Knoke

Dec 6, 2015, 9:58:51 AM
to openr...@googlegroups.com
Thanks, Joe, it works perfectly. I understand the split at the space, but why did you add length() at the end?

Thanks again


Joe Wicentowski

Dec 6, 2015, 1:23:29 PM
to openr...@googlegroups.com
Hi John,

Great!  Split returns an array, and length counts the number of items in an array.  You'd think they'd have chosen the word "count", but alas.  Here is the documentation for array-related functions:


Joe

Thad Guidry

Dec 6, 2015, 2:33:18 PM
to openrefine
The idea of the "length" of an array dates back, I think, to ANSI C constructs, maybe earlier. Paragraph 2 of this page basically gives the definition and the "why":
https://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html

Specifically, "If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive."
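
As a small illustration of that definition, here is a Python sketch (again an analogue, not GREL or Java; the sample sentence is made up) showing that a four-item array has length 4 and indices 0 through 3:

```python
# Splitting a four-word string gives an array (a Python list) of four items.
words = "the quick brown fox".split(" ")

# n is the "length" of the array: the number of components it holds.
n = len(words)

# The components are addressed by integer indices from 0 to n - 1, inclusive.
for i in range(n):
    print(i, words[i])
```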

We decided not to break with the convention used by many other programming languages, so we did not go with the term "count".
