
Number of Words in a String


Gregory Lypny

Aug 11, 2009, 3:58:29 AM

Hello everyone,

Is this the simplest way to find the number of words in a string?
Seems a little complicated, and I can't seem to turn it into a
function because when I replace the string with the argument
placeholder myString_ I get an error message saying that a string is
expected in that spot.

Length[ReadList[StringToStream["The cat in the hat."], Word]]

Returns 5.

Gregory

David Annetts

Aug 12, 2009, 4:31:31 AM

Hi Gregory,

What about Length@StringSplit@"The cat in the hat"?

Regards,

Dave.


Bill Rowe

Aug 12, 2009, 4:31:53 AM

On 8/11/09 at 4:02 AM, gregor...@videotron.ca (Gregory Lypny)
wrote:

>Is this the simplest way to find the number of words in a string?
>Seems a little complicated, and I can't seem to turn it into a
>function because when I replace the string with the argument
>placeholder myString_ I get an error message saying that a string is
>expected in that spot.

>Length[ReadList[StringToStream["The cat in the hat."], Word]]

>Returns 5.

It seems to me simpler to do:

In[5]:= Length@StringSplit["The cat in the hat."]

Out[5]= 5

This of course implicitly assumes words are any series of
non-space characters. That assumption can be modified by using
the second argument to StringSplit.
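
A minimal sketch of what the second argument changes; the test sentence and the choice of Except[LetterCharacter] .. as the delimiter pattern are only illustrative assumptions, not the one right definition of a word:

```mathematica
(* Default: split on whitespace only, so punctuation stays attached *)
StringSplit["The cat, in the hat."]
(* {"The", "cat,", "in", "the", "hat."} *)

(* Second argument: treat any run of non-letter characters as a delimiter *)
StringSplit["The cat, in the hat.", Except[LetterCharacter] ..]
(* {"The", "cat", "in", "the", "hat"} *)
```

Both lists have length 5 here, but the two definitions diverge as soon as the text contains digits or punctuation inside words.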

But this doesn't seem to address your question about a
placeholder. It is unclear to me what you are doing. Literally,
myString_ is seen by Mathematica as a named pattern that matches
anything, not a string. If I enter

Length@StringSplit[myString_]

I will see an error message since the argument supplied to
StringSplit is not a string as expected.
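
For comparison, a sketch of how the pattern is normally used: it appears only on the left-hand side of a definition, and the bare name is used on the right. The function name countWords is made up for this example.

```mathematica
ClearAll[countWords];

(* myString_String is the pattern; on the right-hand side use plain myString *)
countWords[myString_String] := Length[StringSplit[myString]]

countWords["The cat in the hat."]
(* 5 *)
```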


Bob Hanlon

Aug 12, 2009, 4:32:25 AM

wordCount[str_String] :=
 Module[{s = StringReplace[str, Whitespace -> " "]},
  StringCount[s, " "] + 1]

str = "The cat in the hat.";

wordCount[str]

5


Bob Hanlon


David Reiss

Aug 12, 2009, 4:32:36 AM

Length[StringSplit["The cat in the hat."]]

--David

> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
>
> Gregory


Gregory Lypny

Aug 12, 2009, 4:32:58 AM

Ahhh, that's what I'm talkin' about! Four different solutions that
give me a much better understanding of the text functions. Thank you,
Bob, David, Leonid, and Patrick.

Gregory

Leonid Shifrin

Aug 12, 2009, 4:33:31 AM

Gregory,

why don't you just use

In[1]:= wordNumber[x_String] := Length@StringCases[x, LetterCharacter ..];

In[2]:= wordNumber["The cat in the hat."]

Out[2]= 5

Regards,
Leonid



Patrick Scheibe

Aug 12, 2009, 4:34:03 AM

Hi,

a simple version is

Length@StringSplit["The cat in the hat."]

Cheers
Patrick

Pillsy

Aug 12, 2009, 4:34:25 AM

On Aug 11, 3:58 am, Gregory Lypny <gregory.ly...@videotron.ca> wrote:

> Is this the simplest way to find the number of words in a string?

[...]


> Length[ReadList[StringToStream["The cat in the hat."], Word]]

I would use StringCount:

StringCount["Now is the time for all good men to come to the aid of their country.", LetterCharacter ..]

will return 16. You can tweak the pattern if you want to, say, treat
strings of digits as words as well.
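
A small sketch of that kind of tweak, assuming a "word" should be any run of letters or digits; the test sentence is made up:

```mathematica
(* Letters only: the digit runs are ignored *)
StringCount["Flight 370 left at 9", LetterCharacter ..]
(* 3 *)

(* Letters or digits: digit runs now count as words too *)
StringCount["Flight 370 left at 9", (LetterCharacter | DigitCharacter) ..]
(* 5 *)
```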

Cheers,
Pillsy

Thomas Dowling

Aug 12, 2009, 4:34:46 AM

Hello,

1. One possibility is to use StringSplit

For example:

In[105]= Length[StringSplit["The cat in a hat."]]

Out[105] = 5


2. You may wish to include a delimiter

In[107]= StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..]

Out[107]= {The,cat,in,a,hat,not,on,the,mat}

(and take the Length)

3. Another way that works which I found in the documentation is StringCases

In[113]= Length@StringCases["The cat in a hat, (not on the mat)??", WordCharacter ..]

Out[113]= 9


Tom Dowling


Tomas Garza

Aug 12, 2009, 4:35:20 AM

I would use

In[1]:= yourlist="The cat in the hat."

In[2]:= Length[StringSplit[yourlist]]
Out[2]= 5

I feel it is simple enough.

Tomas



Yves Klett

Aug 12, 2009, 4:35:53 AM

Hi,

directly from the documentation:

StringCount["The cat in the hat.", WordCharacter..]

Funny enough, for the U.S. constitution example, vim, Word and
OpenOffice report 7620 words instead of Mathematica's 7632. So
WordCharacter.. seems not completely equivalent to other counting methods.
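
One plausible source of such a difference (a guess, not checked against the constitution text itself) is that apostrophes and hyphens are not word characters, so WordCharacter .. breaks such words into pieces while whitespace-based counters do not. A made-up example:

```mathematica
text = "We don't use e-mail.";

(* Whitespace-delimited words: We, don't, use, e-mail. *)
Length[StringSplit[text]]
(* 4 *)

(* Runs of word characters: We, don, t, use, e, mail *)
StringCount[text, WordCharacter ..]
(* 6 *)
```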

Regards,
Yves


jae...@wolfram.com

Aug 12, 2009, 4:36:25 AM

> Hello everyone,
>
> Is this the simplest way to find the number of words in a string?
> Seems a little complicated, and I can't seem to turn it into a
> function because when I replace the string with the argument
> placeholder myString_ I get an error message saying that a string is
> expected in that spot.
>
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
>
> Gregory
>
>

This might be simpler:

In[24]:= Length[StringSplit["The cat in the hat."]]
Out[24]= 5

- Jaebum


meitnik

Aug 12, 2009, 4:36:46 AM

hi,

Try this:

CountWords[str_String]:=Length[StringSplit[str]]

It's not perfect; I'm still working on it. I suspect we will only get
what we want by using a regex.

andrew

pfalloon

Aug 12, 2009, 4:37:08 AM

Were you using Set (a = b) rather than SetDelayed (a := b) in your
function definition? The following works for me:

In[12]:= f[str_] := Length[ReadList[StringToStream[str], Word]]
In[13]:= f["hey there"]
Out[13]= 2
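
To make the Set versus SetDelayed difference concrete, here is a small sketch. The names immediate and delayed are invented for the illustration, and it assumes the symbol s has no global value:

```mathematica
ClearAll[immediate, delayed, s];

(* Set (=): the right-hand side is evaluated once, at definition time,
   while s is still just a symbol, so ToString[s] gives "s" *)
immediate[s_] = StringLength[ToString[s]];

(* SetDelayed (:=): the right-hand side is evaluated on each call,
   after s has been replaced by the actual argument *)
delayed[s_] := StringLength[ToString[s]];

immediate["hey there"]  (* 1 *)
delayed["hey there"]    (* 9 *)
```

With Set, a function such as StringToStream would likewise receive the bare symbol at definition time, which would fit the "string expected" message described in the question.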

However, it's much quicker to use the StringSplit function:

In[14]:= Length@StringSplit["hey there"]
Out[14]= 2
In[19]:= Do[f["hey there"],{5000}] // AbsoluteTiming
Out[19]= {5.1405921,Null}
In[21]:= Do[Length@StringSplit["hey there"],{5000}] // AbsoluteTiming
Out[21]= {0.0156249,Null}

Cheers,
Peter.

Albert Retey

Aug 12, 2009, 4:37:29 AM

Hi,

>
> Is this the simplest way to find the number of words in a string?

I don't know if you would consider this simpler, it is straight from the
documentation for StringCount:

StringCount["The cat in the hat.", WordCharacter ..]

> Seems a little complicated, and I can't seem to turn it into a
> function because when I replace the string with the argument
> placeholder myString_ I get an error message saying that a string is
> expected in that spot.
>
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.

I don't know what you did when turning it into a function, but this:

numwords[s_String] := Length[ReadList[StringToStream[s], Word]]

numwords["The cat in the hat."]

seems to work alright....


hth,

albert

Armand Tamzarian

Aug 12, 2009, 4:38:02 AM

> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
>
> Gregory

StringCount["The cat in the hat.", Whitespace] + 1

Probably a better way is to use WordBoundary.

StringCount["The cat in the hat.", WordBoundary]/2
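
One caveat with the Whitespace + 1 version, sketched on a made-up string with a trailing space: every run of whitespace is counted, including one at the end, so the result can be one too high.

```mathematica
(* Two whitespace runs, so this reports 3 although there are only 2 words *)
StringCount["The cat ", Whitespace] + 1
(* 3 *)

(* StringSplit drops the empty piece after the trailing space *)
Length[StringSplit["The cat "]]
(* 2 *)
```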

Mike

David Annetts

Aug 13, 2009, 3:21:01 AM

Hi Albert,


> I don't know what you did when turning it into a function, but this:
>
> numwords[s_String] := Length[ReadList[StringToStream[s], Word]]
>
> numwords["The cat in the hat."]
>
> seems to work alright....

It does for a few calls. But many calls to this (or a similar) function
leave many open streams, which slows your machine. To see this, try the
following sequence (with numwords defined as above):

opn = Streams[]
numwords@"the cat in the hat" & /@ Range[15];
opn = Streams[]

The streams can be closed using

Close[#] & /@ Select[opn, SameQ[Head@#, InputStream] &];

There are at least three remedies to this.

One is to remember to periodically close all opened streams.

Another is to modify Albert's function to something like

numwordsb[s_String] := Block[{opn, lng},
  lng = Length[ReadList[opn = StringToStream[s], Word]];
  Close[opn];
  lng]
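
A quick way to check that a closing variant really leaves nothing behind is to compare the number of open streams before and after a batch of calls. This is only a sketch; numwordsClosing is a made-up name for a restatement of the function above, and before and after are illustrative variables:

```mathematica
ClearAll[numwordsClosing];

(* Read the words from a string stream, then close the stream before returning *)
numwordsClosing[s_String] := Block[{opn, lng},
  opn = StringToStream[s];
  lng = Length[ReadList[opn, Word]];
  Close[opn];
  lng]

before = Length[Streams[]];
Do[numwordsClosing["the cat in the hat"], {15}];
after = Length[Streams[]];

(* Expected to be True if every stream opened by the function was closed *)
before == after
```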

But the most effective is probably to avoid StringToStream as much as you
can. For me, this is by using Import[, "Table"] rather than my <6.0
hand-rolled code.

YMMV.

Regards,

Dave.


ADL

Aug 14, 2009, 5:59:13 AM

With respect to the use of regular expressions or Mathematica string
patterns, note that the former are 20-30% faster:

In[1]:= NN = 1000000;

For word splitting:

In[2]:= StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]]
Out[2]= {The,cat,in,a,hat,not,on,the,mat}

In[3]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]];, {NN}]]
Out[3]= {9.999,Null}

In[4]:= StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..]
Out[4]= {The,cat,in,a,hat,not,on,the,mat}

In[5]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..];, {NN}]]
Out[5]= {12.808,Null}

so: 22% faster with regex.


For word counting:

In[6]:= StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]]
Out[6]= 9

In[7]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]];, {NN}]]
Out[7]= {6.396,Null}

In[8]:= StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..]
Out[8]= 9

In[9]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..];, {NN}]]
Out[9]= {9.438,Null}

so, 32% faster with regex.
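
Keep in mind that the two patterns being timed are not strictly equivalent: [A-Za-z] covers ASCII letters only, while WordCharacter also matches digits. A sketch on a made-up sentence where the counts differ:

```mathematica
(* ASCII letters only: Apollo, landed, in *)
StringCount["Apollo 11 landed in 1969", RegularExpression["[A-Za-z]+"]]
(* 3 *)

(* Letters and digits: Apollo, 11, landed, in, 1969 *)
StringCount["Apollo 11 landed in 1969", WordCharacter ..]
(* 5 *)
```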


ADL
