Is this the simplest way to find the number of words in a string?
Seems a little complicated, and I can't seem to turn it into a
function because when I replace the string with the argument
placeholder myString_ I get an error message saying that a string is
expected in that spot.
Length[ReadList[StringToStream["The cat in the hat."], Word]]
Returns 5.
Gregory
What about Length@StringSplit@"The cat in the hat"?
Regards,
Dave.
It seems to me simpler to do:
In[5]:= Length@StringSplit["The cat in the hat."]
Out[5]= 5
This of course implicitly assumes words are any series of
non-space characters. That assumption can be modified by using
the second argument to StringSplit.
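A minimal sketch of that second argument, assuming a "word" should be a run of letters and digits; the sample string is only an illustration:

```mathematica
s = "A cat-and-mouse game.";

(* Default: split on whitespace only, so hyphens and the final period stay attached *)
Length@StringSplit[s]                            (* 3 *)

(* Second argument: treat any run of non-word characters as the delimiter *)
Length@StringSplit[s, Except[WordCharacter] ..]  (* 5 *)
```

Which count is "right" depends on what you want to call a word.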
But this doesn't seem to address your question about a
placeholder. It is unclear to me what you are doing. Literally,
myString_ is seen by Mathematica as a named pattern that matches
anything, not a string. If I enter
Length@StringSplit[myString_]
I will see an error message since the argument supplied to
StringSplit is not a string as expected.
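Instead, put the pattern on the left-hand side of a definition and use
the bare name on the right-hand side:
wordCount[myString_String] := Length@StringSplit[myString]
str = "The cat in the hat.";
wordCount[str]
5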
Bob Hanlon
--David
Gregory,
why don't you just use
In[1]:= wordNumber[x_String] := Length@StringCases[x, LetterCharacter ..];
In[2]:= wordNumber["The cat in the hat."]
Out[2]= 5
Regards,
Leonid
A simple version is
Length@StringSplit["The cat in the hat."]
Cheers
Patrick
> Is this the simplest way to find the number of words in a string?
[...]
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
I would use StringCount:
StringCount["Now is the time for all good men to come to the aid of their country.",
LetterCharacter ..]
will return 16. You can tweak the pattern if you want to, say, treat
strings of digits as words as well.
Cheers,
Pillsy
1. One possibility is to use StringSplit.
For example:
In[105]:= Length[StringSplit["The cat in a hat."]]
Out[105]= 5
2. You may wish to specify a delimiter pattern:
In[107]:= StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..]
Out[107]= {The,cat,in,a,hat,not,on,the,mat}
(and take the Length)
3. Another way that works, which I found in the documentation, is StringCases:
In[113]:= Length@StringCases["The cat in a hat, (not on the mat)??", WordCharacter ..]
Out[113]= 9
Tom Dowling
In[1]:= yourlist="The cat in the hat."
In[2]:= Length[StringSplit[yourlist]]
Out[2]= 5
I feel it is simple enough.
Tomas
> Date: Tue, 11 Aug 2009 04:02:50 -0400
> From: gregor...@videotron.ca
> Subject: Number of Words in a String
> To: math...@smc.vnet.net
Directly from the documentation:
StringCount["The cat in the hat.", WordCharacter..]
Funnily enough, for the U.S. Constitution example, vim, Word and
OpenOffice report 7620 words instead of Mathematica's 7632. So
WordCharacter.. does not seem to be completely equivalent to other counting methods.
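One plausible source of such a gap (an assumption on my part, not checked against the Constitution text) is punctuation inside words: apostrophes and hyphens end a run of WordCharacter, but not a whitespace-delimited token. A small sketch with a made-up sentence:

```mathematica
s = "Don't stop the well-known cat.";

(* Whitespace-delimited tokens: Don't, stop, the, well-known, cat. *)
Length@StringSplit[s]             (* 5 *)

(* Runs of letters/digits: Don, t, stop, the, well, known, cat *)
StringCount[s, WordCharacter ..]  (* 7 *)
```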
Regards,
Yves
This might be simpler:
In[24]:= Length[StringSplit["The cat in the hat."]]
Out[24]= 5
- Jaebum
Try this:
CountWords[str_String]:=Length[StringSplit[str]]
It's not perfect; I'm still working on it. I suspect we will only get
what we want by using a regular expression.
andrew
Were you using Set (a = b) rather than SetDelayed (a := b) in your
function definition? The following works for me:
In[12]:= f[str_] := Length[ReadList[StringToStream[str], Word]]
f["hey there"]
Out[13]= 2
However, it's much quicker to use the StringSplit function:
In[14]:= Length@StringSplit["hey there"]
Out[14]= 2
In[19]:= Do[f["hey there"],{5000}] // AbsoluteTiming
Out[19]= {5.1405921,Null}
In[21]:= Do[Length@StringSplit["hey there"],{5000}] // AbsoluteTiming
Out[21]= {0.0156249,Null}
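To illustrate the Set versus SetDelayed point, here is a minimal sketch; the names g and h are hypothetical, and whether this is what actually happened in the original attempt is only a guess:

```mathematica
ClearAll[g, h];

(* Set: the right-hand side is evaluated once, at definition time, while str
   is still a bare symbol, so StringSplit complains that a string is expected *)
g[str_] = Length[StringSplit[str]];

(* SetDelayed: the right-hand side is held and evaluated on each call,
   after str has been replaced by the actual argument *)
h[str_] := Length[StringSplit[str]];

h["hey there"]  (* 2 *)
```

Restricting the argument with str_String, as in the other replies, additionally keeps the function from firing on non-string input.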
Cheers,
Peter.
>
> Is this the simplest way to find the number of words in a string?
I don't know if you would consider this simpler, but it is straight from the
documentation for StringCount:
StringCount["The cat in the hat.", WordCharacter ..]
> Seems a little complicated, and I can't seem to turn it into a
> function because when I replace the string with the argument
> placeholder myString_ I get an error message saying that a string is
> expected in that spot.
>
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
I don't know what you did when turning it into a function, but this:
numwords[s_String] := Length[ReadList[StringToStream[s], Word]]
numwords["The cat in the hat."]
seems to work alright....
hth,
albert
StringCount["The cat in the hat.", Whitespace] + 1
Probably a better way is to use WordBoundary.
StringCount["The cat in the hat.", WordBoundary]/2
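One caveat with the Whitespace + 1 form, sketched here with a made-up padded string: it counts separators rather than words, so leading or trailing whitespace inflates the result. Trimming first (StringTrim, assuming version 7 or later) or using StringSplit avoids that:

```mathematica
s = "  The cat in the hat.  ";

(* Counts every whitespace run, including the leading and trailing ones *)
StringCount[s, Whitespace] + 1

(* Trim the ends first, so only the separators between words are counted *)
StringCount[StringTrim[s], Whitespace] + 1  (* 5 *)

(* StringSplit already drops empty pieces at the start and end *)
Length@StringSplit[s]                       (* 5 *)
```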
Mike
It does for a few calls. But many calls to this (or similar) functions
leave many open streams, which slows your machine. To see this, try the
following sequence (with numwords defined as above):
opn = Streams[]
numwords@"the cat in the hat" & /@ Range[15];
opn = Streams[]
The streams can be closed using
Close[#] & /@ Select[opn, SameQ[Head@#, InputStream] &];
There are at least three remedies to this.
One is to remember to periodically close all opened streams.
Another is to modify Albert's function to something like
numwordsb[s_String] := Block[{opn, lng},
lng = Length[ReadList[opn = StringToStream[s], Word]]; Close[opn]; lng]
But the most effective is probably to avoid StringToStream as much as you
can. For me, this means using Import[, "Table"] rather than my pre-6.0
hand-rolled code.
YMMV.
Regards,
Dave.
In[1]:= NN = 1000000;
For word splitting:
In[2]:= StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]]
Out[2]= {The,cat,in,a,hat,not,on,the,mat}
In[3]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]];, {NN}]]
Out[3]= {9.999,Null}
In[4]:= StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..]
Out[4]= {The,cat,in,a,hat,not,on,the,mat}
In[5]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..];, {NN}]]
Out[5]= {12.808,Null}
So the regex version takes about 22% less time.
For word counting:
In[6]:= StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]]
Out[6]= 9
In[7]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]];, {NN}]]
Out[7]= {6.396,Null}
In[8]:= StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..]
Out[8]= 9
In[9]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..];, {NN}]]
Out[9]= {9.438,Null}
So the regex version takes about 32% less time.
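Note that the two patterns being timed are not interchangeable on every input: WordCharacter also matches digits, while [A-Za-z] matches only unaccented ASCII letters. A small sketch with a made-up string:

```mathematica
s = "Route 66 opened in 1926.";

(* ASCII letters only: Route, opened, in *)
StringCount[s, RegularExpression["[A-Za-z]+"]]  (* 3 *)

(* Letters and digits: Route, 66, opened, in, 1926 *)
StringCount[s, WordCharacter ..]                (* 5 *)
```

On the test sentence above both give 9 because it contains no digits.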
ADL