[R] substring comparison


Claus O'Rourke

Apr 29, 2010, 1:17:52 PM
to r-h...@r-project.org
Hi all,

I'm writing a script to do some basic text analysis in R. Let's assume
I have a data frame named data with a column named 'Utt'
that contains strings. Is there a straightforward way to achieve
something like this:

data$ContainsThe <- ifelse(startsWith(data$Utt,"the"),"y","n")

or

data$ContainsThe <- ifelse(contains(data$Utt,"the"),"y","n")
?

I tried using grep
data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n")

but this doesn't work because grep returns only the indices of the rows
for which it succeeded.

Thanks for any pointers

Claus

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
You received this message because you are subscribed to the Google Groups "R-help-archive" group.
To post to this group, send email to r-help-...@googlegroups.com.
To unsubscribe from this group, send email to r-help-archiv...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/r-help-archive?hl=en.

David Winsemius

Apr 29, 2010, 1:24:15 PM
to Claus O'Rourke, r-h...@r-project.org

On Apr 29, 2010, at 1:17 PM, Claus O'Rourke wrote:

> Hi all,
>
> I'm writing a script to do some basic text analysis in R. Let's assume
> I have a data frame named data with a column named 'Utt'
> that contains strings. Is there a straightforward way to achieve
> something like this:
>
> data$ContainsThe <- ifelse(startsWith(data$Utt,"the"),"y","n")
>
> or
>
> data$ContainsThe <- ifelse(contains(data$Utt,"the"),"y","n")
> ?
>
> I tried using grep
> data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n")
>
> but this doesn't work because grep returns only the indices of the rows
> for which it succeeded.

?grepl # which is on the same help page as grep
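
A small sketch of the difference, in case it helps. The vector `utt` below is made up for illustration and is not the poster's data: grep gives back the positions of the matching elements, while grepl gives back one TRUE/FALSE per element, which is the shape ifelse needs.

```r
# Hypothetical example vector (an assumption, not the original data)
utt <- c("the cat sat", "a dog barked", "over the hill")

# grep: integer indices of the elements that match
idx <- grep("the", utt)

# grepl: logical vector with one entry per element of utt
hit <- grepl("the", utt)

print(idx)
print(hit)
```

Because `hit` has the same length as `utt`, it can be passed straight to ifelse or assigned as a data frame column.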


>
> Thanks for any pointers
>
> Claus

David Winsemius, MD
West Hartford, CT

Henrique Dallazuanna

Apr 29, 2010, 1:24:43 PM
to Claus O'Rourke, r-h...@r-project.org
Try with grepl:

data$ContainsThe <- ifelse(grepl("the",data$Utt),"y","n")
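
For the "starts with" variant in the question, one possible approach is to anchor the pattern with `^`. The sketch below uses a made-up data frame (the rows are assumptions; only the column name Utt comes from the original post), and the column names StartsWithThe and WholeWordThe are hypothetical.

```r
# Hypothetical data; only the column name Utt follows the original post
data <- data.frame(
  Utt = c("the cat sat", "a dog barked", "over the hill", "then what"),
  stringsAsFactors = FALSE
)

# "contains": literal substring match (fixed = TRUE turns off regex syntax)
data$ContainsThe <- ifelse(grepl("the", data$Utt, fixed = TRUE), "y", "n")

# "starts with": anchor the regular expression at the start of the string
data$StartsWithThe <- ifelse(grepl("^the", data$Utt), "y", "n")

# Whole word only: \\b marks a word boundary (perl = TRUE selects PCRE)
data$WholeWordThe <- ifelse(grepl("\\bthe\\b", data$Utt, perl = TRUE), "y", "n")

print(data)
```

Note that a plain substring match also flags words such as "then"; the word-boundary pattern is one way to avoid that if only the word "the" should count. Adding `ignore.case = TRUE` would also match "The", though it cannot be combined with `fixed = TRUE`.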

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Claus O'Rourke

Apr 29, 2010, 2:07:39 PM
to r-h...@r-project.org
Thanks. It works perfectly.