[R] strings


Robin Mjelle

May 23, 2013, 2:04:34 PM
to r-h...@r-project.org
I have two files containing words. I want to print the words that are in file 1 but
NOT in file 2.
How do I go about it?

file 1:
ABL1
1 ALKBH1
2 ALKBH2
3 ALKBH3
4 ANKRD17
5 APEX1
6 APEX2
7 APTX
8 ASF1A
9 ASTE1
10 ATM
11 ATR
12 ATRIP
13 ATRX
14 ATXN3
15 BCCIP
16 BLM
17 BRCA1
18 BRCA2


file 2:
ALKBH2
1 ALKBH3
2 APEX1
3 APEX2
4 APLF
5 APTX
6 ATM
7 ATR
8 ATRIP
9 BLM
10 BRCA1
11 BRCA2
12 BRIP1
13 BTBD12
14 CCNH
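The replies below start from data frames already loaded into R. For the file-reading step the question begins with, here is a minimal base-R sketch. It assumes each file holds one word per line with no row numbers (if the files really contain the row numbers shown above, read.table() would be the better reader). The temporary files and their contents are hypothetical stand-ins for file 1 and file 2.

```r
# Hypothetical stand-ins for "file 1" and "file 2": one word per line
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("ALKBH1", "ALKBH2", "ALKBH3", "APEX1", "ATRX"), f1)
writeLines(c("ALKBH3", "APEX1", "APLF", "BRCA1"), f2)

# Read each file as a character vector; trimws() drops stray spaces
words1 <- trimws(readLines(f1))
words2 <- trimws(readLines(f2))

# Words that occur in file 1 but not in file 2
only_in_1 <- setdiff(words1, words2)
print(only_in_1)   # [1] "ALKBH1" "ALKBH2" "ATRX"

unlink(c(f1, f2))  # remove the temporary files
```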


______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

May 23, 2013, 3:05:05 PM
to R help
Hi,
Try:

dat1<- structure(list(V2 = c("ALKBH1", "ALKBH2", "ALKBH3", "ANKRD17",
"APEX1", "APEX2", "APTX", "ASF1A", "ASTE1", "ATM", "ATR", "ATRIP",
"ATRX", "ATXN3", "BCCIP", "BLM", "BRCA1", "BRCA2")), .Names = "V2", class = "data.frame", row.names = c(NA,
18L))


dat2<- structure(list(V2 = c("ALKBH3", "APEX1", "APEX2", "APLF", "APTX",
"ATM", "ATR", "ATRIP", "BLM", "BRCA1", "BRCA2", "BRIP1", "BTBD12",
"CCNH")), .Names = "V2", class = "data.frame", row.names = c(NA,
14L))


library(sqldf)
sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')
#       V2
#1  ALKBH1
#2  ALKBH2
#3 ANKRD17
#4   ASF1A
#5   ASTE1
#6    ATRX
#7   ATXN3
#8   BCCIP
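SQL's EXCEPT compares whole rows and returns only distinct ones, which is why it agrees with setdiff() here: dat1 has a single column. For data frames with several columns, a rough base-R analogue might look like the sketch below. except_rows() is a hypothetical helper, not part of sqldf or base R. It assumes y has all of x's column names and that no value contains the separator used to build the row keys.

```r
# Hypothetical helper: rows of x that do not appear in y, compared on all
# columns and de-duplicated, roughly like
#   SELECT * FROM x EXCEPT SELECT * FROM y
except_rows <- function(x, y, sep = "\r") {
  # one string key per row; assumes no value contains `sep`
  key_x <- do.call(paste, c(unname(as.list(x)), sep = sep))
  key_y <- do.call(paste, c(unname(as.list(y[names(x)])), sep = sep))
  out <- x[!(key_x %in% key_y), , drop = FALSE]
  out[!duplicated(out), , drop = FALSE]   # EXCEPT returns distinct rows
}

x <- data.frame(gene = c("ATM", "ATR", "BLM", "ATM"), n = c(1, 2, 3, 1),
                stringsAsFactors = FALSE)
y <- data.frame(gene = c("ATR", "BLM"), n = c(2, 9),
                stringsAsFactors = FALSE)
res <- except_rows(x, y)
res   # ATM/1 kept once; BLM/3 kept because y only has BLM/9
```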



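#or
# flag every row of dat2, then do a full outer join on V2:
# values found only in dat1 come back with id = NA
dat2$id<- 1
res<-merge(dat1,dat2,all=TRUE)
subset(res,is.na(res$id))[1]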
#        V2
#1   ALKBH1
#2   ALKBH2
#4  ANKRD17
#9    ASF1A
#10   ASTE1
#14    ATRX
#15   ATXN3
#16   BCCIP
A.K.
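A left join (all.x = TRUE) is enough for the merge() approach: values that occur only in dat2 are never wanted, so they need not be brought into the merged result at all. A minimal sketch on a shortened copy of the two lists (the names d1 and d2 are only for illustration):

```r
# Shortened copies of the two lists from the question
d1 <- data.frame(V2 = c("ALKBH1", "ALKBH2", "ALKBH3", "APEX1"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(V2 = c("ALKBH3", "APEX1", "APLF"),
                 stringsAsFactors = FALSE)
d2$id <- 1                          # flag every row of d2

# Left join: every row of d1 is kept; id is NA where V2 has no match in d2
res <- merge(d1, d2, by = "V2", all.x = TRUE)
only_in_1 <- res[is.na(res$id), "V2", drop = FALSE]
only_in_1
```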

MacQueen, Don

May 23, 2013, 3:08:45 PM
to Robin Mjelle, r-h...@r-project.org
See the setdiff() function.

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

William Dunlap

May 23, 2013, 3:18:29 PM
to arun, R help
You recommended
> library(sqldf)
> sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')

Using nothing but the core R packages, setdiff() returns the difference between two sets:
> setdiff(dat1$V2, dat2$V2)
[1] "ALKBH1" "ALKBH2" "ANKRD17" "ASF1A" "ASTE1" "ATRX" "ATXN3" "BCCIP"
If dat1$V2 may contain duplicates (so it is not a true "set") and you want to keep the duplicates
in the result, use
> dat1$V2[ !is.element(dat1$V2, dat2$V2) ]
[1] "ALKBH1" "ALKBH2" "ANKRD17" "ASF1A" "ASTE1" "ATRX" "ATXN3" "BCCIP"

> a <- c(1, 2, 3, 2, 1, 4)
> b <- c(1, 3)
> setdiff(a, b)
[1] 2 4
> a[ !is.element(a, b) ]
[1] 2 2 4
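R's documentation for the set functions states that is.element(x, y) is identical to x %in% y, so the duplicate-keeping idiom can also be written with %in%. The sketch below reuses Bill's a and b and also checks that setdiff() gives the same values once the duplicates are dropped with unique().

```r
a <- c(1, 2, 3, 2, 1, 4)
b <- c(1, 3)

keep_dups <- a[!(a %in% b)]          # same as a[ !is.element(a, b) ]
stopifnot(identical(keep_dups, a[!is.element(a, b)]))
keep_dups                            # [1] 2 2 4

# setdiff() is the de-duplicated version of the same selection
stopifnot(identical(setdiff(a, b), unique(keep_dups)))
setdiff(a, b)                        # [1] 2 4
```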


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

arun

May 23, 2013, 3:27:10 PM
to William Dunlap, R help
#or
 dat1$V2[is.na(match(dat1$V2,dat2$V2))]
#[1] "ALKBH1"  "ALKBH2"  "ANKRD17" "ASF1A"   "ASTE1"   "ATRX"    "ATXN3" 
#[8] "BCCIP" 
 a[is.na(match(a,b))]
#[1] 2 2 4
A.K.
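The match() version works because match(x, table) returns, for each element of x, the position of its first match in table, or NA when there is none (the default nomatch). Printing the intermediate index for Bill's a and b makes this visible:

```r
a <- c(1, 2, 3, 2, 1, 4)
b <- c(1, 3)

idx <- match(a, b)   # position of each element of a within b, NA if absent
idx                  # [1]  1 NA  2 NA  1 NA
a[is.na(idx)]        # elements of a with no match in b: [1] 2 2 4
```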