Problem with matching strings

James, Cassandra

Sep 28, 2012, 12:44:44 AM
to tropi...@googlegroups.com

Hi All…

 

I am having a problem with matching strings. I have a master list of 1200 species, and I am trying to subset a large number of csv data files, which I downloaded at the family level, into individual species csv files. My code generally works OK, but for some unknown reason it simply won't find certain species (yet it will find other ones in the same file!). I have checked all the details of the names, lengths etc. and I can see no apparent differences in the strings. I have re-imported the data, changed the format etc., all to no avail. If I manually type the troublesome species names into the code, it does find them. Does anyone have any ideas, or know a better way of doing this?

 

Any help appreciated!

 

Cassie

 

 

 

 

for (fam in 1:length(families)) {
    fam.data = as.data.frame(read.csv(paste(ala.dir, '/', families[fam], sep='')))
    for (sp in 1:length(species)) {
        if (species[sp] %in% (fam.data$Matched.Scientific.Name)) {
            plantsp <- (fam.data[which(as.character(fam.data$Matched.Scientific.Name) == as.character(species[sp])),])
            write.csv(plantsp, paste("C:/Users/jc246980/Documents/Freshwater refugia project/ALA data/Plants/Plant_species/", gsub(' ','_',species[sp]), ".csv", sep = ''), row.names = FALSE)
        } else {plantsp=NULL}
    }
}

 

Dr Cassandra James


Centre for Tropical Biodiversity & Climate Change Research

School of Marine and Tropical Biology

James Cook University

Townsville QLD 4811

 

Phone: | Mobile: 0429 380 953|  cassand...@jcu.edu.au

Address: ATSIP Bld 145 James Cook Drive, James Cook University Douglas Campus Qld 4811,

 

James, Cassandra

Sep 28, 2012, 1:56:31 AM
to tropi...@googlegroups.com

Hi,

Typically, I solved the problem the moment I posted it to the list. It was a problem with embedded spaces and non-printing (Unicode) characters. A combination of CLEAN, TRIM and SUBSTITUTE in Excel solved the issue. I am sure there must be a way of sorting this out in R!

Cassie
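
For anyone wanting to do the same cleaning in R rather than Excel, here is a minimal sketch. The helper name clean_names and the toy data are made up for illustration; only species and fam.data$Matched.Scientific.Name come from the original loop. It assumes the culprits are non-breaking spaces, control characters and stray whitespace, so other invisible characters (zero-width spaces, for example) would need their own gsub line.

```r
# Hypothetical helper (not from the original post): normalise names before matching.
clean_names <- function(x) {
  x <- as.character(x)
  x <- gsub("\u00a0", " ", x, fixed = TRUE)  # non-breaking space -> ordinary space (cf. SUBSTITUTE)
  x <- gsub("[[:space:]]", " ", x)           # tabs, carriage returns, newlines -> ordinary space
  x <- gsub("[[:cntrl:]]", "", x)            # drop remaining control characters (cf. CLEAN)
  x <- gsub(" +", " ", x)                    # collapse runs of spaces
  trimws(x)                                  # strip leading/trailing spaces (cf. TRIM)
}

# Made-up toy data standing in for the master list and one family file.
species  <- c("Acacia aneura", "Eucalyptus crebra")
fam.data <- data.frame(
  Matched.Scientific.Name = c("Acacia\u00a0aneura", " Eucalyptus crebra", "Acacia aneura\r"),
  stringsAsFactors = FALSE
)

# Clean both sides, then match on the cleaned values.
fam.data$clean.name <- clean_names(fam.data$Matched.Scientific.Name)
matched <- fam.data[fam.data$clean.name %in% clean_names(species), ]
```

Cleaning both the master list and the column from the file matters, because the hidden characters could be on either side of the comparison.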

Phillips, Ben

Sep 28, 2012, 2:30:41 AM
to <tropical-r@googlegroups.com>
Random thought.  Sometimes, depending on the data source, you can get non-standard characters that look like standard characters.
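
One way to see such look-alike characters in R is to compare the code points of the two strings. This is only a sketch: the two strings are made-up examples, and has_non_ascii is a hypothetical helper name, not something from the original code.

```r
a <- "Acacia aneura"        # typed by hand: ordinary space
b <- "Acacia\u00a0aneura"   # made-up example: non-breaking space instead

a == b                      # the strings print alike but do not compare equal
nchar(a) == nchar(b)        # same number of characters, so a length check will not catch it

# Compare code points position by position to locate the difference.
cp_a <- utf8ToInt(a)
cp_b <- utf8ToInt(b)
diff_pos <- which(cp_a != cp_b)
cp_b[diff_pos]              # code point(s) of the offending character(s)

# Hypothetical helper: flag any element containing a character outside ASCII.
has_non_ascii <- function(x) {
  vapply(enc2utf8(as.character(x)),
         function(s) any(utf8ToInt(s) > 127L),
         logical(1), USE.NAMES = FALSE)
}
has_non_ascii(c(a, b))
```

Running has_non_ascii over the name column of a family file would give a quick list of the rows worth inspecting.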

Good idea to enforce all text as ASCII before importing into R.  There are a variety of text editors that can do this and probably something in R that can do it too.
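
Within R, one option is base R's iconv(). The sketch below assumes the text is (or can be converted to) UTF-8 and that replacing every non-ASCII character with a space is acceptable; to_ascii is a hypothetical helper name. Be aware that this also blanks out legitimate accented letters, so it only suits data where names are expected to be plain ASCII.

```r
# Hypothetical helper: force text to plain ASCII.
to_ascii <- function(x) {
  x <- enc2utf8(as.character(x))
  # Bytes that cannot be represented in ASCII are replaced by `sub`,
  # so a multi-byte character may become several spaces.
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = " ")
  trimws(gsub(" +", " ", x))   # collapse the resulting runs of spaces and trim
}

# Made-up example: a name containing a non-breaking space.
fixed <- to_ascii("Acacia\u00a0aneura")
fixed == "Acacia aneura"
```

If the files are known to be UTF-8, passing fileEncoding = "UTF-8" to read.csv() may also help R interpret the characters correctly on import.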

B


**********************************************************
Dr Ben Phillips
ARC QEII Research Fellow
Centre for Tropical Biodiversity and Climate Change

School of Marine and Tropical Biology
James Cook University, Australia
+61 7 4781 4557
**********************************************************

