James, Cassandra

Sep 28, 2012, 12:44:44 AM

Hi All…


I am having a problem with matching strings. I have a master list of 1200 species, and I am trying to subset a large number of CSV data files, which I downloaded at the family level, into individual species CSV files. My code generally works OK, but for some unknown reason it simply won’t find certain species (although it will find other ones in the same file!). I have checked all the details of the names, lengths etc. and I can see no apparent differences in the strings. I have re-imported the data, changed the format etc., all to no avail. If I manually type the troublesome species names into the code, it does find them. Does anyone have any ideas, or know a better way of doing this?


for (fam in 1:length(families)) {


     =, '/',families[fam],sep='')))


        for (sp in 1:length(species)) {


                   if(species[sp] %in% ($Matched.Scientific.Name)) {

                   plantsp <- ([which(as.character($Matched.Scientific.Name) == as.character(species[sp])),])

                   write.csv(plantsp,paste("C:/Users/jc246980/Documents/Freshwater refugia project/ALA data/Plants/Plant_species/",gsub(' ','_',species[sp]),".csv", sep = ''), row.names = F )

                   } else {plantsp=NULL}





James, Cassandra

Sep 28, 2012, 1:56:31 AM

Hi, Typically, I solved the problem the moment I posted it to the list. It was a problem with embedded spaces and non-printing (Unicode) characters. A combination of CLEAN, TRIM and SUBSTITUTE in Excel solved the issue. I am sure there must be a way of sorting this out in R! Cassie
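
For anyone hitting the same thing, here is a rough R analogue of the CLEAN / TRIM / SUBSTITUTE combination. It is only a sketch: `clean_names` is a hypothetical helper name, and it assumes the culprits are non-breaking spaces (U+00A0), stray control characters and extra whitespace, which may not cover every case.

```r
# Hypothetical helper (not from the original code): normalise names before matching.
clean_names <- function(x) {
  x <- as.character(x)
  # Swap non-breaking spaces (U+00A0) for ordinary spaces
  x <- gsub("\u00a0", " ", x, fixed = TRUE)
  # Collapse any run of whitespace (spaces, tabs, newlines) into one space
  x <- gsub("[[:space:]]+", " ", x)
  # Drop any remaining control (non-printing) characters
  x <- gsub("[[:cntrl:]]", "", x)
  # Trim leading and trailing whitespace
  trimws(x)
}

# Made-up example: two names that print alike but differ in hidden characters
messy <- c(" Acacia\u00a0aneura ", "Acacia  aneura\t")
cleaned <- clean_names(messy)
```

Applying the same cleaning to both the master species list and the Matched.Scientific.Name column before the `%in%` test should make the two sides comparable.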

Phillips, Ben

Sep 28, 2012, 2:30:41 AM
Random thought.  Sometimes, depending on the data source, you can get non-standard characters that look like standard characters.

Good idea to enforce all text as ASCII before importing into R.  There are a variety of text editors that can do this and probably something in R that can do it too.
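
Within R, one possible way to do this is sketched below: `utf8ToInt()` to spot non-ASCII characters and `iconv()` to force text to ASCII. The vector `nm` is made-up example data, and the sketch assumes the input is UTF-8; how `iconv()` treats unconvertible characters can vary by platform, so check the result on your own machine.

```r
# Made-up example data: the second name hides a non-breaking space (U+00A0)
nm <- c("Acacia aneura", "Acacia\u00a0aneura")

# Flag entries containing any code point outside the ASCII range
non_ascii <- vapply(nm, function(s) any(utf8ToInt(s) > 127L),
                    logical(1), USE.NAMES = FALSE)

# Inspect the code points of a suspicious entry to see what is hiding in it
codes <- utf8ToInt(nm[2])

# Force to ASCII, substituting a space for each unconvertible byte,
# then collapse repeated spaces and trim the ends
ascii <- iconv(nm, from = "UTF-8", to = "ASCII", sub = " ")
ascii <- trimws(gsub(" +", " ", ascii))
```

Flagging first is useful because it shows which names in the master list differ from what they appear to be, before anything is overwritten.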


