How to stop formatMult from turning all my variables into factors


Kyle Van den Bosch

Jan 13, 2016, 3:27:24 PM
to unmarked
Hi group,

First-time unmarked user; first-time poster. I want to analyze my data using colext, but to do that I need to get all the pieces in the right place. I have managed to get my data into an unmarkedMultFrame through the formatMult function, but all of my variables are turned into factors. How do I stop this from happening? I checked the CSV file I read into R, and in that file the relevant variables are numeric. Any suggestions?

Thanks
Kyle  

Jeffrey Royle

Jan 13, 2016, 3:29:19 PM
to unma...@googlegroups.com
show us the R commands you used and the result of str(unmarkedframe)



--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris Sutherland

Jan 13, 2016, 3:32:18 PM
to unma...@googlegroups.com

Hi Kyle,

Also, when you read in the .csv file, what does this look like?

head(data)  # where data is the name of your data frame

Thanks,
Chris

Kyle Van den Bosch

Jan 13, 2016, 3:54:21 PM
to unmarked
Thanks for the quick response. 
Rcommands&str().R
head().R

Jeffrey Royle

Jan 13, 2016, 4:39:09 PM
to unma...@googlegroups.com
Hi Kyle,
Most likely some step in your data processing produces a data frame (or other data structure) that gets coerced to character by the unmarked frame formatting.

For this object here:
bncmulti <- birdandcov[,c(2,1,3,33,6:32,4:5)]

check the data type of each column.  Coerce this to a matrix maybe.....

###TADA###
FISPumf <- formatMult(bncmulti)
> str(FISPumf)
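
Checking the type of each column, as suggested above, could look roughly like the sketch below. The data frame `toy`, its column names, and the stray "n/a" entry are made up for illustration; this shows one common way a numeric column ends up as a factor, and is not a diagnosis of Kyle's data. It also assumes text columns are read as factors, which was the read.csv default before R 4.0.

```r
# Hypothetical stand-in for a data frame read from CSV; one stray text
# entry ("n/a") is enough to make an otherwise numeric column non-numeric.
csv_text <- "year,site,date,y,shrub_cover
2014,A,1,0,12.5
2014,B,1,1,n/a
2015,A,1,1,30.0
2015,B,1,0,8.0"

toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Class of every column; shrub_cover comes back as a factor here
col_classes <- sapply(toy, class)
print(col_classes)

# Declaring the missing-value code lets read.csv parse the column as numeric
toy_fixed <- read.csv(text = csv_text, stringsAsFactors = TRUE,
                      na.strings = c("NA", "n/a"))
fixed_classes <- sapply(toy_fixed, class)
print(fixed_classes)
```

Running `sapply(bncmulti, class)` on the real object before calling formatMult would show whether the factors are already present at that stage or only appear afterwards.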

--

Adam Smith

Jan 14, 2016, 9:32:16 AM
to unmarked
Kyle,

In the current release version, formatMult corrupted covariates whenever a factor variable was present.  I think I fixed that problem recently, but that fix is not yet available from CRAN.  

Try the following (in a fresh R session) to overwrite the formatMult function with the new version until the updated package is released, and let me know if that gives you a better outcome...

Thanks,
Adam:

library(unmarked)
library(R.utils)

formatMult <- function(df.in)
{
  years <- sort(unique(df.in[[1]]))
  nY <- length(years)
  df.obs <- list()
  nsamp <- numeric()
  maxsamp <- max(table(df.in[[1]], df.in[[2]])) # the maximum samples/yr
  for(t in 1:nY){
    df.t <- df.in[df.in[[1]] == years[t],] # subset for current year
    df.t <- df.t[,-1] # remove year column
    df.t <- unmarked:::dateToObs(df.t)  # internal unmarked helper, so call it with :::
    nsamp <- max(df.t$obsNum)
    if(nsamp < maxsamp) {
      newrows <- df.t[1:(maxsamp - nsamp), ] # just a placeholder
      newrows[,"obsNum"] <- ((nsamp + 1) : maxsamp)
      newrows[,3 : (ncol(df.t) - 1)] <- NA
      df.t <- rbind(df.t, newrows)
    }
    df.obs <- rbind(df.obs,cbind(year = years[t],df.t))
  }
  dfnm <- colnames(df.obs)
  nV <- length(dfnm) - 1  # last variable is obsNum
  
  ### Identify variables that are not factors
  fac <- sapply(df.obs[, 5:nV], is.factor)
  nonfac <- names(df.obs[, 5:nV])[!fac]
  
  # create y matrix using reshape
  expr <- substitute(recast(df.obs, var1 ~ year + obsNum + variable,
                            id.var = c(dfnm[2],"year","obsNum"),
                            measure.var = dfnm[4]),
                     list(var1 = as.name(dfnm[2])))
  y <- as.matrix(eval(expr)[,-1])
  
  # create obsdata with reshape
  # include date (3rd col) and other measured vars
  expr <- substitute(recast(df.obs, newvar ~ year + obsNum ~ variable,
                            id.var = c(dfnm[2],"year","obsNum"),
                            measure.var = dfnm[c(3,5:nV)]),
                     list(newvar=as.name(dfnm[2])))
  obsvars <- eval(expr)
  
  rownames(y) <- dimnames(obsvars)[[1]]
  colnames(y) <- dimnames(obsvars)[[2]]
  y <- as.matrix(y)
  
  obsvars.list <- unmarked:::arrToList(obsvars)  # internal unmarked helper, so call it with :::
  
  # Return any non-factors to the correct mode
  if (length(nonfac) >= 1) {
    modes <- apply(df.obs[, nonfac], 2, mode)
    for (i in 1:length(nonfac)) {
      mode(obsvars.list[[nonfac[i]]]) <- modes[i]
    }
  }
  
  obsvars.list <- lapply(obsvars.list, function(x) as.vector(t(x)))
  obsvars.df <- as.data.frame(obsvars.list)
  
  ## check for siteCovs
  obsNum <- ncol(y)
  M <- nrow(y)
  site.inds <- matrix(1:(M*obsNum), M, obsNum, byrow = TRUE)
  siteCovs <- sapply(obsvars.df, function(x) {
    obsmat <- matrix(x, M, obsNum, byrow = TRUE)
    l.u <- apply(obsmat, 1, function(y) {
      row.u <- unique(y)
      length(row.u[!is.na(row.u)])
    })
    ## if there are 0 or 1 unique vals per row, we have a sitecov
    if(all(l.u %in% 0:1)) {
      u <- apply(obsmat, 1, function(y) {
        row.u <- unique(y)
        ## only remove NAs if there are some non-NAs.
        if(!all(is.na(row.u)))
          row.u <- row.u[!is.na(row.u)]
        row.u
      })
      u
    }
  })
  siteCovs <- as.data.frame(siteCovs[!sapply(siteCovs, is.null)])
  if(nrow(siteCovs) == 0) siteCovs <- NULL
  
  ## only check non-sitecovs
  obsvars.df2 <- as.data.frame(obsvars.df[, !(names(obsvars.df) %in%
                                                names(siteCovs))])
  names(obsvars.df2) <- names(obsvars.df)[!(names(obsvars.df) %in%
                                              names(siteCovs))]
  
  yearlySiteCovs <- sapply(obsvars.df2, function(x) {
    obsmat <- matrix(x, M*nY, obsNum/nY, byrow = TRUE)
    l.u <- apply(obsmat, 1, function(y) {
      row.u <- unique(y)
      length(row.u[!is.na(row.u)])
    })
    ## if there are 0 or 1 unique vals per row, we have a sitecov
    if(all(l.u %in% 0:1)) {
      u <- apply(obsmat, 1, function(y) {
        row.u <- unique(y)
        ## only remove NAs if there are some non-NAs.
        if(!all(is.na(row.u)))
          row.u <- row.u[!is.na(row.u)]
        row.u
      })
      u
    }
  })
  yearlySiteCovs <- as.data.frame(yearlySiteCovs[!sapply(yearlySiteCovs,
                                                         is.null)])
  if(nrow(yearlySiteCovs) == 0) yearlySiteCovs <- NULL
  
  umf <- unmarkedMultFrame(y = y, siteCovs = siteCovs,
                           obsCovs = obsvars.df, yearlySiteCovs =
                             yearlySiteCovs,
                           numPrimary = nY)
  return(umf)
}

reassignInPackage("formatMult", "unmarked", formatMult)

Jeffrey Royle

Jan 14, 2016, 9:33:37 AM
to unma...@googlegroups.com
Adam, thanks for your improvement of this code!
We will probably put a new release of unmarked out in March. Right now we are hacking on "distsampOpen" which is a Dail-Madsen version of hierarchical distance sampling. We hope to finish this before the new release of unmarked.

regards
andy


--

Kyle Van den Bosch

Jan 14, 2016, 4:28:05 PM
to unmarked
Thank you for the help. Now I have another problem. colext returns the following error message when I include yearlySiteCovs for colonization and extinction in the model:

 Error in optim(starts, nll, method = method, hessian = getHessian, ...) : 
  initial value in 'vmmin' is not finite

I read elsewhere in the group that this could be related to NAs, so to be prudent I replaced the NAs with values, but I still got the same error message.

Thanks in advance,
Kyle
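
For readers who have not met this message before, the sketch below reproduces it with a toy likelihood. This is not colext's likelihood and the data are invented; it only illustrates that optim's BFGS routine stops with this error when the objective function is not finite at the starting values, which is what an unhandled NA can cause.

```r
# Toy detection data with one missing value (hypothetical, not from the thread)
y <- c(1, 0, NA, 1, 1)

# Negative log-likelihood for a constant probability on the logit scale.
# With an NA in y and no NA handling, the sum is NA for every parameter value.
nll <- function(p) -sum(dbinom(y, size = 1, prob = plogis(p), log = TRUE))

start_value <- nll(0)   # NA, i.e. not finite

# optim() with method = "BFGS" refuses to start from a non-finite value
msg <- tryCatch(
  {
    optim(par = 0, fn = nll, method = "BFGS")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the missing observation gives a finite starting value and a fit
nll_ok <- function(p) {
  -sum(dbinom(y[!is.na(y)], size = 1, prob = plogis(p), log = TRUE))
}
fit <- optim(par = 0, fn = nll_ok, method = "BFGS")
print(plogis(fit$par))
```

Evaluating the objective at the starting values, as `nll(0)` does here, is a quick way to confirm whether the problem is present before optimization starts.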

Adam Smith

Jan 14, 2016, 4:39:49 PM
to unma...@googlegroups.com
Kyle,

Looking more closely, this surely has something to do with all the yearlySiteCovs also appearing in the obsCovs.  I'm guessing the formatMult function is having trouble working out which category they fall into...  I'll dig deeper tomorrow.

Best,
Adam

--
You received this message because you are subscribed to a topic in the Google Groups "unmarked" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/unmarked/hW-qq5ByJC8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to unmarked+u...@googlegroups.com.

Adam Smith

Jan 15, 2016, 1:16:17 PM
to unmarked
Kyle,

formatMult was erroneously retaining siteCovs and yearlySiteCovs as obsCovs, resulting in duplicate variables.  I've fixed it but it won't show up until the next release.  In the meantime, you can substitute the updated function at the end of this email.  
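
The idea behind the fix can be sketched on a toy example: decide which covariates are constant within a site, move those to the site-level table, and keep only the rest as observation covariates. This is a simplified illustration with invented data and names (`elev`, `wind`), not the code that ships with unmarked.

```r
# Hypothetical covariates for 3 sites x 4 visits, stored site by site
M <- 3   # sites
J <- 4   # visits per site
obsvars.df <- data.frame(
  elev = rep(c(100, 250, 400), each = J),          # constant within a site
  wind = c(1, 3, 2, 2,  0, 1, 4, 2,  3, 3, 1, 0)   # changes between visits
)

# A covariate is site-level if every site has at most one distinct
# non-missing value
is_site_level <- sapply(obsvars.df, function(x) {
  obsmat <- matrix(x, M, J, byrow = TRUE)
  n_unique <- apply(obsmat, 1, function(r) length(unique(r[!is.na(r)])))
  all(n_unique <= 1)
})

# One row per site for the site-level covariates (first visit of each site)
siteCovs <- obsvars.df[seq(1, M * J, by = J), is_site_level, drop = FALSE]

# Keep only the remaining columns as observation covariates,
# so no variable appears in both tables
obsCovs <- obsvars.df[, !(names(obsvars.df) %in% names(siteCovs)),
                      drop = FALSE]
```

Using `drop = FALSE` keeps the result a data frame even when a single column is left, which avoids having to restore the column names afterwards.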

Using the new function, you'll still get the same optimization error as before, but now it's due to the large amount of missing data.  Replacing the NAs with zero (only as an example; this is not advised) results in a model solution.  For example, using the data you provided off-list (thanks!):

### RUN THE FUNCTION REASSIGNMENT CODE BELOW FIRST ###
bncmulti <- read.csv("path/to/bncmulti.csv")
bnc <- bncmulti[,-1]
FISPumf <- formatMult(bnc)

# Still produces optimization error
m2 <- colext(~lat,~shrub_cover,~shrub_cover,~Obs,FISPumf)

# Replace NAs; MLE solution obtained
yearlySiteCovs(FISPumf)[is.na(yearlySiteCovs(FISPumf))] <- 0 
m2a <- colext(~lat,~shrub_cover,~shrub_cover,~Obs,FISPumf)
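
Before deciding what to do about the missing values, it can help to see how many there are and where. The sketch below does this on a made-up data frame `ysc`; with a real unmarkedMultFrame, the data frame returned by `yearlySiteCovs(FISPumf)` (as used above) would take its place.

```r
# Hypothetical yearly site covariates: 4 sites x 3 years, stacked site by site
ysc <- data.frame(
  shrub_cover = c(10, NA, 12,  NA, NA, NA,  30, 28, NA,  5, 6, 7),
  lat         = rep(c(44.1, 44.3, 44.8, 45.0), each = 3)
)

# Number and share of missing values per covariate
na_count <- colSums(is.na(ysc))
na_share <- colMeans(is.na(ysc))
print(rbind(na_count, na_share))

# Rows (site-years) with at least one missing covariate
incomplete_rows <- which(!complete.cases(ysc))
print(incomplete_rows)
```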

Let me know if you get this running without error...

Best,
Adam

### REASSIGN formatMult FUNCTION IN unmarked
library(unmarked)
library(R.utils)

formatMult <- function(df.in)
{
  years <- sort(unique(df.in[[1]]))
  nY <- length(years)
  df.obs <- list()
  nsamp <- numeric()
  maxsamp <- max(table(df.in[[1]], df.in[[2]])) # the maximum samples/yr
  for(t in 1:nY){
    df.t <- df.in[df.in[[1]] == years[t],] # subset for current year
    df.t <- df.t[,-1] # remove year column
    df.t <- unmarked:::dateToObs(df.t)
  obsvars.list <- unmarked:::arrToList(obsvars)
  # Extract siteCovs and yearlySiteCovs from obsvars
  finalobsvars.df <- as.data.frame(obsvars.df[, !(names(obsvars.df) %in%
                                                    c(names(siteCovs),
                                                      names(yearlySiteCovs)))])
  names(finalobsvars.df) <- names(obsvars.df)[!(names(obsvars.df) %in% 
                                                  c(names(siteCovs),
                                                    names(yearlySiteCovs)))]
  
  umf <- unmarkedMultFrame(y = y, siteCovs = siteCovs,
                           obsCovs = finalobsvars.df, yearlySiteCovs =

Kyle Van den Bosch

Jan 15, 2016, 3:26:14 PM
to unmarked
Hi Adam,

I got the following error when I attempted to pass the data to formatMult:


> FISPumf <- formatMult(bnc)
Error in unmarked:::arrToList(obsvars) : object 'obsvars' not found

Thanks,
Kyle

Kyle Van den Bosch

Jan 15, 2016, 5:23:08 PM
to unmarked
Of course you were right Adam. I copied the incomplete code from this topic thread, instead of using the correct one from the email.  

Thank you for all of your help with this.

Rachel Moseley

Jul 25, 2016, 9:52:02 PM
to unmarked
Hi Adam,
I am a new user of unmarked and I seem to be having the same issues Kyle had with formatMult. All variables are being converted to factors, and two of my variables currently appear both as yearlySiteCovs and as obsCovs. If this issue has not yet been fixed, should I use the same code you supplied to Kyle, or is something else wrong? I am attaching the R commands and the str() and head() outputs.

Once this is all formatted properly I want to use the colext function to do a multiple-season analysis on one year of data that is broken up into 4 seasons. In each season I monitored a total of 45 sites for 7 consecutive nights. If I understand correctly, I then have 4 primary sampling periods and 7 secondary sampling periods. I will have 3 years of data at the end of this project, which is why I want to understand multiple-season models now. Any advice is greatly appreciated.

Cheers,
Rachel
Rcommands&str&header.R

Adam Smith

Jul 26, 2016, 12:28:12 AM
to unma...@googlegroups.com
Hi Rachel,

You are correct that the change is still only in the development version. So, yes, I'd recommend trying the function reassignment code you referenced and see if that fixes the variable conversion issues. If not, I'll take a look when I get back into the office next week. Others may be better able to help you with the dynamic occupancy models. 

Best,
Adam
--

Kery Marc

Jul 26, 2016, 5:51:12 PM
to unma...@googlegroups.com
Dear Rachel,

You wrote: "Once this is all formatted properly I want to use the colext function to do a multiple season analysis on one year of data that is broken up into 4 seasons. I monitored a total of 45 sites for 7 nights consecutively. If I understand correctly, then I have 4 primary sampling periods and 7 secondary sampling periods. I will have 3 years of data at the end of this project which is why I want to understand multiple season models now. Any advice is greatly appreciated."

Sounds reasonable. You have a total of 28 nights then, right (4 seasons times 7 nights)? It sounds like a good idea to "practice" on your data well before you have collected them all (i.e., for all 3 years).

Best regards --  Marc
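
As a rough sketch of what this design means for the data layout: 45 sites, 4 primary periods and 7 secondary periods give a detection matrix with 45 rows and 28 columns, with the 7 nights of each season stored next to each other. The object and column names below are invented for illustration; the matrix would then go to `unmarkedMultFrame(y = y, numPrimary = 4)`, the call that formatMult itself ends with.

```r
n_sites     <- 45   # sites monitored
n_primary   <- 4    # seasons (primary periods)
n_secondary <- 7    # nights per season (secondary periods)

# Detection history: one row per site, one column per night,
# ordered season 1 nights 1-7, then season 2 nights 1-7, and so on
y <- matrix(NA_integer_, nrow = n_sites, ncol = n_primary * n_secondary)
colnames(y) <- paste0("s", rep(seq_len(n_primary), each = n_secondary),
                      "_n", rep(seq_len(n_secondary), times = n_primary))

print(dim(y))

# Columns that belong to season 2
season2_cols <- which(rep(seq_len(n_primary), each = n_secondary) == 2)
print(colnames(y)[season2_cols])
```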
 


Rachel Moseley

Jul 27, 2016, 4:11:08 PM
to unmarked
Hi Marc,
Thanks for the quick reply. That's correct, I have a total of 28 nights.
Cheers,
Rachel

Rachel Moseley

Jul 27, 2016, 5:22:30 PM
to unmarked
Hi Adam,
Thanks for the quick reply. I tried using the code you supplied, but it produces the error "Could not find function DateToObs". Any idea where the error is coming from? I am attaching the output of what I pasted into a new R session.
Cheers,
Rachel
formatMult_reassign.R
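
A "could not find function" error of this kind often appears when a function copied out of a package calls a helper that the package does not export. The sketch below mimics that situation with a plain environment standing in for a package namespace; every name in it is invented. It assumes the relevant difference between the two posted versions is the `unmarked:::` prefix seen in the Jan 15 listing, which the thread does not state outright.

```r
# Stand-in for a package namespace that holds an internal helper
pkg_ns <- new.env()
pkg_ns$internal_helper_demo <- function(x) x + 1

# A function that lives inside the "package" finds the helper
inside_pkg <- function(x) internal_helper_demo(x) * 2
environment(inside_pkg) <- pkg_ns
inside_result <- inside_pkg(1)

# The same body pasted at top level looks for the helper in the
# workspace and the search path, where it does not exist
pasted_copy <- function(x) internal_helper_demo(x) * 2
msg <- tryCatch(pasted_copy(1), error = function(e) conditionMessage(e))
print(msg)

# Qualifying the helper explicitly (the analogue of pkg:::helper) fixes it
qualified_copy <- function(x) pkg_ns$internal_helper_demo(x) * 2
qualified_result <- qualified_copy(1)
```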

Adam Smith

Jul 27, 2016, 10:45:01 PM
to unmarked
Hi Rachel. I think this is easily overcome but I can't get to it until early next week. I'll post an update when I get a chance to look at it.

Best,
Adam

Adam Smith

Aug 1, 2016, 1:42:07 PM
to unma...@googlegroups.com
Rachel,

It looks like you're using a version of the formatMult function different from what I included at the end of the thread with Kyle.  The difference is subtle but causes the error you observed most recently.  Use the reassignment code you included in your initial post (included here for completeness), and then try again:
Adam

Rachel Moseley

Aug 1, 2016, 4:58:09 PM
to unmarked
Hi Adam,
Thanks for catching that. It works now.
Cheers,
Rachel

Rachel Moseley

Aug 1, 2016, 8:19:30 PM
to unmarked
I have one new issue with the formatMult reassignment. After looking more closely at the data, it appears that formatMult is still turning two of my variables into factors ("jdate" and "night"). I need to standardize the obsCovs, which include Julian date. I'm not interested in night, so I can just remove that column. I only noticed the issue when I went to scale my obsCovs and got this error: Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric. From what I've read, converting a factor column to numeric is tricky. What would you suggest? I'm including the R commands here.
Cheers,
Rachel
obsCovs factor error.R
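
On the general question of turning a factor back into numbers, the usual pitfall and workaround can be sketched as follows. The values are invented, and the sketch assumes the factor levels are plain number strings; it does not address why formatMult produced the factor in the first place.

```r
# Hypothetical Julian dates that arrived as a factor
jdate <- factor(c("160", "152", "153", "152"))

# as.numeric() on a factor returns the internal level codes, not the dates
codes <- as.numeric(jdate)
print(codes)

# Going through character recovers the original values
jdate_num <- as.numeric(as.character(jdate))
print(jdate_num)

# Standardizing now works because the column is numeric
obs <- data.frame(jdate = jdate_num)
obs_scaled <- scale(obs)
print(obs_scaled)
```

The level codes follow the sorted levels ("152", "153", "160"), which is why the first call silently returns small integers instead of the dates.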

Adam Smith

Aug 2, 2016, 9:54:05 AM
to unmarked
Rachel, 

Would you be willing to send the data off-list so I can have a look?  

Thanks,
Adam

Adam Smith

Aug 5, 2016, 10:31:20 AM
to unmarked
Rachel,

Thanks for sending the data, and thanks for finding a "bug" in the `formatMult` code.  The short version is that the Julian date/visit chronology variable was getting converted into a factor when other factor variables were present in the data.frame input to `formatMult`.  The issue has been fixed and, I suspect, will be incorporated into the next CRAN release.  For now, simply source the attached file after loading the unmarked package, and all should be well.

For your situation, this would look like:

library(unmarked)
source("path/to/update_formatMult.R")
dat <- read.csv("path/to/HHBObs_yr1.csv")
Bat <- formatMult(dat)
str(Bat)

Best,
Adam
update_formatMult.R

Rachel Moseley

Aug 15, 2016, 12:56:19 PM
to unmarked
Thanks, Adam!