How to interpret / validate cox PH model

已查看 27 次

跳至第一个未读帖子

Michael Smith

未读，

2021年6月19日 11:40:302021/6/19

收件人 R-inla discussion group

This is a follow-up to a previous post - hope I'm not repeating myself...

I've been trying to model deforestation with survival analysis. I have a raster map where unaffected areas are zero and deforested pixels have values 1-20 depending on the year of deforestation (2001-20). I take 10km square grids as 'individuals' and the 'event' is where the number of deforested pixels reaches a certain level. The other grid squares are 'survivors' (i.e. there is no such thing as 'lost to follow-up'). I have 4 covariates.

Adding a spatial term to the regression seems to make a big difference - not surprising, as forest tends to be cleared near to already-cleared areas. What I'd like to do next is to build a model from the 2000-10 data and use that to see how well it predicts deforestation from 2010-20. I know Cox models only output hazard ratios so this is a tricky subject.

I know the rms package has some tools to use with the output from coxph models in the 'survival' package...what can I do with an INLA object? Can I get a concordance statistic somehow?

My code was:

# read in 10km grid squares with covariates, count of 'deforested' pixels and mean 'deforested' pixel value (= year of deforestation)
grid5 <- readOGR(paste(getwd(), "/grid_10km_with_cost_dem_popn_aspect_globcover_PNG.shp", sep = ""))
# convert from m to km
grid5 <- spTransform(grid5, "+proj=utm +zone=54 +south +ellps=WGS72 +towgs84=0,0,1.9,0,0,0.814,-0.38 +units=km +no_defs")

# 'deforested' grid squares defined as those with >10% of the maximum number of 'deforestation' pixels found across all the grid
maxval <- max(grid5@data$def_count)/10
# status: 1 = event (deforestation), 0 = censored ('relativelyintact')
grid5@data$status <- ifelse(grid5@data$def_count >= maxval, 1, 0)
# set response variable (mean year of deforestation activity and censoring status)
surv <- inla.surv(grid5@data$def_mean, grid5@data$status)
# set up unique ID for each grid
grid5@data$id2 <- 1:nrow(grid5@data)
# use grid IDs and coordinates to set up INLA lattice graph
lattice_temp <- poly2nb(grid5, row.names = grid5@data$id2)
nb2INLA(paste(getwd(), "/lattice.graph", sep = ""), lattice_temp)
Lattice.adj <- paste(getwd(), "/lattice.graph", sep = "")
# set up intercept
intercept1 = rep(1, nrow(grid5@data))
# INLA requires a dataframe as input
grid10 <- grid5@data
# z-scale the covariates
grid10[c("elev_mean", "popn_mean", "cost_mean")] <- scale(grid10[c("elev_mean", "popn_mean", "cost_mean")])
# globcov_ma is a land cover category variable - must be a factor
grid10$globcov_ma <- as.factor(grid10$globcov_ma)

# formula
coxinlaZ <- inla.coxph(surv ~ -1 + intercept1 + cost_mean + elev_mean + popn_mean + globcov_ma +
                         f(id2, model = "bym", graph = Lattice.adj, scale.model = TRUE),
                       list(surv = surv, cost_mean=grid10$cost_mean,     elev_mean=grid10$elev_mean, popn_mean=grid10$popn_mean, popn_mean=grid10$popn_mean, globcov_ma=grid10$globcov_ma, id2=grid10$id2, intercept1 = intercept1))
rzPNGcov3 <- inla(coxinlaZ$formula,
             family = coxinlaZ$family,
             data=c(as.list(coxinlaZ$data), coxinlaZ$data.list),
             E = coxinlaZ$E,
             control.compute = list(mlik = T))

回复全部

回复作者

0 个新帖子