ddf.gof() producing P = NaN

Evan Curtis

Nov 29, 2023, 5:23:04 PM
to distance-sampling
Hi Listfolk,

I have MRDS models with binned data and I've noticed some strange behaviour when using ddf.gof(). All models, right down to the simplest, produce p-value = NaN when I call ddf.gof().

I don't use QQ plots because my data are binned, but if I check the output using qqplot.ddf() it appears to produce a small Cramér-von Mises p-value. For instance, for my best model, $CvM$p = 7.2e-07.

Does this seem odd? 
Why wouldn't ddf.gof() produce a p-value?
How sensible is it to proceed with my best AIC model (which looks good on all other fronts) regardless?

Thanks,
Evan

Eric Rexstad

Nov 30, 2023, 3:51:55 AM
to distance-sampling, Evan Curtis
Evan

The goodness of fit results you describe are unusual and unexpected. With binned data, Cramér-von Mises tests are not appropriate, so that's not a useful route to pursue.

The NaN values coming from the chi-square test are a puzzle. I examined our example data set (golftees) that ships with the mrds package, converting the exact distances to bins and using the trial configuration, without mishap:

data(book.tee.data)
detections <- book.tee.data$book.tee.dataframe # detection information
region <- book.tee.data$book.tee.region # region info
samples <- book.tee.data$book.tee.samples # transect info
obs <- book.tee.data$book.tee.obs # links detections to transects and regions
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)

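mybreaks <- c(0,.25,.5,.75,1,1.5,2,2.5,3,4)
# bin index for each detection; include.lowest=TRUE keeps a distance of exactly 0 in the first bin
tmp <- as.numeric(cut(detections$distance, breaks=mybreaks, include.lowest=TRUE))
distb <- mybreaks[-length(mybreaks)] # lower cutpoint of each bin
diste <- mybreaks[-1] # upper cutpoint of each bin

# attach the bin limits that mrds expects for binned data
detections$distbegin <- distb[tmp]
detections$distend <- diste[tmp]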

fi.mr.dist <- ddf(method='trial.fi',
                  mrmodel=~glm(link='logit',formula=~distance),
                  data=detections,
                  meta.data=list(width=4, binned=TRUE,
                                 breaks=mybreaks))
ddf.gof(fi.mr.dist)


Examine the component output of the chi-square GOF test. Are there problems in the "distance" or "mark recapture" component of the GOF tests? That might narrow down the suspects.

Evan Curtis

Nov 30, 2023, 7:42:03 PM
to distance-sampling
Hi Eric,

Thanks again! It looks like it's having problems calculating a positive value for degrees of freedom. This would suggest that the number of parameters in the model is larger than the sample size. But my dht() output says there are between 27 and 29 degrees of freedom with a sample size n = 1324. Below is the output from ddf.gof() run on both MRDS and MCDS models.

> MRDS.6ii7=ddf(
+   dsmodel=~mcds(key="hn",formula=~side+obsname),
+   mrmodel=~glm(~obsname * distance + size:distance),
+   data=survey.data,
+   method="io",
+   meta.data=list(binned=TRUE,point=FALSE,width=300,breaks=c(0,50,100,200,300)))
> ddf.gof(MRDS.6ii7)

Goodness of fit results for ddf object

Chi-square tests

Distance sampling component:
           [0,50] (50,100] (100,200] (200,300]   Total
Observed  393.000  308.000   467.000   156.000 1324.00
Expected  391.184  337.476   426.142   169.198 1324.00
Chisquare   0.008    2.575     3.917     1.029    7.53

No degrees of freedom for test

Mark-recapture component:
Capture History 10
          [0,50] (50,100] (100,200] (200,300] Total
Observed     131      118       160        81   490
Expected     132      109       178        70   489
Chisquare      0        1         2         2     4
Capture History 01
          [0,50] (50,100] (100,200] (200,300] Total
Observed     136      112       179        54   481
Expected     134      112       178        58   482
Chisquare      0        0         0         0     0
Capture History 11
          [0,50] (50,100] (100,200] (200,300] Total
Observed     126       78       128        21   353
Expected     127       87       112        28   353
Chisquare      0        1         2         2     5

Total chi-square = 17.214  P = NaN with -32 degrees of freedom
Warning message:
In pchisq(chisq.1 + chisq.2, 3 * nc - length(model$par) - 1) :
  NaNs produced

And if I only run the DS part of the model I get no dof:
 
> MCDS.25 <- ddf(data=front.dat, dsmodel = ~mcds(key="hn", formula = ~side+obsname),
+   meta.data=list(binned=TRUE,point=FALSE,width=300,breaks=c(0,50,100,200,300)))
> ddf.gof(MCDS.25)

Goodness of fit results for ddf object

Chi-square tests
          [0,50] (50,100] (100,200] (200,300]   Total
Observed  257.00  196.000   288.000   102.000 843.000
Expected  256.99  215.311   260.106   110.593 843.000
Chisquare   0.00    1.732     2.991     0.668   5.391

No degrees of freedom for test

Evan Curtis

Nov 30, 2023, 7:59:02 PM
to distance-sampling
P.S. How do you copy R output into here so that it keeps its format when posted? lol

Eric Rexstad

Dec 1, 2023, 2:55:24 AM
to distance-sampling, Evan Curtis
Evan

The degrees of freedom for the chi-square test are not the same thing as the number of detections.

Degrees of freedom for such a test is described in slide 6 from the introductory distance sampling workshop:



In your situation, "u" (the number of distance categories) is 4. I can't guess the number of parameters in your mrmodel where you have two categorical variables (multiplied?) plus an interaction of size (discrete or continuous?) and distance. I'm certain that model has buckets more parameters than 4. It is true that ddf.gof should not have calculated a negative number of degrees of freedom, and should simply have reported that a test could not be performed.
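To illustrate the point about bins versus parameters: the usual rule for a chi-square goodness-of-fit test on binned distances is df = u - p - 1, with u the number of distance bins and p the number of estimated parameters. The sketch below is base R only. gof_p_value is a hypothetical helper, not an mrds function, and the parameter counts passed to it are illustrative values rather than anything read from a fitted model.

```r
# Hypothetical helper (not part of mrds): chi-square GOF p-value that
# declines to run the test when there are no degrees of freedom.
gof_p_value <- function(chisq, n_bins, n_par) {
  df <- n_bins - n_par - 1   # usual rule: u - p - 1
  if (df <= 0) {
    return(list(df = df, p = NA_real_, note = "No degrees of freedom for test"))
  }
  list(df = df, p = pchisq(chisq, df, lower.tail = FALSE), note = "ok")
}

# Four distance bins, as in the breaks c(0,50,100,200,300).
few_par  <- gof_p_value(chisq = 5.391, n_bins = 4, n_par = 1)   # df is 2
many_par <- gof_p_value(chisq = 5.391, n_bins = 4, n_par = 14)  # df is -11

few_par$df
many_par$note
```

With only four bins, any model with three or more parameters leaves nothing to test, however many detections there are.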

However, I have some doubt about the mrmodel you have tried to fit. I'm skeptical that the parameters in that model are estimable. It seems an odd model, not something I've come across previously (an interaction without a main effect). Have you looked at the parameter estimates coming from the ddf output? I think ddf.gof() is trying to warn you of a possibly deeper problem: non-convergence. I suggest a simpler model.

P.S. I don't do anything special to get well behaved cut-and-paste from RStudio. Note I'm taking code from the editor window, rather than the console. Cut and paste from the console is messier. My email client asks when I attempt to paste something from the console if I want to paste with or without formatting. Maybe that's the secret.

Message has been deleted

Eric Rexstad

Dec 4, 2023, 3:17:22 AM
to distance-sampling, Evan Curtis
Evan

Thanks for the model summary (reproduced below) and the re-training on how to specify model formulas.

> summary(MRDS.6ii7)
Summary for io.fi object
Number of observations : 1324
Number seen by primary : 843
Number seen by secondary : 834
Number seen by both : 353
AIC : 2705.31
Conditional detection function parameters:
                 estimate         se
(Intercept)        -1.0796625270 0.277260949
obsnameBP           1.1046744569 0.361828611
obsnameBW           0.5433139316 0.394464395
obsnameCF           1.1961084816 0.382167121
obsnameCS           1.1934733554 0.396189048
obsnameDH           0.7954464197 0.343565299
obsnameEC           1.1975425396 0.395574213
obsnameMG           0.5859840295 0.362524121
obsnameMH           2.1259248539 0.409690893
obsnamePL           1.5807107511 0.425884100
obsnamePOB          0.5848155343 0.401523354
obsnameRL           2.5931041668 0.495230694
obsnameRT           1.7839252838 0.421544945
obsnameSD           1.4995302978 0.439472284
distance           -0.0036643313 0.002600023
obsnameBP:distance -0.0099988195 0.003096288
obsnameBW:distance  0.0078620272 0.003474921
obsnameCF:distance -0.0067997640 0.003229027
obsnameCS:distance -0.0007569504 0.003252560
obsnameDH:distance -0.0017379043 0.002718869
obsnameEC:distance -0.0030372291 0.003149064
obsnameMG:distance  0.0046375639 0.003172420
obsnameMH:distance -0.0114362985 0.003412340
obsnamePL:distance -0.0102762333 0.003554344
obsnamePOB:distance 0.0068440041 0.003242920
obsnameRL:distance -0.0089213506 0.003706149
obsnameRT:distance -0.0079749867 0.003662896
obsnameSD:distance  0.0011171377 0.004304027
distance:size       0.0008886445 0.000333803

                        Estimate         SE         CV
Average primary p(0)   0.5055804 0.02953350 0.05841505
Average secondary p(0) 0.5061558 0.02857945 0.05646373
Average combined p(0)  0.7542005 0.02799493 0.03711869

Summary for ds object
Number of observations : 1324
Distance range : 0 - 300
AIC : 3466.773
Detection function: Half-normal key function
Detection function parameters
Scale coefficient(s):
              estimate      se
(Intercept) 5.18683812 0.1369847
sideR      -0.33191261 0.1158185
obsnameBP  -0.75280207 0.2001663
obsnameBW   0.23603704 0.1826911
obsnameCF  -0.47405823 0.1781684
obsnameDH  -0.26030426 0.1699321
obsnameEC   0.32659673 0.2128797
obsnameMG  -0.02400511 0.1146002
obsnameMH  -0.29883388 0.1445970
obsnamePL  -0.03879631 0.1367317
obsnamePOB  0.31083754 0.2505650
obsnameRL  -0.12420240 0.1382154
obsnameRT  -0.42317349 0.1877019
obsnameSD  -0.06626935 0.1670976
          Estimate        SE     CV
Average p 0.550091 0.01437652 0.02613479


There are 29 estimated parameters in your mrmodel and 14 estimated parameters in your dsmodel. That doesn't explain how ddf.gof imagined there were -32 degrees of freedom in the chi-square test, but it does demonstrate that there are more parameters than bins in your distance data. The software should have reported no test possible in all instances. I'll create an issue about this on our GitHub repository.
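One possible reading of the -32 comes from the expression printed in the warning message, 3 * nc - length(model$par) - 1. The sketch below assumes that nc is the number of distance bins and that model$par holds the parameters of both components together. Both are guesses from the warning text alone, not checked against the mrds source.

```r
nc    <- 4         # assumed: distance bins [0,50], (50,100], (100,200], (200,300]
n_par <- 29 + 14   # assumed: mrmodel plus dsmodel parameters from the summary

# Expression from the warning message, with length(model$par) replaced by n_par
df <- 3 * nc - n_par - 1

# pchisq() returns NaN for negative degrees of freedom; suppress its warning here
p <- suppressWarnings(1 - pchisq(17.214, df))

df
p
```

Under those assumptions the arithmetic reproduces the reported degrees of freedom, and the NaN follows directly from passing a negative value to pchisq().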


From: 'Evan Curtis' via distance-sampling <distance...@googlegroups.com>
Sent: 03 December 2023 23:27

To: distance-sampling <distance...@googlegroups.com>
Subject: Re: [distance-sampling] ddf.gof() producing P = NaN
 
Hi Eric,

The main effect terms are obsname and distance. I've specified this in a sort of shorthand way (obsname * distance is the equivalent of obsname + distance + obsname:distance).
There's nothing obvious in the summary that suggests non-convergence. See ddf.gof() for the DS component only below.
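The shorthand can be checked in base R without fitting anything. The sketch below uses a made-up data frame: the 13 observer codes are the ones that appear in the summary, the reference level "AA" is a placeholder because the real one is not shown, and the distance and size values are random.

```r
# Toy data only: "AA" stands in for the unknown reference observer level.
obs_levels <- c("AA", "BP", "BW", "CF", "CS", "DH", "EC", "MG", "MH",
                "PL", "POB", "RL", "RT", "SD")
set.seed(1)
toy <- data.frame(
  obsname  = factor(rep(obs_levels, each = 4), levels = obs_levels),
  distance = runif(56, 0, 300),
  size     = rpois(56, 3) + 1
)

short <- ~ obsname * distance + size:distance
long  <- ~ obsname + distance + obsname:distance + size:distance

# Both formulas expand to the same set of model terms
same_terms <- setequal(attr(terms(short), "term.labels"),
                       attr(terms(long), "term.labels"))

# Design-matrix columns: intercept + 13 observers + distance
# + 13 observer:distance slopes + distance:size
n_coef <- ncol(model.matrix(short, data = toy))

same_terms
n_coef
```

With 14 observer levels the design matrix has 29 columns, which matches the number of conditional detection function parameters listed in the summary.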

Here is the ddf summary:

> MRDS.6ii7=ddf(
+   dsmodel=~mcds(key="hn",formula=~side+obsname),
+   mrmodel=~glm(~obsname * distance + size:distance),
+   data=survey.data,
+   method="io",
+   meta.data=list(binned=TRUE,point=FALSE,width=300,breaks=c(0,50,100,200,300)))
> summary(MRDS.6ii7)

Summary for io.fi object
Number of observations : 1324
Number seen by primary : 843
Number seen by secondary : 834
Number seen by both : 353
AIC : 2705.31

Conditional detection function parameters:
                         estimate          se
(Intercept)         -1.0796625270 0.277260949
obsnameBP            1.1046744569 0.361828611
obsnameBW            0.5433139316 0.394464395
obsnameCF            1.1961084816 0.382167121
obsnameCS            1.1934733554 0.396189048
obsnameDH            0.7954464197 0.343565299
obsnameEC            1.1975425396 0.395574213
obsnameMG            0.5859840295 0.362524121
obsnameMH            2.1259248539 0.409690893
obsnamePL            1.5807107511 0.425884100
obsnamePOB           0.5848155343 0.401523354
obsnameRL            2.5931041668 0.495230694
obsnameRT            1.7839252838 0.421544945
obsnameSD            1.4995302978 0.439472284
distance            -0.0036643313 0.002600023
obsnameBP:distance  -0.0099988195 0.003096288
obsnameBW:distance   0.0078620272 0.003474921
obsnameCF:distance  -0.0067997640 0.003229027
obsnameCS:distance  -0.0007569504 0.003252560
obsnameDH:distance  -0.0017379043 0.002718869
obsnameEC:distance  -0.0030372291 0.003149064
obsnameMG:distance   0.0046375639 0.003172420
obsnameMH:distance  -0.0114362985 0.003412340
obsnamePL:distance  -0.0102762333 0.003554344
obsnamePOB:distance  0.0068440041 0.003242920
obsnameRL:distance  -0.0089213506 0.003706149
obsnameRT:distance  -0.0079749867 0.003662896
obsnameSD:distance   0.0011171377 0.004304027
distance:size        0.0008886445 0.000333803

                        Estimate         SE         CV
Average primary p(0)   0.5055804 0.02953350 0.05841505
Average secondary p(0) 0.5061558 0.02857945 0.05646373
Average combined p(0)  0.7542005 0.02799493 0.03711869

Summary for ds object
Number of observations : 1324
Distance range : 0 - 300
AIC : 3466.773

Detection function: Half-normal key function

Detection function parameters
Scale coefficient(s):
               estimate        se
(Intercept)  5.18683812 0.1369847
sideR       -0.33191261 0.1158185
obsnameBP   -0.75280207 0.2001663
obsnameBW    0.23603704 0.1826911
obsnameCF   -0.47405823 0.1781684
obsnameDH   -0.26030426 0.1699321
obsnameEC    0.32659673 0.2128797
obsnameMG   -0.02400511 0.1146002
obsnameMH   -0.29883388 0.1445970
obsnamePL   -0.03879631 0.1367317
obsnamePOB   0.31083754 0.2505650
obsnameRL   -0.12420240 0.1382154
obsnameRT   -0.42317349 0.1877019
obsnameSD   -0.06626935 0.1670976

          Estimate         SE         CV
Average p 0.550091 0.01437652 0.02613479

Summary for io object
Total AIC value : 6172.083

                       Estimate           SE         CV
Average p              0.414879   0.01871825 0.04511737
N in covered region 3191.292213 159.25048517 0.04990157


####
ddf.gof() and summary for the DS component only:

> MCDS.25 <- ddf(data=front.dat, dsmodel = ~mcds(key="hn", formula = ~side+obsname),
+   meta.data=list(binned=TRUE,point=FALSE,width=300,breaks=c(0,50,100,200,300)))
> ddf.gof(MCDS.25)

Goodness of fit results for ddf object

Chi-square tests
          [0,50] (50,100] (100,200] (200,300]   Total
Observed  257.00  196.000   288.000   102.000 843.000
Expected  256.99  215.311   260.106   110.593 843.000
Chisquare   0.00    1.732     2.991     0.668   5.391

No degrees of freedom for test

> summary(MCDS.25)

Summary for ds object
Number of observations : 843
Distance range : 0 - 300
AIC : 2174.916

Detection function: Half-normal key function

Detection function parameters
Scale coefficient(s):
              estimate        se
(Intercept)  5.4159064 0.1910553
sideR       -0.3656326 0.1234385
obsnameBP   -1.1203421 0.2892929
obsnameBW    0.3749711 0.3666190
obsnameCF   -0.9148969 0.2552450
obsnameDH   -0.3351073 0.2491033
obsnameEC   -0.1352048 0.2197059
obsnameMG   -0.1461062 0.1850177
obsnameMH   -0.6371619 0.1940995
obsnamePL   -0.4896951 0.1874705
obsnamePOB   0.5631620 0.6293066
obsnameRL   -0.6226947 0.1910394
obsnameRT   -0.8143725 0.2681655
obsnameSD   -0.2060383 0.2236076

                        Estimate          SE         CV
Average p              0.5301358  0.01839002 0.03468925
N in covered region 1590.1585476 67.82135263 0.04265069