Two-by-two tables commonly arise in comparative trials and cross-sectional studies. In medical studies, two-by-two tables may have a small sample size due to the rarity of a condition, or to limited resources. Current recommendations on the appropriate statistical test mostly specify the chi-squared test for tables where the minimum expected number is at least 5 (following Fisher and Cochran), and otherwise the Fisher-Irwin test; but there is disagreement on which versions of the chi-squared and Fisher-Irwin tests should be used. A further uncertainty is that, according to Cochran, the number 5 was chosen arbitrarily. Computer-intensive techniques were used in this study to compare seven two-sided tests of two-by-two tables in terms of their Type I errors. The tests were K. Pearson's and Yates's chi-squared tests and the 'N-1' chi-squared test (first proposed by E. Pearson), together with four versions of the Fisher-Irwin test (including two mid-P versions). The optimum test policy was found to be analysis by the 'N-1' chi-squared test when the minimum expected number is at least 1, and otherwise, by the Fisher-Irwin test by Irwin's rule (taking the total probability of tables in either tail that are as likely as, or less likely than the one observed). This policy was found to have increased power compared to Cochran's recommendations.
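The recommended policy can be sketched in Python as follows. This is a minimal illustration, not the paper's code. It assumes the table is laid out as [[a, b], [c, d]] and uses the fact that the 'N-1' statistic is K. Pearson's chi-squared multiplied by (N - 1)/N. Only one of the four Fisher-Irwin versions (Irwin's rule) is implemented. The function names and the small relative tolerance used to detect tied table probabilities are illustrative choices.

```python
import math


def _margins(a, b, c, d):
    """Row totals, column totals and grand total of the table [[a, b], [c, d]]."""
    return (a + b, c + d), (a + c, b + d), a + b + c + d


def min_expected(a, b, c, d):
    """Smallest expected count, row total * column total / N."""
    rows, cols, n = _margins(a, b, c, d)
    return min(r * k / n for r in rows for k in cols)


def n_minus_1_chi2(a, b, c, d):
    """'N-1' chi-squared: K. Pearson's statistic scaled by (N - 1) / N."""
    (r1, r2), (c1, c2), n = _margins(a, b, c, d)
    denom = r1 * r2 * c1 * c2
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = (n - 1) * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-squared variable with 1 df: erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_irwin_two_sided(a, b, c, d, rel_tol=1e-7):
    """Two-sided Fisher-Irwin p-value by Irwin's rule: sum the probabilities of
    all tables with the observed margins that are no more likely than the
    observed one (rel_tol absorbs floating-point ties)."""
    (r1, r2), (c1, _), n = _margins(a, b, c, d)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    p = sum(q for q in map(prob, support) if q <= p_obs * (1 + rel_tol))
    return min(p, 1.0)


def recommended_test(a, b, c, d):
    """'N-1' chi-squared if the minimum expected count is at least 1,
    otherwise Fisher-Irwin by Irwin's rule."""
    if min_expected(a, b, c, d) >= 1:
        return "N-1 chi-squared", n_minus_1_chi2(a, b, c, d)[1]
    return "Fisher-Irwin (Irwin's rule)", fisher_irwin_two_sided(a, b, c, d)


print(recommended_test(3, 1, 1, 3))   # every expected count is 2
print(recommended_test(0, 3, 2, 10))  # smallest expected count is 0.4
```

The chi-squared p-value uses the identity that a chi-squared variable with one degree of freedom is a squared standard normal, so the standard library's `math.erfc` is sufficient and no statistics package is needed.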
According to this paper, the authors say "Milletari et al. (2016) proposed to change the denominator to the square form for faster convergence..." They are referencing the V-Net paper, which, to my knowledge, does not explain why the terms in the denominator are squared. However, I think the formula in that paper is simply the Dice coefficient written for binary segmentation volumes, where squaring does not change the values (0² = 0 and 1² = 1). The same is not true if you are using softmax probabilities.
It does make sense, though, that squaring the terms could create a smoother loss landscape and, therefore, faster convergence. For what it is worth, I have used the non-squared version of Dice loss (1 - DSC) with good results; both versions still optimize the same thing (i.e. region overlap). However, if you are reporting the DSC as a performance metric, I would use the non-squared version.
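The following sketch makes the binary-versus-softmax point concrete by computing the soft Dice coefficient with both denominators. The function names and the small smoothing constant `eps` (which avoids division by zero on empty masks) are assumptions for illustration and do not come from either paper. With 0/1 predictions the two forms give identical values. With probabilities they differ.

```python
def soft_dice(pred, target, squared=False, eps=1e-7):
    """Soft Dice coefficient between predictions and a 0/1 mask.

    squared=False: 2*sum(p*g) / (sum(p) + sum(g))
    squared=True:  2*sum(p*g) / (sum(p**2) + sum(g**2))  (V-Net style denominator)
    eps is an assumed smoothing constant, not taken from the text.
    """
    inter = sum(p * g for p, g in zip(pred, target))
    if squared:
        denom = sum(p * p for p in pred) + sum(g * g for g in target)
    else:
        denom = sum(pred) + sum(target)
    return (2 * inter + eps) / (denom + eps)


def dice_loss(pred, target, squared=False):
    """Dice loss as 1 - DSC."""
    return 1.0 - soft_dice(pred, target, squared=squared)


mask = [1, 0, 0, 1]
binary_pred = [1, 0, 1, 1]         # hard 0/1 prediction
soft_pred = [0.9, 0.2, 0.6, 0.8]   # softmax-style probabilities

print(soft_dice(binary_pred, mask), soft_dice(binary_pred, mask, squared=True))
print(soft_dice(soft_pred, mask), soft_dice(soft_pred, mask, squared=True))
```

In the soft case the squared denominator shrinks because each probability below 1 gets smaller when squared. As a result, the squared form reports a higher overlap for the same prediction.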
On Thursday, October 15, 2015, a disbelieving student posted on Reddit: "My stats professor just went on a rant about how R-squared values are essentially useless, is there any truth to this?" It attracted a fair amount of attention, at least compared to other posts about statistics on Reddit.
In case you forgot or didn't know, R-squared is a statistic that often accompanies regression output. It ranges in value from 0 to 1 and is usually interpreted as summarizing the percentage of variation in the response that the regression model explains. So an R-squared of 0.65 might mean that the model explains about 65% of the variation in our dependent variable. Given this logic, we prefer our regression models to have a high R-squared. Statistician Cosma Shalizi, however, disputes this logic with convincing arguments in his lecture notes on regression.
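For reference, the usual definition of R-squared for a least-squares fit with an intercept compares the residual sum of squares with the total sum of squares about the mean:

\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]

Under this definition, an R-squared of 0.65 means that the residual sum of squares is 35% of the total sum of squares.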
1. R-squared does not measure goodness of fit. It can be arbitrarily low when the model is completely correct. By making \(\sigma^2\) large, we drive R-squared towards 0, even when every assumption of the simple linear regression model is correct in every particular.
Shalizi's statement is easy enough to demonstrate. The way we do it here is to create a function that (1) generates data meeting the assumptions of simple linear regression (independent observations, normally distributed errors with constant variance), (2) fits a simple linear model to the data, and (3) reports the R-squared. For the sake of simplicity, the function's only parameter is sig (sigma). We then "apply" this function to a series of increasing \(\sigma\) values and plot the results.
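Here is a Python sketch of that simulation. It is not the article's original code. The intercept 2, slope 1.2, 100 evenly spaced x values on [1, 10], the grid of sigmas from 0.5 to 20, and the fixed seed are all illustrative assumptions. Instead of plotting, it prints the R-squared for each sigma. The parameter is named `sig` to match the description above.

```python
import random


def r2_simple(x, y):
    """R-squared of an ordinary least-squares fit y ~ x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def r2_for_sigma(sig, n=100, seed=1):
    """Simulate data that satisfy every simple-linear-regression assumption,
    fit the correct linear model, and return its R-squared."""
    rng = random.Random(seed)
    x = [1 + 9 * i / (n - 1) for i in range(n)]               # fixed design on [1, 10]
    y = [2 + 1.2 * xi + rng.gauss(0, sig) for xi in x]        # true line plus N(0, sig^2) noise
    return r2_simple(x, y)


sigmas = [0.5 + i * (20 - 0.5) / 19 for i in range(20)]
for sig in sigmas:
    print(f"sigma = {sig:5.2f}   R-squared = {r2_for_sigma(sig):.3f}")
```

Because the model is correct for every sigma, any fall in R-squared comes only from the larger error variance.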
Sure enough, R-squared tanks hard with increasing sigma, even though the model is completely correct in every respect.

2. R-squared can be arbitrarily close to 1 when the model is totally wrong. Again, the point being made is that R-squared does not measure goodness of fit. Here we use code from a different section of Shalizi's lecture 10 notes to generate non-linear data.
It's very high at about 0.85, but the model is completely wrong. Using R-squared to justify the "goodness" of our model in this instance would be a mistake. Hopefully one would plot the data first and recognize that a simple linear regression in this case would be inappropriate.
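A rough Python analogue of this point is shown below. It is not Shalizi's code. The data are an illustrative curved relationship: y is quadratic in x with ±20% multiplicative noise, x runs over 1..50, and the seed is fixed. Fitting a straight line to clearly curved data still produces a high R-squared.

```python
import random


def r2_simple(x, y):
    """R-squared of an ordinary least-squares straight-line fit y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


rng = random.Random(42)
x = [float(i) for i in range(1, 51)]
# Illustrative curved relationship: quadratic in x, times +/-20% multiplicative noise.
y = [(xi - 1) ** 2 * rng.uniform(0.8, 1.2) for xi in x]
r2 = r2_simple(x, y)
print(f"R-squared of a straight-line fit to curved data: {r2:.3f}")
```

The exact value depends on the simulated data, but a scatterplot of `x` against `y` would show the curvature immediately.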
3. R-squared says nothing about prediction error, even with \(\sigma^2\) exactly the same, and no change in the coefficients. R-squared can be anywhere between 0 and 1 just by changing the range of X. We're better off using Mean Square Error (MSE) as a measure of prediction error.
The R-squared falls from 0.94 to 0.15, but the MSE remains the same. In other words, the predictive ability is the same for both data sets, but the R-squared would lead you to believe that the model for the first data set somehow had more predictive power.
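The following Python sketch reproduces the comparison just described. The intercept 2, slope 1.2, noise standard deviation 0.9, 100 points, and the seed are illustrative assumptions, and the printed numbers will not match the article's exactly. The same error draws are reused for both x ranges. Because one range is an affine rescaling of the other, the least-squares residuals are identical, so the MSE (taken here as SSE / n) is unchanged while R-squared drops.

```python
import random


def fit_r2_mse(x, y):
    """Fit y ~ x by OLS; return (R-squared, mean squared residual SSE / n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst, sse / n


n, sig = 100, 0.9
rng = random.Random(7)
noise = [rng.gauss(0, sig) for _ in range(n)]  # identical errors for both designs

results = {}
for lo, hi in [(1, 10), (1, 2)]:
    x = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    y = [2 + 1.2 * xi + e for xi, e in zip(x, noise)]
    results[(lo, hi)] = fit_r2_mse(x, y)
    r2, mse = results[(lo, hi)]
    print(f"x in [{lo}, {hi}]: R-squared = {r2:.2f}, MSE = {mse:.3f}")
```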
4. R-squared cannot be compared between a model with untransformed Y and one with transformed Y, or between different transformations of Y. R-squared can easily go down when the model assumptions are better fulfilled.
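A Python sketch of this kind of comparison follows. It is not the article's code. The response is generated as y = exp(-2 - 0.09x + e) with e ~ N(0, 2.5²) and x evenly spaced on [1, 2]. These values and the seed are illustrative assumptions. With this setup, the model for log(y) satisfies every linear-model assumption and the model for y does not. Which of the two R-squared values comes out larger depends on the parameters and the random draw, so no particular outcome is asserted here.

```python
import math
import random


def r2_simple(x, y):
    """R-squared of an ordinary least-squares fit y ~ x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


rng = random.Random(3)
n = 100
x = [1 + i / (n - 1) for i in range(n)]  # evenly spaced on [1, 2]
# log(y) = -2 - 0.09 * x + e with normal e: the linear model for log(y) meets
# every assumption, while y itself has skewed, non-constant-variance errors.
y = [math.exp(-2 - 0.09 * xi + rng.gauss(0, 2.5)) for xi in x]

r2_raw = r2_simple(x, y)
r2_log = r2_simple(x, [math.log(yi) for yi in y])
print(f"R-squared, y ~ x:      {r2_raw:.4f}")
print(f"R-squared, log(y) ~ x: {r2_log:.4f}")
```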
It's even lower! This is an extreme case, and it doesn't always happen like this. In fact, a log transformation will usually produce an increase in R-squared. But as just demonstrated, assumptions that are better fulfilled don't always lead to a higher R-squared. Hence, R-squared cannot be compared between models with different transformations of the outcome.

5. It is very common to say that R-squared is "the fraction of variance explained" by the regression. [Yet] if we regressed X on Y, we'd get exactly the same R-squared. This in itself should be enough to show that a high R-squared says nothing about explaining one variable by another. This is the easiest statement to demonstrate:
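This point can be checked with a short Python sketch. The data-generating values (50 uniform x values on [0, 10], intercept 1, slope 0.5, noise SD 2) and the seed are illustrative assumptions. Regressing y on x and x on y gives the same R-squared, because in both cases it equals Sxy² / (Sxx · Syy).

```python
import random


def r2_simple(x, y):
    """R-squared of an ordinary least-squares fit y ~ x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


rng = random.Random(11)
x = [rng.uniform(0, 10) for _ in range(50)]
y = [1 + 0.5 * xi + rng.gauss(0, 2) for xi in x]

r2_y_on_x = r2_simple(x, y)  # regress y on x
r2_x_on_y = r2_simple(y, x)  # regress x on y
print(f"{r2_y_on_x:.6f} {r2_x_on_y:.6f}")
```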
Does x explain y, or does y explain x? Are we saying "explain" to dance around the word "cause"? In a simple scenario with two variables such as this, R-squared is simply the square of the correlation between x and y:
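Why not just use correlation instead of R-squared in this case? But then again, correlation only summarizes linear relationships, which may not be appropriate for the data. This is another instance where plotting your data is strongly advised.

Let's recap: R-squared does not measure goodness of fit; it does not measure prediction error; it cannot be compared between models with different transformations of the response; and it does not measure how well one variable explains another.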
And that's just what we covered in this article. Shalizi gives even more reasons in his lecture notes. And it should be noted that adjusted R-squared does nothing to address any of these issues. So is there any reason at all to use R-squared? Shalizi says no. ("I have never found a situation where it helped at all.") No doubt, some statisticians and Redditors might disagree. Whatever your view, if you choose to use R-squared to inform your data analysis, it would be wise to double-check that it's telling you what you think it's telling you.
This metric is pernicious because it does not measure what people actually care about (is the model good?). In fact, in my experience high R-squared is regularly correlated with having a worse model.
R-squared is an extremely misleading metric. It is a measure of how little variation there is in the dependent variable, and it encourages analysts to neglect the interpretation of the model in favor of simply adding more variables to drive up the R-squared vanity metric. Use out-of-sample MAPE (plus a scientific understanding of your domain) instead.
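Here is a minimal Python sketch of what "out-of-sample MAPE" means in practice: fit on one part of the data, then score percentage errors on the held-out part. The data, the random 70/30 split, the straight-line model, and the helper names are all hypothetical choices for illustration. MAPE is undefined when an actual value is zero, so the function refuses that case.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined if an actual value is 0)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


rng = random.Random(5)
x = [float(i) for i in range(1, 101)]
y = [50 + 3 * xi + rng.gauss(0, 10) for xi in x]  # hypothetical data

idx = list(range(len(x)))
rng.shuffle(idx)
train_idx, test_idx = idx[:70], idx[70:]  # random 70/30 holdout split
b0, b1 = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
pred = [b0 + b1 * x[i] for i in test_idx]
holdout_mape = mape([y[i] for i in test_idx], pred)
print(f"Out-of-sample MAPE: {holdout_mape:.1f}%")
```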
Is there a way to quickly create the sum of revenue squared as a calculated measure? I am trying to measure confidence on calculated measures, and getting access to the underlying data to perform this calculation in Excel is proving difficult.
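Here is why a "sum of revenue squared" measure helps with confidence calculations. With only the count, the sum, and the sum of squares (square each value first, then add them up, which is different from squaring the total), you can recover the mean, the sample variance, and the standard error without touching the row-level data. The sketch below is tool-agnostic Python. It does not show the syntax of any particular BI tool's calculated measures, and the revenue figures are hypothetical. This one-pass variance formula can lose precision when values are large relative to their spread.

```python
import math


def summary_from_sums(n, total, total_sq):
    """Mean, sample variance and standard error of the mean from the count,
    the sum, and the sum of squares alone."""
    mean = total / n
    var = (total_sq - total * total / n) / (n - 1)
    return mean, var, math.sqrt(var / n)


revenue = [120.0, 95.5, 143.2, 110.0]   # hypothetical revenue figures
total = sum(revenue)
total_sq = sum(r * r for r in revenue)  # square each value, then sum
mean, var, se = summary_from_sums(len(revenue), total, total_sq)
print(f"mean = {mean:.2f}, variance = {var:.2f}, standard error = {se:.2f}")
```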
There is no generally agreed-upon way to compute R-squared for mixed models, such as those fit with PROC MIXED, or for generalized linear models. A number of methods have been proposed, and each has certain advantages and certain disadvantages. Your favorite search engine will find many discussions about this.
I chose this formula: R-squared = 1 - SSE_Model / SSE_IntOnly, where SSE_Model is the sum of squared residuals from the model and SSE_IntOnly is the sum of squared residuals from the intercept-only model. I chose this formula because I was looking for a simple way to calculate the percent reduction in variance from the null model to the full model. I used the covariance parameter estimates table from PROC MIXED to calculate the R-squared.
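The percent-reduction calculation itself is simple arithmetic. The sketch below assumes the inputs are residual variance estimates for the intercept-only (null) and full models, such as those read from each model's covariance parameter estimates table. The function name is hypothetical, and the numbers are made up for illustration. In multilevel models this quantity does not behave exactly like an OLS R-squared and can even come out negative.

```python
def proportional_reduction(resid_var_null, resid_var_full):
    """Share of the null (intercept-only) model's residual variance that the
    full model removes: 1 - full / null."""
    if resid_var_null <= 0:
        raise ValueError("null-model variance must be positive")
    return 1 - resid_var_full / resid_var_null


# Made-up residual variance estimates, purely for illustration.
print(proportional_reduction(4.0, 3.0))
```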
I am not sure if I have explained this well! I am very new to calculating the R-squared for multilevel models. I am not sure if this approach is the best or if R-squared should even be calculated this way, but it was a simple formula for me.
I also found this formula, R-squared = SSR/CTSS, where the SSR is the reduction sums of squares due to the model over and above the mean and the CTSS is the corrected total sum of squares. I got the same percent reductions using this formula.
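For an ordinary least-squares fit with an intercept, the two formulas must agree. The corrected total sum of squares splits exactly into SSR + SSE, and the intercept-only model's SSE is the CTSS itself. The Python check below uses simulated data with arbitrary illustrative parameters. For mixed models fit by (restricted) maximum likelihood this decomposition need not hold, so the two formulas are not guaranteed to match there.

```python
import random


def sums_of_squares(x, y):
    """SSE of y ~ x, SSR over the mean, and corrected total SS (OLS with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    fitted = [my + slope * (xi - mx) for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - my) ** 2 for fi in fitted)   # reduction over the mean
    ctss = sum((yi - my) ** 2 for yi in y)       # equals SSE of the intercept-only model
    return sse, ssr, ctss


rng = random.Random(2)
x = [rng.uniform(0, 5) for _ in range(40)]
y = [3 + 0.8 * xi + rng.gauss(0, 1) for xi in x]
sse, ssr, ctss = sums_of_squares(x, y)
r2_from_sse = 1 - sse / ctss
r2_from_ssr = ssr / ctss
print(f"{r2_from_sse:.6f} {r2_from_ssr:.6f}")
```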
I actually looked at the AIC as well! Maybe I should just focus on the AIC instead of the pseudo-R-squared because, as you stated, the sum of squares is not what is being optimized in mixed models.