The implicit intercept in the formula language


Douglas Bates

Aug 22, 2014, 4:11:50 PM
to julia...@googlegroups.com
In a discussion on julia-users about the formula language for the MixedModels package I found myself again explaining the implicit intercept in the R formula language and its ramifications.  This led me to consider not having an implicit intercept and requiring the user to write y ~ 1 + x when they want an intercept and slope.

It is two extra characters to type which is negligible.  However, it would throw off any R users who expect the implicit intercept.  Which is the lesser of the two evils?  I personally think that requiring an explicit intercept term would make the connection between the formula and the fitted coefficients much clearer but that doesn't mean that there wouldn't be howls of protest. 

Milan Bouchet-Valat

Aug 22, 2014, 4:30:18 PM
to julia...@googlegroups.com
That's an interesting idea, but I'm concerned that it would confuse not only R users, but also users of most statistical frameworks I know of (MATLAB, Python's StatModels, SAS, Stata). People will inevitably forget to add the intercept because they take inspiration from existing material using other software, and we'll end up answering this question much more often than currently with the reverse situation. I'd say 90% of cases, you want the intercept to be included.

It would be useful, though, to print the formula of the model in its expanded form, i.e. adding `1 +` when not specified.
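A sketch of what that expansion could look like (plain Python with a hypothetical `expand` function; a real implementation would operate on the parsed term tree rather than strings):

```python
# Hypothetical sketch: print a formula in expanded form by inserting the
# implicit "1 +" when the user hasn't said anything about the intercept.

def expand(formula):
    lhs, rhs = (s.strip() for s in formula.split("~"))
    terms = [t.strip() for t in rhs.split("+")]
    if "0" not in terms and "1" not in terms:  # no explicit intercept spec
        terms = ["1"] + terms
    return lhs + " ~ " + " + ".join(terms)

print(expand("y ~ x"))      # y ~ 1 + x
print(expand("y ~ 0 + x"))  # y ~ 0 + x
```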

My two cents

Stefan Karpinski

Aug 22, 2014, 5:13:15 PM
to julia-stats
+1 (get it)

But seriously, I would love this. Of course, I never learned the formula language in R and it was just this weird thing that would show up in code snippets that I never actually understood. Having the explicit `y ~ x + 1` makes it so much clearer to me as a person with a mathematical background who never specifically learned this syntax. Since Julia's formula syntax is deviating from R's already, it might be better for it to just be "its own thing" instead of a slightly "broken" version of R's formulas.



John Myles White

Aug 22, 2014, 5:19:01 PM
to julia...@googlegroups.com
I also like making the intercept explicit with a + 1.

On a related but separate note, I'd also prefer that we fit the intercept without storing a column of 1's in the design matrix. This is how lots of machine learning systems like Vowpal Wabbit work, because it makes it easier to (a) work with an existing matrix of features without mutating it and (b) perform regularization of all terms except the intercept by doing things like `cost += norm(coef, 1)` or `cost += norm(coef, 2)`.
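A minimal numeric sketch of that point (plain Python, hypothetical helper names): recovering the intercept from the column means gives the same coefficients as solving the normal equations with an explicit ones column, without ever materializing or mutating the feature matrix.

```python
def ols_with_ones_column(x, y):
    """Solve the 2x2 normal equations for (intercept, slope), i.e. with an
    explicit column of ones in the design matrix."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    intercept = (sy * sxx - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

def ols_by_centering(x, y):
    """Same fit, but the intercept is recovered from the means instead of a
    stored ones column, leaving the features untouched."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
print(ols_with_ones_column(x, y))  # same pair of numbers both ways
print(ols_by_centering(x, y))
```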

Stefan, if you've never learned the formula syntax (especially the lme4 variant), I'd encourage you to try it out. I'd argue that it's one of the best features of R.

-- John

Stefan Karpinski

Aug 22, 2014, 5:26:22 PM
to julia-stats
On Fri, Aug 22, 2014 at 5:18 PM, John Myles White <johnmyl...@gmail.com> wrote:
Stefan, if you've never learned the formula syntax (especially the lme4 variant), I'd encourage you to try it out. I'd argue that it's one of the best features of R.

Will do :-)

Johan Sigfrids

Aug 22, 2014, 5:30:16 PM
to julia...@googlegroups.com
I myself have at times wished the formula interface in R would require explicit intercept. It would be easier for students to map the terms to the terms of a linear formula they're already familiar with.

Gray Calhoun

Aug 22, 2014, 5:49:59 PM
to julia...@googlegroups.com
On Friday, August 22, 2014 3:11:50 PM UTC-5, Douglas Bates wrote:
...
This led me to consider not having an implicit intercept and requiring the user to write y ~ 1 + x when they want an intercept and slope.

As a user and teacher, I'd prefer the implicit intercept. The most common case should usually
be the default, and regression with an intercept is overwhelmingly the common case.
Users can already write "y ~ 1 + x" if they want the intercept to be made explicit, and it
gives the same results as "y ~ x".

I agree with John's suggestion that the intercept should not actually be stored as a column
of 1s in the design matrix, though.

--Gray

Stefan Karpinski

Aug 22, 2014, 5:54:48 PM
to julia-stats
It is generally a good principle to make the default the right thing. But in this case it makes specifying that there is no intercept so bizarre: writing `y ~ x + 0` and having it mean something different from `y ~ x`, while `y ~ x + 1` means the same thing as `y ~ x`, is just strange.



Milan Bouchet-Valat

Aug 22, 2014, 5:58:49 PM
to julia...@googlegroups.com
On Friday, August 22, 2014 at 14:30 -0700, Johan Sigfrids wrote:
> I myself have at times wished the formula interface in R would require
> explicit intercept. It would be easier for students to map the terms
> to the terms of a linear formula they're already familiar with.
So if most people are in favor of requiring an explicit `+ 1`, what do
you think of also requiring the user to specify that the intercept
should not be included, using `- 1` or `+ 0`? That way, an error could
be printed when nothing is specified, which would catch the most common
mistake of people coming from all other languages.


Regards


John Myles White

Aug 22, 2014, 6:07:15 PM
to julia...@googlegroups.com
I believe not specifying anything would count as no intercept.

-- John

Gray Calhoun

Aug 22, 2014, 6:07:41 PM
to julia...@googlegroups.com
On Friday, August 22, 2014 4:54:48 PM UTC-5, Stefan Karpinski wrote:
It is generally a good principle to make the default the right thing. But in this case it makes specifying that there is no intercept so bizarre: writing `y ~ x + 0` and having it mean something different from `y ~ x`, while `y ~ x + 1` means the same thing as `y ~ x`, is just strange.

R allows `y ~ x -1` for "without intercept" which has always seemed reasonable to me.  I agree that `y ~ x + 0` being different from `y ~ x` is strange.

FWIW I've never heard someone say, "I ran a regression of salary on education level and an intercept." But people will say it out loud if there's no intercept. Regression without an intercept is something that almost never comes up. (I know I'm repeating myself...)

Jeremy Miles

Aug 22, 2014, 6:19:39 PM
to julia...@googlegroups.com
On 22 August 2014 15:07, John Myles White <johnmyl...@gmail.com> wrote:
I believe not specifying anything would count as no intercept.



That seems very dangerous to me. If non-experts end up using it (and maybe that's not a good thing, but it will almost certainly happen) they're going to screw it up, and present nonsense results without knowing it.  Perhaps if you don't specify anything about the intercept, an error or warning should be given.   But how often do you run regressions without the intercept? For me, it's extremely close to never.  (Most commonly, it's to demonstrate how you get screwy results if you do it.)

Jeremy

Diego Javier Zea

Aug 22, 2014, 6:42:34 PM
to julia...@googlegroups.com
I prefer to use an implicit 1. It is simpler, and most people are going to expect it.
But it would also be great to be able to use any numeric constant to indicate a factor on the intercept term.
You could choose 0 to exclude the intercept term.
You could also do some fancy stuff, like allowing factors of pi or other constants ;)

y = a.x + b
y ~   x

y = a.x + 1.b
y ~   x + 1

y = a.x - 1.b
y ~   x - 1

y = a.x + 0.b
y ~   x + 0

y = a.x + pi.b
y ~   x + pi

Bradley Setzler

Aug 24, 2014, 12:19:50 PM
to julia...@googlegroups.com
Hi,

I have never seen OLS without intercept in economics, except as a homework exercise, so I would encourage including the intercept by default but with a simple command to remove it. 

Best,
Bradley

John Myles White

Aug 24, 2014, 1:49:39 PM
to julia...@googlegroups.com
I’m surprised that many of the e-mails on this thread are so focused on what kinds of models are commonly fit, rather than considering what are, in my mind, the core issues of programming language design: simplicity of definition, ease of implementation and explicitness of semantics. Since we’re not trying to figure out what the Huffman code would be for writing down statistical models, it’s not clear to me that we as a community should be so concerned with the statistics of how OLS is used when the proposed burden of explicit intercepts is the addition of 4 characters worth of typing.

To put it another way, we’re designing a programming language, so we should put some effort into making sure that we create a language with simple and precise semantics.

After some reflection, I personally think that the use of implicit intercepts substantively worsens the formula language. In particular, I’d like to raise some concerns that I have about the use of implicit intercepts.

(1) It means that the + operator doesn’t have any coherent meaning. If intercept terms are explicit, the + operator is exactly equivalent to the statement, “add a new set of columns into my design matrix.” But if you have to write y ~ 0 + x, you’ve broken that interpretation of the + operator. Adding special cases to core operators is risky because any quirk in the core primitives of a language is likely to produce crazy results when you consider more complex expressions written in that language. For example, what does y ~ (0 + x) * f mean? Just how many special cases are you really adding to the language by allowing the 0 in formulas?

This problem of downstream poisoning of semantics is even worse if you consider allowing the use of -1 to reflect the absence of intercepts. If you allow people to write y ~ x - 1, you’ve added an entire operator to your language without adding any expressive power relative to the use of explicit intercepts.

But you haven’t just added in functionality that’s not strictly necessary. You’ve also introduced a lot of edge cases that make the formula language harder to work with. Here are some of the problems you now need to solve:

(1a) What’s the precedence of the subtraction operator? Is the precedence inherited from Julia appropriate?

(1b) How does the - operator interact with things like the * operator in a formula such as y ~ (x - 1) * f?

(1c) Can you use subtraction to remove terms other than intercepts? For example, could you write y ~ a*b - a&b to remove an interaction from a model with both main effects and interactions? If not, why not?

(1d) Where can -1 occur? Can you write y ~ -1? Can you write y ~ -1 + x? Can you write y ~ x - 1? Are the last two of these formulas identical or not?

These kinds of edge case problems need complete, formal solutions. And the solutions need to obey simple desiderata: for example, the formula language shouldn’t use +  if the operator isn’t commutative.

In short, the use of implicit intercepts makes the formula language substantially more complex without making it substantially more expressive.

And that’s not my only concern about the implicit intercept approach. I’m also worried by the fact that:

(2) The use of implicit intercepts means that the nested model relationship is no longer reflected in surface syntax. The model y ~ 1 is a subset of the model y ~ x, but the formulas are not. This makes quick visual analysis of formulas harder.

(3) It makes the formula language less explicit, which means that newcomers need to memorize more rules and implicit assumptions to understand the language.

(4) It attempts to retain superficial similarities to R in a system that’s got very different semantics from R. False similarities to R are likely to make learning Julia harder because of proactive interference. A formula language that’s more obviously different from R will probably help people appreciate that Julia’s language is completely separate from R’s.

(5) It introduces synonyms into the formula language in the sense that y ~ x and y ~ 1 + x are meant to be identical. This means that implementations of the formula language need to be very careful to make sure that those synonyms never behave differently.
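As a toy illustration of points (1), (2), and (5) (a hypothetical sketch, not any actual parser): with explicit intercepts, `+` can be read as plain set union over terms, which makes it commutative and leaves no synonyms to keep in sync.

```python
# Toy sketch: with explicit intercepts, `+` on formula terms is plain set
# union -- commutative, associative, and with no special cases for 0 or 1.

def parse_rhs(rhs):
    """Parse a right-hand side like '1 + x + x&f' into a frozenset of terms."""
    return frozenset(t.strip() for t in rhs.split("+"))

# `y ~ 1 + x` and `y ~ x + 1` are the same model, by commutativity of union:
assert parse_rhs("1 + x") == parse_rhs("x + 1")

# The nested-model relationship shows up directly in the syntax:
# `y ~ x` (no 1) is simply a strict subset of `y ~ 1 + x`.
assert parse_rhs("x") < parse_rhs("1 + x")
```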

 — John

Stefan Karpinski

Aug 24, 2014, 2:26:48 PM
to julia...@googlegroups.com
+1,000,000,000

A really good process for this kind of design is to try to document it. If the process of explaining something precisely and clearly is too difficult – e.g. you find yourself adding "except" and "but" everywhere – then you've got a problem. I suspect that implicit intercepts are a feature that leads to a lot of these kinds of digressions in explaining the formula language and that explicit intercepts would make it much easier to explain and only trivially more verbose to use.

Johan Sigfrids

Aug 24, 2014, 2:37:21 PM
to julia...@googlegroups.com
+10^(# R users)

The R formula language is a great interface, but the implicit intercept was a big mistake.

Gray Calhoun

Aug 24, 2014, 2:37:52 PM
to julia...@googlegroups.com
There are a lot of good points in this email, but I don't think that having the explicit "1" is universally better. One could define addition and multiplication on formulas as
(y ~ x) + (y ~ z) = y ~ x + z
(y ~ x) * (y ~ z) = y ~ x + z + x&z
if the LHS variable is the same in both formulas. With an explicit constant we now get redundant terms to deal with.
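A hypothetical set-based sketch of those two operations (not an existing API, and function names are made up):

```python
# Hypothetical sketch of the proposed formula arithmetic, modelling a formula
# as (response, frozenset of right-hand-side terms).

def f_add(f, g):
    """(y ~ x) + (y ~ z) = y ~ x + z, provided the responses agree."""
    (lf, tf), (lg, tg) = f, g
    assert lf == lg, "left-hand sides must be the same"
    return (lf, tf | tg)

def f_mul(f, g):
    """(y ~ x) * (y ~ z) = y ~ x + z + x&z (all pairwise interactions)."""
    (lf, tf), (lg, tg) = f, g
    assert lf == lg, "left-hand sides must be the same"
    interactions = {"&".join(sorted((a, b))) for a in tf for b in tg if a != b}
    return (lf, tf | tg | interactions)

f = ("y", frozenset({"x"}))
g = ("y", frozenset({"z"}))
assert f_add(f, g) == ("y", frozenset({"x", "z"}))
assert f_mul(f, g) == ("y", frozenset({"x", "z", "x&z"}))

# With explicit constants, set union means the shared `1` term is simply
# deduplicated rather than producing a redundant column:
assert f_add(("y", frozenset({"1", "x"})),
             ("y", frozenset({"1", "z"}))) == ("y", frozenset({"1", "x", "z"}))
```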

AFAIK, the intercept/no-intercept distinction only makes sense in a regression context, which might be why people have focused on the models that are commonly fit. But if we're thinking about other uses of formulas, it would be strange to use plot(y ~ x, exampledata) to graph a scatter plot, but have to use `y ~ 1 + x` to estimate its slope. Especially because

f = y ~ x
plot(f, exampledata)
lm(f, exampledata)
plot(lm(f, exampledata))

looks like a very nice way to plot a relationship, estimate it, and plot the estimated relationship. If plot(y ~ x + 1, exampledata) so it's consistent with the glm use, what does plot(y ~ x, exampledata) generate?

Since formulas should be used in other places than estimation, but the distinction probably only makes sense in regression, another approach might be to make `nointercept` a separate argument for glm and deprecate using `+0`. Then `y ~` could be a valid but discouraged way of writing what's now `y ~ 1`.

--Gray

Gray Calhoun

Aug 24, 2014, 2:40:50 PM
to julia...@googlegroups.com

If plot(y ~ x + 1, exampledata) so it's consistent with the glm use, what does plot(y ~ x, exampledata) generate?

Sorry, this should be "If we require users to write... plot(y ~ x + 1, exampledata)..."

John Myles White

Aug 24, 2014, 2:50:07 PM
to julia...@googlegroups.com
To be honest, I’m not at all concerned about that issue, since I am steadfastly opposed to the idea of using formulas for anything other than the specification of design matrices.

That tradition in R violates what is arguably the most central concept in Julia style: don’t write puns. We should never reuse operators to mean things that have no relationship with their core semantics.

Also, just to resolve one potential source of disagreement: everyone here understands that Julia is eagerly evaluated, right? I ask because the absence of delayed evaluation means that almost all R idioms involving the tilde (~) operator don’t make any sense in Julia and will never work in Julia.

 — John

Gray Calhoun

Aug 24, 2014, 3:09:14 PM
to julia...@googlegroups.com
On Sunday, August 24, 2014 1:50:07 PM UTC-5, John Myles White wrote:
To be honest, I’m not at all concerned about that issue, since I am steadfastly opposed to the idea of using formulas for anything other than the specification of design matrices.

That tradition in R violates what is arguably the most central concept in Julia style: don’t write puns. We should never reuse operators to mean things that have no relationship with their core semantics.

Also, just to resolve one potential source of disagreement: everyone here understands that Julia is eagerly evaluated, right? I ask because the absence of delayed evaluation means that almost all R idioms involving the ~ tilde operator don’t make any sense in Julia and will never work in Julia.

I may be missing something, but I don't see how any of those preclude plot(y ~ x, exampledata). Both y and x would be columns and exampledata would be something like a DataFrame. Julia doesn't need to automagically guess the labels from the formula, just extract the right data. I agree that "R does it this way" is not any sort of justification for how Julia should do it.

On the subject of puns: it would be kind of strange to change the formula to make the intercept explicit, but change the algorithm to make the column of 1s implicit ;)

John Myles White

Aug 24, 2014, 3:24:30 PM
to julia...@googlegroups.com
Here are problems I have with plot(y ~ x, exampledata):

(1) It’s just a synonym for an interface that we already have. This violates the very effective design rule that "There should be one — and preferably only one — obvious way to do it."

(2) We can never make something like plot(y ~ log(x), exampledata) work like it would in a language with delayed evaluation, which means that all of the power of the R interface won’t be present in Julia. The only similar thing we could do would be something like @plot(:y ~ log(:x), exampledata).

(3) The plot usage of formulas uses something that looks superficially like the formula language, but has literally none of the expressive power of the formula language. If the formula language only consisted of +, then you could always get the exact same effect by writing out explicit lists in a syntax like plot([:y1, :y2], [:x1, :x2, :x3]). I think that kind of explicitness would be a larger gain in readability over something like plot(y1 + y2 ~ x1 + x2 + x3). The whole reason that the formula language isn’t just an alternative language for enumerating lists is that it provides you with interaction terms. Therefore, any interface that doesn’t use interaction terms could be written using lists.

— John

Gray Calhoun

Aug 24, 2014, 4:23:38 PM
to julia...@googlegroups.com

Okay, thanks. `plot` may not have been the best example, then. The
general point I was trying to make is that answering "how to handle
the intercept" should depend on whether formulas are going to be used
outside of a regression context, but still in contexts where the
interactions and the power of formulas to create objects like design
matrices are useful.

You've convinced me that `+0` and `-1` are bad. If you decide to
require `+1` for an intercept, it would prevent a lot of errors if glm, etc.,
gave a warning when called without an intercept. (I know it
would be strange to have the "default" regression give a
warning... but having coherent formulas could probably be worth it.
The warning should never come up anyway :))

John Myles White

Aug 24, 2014, 6:21:45 PM
to julia...@googlegroups.com
There’s definitely a solid argument to be made for raising a warning by default, but I’m not sure that we should start assuming people don’t know what they’re doing. I’m generally a believer in the principle that you should (1) assume that users mean exactly what they say and (2) only consider something a mistake if you’re willing to raise a fatal error in response to it.

 — John

Gray Calhoun

Aug 24, 2014, 8:26:12 PM
to julia...@googlegroups.com
Fair enough. There are certainly many other parts of Julia that can
cause errors if they're used naively, and I definitely appreciate that
the default perspective is to trust the user. I'm especially
(over)sensitive to this particular user mistake because I teach
introductory econometrics, but other people would be a better judge of
its importance.

"Users from other languages expect this behavior" is usually a pretty
bad way to decide to give a warning...

John Myles White

Aug 24, 2014, 8:39:16 PM
to julia...@googlegroups.com
I completely agree that beginners make a lot of mistakes. I do a lot of teaching as well and see firsthand that many parts of R confuse new users.

But I see the error-proneness of beginners’ choices as one of the strongest arguments for explicitness, because explicitness makes it easier for an expert to review a person’s code and see that they’ve made mistakes. It’s much harder to catch errors in code that makes lots of implicit assumptions.

I also agree with you that adopting warnings based on the expectations that users have coming from other languages isn’t a good strategy.

 — John

Simon Byrne

Aug 25, 2014, 5:20:04 AM
to julia...@googlegroups.com
For anyone interested in the background, apparently the R formula notation is (a variant of) what is known as Wilkinson-Rogers notation, which was originally proposed in this article:

http://www.jstor.org/stable/2346786

Unfortunately they don't explain why the intercepts are implicit.

In my mind, the argument for the implicit intercept ultimately comes down to what you consider the "root" model: is it 
 (a) Y = error, or
 (b) Y = constant + error ?

In the ANOVA context (for which it was originally proposed), it is (b): you're trying to explain the "variance" (i.e. everything that isn't constant) by adding factors into the model, and so I think the implicit intercept makes complete sense here.

Of course, statistics has changed a lot in the 40 years since: we now have glms, mixed and hierarchical models, lasso methods, etc. Rather than dividing up sum-of-squares by factor, we now typically think in a more "generative" sense, by constructing models that represent the process which we're modelling. When linear models and glms are taught, they're usually framed in terms of a linear predictor

eta = beta_0 + beta_1 X_1 + ... + beta_k X_k

While this makes sense from a linear algebra and sampling theory perspective (the intercept is just another column), there is still something special about the intercept. It's not just that models with an intercept are more common, it's that models without an intercept are almost always nonsensical.

Personally, I always found that linear models and glms made the most sense when thinking in terms of an affine space, not a linear space:
* for categorical covariates, you don't care about the absolute coefficients, you care about the relative coefficients (the contrasts): while we're on it, one of the things that really annoys me about R is that it doesn't make the contrasts explicit in the output.
* for numerical covariates, the coefficient describes the change in response per unit change in the covariate.

Moreover, the intercept has special behaviour of its own. For instance,
* it's not even possible to exclude the intercept if you have categorical covariates, as it will be absorbed into the dummy coding of the contrasts. Although the coefficients will have different numerical values, you still get exactly the same model.
* in lasso models, it is the one beta that you don't penalise.
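The first bullet can be checked directly: with full dummy coding, the dummy columns for a categorical covariate sum to the ones column, so the span of the dummies already contains the intercept. A small plain-Python sketch (illustrative data, not from the thread):

```python
# Small sketch: the full set of dummy columns for a categorical covariate
# sums to the ones column, so excluding the intercept doesn't shrink the
# column space; it just reparametrizes the same model.

levels = ["a", "b", "c", "a", "b"]
dummies = {lvl: [1 if v == lvl else 0 for v in levels] for lvl in ("a", "b", "c")}

ones = [da + db + dc
        for da, db, dc in zip(dummies["a"], dummies["b"], dummies["c"])]
assert ones == [1] * len(levels)  # the intercept is in span{d_a, d_b, d_c}
```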

As a result, I would lean toward keeping an implicit intercept, though I don't have particularly strong feelings about it. I don't have much experience with mixed models, so can't really comment on the utility of intercepts there. Also, I've never had to teach it to new users, so can't really say what is easier to learn.

As to how to state "no intercept": in my mind, there's no real need for this in the standard formula interface. What you're really doing is imposing a constraint on the model space, and so should be handled in the same manner you would impose other such constraints (however I don't have any good suggestions for that either...)

-Simon

Milan Bouchet-Valat

Aug 25, 2014, 6:21:55 AM
to julia...@googlegroups.com
On Sunday, August 24, 2014 at 17:39 -0700, John Myles White wrote:
> I completely agree that beginners make a lot of mistakes. I do a lot
> of teaching as well and see firsthand that many parts of R confuse new
> users.
>
> But I see the error-proneness of beginners’ choices as one of the
> strongest arguments for explicitness, because explicitness makes it
> easier for an expert to review a person’s code and see that they’ve
> made mistakes. It’s much harder to catch errors in code that makes
> lots of implicit assumptions.
>
>
> I also agree with you that adopting warnings based on the expectations
> that users have coming from other languages isn’t a good strategy.
It's generally a dubious strategy since languages all differ in subtle
and varied ways. But in the case at hand it seems that all languages
work the same, and Julia would be the only outlier. Time will tell, but we may
get tired of writing the same explanation again and again, and prefer to
add a warning when nothing is explicitly said about the intercept.


Regards

Milan Bouchet-Valat

Aug 25, 2014, 6:40:56 AM
to julia...@googlegroups.com
On Monday, August 25, 2014 at 02:20 -0700, Simon Byrne wrote:
Moreover, the intercept has special behaviour of its own. For instance,
* it's not even possible to exclude the intercept if you have categorical covariates, as it will be absorbed into covariates of the contrasts. Although the coefficients will have different numerical values, you still get exactly the same model.
Note this isn't the case currently with GLM:
julia> gm1 = fit(GeneralizedLinearModel, Counts ~ Outcome + Treatment, dobson, Poisson())
DataFrameRegressionModel{GeneralizedLinearModel,Float64}:

Coefficients:
                  Estimate Std.Error     z value Pr(>|z|)
(Intercept)        3.04452  0.170899     17.8148   <1e-70
Outcome - 2      -0.454255  0.202171    -2.24689   0.0246
Outcome - 3      -0.292987  0.192742     -1.5201   0.1285
Treatment - 2  5.66414e-16       0.2 2.83207e-15   1.0000
Treatment - 3  1.31354e-18       0.2 6.56771e-18   1.0000

# Note that only two dummies are included for each categorical variable,
# as in model 1
julia> gm2 = fit(GeneralizedLinearModel, Counts ~ -1 + Outcome + Treatment, dobson, Poisson())
DataFrameRegressionModel{GeneralizedLinearModel,Float64}:

Coefficients:
               Estimate Std.Error z value Pr(>|z|)
Outcome - 2    0.762395  0.256078  2.9772   0.0029
Outcome - 3    0.923663  0.248704 3.71391   0.0002
Treatment - 2   2.17826  0.235164 9.26276   <1e-19
Treatment - 3   2.17826  0.235163 9.26278   <1e-19

julia> deviance(gm1)
5.129141077001146

julia> deviance(gm2)
182.45992882517174



In R, both specifications give the same deviance, and the first categorical variable gets three dummies instead of two.


Regards





Adam Kapor

unread,
Aug 25, 2014, 9:11:03 AM8/25/14
to julia...@googlegroups.com
Can I chime in here as a user?  I am an economist and really dislike the implicit intercept.  Every time I want to estimate a model without an intercept I have to go look up the syntax because `y ~ -1 + ...` doesn't make sense to me, and in my experience the implicit intercept in R (and Stata) is also (initially) confusing to the students when teaching econometrics.

Ethan Anderes

unread,
Aug 25, 2014, 1:31:13 PM8/25/14
to julia...@googlegroups.com
As a statistician, I would just like to add my support for an explicit intercept (y ~ 1 + x). 

I remember first learning about the implicit intercept in R and being worried about "what the heck else is implicit that I'm going to get an egg in the face with later".

Simon Byrne

unread,
Aug 26, 2014, 7:36:34 AM8/26/14
to julia...@googlegroups.com
On Monday, 25 August 2014 11:40:56 UTC+1, Milan Bouchet-Valat wrote:
* it's not even possible to exclude the intercept if you have categorical covariates, as it will be absorbed into the contrasts for those covariates. Although the coefficients will have different numerical values, you still get exactly the same model.
Note this isn't the case currently with GLM:

This is an interesting point, and something that should be clarified no matter which course of action we decide. My interpretation of the formula interface is that you're describing either an affine or vector space via the span of a set of vectors, and so adding a categorical variable is equivalent to adding the set of dummy variables for each category, the span of which contains the intercept. This interpretation however would preclude the use of "0 + " or "-1 + " for denoting "intercept coefficient = 0".
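Simon's span interpretation can be checked numerically. Below is a plain-Python sketch (not the actual GLM.jl code; it uses exact rational arithmetic so the ranks are exact) that builds two design matrices for a pair of 3-level factors — one with an intercept and one dummy dropped per factor, one with the full set of dummies for the first factor and no intercept — and confirms they span the same column space, which is why R reports the same deviance for both:

```python
from fractions import Fraction

def rank(rows):
    """Column rank via Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# 9 observations: all combinations of factor A (3 levels) x factor B (3 levels)
obs = [(a, b) for a in range(3) for b in range(3)]

def dummies(level_of, nlevels, drop_first):
    start = 1 if drop_first else 0
    return [[1 if level_of(o) == k else 0 for k in range(start, nlevels)]
            for o in obs]

# Design 1: intercept + treatment-coded A + treatment-coded B  (R: y ~ A + B)
X1 = [[1] + da + db for da, db in zip(dummies(lambda o: o[0], 3, True),
                                      dummies(lambda o: o[1], 3, True))]
# Design 2: full dummies for A + treatment-coded B  (R: y ~ -1 + A + B)
X2 = [da + db for da, db in zip(dummies(lambda o: o[0], 3, False),
                                dummies(lambda o: o[1], 3, False))]

# Both have rank 5, and stacking them side by side doesn't raise the rank:
# the column spaces coincide, so "-1" removed nothing from the model.
print(rank(X1), rank(X2), rank([r1 + r2 for r1, r2 in zip(X1, X2)]))
# → 5 5 5
```

The sum of the full dummy columns for A is the column of ones, so the intercept sits inside the span of the full dummy set — exactly the "absorbed" behavior Milan described.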

 

Stefan Karpinski

unread,
Aug 26, 2014, 8:08:59 AM8/26/14
to julia...@googlegroups.com
It seems like the two coherent approaches are:

1. explicit intercept
2. express lack of intercept as a constraint

In other words, the "+ 0" or "- 1" business is a bit of a hack.
--

John Myles White

unread,
Aug 26, 2014, 11:17:49 AM8/26/14
to julia...@googlegroups.com
To make sure we all agree: the effect of adding a categorical variable is to add enough columns to the design matrix to mimic the span of the full set of dummy variables. So, in the presence of an intercept, one level is always left out.
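As a concrete sketch of the convention John describes (plain Python, not the actual DataFrames.jl implementation): treatment coding drops the first level as the reference when an intercept is present, and keeps every level otherwise.

```python
def treatment_code(values, intercept=True):
    """Expand a categorical column into 0/1 dummy columns.

    With an intercept, the first level is the reference and is dropped;
    without one, every level gets its own column.
    """
    levels = sorted(set(values))
    kept = levels[1:] if intercept else levels
    names = [f"x: {lv}" for lv in kept]
    cols = [[1 if v == lv else 0 for v in values] for lv in kept]
    return names, cols

names, cols = treatment_code(["a", "b", "c", "b"])
print(names)   # → ['x: b', 'x: c']
print(cols)    # → [[0, 1, 0, 1], [0, 0, 1, 0]]
```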

FWIW, I think we should provide a mechanism for working around this. In Bayesian settings, you often want to estimate parameters for all levels and can, because the prior prevents singularity in the design matrix.

 — John

Gray Calhoun

unread,
Aug 26, 2014, 12:25:08 PM8/26/14
to julia...@googlegroups.com
I think so. Approach 1 is obviously much easier to implement. The
Wilkinson and Rogers paper Simon cited earlier [1] has some nice
ideas about setting constraints in this notation too, but dealing
with the current "+0" flag could make them harder to program as
well.

So "1" is probably the right next step even if "2" ultimately
turns out to be the better approach.

[1]: http://www.jstor.org/stable/2346786 (in Section 4)

Simon Byrne

unread,
Aug 27, 2014, 5:25:09 AM8/27/14
to julia...@googlegroups.com
On Monday, 25 August 2014 10:20:04 UTC+1, Simon Byrne wrote:
As a result, I would lean toward keeping an implicit intercept, though I don't have particularly strong feelings about it.

Having now looked at the lme4 documentation, I see that explicit intercepts are required in the random components of mixed models, and so for consistency it would make a lot more sense to make them explicit everywhere.

simon 

Ross Boylan

unread,
Aug 27, 2014, 3:37:56 PM8/27/14
to julia...@googlegroups.com
As a statistician I don't like forcing an explicit specification of
the intercept, even though as a computer scientist I see the appeal.
I think my inner statistician is right. Details below.
The (1|x) notation means a random intercept for each level of x. It
is necessary to distinguish this from a random slope associated with
x, e.g. (a|x) means the coefficient on a varies by level of x. In
both cases the distribution of the random coefficients is centered on
0 and the random effects parameters to be estimated are variances.
What I would think of as the intercept of the random term is just the
corresponding fixed effect.

While looking at the formula notation as a system of symbols, devoid
of statistical content, provides some grounds for saying the explicit
1 in the random effect argues for an explicit 1 for fixed effects,
this doesn't strike me as a good reason to use explicit 1's when the
domain is statistics. It might be an argument for using a different
notation than (1|x) for the random effects.

I don't think forcing people to specify the intercept explicitly is a
good idea, because it is so unusual to have a model without one. I'm
pretty sure "specify an intercept explicitly" is a rule I would
sometimes forget to follow even if I were aware of it. One could
issue a warning, but that comes with its own drawbacks, as others
pointed out.

The +0 or -1 notation to avoid the intercept is pretty weird, and it
would be nice to have a better way to specify no intercept.

As others have also pointed out, this isn't just an R thing. For
example, in SAS
model y = a b c;
usually means y depends on a, b, c, and an intercept. If you don't
want an intercept you say
model y = a b c /noint;


Unless someone only uses julia, they have to re-learn to specify the
intercept each time they switch back to julia from something else.

Aside from that practical concern, the fact that different pieces of
software all use implicit intercepts is an indication that an implicit
intercept is the natural way to do things.

We could also punt and make it a user-specified option whether or not
explicit intercepts are required. It's not a great option--forcing
too many choices on the user, making the meaning of a formula opaque
since it depends on a magic setting--but it's an option.

Ross Boylan

John Myles White

unread,
Aug 27, 2014, 3:40:53 PM8/27/14
to julia...@googlegroups.com
And we were making so much progress towards agreement on an explicit intercept...

Ross, if you're proposing removing (1 | x), I'd really like to see an alternative formula language proposal.

-- John

Stefan Karpinski

unread,
Aug 27, 2014, 3:41:40 PM8/27/14
to julia-stats
On Wed, Aug 27, 2014 at 3:37 PM, Ross Boylan <ro...@biostat.ucsf.edu> wrote:
We could also punt and make it a user-specified option whether or not
explicit intercepts are required.  It's not a great option--forcing
too many choices on the user, making the meaning of a formula opaque
since it depends on a magic setting--but it's an option.

This is the worst possible choice. Please, please don't do this.

John Myles White

unread,
Aug 27, 2014, 3:45:48 PM8/27/14
to julia...@googlegroups.com
Yeah, a lot of us are very opposed to magic global variables. This is one big difference between a programming language and a program to do statistics like SAS: we need to provide tools for developing complex systems written by teams of tens, hundreds, or thousands of people. That kind of work is effectively impossible in a system that uses global options to control behavior.

 -- John

Stefan Karpinski

unread,
Aug 27, 2014, 3:47:06 PM8/27/14
to julia-stats
The whole discussion comes down to deciding what the formula language *means*. Is it a language for expressing linear relationships or affine relationships? Once you've decided what it means, then the rest is pretty straightforward. The trouble with the syntax in R is that it doesn't have a coherent meaning – it's just a syntax that's grown over time.

Since I don't understand what the (1|x) syntax means, could someone explain it to me? In fact, I'd say that an attempt to explain the full formula syntax and what it means to me – serving as a rubber ducky of slightly higher than average knowledge (for a rubber ducky) – would be a very productive exercise.

John Myles White

unread,
Aug 27, 2014, 3:48:31 PM8/27/14
to julia...@googlegroups.com
(1|x) means that you insert a new column into the design matrix for every level of the factor x -- unless that new level would cause singularity because of redundancy with an existing column.

 -- John

Gray Calhoun

unread,
Aug 27, 2014, 4:48:31 PM8/27/14
to julia...@googlegroups.com
A third option (that I'm not sure I like):

y ~ 1 + x + z ## Intercept, no warning
y ~ 0 + x + z ## No intercept, no warning
y ~ x + z     ## No intercept; glm, etc. may choose to give a warning

Allowing '0' as an explicit no intercept could save a lot of
#= Not a Typo =# type comments, and I'd probably use it as
annotation in my own code even if it were exactly the same
as y ~ x + z. The "0" should work syntactically as a 0,
so y ~ (x + 0) * f has the same meaning as y ~ x * f in every
meaningful way.
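Under that reading, 0 behaves like an additive identity during formula expansion. Here is a toy expansion (a hypothetical Python sketch, not the DataFrames.jl implementation) where a formula's right-hand side is a set of terms, each term a frozenset of variable names, and * follows the Wilkinson-Rogers rule a*b = a + b + a:b:

```python
def plus(*term_sets):
    """a + b: union of term sets; the 0 term (empty set) adds nothing."""
    out = set()
    for ts in term_sets:
        out |= ts
    return out

def times(lhs, rhs):
    """a * b = a + b + a:b, where a:b crosses every pair of terms."""
    cross = {l | r for l in lhs for r in rhs}
    return plus(lhs, rhs, cross)

x = {frozenset({"x"})}
f = {frozenset({"f"})}
zero = set()  # "0": contributes no terms

# (x + 0) * f expands to exactly the same term set as x * f
assert times(plus(x, zero), f) == times(x, f)
print(sorted(tuple(sorted(t)) for t in times(x, f)))
# → [('f',), ('f', 'x'), ('x',)]
```

This only captures the "0 works syntactically as a 0" half; the "no intercept, suppress the warning" meaning would still have to be carried as a separate flag on the formula.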

I don't like that choice because it keeps "0" as a magic flag
that doesn't have real content, but at least it's a magic
flag that's consistent with the rest of the syntax.

But there may be problems this introduces that I'm missing.

Ross Boylan

unread,
Aug 27, 2014, 4:55:03 PM8/27/14
to julia...@googlegroups.com
On Wed, Aug 27, 2014 at 12:48:27PM -0700, John Myles White wrote:
> (1|x) means that you insert a new column into the design matrix for every level of the factor x -- unless that new level would cause singularity because of redundancy with an existing column.

That might be something done along the way--although more likely for
fixed effects--but that's not what it means. Since my previous
explanation was unclear, let me try again. Suppose you have units
indexed by i and observations within units indexed by j (e.g., i
indexes schools and j students within the school). You think that
some outcome is determined by this process:
y[i,j] = beta0 + b[i] + epsilon[i, j]
where the b[i] are drawn from a distribution that is
Normal(0, sigma) and the epsilon[i,j] are independent random error
terms with standard deviation s.

Then the b[i] are the random effects and sigma is the parameter
associated with them you want to estimate. The b[i] are neither
observable nor individually estimated.
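Ross's data-generating process is easy to simulate, which also shows what the single parameter sigma summarizes. A quick sketch (plain Python, seeded for reproducibility; the group sizes and parameter values are made up for illustration, and the decomposition below uses the known s rather than an estimate, purely to keep it short):

```python
import random
import statistics

random.seed(1)
sigma, s = 2.0, 1.0      # sd of random intercepts b[i], sd of noise
beta0 = 10.0
n_groups, n_per = 200, 50

y = []
for i in range(n_groups):
    b_i = random.gauss(0, sigma)   # drawn once per group, never estimated
    y.append([beta0 + b_i + random.gauss(0, s) for _ in range(n_per)])

# The marginal variance of y decomposes as sigma^2 + s^2 = 5.
total_var = statistics.variance(v for row in y for v in row)

# Observations in the same group share b[i], so they are correlated:
# intraclass correlation = sigma^2 / (sigma^2 + s^2) = 0.8.
# Var(group mean) = sigma^2 + s^2 / n_per, so subtract the second piece.
between = statistics.variance(statistics.mean(row) for row in y) - s**2 / n_per
icc = between / total_var
print(round(total_var, 2), round(icc, 2))  # roughly 5 and 0.8
```

The point of the mixed-model fit is to recover sigma and s from y alone, without ever estimating the individual b[i].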

If you think there is a fixed effect b[i] you get estimates of the
individual b[i] (and effectively do need a design matrix with a column
for each i), but with random effects you just get their overall
sigma. There are ways of estimating the posterior distribution of
individual b[i] ("empirical Bayes estimators," somewhat incongruously
since the model is a frequentist one).

There are a large number of elaborations, e.g., relaxing the
independence assumption on the epsilon, having crossed or nested
random effects, random slopes, non-normal random effects ....

Random slopes example:
A model with a covariate a and random slopes is
y[i,j] = beta0 + (beta1+c[i])*a[i,j] + b[i] + epsilon[i, j]
beta0 and beta1 are "fixed effects" and b and c are "random effects".
They are both mean 0 normal terms, sometimes independent (2 parameters
for sigma to estimate) and sometimes not (3 parameters, including the
correlation). Conventionally greek letters indicate fixed effects and
regular letters are for random effects.

Random slopes without random intercepts are certainly possible, though
somewhat unusual.

Finally, note that the models above can also be estimated as a bayesian
hierarchical model, in which case the individual "random effects" b[i]
are just another coefficient, as the parameter sigma is just another
coefficient. These are "hierarchical" models because one coefficient,
sigma, determines the distribution of others, the b[i]. Bayesians do
not use the fixed/random distinction since all parameters are
characterized by distributions.

It would be good to have a notation that could encompass these
complexities, including the Bayesian models. Some scenarios such as
crossed random effects are impractical to estimate in the frequentist
framework (at least for non-linear models), but quite doable with
Bayesian methods.

In answer to John's earlier question about alternatives to (1|x), I
was mostly trying to make the point that rather than making the
inference (1|x) => change other stuff one could instead conclude that
"other stuff" => change (1|x).

I do not have a good enough command of julia to propose specific
alternatives with much confidence, but one that occurs to me is to say
a+re(x)
means a fixed effect a and a random intercept based on x with levels 1..i.
a+re(x, a)
means a fixed effect a, random intercept and slope for a based on x.
And generally
<fixed terms> + re(x, <random terms>)
where <random terms> has the same rules as <fixed effects> so the
intercept is implicit.
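A minimal sketch of how Ross's proposal might be represented (hypothetical Python; the re name and the implicit-1 rule come from his description, everything else here is invented for illustration):

```python
class RE:
    """A random-effects term, re(x, <terms>), with an implicit intercept."""
    def __init__(self, group, *terms):
        self.group = group
        # same rule as the fixed part of the formula: "1" is implied
        self.terms = ("1",) + terms

re_x = RE("x")        # re(x): random intercept per level of x
re_xa = RE("x", "a")  # re(x, a): random intercept and random slope on a
print(re_x.terms)     # → ('1',)
print(re_xa.terms)    # → ('1', 'a')
```

Suppressing the random intercept would then need the same "no intercept" mechanism being debated for the fixed part, which is one argument for settling that question once for both.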

"re" may not be the best choice, since aside from being the initials
of "random effect" it sounds like the operator to get the real part of
complex number

The typical meaning of "random effect for X" is "a random intercept
for X", so the notation re(x) without a formula matches this usage.

Ross

John Myles White

unread,
Aug 27, 2014, 5:40:40 PM8/27/14
to julia...@googlegroups.com
You're right: the situation is complicated by the fact that the 1|x notation doesn't only regulate the design matrix, but also how the model is fit.

-- John

Sam Lendle

unread,
Aug 29, 2014, 2:47:03 PM8/29/14
to julia...@googlegroups.com
I cast my vote for (something like) Gray's suggestion of

y ~ 1 + x + z ## Intercept, no warning
y ~ 0 + x + z ## No intercept, no warning
y ~ x + z     ## No intercept, glm, etc. may choose to give a warning

because (tl;dr)
  • The implicit intercept is non-intuitive when you think about how the design matrix is generated based on the formula. 
  • An implicit intercept is the default in every other formula/model specification system that I know of. Probably because as others say it's pretty unusual to want a model without an intercept, and when someone says "I regressed y on x" they mean with an intercept.
  • I like that whether or not you want an intercept, it is obvious from the formula or you get a warning.
When it comes to getting non-technical applied researchers to adopt Julia, it will be a big mistake to make the "default" formula exclude the intercept without a warning. I'm not worried about myself and probably most current users of Julia, who know what the implications of a formula on the design matrix are. A major reason these model specification systems exist in R, SAS, Stata, etc. is to allow people who don't know how things work under the hood to fit a model easily. A lot of researchers and students I work with do not ever think in terms of a design matrix. They think "I want to look at the effect of x on y adjusting for a, b, and c", for example. They may not even really know what it means to fit a model with or without an intercept. (Not that that is a good thing, but that's reality.)

Issuing a warning when a 0 or 1 isn't specified isn't so elegant, but it will force users to think about what they actually want and make a decision. It will probably also save me when I inevitably forget to add the 1+, whether copying a formula from R or just out of habit.

I'm not sure if allowing "0+" introduces other issues with the formula. If it does, maybe there's some other way of specifying no intercept? One could argue that we shouldn't have a special meaning for 0 that means "do nothing different but suppress the warning." But then one could also argue that 1 shouldn't have a special meaning, and if you want an intercept you should create a column of ones in your DataFrame, so I don't think it's too unreasonable logically.

Anyway that's my $0.02.

Sam