GSoC 2012 Proposal - enhanced contrib.auth
==========================================
Hello, my name is Stratos Moros and I am a Computer Science student at the
University of Piraeus in Greece.
I would like to participate in Google Summer of Code with Django, working
on an enhanced contrib.auth.
Goals
-----
- Long term solution
The proposed replacement doesn't aim merely to fix some of the current
limitations of contrib.auth (e.g. the size of the email field), but to
provide a long term solution.
- Backwards compatibility
The new contrib.auth will be 100% backwards compatible with the current
implementation. Developers updating their existing Django installation
will have the same features available, without having to update their code
or undergo a database migration.
- Separation of concerns
The concepts of identity, authentication and authorization should not be
tightly coupled by default. Developers will be able to mix and match
schemes if they want, or use a more integrated approach otherwise.
- Extensibility
It's impossible to cover all existing and future authentication /
authorization schemes. Developers who have needs that aren't covered by
Django should be able to extend contrib.auth by writing their own or using
third party user models. Other apps that need to interact with users (e.g.
a commenting app) will be able to interact with them without knowing the
specific implementation.
- Batteries included
Developers will not be required to implement their own authentication or
authorization schemes for the common cases. Contrib.auth will have one or
more built-in user models ready for production.
- Admin compatibility
Contrib.admin is one of the things that make Django great. It should be
easy for developers to make their user models accessible through the admin
interface.
Terminology
-----------
In order to distinguish between Django's users, a website's users and the
proposed concept of a user, the terms "developer", "visitor" and "user"
are used respectively.
Implementation
--------------
The new contrib.auth will serve three primary functions:
- Assist developers in writing custom user models.
- Provide built-in user models.
- Provide a way for developers to interact generically with user models.
The new contrib.auth will allow developers to specify multiple user
models. A new setting will be introduced called `USER_MODELS` which will
be a tuple of strings. Each string's format will be `'%(app).%(model)'`,
similarly to how `AUTH_PROFILE_MODULE` is currently defined. This setting
will look something like:

    USER_MODELS = (
        'auth.EmailUser',
        'myapp.CustomUser',
        # ...
    )

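Before the listed models can be looked up, each string has to be split into an app label and a model name. The following is a minimal sketch of that step in plain Python; `parse_user_models` is a hypothetical helper, not something this proposal or Django defines, and it simply rejects malformed entries early:

```python
# Hypothetical helper (not part of Django or of this proposal): turn each
# 'app.Model' string in USER_MODELS into an (app_label, model_name) pair.
def parse_user_models(user_models):
    parsed = []
    for entry in user_models:
        app_label, sep, model_name = entry.partition('.')
        # Exactly one dot, with a non-empty label on each side.
        if not sep or not app_label or not model_name or '.' in model_name:
            raise ValueError(
                "USER_MODELS entries must look like 'app.Model', got %r" % (entry,))
        parsed.append((app_label, model_name))
    return tuple(parsed)


USER_MODELS = (
    'auth.EmailUser',
    'myapp.CustomUser',
)

print(parse_user_models(USER_MODELS))
# (('auth', 'EmailUser'), ('myapp', 'CustomUser'))
```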
### Separation of concerns and the user contract
A user model will be a regular model that fulfills a predefined contract.
Developers will be able to write their own user models in their
applications and register them through the `USER_MODELS` setting. Any
model that fulfills the user contract and is specified in `USER_MODELS`
will be considered a valid user model. The contract will be subdivided into
three subcontracts, concerned with identity, authentication and
authorization.
#### Identity
The identity contract will answer the "who are you" question for a user
model. This could be anything from a Twitter handle to a biometric
identifier, or anything else that can uniquely identify a user in a user
model. To fulfill the identity contract the user model will have to:
- Implement an `identity` method.
This method must return a string that uniquely represents a user in a
user model.
- Implement a `get_by_identity` method on the model's default manager.
This method must accept a string representing a user, as returned by the
`identity` method, and return either a user object, if it finds one,
or `None` otherwise.
- Implement a `__unicode__` method.
This method must return a string representation of a user that is
suitable for displaying on the frontend. This will not be required to be
unique across all users in a user model.
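To make the identity contract concrete, here is a plain-Python sketch that stands in for a Django model and its default manager. `EmailUser` and `InMemoryUserManager` are hypothetical names and a dict replaces the database; since the sketch is Python 3, `__str__` plays the role of `__unicode__`:

```python
class InMemoryUserManager:
    """Hypothetical stand-in for a model's default manager (a dict, not a table)."""

    def __init__(self):
        self._users = {}

    def add(self, user):
        self._users[user.identity()] = user

    def get_by_identity(self, identity):
        # Contract: return the matching user, or None if there is none.
        return self._users.get(identity)


class EmailUser:
    """Hypothetical user model identified by its email address."""

    objects = InMemoryUserManager()

    def __init__(self, email, display_name):
        self.email = email
        self.display_name = display_name

    def identity(self):
        # Unique within this user model.
        return self.email

    def __str__(self):
        # Frontend display string; not required to be unique.
        return self.display_name


alice = EmailUser('alice@example.com', 'Alice')
EmailUser.objects.add(alice)
```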
#### Authentication
The authentication contract will be concerned with verifying that the user
is indeed who he claims to be. Again, this could be anything from a
password to a hardware dongle. To fulfill the authentication contract the
user model will have to:
- Implement an `is_authenticated` method.
This method must accept a request. Contrary to the current
implementation, the `is_authenticated` method will not always return
`True`. It is up to the implementation to decide whether the user is
actually authenticated.
- Implement an `is_authenticated_by` method.
This method must accept any arguments necessary to authenticate the user
and return `True` on success, or `False` otherwise. It must not store the
user in the session or modify the user object's internal or external state
to indicate that he is authenticated.
- Implement a `login` method.
This method must accept a request and any arguments necessary to
authenticate the user. It will return `True` on success, or `False`
otherwise. This method must store the user in the session or modify the
user object's internal or external state to indicate that he is
authenticated.
- Implement a `logout` method.
This method must accept a request. This method must remove the user from
the session or modify the user object's internal or external state to
indicate that he is no longer authenticated.
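The four authentication methods can be summarized as an abstract base class, followed by a toy model that keeps its authenticated state on the request object instead of in the session. Everything here (`AuthenticationContract`, `PinUser`, the `authenticated_user` request attribute) is a hypothetical illustration, not part of Django:

```python
import hmac
from abc import ABC, abstractmethod
from types import SimpleNamespace


class AuthenticationContract(ABC):
    """Hypothetical ABC naming the four methods the contract requires."""

    @abstractmethod
    def is_authenticated(self, request):
        """Return True only if this user is authenticated for this request."""

    @abstractmethod
    def is_authenticated_by(self, *args, **kwargs):
        """Check credentials without changing any state."""

    @abstractmethod
    def login(self, request, *args, **kwargs):
        """Check credentials and, on success, record the authenticated state."""

    @abstractmethod
    def logout(self, request):
        """Undo whatever login() recorded."""


class PinUser(AuthenticationContract):
    """Toy model: the authenticated state lives on the request, not in a session."""

    def __init__(self, pin):
        self._pin = pin

    def is_authenticated(self, request):
        return getattr(request, 'authenticated_user', None) is self

    def is_authenticated_by(self, pin):
        # Constant-time comparison; no state is modified here.
        return hmac.compare_digest(pin.encode('utf-8'), self._pin.encode('utf-8'))

    def login(self, request, pin):
        if not self.is_authenticated_by(pin):
            return False
        request.authenticated_user = self
        return True

    def logout(self, request):
        if self.is_authenticated(request):
            request.authenticated_user = None


request = SimpleNamespace()
user = PinUser('1234')
```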
#### Authorization
The authorization contract will be responsible for keeping track of what a
user model is allowed to do. To fulfill the authorization contract a user
model will have to:
- Implement a `has_perm` method.
This method must accept a string representing a permission and an
optional object. It will return a boolean value indicating whether the
user has that permission. If an object was passed as well, it must
indicate whether the user has the permission for the specified object. The
string format can be anything and will be implementation specific.
- Implement a `has_perms` method.
This method must accept a list of permissions and an optional object. It
will behave similarly to the `has_perm` method, but it will return `True`
only if the user has all permissions.
- Implement an `add_perm` method.
This method must accept a permission and an optional object and it will
add the permission to the user. If an object was passed as well, it will
add the permission only for that specific object.
- Implement a `remove_perm` method.
This method must accept a permission and an optional object and it will
remove the permission from the user. If an object was passed as well, it
will remove the permission only for that specific object.
Of course, the user model could have more methods or fields that are
relevant to the specific implementation. For example an authorization
mixin that categorizes permissions per app could also have a
`has_module_perms` method.
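A minimal in-memory sketch of the authorization contract, assuming permissions are stored as `(permission, object key)` pairs and objects are identified by class name and `pk`. `InMemoryAuthorization` and the `Article` class are hypothetical; a database-backed mixin would query a permissions table instead:

```python
class InMemoryAuthorization:
    """Hypothetical, in-memory version of a permission-storing mixin."""

    def __init__(self):
        # Set of (permission, object key) pairs; key None means "no object".
        self._perms = set()

    @staticmethod
    def _key(obj):
        # Assumption: objects are identified by class name and primary key.
        return None if obj is None else (type(obj).__name__, obj.pk)

    def has_perm(self, perm, obj=None):
        return (perm, self._key(obj)) in self._perms

    def has_perms(self, perms, obj=None):
        # True only if every permission is held.
        return all(self.has_perm(perm, obj) for perm in perms)

    def add_perm(self, perm, obj=None):
        self._perms.add((perm, self._key(obj)))

    def remove_perm(self, perm, obj=None):
        # discard(): removing a permission the user lacks is not an error.
        self._perms.discard((perm, self._key(obj)))


class Article:
    def __init__(self, pk):
        self.pk = pk


user = InMemoryAuthorization()
user.add_perm('blog.change_article')
user.add_perm('blog.delete_article', Article(1))
```

In this sketch a global permission does not imply the same permission on every object; whether it should is an implementation decision the contract does not settle.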
### Anonymous users
Since the `is_authenticated` method will not always return `True`, there
is no longer a need for a separate anonymous user model. Any user model
can be used to represent a user that is not logged in. By default,
contrib.auth will use the first specified model in the `USER_MODELS`
setting. If a developer wants to change the user model that represents a
user that is not logged in, there will be an appropriate function to do so.
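The fallback rule can be sketched in a few lines, assuming the override is kept in the session under a hypothetical `_anonymous_user_model` key and models are referred to by their `'app.Model'` labels (session data has to be serializable, so a label is easier to store than a class). `anonymous_user_model` is a hypothetical name:

```python
from types import SimpleNamespace

USER_MODELS = ('auth.EmailUser', 'myapp.CustomUser')
ANONYMOUS_SESSION_KEY = '_anonymous_user_model'  # hypothetical session key


def set_anonymous_user(request, model_label):
    # Record, for this session only, which model represents the visitor.
    request.session[ANONYMOUS_SESSION_KEY] = model_label


def anonymous_user_model(request):
    # The per-session override wins; otherwise use the first USER_MODELS entry.
    return request.session.get(ANONYMOUS_SESSION_KEY, USER_MODELS[0])


request = SimpleNamespace(session={})
```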
### Writing new User Models
To assist developers in writing new user models, contrib.auth will include
mixins that developers can subclass to get the desired functionality.
These mixins will be implemented as abstract models and will live in
`contrib.auth.mixins`. Specifically, it will contain the following mixins:
- Identity mixins
Contrib.auth will contain a `UsernameIdentityMixin` and an
`EmailIdentityMixin` that will identify a user by username and email
address respectively.
- Authentication mixins
Contrib.auth will contain two authentication mixins.
- `SessionAuthenticationMixin`
`SessionAuthenticationMixin` will implement the `is_authenticated`,
`login` and `logout` methods. These methods will check, store and remove
the user from the session respectively. If a developer wants to simply
store the user in the session to indicate he is logged in, he can inherit
from this mixin and implement only the `is_authenticated_by` method.
- `PasswordAuthenticationMixin`
`PasswordAuthenticationMixin` will be a subclass of
`SessionAuthenticationMixin` and will implement the `is_authenticated_by`
method to authenticate a user by checking a password against a hash
stored in the database.
- Authorization mixins
Contrib.auth will contain a `StandardAuthorizationMixin` that will store
each user's permissions as `'%(app).%(perm)'` strings in a database,
similarly to the current contrib.auth implementation.
- Complete mixins
Contrib.auth will contain two mixins implementing the whole user model
contract, `StandardUserMixin` and `EmailUserMixin`. They will both be
subclasses of the above mixins, the first one using a username for
identification and the second one an email address.
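The split between the two authentication mixins could look roughly like this in plain Python. It is a sketch under several assumptions: `request.session` is a plain dict, the password hash and salt are instance attributes rather than model fields, PBKDF2-SHA256 stands in for whatever hasher a real implementation would use, and `SESSION_KEY`, `set_password` and `UsernameUser` are hypothetical names:

```python
import hashlib
import hmac
import os
from types import SimpleNamespace

SESSION_KEY = '_auth_user'  # hypothetical session key


class SessionAuthenticationMixin:
    """is_authenticated/login/logout backed by request.session.

    Subclasses supply identity() and is_authenticated_by().
    """

    def _session_value(self):
        # Identities are unique only within one user model, so record the
        # model as well (its class name here).
        return (type(self).__name__, self.identity())

    def is_authenticated(self, request):
        return request.session.get(SESSION_KEY) == self._session_value()

    def login(self, request, *args, **kwargs):
        if not self.is_authenticated_by(*args, **kwargs):
            return False
        request.session[SESSION_KEY] = self._session_value()
        return True

    def logout(self, request):
        if self.is_authenticated(request):
            del request.session[SESSION_KEY]


class PasswordAuthenticationMixin(SessionAuthenticationMixin):
    """is_authenticated_by() checks a password against a salted PBKDF2 hash."""

    iterations = 100000

    def set_password(self, raw_password):
        self.salt = os.urandom(16)
        self.password_hash = self._hash(raw_password, self.salt)

    def _hash(self, raw_password, salt):
        return hashlib.pbkdf2_hmac(
            'sha256', raw_password.encode('utf-8'), salt, self.iterations)

    def is_authenticated_by(self, password):
        # Constant-time comparison of the derived keys.
        return hmac.compare_digest(self._hash(password, self.salt), self.password_hash)


class UsernameUser(PasswordAuthenticationMixin):
    def __init__(self, username):
        self.username = username

    def identity(self):
        return self.username


request = SimpleNamespace(session={})
alice = UsernameUser('alice')
alice.set_password('s3cret')
```

A subclass of `SessionAuthenticationMixin` only has to provide `identity` and `is_authenticated_by`; the session bookkeeping is inherited unchanged.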
Using the above mixins, a developer will be able to use some of the
built-in mixins and implement the rest of the user contract himself. For
example, a developer that wants to use the built-in authorization with his
own identification and authentication scheme could subclass the
`StandardAuthorizationMixin` and add the methods and fields his
implementation requires.
This would also make writing a user model that uses a custom profile
trivial. As an example:

    from django.db import models
    from django.contrib.auth import mixins  # the proposed mixins module

    class CustomUser(mixins.EmailUserMixin):
        first_name = models.CharField(max_length=30)
        # ...

This will also allow the Django community to fill the needs of developers.
For example, a third party app that provides functionality related to
Facebook authentication could provide related mixins, user models and
forms, in the same way that the current contrib.auth provides them for the
`User` model.
### Built in User Models
Contrib.auth will have at least two built-in user models.
- `LegacyUser`
This will be the user model that will be used if the `USER_MODELS`
setting isn't set. It will have the same fields and restrictions as the
current user model. This ensures that developers updating their Django
installations will get a user model that behaves exactly as it used to,
without having to modify their code or database.
- `CompatibilityUser`
This user model will be similar to LegacyUser but it will solve most of
the common complaints with the current user model. Specifically, it will
have the following differences:
- The username field will be made optional, its size will be raised to
254 characters and it will be able to contain any character.
- The email field will be made unique, it will have a database index and
its size will be raised to 254 characters.
It will be relatively easy for developers that want to go through a
database migration to upgrade their project's auth model to
CompatibilityUser. The auth_user table's schema will have to be altered,
but not the data. Additionally, migration scripts for all the supported
backends will be provided in the documentation.
For compatibility reasons, the first two user models will both be called
`User` and will use the `auth_user` table. Developers will indicate that
they want to use one of them by adding either `'auth.LegacyUser'` or
`'auth.CompatibilityUser'` to the `USER_MODELS` setting. Specifying both
will raise a configuration error.
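The configuration rule, together with the fallback to `LegacyUser` when the setting is absent, might be checked at startup along these lines. `effective_user_models` is hypothetical, and the exception class is a local stand-in for Django's `django.core.exceptions.ImproperlyConfigured` so the snippet runs on its own:

```python
class ImproperlyConfigured(Exception):
    """Local stand-in for django.core.exceptions.ImproperlyConfigured."""


# Both compatibility models are called `User` and share the auth_user table.
SHARED_TABLE_MODELS = frozenset(['auth.LegacyUser', 'auth.CompatibilityUser'])


def effective_user_models(user_models=None):
    # Hypothetical startup check.
    if not user_models:
        # Unset (or empty) setting: keep the current behaviour.
        return ('auth.LegacyUser',)
    clashing = SHARED_TABLE_MODELS.intersection(user_models)
    if len(clashing) > 1:
        raise ImproperlyConfigured(
            'USER_MODELS may contain only one of: %s' % ', '.join(sorted(clashing)))
    return tuple(user_models)
```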
In addition to the above models, contrib.auth will include additional user
models. For example, it could include a `StandardUser` and an `EmailUser`
that use the above mentioned mixins.
### Interacting with users
Developers will be interacting with user models the same way they interact
with all other models. If, for example, a visitor submits a 'login with
Facebook' form, the developer will import the relevant model and interact
with it. Similarly, if the developer wants to access some information
about the currently logged in user he will simply grab the user instance
from the request.
That said, there needs to be a way to have a generic key to a user model,
without knowing which specific model is used. This will be provided by a
very thin abstraction on top of Generic Foreign Keys. An abstract model
will be introduced that will serve two functions.
- It will make sure that the model provided as a user is indeed a user
model (specified in `USER_MODELS`)
- It will take care of the boilerplate code that is needed to use a
generic foreign key.
An example implementation could be something like:
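
    from django.contrib import auth  # assumes the proposed auth.is_user_model helper
    # Moved to django.contrib.contenttypes.fields in Django 1.7.
    from django.contrib.contenttypes.generic import GenericForeignKey
    from django.contrib.contenttypes.models import ContentType
    from django.db import models

    class GenericUserRelation(models.Model):
        # Boilerplate for a generic foreign key to any registered user model.
        _content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
        _object_id = models.PositiveIntegerField()
        _user = GenericForeignKey('_content_type', '_object_id')

        def _get_user(self):
            return self._user

        def _set_user(self, user):
            # Reject instances of models that are not listed in USER_MODELS.
            if not auth.is_user_model(user):
                raise ValueError('%r is not a registered user model' % (user,))
            self._user = user

        user = property(_get_user, _set_user)

        class Meta:
            abstract = True
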
Using the above mixin, a model that needs a foreign key to a user would be
written like so:
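
    class Comment(GenericUserRelation):
        title = models.CharField(max_length=50)
        # ...

    comment = Comment(title="Hey!", user=some_user)
    # Only the user contract is guaranteed, so use identity() rather than
    # a model-specific field such as email.
    print(comment.user.identity())
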
### Helper functions
Contrib.auth will also have a few helper functions.
- `is_user_model`
This function will accept a model instance and return True if it's an
instance of a model listed in `USER_MODELS`, or False otherwise.
- `get_by_identity`
This function will accept a string representing a user's identity. It will
traverse the models listed in the `USER_MODELS` setting, calling the
`get_by_identity` method on each model's manager, and return the first
user it finds, or `None` if no model has a matching user.
- `is_authenticated_by`
This function will accept an identity, `*args` and `**kwargs`. It will use
`get_by_identity` to find a user object and then call its
`is_authenticated_by` method with `*args` and `**kwargs` as arguments. It
will return `True` if it finds a user that can be successfully
authenticated, or `False` otherwise.
- `get_and_authenticate`
This function will accept an identity, a request, `*args` and `**kwargs`.
It will use `get_by_identity` to find a user object and then log the user
in by calling its `login` method with the request, `*args` and `**kwargs`
as arguments. It will return the user if it finds one that can be
successfully authenticated, or `None` otherwise.
- `set_anonymous_user`
This function will accept a request and a model and set the model as the
user model that represents an anonymous user for this session.
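The three lookup helpers can be sketched over a hypothetical in-memory registry. `StaffUser`, `Visitor`, `TokenUser`, `DictManager`, `REGISTERED_USER_MODELS` and the token-based credentials are illustrative assumptions, and `get_and_authenticate` calls the contract's `login` method to log the user in:

```python
from types import SimpleNamespace


class DictManager:
    """Hypothetical stand-in for a user model's default manager."""

    def __init__(self, users):
        self._users = {user.identity(): user for user in users}

    def get_by_identity(self, identity):
        return self._users.get(identity)


class TokenUser:
    """Hypothetical base: identified by name, authenticated by a token."""

    def __init__(self, name, token):
        self.name = name
        self.token = token

    def identity(self):
        return self.name

    def is_authenticated_by(self, token):
        # Plain == keeps the toy short; real code should compare in constant time.
        return token == self.token

    def login(self, request, token):
        if not self.is_authenticated_by(token):
            return False
        request.session['user'] = (type(self).__name__, self.identity())
        return True


class StaffUser(TokenUser):
    pass


class Visitor(TokenUser):
    pass


StaffUser.objects = DictManager([StaffUser('root', 'staff-token')])
Visitor.objects = DictManager([Visitor('alice', 'visitor-token')])

# Stands in for the model classes resolved from USER_MODELS, in order.
REGISTERED_USER_MODELS = [StaffUser, Visitor]


def get_by_identity(identity):
    # Ask each registered model in turn; the first match wins.
    for model in REGISTERED_USER_MODELS:
        user = model.objects.get_by_identity(identity)
        if user is not None:
            return user
    return None


def is_authenticated_by(identity, *args, **kwargs):
    user = get_by_identity(identity)
    return user is not None and user.is_authenticated_by(*args, **kwargs)


def get_and_authenticate(identity, request, *args, **kwargs):
    user = get_by_identity(identity)
    if user is not None and user.login(request, *args, **kwargs):
        return user
    return None


request = SimpleNamespace(session={})
user = get_and_authenticate('alice', request, 'visitor-token')
```

Because identities are only unique within a user model, the order of `USER_MODELS` decides which model wins when two models contain the same identity.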
### Deprecated features
The following features of the current contrib.auth will be deprecated
according to Django's deprecation policy:
- The `LegacyUser` and `CompatibilityUser` models.
These two models are provided only for backwards compatibility.
Developers will be expected to migrate their users to a new user model
before they are removed from Django.
- Any methods and fields of the `User` model not mentioned above.
While all of them will be provided on the `LegacyUser` and
`CompatibilityUser` models and many of them may be provided on new models,
they will not be part of the user contract. Developers should not expect
them to be available on any user model.
- The `AnonymousUser` model
Since any user model will be able to be used as an anonymous user, the
existing model will be deprecated.
- The `AUTH_PROFILE_MODULE` setting.
Since developers will be able to add any fields to their user model, this
setting will be deprecated. It will continue to be used by the
`LegacyUser` and `CompatibilityUser` models.
- The `AUTHENTICATION_BACKENDS` setting and authentication backends in
general.
For the same reasons as with user profiles, authentication backends will
be deprecated. Developers will be able to implement a custom
authentication / authorization scheme directly on the user model.
Contrib.admin compatibility
---------------------------
There are two ways in which contrib.auth must achieve compatibility with
contrib.admin:
- Ability to modify users
Since user models are regular models, contrib.admin will be able to
create, modify and delete users normally. If a user model has special
needs (e.g. a password reset form), these can be handled through admin.py,
as the current contrib.auth does.
Additionally, contrib.admin could be made aware of the `USER_MODELS`
setting and have a special view that lets a visitor manage users, but I
don't think this is really necessary.
- Authentication / Authorization
There should be a way to use contrib.admin with custom user models.
This means that users should be able to log in with admin's login form
and admin's permission checks should work with the user model's
permissions.
Authenticating with any user model should require only minor
modifications to admin's code, as long as that model can be authenticated
with an identity/key pair. This means that both `EmailUser` and
`StandardUser` could be used out of the box to log in to the admin.
Making an authorization scheme admin compatible should be the
responsibility of the authorization scheme developer. As long as the
authorization scheme represents permissions in a way contrib.admin
understands (i.e. `'%(app).%(action)_%(model)'`), admin will be able to
use that custom user model.
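As an illustration, a user model can keep permissions in any internal structure and still work with admin, provided `has_perm` understands admin-style strings. `admin_perm` and `NestedPermsUser` are hypothetical, and the parser assumes the standard `<action>_<model>` codename shape with lowercased model names:

```python
def admin_perm(app_label, action, model_name):
    # Admin-style permission string: '<app_label>.<action>_<lowercased model>'.
    return '%s.%s_%s' % (app_label, action, model_name.lower())


class NestedPermsUser:
    """Hypothetical user model storing permissions as {app: {model: {actions}}}."""

    def __init__(self, perms):
        self._perms = perms

    def has_perm(self, perm, obj=None):
        # Assumes the standard '<action>_<model>' codename shape; object-level
        # permissions (obj) are not handled in this sketch.
        app_label, _, codename = perm.partition('.')
        action, _, model_name = codename.partition('_')
        return action in self._perms.get(app_label, {}).get(model_name, set())


editor = NestedPermsUser({'blog': {'article': {'add', 'change'}}})
```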
Rationale
---------
Since contrib.auth has been the subject of intense debate on
django-developers over the past few weeks, I would like to explain the
reasoning behind some of the decisions I made.
### Multiple user models
There are actually a few reasons why having multiple user models at the
same time makes sense.
- It allows developers to easily deal with different types of users.
For example, imagine a scenario where we want a website's visitors to
log in with an email/password pair and we want to keep extensive profile
information about them. On the other hand, we want the staff to log in to
the admin with a username and password stored on an LDAP server. While
this is possible to achieve with a single user model, it requires some
hacks. The developer would have to store emails in a username field or
make both nullable and store only one of them for each type of user,
moving his validation from the model layer to the view layer.
Using a different model for each type of user makes the above scenario
easy to deal with.
- It makes it easy to respond to requirement changes
Continuing the above example, let's say that we want to run a promotion
and certain users will be able to log in using just a promo code. Even if
we've made the user model work with the two above cases, we would need to
further modify it by adding even more nullable fields, flags, special case
their permissions etc. With multiple user models we can just add an extra
user model.
### Permission and identity formats
Permissions are defined to be strings. A case could be made for either a
more strict or a looser definition. Requiring permissions to keep their
current format (`'%(app).%(perm)'`) would make permissions more
standardized across different user models, but some permissions don't fit
the app concept well. These could be either site-wide permissions or even
permissions from external sources (e.g. OAuth permissions regarding
content on another site).
On the other hand it could be helpful for permissions to be more complex
objects. The current permission format is already two different pieces of
data separated by a period and could perhaps be better represented by a
tuple. That said, allowing a permission to be any object would be a huge
pain for user model developers. They would have to accept any argument as
a permission, manually check its type and fail gracefully if they received
a different data type than they expected.
A simple string was chosen as a middle ground. This way user models can
represent permissions as they wish internally, as long as they can provide
a public interface that accepts a string.
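The string interface also accommodates permissions that have nothing to do with apps. A sketch, assuming a hypothetical `OAuthScopedUser` model and an `'oauth:'` prefix for scopes granted by another site:

```python
class OAuthScopedUser:
    """Hypothetical user model whose permissions are OAuth scopes granted by
    another site, exposed through the same string-based interface."""

    PREFIX = 'oauth:'  # hypothetical naming convention for these strings

    def __init__(self, scopes):
        self._scopes = frozenset(scopes)

    def has_perm(self, perm, obj=None):
        # Anything outside the hypothetical 'oauth:' namespace is denied.
        if not perm.startswith(self.PREFIX):
            return False
        return perm[len(self.PREFIX):] in self._scopes

    def has_perms(self, perms, obj=None):
        return all(self.has_perm(perm, obj) for perm in perms)


user = OAuthScopedUser(['photos.read'])
```

Internally the model keeps a set of scopes; only the public interface has to speak in strings.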
The representation of a user's identity faces the same issue. It could be
better represented by a more complex type, or even reuse the concept of
natural keys. As with permissions, this would make it difficult to
interact with a user model if you don't know which specific model was
used, so a string was used as well.
Conclusion
----------
The proposal does not yet include an estimate, since I'm waiting for your
feedback to adjust it accordingly.
Thanks for reading.
Hi Stratos
It's a long proposal, so this is a brain dump of bits that I find
interesting/worrisome.
I'm sure you've followed the recent threads on the topic. The (wildly)
different solutions garnered from those long threads are all listed on
this wiki page:
https://code.djangoproject.com/wiki/ContribAuthImprovements
I don't think this proposal ties in with any of them? Your proposal
involves multiple user models, whilst none of them do.
Login:
Where have auth backends gone in your plan? Why do user objects have a
login method? Login should be distinct from user objects; otherwise
login is coupled to a user object, and you cannot log in to the same
user using different authentication techniques.
It is common these days to provide multiple ways of authenticating to
your users, if I authenticate by smartcard, user/password, Facebook
auth, SAML federation, or a "remember me" signed auth cookie, I should
still get the same user object. The choice of authentication is
irrelevant.
More to the point, it should be *my* choice. Forcing authentication
into the user model removes that choice, or requires us to have N user
models per user, one per auth method.
Authentication mixins:
This goes to the above point; if I have to mixin an authentication
class to my user object (adding the required login() + others
methods), it means I can only have one authentication mechanism for a
particular user model.
Deprecating user profiles:
The purpose of user profiles is to provide a place for a pluggable app
to store its own information about a user. Adding a pluggable app
should not mean having to add fields to your user model, but with no
user profiles, that's what is suggested.
Once you have a significant number of pluggable apps, you could have a
User model with a crazy number of fields - I can envisage scenarios
where you have over 100 fields. This makes doing anything with the
user model slower. Furthermore, storage for each field is now required
for every user, whilst with a pluggable profile, it would only exist
if the user utilizes that app.
Cheers
Tom
> On Fri, Mar 30, 2012 at 10:39 AM, Stratos Moros <stm...@gmail.com>
> wrote:
>> You can read the proposal nicely formatted here:
>> https://gist.github.com/8dd9fb27127b44d4e789
Hi Tom.
I'm answering your 3 questions together because they are related to each
other.
In my proposal, a user model represents an authentication/authorization
scheme. This is why you have to mixin the authentication scheme directly
in the model. All other information, such as first_name, last_name etc.
should live in the user's profile.
The idea isn't to remove custom profiles, but to let developers control
where they're used. The example I provided about adding custom profiles
was rather bad. The way I expect most developers to add fields to a user
is:

    class CustomUser(SomeMixins):
        profile = models.ForeignKey(UserProfile)

This way you can allow your frontend users to log in with, say, either
username & password or OpenID and have a single profile between them,
while allowing your backend users to log in with credentials from an LDAP
server and have a different profile for them.
This is why authentication backends are deprecated. If you can't specify
which authentication mechanism is used for each user model, you can't
easily accomplish the above scenario. You would have to have a single
table to represent the two different profiles and you would somehow have
to make sure that a user trying to log in with Facebook doesn't go through
the LDAP server.
This is also why the identity in the user model is a method and not a
field. If you want a single user to be authenticated in multiple ways, you
can override the identity method and return the same value.
Perhaps the profile field should be made part of the user contract, but
this would be an annoyance in cases where you don't want to store any
additional information about the users, but use them only for
authorization purposes (i.e. I don't care about his info, I just want to
know if he can access this page).
Having login and logout be part of the user model was one of the things
that I wasn't sure about. The reason I put them in the user model is to
decouple the whole process from the session. This way an authentication
scheme that doesn't want to use the session to store a logged in user can
avoid it. The common case would obviously be to store the user in the session, so
I included the SessionAuthenticationMixin.
The same goes for authorization. Inheriting from a mixin implementing the
authorization contract would probably add a foreign key to the table
storing the permissions plus a few methods to query it. This way you can
handle different user models' permissions in completely different ways.
Thanks for the feedback. I hope I've answered your questions.
Hi Stratos,
The core team is going to take the lead on the auth.User refactoring
-- specifically, yours truly. :-)
Given that the Summer of Code policy prohibits code contributions from
non-students (right?), I don't think the User refactoring would work
as a Summer of Code project. Sorry to break this news, as you've
clearly done a lot of preparation and thinking about the issue, but of
course we'd love to have you contribute to this particular feature
*outside* the Summer of Code.
Adrian
Ok. I'd love to contribute either way.
Thanks for the info.