[GSoC 2013] Revamping validation functionality proposal

293 views
Skip to first unread message

Christopher Medrela

unread,
Apr 16, 2013, 1:09:37 PM4/16/13
to django-d...@googlegroups.com
I would like to participate in Google Summer of Code this year. I've written proposal for "revamping validation functionality" idea. It's available as a gist: https://gist.github.com/chrismedrela/82cbda8d2a78a280a129 . Below, I'm pasting the proposal. Feel free to criticize it.

Table of content

1. Abstract 1.1 Drawbacks of the existing framework 1.2 Goals 1.3 Benefits
2. The new framework 2.1 Overview 2.2 Advantages
3. New features 3.1 django-secure 3.2 django-update
4. Schedule and milestones
5. Who am I?

1. Abstract

1.1 Drawbacks of the existing framework

Django currently has a validation framework, but there are a lot of problems with it. First of all, it is monolithic and developers cannot write custom validation (see [#16905]) or modify the existing functionality (see [#12674]); validation lives in a few functions like django.core.management.validation.get_validation_errors [1] or django.contrib.admin.validation.validate [2]. The validation functionality is not separated from other stuff like printing found errors during validating models in get_validation_errors or registering models in admin app [3] (see [#8579]); it is sometimes done during first call to an important method, i. e. CurrentSiteManager [4] is validated in its get_queryset [5] method.

[#12674] https://code.djangoproject.com/ticket/12674
[#16905] https://code.djangoproject.com/ticket/16905
[1] https://github.com/django/django/blob/master/django/core/management/validation.py#L22
[2] https://github.com/django/django/blob/master/django/contrib/admin/validation.py#L14
[3] https://github.com/django/django/blob/master/django/contrib/admin/sites.py#L52

There are few tests of the validation framework and it is not easily testable because validation functions return concatenated error messages instead of list of errors (see django.tests.invalid_models.invalid_models.models [6]). It also lacks some features like warnings (see [#19126]). Due to this disadvantages lots of apps do not have any validation, i. e. they do not check inter-app dependencies.

[6] https://github.com/django/django/blob/master/tests/invalid_models/invalid_models/models.py#L366

1.2 Goals

This proposal is about revamping current validation framework. First of all, we need to write more tests and rewrite existing ones. Then we need an consistent API of validating different kinds of objects like models, fields, managers or whole apps so it will be easy to add new kind of object. Validation functionality should be separated from other stuff and it should minimize dependencies. We should allow developers to add validation to their apps and any other kind of objects, so custom validation is a must. We will not break backward compatibility.

This proposal is not only about refactoring but also introducing new features. Introducing warnings is first feature that allows us to add security warnings based on django-secure [7] app. The second main feature is making switching to newer version of Django easier by developing django-update app which should inspect settings as well as models and predict and warn about some problems.

[7] https://github.com/carljm/django-secure

1.3 Benefits

There are a lot of benefits. Cleaning code, removing unwanted dependencies and adding more tests are the most obvious ones. We will also benefit from long term solution which will be easy to maintain since it is extendable. Switching Django version will be piece of cake. We will improve security of Django projects thanks to security warnings. This also implies that Django will be considered as a safe and secure framework. Better opinion is always desired.

2. The new framework

2.1 Overview

The API is based on Honza Kral idea from his patch [8]. An developer can add new validation functionality by writing a callable piece of code. It will be automatically called during validating whole project (triggered by python manage.py validate) and it must fulfill the following contract: it has no arguments and returns a list of warnings and errors or yields each of them.

[8] https://github.com/HonzaKral/django/compare/master...ticket-12674

The validated object may be a model, a manager, a field or an app. In case of a field or a manager, the callable piece of code is a method. In case of a model, it is a classmethod. In case of an app, we will put functions in validation.py file.

Let's see an example::

class FieldFile(File):
    (lots of stuff)

    def validate_upload_to_attribute(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute",
                explanation="Django need to know the directory where uploaded files should be stored.",
                solution="Add \"upload_to = 'path/to/directory/on/server'\" to model %(model)s in file %(model_file)s.")

Notice that validation stuff is inherited by child classes. In an uncommon case when some validation stuff should be turned off, you can overwrite the method with "an empty method" to prevent it from executing. "Private" methods starting with _validate are not executed.

validate_upload_to_attribute method and all other validation methods are called from validate method which fulfill the same contract as validate_* methods and calls all validate_* methods by default. By default, validate method is inherited so you do not have to write it. In case of an app, you will have to write it if you are switching from earlier Django version. If you create a new app in Django 1.6, there will be validation.py file created by default and it will contain validate as well as validate_models functions.

Validation chain.

When a developer types python manage.py validate (or any other command which triggers validation) then all apps are loaded and then for each app its validate method from validation.py file is called (for backward compatibility, if the file is missing, then it is fine -- a default validation module, the same as default stub of validation.py, will be used). It calls all validate_* functions. One of them named validate_models calls validate classmethod for each model of this app. Then models validate its fields and managers. That is the "validation chain".

Errors and warnings.

The new framework introduces two new classes called Warning and Error (ValidationError would collide with django.forms.ValidationError). They are really similar, they differ only in their meaning. Their fields are: obj, app, msg, explanation and solution. app is the app where the error or warning was created; the attribute is not set in the example because it is set by default validate function of an app. obj is the invalid object (it may be the same as app). Errors connected with particular model (like in the example -- the invalid object is a field, but the field will be attached to a model) will have additional model attribute thanks to default validate classmethod of models.

I think that Django error messages are often confusing and they often contain neither solution proposals nor explanation of a problem. I think that we should force Django contributors to think about it and separating error message (msg), explanation and possible solution list is a way to do it. We need to make using Django as simple as possible and suggesting solutions and describing a problem in details is a way to do it. It will be really important when contributors and developers start to write more complex validation like checking inter-app dependencies.

Error messages (as well as solution and explanation fields) can be formatted. %(model)s will be replaced with model attribute of the error. %(model_file)s will be path to the file where the model exists.

2.2 Advantages

First of all, the solution is as simple as possible. There is only one concept which developers have to learn -- the contract of validate[_*] methods. You do not have to know about validate method or validation chain to write your first validation piece of code which makes it easier for newbies to play with Django (excluding validating an app if you are switching from earlier Django version, but in that case you are probably an experienced user).

As you can see, the solution is consistent between different kinds of objects and it does not assume that only a fixed set of object types can be validated. I believe that good long term solution should be extendable and the new framework allows us to easily add validation of new type of object -- just modify the validation chain.

One may argue that a developer have to remember when he has to use methods, when classmethods and when functions. He may propose using validator class (like in Honza Kral's patch) -- a validated object would have to point to its validator instance. That would be a progress if it would not cause a lot of other problems. For example, if you have field A inheriting from field B then you have to remember to inherit that A validator should inherit from B validator. It also implies that you have to write a new class even though you want to validate only one small thing.

My proposal solves also a lot of existing problems. This solution plays well with almost all validation stuff in Django, i. e. existing validation of ModelAdmin [9] can be done in admin app (all apps are loaded before validation so all models will be registered before validation starts). What's more, it is also good solution for a lot of other use cases. The new framework allows us to write new kind of apps -- apps containing only validation stuff. This is how switching to newer Django version will be made easier -- there will be new django-update app which will inspect settings file and predict some problems. In the same way new django-secure app will increase security of Django projects.

[9] https://github.com/django/django/blob/master/django/contrib/admin/validation.py#L14

3. New features

That was first part -- mainly refactoring. The second part is about introducing new features like increasing security thanks to django-secure app and making switching Django versions easier thanks to new django-update app.

3.1 django-secure

This feature will be based on existing django-secure [10] project. The project does security checks (like checking if CsrfViewMiddleware is in MIDDLEWARE_CLASSES setting). Existing django-secure uses, among other things, a middleware to check safety of every request (like checking if x-frame is set to DENY). Our validation framework cannot check things like that, so the new django-secure app will be only part of the existing one -- I will base on only check/csrf.py [11], check/djangosecure.py [12], check/sessions.py [13] and tests.py [14] files.

[10] https://github.com/carljm/django-secure
[11] https://github.com/carljm/django-secure/blob/master/djangosecure/check/csrf.py
[12] https://github.com/carljm/django-secure/blob/master/djangosecure/check/djangosecure.py
[13] https://github.com/carljm/django-secure/blob/master/djangosecure/check/sessions.py
[14] https://github.com/carljm/django-secure/blob/master/djangosecure/tests.py

The new django-secure will use only warnings (no errors). There will be no checksecure command (see [#17101]) -- the security checks will be part of validate command.

[#17101] https://code.djangoproject.com/ticket/17101

3.2 django-update

django-update will be an entirely new app. It also will use only warnings. One of them may be warning that sites framework doesn't exist any longer if somebody will try to use it. Another one may check all models if their fields are called hour, minute or second which is not allowed now (because it clashes with new ORM query lookups [15]). If an model has BooleanField without default value, we can warn that the default value was changed [16]. I will prepare list of all predictable problems before starting coding.

django-update should be heavily tested because it will be used only once, just after releasing a new Django version.

4. Schedule and milestones

Before starting coding I would like to do some preparation:
  • Writing and discussing full API.
  • List of new tests for validation of fields (mostly checking clashes with ORM querylookups).
  • List of problems which may be automatically predicted during upgrading Django; the list will be based on the changelog [17].
  • List of security issues that can be predicted; the list will be based on the existing django-secure project.

As you probably noticed, this proposal contains two parts: "required" one which is about creating new validation framework and is mostly refactoring; and the second "optional" one which is set of new features: django-secure and django-update. That affected the schedule.

4.1 Milestone 1 (first half) -- new validation framework

For the first two or three weeks, I will have exams at university so I cannot work 8 hours a day. After 6 July (or after 29 June if I pass all exams quickly) I will have neither job nor trip.
  • (1 week) Writing tests for field validation and rewriting the existing ones. The list of tests will be created before first week.
  • (1 week) Rewriting field validation (now it lives mostly in django.core.management.validation [18]). Introducing Warning and Error classes.
  • (1 week) Tests and implementation of validation of models and managers + tests. Rewrite validation of User model of auth app.
  • (1 week) Tests and implementation of validation of apps.
  • (1 week) Rewriting validation of AdminModel in admin app as well as its tests. The validation lives in django.contrib.admin.validation [19] now.
  • (1 week) Writing documentation.

4.2 Second half -- new features

  • Milestone 2. (2 weeks) django-secure. One week for writing tests and documentation and one for implementation.
  • Milestone 3. (3 weeks) django-update. Updating list of problems which may be predicted after upgrading Django, tests and documentation -- one half week. Implementation -- one and half week.
  • (1 week) Buffer week.
5. Who am I?

My name is Christopher Mędrela and I am student of University of Science and Technology in Krakow (Poland). I program in Python for at least 4 years. I program also in C and Java. I have started contributing Django 16 months ago. I have submitted a lot of patches [20] (this is not list of all patches). This year I have started working at big [#17093] ticket ("Refactor django.template to quarantine global state"). It is not finished, but it shows that I am able to deal with big tasks.

Some time ago I was working for long time (more than one year) on my own project named Scriptcraft [21] that was a programming game, but I suspended this project to focus on Django contribution. It also shows that I am not lazy and I can push myself even though there is no external motivator. :)

My e-mail is chrismedrela+gsoc at gmail.com. You can find me also at #django-dev and #gsoc IRC channels. My nick is chrismed.

Jacob Kaplan-Moss

unread,
Apr 17, 2013, 10:48:10 AM4/17/13
to django-developers
Hi Christopher -

Thanks for the proposal, this is quite good. I really appreciate the
detail; it's clear you've put a lot of thought into this. In general,
I think this is a strong proposal and one I'd support. However, I do
have some comments/feedback:

1. We've had some discussions about bringing django-secure into core
as part of a more general "checkdeploy" command. The idea being
something you can run shortly before deployment that'd check for all
the stuff that django-secure checks for -- but also more (outdated
dependencies, debug mode, exposed admin, etc). I think this dovetails
nicely with your proposal: it seems that all these "checks"
(validation, deployment, security) could use a single discovery and
running mechanism. I'd love to see you think about modifying your
proposal to include this sort of unification as well as the bringing
of django-secure into core.

2. I think your proposal is a bit too big. I'd general prefer to see a
less ambitious proposal with a high probability of success over a high
risk/high reward. I'd like to see you drop the "django-updates" part
of the proposal, and focus on validation and django-secure. If you end
up with extra time, you can always use it to write more checks.

3. You've done a good job breaking up the first half of the project
into one week chucks, which shows me you've thought carefully about
the tasks and steps involved. However, when you get to the second
half, you're a lot more vague. I'd like to see you think more
carefully about your time during that second half.

4. Pet peeve alert: "documentation" shouldn't be an afterthought. I
HATE seeing "week X: documentation" -- it implies that you're planning
on *not* writing documentation as you go, but instead saving it for
last. You've been around long enough to know that's not how we do
things; documentation happens alongside code. You'd make me much
happier if you updated your proposal to not imply that you'd leave
documentation for later.

Again, I want to stress that overall this is a really solid proposal;
don't take my criticism *too* hard. I think it could be an excellent
one, though, so I hope you'll take my suggestions into account. Good
luck!

Jacob
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Russell Keith-Magee

unread,
Apr 20, 2013, 9:36:44 PM4/20/13
to django-d...@googlegroups.com
Hi Christopher,

I'd like to echo Jacobs comments here. This is a strong proposal, and I can see this being one that we'd support. 

Jacob's feedback is also spot on; he's pretty much flagged the exact concerns that I had when I read it.

I'd possibly add one additional point - the potential for confusion between validation of the type you're describing, and "model validation". This isn't a problem of your making - the ambiguity already exists in Django. However, if we're adding API points on fields, which already support the idea of "validators", we need to be careful we don't confuse issues more. To that end, I'd be interested in seeing at least some initial thoughts on how we can structure the API or change the naming so that this issue is controlled.

On a practicality issue, I'd also like to see one more piece of detail in your "why me" section. Since your from Poland, English is (I'm guessing) a second language. Your proposal is very clear and shows you have good communications skills. However, what we don't know is how long it took you to write this proposal, and how comfortable you are working in English. If you're not comfortable working in English in "real time" (e.g., on a spoken phone call, or in IRC), then that will alter the way that you and your mentor will interact - you may wish to only communicate via email, for example; and this may slow down the rate of feedback that you can get from your mentor.

I'd like to emphasise that I am *not* saying that we intend to give preferential treatment to native English speakers. This is an issue of finding me best possible mentor to match you with, and set expectations for how quickly you will be able to make progress on your project.

Yours,
Russ Magee %-) 

Christopher Medrela

unread,
Apr 26, 2013, 3:54:02 AM4/26/13
to django-d...@googlegroups.com
Thank you for your feedback. I really appreciate every comment because that let me improve my proposal.

1. First of all, I noticed that the license of django-secure is copyright. Google forces us to release code under one of approved license [1], so a question to Carl Meyer: do you mind changing license?

On a practicality issue, I'd also like to see one more piece of detail in your "why me" section. Since your from Poland, English is (I'm guessing) a second language. Your proposal is very clear and shows you have good communications skills. However, what we don't know is how long it took you to write this proposal, and how comfortable you are working in English. If you're not comfortable working in English in "real time" (e.g., on a spoken phone call, or in IRC), then that will alter the way that you and your mentor will interact - you may wish to only communicate via email, for example; and this may slow down the rate of feedback that you can get from your mentor.

2. Last year and this year I attended FCE conversation class, so my English level isn't worse than FCE. I guess that I'm not ready to speak in real time, but I can talk in real time at IRC. But you know, it's hard to judge your own skills.

1. We've had some discussions about bringing django-secure into core
as part of a more general "checkdeploy" command. The idea being
something you can run shortly before deployment that'd check for all
the stuff that django-secure checks for -- but also more (outdated
dependencies, debug mode, exposed admin, etc). I think this dovetails
nicely with your proposal: it seems that all these "checks"
(validation, deployment, security) could use a single discovery and
running mechanism. I'd love to see you think about modifying your
proposal to include this sort of unification as well as the bringing
of django-secure into core.

3. I tried to find discussions about the additional functionality expected from django-secure (you mentioned checking outdated dependencies, debug mode and exposed admin), but I didn't succeed. Could you point me to any discussion on this mailing list or in the ticket tracker or anywhere else?

I'd possibly add one additional point - the potential for confusion between validation of the type you're describing, and "model validation". This isn't a problem of your making - the ambiguity already exists in Django. However, if we're adding API points on fields, which already support the idea of "validators", we need to be careful we don't confuse issues more. To that end, I'd be interested in seeing at least some initial thoughts on how we can structure the API or change the naming so that this issue is controlled.

4. It looks like changing validate into validate_all solves the problem. We also have to emphasise the difference between those two mechanisms in documentation.

5. I said that I won't have any trip during GSoC, but I forget about my holiday after September 6. This is not backpacking-trip, I will live in a hotel with net access, but I won't be able to work full time. I hope that you won't disqualify my proposal on this basis -- that can be considered as an advantage because I will be highly motivated to finish before time.

6. I've updated my proposal. I'm not repasting it here -- that would be too much mess. If you want to see the new version, look at the gist [2]. Changes to the proposal in short:
  • Rewriting the second part of the proposal. The proposal is not finished, it lacks the "improving django-secure" part.
  • Changing schedule of the first half -- removing "write documentation" week.
  • Minor changes. Changing solution -> hint. Changing validate -> validate_all. Changed app validation rules: the file validation.py won't be created by default for new apps; if the file exists but it misses validate_all or validate_models functions, then the default ones are used. 

Tom Evans

unread,
Apr 26, 2013, 6:11:26 AM4/26/13
to django-d...@googlegroups.com
On Fri, Apr 26, 2013 at 8:54 AM, Christopher Medrela
<chris....@gmail.com> wrote:
> Thank you for your feedback. I really appreciate every comment because that
> let me improve my proposal.
>
> 1. First of all, I noticed that the license of django-secure is copyright.
> Google forces us to release code under one of approved license [1], so a
> question to Carl Meyer: do you mind changing license?
>

The license is BSD 3-clause:

https://github.com/carljm/django-secure/blob/master/LICENSE.txt

Cheers

Tom

Horst Jäger

unread,
Apr 26, 2013, 9:04:07 AM4/26/13
to django-d...@googlegroups.com

Hi,

I am a complete newbie to Django but not to Python and I have to edit a few lines of code in an existing Django project within the next few days. Of corse, I have to do so not in the production- but in a separate development environmet.

I have:

1. a running Django server - the development environmet
2. an export of the existing existing Django project which I have copied into the django_projects folder

I also followed the tutorial

https://docs.djangoproject.com/en/dev/intro/tutorial01/

. I have created the "mysite" demo project, I have - as far as the console tells me - successfully started it - but I can't see it.

All I have to to is get the existing project to run on another machine change a few lines, check it in the browser and be done with it.

Any help would be very much apprecíated.

Thanks in advance

Horst
> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

--
Horst Jäger h.ja...@medienkonzepte.de
Medienkonzepte http://www.medienkonzepte.de/
Schaafenstr. 25, 50676 Köln, Germany
Tel +49 221 93187015 / Fax +49 221 93187029

Karen Tracey

unread,
Apr 26, 2013, 9:06:47 AM4/26/13
to django-d...@googlegroups.com
Please ask questions about using Django on django-users. The topic of this list is the development of Django itself.

Thanks,
Karen


Jacob Kaplan-Moss

unread,
Apr 26, 2013, 11:44:17 AM4/26/13
to django-developers
On Fri, Apr 26, 2013 at 3:54 AM, Christopher Medrela
<chris....@gmail.com> wrote:
> 1. First of all, I noticed that the license of django-secure is copyright.
> Google forces us to release code under one of approved license [1], so a
> question to Carl Meyer: do you mind changing license?

django-secure is licensed under the same BSD license that Django
itself uses, so there's no problem. I've also talked with Carl in the
past about merging django-secure into core, and he's fine with that.
So we're OK both legally and procedurally.

> 3. I tried to find discussions about the additional functionality expected
> from django-secure (you mentioned checking outdated dependencies, debug mode
> and exposed admin), but I didn't succeed. Could you point me to any
> discussion on this mailing list or in the ticket tracker or anywhere else?

I believe that most of these discussions have been in IRC or in
real-life, so there aren't records, sorry. However, I don't think your
proposal needs to included these sorts of new checks -- unless there
are ones you specifically find interesting or exciting. They're more
along the lines of things we could add in the future if we had a good
framework to do so.

> 5. I said that I won't have any trip during GSoC, but I forget about my
> holiday after September 6. This is not backpacking-trip, I will live in a
> hotel with net access, but I won't be able to work full time. I hope that
> you won't disqualify my proposal on this basis -- that can be considered as
> an advantage because I will be highly motivated to finish before time.

That's totally fine. As long as you work your holiday into your
proposed timeline and are realistic about what you can get done given
your schedule, we're fine.

Jacob

Christopher Medrela

unread,
Apr 29, 2013, 6:25:19 AM4/29/13
to django-d...@googlegroups.com
2. I think your proposal is a bit too big. I'd general prefer to see a
less ambitious proposal with a high probability of success over a high
risk/high reward. I'd like to see you drop the "django-updates" part
of the proposal, and focus on validation and django-secure. If you end
up with extra time, you can always use it to write more checks.

Jacob, as you said, I deleted django-updates part, but I ended with a proposal taking only 9 weeks. Did you mean also scheduling more than 6 weeks for the first part of the proposal (writing the new framework) and "stretching" the schedule to 12 weeks? I agree that 6 weeks for writing the framework is slightly too few, but on the other side we have to send evaluations after first six weeks, so I think it's a good idea to have something that works done.

Jacob Kaplan-Moss

unread,
Apr 29, 2013, 10:35:42 AM4/29/13
to django-developers
On Mon, Apr 29, 2013 at 6:25 AM, Christopher Medrela
<chris....@gmail.com> wrote:
> Jacob, as you said, I deleted django-updates part, but I ended with a
> proposal taking only 9 weeks. Did you mean also scheduling more than 6 weeks
> for the first part of the proposal (writing the new framework) and
> "stretching" the schedule to 12 weeks? I agree that 6 weeks for writing the
> framework is slightly too few, but on the other side we have to send
> evaluations after first six weeks, so I think it's a good idea to have
> something that works done.

Look, it's up to you; my goal is to encourage you to come up with a
plan that's realistically deliverable. Nearly ever summer of code
student so far has overestimated how much they can get done. So I'm
trying to help you come up with a proposal that matches what you can
actually do. Don't forget to take Hofstadter's Law
(http://en.wikipedia.org/wiki/Hofstadter%27s_law) into account!

Ultimately, we're much more likely to reject an overly-ambitious
proposal than an underly-ambitious one. Our goal is to see you get
done what you promise to get done, so that starts with having
realistic expectations.

Jacob

Christopher Medrela

unread,
Apr 30, 2013, 10:33:14 AM4/30/13
to django-d...@googlegroups.com
I looked deeper into what I have to do and reestimated my proposal. And you're right -- the result is that the first part (introducing the new validation framework) will take more time (about 8 weeks instead of 6). Merging django-secure will take one week longer. The proposal is still available here [1]. I rewritten "schedule and milestone" part -- that was mainly adding more details, so there is no need to review it again.

I've published my proposal as a gist. Can I point to it in the submit proposal form? Repasting it and formatting again might be boring job.

Reply all
Reply to author
Forward
0 new messages