Proposal: Refactor Form class

158 views
Skip to first unread message

Moritz Sichert

unread,
Mar 3, 2016, 3:00:47 PM3/3/16
to Django developers (Contributions to Django itself)
Hello,

I want to propose a refactor of the Form class to make it
easier to customize and subclass.


Motivation:

Currently the Form class (combined with Field, Widget and
BoundField) has many responsibilities and combines much
functionality into a single class. It currently does:

- data validation (using Field validators and clean*
methods)
- parsing HTTP form data and handling special cases like
file upload (using Widget.value_from_datadict() and
Field.to_python())
- displaying values as HTML input fields (using
Widget.render(), BoundField.value(), Field.prepare_value()
and even the Media class)

Although those three functionalities are closely related one
can easily think of situations where only one is needed.
Following Django's principle of loose coupling it would also
be desirable to be able to exchange components.

For example, it's conceivable that the form data doesn't
come from a HTTP request with x-www-form-urlencoded or
multipart/form-data content but gets transferred as JSON. In
this case it should be easy to replace the HTTP parsing
component with a JSON parser.

Also HTML forms may not be the only place where data
validation is needed. An application may get an untrusted
dict of Python objects that doesn't have to be parsed but
should still be validated. In that case it would be nice to
be able to just use the data validation component.

Currently it is also pretty hard to subclass or customize
the Form class without having to rewrite most of the class.
An example is the handling of "initial" and "files" data.
Those two arguments provide all the data of a form in
conjunction with the "data" argument. Those three attributes
are accessed explicitly all over the classes of django.forms
(i.e. Form, BoundField, Field, Widget). So if one wanted to
change the semantics of "initial" data it would be necessary
to change all of those classes. In the spirit of loose
coupling those classes should rely as little as possible on
the existence of certain attributes like "initial" and
"files".

Therefore I propose splitting the Form class into multiple
components that can be easily exchanged and used without
another.



Rationale:

To give you a better idea about the proposal I already
prepared a general architecture of how I think this could be
implemented. This is of course in no way final and I'll be
happy to receive feedback and suggestions!

Components:

Each of the aforementioned components is implemented in a
class of its own:

- FormValidator:
Implements the data validation as the name suggests. It
uses Field instances that can be declared declaratively as
it is possible with the old Form class. Furthermore it
provides all methods of the old Form class that dealt with
data validation, i.e. clean*(), is_valid(), errors(), etc.
- DataProvider:
Implements the "parsing and providing data" component. It
has only one method "get_value(name)" that must be
overwritten by subclasses and returns the data for a given
field name.
HtmlFormProvider is a subclass of DataProvider that
provides the parsing of HTTP form content by using Widget
objects and their "value_from_datadict()" method in
particular. The widgets of a HtmlFormProvider can be
specified declaratively just like the fields of a
FormValidator.
- HtmlForm:
Implements the "displaying values as HTML input fields"
component. Same as HtmlFormProvider it uses Widget objects
and their "render()" method in particular. It also
features all methods of the old Form class that were used
mainly in templates, like __iter__(), __getitem__(),
__str__(), as_p(), etc.
Furthermore it fetches its data from a given
HtmlFormProvider instance and provides handling of error
messages with a FormValidator instance. It is also
possible to decouple HtmlForm and HtmlFormProvider such
that HtmlForm works with any DataProvider subclass.


Changes to existing classes:

Obviously the Form class does not exist anymore, however
the Field class was also changed slightly. As it isn't used
for any display logic anymore but solely for validation it
loses a few attributes that are better suited to the Widget
class like "label", "help_text" and "show_hidden_inital" and
it isn't possible to define a set a Field's widget. To
compensate those attributes where moved to the Widget class.


New classes:

To demonstrate the new possibilities there are also two new
classes called "SimpleDataProvider" and "JsonProvider". They
take a Python dict and a JSON string respectively instead of
a QueryDict from a HTTP request.


The attached file shows a class diagram of the proposed
classes.



Additional thoughts:

The long standing issue about replacing the Media class
(#22298 and friends) will probably be easier to solve if
this proposal is accepted. As a result of being able to
subclass HtmlForm without having to rewrite large parts of
code it will be easier for third party apps to define mixins
or subclasses of HtmlForm that deal with HTML assets. This
also means that Django doesn't have to specify a single
method on how to deal with assets but leaves that decision
to app authors.



Backwards Compatibility:

Since the Form class is used in basically every Django
project it is important to provide total backwards
compatibility. This is possible by creating a subclass of
"HtmlForm" that becomes the backwards compatible Form class.
It uses a meta class to extract the widgets from the
declared fields and create a corresponding HtmlFormProvider
subclass. This way a Form subclass that was written as
specified in the (old) documentation will continue working.



I realize there are a few things to think about to make this
proposal complete. However I first wanted to hear your
feedback and suggestions before continuing to work on it.

Do you agree with the motivation and the goals of this
proposal?
Do you think the motivation or the rationale have
significant flaws that make it impossible to realize?
Does the rationale look like something you would like to
have in Django or does it violate some basic principles?


I'd really like you to give your opinion!


Moritz
django-forms.png
signature.asc

is_null

unread,
Mar 4, 2016, 8:28:42 PM3/4/16
to Django developers (Contributions to Django itself)
Hi


On Thursday, March 3, 2016 at 9:00:47 PM UTC+1, Moritz S. wrote:
For example, it's conceivable that the form data doesn't
come from a HTTP request with x-www-form-urlencoded or
multipart/form-data content but gets transferred as JSON. In
this case it should be easy to replace the HTTP parsing
component with a JSON parser.

That's the case when using ReactNative for example. 

However, it is easy to make the client do a standard HTTP post request.
 
Also HTML forms may not be the only place where data
validation is needed. An application may get an untrusted
dict of Python objects that doesn't have to be parsed but
should still be validated. In that case it would be nice to
be able to just use the data validation component.

I'd like to see where that goes, it does sound pretty useful to me.

When users do validation without the framework it's not unusual to find security bugs :)

Perhaps you'd have more luck if you'd split your proposal and make a smaller proposal, wait and see ...

Curtis Maloney

unread,
Mar 4, 2016, 8:58:06 PM3/4/16
to django-d...@googlegroups.com
Hi,

On 04/03/16 07:00, 'Moritz Sichert' via Django developers
(Contributions to Django itself) wrote:
> Hello,
>
> I want to propose a refactor of the Form class to make it
> easier to customize and subclass.

I've recently had thoughts in this direction, so seeing your post
intrigued me...

However, I feel some of your assertions are a bit off the mark...

> Motivation:
>
> Currently the Form class (combined with Field, Widget and
> BoundField) has many responsibilities and combines much
> functionality into a single class. It currently does:
>
> - data validation (using Field validators and clean*
> methods)

Correct - this is the main purpose of Forms.

> - parsing HTTP form data and handling special cases like
> file upload (using Widget.value_from_datadict() and
> Field.to_python())

Well, Widgets and Fields are involved in determining how to extract data
from the submitted dict... but they don't actually "parse" the http form
data -- Django does that much further down the stack in the HttpRequest
object.


> - displaying values as HTML input fields (using
> Widget.render(), BoundField.value(), Field.prepare_value()
> and even the Media class)


> Although those three functionalities are closely related one
> can easily think of situations where only one is needed.
> Following Django's principle of loose coupling it would also
> be desirable to be able to exchange components.

> For example, it's conceivable that the form data doesn't
> come from a HTTP request with x-www-form-urlencoded or
> multipart/form-data content but gets transferred as JSON. In
> this case it should be easy to replace the HTTP parsing

As mentioned above, Forms don't handle this parsing. You can pass _any_
dict-like for them to validate. In fact, I (and many others) have used
this for a simple and effective way to import csv files, in combination
with stdlib's csv.DictReader.

I also frequently use Forms for validating JSON data...

> Also HTML forms may not be the only place where data
> validation is needed. An application may get an untrusted
> dict of Python objects that doesn't have to be parsed but
> should still be validated. In that case it would be nice to
> be able to just use the data validation component.

> Therefore I propose splitting the Form class into multiple
> components that can be easily exchanged and used without
> another.

Here we somewhat agree.

My view was Forms involve the following actions:

1. Validation - the most important.
2. Rendering - currently handled by widgets
3. Sourcing - pulling in data, initial, defaults [POST, Form initial,
ModelForm instance, etc]
4. Sinking - pushing out validated data [cleaned_data, ModelForm.save()]

I feel the current validation pattern is very powerful and flexible, and
don't see much need to alter it.

In prior discussions about improving rendering in forms [a long time
bugbear of mine] there are a number of subtleties that are not so
obvious at first blush, where the current implementation blurs the Field
and Widget.

ModelForm's show a reasonable example for how to overlay extensions for
sourcing and sinking, but a more formalised approach could certainly
benefit the wider range of practices in the modern web.

> Rationale:
>
> To give you a better idea about the proposal I already
> prepared a general architecture of how I think this could be
> implemented. This is of course in no way final and I'll be
> happy to receive feedback and suggestions!
>
> Components:
>
> Each of the aforementioned components is implemented in a
> class of its own:
>
> - FormValidator:
> Implements the data validation as the name suggests. It
> uses Field instances that can be declared declaratively as
> it is possible with the old Form class. Furthermore it
> provides all methods of the old Form class that dealt with
> data validation, i.e. clean*(), is_valid(), errors(), etc.

Agreed.

> - DataProvider:
> Implements the "parsing and providing data" component. It
> has only one method "get_value(name)" that must be
> overwritten by subclasses and returns the data for a given
> field name.

So that would be a "get_field_value"?
This scale of backwards incompatibility might scare off a lot of people
from supporting this project. So be sure to provide (a) an easy
migration path, and/or (b) a clear backward compatibility layer.

I'd be happy to further develop these ideas [and code] with you if you like.

--
Curtis

Moritz Sichert

unread,
Mar 5, 2016, 3:04:09 AM3/5/16
to django-d...@googlegroups.com
Hello,

Am 05.03.2016 um 02:57 schrieb Curtis Maloney:
> Well, Widgets and Fields are involved in determining how to extract data from
> the submitted dict... but they don't actually "parse" the http form data --
> Django does that much further down the stack in the HttpRequest object.
>
>
> As mentioned above, Forms don't handle this parsing. You can pass _any_
> dict-like for them to validate. In fact, I (and many others) have used this for
> a simple and effective way to import csv files, in combination with stdlib's
> csv.DictReader.
>
> I also frequently use Forms for validating JSON data...

You are right there, I guess "parsing" is the wrong word for what I mean.
The thing is that for "simple" Widgets like TextInput getting the value out of
the data dict is as simple as data.get(add_prefix(name)) so it is reasonably
easy to just pass in a normal dict as data instead of a request.POST because you
already know how the widget will fetch its data.
However it gets way more complicated with more complex Widgets. Say you have a
DateInput widget that uses three HTML input tags that let you enter the year,
month, and day respectively. If you were to replace request.POST with your own
dict now you would have to account for how this widget layouts its data, i.e.
you would have to do something like:

my_data_dict = {'prefix-thedate-year': 2016, 'prefix-thedate-month': 3,
'prefix-thedate-day': 5}

Even though the "thedate" is only one logical value you would have to "split" it
to get the form to "parse" it correctly. Also you have to know in advance how
your DateWidget works and have to change your code every time you use another
new (complex) widget.
I believe it should be possible to provide input data in whatever format without
needing to know how the widget expects the data.

Another thing I consider part of "parsing" is how "initial" data is handled.
Let's now pretend we know how to correctly replace request.POST with our own
dict regardless of the used widgets. Let's also disregard for now that the
inital dict's format is actually not like the data dict's because it already
contains *logical* values, i.e. stripped off prefix and only one value for each
logical field.
So now we can construct an "inital" dict just like we did for the data dict.
However the thing about the inital data is that Django already has an opinion on
what it is and how it should be handled.
I had a use case were I needed an additional layer of "initial" data that was
called something like "database initial". So I then had three types of data: the
actual "data" dict, "user initial" (similar to what Django considers to be
initial) and "database initial".
Because the concept of "initial" data is ingrained so deeply in the Form class I
had to customize a large part of the Form class internals and even BoundField.

That are the reasons why I believe "parsing", or better "sourcing" as you called
it, should be customizable.


> I feel the current validation pattern is very powerful and flexible, and don't
> see much need to alter it.

I feel the same way. For this reason I want to keep the way one specifies
validators exactly the same as it is now, i.e. by specifying Field instances
declaratively and being able to write clean() and clean_*() methods.


>> - DataProvider:
>> Implements the "parsing and providing data" component. It
>> has only one method "get_value(name)" that must be
>> overwritten by subclasses and returns the data for a given
>> field name.
>
> So that would be a "get_field_value"?

Exactly.


> This scale of backwards incompatibility might scare off a lot of people from
> supporting this project. So be sure to provide (a) an easy migration path,
> and/or (b) a clear backward compatibility layer.

Yes, I want to emphasize again that this is an important part of my proposal.


> I'd be happy to further develop these ideas [and code] with you if you like.

I actually already have a bit of code as a proof of concept, I think I'll open a
branch on my Django fork or even a PR.
However I moved around quite a lot of code so I suspect it will be a bit messy
to review.


Moritz

signature.asc
Reply all
Reply to author
Forward
0 new messages