MarkupField

James Turk

Feb 22, 2009, 4:46:50 PM

to Django developers

I've needed a model that stored the markup/markup_type/rendered-markup
several times and typically just made those fields manually, but
recently, while working on a few projects that all needed this, I
decided to wrap the behavior up as a MarkupField and post it on
djangosnippets (http://www.djangosnippets.org/snippets/1332/).

After writing this I realized that the best place for it IMHO seems to
be django.contrib.markup as it is somewhat useful, wouldn't get in
anyone's way, and seems too simple to justify its own external app.

Jacob commented on Twitter that this is something that he'd
potentially like to see as well so I opened a ticket:
http://code.djangoproject.com/ticket/10317

I know that this is not the best time to start down the path of a new
feature and realize this is probably something that will wait until
1.2 but I figured I'd start the discussion now even if it has to be
put on hold and picked back up in April.

I've just attached my initial patch to the ticket (tests are on the
way, but I figured opening it up to discussion before finishing the
tests was probably wise). I also have a branch going at
http://github.com/jamesturk/django/tree/markupfield

I figure I should also answer here some of the questions that have
already been raised:

What about other types of markup?
Some have suggested supporting more types of markup but to me this
seems outside the scope of contrib.markup which currently only
supports ReST, markdown, and textile. Should this field not meet some
people's needs, they can add their own custom markup field and base it on
this design.

3 database fields?
Someone on the ticket has already questioned this. I'm open to
suggestions, but of the various methods I thought of, this looks like
the cleanest, as the others seem to introduce a number of special cases.

Waylan Limberg

Feb 22, 2009, 5:16:12 PM

to django-d...@googlegroups.com

On Sun, Feb 22, 2009 at 4:46 PM, James Turk <james....@gmail.com> wrote:
>
>
> What about other types of markup?
> Some have suggested supporting more types of markup but to me this
> seems outside the scope of contrib.markup which currently only
> supports ReST, markdown, and textile. Should this field not meet some
> people's needs, they can add their own custom markup field and base it on
> this design.
>

Well, at the very least, you could make it easier for one to add their
own type of markup. Perhaps make the render-markup function such that
one could provide their own render function without needing to
subclass the field.

Personally, I would use this every time as I will never use the
default markdown. I will always be adding in some of the available
extensions[1]. Currently, the only way to do that is to write my own
render method. It would be nice to just pass that in on declaring the
field without creating my own subclass etc.

[1]: http://www.freewisdom.org/projects/python-markdown/Available_Extensions
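A minimal sketch of Waylan's suggestion, assuming a hypothetical field-like class that accepts any render callable: options such as Markdown extensions are bound up front with `functools.partial`, so no subclass is needed. `fake_markdown` is a stand-in for `markdown.markdown` (its `extensions` argument is accepted but ignored), and `MarkupFieldSketch` is not the patch's API.

```python
import html
from functools import partial


def fake_markdown(text, extensions=(), safe_mode=False):
    """Hypothetical stand-in for markdown.markdown.

    `extensions` is accepted but ignored; `safe_mode` escapes raw HTML.
    """
    if safe_mode:
        text = html.escape(text, quote=False)
    return "<p>%s</p>" % text


class MarkupFieldSketch:
    """Hypothetical field-like object that stores a render callable."""

    def __init__(self, renderer):
        if not callable(renderer):
            raise TypeError("renderer must be callable")
        self.renderer = renderer

    def render(self, raw):
        # The field just calls the function; it never inspects its options.
        return self.renderer(raw)


# Options are bound at declaration time instead of via a subclass.
body = MarkupFieldSketch(
    partial(fake_markdown, extensions=["footnotes"], safe_mode=True)
)
print(body.render("a <b>bold</b> claim"))  # <p>a &lt;b&gt;bold&lt;/b&gt; claim</p>
```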

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg

Marty Alchin

Feb 22, 2009, 5:28:18 PM

to django-d...@googlegroups.com

I haven't looked too much at the patch, so I can't comment much on the
implementation yet, but reading your description and the ticket, I'd
like to offer some thoughts, based on some ideas I've had on this in
the past.

First, I don't think you actually addressed the question mentioned in
the ticket regarding the 3 fields. It seems the question was whether
there should be three attributes on the Python model instance,
regardless of how many columns are stored in the database. On this
note, though, I do have a thought: specify the markup type as an
argument to the MarkupField. You already do this with a
default_markup_type, but I don't see much use in having users specify
their markup type at the time they enter the text.

Essentially, it comes down to the developer choosing a markup type
during development, rather than a user choosing it in a form. Like it
or not, fewer choices generally makes for a better user experience,
especially since offering just one choice means you can supply some
help text alongside it. Trying to supply useful information for all
the various markup options, while also helping the user decide which
one is best for the (usually brief) text they want to enter ... well,
it just doesn't seem it would scale well.

Besides, specifying the markup type as an attribute of the model means
that you can do away with one of the three database fields. Plus, it
means you can specify any markup type, simply by supplying a callable
instead of a string. We make this same type of differentiation when
specifying the upload_to argument of FileFields. The implementation
becomes both simpler and more powerful, all in one swift stroke.
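Marty's string-or-callable idea can be sketched as a small resolver, in the spirit of how upload_to accepts either a string or a callable. The `FORMATTERS` registry, `resolve_formatter`, and the two stand-in formatters are hypothetical; a real version would map names such as 'markdown' to the actual library functions.

```python
import html

# Hypothetical registry of named formatters (placeholders, not markup engines).
FORMATTERS = {
    "plain": lambda text: "<p>%s</p>" % html.escape(text),
    "upper": lambda text: text.upper(),
}


def resolve_formatter(formatter):
    """Accept either a registered name or any callable."""
    if callable(formatter):
        return formatter  # custom formatter supplied directly
    try:
        return FORMATTERS[formatter]
    except KeyError:
        raise ValueError("unknown markup formatter: %r" % (formatter,)) from None


print(resolve_formatter("plain")("1 < 2"))          # <p>1 &lt; 2</p>
print(resolve_formatter(str.title)("hello world"))  # Hello World
```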

As for the issue Alex really did bring up in the ticket, I think
there's a question as to whether the different field types can be
contained by a single complex object, rather than individual
attributes on the model. Basically, if you had a model like this:

    class InfoModel(models.Model):
        title = models.CharField(max_length=255)
        description = markup.MarkupField(formatter='markdown')

I would personally rather see something like this, when it comes time
to access that field's content:

>>> info = InfoModel.objects.get(id=4)
>>> info.title
u'Django'
>>> info.description
<Markup: *The* web framework for perfectioni...>
>>> info.description.raw
u'*The* web framework for perfectionists with deadlines'
>>> info.description.formatted
u'<p><em>The</em> web framework for perfectionists with deadlines</p>'
>>> unicode(info.description)
u'<p><em>The</em> web framework for perfectionists with deadlines</p>'

So essentially, one attribute contains both types of content, with the
default Unicode representation being the formatted output. Since
templates call __unicode__() by default, all you'd have to do is use
{{ info.description }} in a template to get it right. But you could
still use {{ info.description.raw }} to get the original content if
necessary. Or, optionally, you could pass that through a different
processor at display time if you wanted to, using
{{ info.description.raw|textile }}.
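A minimal sketch of the value object Marty describes, assuming it simply holds the raw text and the formatted HTML; the class name and attribute names follow his example but are illustrative. A Django version would also mark the HTML safe for templates (`django.utils.safestring.mark_safe`); the sketch omits that to stay dependency-free, and uses Python 3's `__str__` where Python 2 code of this era used `__unicode__`.

```python
class Markup:
    """Sketch of the proposed value object; all names are illustrative."""

    def __init__(self, raw, formatted):
        self.raw = raw              # what the user typed
        self.formatted = formatted  # HTML produced by the formatter

    def __str__(self):
        # The string form is the HTML, so a template printing the
        # attribute directly would show the formatted output.
        return self.formatted

    def __repr__(self):
        # Truncate long raw text, matching the interactive session above.
        if len(self.raw) > 35:
            return "<Markup: %s...>" % self.raw[:35]
        return "<Markup: %s>" % self.raw


description = Markup(
    "*The* web framework for perfectionists with deadlines",
    "<p><em>The</em> web framework for perfectionists with deadlines</p>",
)
print(repr(description))  # <Markup: *The* web framework for perfectioni...>
print(str(description))   # <p><em>The</em> web framework for perfectionists with deadlines</p>
```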

That's just one man's opinion, but hopefully that helps the discussion
a bit anyway.

-Gul

James Turk

Feb 22, 2009, 5:33:12 PM

to Django developers

I'll admit I'm not generally a markdown user, so I had neglected
these extensions; I intended to handle this case the way the
markdown filter does but forgot before posting this patch.

I'll update the patch to mirror the way the template filter works and
provide a markdown_options parameter on the field.

As far as an additional parameter to override the rendering function,
I'm thinking that perhaps the best option is to provide render_markup
as a method on the class and allow for it to be overridden.
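Putting render_markup on the class means a subclass can change rendering by overriding a single method. A minimal sketch under that assumption; `MarkupFieldBase`, `render_for_save`, and `WrappedMarkupField` are hypothetical names, not the patch's code, and the default rendering is just escaping plus a paragraph tag.

```python
import html


class MarkupFieldBase:
    """Hypothetical base class whose rendering step is an overridable method."""

    def render_markup(self, text):
        # Default for the sketch: escape the text and wrap it in a paragraph.
        return "<p>%s</p>" % html.escape(text)

    def render_for_save(self, raw):
        # Saving always goes through render_markup, so overriding that
        # one method changes the stored HTML everywhere.
        return self.render_markup(raw)


class WrappedMarkupField(MarkupFieldBase):
    """Example subclass that post-processes the default rendering."""

    def render_markup(self, text):
        return '<div class="markup">%s</div>' % super().render_markup(text)


print(WrappedMarkupField().render_for_save("hi & bye"))
# <div class="markup"><p>hi &amp; bye</p></div>
```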

-james

James Turk

Feb 22, 2009, 8:40:44 PM

to Django developers

I've updated my patch with a way to pass extra options to markdown/
docutils/textile that should handle any of the common cases. I've
also moved render_markup into the class so that it is possible to
override on an inherited class.

> First, I don't think you actually addressed the question mentioned in
> the ticket regarding the 3 fields. It seems the question was whether
> there should be three attributes on the Python model instance,
> regardless of how many columns are stored in the database. On this
> note, though, I do have a thought: specify the markup type as an
> argument to the MarkupField. You already do this with a
> default_markup_type, but I don't see much use in having users specify
> their markup type at the time they enter the text.

I'm fairly attached to the idea of the type being tied to an instance
and not to the field itself, as to me this feels much more flexible.
For example, I'm using this behavior on a live project: a multi-user
blogging app we use at my office, where I tend to write my posts in
ReST, some coworkers prefer raw HTML, and some use Markdown. I agree
with you that this complexity shouldn't be passed on to the end user
(comments, for instance, should support one and only one format), but
setting a default makes that possible. Yes, it stores an extra integer
per record in the database, but that seems forgivable.

Perhaps I'm the only one for whom this matters, but the current
implementation seems to make it easy to satisfy both cases, so I'm not
entirely sure I see the downside, as long as the markup_type field
isn't exposed to the average user.

> As for the issue Alex really did bring up in the ticket, I think
> there's a question as to whether the different field types can be
> contained by a single complex object, rather than individual
> attributes on the model. Basically, if you had a model like this: ...

I do like this idea a lot and have already played with an implementation
of it; my original concern was that the underlying fields would still
exist on the model, and it seemed strange to have two places to access
the same data.

Unless I'm mistaken, though, adding a descriptor doesn't remove the
three underlying attributes (two if markup_type ends up going away):

    post.body_markup_type == post.body.markup_type
    post.body_rendered == post.body.formatted

Adding the descriptor still seems like a good idea, even if only to
get the unicode() behavior that you showed in your example.
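A sketch of such a descriptor, assuming the underlying attributes keep the names `<field>`, `<field>_rendered`, and `<field>_markup_type`, so both access paths read the same data. `MarkupDescriptor`, `MarkupView`, and `Post` are hypothetical. The sketch uses `__set_name__` (Python 3.6+) to learn the attribute name; a Django field would instead install the descriptor from its `contribute_to_class` method.

```python
class MarkupView:
    """Read-only view over the instance's underlying attributes."""

    def __init__(self, instance, name):
        self._instance = instance
        self._name = name

    @property
    def raw(self):
        return self._instance.__dict__[self._name]

    @property
    def rendered(self):
        return getattr(self._instance, self._name + "_rendered")

    @property
    def markup_type(self):
        return getattr(self._instance, self._name + "_markup_type")

    def __str__(self):
        return self.rendered


class MarkupDescriptor:
    """Data descriptor: reading returns a MarkupView, writing sets raw text."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return MarkupView(instance, self.name)

    def __set__(self, instance, value):
        # Raw text lives in the instance dict under the field's own name.
        instance.__dict__[self.name] = value


class Post:
    body = MarkupDescriptor()

    def __init__(self, body, body_rendered, body_markup_type):
        self.body = body
        self.body_rendered = body_rendered
        self.body_markup_type = body_markup_type


post = Post("*hi*", "<p><em>hi</em></p>", "markdown")
print(post.body.markup_type == post.body_markup_type)  # True
```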

Waylan Limberg

Feb 22, 2009, 9:58:38 PM

to django-d...@googlegroups.com

On Sun, Feb 22, 2009 at 8:40 PM, James Turk <james....@gmail.com> wrote:
>
>> First, I don't think you actually addressed the question mentioned in
>> the ticket regarding the 3 fields. It seems the question was whether
>> there should be three attributes on the Python model instance,
>> regardless of how many columns are stored in the database. On this
>> note, though, I do have a thought: specify the markup type as an
>> argument to the MarkupField. You already do this with a
>> default_markup_type, but I don't see much use in having users specify
>> their markup type at the time they enter the text.
>
> I'm fairly attached to the idea of the type being tied to an instance
> and not to the field itself as to me this feels much more flexible
> (examples of where I'm using this behavior on live projects are on a
> multi-user blogging app we use at my office where I tend to write my
> posts in ReST, some coworkers prefer raw HTML, and some also use
> Markdown). I agree with you about passing this complexity on to an
> end user, comments for instance should support one and only one
> format, but by setting a default this is possible (yes it is storing
> an extra integer per record in the database but this seems
> forgivable).
>

Actually, I think there's room for a few different behaviors. Not sure
that all of them should go in contrib.markup, but I see 4 possible
scenarios:

1. James's current implementation, where each instance has the formatter
set for that specific instance.

2. Marty's suggestion where the formatter is hard-coded into the model
definition.

3. A ForeignKey-dependent option. Imagine a User- or Project-specific
setting. Perhaps something like:

    class Project(models.Model):
        name = models.CharField(max_length=50)
        formatter = models.IntegerField(choices=MARKUP_CHOICES)

    class Page(models.Model):
        project = models.ForeignKey(Project)
        body = markup.MarkupField(formatter='project.formatter')

I would imagine the above would work like Option 2, in that whatever
formatter is set for the 'Project' is assumed for all 'Pages' in that
project. No need to store the formatter_type separately in the 'Page'
model.

4. However, in some situations, I could see Option 3 used in
conjunction with Option 1. The User sets her default choice in her
User Profile. Then, whenever she creates a new instance, the formatter
defaults to her preferred formatter. However, this particular instance
may use a different type of formatter, so she selects a different
formatter on that instance - which needs to be saved specific to that
instance.
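Option 3 amounts to following a dotted path such as 'project.formatter' from the instance being rendered. A plain-Python sketch under stated assumptions: `operator.attrgetter` (which accepts dotted names) does the lookup, `MARKUP_CHOICES` is a hypothetical integer-to-name mapping, and `Project`/`Page` are stand-ins for the models, with the ForeignKey reduced to an ordinary attribute.

```python
from operator import attrgetter

# Hypothetical choices: stored integer -> formatter name.
MARKUP_CHOICES = ((1, "markdown"), (2, "textile"))


class Project:  # stand-in for the Project model
    def __init__(self, name, formatter):
        self.name = name
        self.formatter = formatter


class Page:  # stand-in for Page; the ForeignKey is just an attribute here
    def __init__(self, project, body):
        self.project = project
        self.body = body


def formatter_for(instance, path):
    """Follow a dotted path like 'project.formatter' and map it to a name."""
    value = attrgetter(path)(instance)
    return dict(MARKUP_CHOICES)[value]


site = Project("docs", formatter=2)
page = Page(site, "h1. Welcome")
print(formatter_for(page, "project.formatter"))  # textile
```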

Hmm, I guess I'm kind of proposing two different things here, aren't I?

1. Per-instance formatter or not.

I have a couple of thoughts about how to differentiate this one. The
obvious way would be to have two different Fields, one for each behavior.

However, what about this crazy idea: only offer one Field, but have two
keyword options: "default_formatter" & "formatter" (or whatever color
you choose). If "default_formatter" is set, then use that as the
default, but give the end user the option of selecting a different
formatter per instance. However, if "formatter" is set instead, then
set that formatter for all instances with no option for the user to
change it. Obviously, it would need to be worked out how to handle
having both set (ignore one, generate error, or something else), which
could get ugly, but I thought I'd throw it out there.

2. ForeignKey-dependent default or not.

Again, the obvious way would be with different fields.

But what about checking to see if the string passed in matches an
existing ForeignKey on the model, and using that if it does, falling
back to the current behavior if not. Again, this may be a bad idea.
Just throwing it out to generate some thinking on it.
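The single-Field variant of the first idea could validate its keyword arguments roughly like this. A hypothetical sketch that picks one of the policies Waylan lists (treating both keywords as an error); `field_options` and the `user_selectable` flag are illustrative names.

```python
def field_options(formatter=None, default_formatter=None):
    """Hypothetical handling of the two proposed keyword arguments.

    formatter: fixed for every instance; users never choose.
    default_formatter: pre-selected, but users may pick another per instance.
    """
    if formatter is not None and default_formatter is not None:
        # One possible policy: refuse the ambiguous configuration.
        raise ValueError("pass formatter or default_formatter, not both")
    if formatter is not None:
        return {"formatter": formatter, "user_selectable": False}
    return {"formatter": default_formatter, "user_selectable": True}


print(field_options(formatter="markdown"))
# {'formatter': 'markdown', 'user_selectable': False}
```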

dc

Feb 23, 2009, 4:50:30 PM

to Django developers

I think you are trying to do too many things in one place. Better to
concentrate on storing and retrieving formatted data, not on loading
and configuring markup functions. Accept markup_func as an argument and
call it when saving. The user will load the markup function he wants
and configure it as he wants. You can provide ready-made functions
for contrib.markup if you want. But the current implementation
of render_markup with hardcoded types, sanitize and extra_args is ugly
and totally unpythonic.

Also, we do not subclass FileField if we want to change the upload_to
path. We just provide a callable.

James Turk

Feb 23, 2009, 5:27:34 PM

to django-d...@googlegroups.com

To me there are a few big problems with the markup_func proposal as I see it:

* it removes the ability to provide multiple markup_types on a given field
* what is unpythonic about taking in two extra parameters to customize the field?

My goal here isn't to provide a generic "accepts-any-markup" field, as providing something that generic would best be done by overriding save_as on a TextField.

Sometimes in attempting to make something generic one can remove ease of use and limit useful features. I think there is a sound argument that supporting all of the markup types that contrib.markup supports is what makes the most sense. It seems like the argument that this field isn't generic enough is akin to suggesting that django.contrib.markup's templatetags aren't valuable because there are cases where one needs to support a different type of markup.

-James

David Larlet

Feb 23, 2009, 5:48:07 PM

to django-d...@googlegroups.com

On Feb 23, 2009, at 23:27, James Turk wrote:

> To me there are a few big problems with the markup_func proposal as
> I see it:
>
> * it removes the ability to provide multiple markup_types on a given
> field
> * what is unpythonic about taking in two extra parameters to
> customize the field?

Isn't it possible to define your choices+functions in settings? Like:

    MARKUP_CHOICES = {
        'markdown': markdown.markdown,
        'textile': textile.textile,
        # and so on
    }

This way you can generate your choice field by iterating through the
keys and render the appropriate markup using the values.
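A runnable sketch of David's idea: choices come from the dict's keys and rendering looks up the value. The formatters here are standard-library stand-ins (assumptions, so the sketch runs without Markdown or Textile installed), and `CHOICES` and `render` are hypothetical names.

```python
import html

# Stand-ins for entries like markdown.markdown / textile.textile.
MARKUP_CHOICES = {
    "escaped": html.escape,
    "upper": str.upper,
}

# Field choices generated by iterating over the keys; sorted() just makes
# the order predictable.
CHOICES = [(name, name) for name in sorted(MARKUP_CHOICES)]


def render(markup_type, text):
    """Render text with the function stored under markup_type."""
    return MARKUP_CHOICES[markup_type](text)


print(CHOICES)                   # [('escaped', 'escaped'), ('upper', 'upper')]
print(render("escaped", "<b>"))  # &lt;b&gt;
```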

My 2c,
David

dc

Feb 23, 2009, 7:09:21 PM

to Django developers

> * it removes the ability to provide multiple markup_types on a given field

Not if properly implemented. Again, see FileField upload_to for
example.

> * what is unpythonic about taking in two extra parameters to customize the
> field?

Each markup function takes different arguments, and different versions
of the same markup library take different arguments too. The flaw in
your approach already shows in the current render_markup: REST doesn't
use self.sanitize and TEXTILE doesn't use self.extra_options.
Even if you provide default behaviour for contrib.markup, why not
accept a callable for markup_type as Marty suggested?

> It seems like the argument that this field isn't generic enough is
> akin to suggesting that django.contrib.markup's templatetags aren't valuable
> because there are cases where one needs to support a different type of
> markup.

The argument is that this field can become more generic, extensible
and better designed with a few easy changes. But you just don't
want to accept it. Very strange point of view.

Waylan Limberg

Feb 23, 2009, 8:19:27 PM

to django-d...@googlegroups.com

On Mon, Feb 23, 2009 at 7:09 PM, dc <dmm...@gmail.com> wrote:
>
>> * it removes the ability to provide multiple markup_types on a given field
>
> No if properly implemented. Again, see FileField upload_to for
> example.

Or see the django-template-utils app [1]. It provides a nice wrapper
so that all the formatters use the same API. You probably don't need
to do exactly that, but something similar with a dict to map names to
functions. Then pass that dict in as an argument.

[1]: http://code.google.com/p/django-template-utils/source/browse/trunk/template_utils/markup.py
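One way to get a shared API is a registry of small adapters that all take `(text, **kwargs)` and return a string. This is a hypothetical sketch loosely inspired by the idea, not template_utils' actual interface; `parts_formatter` mimics the fact that docutils' `publish_parts` returns a dict of document parts rather than a string, which is exactly the kind of difference an adapter hides.

```python
def string_formatter(text, **kwargs):
    """Stand-in for a formatter that returns HTML directly."""
    return "<p>%s</p>" % text


def parts_formatter(source, **kwargs):
    """Stand-in for a formatter that returns a dict of document parts."""
    return {"title": "", "fragment": "<p>%s</p>" % source}


class MarkupRegistry:
    """Map names to callables sharing the signature (text, **kwargs) -> str."""

    def __init__(self):
        self._formatters = {}

    def register(self, name, func):
        self._formatters[name] = func

    def names(self):
        return sorted(self._formatters)

    def __call__(self, text, name, **kwargs):
        return self._formatters[name](text, **kwargs)


formatter = MarkupRegistry()
formatter.register("plain", string_formatter)
# Adapter: pull the HTML fragment out of the parts dict.
formatter.register("parts", lambda text, **kw: parts_formatter(text, **kw)["fragment"])

print(formatter("hello", "parts"))  # <p>hello</p>
```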

James Turk

Feb 24, 2009, 12:07:13 AM

to django-d...@googlegroups.com

I certainly wasn't trying to come off as against any suggested improvements and apologize if I seemed so. I originally thought that the suggestion was to drop the option of selecting a markup type in favor of a callable or perhaps just saw the two as incompatible for some reason.

I wonder if perhaps some consensus could be reached by providing a markup_choices dict as per David's suggestion which could default to a dict defined as such:

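    import markdown
    import textile
    from docutils.core import publish_parts
    from django.utils.functional import curry

    DEFAULT_MARKUP_CHOICES = {
        'markdown': markdown.markdown,
        'restructuredtext': curry(publish_parts, writer_name='html4css1'),
        'textile': curry(textile.textile, encoding='utf-8', output='utf-8')
    }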

MarkupField then takes an optional markup_choices dict and default_markup_type.

This seems to provide extensibility both for adding new markup types and for passing additional options: simply pass in your own markup_choices dict.

dc, do you have a suggestion in mind for how you'd implement a default as a callable, or does this satisfy what you're looking for? I suppose that default_markup_type could easily be a callable instead of a string if people don't find this confusing.

This certainly does feel a bit more extensible; thanks for knocking me around a little on that point. I do think it'd be nice if it were easier to provide parameters to markdown/restructuredtext without having to define a custom dictionary, but this isn't the worst trade-off (and perhaps someone has a suggestion for cleaner behavior).

James Bennett

Feb 24, 2009, 2:33:43 AM

to django-d...@googlegroups.com

OK, so, time to step back a bit and think big-picture.

How to specify the markup type
==============================

Yes, you and your co-workers like having an internal app where
everybody chooses their own markup format every time they post
something. But if something's going to go into Django it's got to aim,
first and foremost, for the common case, for the 80% of people, and
throwing choices in front of users when all they're trying to do is
post a comment on your blog just ain't gonna cut it.

So the first thing I'd recommend is that the per-instance choice of
format go out the window. If you'd like to maintain something which
does that in a third-party library, you should of course feel free,
but the overwhelmingly common use case is not going to need that (and,
in fact, will have its usability hurt by that -- don't make users
think).

The common case (and hence the one to optimize for) is going to be a
developer deciding "I like Textile, this site's going to use Textile"
(or similar, according to the developer's markup whims). This means
that the choice of markup format *also* shouldn't be an argument to
the field: if I hard-code ``format="textile"`` into a field in my app,
and you want to use my app but want Markdown instead, well, you're up
a creek.

Also, there's no real point to trying to store that information on the
model object: once you've rendered some input into HTML, you don't
need to know or care what formatter was used to do that, because
you've got the HTML.

Which brings us to... a setting, which is probably the best way to
handle the choice of markup formats. It's simple, it's easy, it gets
the behavior right for most people (most sites are going to use One
And Only One markup format throughout) right there out of the box.

When I wrote template_utils way back once upon a time, I basically
went through the above discussion in my head, and came to the same
conclusion: template_utils uses a setting called ``MARKUP_FILTER`` to
tell it what to do.

In template_utils I opted for the setting to be a 2-tuple: a (string)
name of a formatter, and a dictionary of keyword arguments to be
passed in (the formatter name was resolved at runtime by a class with
which one could register different formatting functions). That can
work here as well, though it might make more sense to:

1. Have the setting itself be a dict, with keys ``formatter`` and
``kwargs``. This is mostly because people seem to have trouble with
tuples, don't ask me why.

2. Have the formatter actually be the dotted path to a
formatter function, which will be pulled in at runtime (this avoids
circular-import problems if you decide to use some bit of Django in
writing your formatter function, since that would need settings,
but settings need to import your function...).

So, say, something like this::

    MARKUP_FILTER = { 'formatter': 'markdown.markdown',
                      'kwargs': { 'safe_mode': True }, }

Et voila: now any MarkupField for this install will, by default,
Markdown-ify its text in safe mode on save.
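A sketch of resolving such a setting, assuming the dict shape above. The import happens inside the function at call time rather than when settings load, which is what sidesteps the circular-import problem. So the sketch runs on the standard library alone, the setting points at `html.escape` with `quote=False` instead of `markdown.markdown` with `safe_mode=True`; `load_formatter` and `apply_markup_filter` are hypothetical names (later Django versions ship `django.utils.module_loading.import_string` for the dotted-path part).

```python
import importlib

# Stand-in setting; the real example uses 'markdown.markdown'.
MARKUP_FILTER = {"formatter": "html.escape", "kwargs": {"quote": False}}


def load_formatter(dotted_path):
    """Import 'package.module.function' lazily, at call time."""
    module_path, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def apply_markup_filter(text, setting=MARKUP_FILTER):
    """Render text with the configured formatter and keyword arguments."""
    func = load_formatter(setting["formatter"])
    return func(text, **setting.get("kwargs", {}))


print(apply_markup_filter('"a" <b>'))  # "a" &lt;b&gt;
```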

This is, incidentally, basically what I do with my own personal stuff,
although I just use the formatter object out of template_utils. It
works pretty well.

The only problem is that somewhere, eventually, somebody is going to
really need the ability to do multiple types of markup on the same
site. This is where the API needs to get creative: wasting a whole
extra DB column just to handle this is, well, wasteful. But I'll get
to a better solution in a moment.

What the field actually stores and returns
==========================================

Marty's suggestion above of having some sort of custom object returned
by the field, with both "raw" and processed text available (and a
sensible ``__unicode__()`` so that templates Just Work) pretty much
hits it on the nose, I think. This means you will need two columns in
the DB, but you were going to need that anyway: once I've saved my
blog entry written in Markdown, I'm gonna be pretty pissed if you
throw rendered HTML back at me on the edit screen.

So for this part, just do what Marty recommends.

How to handle markup override
=============================

That is, also, the solution to the problem of how to do a one-off (or
more-than-one-but-still-off) override of the default markup type: let
the ``Markup`` object (or whatever it ends up being called, though
that's a good name) expose methods for forcibly saving with a
formatting option of your choice, in much the same way file fields
already let you hand over some file contents and do manual saving
trickery.

So, suppose I write a blog entry::

    >>> e = Entry.objects.create(title="Foo", body="Lorem ipsum dolor sit *amet*")

where ``body`` is a ``MarkupField``. Now, let's say the default on
this site is Markdown, but I really want Textile instead for this
entry::

    >>> e.body.save_markup(formatter='textile.textile')

Or some similar API (maybe let it just get the Python callable passed
in, since at that point the import problem doesn't exist).

Putting it all together
=======================

So, to recap, if I were designing this:

``django.contrib.markup.models.MarkupField`` would equate to two
columns in the DB: one for the raw text, one for the resulting
HTML.

It would be represented, at the Python level, by an object which can
spit back either the raw text or the HTML (and whose ``__unicode__()``
returns the HTML, marked safe for templates), and which offers a
method for custom-saving the markup using the formatter of your
choice.

It would be represented, at the form level, by a ``CharField`` +
``Textarea``, and for model forms would spit back the raw un-processed
text as the field's initial value on editing.

And... I'm spent. I think that covers everything, though it's
significantly more than two cents' worth of opinion :)

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Waylan Limberg

Feb 24, 2009, 9:01:01 AM

to django-d...@googlegroups.com

James, you nailed it. This is exactly what we need. Well, with one
minor oversight:

On Tue, Feb 24, 2009 at 2:33 AM, James Bennett <ubern...@gmail.com> wrote:
>
[snip]

> How to handle markup override
> =============================
>
> That is, also, the solution to the problem of how to do a one-off (or
> more-than-one-but-still-off) override of the default markup type: let
> the ``Markup`` object (or whatever it ends up being called, though
> that's a good name) expose methods for forcibly saving with a
> formatting option of your choice, in much the same way file fields
> already let you hand over some file contents and do manual saving
> trickery.
>
> So, suppose I write a blog entry::
>
> >>> e = Entry.objects.create(title="Foo", body="Lorem ipsum dolor sit *amet*")
>
> where ``body`` is a ``MarkupField``. Now, let's say the default on
> this site is Markdown, but I really want Textile instead for this
> entry::
>
> >>> e.body.save_markup(formatter='textile.textile')
>

[snip]

This needs to accept kwargs as well. Let's take the use case where
Markdown is the default, and most of the site is used by trusted users,
so Markdown is not in safe_mode (we allow raw HTML). But now we have
one field (perhaps comments) which is accessible to the general
untrusted public. In that one case, I still want to use Markdown, but
with ``safe_mode = True``. The only way that will work is to accept
kwargs. So, using the above example:

>>> e.body.save_markup(formatter='markdown.markdown', kwargs={'safe_mode': True})

James Bennett

Feb 24, 2009, 4:18:49 PM

to django-d...@googlegroups.com

> This needs to accept kwargs as well. Let's take the use case where
> Markdown is the default, and most of the site is used by trusted users,
> so Markdown is not in safe_mode (we allow raw HTML). But now we have
> one field (perhaps comments) which is accessible to the general
> untrusted public. In that one case, I still want to use Markdown, but
> with ``safe_mode = True``. The only way that will work is to accept
> kwargs. So, using the above example:
>
> >>> e.body.save_markup(formatter='markdown.markdown', kwargs={'safe_mode': True})

Actually I think I'd just write the method like this::

    def save_markup(self, formatter, **kwargs):
        markup = formatter(self.raw_text, **kwargs)
        # ...etc...

The **kwargs syntax means we don't need to pass an actual dictionary there.

Also, a nice advantage of this is that you could mess around with
particular models, or in particular situations, just by writing a
function which calls the save_markup() method, and hooking it up to a
post_save signal or similar (assuming, of course, that save_markup()
does not itself trigger save()...).
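A plain-Python sketch of that shape: save_markup takes a callable plus `**kwargs`, only updates the rendered text (it never calls a save method), and a separate hook function re-renders with different options. `Markup`, `save_markup`, and `force_escaped` are hypothetical stand-ins; `html.escape` plays the role of a formatter that takes keyword arguments, and in Django the hook would be connected to a post_save signal rather than called directly as here.

```python
import html


class Markup:
    """Minimal value object holding raw text and its rendered form."""

    def __init__(self, raw_text, rendered=""):
        self.raw_text = raw_text
        self.rendered = rendered

    def save_markup(self, formatter, **kwargs):
        # Re-render with an explicit formatter; keyword arguments pass
        # straight through. Deliberately does not call any save() method.
        self.rendered = formatter(self.raw_text, **kwargs)
        return self.rendered


def force_escaped(markup):
    """Hypothetical post-save-style hook that re-renders one field safely."""
    return markup.save_markup(html.escape, quote=False)


m = Markup('<script>alert("x")</script>')
force_escaped(m)
print(m.rendered)  # &lt;script&gt;alert("x")&lt;/script&gt;
```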

James Turk

Feb 24, 2009, 4:42:56 PM

to django-d...@googlegroups.com

Thanks, James Bennett, for the thoughtful reply. It sounds like the general consensus is that the choices feature truly is something the majority of users wouldn't want (and I'm inclined to defer to the wisdom of Marty and James B.: if you guys think it is mostly useless, I'll accept that this particular need is outside the 80%).

I've got an implementation that follows James B.'s last spec almost exactly, although I have a few general questions:

It still seems a little strange to me that the markup_type can be overridden on a per-instance basis but not on a per-field one. I understand the justification that in general most models should not set a default markup type (allowing users to set a site-wide default), but if you are overriding particular instances via the save_markup(formatter) method, is it not reasonable to still have an optional formatter parameter to MarkupField that does the same thing?

In order for my implementation of Markup.save_markup to alter the fields directly, it stores a reference to the instance/field_name/rendered_field_name. Is this the preferred method, or are there suggestions for a better implementation of Markup?

----
class Markup(object):
    def __init__(self, instance, field_name, rendered_field_name):
        self.instance = instance
        self.field_name = field_name
        self.rendered_field_name = rendered_field_name

    def _get_raw(self):
        return self.instance.__dict__[self.field_name]
    def _set_raw(self, val):
        setattr(self.instance, self.field_name, val)
    raw = property(_get_raw, _set_raw)

    def _get_rendered(self):
        return getattr(self.instance, self.rendered_field_name)
    rendered = property(_get_rendered)

    def save_markup(self, formatter, **kwargs):
        if not callable(formatter):
            formatter = load_renderer(formatter)
        # Pass any extra options (e.g. safe_mode) through to the formatter.
        setattr(self.instance, self.rendered_field_name,
                formatter(self.raw, **kwargs))
----

-James
