[GSoC Proposal] Customizable Serialization

Vivek Narayanan

Mar 17, 2011, 3:47:13 AM
to Django developers
Hi,

This is my proposal for the customizable serialization idea:

There are two formats - A formatted Google Docs version that's easy on
the eyes ( https://docs.google.com/a/vivekn.co.cc/document/pub?id=1GMWW42sY8cLZ2XRtVEDA9BQzmsqnCNULzskDMwqSUXI
) and a plain text version that follows.

-------------------------------------------------------------------------------------------------------------------
GSoC Proposal: Customizable Serialization for Django

=======
Synopsis
=======
Django provides a serialization framework that is very useful for
loading and saving fixtures, but not very flexible if one wants to
provide an API for a Django application or use a serialization format
different from what is defined by Django. Also the current handling of
foreign keys and many to many relationships is not really useful
outside the context of fixtures.

I propose a solution to this problem through a class based
serialization framework, that would allow the user to customize the
serialization output to a much greater degree and create new output
formats on the fly. The main features of this framework would be:

1. It will enable the users to specify the serialization model as a
class with configurable field options and methods, similar to Django’s
Models API.
2. Specify new output formats and a greater level of control over
namespaces, tags and key-value mappings in XML, YAML, JSON.
3. Add metadata and unicode conversion methods to model fields
through class methods.
4. Better handling of foreign keys and many-to-many fields with a
custom level of nesting.
5. A permission system to provide varying levels of data access.
6. Backward compatibility to ensure the smooth processing of
database fixtures.

=================
Implementation Details
=================
---------------------------------------
Modes and Configurations
---------------------------------------
I would like to provide building block configurations for XML, YAML
and JSON which the user can customize, which would be based more or
less on the existing skeletal structures in core.serialization and
core.serialization.base. Also there will be a new Serializer
configuration called TextSerializer that can represent any arbitrary
format. I will be providing a ‘fixture’ mode to ensure backward
compatibility and the seamless working of the ``loaddata`` and
``dumpdata`` commands.

---------------------------------------
Adding metadata to a field
---------------------------------------

The user can define methods beginning with “meta_” to add metadata
about each field. And functions starting with “meta2_” can be used to
add metadata at the model level. Here is an example:

    class ExampleSerializer(serializers.Serializer):
        ...

        def meta_foo(self, field):
            '''
            Extract some metadata from field and return it.
            It would be displayed with the attribute ``foo``
            '''

Internally, all data will first be stored in a dict that maps strings
to objects or nested dicts, and will be converted to the desired
format only at the output stage.
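
A minimal sketch of how ``meta_`` methods might be looked up and their
results stored in such an intermediate dict. The names
``SketchSerializer``, ``collect_field_metadata``, ``handle_field`` and
the ``Field`` stand-in are hypothetical and exist only for this
illustration; none of them is part of Django or a settled part of the
proposed API.

```python
from collections import namedtuple

# Hypothetical stand-in for a model field; not Django's Field class.
Field = namedtuple("Field", ["name", "value", "help_text"])


class SketchSerializer(object):
    """Collects the results of ``meta_*`` methods for a field (illustrative)."""

    def meta_foo(self, field):
        # Metadata that would be displayed under the attribute ``foo``.
        return field.help_text

    def collect_field_metadata(self, field, wanted=None):
        """Return {attribute: metadata} for the field.

        ``wanted`` limits the result to the named attributes; None means
        "use every meta_* method defined on the serializer".
        """
        prefix = "meta_"
        result = {}
        for attr in dir(self):
            if not attr.startswith(prefix):
                continue
            key = attr[len(prefix):]
            if wanted is not None and key not in wanted:
                continue
            result[key] = getattr(self, attr)(field)
        return result

    def handle_field(self, field, wanted=None):
        # Intermediate representation: the plain value when there is no
        # metadata, otherwise a dict holding the metadata plus "value".
        metadata = self.collect_field_metadata(field, wanted)
        if not metadata:
            return field.value
        entry = dict(metadata)
        entry["value"] = field.value
        return entry


field = Field(name="title", value="Hello", help_text="bar")
print(SketchSerializer().handle_field(field))
print(SketchSerializer().handle_field(field, wanted=[]))
```

Since ``meta2_`` does not start with ``meta_``, model-level methods
would not be picked up by this field-level lookup.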

In JSON the metadata would be represented inside an object:

"key": {"foo": "bar", "value": value}

instead of

"key": value

In XML, two options would be provided, to represent the metadata as
individual tags or with tag attributes, through a field option in the
class.

    class Serializer(XMLSerializer):
        metadata_display_mode = TAGS # or ATTRIBUTES

The output would be like:

    <field>
        <metadata1>..</metadata1>
        ...
        <Value>Value</Value>
    </field>

OR

    <field name="" md1="" ... > Value </field>
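
To make the two display modes concrete, here is a small sketch built on
the standard library's ``xml.etree.ElementTree``. The ``TAGS`` and
``ATTRIBUTES`` constants and the ``render_field`` helper are
assumptions made for this example, not an existing Django interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical constants mirroring the option named in the proposal.
TAGS = "tags"
ATTRIBUTES = "attributes"


def render_field(name, value, metadata, mode=TAGS):
    """Build an XML element for one field in either display mode."""
    field = ET.Element("field", {"name": name})
    if mode == ATTRIBUTES:
        # Metadata becomes attributes; the value is the element text.
        for key, meta_value in sorted(metadata.items()):
            field.set(key, str(meta_value))
        field.text = str(value)
    else:
        # Metadata becomes child tags; the value gets its own child tag.
        for key, meta_value in sorted(metadata.items()):
            child = ET.SubElement(field, key)
            child.text = str(meta_value)
        value_element = ET.SubElement(field, "value")
        value_element.text = str(value)
    return field


as_tags = render_field("title", "Hello", {"foo": "bar"}, mode=TAGS)
as_attrs = render_field("title", "Hello", {"foo": "bar"}, mode=ATTRIBUTES)
print(ET.tostring(as_tags, encoding="unicode"))
print(ET.tostring(as_attrs, encoding="unicode"))
```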

To select which fields would have which metadata, the arguments should
be passed in the ``serialize()`` method as:

    data = ExampleSerializer.serialize(queryset,
        fields=('field1', ('field2', ['foo'])))

Each field can be specified in two ways:

1. As a string: no metadata will be added.

2. As a 2-element tuple, with the first element a string representing
field name and the second a list of strings representing the metadata
attributes to be applied on that field.
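
The two ways of specifying a field could be reduced to one internal
form before serialization starts. The sketch below assumes a helper
called ``normalize_fields``; the name is hypothetical.

```python
def normalize_fields(fields):
    """Turn the mixed ``fields`` argument into (name, [metadata]) pairs."""
    normalized = []
    for spec in fields:
        if isinstance(spec, str):
            # Plain string: the field carries no metadata.
            normalized.append((spec, []))
        else:
            # 2-element tuple: (field name, list of metadata attributes).
            name, metadata = spec
            normalized.append((name, list(metadata)))
    return normalized


print(normalize_fields(("field1", ("field2", ["foo"]))))
```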

Instead of manually specifying the attributes for each field, the user
can add all metadata functions for all the fields using the
``use_all_metadata`` parameter in ``serialize()`` and setting it to
True.

The existing implementation of ``model.name`` and ``model.pk`` can be
described using “meta2_” functions. These will be provided as
``meta2_name`` and ``meta2_pk`` to facilitate loading and dumping of
fixtures.

---------------------------------------------------
Datatypes and Unicode conversion
---------------------------------------------------

The user can specify the protected types (the types that will be
passed “as is” without any conversion) as a field variable.

The unicode conversion functions for each type can be specified as
methods - “unicode_xxx”, where 'xxx' represents the type name. If no
method is provided for a type, a default conversion function will be
used.

    class Example(Serializer):
        ...
        protected_types = (int, str, type(None), bool)
        ...

        def unicode_tuple(self, object):
            # Do something with the object
            ...
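
One way the lookup of ``unicode_xxx`` methods might work, sketched
without Django. ``ConvertingSerializer`` and ``convert`` are
hypothetical names, and falling back to ``str()`` is only an
assumption about what the default conversion could be.

```python
class ConvertingSerializer(object):
    """Sketch of type-based conversion; names are illustrative only."""

    # Values of these types are passed through unchanged.
    protected_types = (int, str, type(None), bool)

    def unicode_tuple(self, obj):
        # Custom conversion for tuples.
        return "(" + ", ".join(str(item) for item in obj) + ")"

    def convert(self, value):
        if isinstance(value, self.protected_types):
            return value
        # Look for a method named after the value's type, e.g. unicode_tuple.
        method = getattr(self, "unicode_" + type(value).__name__, None)
        if method is not None:
            return method(value)
        # Assumed default conversion when no method is defined.
        return str(value)


serializer = ConvertingSerializer()
print(serializer.convert(3))
print(serializer.convert((1, 2)))
print(serializer.convert(1.5))
```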

-------------------------------------------------
Output formatting and conversion
-------------------------------------------------
The user can specify the format of the output, the grouping of
fields, tags, namespaces, indentation and much more. Here are some
examples:

1. For text based serializers a custom template would be provided:

    class Foobar(TextSerializer):
        field_format = "%(key)s :: { %(value)f, %(meta_d1)s, %(meta_d2)s }"
        # Simple string template; meta_xxx would be replaced by
        # meta_xxx(field), as mentioned above.

        # The three parameters below are required for text mode
        field_separator = ";"
        wrap_begin = "[["  # For external wrapping structure
        wrap_end = "]]"

        indent = 4  # indent by 4 spaces, each level. Default is 0.
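
A rough sketch of how such a template could be applied, using Python's
``%`` string formatting with a dict per field. ``TextTemplate`` and
``render`` are hypothetical names, and the ``indent`` option is left
out to keep the example short.

```python
class TextTemplate(object):
    """Applies a per-field template and wraps the joined result."""

    field_format = "%(key)s :: { %(value)s, %(meta_d1)s }"
    field_separator = ";"
    wrap_begin = "[["
    wrap_end = "]]"

    def render(self, rows):
        # Each row is a dict with the keys used in field_format.
        rendered = [self.field_format % row for row in rows]
        return self.wrap_begin + self.field_separator.join(rendered) + self.wrap_end


rows = [
    {"key": "title", "value": "Hello", "meta_d1": "str"},
    {"key": "pages", "value": 3, "meta_d1": "int"},
]
print(TextTemplate().render(rows))
```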

2. For markup based serializers, users can provide strings for the tag
names of fields, field values and models.

    class XMLFoo(XMLSerializer):
        mode = "xml"
        indent = 2
        metadata_display_mode = TAGS
        # Now all fields will be rendered as <object>...</object>
        field_tag_name = "object"
        model_tag_name = "model"
        # If metadata_display_mode is set to ``TAGS``, this sets the
        # tag name of the value of the model field
        value_tag_name = "value"

3. A class field ``wrap_fields`` will be provided to wrap all fields
of a model into a group, as is done now. If ``wrap_fields`` is set
to “all_fields”, for example, then all the fields would be serialized
inside an object called “all_fields”. If ``wrap_fields`` is not set,
there will be no grouping.

---------------------------------------
Related models and nesting
---------------------------------------

I will replace the current “start_object -> handle_object ->
end_object” sequence with a single method for handling a model, so
that related models can be handled easily using recursion. A
``nesting_depth`` option would be provided to the user as a field
variable. The default value would be 0, as it is currently.
Serializing only specific fields of related models can be done by
using the fields argument in the call to serialize. A related model
would be represented as “Model_name.field_name” instead of just
“field_name”.

Instead of the list - ``_current``, I would be using separate lists
for each level of nesting.
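
The recursion with a depth limit might look roughly like the sketch
below. It uses plain dicts as stand-ins for model instances (a nested
dict with a "pk" key plays the role of a related object), and
``serialize_object`` is a hypothetical name.

```python
def serialize_object(obj, nesting_depth=0):
    """Serialize a dict-based stand-in for a model instance.

    Related objects are embedded while ``nesting_depth`` allows it and
    reduced to their primary key once the limit is reached.
    """
    output = {}
    for name, value in obj.items():
        if isinstance(value, dict):
            if nesting_depth > 0:
                output[name] = serialize_object(value, nesting_depth - 1)
            else:
                output[name] = value["pk"]
        else:
            output[name] = value
    return output


author = {"pk": 7, "name": "Ann"}
book = {"pk": 1, "title": "Hello", "author": author}
print(serialize_object(book))                    # author as a key value
print(serialize_object(book, nesting_depth=1))   # author embedded
```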

---------------------------------------------------------
New features in the serialize() function
---------------------------------------------------------
Apart from the changes I’ve proposed for the ``fields`` argument of
serialize, I would like to add a couple of features:

• An exclude argument, which would be a list of fields to exclude from
the model. It would also contain the fields to exclude in related
models.

• An extras argument, which would allow properties and data returned
by some methods to be serialized.
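
As an illustration of how ``extras`` entries might be resolved, the
sketch below assumes each entry is a string of the form
"Type.attribute_or_method" and that methods take no arguments. The
``Article`` class is a plain-Python stand-in rather than a Django
model, and ``resolve_extras`` is a hypothetical helper.

```python
class Article(object):
    """Plain-Python stand-in for a model (no Django required)."""

    def __init__(self, headline, slug):
        self.headline = headline
        self.slug = slug

    def get_permalink(self):
        return "/articles/%s/" % self.slug


def resolve_extras(obj, extras):
    """Evaluate the "Type.name" entries that apply to ``obj``."""
    resolved = {}
    for entry in extras:
        type_name, attr_name = entry.split(".", 1)
        if type_name != type(obj).__name__:
            continue  # entry targets a different type
        value = getattr(obj, attr_name)
        # Call methods; use plain attributes and properties as they are.
        resolved[attr_name] = value() if callable(value) else value
    return resolved


article = Article("Hi", "hi")
extras = ("Article.get_permalink", "Article.headline", "Author.name")
print(resolve_extras(article, extras))
```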

-----------------------------------
Permission Framework
-----------------------------------
While creating an API, there may arise a need to give varying levels
of access to data to different people. For this I propose a permission
framework, where the user can choose to restrict data to certain
groups while defining a model. I guess a different name should be
used, so that it is not confused with the “Permission” model used in
contrib.auth and contrib.admin. Here’s an example

    class User(models.Model):
        name = CharField(max_length=128)  # No restrictions
        picture_url = URLField(restrict_to=('friends', 'self', 'admins'))
        security_question = CharField(max_length=200,
                                      restrict_to=('self', 'admins'))
        security_answer = CharField(max_length=200,
                                    restrict_to=('admins',))

Here different permission groups like ‘self’, ‘friends’ and ‘admins’
are created as a field option. To use this, specify the
permission_level in the call to serialize

    data = serializers.serialize(queryset, permission_level='friends')

If no permission_level is given, only unrestricted fields will be
serialized.

-----------------------------------------------------------------
Representing the existing serialization model
-----------------------------------------------------------------
Here is an implementation of the existing serialization format in
JSON, this would be the ‘fixture’ mode that I’ve mentioned above.

    class JsonFixture(JSONSerializer):
        wrap_fields = "fields"
        nesting_depth = 0

        def meta2_pk(self, model):
            ...

        def meta2_model(self, model):
            ...

In XML:

    class XMLFixture(XMLSerializer):
        wrap_fields = "fields"
        nesting_depth = 0
        metadata_display_mode = ATTRIBUTES
        indent = 4

        field_tag_name = "field"
        model_tag_name = "object"

        def meta2_pk(self, model):
            ...

        def meta2_model(self, model):
            ...

        def meta_type(self, field):
            ...

===================
Deliverables and Timeline
===================

I would be working for about 40-45 hours each week and I would be
writing tests, exceptions and error messages along with development.
This would more or less be my timeline:

Till May 23

I will familiarize myself with community best practices and the
version control systems used by Django, read the code of all the
relevant modules related to the serialization framework, look at
implementations of other serialization libraries in different
languages and go through all the model and regression tests related to
serialization.

Weeks 1 to 2

I will use this time to set up the basic foundation of the project
by:

1. Writing the skeletal structure of the serializer, based on the
current implementations in core.serialization.base and
core.serialization.python.
2. Setting up basic configurations for JSON, YAML, XML, Text and
creating the fixture mode.
3. Making changes to loaddata and dumpdata in
core.management.commands to ensure backward compatibility.
4. Using a dict as temporary storage before the final ‘dumping’
stage.
5. Modifying the deserializers to handle custom formats of
serialization and specifying the requirements for deserialization.

Week 3

1. Implementation of the metadata methods at field and model level
using getattr and similar methods.
2. Make changes to the fields argument of ``serialize()``.
3. Representation of output formats of the metadata in JSON/YAML,
XML etc.

Week 4

1. Implementation of the unicode and datatype conversion methods in
a way similar to the metadata methods.
2. Providing the user the choice of ‘protected’ types for the
serialization.

Week 5

1. Provide all the configurable options for output formatting as
discussed above.
2. Add support for string templates and their parsing.
3. Parsing the dict used for temporary storage to generate XML and
custom text outputs

Week 6

1. Implement serialization of related models like foreign keys and
many-to-many fields using recursion.
2. Integrate with ``fields`` argument of ``serialize`` and specify
the format for representing a related model.
3. Implement nesting depth limit feature.

Week 7

1. Implement the ``exclude`` feature in serialize() which will
allow the user to choose fields to exclude while serializing.
2. Adding an ``extras`` argument to serialize(), allowing the user
to specify additional properties of a model, which are not field
variables but derivatives of field variables and defined as methods or
properties in the model by the user.

Week 8

1. Implement the permissions framework, which would give varying
levels of access to data to different users based on their permission
level.
2. Integrate with Models API and field options.
3. Add permission_level argument to serialize().

Weeks 9 - 10

1. Write documentation for the project and provide many examples.
2. Write a few tutorials on how to use the framework.
3. Write some project-level tests, do some extensive final testing
and refine the project.

=====
About
=====

I am Vivek Narayanan, a second year undergrad student at the Institute
of Technology, Varanasi majoring in Electronics Engineering. I’m
really passionate about computers and have been programming for the
past 5-6 years. I have some experience in Python, C/C++, Java,
Javascript, HTML, PHP, Haskell and Actionscript. Python is my favorite
language and I’ve been using it for a couple of years. While working
on a web application, I stumbled upon Django a few months back and I
really love its elegant approach to everything.

I have submitted patches to tickets #15299 [1] , #12489 [2] and #8809
[3] on the Django Trac.

Some of the projects I’ve worked on are:

1. blitzx86: An assembler for the Intel X86 Architecture using lex
and yacc. [4]
2. An assembler for the dlx architecture supporting pipeline
optimizations. [5]
3. mapTheGraph! - A location based social networking application on
web and Android platforms.
4. A social networking game based on forex trading which is under
development.
5. pyzok - A python based LAN chat server. [6]
6. aeroFox - A .NET based open source browser for Windows with
transparent windows that managed over 100000 downloads. [7]

I am a fast learner and can grasp new technologies / languages in a
short period of time. Among other things, I enjoy playing tennis and
reading books on a wide variety of topics.

====
Links
====

[1] http://code.djangoproject.com/ticket/15299

[2] http://code.djangoproject.com/ticket/12489

[3] http://code.djangoproject.com/ticket/8809

[4] https://github.com/vivekn/blitz8086

[5] https://github.com/vivekn/dlx-compiler

[6] https://github.com/vivekn/pyzok

[7] http://sourceforge.net/projects/aerofox/files/aerofox/0.4.8.7/

Vivek Narayanan

Mar 20, 2011, 3:48:24 AM
to Django developers
From a previous discussion on this list (
http://groups.google.com/group/django-developers/browse_thread/thread/2da69b9e24cf3438/17d87e3b27d4395d
) I gather that modifying the field options of a Model is not
desirable due to a loss in orthogonality. Here is a modified
permissions framework for serialization.



Permission Framework
=================
While creating an API, there may arise a need to give varying levels
of access to data to different people. For this I propose a permission
framework, where the user can choose to restrict data to certain
groups while defining a model. I guess a different name should be
used, so that it is not confused with the “Permission” model used in
contrib.auth and contrib.admin. Here’s an example

    class User(models.Model):
        name = CharField(max_length=128)
        picture_url = URLField()
        security_question = CharField(max_length=200)
        security_answer = CharField(max_length=200)

        class Meta:
            serialize_permissions = {
                'default': ['name'],
                'admins': ['name', 'picture_url', 'security_question',
                           'security_answer'],
                'friends': ['name', 'picture_url'],
                'self': ['name', 'picture_url', 'security_question'],
            }

Here different permission groups like ‘self’, ‘friends’ and ‘admins’
are created as a field of the Meta class. If no such field is
specified by the user, all fields are included under the default
permission group.

To use this, specify the permission_level in the call to serialize:

    data = serializers.serialize(queryset, permission_level='friends')
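
A minimal sketch of the filtering step, independent of Django. The
``visible_fields`` helper is hypothetical, and falling back to the
'default' group for an unknown permission level is an assumption of
this example rather than something the proposal specifies.

```python
SERIALIZE_PERMISSIONS = {
    "default": ["name"],
    "admins": ["name", "picture_url", "security_question", "security_answer"],
    "friends": ["name", "picture_url"],
    "self": ["name", "picture_url", "security_question"],
}


def visible_fields(record, permissions, permission_level="default"):
    """Keep only the fields allowed for the given permission level."""
    # Assumption: unknown levels are treated like 'default'.
    allowed = permissions.get(permission_level, permissions.get("default", []))
    return {name: value for name, value in record.items() if name in allowed}


user = {
    "name": "Ann",
    "picture_url": "http://example.com/ann.png",
    "security_question": "First pet?",
    "security_answer": "Rex",
}
print(visible_fields(user, SERIALIZE_PERMISSIONS, "friends"))
```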

=================================================================================================

Andrew Godwin

Mar 22, 2011, 5:07:35 PM
to django-d...@googlegroups.com
On 17/03/11 07:47, Vivek Narayanan wrote:
> Hi,
>
> This is my proposal for the customizable serialization idea:

Hi Vivek - sorry about the long reply-wait on this! My initial thoughts
are below.

> The user can define methods beginning with “meta_” to add metadata
> about each field. And functions starting with “meta2_” can be used to
> add metadata at the model level. Here is an example:
>
> ...
>
> The existing implementation of ``model.name`` and ``model.pk`` can be
> described using “meta2_” functions. These will be provided as
> ``meta2_name`` and ``meta2_pk`` to facilitate loading and dumping of
> fixtures.

I'm unclear about what meta2_ accomplishes - is it for things that are
not fields, but still serialisable? Surely there's a better way to go
about this?

> Permission Framework
> =================
> While creating an API, there may arise a need to give varying levels
> of access to data to different people. For this I propose a permission
> framework, where the user can choose to restrict data to certain
> groups while defining a model.

I'm not entirely sure this is something that should be in the same scope
as the main project - adding user permissions into a serialisation
framework feels a bit ugly, especially when it's relatively easy for
people to implement themselves (with the exclude arguments, etc)

> • An extras argument, which would allow properties and data returned
> by some methods to be serialized.

How, exactly? What do you pass in this argument?

> -----------------------------------------------------------------
> Representing the existing serialization model
> -----------------------------------------------------------------
> Here is an implementation of the existing serialization format in
> JSON, this would be the ‘fixture’ mode that I’ve mentioned above.

Presumably you're planning to leave the existing fixture-loading code as
it currently is, given that there's no mention of it here? Are the
customisable serialisers purely for use by other, non-Django
applications in your plan?

> ===================
> Deliverables and Timeline
> ===================
>
> I would be working for about 40-45 hours each week and I would be
> writing tests, exceptions and error messages along with development.
> This would more or less be my timeline:
>

Are you really going to be able to commit 40-45 hours a week? That's a
significant commitment, and more than many full-time jobs (in addition,
I don't see this being 400 man-hours' worth of work - not that that's a
bad thing, we'd rather it was less, as that's a lot of work to commit to)

I also haven't seen any proposals or examples of how I'd use the API as
an end user - are people going to be able to register serialisers to
models (since they're apparently tied to specific models anyway)?

How about if I just want to customise how the serialiser outputs
DateTimeFields, or tell it how to serialise my new, shiny, custom field
- does your proposal have any way to override things on a field type basis?

Those are my initial reactions on the first reading of the proposal with
your change to authentication added in - don't take any criticism too
harshly, we just have to be thorough.

Andrew

Vivek Narayanan

Mar 22, 2011, 11:56:46 PM
to Django developers
> I also haven't seen any proposals or examples of how I'd use the API as
> an end user - are people going to be able to register serialisers to
> models (since they're apparently tied to specific models anyway)?

There will be different types of serializers like JSONSerializer,
YAMLSerializer, XMLSerializer etc. The end users will have to
subclass these, just like creating a new model.
They need not be tied to specific models and can be used as
'generic' serializers; it's ultimately down to the user's choice.


> I'm unclear about what meta2_ accomplishes - is it for things that are
> not fields, but still serialisable? Surely there's a better way to go
> about this?

It is for metadata of whole models (those implemented in the database
as "tables", which go through the start_model -> end_model cycle in
the current implementation of the serializer), that is, a collection
of other fields. I believe the word 'model' is quite ambiguous in this
context. If there is a meta2_name method defined, the output would be
like this.

{
"name": " Output of meta2_name() here ",
... # fields of the model follow
}

> > An extras argument, which would allow properties and data returned
> > by some methods to be serialized.
>
> How, exactly? What do you pass in this argument?

The extras option allows the user to serialize properties of a model
that are not fields. These properties may be almost any standard
Python attribute or method. Say there is a model, Article, defined
like this:

    class Article(models.Model):
        headline = models.CharField(max_length=100,
                                    default='Default headline')
        contents = models.TextField()
        pub_date = models.DateTimeField()
        ...
        def get_permalink(self):
            # return some absolute URL
            ...

Eg: serialize(queryset, extras=('Article.get_permalink',))

It takes an iterable of strings in the format
"Object_Type.method_or_attribute_name".


>
> > -----------------------------------------------------------------
> > Representing the existing serialization model
> > -----------------------------------------------------------------
> > Here is an implementation of the existing serialization format in
> > JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
>
> Presumably you're planning to leave the existing fixture-loading code as
> it currently is, given that there's no mention of it here? Are the
> customisable serialisers purely for use by other, non-Django
> applications in your plan?

I would be leaving most of the existing fixture loading code intact
but add support for deserializing fixtures in an arbitrary format.
These fixtures should contain some minimum required data for
generating the model like content type, required fields etc.


> How about if I just want to customise how the serialiser outputs
> DateTimeFields, or tell it how to serialise my new, shiny, custom field
> - does your proposal have any way to override things on a field type basis?

The data in the fields can be retrieved in the form of a python object
using the to_python method of the field as described here (
http://docs.djangoproject.com/en/dev/howto/custom-model-fields/ ). In
the case of the DateTimeField, it will be a datetime.datetime object.
Now, this is where the 'unicode_' methods come into play. By defining
a method "unicode_datetime_datetime()" in the serialiser, the
serialisation process for DateTimeFields can be overridden. The same
holds for custom fields. If this naming convention based on the
Python type is confusing, I can implement it according to the Django
field type.
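
A short sketch of how a name like "unicode_datetime_datetime" could be
derived from the Python type of a value. ``conversion_method_name``
and ``IsoDateSerializer`` are hypothetical names, and using
``isoformat()`` is just one possible override.

```python
import datetime


def conversion_method_name(value):
    """Build a name such as ``unicode_datetime_datetime`` from the type."""
    cls = type(value)
    qualified = cls.__module__ + "." + cls.__name__
    return "unicode_" + qualified.replace(".", "_")


class IsoDateSerializer(object):
    def unicode_datetime_datetime(self, value):
        # Override for datetime.datetime values.
        return value.isoformat()

    def convert(self, value):
        method = getattr(self, conversion_method_name(value), None)
        # Assumed default when no override exists for the type.
        return method(value) if method is not None else str(value)


stamp = datetime.datetime(2011, 3, 22, 23, 56)
print(conversion_method_name(stamp))
print(IsoDateSerializer().convert(stamp))
```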

Russell Keith-Magee

Mar 24, 2011, 8:43:07 AM
to django-d...@googlegroups.com

What if you need to support both? e.g.,

<field foo="the foo value">
    <bar>the bar value</bar>
</field>

It seems to me that you would be better served providing a way to
annotate each individual metadata value as (and I'm bikeshedding a
name here) 'major' or 'minor'. JSON would render all metadata as
key-values, and XML can make the distinction and render minor metadata
as attributes, and major metadata as tags.

I think I see where you're going here. However, I'm not sure it
captures the entire problem.

Part of the problem with the existing serializers is that they don't
account for the fact that there's actually two subproblems to
serialization:

1) How do I output the value of a specific field
2) What is the gross structure of an object, which is a collection of
fields plus, plus metadata about an object, plus

So, for a single object the JSON serializer currently outputs:

{
    "pk": 1,
    "model": "myapp.mymodel",
    "fields": {
        "foo": "foo value",
        "bar": "bar value"
    }
}

Implicit in this format is a bunch of assumptions:

* That the primary key should be rendered in a different way to the
rest of the fields
* That I actually want to include model metadata like the model name
* That the list of fields is an embedded structure rather than a list
of top-level attributes.
* That I want to include all the fields on the model
* That I don't have any non-model or computed metadata that I want to include

When you start dealing with the XML serializer, you have all these
problems and more (because you have the attribute/tag distinction for
each of these decisions, too -- for example, I may want some fields to
be rendered as attributes, and some as tags).

When you start dealing with foreign keys and m2m, you have an
additional set of assumptions --

* How far should I traverse relations?
* Do I traverse reverse relations?
* How do I represent traversed objects? As FK values? As embedded objects?
* If they're embedded objects, how do I represent *their* traversed values?
* What happens with circular relations?
* If I have two foreign keys on the same model, are they both
serialized the same way?

And so on.

There are some promising aspects to your proposal -- for example, the
datatype conversion and field output ideas seem sound (although as
Andrew noted, they may need a little more elaboration with regards to
non-simple datatypes -- especially datetimes and Geo values).

However, I'm not sure you've fully captured the gross serialization
structure problem. This is the real driving reason for introducing a
broader serialization framework -- to give complete flexibility of the
serialization process to the end user.

> ---------------------------------------------------------
> New features in the serialize() function
> ---------------------------------------------------------
> Apart from the changes I’ve proposed for the ``fields`` argument of
> serialize, I would like to add a couple of features:
>
> • An exclude argument, which would be a list of fields to exclude from
> the model, this would also contain the fields to exclude in related
> models.
>
> • An extras argument, which would allow properties and data returned
> by some methods to be serialized.

For me, the goal should be to deprecate these sorts of arguments. The
decision to include (or exclude) a particular field is a feature of
serialization that is intimately tied to the serialization format, not
something that is an external argument.

> -----------------------------------
> Permission Framework
> -----------------------------------

I'm not sure I see the value in this bit -- at least, not as a
baked-in feature of the serialization framework. A serialization
format encompasses "what should I output"; if you've defined a
sufficiently flexible framework, it should be possible to introduce
permission checks without needing to embed them into the base
serialization framework -- they should just be a set of specific
decisions made by a specific serializer.

In fact -- this may be a good test of your proposed API: Could a third
party write a serializer that prohibited serialization of certain
attributes, or modified the serialization of certain attributes, based
on a check of Django's permissions? Personally, I don't see this as a
core requirement, but demonstrating that it is possible in principle
would be a compelling argument for your API.

> -----------------------------------------------------------------
> Representing the existing serialization model
> -----------------------------------------------------------------
> Here is an implementation of the existing serialization format in
> JSON, this would be the ‘fixture’ mode that I’ve mentioned above.

I think these examples demonstrate what I said earlier -- your
proposed framework allows me to customize the name given to a field in
XML, but doesn't allow me to change the parent of that field within
the broader XML structure.

> ===================
> Deliverables and Timeline
> ===================
>
> I would be working for about 40-45 hours each week and I would be
> writing tests, exceptions and error messages along with development.
> This would more or less be my timeline:

Broadly, this timeline looks like a good start. It certainly
provides enough detail to demonstrate that you've thought about your
project and its needs and dependencies.

One suggestion -- what isn't clear from this timeline is when we will
start to see concrete evidence of your progress. From a broad project
management perspective, it would be good to see some concrete
deliverables in your timeline -- e.g., at the end of week 2, it will
be possible to serialize a simple object with integer and string
attributes into a configured JSON structure; by week 4, it will be
possible to use the same structure with XML; and so on.

So -- in summary -- this is a promising start. You've clearly given
the problem some serious thought, but some more serious thought is
needed. I look forward to seeing what you can do with the next
iteration.

Yours,
Russ Magee %-)

Vivek Narayanan

Mar 25, 2011, 6:03:41 AM
to Django developers
Hi Russ,
Thanks for the long reply and all the suggestions. My comments are
inline.

> What if you need to support both? e.g.,
>
> <field foo="the foo value">
>     <bar>the bar value</bar>
> </field>
>
> It seems to me that you would be better served providing a way to
> annotate each individual metadata value as (and I'm bikeshedding a
> name here) 'major' or 'minor'. JSON would render all metadata as
> key-values, and XML can make the distinction and render minor metadata
> as attributes, and major metadata as tags.
>

I think that's a great idea. This can be implemented with decorators
on the methods, like @tag or @attribute, with one of them acting as
the default when no decorator is applied.
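
The decorator idea might be sketched as follows. The ``tag`` and
``attribute`` decorators, the ``xml_style`` marker they set, and the
choice of "attribute" as the default are all assumptions made for this
example.

```python
def tag(method):
    """Mark a metadata method so its value is rendered as a child tag."""
    method.xml_style = "tag"
    return method


def attribute(method):
    """Mark a metadata method so its value is rendered as an attribute."""
    method.xml_style = "attribute"
    return method


DEFAULT_STYLE = "attribute"  # assumed default for undecorated methods


class DecoratedSerializer(object):
    @tag
    def meta_bar(self, field):
        return "the bar value"

    @attribute
    def meta_foo(self, field):
        return "the foo value"

    def meta_baz(self, field):
        return "undecorated"

    def split_metadata(self, field):
        """Return (tags, attributes) dicts for one field."""
        tags, attributes = {}, {}
        for attr in dir(self):
            if not attr.startswith("meta_"):
                continue
            method = getattr(self, attr)
            style = getattr(method, "xml_style", DEFAULT_STYLE)
            target = tags if style == "tag" else attributes
            target[attr[len("meta_"):]] = method(field)
        return tags, attributes


tags, attributes = DecoratedSerializer().split_metadata(field=None)
print(tags)
print(attributes)
```

A JSON serializer could simply merge both dicts, while an XML
serializer would render the first as tags and the second as
attributes.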

> I think I see where you're going here. However, I'm not sure it
> captures the entire problem.
>
> Part of the problem with the existing serializers is that they don't
> account for the fact that there's actually two subproblems to
> serialization:
>
>  1) How do I output the value of a specific field
>  2) What is the gross structure of an object, which is a collection of
> fields plus, plus metadata about an object, plus
>
> So, for a single object the JSON serializer currently outputs:
>
> {
>     "pk": 1,
>     "model": "myapp.mymodel",
>     "fields": {
>         "foo": "foo value",
>         "bar": "bar value"
>     }
>
> }
>
> Implicit in this format is a bunch of assumptions:
>
>  * That the primary key should be rendered in a different way to the
> rest of the fields
>  * That I actually want to include model metadata like the model name
>  * That the list of fields is an embedded structure rather than a list
> of top-level attributes.
>  * That I want to include all the fields on the model
>  * That I don't have any non-model or computed metadata that I want to include

I believe that my model of using a recursive method and storing
temporary data in 'levels' would address most of these concerns. The
method for handling a model would consist of the following steps,
roughly:

* Get the list of fields to be serialized.
* Now serialize each field, after checking for circular references
(see below), to a temporary Python object (using handle_field and
applying all metadata, formatting options etc.), most probably as a
key-value pair in a dict.
* If an FK or M2M is encountered, check for nesting restrictions
and then recursively apply the handle_model method.
* Add model level metadata, formatting options.
* Process reverse relations, if required. (see below)
* Store the model in some container in the serializer object.
* Clear temp data in the current 'level'.

Finally, when all models are processed, dump the data from the
container into the required format.

> When you start dealing with foreign keys and m2m, you have an
> additional set of assumptions --
>
>  * How far should I traverse relations?

The user can specify a limit to the levels of nesting through
variable ``max_nesting_depth``.

>  * Do I traverse reverse relations?

In my opinion, traversing reverse relations can get really ugly at
times, especially when there are M2M fields, foreign keys or circular
relations involved. But there are some scenarios where the data is in
a relatively simpler format and serializing them would be useful. To
support this, I thought of something like this:

class Srz(Serializer):
    ...
    reverse_relations = [ (from_model_type, to_model_type), ... ]

But this should be used with caution and avoided when possible.

>  * How do I represent traversed objects? As FK values? As embedded objects?

As embedded objects. If the nesting depth limit is reached, then as FK
values.

>  * If they're embedded objects, how do I represent *their* traversed values?

Their traversed values would be represented just as a normal model
would be, with field-value mappings. The user can choose which fields
to dump.

>  * What happens with circular relations?

For all model type objects, like the base model in the query set and
all FK and M2M fields, some uniquely identifying data (like the
primary key, content type) will be stored in a list as each one of
them is processed. Before a model is serialized, the list is checked
for it. If it is already there, it is a circular reference and that
model is ignored.
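
A rough sketch of that check, assuming objects are plain dicts with
``model``, ``pk`` and ``fields`` keys rather than real Django models.
The function name ``dump`` and the dict layout are hypothetical.

```python
def dump(obj, seen=None):
    """Serialize a dict-based object graph, skipping objects already seen."""
    seen = set() if seen is None else seen
    key = (obj["model"], obj["pk"])  # uniquely identifying data
    if key in seen:
        return None  # already processed: treated as a circular reference
    seen.add(key)

    out = {"pk": obj["pk"], "model": obj["model"]}
    for name, value in obj["fields"].items():
        if isinstance(value, dict) and "model" in value:
            nested = dump(value, seen)
            if nested is not None:  # ignored objects are simply left out
                out[name] = nested
        else:
            out[name] = value
    return out


# Two people who reference each other.
alice = {"model": "app.person", "pk": 1, "fields": {"name": "Alice"}}
bob = {"model": "app.person", "pk": 2, "fields": {"name": "Bob", "friend": alice}}
alice["fields"]["friend"] = bob

result = dump(alice)
```

One caveat with a single shared list: an object that is legitimately
referenced twice without any cycle is also dropped the second time.
Tracking only the objects on the current path would avoid that.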

>  * If I have two foreign keys on the same model, are they both
> serialized the same way?

Yes.


> When you start dealing with the XML serializer, you have all these
> problems and more (because you have the attribute/tag distinction for
> each of these decisions, too -- for example, I may want some fields to
> be rendered as attributes, and some as tags.
>

For XML, I thought of using an intermediary container for a node that
would store all these details.

> > ---------------------------------------------------------
> > New features in the serialize() function
> > ---------------------------------------------------------
> > Apart from the changes I’ve proposed for the ``fields`` argument of
> > serialize, I would like to add a couple of features:
>
> > • An exclude argument, which would be a list of fields to exclude from
> > the model, this would also contain the fields to exclude in related
> > models.
>
> > • An extras argument, which would allow properties and data returned
> > by some methods to be serialized.
>
> For me, the goal should be to deprecate these sorts of arguments. The
> decision to include (or exclude) a particular field is a feature of
> serialization that is intimately tied to the serialization format, not
> something that is an external argument.
>
Initially, I thought the goal was not to tie a serializer down to any
model. In that case, I can integrate these features into the
serializer class.

> > -----------------------------------
> > Permission Framework
> > -----------------------------------
>
> I'm not sure I see the value in this bit -- at least, not as a
> baked-in feature of the serialization framework. A serialization
> format encompasses "what should I output"; if you've defined a
> sufficiently flexible framework, it should be possible to introduce
> permission checks without needing to embed them into the base
> serialization framework -- they should just be a set of specific
> decisions made by a specific serializer.
>
> In fact -- this may be a good test of your proposed API: Could a third
> party write a serializer that prohibited serialization of certain
> attributes, or modified the serialization of certain attributes, based
> on a check of Django's permissions? Personally, I don't see this as a
> core requirement, but demonstrating that it is possible in principle
> would be a compelling argument for your API.
>
I need to take a deeper look at things before I can comment on this.
On the surface, though, it looks possible by binding the output to
views.
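
As a rough illustration of what such a third-party serializer could
look like, here is a sketch under heavy assumptions. ``FieldSerializer``
and its ``include()`` hook are hypothetical and not part of the
proposed API, the permission codename is made up, and ``StubUser`` only
mimics the ``has_perm()`` method of Django's user objects so that the
example runs without Django.

```python
from types import SimpleNamespace


class FieldSerializer:
    """Hypothetical base: dumps the listed attributes of an object to a dict."""

    fields = ()

    def include(self, name):
        # Subclasses decide per field whether it is serialized.
        return True

    def dump_object(self, obj):
        return {n: getattr(obj, n) for n in self.fields if self.include(n)}


class PermissionSerializer(FieldSerializer):
    fields = ("title", "royalties")
    # Hypothetical mapping: field name -> permission required to see it.
    required = {"royalties": "library.view_royalties"}

    def __init__(self, user):
        self.user = user

    def include(self, name):
        perm = self.required.get(name)
        return perm is None or self.user.has_perm(perm)


class StubUser:
    """Stand-in for a Django user; only has_perm() is modelled."""

    def __init__(self, perms=()):
        self.perms = set(perms)

    def has_perm(self, perm):
        return perm in self.perms


book = SimpleNamespace(title="Django for Dummies", royalties=10)
public = PermissionSerializer(StubUser()).dump_object(book)
private = PermissionSerializer(
    StubUser(["library.view_royalties"])
).dump_object(book)
```

The permission check lives entirely in the subclass, so the base
serializer needs no knowledge of permissions.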

> > -----------------------------------------------------------------
> > Representing the existing serialization model
> > -----------------------------------------------------------------
> > Here is an implementation of the existing serialization format in
> > JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
>
> I think these examples demonstrate what I said earlier -- your
> proposed framework allows me to customize the name given to a field in
> XML, but doesn't allow me to change the parent of that field within
> the broader XML structure.
>

I'm not sure that I follow this; it would be great if you could give
an example. As I mentioned earlier, there will be options to
'flatten' the nested models, provide alternate names for the fields,
and wrap fields into a group. Initially I had thought of adding these
to the external ``fields`` argument in ``serialize()``, but I can add
them to the specification object too.

> One suggestion -- what isn't clear from this timeline is when we will
> start to see concrete evidence of your progress. From a broad project
> management perspective, it would be good to see some concrete
> deliverables in your timeline -- e.g., at the end of week 2, it will
> be possible to serialize a simple object with integer and string
> attributes into a configured JSON structure; by week 4, it will be
> possible to use the same structure with XML; and so on.
>

I will restructure the timeline keeping this and all the other changes
you've suggested in mind and update it soon.

Russell Keith-Magee

Mar 30, 2011, 4:08:24 AM3/30/11
to django-d...@googlegroups.com

For a suitably relaxed definition of "field". Remember, serialized
data doesn't necessarily have to come from the model -- it could come
from a related model, or be a constant, or be a computed field, or
many other options.

>> When you start dealing with foreign keys and m2m, you have an
>> additional set of assumptions --
>>
>>  * How far should I traverse relations?
>
> The user can specify a limit to the levels of nesting through
> variable ``max_nesting_depth``.

A simple "nesting depth" approach won't work. You really need to
handle this on a per-model basis; Mode

It might be possible to automate some of this with a simple nesting
depth definition, but there will always be a need to define the exact
rollout of a tree of serialization options.

This is also a case where being explicit makes your life easier. If
you stop looking at "depth" as a single number specified at the top of
the tree, it becomes a lot easier to handle recursive or

>>  * Do I traverse reverse relations?
>
> In my opinion, traversing reverse relations can get really ugly at
> times, especially when there are M2M fields, foreign keys or circular
> relations involved. But there are some scenarios where the data is in
> a relatively simpler format and serializing them would be useful. To
> support this, I thought of something like this:
>
> class Srz(Serializer):
>   ...
>   reverse_relations = [ (from_model_type, to_model_type), ... ]
>
> But this should be used with caution and avoided when possible.
>
>>  * How do I represent traversed objects? As FK values? As embedded objects?
>
> As embedded objects, if the nesting depth limit is reached, then as FK
> values.

My point is that this is a serialization option. You're dictating a
policy here, rather than allowing it to be a configuration option.

>>  * If they're embedded objects, how do I represent *their* traversed values?
>
> Their traversed values would be represented just as a normal model
> would be, with field-value mappings. The user can choose which fields
> to dump.

Again -- you're dictating a policy, not allowing the user to define one.

>>  * What happens with circular relations?
>
> For all model type objects, like the base model in the query set and
> all FK and M2M fields, some uniquely identifying data (like the
> primary key, content type) will be stored in a list as each one of
> them is processed. Before serializing a model, it would be checked if
> the model is already on the list or not. If it is there, it is a
> circular reference and that model would be ignored .
>
>>  * If I have two foreign keys on the same model, are they both
>> serialized the same way?
>
> Yes.

Why should this be the case? Again, you are dictating policy, not
allowing policy to be defined.

>> When you start dealing with the XML serializer, you have all these
>> problems and more (because you have the attribute/tag distinction for
>> each of these decisions, too -- for example, I may want some fields to
>> be rendered as attributes, and some as tags.
>>
>
> For XML, I thought of using an intermediary container for a node that
> would store all these details.
>
>> > ---------------------------------------------------------
>> > New features in the serialize() function
>> > ---------------------------------------------------------
>> > Apart from the changes I’ve proposed for the ``fields`` argument of
>> > serialize, I would like to add a couple of features:
>>
>> > • An exclude argument, which would be a list of fields to exclude from
>> > the model, this would also contain the fields to exclude in related
>> > models.
>>
>> > • An extras argument, which would allow properties and data returned
>> > by some methods to be serialized.
>>
>> For me, the goal should be to deprecate these sorts of arguments. The
>> decision to include (or exclude) a particular field is a feature of
>> serialization that is intimately tied to the serialization format, not
>> something that is an external argument.
>>
> Initially, I thought the goal was not to tie down a serializer to any
> model, I can integrate these features into the serializer class then.

My point is that it should be *possible* to define a "generic"
serialization strategy -- after all, that's what Django does right
now. If arguments like this do exist, they should essentially be
arguments used to instantiate a specific serialization strategy,
rather than something baked into the serialization API.

>> > -----------------------------------------------------------------
>> > Representing the existing serialization model
>> > -----------------------------------------------------------------
>> > Here is an implementation of the existing serialization format in
>> > JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
>>
>> I think these examples demonstrate what I said earlier -- your
>> proposed framework allows me to customize the name given to a field in
>> XML, but doesn't allow me to change the parent of that field within
>> the broader XML structure.
>
> I'm not sure that I follow this, It would be great if you could give
> an example. As I mentioned earlier, there will be an option to
> 'flatten' the nested models , provide alternate names to the fields,
> and wrap fields into a group. Initially I had thought of adding this
> to the external ``fields`` argument in ``serialize()``, but I can add
> them to the specification object too.

The recurring theme in my comments is that you are dictating policy,
not allowing users to define policy. We already have the former; we
need the latter.

Here's an example of the problem as I see it. Consider the
serialization of a "book" model.

Option 1: Django's current serializer:

{
    "pk": 1,
    "model": "library.book",
    "fields": {
        "authors": [3, 4],
        "editor": 5,
        "title": "Django for Dummies"
    }
}

No real surprises here.

Option 2: the format required for a particular book publishing API

{
    "book details": {
        "book title": "Django for dummies",
        "authors": [
            "John Smith",
            "Bob Jones"
        ]
    },
    "author_count": 2,
    "editor": {
        "firstname": "Alice",
        "lastname": "Watson",
        "coworkers": [1, 5],
        "contact_phone": "98761234",
        "company": {
            "name": "Mega publishing corp",
            "founded": 2010
        }
    }
}

Notable features:
* Authors and editor are both "Person" objects, but
   - We need to serialize editors in detail, including recursive calls
   - Authors are serialized using their combined first+last name
* "book title" is a rename of the native model field
* "author_count" isn't on the model at all.
* "book details" doesn't reflect any aspect of model structure --
  it's entirely decoration.

Now - show me how both of these serializers are defined using your
proposed API.

Yours,
Russ Magee %-)

Russell Keith-Magee

Mar 30, 2011, 4:21:40 AM3/30/11
to django-d...@googlegroups.com
On Wed, Mar 30, 2011 at 4:08 PM, Russell Keith-Magee
<rus...@keith-magee.com> wrote:
> On Fri, Mar 25, 2011 at 6:03 PM, Vivek Narayanan <ma...@vivekn.co.cc> wrote:
>>> When you start dealing with foreign keys and m2m, you have an
>>> additional set of assumptions --
>>>
>>>  * How far should I traverse relations?
>>
>> The user can specify a limit to the levels of nesting through
>> variable ``max_nesting_depth``.
>
> A simple "nesting depth" approach won't work. You really need to
> handle this on a per-model basis; Mode
>
> It might be possible to automate some of this with a simple nesting
> depth definition, but there will always be a need to define the exact
> rollout of a tree of serialization options.
>
> This is also a case where being explicit makes your life easier. If
> you stop looking at "depth" as a single number specified at the top of
> the tree, it becomes a lot easier to handle recursive or

Damn... pressed send before I finished editing my thoughts here.

A simple "nesting depth" approach won't work. You really need to be
able to handle this on a per-model basis.

It might be possible to automate some of this with a simple nesting
depth definition, but there will always be a need to define the exact
rollout of a tree of serialization options.

This is also a case where being explicit makes your life easier.

Resolving loops and cycles on a tree of arbitrary depth is hard;
resolving loops and cycles independently on each level of a
serialization structure is much easier.
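
To illustrate the point, here is a small sketch in which the rollout is
spelled out as an explicit tree instead of a single depth number. The
``render`` and ``flat`` functions and the spec layout (``None`` for "emit
a flat value", a nested dict for "embed this relation using these
options") are hypothetical, and objects are plain dicts. Because the
spec tree is finite, rendering terminates even when the data itself
contains a cycle.

```python
def flat(value):
    """Reduce related objects to their primary keys; pass plain values through."""
    if isinstance(value, dict):
        return value["pk"]
    if isinstance(value, list):
        return [flat(v) for v in value]
    return value


def render(obj, spec):
    """Render obj according to an explicit, finite tree of options."""
    out = {}
    for name, sub in spec.items():
        value = obj[name]
        if sub is None:
            out[name] = flat(value)
        elif isinstance(value, list):
            # To-many relation: apply the sub-spec to each related object.
            out[name] = [render(v, sub) for v in value]
        else:
            out[name] = render(value, sub)
    return out


# Alice and Bob are each other's coworkers, so the data is cyclic.
alice = {"pk": 5, "firstname": "Alice", "coworkers": []}
bob = {"pk": 1, "firstname": "Bob", "coworkers": [alice]}
alice["coworkers"].append(bob)
book = {"pk": 1, "title": "Django for Dummies", "editor": alice}

spec = {
    "title": None,
    "editor": {"firstname": None, "coworkers": {"firstname": None}},
}
result = render(book, spec)
```

Each level states exactly what happens to its own relations, so no
global cycle bookkeeping is needed.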

Yours,
Russ Magee %-)

Vivek Narayanan

Mar 31, 2011, 7:38:19 AM3/31/11
to Django developers
Hi Russ,

Thanks for the suggestions once again. I've thought of changing the
model for handling nested fields.

Each model can have a number of serializers, and they can be plugged
into other serializers. In this way nested models can be handled
without cycling through a tree of arbitrary depth. Each serializer
object would have a ``dump_object()`` method that returns a Python
object. This method is called when a serializer is plugged into
another in a nested model.
>My point is that it should be *possible* to define a "generic"
>serialization strategy -- after all, that's what Django does right
>now. If arguments like this do exist, they should essentially be
>arguments used to instantiate a specific serialization strategy,
>rather than something baked into the serialization API.

Yes, I'll be adding arguments ``attrs`` and ``exclude`` to the
__init__ method of the serializer.

Here is how the API can be used for generating the formats in those 2
examples:

"""
For the first option.
"""

class OriginalOutput(JSONSerializer):
    wrap_fields = "fields"
    indent = 4

    def meta2_pk(self, obj):
        # get pk
        pass

    def meta2_model(self, obj):
        # get model name
        pass

"""
The second option.
"""

class Base(JSONSerializer):
    """
    A base serializer with formatting options.
    """
    indent = 4


class Editor(Base):
    """
    This can be used to serialize a Person model.
    Note the company field, if no such Serializer object is specified,
    the foreign key is returned.

    By specifying the ``attrs`` argument while initializing the
    object, one can restrict the fields being serialized. Otherwise,
    all available fields and metadata will be serialized.
    """
    company = Company(from_field = 'company', attrs = ['name', 'founded'])


class Author(Base):
    """
    The dump_object method is called when a serializer is plugged into
    another and returns a python object, this is normally a dict but
    this can be overridden.
    """
    def dump_object(self, obj):
        return obj.first_name + ' ' + obj.last_name


class BookDetails(Base):
    """
    This is the serializer that will yield the final output.

    Since the 'authors' field is M2M, the Author serializer will be
    applied over the list of authors.

    Aliases is a dict that maps attribute names to their labels during
    serialization.
    """
    wrap_all = "book details"
    authors = Author(from_field = 'authors')
    editor = Editor(from_field = 'editor', attrs = ['firstname',
        'lastname', 'coworkers', 'phone', 'company'])
    aliases = {
        'title': 'book_title',
        'acount': 'author_count'
    }

    def meta2_acount(self, obj):
        # return count of authors
        pass


The serialized data can be obtained by calling BookDetails.serialize()
or OriginalOutput.serialize().

Some of the options I didn't cover in the above example are:

* A ``fieldset`` option, which is a list of fields to be serialized.
If it is not set, all fields are part of the serializer. This can be
additionally restricted during initialization as shown above.

* Reverse relations can be used in the same way as other pluggable
serializers by specifying the ``from_field`` argument as the related
name during initialization.

I hope this will add more flexibility.

Russell Keith-Magee

Apr 5, 2011, 10:49:33 AM4/5/11
to django-d...@googlegroups.com

So, by my count, here's a list of your implicit assumptions in this
syntactical expression:

* the fields are embedded in a substructure within the main model serialization
* ... and this is something that is sufficiently common that it
deserves first-class representation in your serializer syntax
* The pk is serialized as a top-level attribute
* ... and it's serialized using the attribute name 'pk'
* The model name is serialized as a top-level attribute
* ... and it's serialized using the attribute name 'model'
* indentation is something that has been defined as an attribute of
the serializer, rather than as a configurable item decided at
rendering time (like it is currently)

There are also a bunch of implicit rules about the ways foreign keys,
m2ms and so on are rendered. These aren't so concerning, since there
needs to be some defaults if the syntax is going to be even remotely
terse. However, it bears repeating that they are implicit.

> """
> The second option.
> """
>
> class Base(JSONSerializer):
>    """
>    A base serializer with formatting options.
>    """
>    indent = 4
>
> class Editor(Base):
>    """
>    This can be used to serialize a Person model.
>    Note the company field, if no such Serializer object is specified,
> the foreign key is returned.
>
>    By specifying the ``attrs`` argument while initializing the
> object, one can restrict the fields
>    being serialized. Otherwise, all available fields and metadata
> will be serialized.
>    """
>    company = Company(from_field = 'company', attrs = ['name',
> 'founded'])

ok - so:
* Where is "Company" defined?
* Why aren't the attributes of Company embedded one level deep?
* Why is the attribute named company, and then an argument is passed
to the Company class with the value of 'company'?

> class Author(Base):
>    """
>    The dump_object method is called when a serializer is plugged into
> another and returns a python object,
>    this is normally a dict but this can be overridden.
>    """
>    def dump_object(self, obj):
>        return obj.first_name + ' ' + obj.last_name

* Why is there a difference between a serializer that returns a list,
a serializer that returns a dictionary, and a serializer that returns
a string?

I'm sorry, but this seems extremely confused to me.

You have fields with names like "wrap_all" and "wrap_fields". These
are presumably fixed names that will be interpreted by the parser in a
particular way.

You have fields like "authors" and "editor" in the same namespace that
correspond to model attributes.

In the same namespace, you also have functions like meta2_acount --
names derived from model attributes.

You also have functions like dump_object().

So - four different types of attribute, all in the same namespace, and
all with different behaviors. What happens with name collisions (e.g.,
a model field named "aliases")? What precedence rules exist?

This proposal had a promising start, but to me, it seems to have gone
massively off the rails. At its core, serialization should be a very
simple process:

* I have an object.
* That object has attributes.
* Turn that list of attributes into a list of serialized properties
* Each of those serialized properties may be:
   - a flat value
   - a nested value whose structure is determined by the object itself
   - a nested value whose structure is determined by a related object.

And that's it. There are some minor complications in dealing with the
fact that XML has two different ways of rendering attributes, but that
should be a fairly minor extension of the "get me a list of
attributes" use case. I don't see the mapping between the examples
you've given, and this basic set of principles.
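
As a rough sketch of that basic set of principles, and not of any
agreed API, a serializer can be reduced to a list of (output name,
getter) pairs, where each getter returns a flat value, a nested value
built from the object itself, or a nested value built by another
property list for a related object. All names here are hypothetical,
and ``SimpleNamespace`` objects stand in for model instances.

```python
from types import SimpleNamespace as NS


def serialize(obj, properties):
    """Turn an object into a dict using a list of (name, getter) pairs."""
    return {name: getter(obj) for name, getter in properties}


company_props = [
    ("name", lambda c: c.name),        # flat values
    ("founded", lambda c: c.founded),
]

editor_props = [
    ("firstname", lambda p: p.first_name),
    ("lastname", lambda p: p.last_name),
    ("coworkers", lambda p: [c.pk for c in p.coworkers]),
    ("contact_phone", lambda p: p.phone),
    # Nested value whose structure is determined by a related object.
    ("company", lambda p: serialize(p.company, company_props)),
]

book_props = [
    # Nested value whose structure is determined by the object itself.
    ("book details", lambda b: {
        "book title": b.title,
        "authors": [f"{a.first_name} {a.last_name}" for a in b.authors],
    }),
    ("author_count", lambda b: len(b.authors)),
    ("editor", lambda b: serialize(b.editor, editor_props)),
]

company = NS(name="Mega publishing corp", founded=2010)
alice = NS(first_name="Alice", last_name="Watson", phone="98761234",
           coworkers=[NS(pk=1), NS(pk=5)], company=company)
book = NS(title="Django for dummies", editor=alice,
          authors=[NS(first_name="John", last_name="Smith"),
                   NS(first_name="Bob", last_name="Jones")])

result = serialize(book, book_props)
```

With this data, ``result`` has the shape of the "Option 2" format from
earlier in the thread. Authors and the editor are both person-like
objects but are rendered by different getters.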

It feels to me like you're looking at a specific example that I've
given you, and every time you've found an exception, you've invented a
new keyword or access mechanism to resolve that use case. What I don't
see is a coherent whole -- a thread of similarity in the way you've
approached the problem.

The deadline for GSoC applications is rapidly approaching; while we
don't need to see an absolutely final API proposal, we do at least
need to see promise that you're moving in the right direction. If you
want to pursue this project, I suggest you take another pass at the
test case I gave you, and try to dramatically simplify your proposed
