Customizable Serialization


Vivek Narayanan

Feb 24, 2011, 9:24:56 AM
to Django developers
Hi,

I am Vivek Narayanan, an undergrad student at IIT Varanasi in India,
and I am interested in participating in this year's SoC.

Problem
------------
Django provides a serialization framework that is very useful for
loading and saving fixtures, but not very flexible if one wants to
provide an API for a Django application or use a serialization format
different from what is defined by Django. Also, the current handling of
foreign keys and many-to-many relationships is not really useful
outside the context of fixtures.

Solution
------------
I propose a class based Serializer, extending/refactoring the current
base class which would be entirely configurable by the user. I would
also like to provide some configurations for XML and JSON/YAML, that
can serve as building blocks for writing other serializers. The class
would have the following configurable options:

• A basic structure, markup or template representing each field of an
object, and also representing the object itself.

• An external wrapping structure, a structure for an array, delimiters
and assignment symbols.

• Arbitrary nesting depth, and a choice of which fields of related
models are to be represented. This nesting can be handled by
recursion. The data about related models can be extracted in the
start_object or end_object methods.

• Choosing which datatypes to dump as is, which ones to convert to
something else. The user can provide a mapping between types in the
form of a dict and conversion functions when needed.

• Adding metadata about each field, like data-type or content-length;
this can be represented as additional attributes in a tag in XML or
arrays in a key-value representation. This can be done by adding some
class methods.

• The level of indentation.

The idea is to keep the serialized data as a Python list (or a list of
dicts) until the final 'dumping' stage. This way we can still use
existing libraries like SimpleXML, SimpleJSON etc. This 'dumping'
method would be overridable. The user would have to choose between
using a standard library and specifying a format. I would also like to
add a couple of features:

• An exclude argument: a list of fields to exclude from the model,
which could also name the fields to exclude in related models. I would
like to extend the fields argument in the same way.

• An extras argument, which would allow properties and data returned
by some methods to be serialized.
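A minimal sketch of the intermediate list-of-dicts stage with the proposed ``exclude`` and ``extras`` arguments, followed by an overridable dump step. ``Article``, ``to_pivot`` and ``dump_json`` are hypothetical names; the standard ``json`` module stands in for whichever library does the final dumping, and excluding fields of related models is not covered here.

```python
import json


class Article:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, pk, title, body):
        self.pk = pk
        self.title = title
        self.body = body

    @property
    def word_count(self):
        return len(self.body.split())

    def shout(self):
        return self.title.upper()


def to_pivot(obj, fields, exclude=(), extras=()):
    """Build the intermediate dict for one object."""
    data = {name: getattr(obj, name) for name in fields if name not in exclude}
    for name in extras:
        value = getattr(obj, name)
        # Properties are already evaluated by getattr; methods still need a call.
        data[name] = value() if callable(value) else value
    return data


def dump_json(pivot_list):
    """The overridable final 'dumping' stage; here it just delegates to json."""
    return json.dumps(pivot_list)


articles = [Article(1, "Hello", "a b c"), Article(2, "World", "d e")]
pivot = [
    to_pivot(a, ["pk", "title", "body"], exclude=["body"], extras=["word_count", "shout"])
    for a in articles
]
print(dump_json(pivot))
```

Because everything stays a plain list of dicts until ``dump_json``, swapping the output format only means replacing that last function.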

While this is by no means a complete proposal, I was looking for some
feedback on the idea and would be happy to incorporate your
suggestions.

Russell Keith-Magee

Feb 26, 2011, 6:54:03 PM
to django-d...@googlegroups.com
On Thu, Feb 24, 2011 at 10:24 PM, Vivek Narayanan <ma...@vivekn.co.cc> wrote:
> Hi,
>
> I am Vivek Narayanan, an undergrad student at IIT, Varanasi in India
> and am interested in participating in this year's SoC

Hi Vivek, and thanks for your interest in the GSoC!

My feedback at this point is that you've been very verbose, but not
especially clear. You've covered a lot of ground here, which shows
you're aware of the broad problems that exist -- but you haven't
really provided enough detail for us to work out if you're on the
right track.

Of course, this is your first post on the topic, so this may have been
intentional -- i.e., use this post as a 'taster' for the broad issues,
which you refine later. However, if you want to get accepted as a SoC
participant, we're going to need to have a very clear idea of what it
is that you're going to implement. History has shown us that students
that begin the SoC with a vague description don't end the SoC with a
completed project.

The suggestion I have made in the past is this: As a proof of concept,
show how you would define Django's existing serialization format using
your definition language. This will be a requirement of the final
deliverable anyway, so you might as well show how it can be done.

Once you've done that, provide a couple of other examples -- showing
both the definition, and the resulting output. The more examples you
provide of specific edge cases, the better we will be able to
understand your proposal.

Yours,
Russ Magee %-)

sebastien piquemal

Feb 28, 2011, 8:45:46 AM
to Django developers
Hi

I just stumbled across this discussion... Even though I have always
been very interested in contributing to Django, I have never taken the
step yet! But this is a topic that interests me very much (even though
for the moment it is Vivek's).

For several projects I am working on (a lot of APIs), I needed highly
customized serialization (sometimes of objects that are not Models at
all). After a few attempts at writing serialization functions directly
(a lot of very repetitive code), I decided to take a wider view of the
problem, and I wrote a generic serialization library for Python:
https://bitbucket.org/sebpiq/spiteat/. There is still quite a lot of
work to do to make the whole thing simpler, and the docs are not well
organized, so I am sorry for that. It might also be a little overkill
for Django's built-in serialization framework (I even implemented some
debug logging functionality for my serializers), and it is probably
too generic... However, it might give you interesting ideas and some
inspiration on this problem!

The basic ideas are the following
(https://bytebucket.org/sebpiq/spiteat/wiki/doc_pages/conception.html):

1. Treat all serialization/deserialization operations as recursive: a
serializer just divides its work, finds other serializers to delegate
sub-operations to, and then combines the results.
2. Serialize/deserialize to a pivot format (a Python dictionary
{<attr_name>: <attr_value>}), which allows separating the heavy
(de)serialization work from the conversion to/from one or another
serial format. The result is then chained to another serializer, for
example:
    pivot = pivot_srz.serialize(queryset)
    serialized = emitter_srz.serialize(pivot)
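A minimal sketch of this two-stage chain, assuming plain Python objects rather than a queryset. ``PivotSrz``, ``JsonEmitter`` and ``Point`` are hypothetical names, not SpitEat's actual classes: the first stage recursively reduces objects to dicts and lists, and only the second stage knows anything about JSON.

```python
import json


class PivotSrz:
    """Hypothetical: turns objects into plain Python data (the pivot format)."""

    def serialize(self, inpt):
        if isinstance(inpt, (list, tuple)):
            return [self.serialize(item) for item in inpt]  # recurse into sequences
        if hasattr(inpt, "__dict__"):
            # Recurse into each attribute: {<attr_name>: <attr_value>}
            return {name: self.serialize(value) for name, value in vars(inpt).items()}
        return inpt  # primitives pass through unchanged


class JsonEmitter:
    """Hypothetical: converts the pivot format to one serial format."""

    def serialize(self, pivot):
        return json.dumps(pivot)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


pivot_srz, emitter_srz = PivotSrz(), JsonEmitter()
pivot = pivot_srz.serialize([Point(1, 2), Point(3, 4)])
serialized = emitter_srz.serialize(pivot)
print(serialized)
```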

With SpitEat's system of settings, which makes the serializers very
easy to configure, this approach has proven quite powerful for me. For
example, say you have a model:

class MyModel(models.Model):
    some_fk = models.ForeignKey(OtherModel)
    some_field = models.TextField()
    # ...

Say you have a serializer class ModelSrz that is used by default for
all model instances. Delegating the work means that this serializer
has to find another serializer for every field; by default, ModelSrz
would be used for 'some_fk' as well. But say you can configure your
serializers like this (using SpitEat's syntax):

# Create a new serializer that does really crazy stuff for my foreign key
class SomeFkSrz(Srz):
    def serialize(self, inpt):
        return "A super duper transformation %s" % inpt

    def deserialize(self, inpt):
        return a_super_duper_way_of_getting_my_object(inpt)

# Then build my serializer for MyModel
my_model_srz = ModelSrz(attr_srz_map={'some_fk': SomeFkSrz()})

So basically, treating serialization as a recursive operation allows
huge flexibility while reusing a lot of code (if you provide good
mechanisms to plug in your custom serialization at any level): you
reuse ModelSrz, but plug in your own custom serializer for
'some_fk'.
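A minimal sketch of per-attribute delegation, assuming plain objects instead of Django models. ``Srz``, ``ModelSrz`` and ``SomeFkSrz`` reuse the names from the example, but their bodies here are hypothetical and not SpitEat's implementation; in this sketch ``attr_srz_map`` only applies at the level where it is given, and nested levels fall back to a default ``ModelSrz``.

```python
class Srz:
    """Hypothetical base class: one serialize() per serializer."""

    def serialize(self, inpt):
        raise NotImplementedError


class ModelSrz(Srz):
    """Hypothetical default serializer that recurses into attributes."""

    def __init__(self, attr_srz_map=None):
        # Maps attribute name -> serializer to use instead of the default.
        self.attr_srz_map = attr_srz_map or {}

    def serialize(self, inpt):
        if not hasattr(inpt, "__dict__"):
            return inpt  # leaf value, nothing to delegate
        return {
            name: self.attr_srz_map.get(name, ModelSrz()).serialize(value)
            for name, value in vars(inpt).items()
        }


class SomeFkSrz(Srz):
    def serialize(self, inpt):
        return "A super duper transformation %s" % inpt.name


class OtherModel:
    def __init__(self, name):
        self.name = name


class MyModel:
    def __init__(self, some_fk, some_field):
        self.some_fk = some_fk
        self.some_field = some_field


instance = MyModel(OtherModel("x"), "text")
default_output = ModelSrz().serialize(instance)
my_model_srz = ModelSrz(attr_srz_map={"some_fk": SomeFkSrz()})
custom_output = my_model_srz.serialize(instance)
```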

By doing it like that, you can address all the requirements in
http://code.djangoproject.com/wiki/SummerOfCode2011 very easily, and
provide a lot of flexibility to the end user.
Of course, this post doesn't address the problem of registering your
custom serializers with Django, nor of registering your custom
emitters... but that probably isn't the complicated part.

I would be happy to provide any help on that topic, though I stopped
being a student a few months ago :'-(

Cheers,

Sébastien

PS: There is an attempt at writing a SpitEat serializer for Django
here: https://bitbucket.org/sebpiq/django-spiteat/src/312cb47ab200/serializers.py
It is veeeeery ugly and quite complicated (on purpose, because I needed
to handle a lot of things: MTI, GenericForeignKeys, ...), but it
should work fine for many-to-many fields, foreign keys, etc.


Vivek Narayanan

Mar 1, 2011, 3:20:50 AM
to Django developers
@Sebastien, Thank you for your suggestions, that's exactly what I had
considered.

As I've mentioned earlier, I would like to start with providing basic
XML, JSON, YAML and text serializers, that would be built on the
existing base structure with a few modifications, as building blocks.
But before I start with the class structure, let me describe a feature
that I would be adding for the purpose of metadata.

Metadata Methods
---------------------------
The user can define methods beginning with “meta_” to add metadata
about each field, and methods beginning with “meta2_” to add metadata
at the model level. Here is an example:

class ExampleSerializer(serializers.Serializer):
    ...
    def meta_foo(self, field):
        '''
        Extract some metadata from field and return it.
        It would be displayed with the attribute ``foo``
        '''

In JSON the metadata would be represented inside an object as
    "key": {"foo": "bar", "value": value}
instead of
    "key": value
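A minimal sketch of how ``meta_`` methods could be looked up by name and folded into that JSON shape. ``ExampleSerializer`` and ``field_entry`` are hypothetical here, and unlike the proposal (where ``meta_foo`` receives a ``field``) the metadata methods in this sketch receive the field name and value, which is an assumption.

```python
import json


class ExampleSerializer:
    """Hypothetical: only the metadata part of the proposed Serializer."""

    def meta_foo(self, name, value):
        return "bar"

    def meta_type(self, name, value):
        return type(value).__name__

    def field_entry(self, name, value, metadata=()):
        if not metadata:
            return value  # plain "key": value
        # Look up meta_<attr> for each requested attribute and call it.
        entry = {attr: getattr(self, "meta_" + attr)(name, value) for attr in metadata}
        entry["value"] = value
        return entry


s = ExampleSerializer()
doc = {
    "title": s.field_entry("title", "Hello", ["foo"]),
    "pages": s.field_entry("pages", 42),
}
print(json.dumps(doc))
```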

In XML, two options would be provided, to represent the metadata either
as individual tags or as tag attributes, selected through a class
attribute:
    metadata_display_mode = TAGS  # or ATTRIBUTES

TAGS
---------
<field>
    <metadata1>..</metadata1>
    ...
    <value>Value</value>
</field>

ATTRIBUTES
---------------------
<field name="" metadata1="" ...>Value</field>
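A minimal sketch of the two display modes using the standard library's ``xml.etree.ElementTree``. ``TAGS``, ``ATTRIBUTES`` and ``field_element`` are hypothetical names; in this sketch the metadata arrives as an already computed dict.

```python
import xml.etree.ElementTree as ET

# Hypothetical constants for the proposed metadata_display_mode option.
TAGS, ATTRIBUTES = "tags", "attributes"


def field_element(name, value, metadata, mode, value_tag_name="value"):
    """Render one field plus its metadata in either display mode."""
    field = ET.Element("field", name=name)
    if mode == ATTRIBUTES:
        for key, meta_value in metadata.items():
            field.set(key, str(meta_value))
        field.text = str(value)
    else:  # TAGS: each piece of metadata, then the value, becomes a child tag
        for key, meta_value in metadata.items():
            ET.SubElement(field, key).text = str(meta_value)
        ET.SubElement(field, value_tag_name).text = str(value)
    return field


meta = {"type": "str"}
print(ET.tostring(field_element("title", "Hello", meta, ATTRIBUTES), encoding="unicode"))
print(ET.tostring(field_element("title", "Hello", meta, TAGS), encoding="unicode"))
```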

To select which fields would have which metadata, the arguments should
be passed to the ``serialize()`` method as
    data = ExampleSerializer.serialize(queryset, fields=('field1', ('field2', ['foo'])))

Each field can be specified in two ways:

1. As a string: no metadata will be added.

2. As a 2-element tuple, with the first element a string representing
field name and the second a list of strings representing the metadata
attributes to be applied on that field.

Instead of manually specifying the attributes for each field, the user
can add all metadata functions for all the fields using the
``use_all_metadata`` parameter in ``serialize()``
use_all_metadata = True
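A minimal sketch of normalizing the ``fields`` argument (plain strings or 2-element tuples) and of discovering every ``meta_`` method for ``use_all_metadata``. ``normalize_fields`` and ``metadata_names`` are hypothetical helpers; note that ``meta2_`` names do not match the ``meta_`` prefix, so model-level methods are not picked up as field metadata.

```python
def metadata_names(serializer):
    """All meta_xxx method names (without the prefix), e.g. for use_all_metadata."""
    return sorted(
        name[len("meta_"):]
        for name in dir(serializer)
        if name.startswith("meta_") and callable(getattr(serializer, name))
    )


def normalize_fields(fields, serializer, use_all_metadata=False):
    """Map each field name to the list of metadata attributes to apply."""
    all_metadata = metadata_names(serializer)
    spec = {}
    for item in fields:
        if isinstance(item, str):
            name, metadata = item, []  # plain string: no metadata
        else:
            name, metadata = item  # 2-tuple: (field name, metadata list)
        spec[name] = list(all_metadata) if use_all_metadata else list(metadata)
    return spec


class ExampleSerializer:
    def meta_foo(self, field): ...
    def meta_type(self, field): ...
    def meta2_pk(self, obj): ...  # model-level, so not a per-field attribute


fields = ("field1", ("field2", ["foo"]))
print(normalize_fields(fields, ExampleSerializer()))
print(normalize_fields(fields, ExampleSerializer(), use_all_metadata=True))
```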

The existing implementation of ``model.name`` and ``model.pk`` can be
described using “meta2_” functions. These will be provided as
``meta2_name`` and ``meta2_pk`` to facilitate loading and dumping of
fixtures.



Basic Structure
------------------------
Now coming to the basic structure of the fields. This need not be
specified for JSON/YAML as this will be handled by the libraries.

For text based serializers a custom template would be provided:

class TextSerializer(Serializer):
    mode = "text"
    # Simple string template; %(meta_xxx)s is replaced by meta_xxx(field)
    # if meta_xxx is callable
    field_format = "%(key)s :: { %(value)s, %(meta_d1)s, %(meta_d2)s}"

    # The three attributes below are required for text mode
    field_separator = ";"
    wrap_begin = "[["  # For the external wrapping structure
    wrap_end = "]]"

    # Indent by 4 spaces at each level. Default is 0. Used for text and
    # xml modes only.
    indent = 4
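A minimal sketch of rendering such a template with Python's ``%`` formatting, assuming every placeholder uses the ``s`` conversion and that ``meta_xxx`` methods take the key and value (an assumption). The regular expression that finds the ``%(meta_xxx)s`` placeholders is a hypothetical design choice; ``indent`` and nesting are ignored here.

```python
import re


class TextSerializer:
    """Hypothetical text-mode renderer; indent and nesting are ignored."""

    field_format = "%(key)s :: { %(value)s, %(meta_d1)s, %(meta_d2)s}"
    field_separator = ";"
    wrap_begin = "[["
    wrap_end = "]]"

    def meta_d1(self, key, value):
        return len(str(value))

    def meta_d2(self, key, value):
        return type(value).__name__

    def render(self, data):
        # Find every %(meta_xxx)s placeholder used by the template.
        meta_names = re.findall(r"%\((meta_\w+)\)", self.field_format)
        parts = []
        for key, value in data.items():
            context = {"key": key, "value": value}
            for name in meta_names:
                attr = getattr(self, name, None)
                context[name] = attr(key, value) if callable(attr) else ""
            parts.append(self.field_format % context)
        return self.wrap_begin + self.field_separator.join(parts) + self.wrap_end


print(TextSerializer().render({"title": "Hi", "n": 3}))
```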


For markup based serializers, users can provide strings for the tag
names of fields, field values and models.

class XMLSerializer(Serializer):
    mode = "xml"
    indent = 2
    metadata_display_mode = TAGS

    # Now all fields will be rendered as <object>...</object>
    field_tag_name = "object"
    model_tag_name = "model"
    # If metadata_display_mode is set to ``TAGS``, this sets the tag name
    # of the value of the model field
    value_tag_name = "value"


A class attribute ``wrap_fields`` will be provided to wrap all fields
of a model into a group, as is done now. If ``wrap_fields`` is set to
“all_fields”, for example, then all the fields would be serialized
inside an object called “all_fields”. If ``wrap_fields`` is not set,
there will be no grouping.
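A minimal sketch of the ``wrap_fields`` behaviour on the intermediate dict; ``build_object`` is a hypothetical helper that merges model-level metadata (such as ``pk``) with the field values.

```python
def build_object(meta, fields, wrap_fields=None):
    """Combine model-level metadata with field values, optionally grouped."""
    obj = dict(meta)
    if wrap_fields:
        obj[wrap_fields] = dict(fields)  # all fields inside one named object
    else:
        obj.update(fields)  # no grouping: fields sit next to the metadata
    return obj


print(build_object({"pk": 1}, {"title": "Hi"}, wrap_fields="all_fields"))
print(build_object({"pk": 1}, {"title": "Hi"}))
```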


Nesting and Related Models
------------------------------------------
I will replace the current “start_object -> handle_*_field ->
end_object” sequence with a single method for handling a model, so
that related models can be handled easily using recursion. A
``nesting_depth`` option would be provided to the user as a class
attribute. Its default value would be 0, as it is currently.
Serializing only specific fields of related models can be done by
using the fields argument, where a field of a related model would be
written as “Model_name.field_name” instead of just “field_name”.
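A minimal sketch of recursive handling with a depth limit, using plain objects. ``Author``, ``Book`` and ``serialize_obj`` are hypothetical; a related object is detected simply by having a ``__dict__``, ``pk`` is always emitted, and when the depth is exhausted the sketch emits only the related object's primary key.

```python
class Author:
    def __init__(self, pk, name):
        self.pk, self.name = pk, name


class Book:
    def __init__(self, pk, title, author):
        self.pk, self.title, self.author = pk, title, author


def serialize_obj(obj, fields=None, nesting_depth=0, related=False):
    """Recursively serialize obj, following relations up to nesting_depth."""
    out = {}
    for name, value in vars(obj).items():
        # Top-level fields are selected by name, related ones as "Model_name.field_name".
        qualified = "%s.%s" % (type(obj).__name__, name) if related else name
        if name != "pk" and fields is not None and qualified not in fields:
            continue
        if hasattr(value, "__dict__"):  # a related object
            if nesting_depth > 0:
                out[name] = serialize_obj(value, fields, nesting_depth - 1, related=True)
            else:
                out[name] = value.pk  # depth exhausted: emit the key only
        else:
            out[name] = value
    return out


book = Book(1, "Dune", Author(7, "Herbert"))
print(serialize_obj(book, {"title", "author"}))
print(serialize_obj(book, {"title", "author", "Author.name"}, nesting_depth=1))
```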

Datatypes and conversion
----------------------------------------
The user can specify the protected types (the types that will be
passed “as is” without any conversion) as a class attribute.

The unicode conversion functions for each type can be specified as
methods - “unicode_xxx”, where 'xxx' represents the type name. If no
method is provided for a type, a default conversion function will be
used.

class Example(Serializer):
    ...
    protected_types = (int, str, type(None), bool)
    ...
    def unicode_tuple(self, obj):
        # Do something with the object and return its unicode representation
        ...
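A minimal sketch of the proposed type handling: protected types pass through, other values are dispatched to a ``unicode_<typename>`` method, and ``str()`` is the assumed default conversion. ``Example.convert`` is a hypothetical method name, and ``NoneType`` is written as ``type(None)`` because it is not a builtin name.

```python
import datetime


class Example:
    """Hypothetical: only the type-conversion part of the proposed Serializer."""

    protected_types = (int, str, type(None), bool)

    def unicode_tuple(self, obj):
        return ", ".join(str(item) for item in obj)

    def convert(self, value):
        if isinstance(value, self.protected_types):
            return value  # passed through "as is"
        # Dispatch on the type name, e.g. tuple -> unicode_tuple.
        method = getattr(self, "unicode_" + type(value).__name__, None)
        return method(value) if method is not None else str(value)


ex = Example()
print(ex.convert(5), ex.convert(None), ex.convert((1, 2)), ex.convert(datetime.date(2011, 3, 1)))
```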


Representing the existing serialization format
------------------------------------------------------------------
Here is an implementation of the existing serialization format in
JSON:

class JsonSerializer(Serializer):
    mode = "json"
    wrap_fields = "fields"
    nesting_depth = 0

    def meta2_pk(self, model):
        '''This method is not required to be overridden as a default
        method would be provided'''
    def meta2_model(self, model):
        ...
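A minimal sketch of what those settings would need to produce: Django's existing JSON fixtures are a list of objects with ``model``, ``pk`` and a ``fields`` group. ``Book`` and its ``app_label`` class attribute are hypothetical stand-ins for a real model and its ``_meta.app_label``, and relations are not handled.

```python
import json


class Book:
    """Hypothetical model stand-in; app_label mimics Model._meta.app_label."""

    app_label = "library"

    def __init__(self, pk, title):
        self.pk, self.title = pk, title


class JsonSerializer:
    wrap_fields = "fields"

    def meta2_model(self, obj):
        # Django labels models as "app_label.modelname", lowercased.
        return "%s.%s" % (obj.app_label, type(obj).__name__.lower())

    def meta2_pk(self, obj):
        return obj.pk

    def serialize(self, objects, fields):
        out = []
        for obj in objects:
            entry = {"model": self.meta2_model(obj), "pk": self.meta2_pk(obj)}
            entry[self.wrap_fields] = {name: getattr(obj, name) for name in fields}
            out.append(entry)
        return json.dumps(out)


print(JsonSerializer().serialize([Book(1, "Dune")], ["title"]))
```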


In XML

class XMLSerializer(Serializer):
    mode = "xml"
    wrap_fields = "fields"
    nesting_depth = 0
    metadata_display_mode = ATTRIBUTES
    indent = 4

    field_tag_name = "field"
    model_tag_name = "object"

    def meta2_pk(self, model):
        ...
    def meta2_model(self, model):
        ...
    def meta_type(self, field):
        ...
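A minimal sketch of the XML target: Django's existing XML fixtures use a ``django-objects`` root, one ``object`` element per instance with ``pk`` and ``model`` attributes, and ``field`` elements carrying ``name`` and ``type``. ``to_django_style_xml`` and its row format are hypothetical; relation fields, the XML declaration and indentation are left out.

```python
import xml.etree.ElementTree as ET


def to_django_style_xml(rows):
    """rows: (model_label, pk, {field_name: (field_type, value)}) tuples."""
    root = ET.Element("django-objects", version="1.0")
    for model_label, pk, fields in rows:
        # meta2_pk / meta2_model become attributes of <object>.
        obj = ET.SubElement(root, "object", pk=str(pk), model=model_label)
        for name, (field_type, value) in fields.items():
            # meta_type becomes the type attribute of <field>.
            field = ET.SubElement(obj, "field", name=name, type=field_type)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = to_django_style_xml([("library.book", 1, {"title": ("CharField", "Dune")})])
print(xml_text)
```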

Sincerely,

Vivek Narayanan

sebastien piquemal

Mar 3, 2011, 9:15:17 AM
to Django developers
Ok ... I have to admit I was not very clear. Here is what I meant,
illustrated with some code examples :

http://readthedocs.org/docs/django-serializers-draft/en/latest/index.html

These are my "dream" django-serializers !

Vivek Narayanan

Mar 6, 2011, 1:41:38 AM
to Django developers
@Sebastien: I got your point about using a dict as an intermediate
structure and the use of recursion, and I looked at your
implementation which is somewhat similar to what I have in mind.

Well, here is a list of deliverables for the project:

• Investigate existing structure of the serializer, make changes,
refactor to suit needs. (1 week)

• Implement metadata methods, change the ``fields`` argument of
serialize(), write unit tests. (2 weeks)

• Implement structures and templates parsing for custom serialization,
configurations for XML/JSON/YAML etc. Also, write tests for this. (2
weeks)

• Handling of nested and related models. (1 week)

• Investigate the changes to be made at deserialization side and
implement them. (1 week)

• More tests and write documentation. (2 weeks)

This is a conservative estimate, and I am keeping 3 weeks as a cushion.


Russell Keith-Magee

Mar 6, 2011, 1:54:53 AM
to django-d...@googlegroups.com
On Sun, Mar 6, 2011 at 2:41 PM, Vivek Narayanan <ma...@vivekn.co.cc> wrote:
> @Sebastien: I got your point about using a dict as an intermediate
> structure and the use of recursion, and I looked at your
> implementation which is somewhat similar to what I have in mind.
>
> Well, here is a list of deliverables for the project:
>
> • Investigate existing structure of the serializer, make changes,
> refactor to suit needs.  (1 week)
>
> • Implement metadata methods, change the ``fields`` argument of
> serialize(), write unit tests. (2 weeks)
>
> • Implement structures and templates parsing for custom serialization,
> configurations for XML/JSON/YAML etc. Also, write tests for this. (2
> weeks)
>
> • Handling of nested and related models. (1 week)
>
> • Investigate the changes to be made at deserialization side and
> implement them. (1 week)
>
> • More tests and write documentation. (2 weeks)
>
> This is a conservative estimate and am keeping 3 weeks as a cushion.

Here's some advice: If this is what your final plan looks like, I
would expect that your proposal would be rejected. Here's why:

* We prefer projects to have a clear design in mind before
implementation begins. It's ok if refinements happen along the way,
but "investigation" periods (and you have 2 of them) are not something
that should be required. You investigate while you develop your
proposal.

* Testing isn't an activity that can be clearly separated. It's an
integral part of code development. Having a "more tests" activity
indicates you either haven't allocated enough time for testing during
development, or you're trying to pad your timeline.

* Padding with a 3 week cushion gives the impression that you haven't
thought about the effort required. 3 weeks of full time development is
a long time.

* I'm sceptical of any plan that consists of "2 week" estimates.
Again -- a week is a long time. If you can't clearly express what will
be developed, tested and delivered in a week long timeframe, then I
don't think you've thought about the problem hard enough -- at least,
not hard enough for us to recommend that Google give you $4k, and
someone from the project spend many hours mentoring you.

Yours,
Russ Magee %-)


Vivek Narayanan

Mar 7, 2011, 11:28:16 AM
to Django developers


I sent a message to this thread yesterday, I'm not sure why it didn't
appear here, so I'm sending it again.

First of all, thanks a lot for taking the time to go through the
proposal; I really appreciate your feedback. I must concede that I've
not been very clear and ended up sounding different from what I really
meant. I will be testing as I code; I was just referring to some bug
fixing in the final stages. Here is a revised timeline:


In the run up to May 23rd, I'll be familiarizing myself with the
codebase and community practices of Django, examining all the
integration points and looking at the best practices of serialization.

Week 1: I'll be implementing a basic skeletal structure for the
serializer, which will set the stage for the rest of the project.
Week 2: Implementation of the deserializer.
Week 3: I'll add support for the metadata methods as discussed above.
Week 4: Support for datatype conversion and the unicode conversion
class methods.
Week 5: Add support for string templates and output formats.
Week 6: Add support for 'modes' as in JSON, XML, YAML etc, complete
the deserializer.
Week 7: Implement serialization of related models, M2M fields, foreign
keys and nesting depth.
Week 8: Add support for ``extras`` and ``exclude`` parameters in the
call to serialize(). Modify ``fields`` parameter as described above.
Weeks 9-10: Check for bugs, fix them and write documentation with many
examples.
Weeks 11-12: Refine the project and its documentation.

I'll be spending 40-45 hours a week on average.

Sincerely,
Vivek Narayanan

Tom Evans

Mar 8, 2011, 5:14:24 AM
to django-d...@googlegroups.com, Vivek Narayanan
On Mon, Mar 7, 2011 at 4:28 PM, Vivek Narayanan <ma...@vivekn.co.cc> wrote:
> ...

> In the run up to May 23rd, I'll be familiarizing myself with the
> codebase and community practices of Django, examining all the
> integration points and looking at the best practices of serialization.
>
> Week 1: I'll be implementing a basic skeletal structure for the
> serializer, which will set the stage for the rest of the project.
> Week 2: Implementation of the deserializer.
> Week 3: I'll add support for the metadata methods as discussed above.
> Week 4: Support for datatype conversion and the unicode conversion
> class methods.
> Week 5: Add support for string templates and output formats.
> Week 6: Add support for 'modes' as in JSON, XML, YAML etc, complete
> the deserializer.
> Week 7: Implement serialization of related models, M2M fields, foreign
> keys and nesting depth.
> Week 8: Add support for ``extras`` and ``exclude`` parameters in the
> call to serialize(). Modify ``fields`` parameter as described above.
> Weeks 9-10: Check for bugs, fix them and write documentation with many
> examples.
> Weeks 11-12: Refine the project and its documentation.
>
> I'll be spending 40-45 hours a week on average.
>
> Sincerely,
> Vivek Narayanan
>

No offence meant here Vivek, but when I'm speccing something out or
reading a spec, if there is a block of work that says 'Implement
<blah>, 1 week', then I know this hasn't been thought out completely,
and that '1 week' could be anything from 45 minutes to 3 months.

Software projects regularly take longer than anticipated, often
because the design, and the thought behind the design, was not
complete.

Ideally you should break the work down into discrete tasks, each of
which shouldn't take longer than 4 hours, which in my experience is the
largest block of time you should deal with. Looking at it like that,
and assuming a 40 hour week and 12 weeks of GSoC, you've got 120 units
to account for.
Splitting your project into small chunks will also demonstrate to
people reading your proposal that you understand the subject matter,
and they can have high confidence in the project being delivered.

Weeks 9-10 made me smile though - no bug fixes allowed in the other weeks? :)

Cheers

Tom

Vivek Narayanan

Mar 8, 2011, 6:30:39 AM
to Django developers


On Mar 8, 3:14 pm, Tom Evans <tevans...@googlemail.com> wrote:

> Splitting down your project into small chunks will also demonstrate to
> people reading your proposal that you understand the subject matter,
> and they can have a high confidence of the project being delivered.

Thanks, I didn't know this, and I don't have much experience writing a
spec. I'll keep this in mind when writing the final proposal.

> Weeks 9-10 made me smile though - no bug fixes allowed in the other weeks? :)

Well, there will be bug fixes every week; it's just a final round of
bug fixing.