More on Customizable Serialization

Tom Christie

Apr 27, 2012, 12:44:59 AM
to django-d...@googlegroups.com
Seeing another proposal for Customizable Serialization for the GSoC this year
prompted me to dust off the bits of work I've done along similar lines.
I'd really like to see this get properly addressed in core and I thought it
was about time I helped to make it happen.

I've made a fairly comprehensive start, and pushed the results of what I have
to date as the `django-serializers` project.  It's available on PyPI, and the
source is here: http://github.com/tomchristie/django-serializers
There are some docs up on the GitHub site, but in brief it gives you:

* A Declarative Serializer/Field API that mirrors the Form/Field API.
* A Serializer class which can deal with arbitrary python objects.
* A ModelSerializer class which can handle model instances and query sets.
* A DumpDataSerializer class which mimics the existing dumpdata behaviour.
* Supports flat or nested output, and deals with recursive structures.
* Handles FK, M2M, OneToOne, and reverse relationships, model fields, and non-database attributes/properties.
* Serialization structure decoupled from output encoding concerns.
* Currently supports JSON, YAML, and XML.
* Decent test suite of the above.

It's not yet at the point where it could be a proposal for core - Right now the
API isn't backwards compatible with the existing serializers, the dump
data serializer is still missing some tweaks such as dealing with natural keys,
and there's some work to do on the XML output.  It is a pretty comprehensive
start though, and I'm really happy with the shape of the API.
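
To give a quick flavour of the declarative style -- the names below are only
indicative, see the project docs on GitHub for the actual API -- a serializer
definition looks roughly like a ModelForm:

class EntrySerializer(ModelSerializer):
    author = Field(label='written_by')  # rename a field in the output
    num_comments = Field()              # a non-database attribute/property

    class Meta:
        model = Entry                   # hypothetical example model
        exclude = ('internal_notes',)

data = EntrySerializer().serialize(Entry.objects.all())  # nested python data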

Given that Piotr's GSoC proposal has now been accepted, I'm wondering what the
right way forward is?  I'd like to continue to push forward with this, but I'm
also aware that it might be a bit of an issue if there's already an ongoing
GSoC project along the same lines?

Having taken a good look through the GSoC proposal, it looks good, and there
seems to be a fair bit of overlap, so hopefully he'll find what I've done
useful, and I'm sure I'll have plenty of comments on his project as it
progresses.

I'd consider suggesting a collaborative approach, but the rules of the GSoC
wouldn't allow that right?

Piotr Grabowski

Apr 27, 2012, 4:14:16 AM
to django-d...@googlegroups.com
Hi!

I'm Piotr Grabowski, a student at the University of Wroclaw, Poland.
For this Google Summer of Code I will be working on the problem of customizable
serialization in Django.

You can find my proposal here https://gist.github.com/2319638

It's obviously not a finished idea; it needs to be simplified for sure.
My mentor Russell Keith-Magee told me to look at Tom Christie's
serialization API. I found it similar to my proposal -- there is a lot in
common, such as declarative fields and the same approach to various aspects of
serialization -- but his API is simpler and it feels better.

Since Tom has already posted to the group about his project, I can refer to it:

On 27.04.2012 06:44, Tom Christie wrote:
> ...
>
> Given that Piotr's GSoC proposal has now been accepted, I'm wondering
> what the
> right way forward is? I'd like to continue to push forward with this,
> but I'm
> also aware that it might be a bit of an issue if there's already an
> ongoing
> GSoC project along the same lines?
>
> Having taken a good look through the GSoC proposal, it looks good, and
> there
> seems to be a fair bit of overlap, so hopefully he'll find what I've done
> useful, and I'm sure I'll have plenty of comments on his project as it
> progresses.
>
> I'd consider suggesting a collaborative approach, but the rules of the
> GSoC
> wouldn't allow that right?
>
> --
Like I said above, your work will be very useful for me. I must read the
GSoC regulations carefully, but collaboration on writing the code is
certainly impossible. I don't know whether I could use your existing code base,
but I think that is also impossible. However, sharing ideas and discussing how
the API should look and work would be very desirable.


My plan for the next few weeks is to meet the Django contribution requirements,
solve a ticket to prove I know the process of doing so, and -- most
importantly -- have a discussion about the serialization API. I hope the
community will be interested in this feature.

After the weekend I will post my updated proposal, incorporating ideas from Tom's API.

--
Piotr Grabowski


Anssi Kääriäinen

Apr 27, 2012, 4:36:36 AM
to Django developers
On Apr 27, 11:14 am, Piotr Grabowski <grabowski...@gmail.com> wrote:
> Hi!
>
> I'm Piotr Grabowski, student from University of Wroclaw, Poland
> In this Google Summer of Code I will  deal with problem of customizable
> serialization in Django.
>
> You can find my proposal here https://gist.github.com/2319638

I quickly skimmed the proposal and I noticed speed/performance wasn't
mentioned. I believe performance is important in serialization and
especially in deserialization. It is not the number one priority item,
but it might be worth it to write a couple of benchmarks (preferably
to djangobench [1]) and check that there are no big regressions
introduced by your work. If somebody already has good real-life
testcases available, please share them...

- Anssi

[1] https://github.com/jacobian/djangobench/

Piotr Grabowski

Apr 27, 2012, 5:11:56 AM
to django-d...@googlegroups.com
On 27.04.2012 10:36, Anssi Kääriäinen wrote:
I haven't thought about performance much. There will be regressions.
Now serialization is very simple: iterate over the fields, transform each into
a string (or something serializable), then serialize it with json|yaml|xml.
In my approach it is: transform the (Model) object into a Serializer object,
where each field from the original object is a FieldSerializer object; next
(possibly recursively) get a native python type object from each field, and
serialize that with json|yaml|xml.
I can do some optimizations in this process, but it's clear it will
take longer to serialize (and deserialize) an object than it does now. That could
be a problem for the time taken by tests if there are a lot of fixtures.
I will try to write good, fast code, but I will be very glad if someone
gives me tips about performance bottlenecks in it.

--
Piotr Grabowski

Anssi Kääriäinen

Apr 27, 2012, 6:38:06 AM
to Django developers
On Apr 27, 12:11 pm, Piotr Grabowski <grabowski...@gmail.com> wrote:
> I didn't think about performance a lot. There will be regressions.
> Now serialization is very simple: Iterate over fields, transform it into
> string (or somethink serializable), serialize it with json|yaml|xml.
> In my approach it is: transform (Model) object to Serializer object,
> each field from original object is  FieldSerializer object, next  (maybe
> recursively) get native python type object from each field, serialize it
> with json|yaml|xml.
> I can do some optimalizations in this process but it's clear it will
> take longer to serialize (and deserialize) object then now. It can be
> problem with time taken by tests if there is a lot of fixtures.
> I will try to write good, fast code but I will be very glad if someone
> give me tips about performance bottlenecks in it.

One possibility is to have a fast path for simple cases. But
premature optimization is the root of all evil, so let's first see how
fast the code is, and then check if anything needs to be done.

I still think it is a good idea to actually check how fast the new
serialization code is, not just assume it is fast enough. So, please
include some simple benchmarks in your project.

I hope users who need fast serialization will participate
in this discussion by describing their use cases.

- Anssi

Tom Christie

Apr 27, 2012, 6:39:00 AM
to django-d...@googlegroups.com
Hey Piotr,

  Thanks for the quick response.

> However sharing ideas and discuss how the API should look and work it will be very desirable.

That'd be great, yup.  I've got a couple of comments and questions about bits of the API, but I'll wait until you've had a chance to post your proposal to the list before starting that discussion. 

> I quickly skimmed the proposal and I noticed speed/performance wasn't 
mentioned. I believe performance is important in serialization and 
especially in deserialization.

Right.  Also worth considering is making sure the API can deal with streaming large querysets,
rather than loading all the data into memory at once.
(See also https://code.djangoproject.com/ticket/5423)

- Tom.

On Friday, 27 April 2012 10:11:56 UTC+1, Piotr Grabowski wrote:
On 27.04.2012 10:36, Anssi Kääriäinen wrote:

Piotr Grabowski

Apr 27, 2012, 7:28:14 AM
to django-d...@googlegroups.com
On 27.04.2012 12:39, Tom Christie wrote:
> Hey Piotr,
>
>
> > I quickly skimmed the proposal and I noticed speed/performance wasn't
> mentioned. I believe performance is important in serialization and
> especially in deserialization.
>
> Right. Also worth considering is making sure the API can deal with
> streaming large querysets,
> rather than loading all the data into memory at once.
> (See also https://code.djangoproject.com/ticket/5423)
>
> - Tom.
>
Maybe it can be done with a chain of two black-box generators. The first
generator's input is the queryset (an iterable sequence) and a user-defined
Serializer class describing how to transform a single object; its output is
python primitive type objects. The second is fed with these objects and
outputs a serialized string. What about nested objects -- more generators?
Generators are good because we can also reuse Serializer objects ==
better performance. But like Anssi said -- optimize after the code is
written, not before :)
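
A very rough sketch of what I mean -- the function and serializer names here
are made up, not the final API:

import json

def to_python(queryset, serializer):
    # first generator: turn each model instance into plain python data
    for obj in queryset.iterator():
        yield serializer.serialize(obj)

def to_json(native_objects):
    # second generator: encode each native object as a JSON fragment
    for data in native_objects:
        yield json.dumps(data)

# usage (streams the queryset instead of loading it into memory at once):
# for chunk in to_json(to_python(Entry.objects.all(), EntrySerializer())):
#     stream.write(chunk)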

--
Piotr Grabowski

Russell Keith-Magee

Apr 27, 2012, 11:05:17 PM
to django-d...@googlegroups.com
Hi Tom,

On Friday, 27 April 2012 at 12:44 PM, Tom Christie wrote:
> Seeing another proposal for Customizable Serialization for the GSoC this year
> prompted me to dust off the bits of work I've done along similar lines.
> I'd really like to see this get properly addressed in core and I thought it
> was about time I helped to make it happen.
>
> I've made a fairly comprehensive start, and pushed the results of what I have
> to date as the `django-serializers` project. It's available on PyPI, and the
> source is here:
> http://github.com/tomchristie/django-serializers
>
> There are some docs up on the GitHub site, but in brief it gives you:
>
> * A Declarative Serializer/Field API that mirrors the Form/Field API.
> * A Serializer class which can deal with arbitrary python objects.
> * A ModelSerializer class which can handle model instances and query sets.
> * A DumpDataSerializer class which mimics the existing dumpdata behaviour.
> * Supports flat or nested ouput, and deals with recursive structures.
> * Handles FK, M2M, OneToOne, and reverse relationships, model fields, and non-database attributes/properties.
> * Serialization structure decoupled from output encoding concerns.
> * Currently supports JSON, YAML, and XML.
> * Decent test suite of the above.
>
> It's not yet at the point where it could be a proposal for core - Right now the
> API isn't backwards compatible with the existing serializers, the dump
> data serializer is still missing some tweaks such as dealing with natural keys,
> and there's some work to do on the XML output. It is a pretty comprehensive
> start though, and I'm really happy with the shape of the API.

Thanks for letting us know about the prior art -- I know we discussed serialisation at DjangoCon last year, and I'm kicking myself that I didn't provide more feedback at the time. Hopefully having this as a GSoC project will be enough to kick me into action and bring this project to completion.
>
> Given that Piotr's GSoC proposal has now been accepted, I'm wondering what the
> right way forward is? I'd like to continue to push forward with this, but I'm
> also aware that it might be a bit of an issue if there's already an ongoing
> GSoC project along the same lines?
>
> Having taken a good look through the GSoC proposal, it looks good, and there
> seems to be a fair bit of overlap, so hopefully he'll find what I've done
> useful, and I'm sure I'll have plenty of comments on his project as it
> progresses.
>
> I'd consider suggesting a collaborative approach, but the rules of the GSoC
> wouldn't allow that right?

Unfortunately, the GSoC rules don't allow for collaboration - the work that is submitted needs to be that of the student alone. However, they do allow for others to contribute by providing code reviews, design feedback, and so on. Given that we're building a user-facing API, there's also plenty of room to provide assistance by testing -- i.e., hunting down obscure serialisation use cases, and making sure that the API we've got can cover them.

If you've got the time and enthusiasm, I'd certainly appreciate you hanging around in a "co-mentor"-like capacity. You've clearly spent a lot of time thinking about this problem, and I'm sure your input would be extremely valuable. You're also in a slightly more helpful timezone for Piotr, so if he needs some feedback when I'm not available, it would be nice to have someone he can call on that is familiar with the problem and his progress.

Yours,
Russ Magee %-)




Tom Christie

Apr 28, 2012, 11:55:10 AM
to django-d...@googlegroups.com
> If you've got the time and enthusiasm, I'd certainly appreciate you hanging around in a "co-mentor"-like capacity.

Sure, I'd be more than happy to help out wherever possible.
Piotr, if you need to get in touch with me off-list for any reason please feel free to do so - t...@tomchristie.com

  - Tom

Piotr Grabowski

May 4, 2012, 4:08:14 PM
to django-d...@googlegroups.com
Hi,

I had a lot of work this week, so I didn't manage to present my
revised proposal on Monday like I said. Sorry. I have it now:
https://gist.github.com/2597306

Next week I hope there will be some discussion about my proposal. I will
also think about how it should be done under the hood; there should be some
internal API. I should also resolve one Django ticket. I'm thinking about
https://code.djangoproject.com/ticket/9279 -- it will provide good
test cases for my future solution.

Should I post my proposal to this group? On GitHub I have nice
formatting, whereas in this group my Python code was badly formatted.

--
Piotr Grabowski

Russell Keith-Magee

May 6, 2012, 4:45:40 AM
to django-d...@googlegroups.com
On Sat, May 5, 2012 at 4:08 AM, Piotr Grabowski <grabow...@gmail.com> wrote:
> Hi,
>
> During this week I have a lot of work so I didn't manage to present my
> revised proposal in Monday like i said. Sorry. I have it now:
> https://gist.github.com/2597306

Hi Piotr,

At a high level, I think you're headed in the right direction. I like
the way you've separated Field and Serializer, and I like the way that
Serializer represents one "nesting level" of the final output (so if
you want complex formats for a single object, such as with the way
Django's JSON serializer has id, model and fields at the top level,
you nest Serializers to suit).

Here's some specific feedback:

* I can see that ModelSerializer will play an important part in your
proposal. However, some of your API proposals seem a little
unnecessary -- or are unclear why they're needed. Some areas that need
clarification:

- I'm not sure I follow how class_name would be used in practice. The
act of deserialization is to take a block of data, and process it to
populate an object.

In the simplest case, you could provide an empty instance (or factory)
that is then populated by deserialization. In this case, no class name
is required -- it's provided explicitly by the object you provide.

A more complex case is to use the data itself to determine the type of
object to create. This seems to be the reason you have "class_name",
but I'm not sure it's that simple. Consider a case where you're
deserializing a list of objects; if the data has a "name" attribute,
create a "Person" object, otherwise create a "Thing" object. The
object required is well defined, but not neatly available in a field.

There's also no requirement that deserialization into an object is
handled by a ModelSerializer. ModelSerializer should just be a
convenient factory for populating a Serializer based on attributes of
a model -- so anything you do with ModelSerializer should be possible
to do manually with a Serializer. If class_name is tied to
ModelSerializer, we lose this ability.

- I'm not sure I see the purpose of "aliases" -- or, why this role
can't be played by other parts of the system. In particular, I see
Field() as a definition for how to fill out one 'piece' of a
serialised object. Why doesn't Field() contain the logic for how to
extract its value from the underlying object?

- Why is preserve_field_ordering needed? Can't ordering be handled by
the explicit order of field definitions, or the ordering in the
"fields" attribute?

* As a matter of style, serializer_field_value and
deserialize_field_value seem excessively long as names. Is there
something wrong with serialize and deserialize?

* I don't think getattr() works quite how you think it does. In
particular, I don't think:

getattr(instance, instance_field_name) = getattr(obj, field_name)

will do what you think it does. I think you're looking for setattr() here.

* Can you elaborate some more on the XML attribute syntax in your
proposal? One of your original statements (that I agree with) is that
the "format" is independent of the syntax, and that a single set of
formatting rules should be able to be used for XML or for JSON. The
big difference between XML and JSON is that XML allows for values to
be packed as attributes. I can see that you've got an 'attribute'
argument on a Field, but it isn't clear to me how JSON would interpret
this, or how XML would interpret:

- A Field that had multiple sub-Fields, all of which were attribute=True
- A Field that had multiple sub-Fields, several of which were attribute=False
- The difference between these two definitions by your formatting rules:

<key attr1="val1" attr2="val2">
    <subkey>subval</subkey>
</key>

<key attr1="val1" attr2="val2">main value</key>

In particular, why is the top level structure of the JSON serializer
handled with nested Serializers, but the structure of the XML
serializer is handled with nested Fields?

> Next week I hope there will be some discussion about my proposal. I will
> also think how it should be done under the hood. There should be some
> internal API. I should also resolve one Django ticket. I think about this
> https://code.djangoproject.com/ticket/9279 There will be good for test cases
> in my future solution.

I would suggest that you don't spend *too* much time on this. It's
certainly a good idea to get to know our committing procedures, and
historically we've encouraged students to work on a
small ticket as a way to do this. However, your project is unusual in
that you've been accepted without a firm API proposal. Given that you
won't really be able to work on the GSoC without an accepted proposal,
I'd suggest that your API should take precedence in your pre-GSoC
plans.

> I should write my proposal on this group? In github I have nice formatting
> and in this group my Python code was badly formatted.

It's up to you; however, the problem with posting to a Gist (or
similar) is that it's very hard to comment on specific parts of your
proposal. I know code formatting is a pain in Google groups, but it is
a much better discussion forum.

Yours,
Russ Magee %-)

Tom Christie

May 7, 2012, 2:13:24 PM
to django-d...@googlegroups.com
Hey Piotr,

Here's a few comments...

You have 'fields' and 'exclude' options, but it feels like it's missing an 'include' option - how would you represent serializing all the fields on a model instance (without replicating them), and additionally including one other field?  I see that you could do that by explicitly adding a Field declaration, but 'include' would seem like an obvious addition to the other two options.
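
Something along these lines, say (hypothetical syntax, model and field names):

class EntrySerializer(ModelSerializer):
    class Meta:
        model = Entry
        include = ('absolute_url',)  # all the model's own fields, plus one extra attribute/property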

I'd second Russell's comment about aliases.  Defining a label on the field would seem tidier.

Likewise the comment about 'preserve_field_order'.  I've still got this in for 'django-serializers' at the moment, but I think it's something that should become private.  (At an implementation level it's still needed, in order to make sure you can exactly preserve the field ordering for json and yaml dumpdata, which is unsorted -- determined by Python's dict key ordering.)

Being able to nest serializers inside other serializers makes sense, but I don't understand why you need to be able to nest fields inside fields.  Shouldn't serializers be used to represent complex outputs and fields be used to represent flat outputs?

When defining custom fields it'd be good if there was a way of overriding the serialization that's independent of how the field is retrieved from the model.  For example, with model relation fields, you'd like to be able to switch by subclassing between representing as a natural key, as a URL, as a string name, etc., without having to replicate all the logic that handles the differences between a single relationship, multiple relationships, and reverse relationships.
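
For instance, something like this (the class and method names are purely
illustrative, not an existing API):

class Field(object):
    pass  # stand-in for whatever base class the real API provides

class RelatedField(Field):
    # the base class would handle the differences between single, multiple
    # and reverse relations, calling serialize_related() once per related object
    def serialize_related(self, obj):
        return obj.pk  # default representation: primary key

class NaturalKeyRelatedField(RelatedField):
    # subclasses then only override how a single related object is represented
    def serialize_related(self, obj):
        return obj.natural_key()

class URLRelatedField(RelatedField):
    def serialize_related(self, obj):
        return obj.get_absolute_url()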

The "class_name" option for deserialization is making too many assumptions.  The class that's being deserialized may not be present in the data - for example if you're building an API, the class that's being deserialized might depend on the URL that the data is being sent too. eg "http://example.com/api/my-model/12"

In your dump data serializer, how do you distinguish that the 'fields' field is the entire object being serialized rather than the 'fields' attribute of the object being serialized?  Also, the existing dumpdata serialization only serializes local fields on the model - if you're using multi-table inheritance only the child's fields will be serialized, so you'll need some way of handling that.

Your PKFlatField implementation will need to be a bit more complex in order to handle e.g. many-to-many relationships.  Also, you'll want to make sure you're accessing the pks from the model without causing another database lookup.

Is there a particular reason you've chosen to drop 'depth' from the API?  Wouldn't it sometimes be useful to specify the depth you want to serialize to?

There are two approaches you can take to declaring the 'xml' format for dumpdata, given that it doesn't map nicely to the json and yaml formats.  One is to define a custom serializer (as you've done), the other is to keep the serializer the same and define a custom renderer (or encoder, or whatever you want to call the second stage).  Of the two, I think that the second is probably a simpler, cleaner approach.
When you come to writing a dumpdata serializer, you'll find that there are quite a few corner cases that you'll need to deal with in order to maintain full byte-for-byte backwards compatibility, including how natural keys are serialized, how many-to-many relationships are encoded, how None is handled for different types, down to making sure you preserve the correct field ordering across each of json/yaml/xml.  I *think* that getting the details of all of those right will end up being awkward to express using your current approach.
The second approach would be to serialize to a dict-like format that can easily be encoded into json or yaml, but that can also include metadata specific to particular encodings such as xml (or perhaps, say, html).  You'd have a generic xml renderer, that handles encoding into fields and attributes in a fairly obvious way, and a dumpdata-specific renderer, that handles the odd edge cases that the dumpdata xml format requires.  The dumpdata-specific renderer would use the same intermediate data that's used for json and yaml.
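
As a toy sketch of what I mean by keeping the serializer output as plain data
and pushing format quirks into renderers (the class names and data layout here
are just for illustration, not django-serializers' actual classes):

import json

# both formats consume the same intermediate data; only the renderer differs
intermediate = {'pk': 1, 'model': 'tests.entry', 'fields': {'title': 'Hello'}}

class JSONRenderer(object):
    def render(self, data):
        return json.dumps(data)

class DumpDataXMLRenderer(object):
    # only this renderer needs to know the dumpdata-specific XML quirks
    def render(self, data):
        fields = ''.join('<field name="%s">%s</field>' % (k, v)
                         for k, v in data['fields'].items())
        return '<object pk="%s" model="%s">%s</object>' % (
            data['pk'], data['model'], fields)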

I hope all of that makes sense, let me know if I've not explained myself very well anywhere.

Regards,

  Tom

Piotr Grabowski

May 7, 2012, 2:23:07 PM
to django-d...@googlegroups.com
On 06.05.2012 10:45, Russell Keith-Magee wrote:
>
> - I'm not sure I follow how class_name would be used in practice. The
> act of deserialization is to take a block of data, and process it to
> populate an object.
>
> In the simplest case, you could provide an empty instance (or factory)
> that is then populated by deserialization. In this case, no class name
> is required -- it's provided explicitly by the object you provide.

I have this functionality with class_name:

serializers.deserialize("json", data,
                        deserializer=UserSerializer(class_name=User))

>
> A more complex case is to use the data itself to determine the type of
> object to create. This seems to be the reason you have "class_name",
> but I'm not sure it's that simple. Consider a case where you're
> deserializing a thing of objects; if the data has a "name" attribute,
> create a "Person" object, otherwise create a "Thing" object. The
> object required is well defined, but not neatly available in a field.
If we have a homogeneous list of objects there is no problem. We can use
the same construction as above, or depend (via class_name) on some field in
the object. But if the list is heterogeneous and we have no information about
the type, then it's difficult. Is there a need for a feature like that? My
first thought is to have a method in the Serializer class like:

def get_class(self, data):
    # data is the object (dict?) produced by the first phase of deserialization;
    # the user can inspect its fields and return the class to instantiate
    if 'name' in data:
        return Person
    return Thing

It could be more of an internal API, with a default along the lines of:

def get_class(self, data):
    if self._meta.class_name is not None:
        if isinstance(self._meta.class_name, str):
            # class_name names the data key that holds the class to create
            return object_from_string(data[self._meta.class_name])
        else:
            return self._meta.class_name
    raise Exception('No class for deserialization provided')

So if someone has simple needs they can use simple functionality like
class_name=Profile, but if there is a need to find the class in a duck-typing
fashion, overriding get_class will be suitable.

>
> There's also no requirement that deserialization into an object is
> handled by a ModelSerializer. ModelSerializer should just be a
> convenient factory for populating a Serializer based on attributes of
> a model -- so anything you do with ModelSerializer should be possible
> to do manually with a Serializer. If class_name is tied to
> ModelSerializer, we lose this ability.
Yes, I made a mistake - where I wrote ModelSerializer options I should
have written Serializer options, because ModelSerializer is just a Serializer
which understands the differences between the fields on an object (m2m, fk, ...).

>
> - I'm not sure I see the purpose of "aliases" -- or, why this role
> can't be played by other parts of the system. In particular, I see
> Field() as a definition for how to fill out one 'piece' of a
> serialised object. Why doesn't Field() contain the logic for how to
> extract it's value from the underlying object?
Previously I used it with an additional meaning -> if aliases[x] ==
aliases[y] then x = [value[x], value[y]], but now it's only a shortcut:

1) fname = Field(label='first_name')

2) aliases = {'fname': 'first_name'}

are equivalent. It's redundant, but I think this can be helpful.
>
> - Why is preserve_field_ordering needed? Can't ordering be handled by
> the explicit order of field definitions, or the ordering in the
> "fields" attribute?
I agree, ordering in the 'fields' attribute (like in Forms) will be better.

> * As a matter of style, serializer_field_value and
> deserialize_field_value seem excessively long as names. Is there
> something wrong with serialize and deserialize?
For now I want to reserve the serialize and deserialize names, because I think
these names would be more appropriate for the methods that return
python native datatypes after the first phase of serialization. If the user
overrides such a method he can do anything he wants, as long as he returns native datatypes.
But sure, (de)serialize_field_value seems to be too long. Any other
suggestions? Maybe get_value (because it must get the value from the object's
field for serialization) and set_value (it sets the value of the object's field during
deserialization)?
>
> * I don't think getattr() works quite how you think it does. In
> particular, I don't think:
>
> getattr(instance, instance_field_name) = getattr(obj, field_name)
>
> will do what you think it does. I think you're looking for setattr() here.
Oops :) Definitely setattr should be there.

>
> * Can you elaborate some more on the XML attribute syntax in your
> proposal? One of your original statements (that I agree with) is that
> the "format" is independent of the syntax, and that a single set of
> formatting rules should be able to be used for XML or for JSON. The
> big difference between XML and JSON is that XML allows for values to
> be packed as attributes. I can see that you've got an 'attribute'
> argument on a Field, but it isn't clear to me how JSON would interpret
> this, or how XML would interpret:

I've considered this a lot. I have two ideas: either JSON drops fields with
attribute=True, or JSON treats them like any other field. The second is better
in my opinion.
>
> - A Field that had multiple sub-Fields, all of which were attribute=True
> - A Field that had multiple sub-Fields, several of which were attribute=False
> - The difference between these two definitions by your formatting rules:
>
> <key attr1="val1" attr2="val2">
> <subkey>subval</subkey>
> </key>
key = KeyField()

class KeyField(Field):
    attr1 = A1Field(attribute=True)
    attr2 = A2Field(attribute=True)

    def field_name(self, obj, field_name):
        return 'subkey'

    def serialize_field_value(self, obj, field_name):
        return 'subval'

This will work in both xml and json.
>
> <key attr1="val1" attr2="val2">main value</key>
class KeyField(Field):
    attr1 = A1Field(attribute=True)
    attr2 = A2Field(attribute=True)

    def serialize_field_value(self, obj, field_name):
        return 'main_value'

This works in xml but fails in json:

key : {
    attr1 : 'val1',
    attr2 : 'val2',
    ? : 'main_value'
}

It must raise an exception.
I don't know if this is acceptable - the same Field will work in xml and
fail in json. This is not the fault of the xml attribute. We can fix that by
dropping attributes in JSON and ensuring that
if subfields are declared on a field (and attribute=False on at least one
of them) then field_name must also be declared.
>
> In particular, why is the top level structure of the JSON serializer
> handled with nested Serializers, but the structure of the XML
> serializer is handled with nested Fields?
I don't understand you. The XML serializer is also handled with a Serializer:

class XMLDumpDataSerializer(YJDumpDataSerializer)

YJDumpDataSerializer is the JSON serializer, and that is a Serializer.
>
> Yours, Russ Magee %-)


--
Piotr Grabowski

Piotr Grabowski

May 7, 2012, 6:22:19 PM
to django-d...@googlegroups.com
On 07.05.2012 20:13, Tom Christie wrote:
> Hey Piotr,
>
> Here's a few comments...
>
> You have 'fields' and 'exclude' option, but it feels like it's missing
> an 'include' option - How would you represent serializing all the
> fields on a model instance (without replicating them), and
> additionally including one other field? I see that you could do that
> by explicitly adding a Field declaration, but 'include' would seem
> like an obvious addition to the other two options.
By default, all model fields will be serialized, plus all fields
added by explicit Field declarations. If 'fields' is set, then only the fields
present in 'fields', plus the fields added by explicit Field declarations,
will be serialized. Too many fields :). If 'exclude' is set, then all model
fields except those listed in 'exclude' will be serialized, plus the
fields added by explicit declaration. I think it's like the ModelForm
declaration. Am I missing some case?
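
For example (hypothetical model and field names), under these rules:

class EntrySerializer(ModelSerializer):
    extra = Field()  # always serialized -- explicitly declared

    class Meta:
        model = Entry
        fields = ('title', 'author')  # only these model fields, plus 'extra'
        # exclude = ('body',)         # or: every model field except 'body', plus 'extra'
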
>
> I'd second Russell's comment about aliases. Defining a label on the
> field would seem more tidy.
>
> Likewise the comment about 'preserve_field_order' I've still got this
> in for 'django-serializers' at the moment, but I think it's something
> that should become private. (At an implementation level it's still
> needed, in order to make sure you can exactly preserve the field
> ordering for json and yaml dumpdata, which is unsorted (determined by
> pythons dict key ordered).
I answered Russell about that above.
>
> Being able to nest serializers inside other serializers makes sense,
> but I don't understand why you need to be able to nest fields inside
> fields. Shouldn't serializers be used to represent complex outputs
> and fields be used to represent flat outputs?
At first I thought a Serializer should be tied to an object (one Serializer =
one object). But then I figured out that a Serializer can work with the object
passed in from the upper-level Serializer (so a 'source' field isn't needed). Maybe
nested serializers and flat fields are the better approach. I must consider this.

>
> The "class_name" option for deserialization is making too many
> assumptions. The class that's being deserialized may not be present
> in the data - for example if you're building an API, the class that's
> being deserialized might depend on the URL that the data is being sent
> too. eg "http://example.com/api/my-model/12"
I wrote about class_name in my answer to Russell. If the model class is in the URL
then we can do something like this:

serializers.deserialize("json", data_from_response,
                        deserializer=UserSerializer(class_name=model_from_url(url)))
> In your dump data serializer, how do you distinguish that the 'fields'
> field is the entire object being serialized rather than the 'fields'
> attribute of the object being serialized?
fields = ModelFieldsSerializer(...) will be fed with the object to
serialize and the name 'fields'. I'm only interested in the output from it. It
must be a python native datatype, and I do something like
serialized_dict['fields'] = output_of_model_fields_serializer
ModelFieldsSerializer knows what to do with the object.
> Also, the existing dumpdata serialization only serializes local fields
> on the model - if you're using multi-table inheritance only the
> child's fields will be serialized, so you'll need some way of handling
> that.
>
> Your PKFlatField implementation will need to be a bit more complex in
> order to handle eg many to many relationships. Also, you'll want to
> make sure you're accessing the pk's from the model without causing
> another database lookup.
Thanks for pointing that out. I'll have to think about it.
>
> Is there a particular reason you've chosen to drop 'depth' from the
> API? Wouldn't it sometimes be useful to specify the depth you want to
> serialize to?
Sometimes, maybe. But in most cases, no. And there are some other ways to
do that. In my opinion, going (globally) more than one level deep will almost
never be needed. And if there is a need to go deeper in only one (or a few, but
not all) fields, 'depth' is unusable.
>
> There's two approaches you can take to declaring the 'xml' format for
> dumpdata, given that it doesn't map nicely to the json and yaml
> formats. One is to define a custom serializer (as you've done), the
> other is to keep the serializer the same and define a custom renderer
> (or encoder, or whatever you want to call the second stage). Of the
> two, I think that the second is probably a simpler cleaner approach.
> When you come to writing a dumpdata serializer, you'll find that
> there's quite a few corner cases that you'll need to deal with in
> order to maintain full byte-for-byte backwards compatibility,
> including how natural keys are serialized, how many to many
> relationships are encoded, how None is handled for different types,
> down to making sure you preserve the correct field ordering across
> each of json/yaml/xml. I *think* that getting the details of all of
> those will end up being awkward to express using your current approach.
> The second approach would be to a dict-like format, that can easily be
> encoded into json or yaml, but that can also include metadata specific
> to particular encodings such as xml (or perhaps, say, html). You'd
> have a generic xml renderer, that handles encoding into fields and
> attributes in a fairly obvious way, and a dumpdata-specific renderer,
> that handles the odd edge cases that the dumpdata xml format requires.
> The dumpdata-specific renderer would use the same intermediate data
> that's used for json and yaml.
I can't agree with that. The differences between the existing
xml and json serializers' output formats are too big. There is a field 'fields' in json
and 'field' in xml. Xml has attributes and json does not. That is only
presentation, and those two cases could be handled in the second phase (in the
renderer). But there is one big difference - xml has the additional fields
'to', 'rel' and 'type', and these are not presentation. They are information.

The next (and maybe most important) thing to consider is what the user
should know about formats in order to be able to serialize his data. In your
approach the user should be familiar with, for example, SimpleXMLGenerator,
because if he wants

xml:
<object>
    <item>...</item>
    <item>...</item>
</object>

and json:
{
    items : [ ..., ...],
}

then he must write at least one renderer to transform 'items' to 'item',
like you did in DumpDataXMLRenderer in django-serializers. I can't
accept that. Don't get me wrong, I've adopted a lot of your ideas from
django-serializers and I think it's a very good project. But you shouldn't force
users to know anything about generating xml or any other format. Maybe
you should create some metalanguage for the user to say what he wants,
like:

"I want that field 'items' will be transform to 'item' in xml (but I
don't know how to do it)" ->

class DumpDataSerializer(ModelSerializer):
"""
A serializer that is intended to produce dumpdata formatted structures.
"""
renderer_optons = {
'xml': { 'transform' : {'fields' : 'field'}} ,
}

It's ugly but I hope you understand my idea.

>
> I hope all of that makes sense, let me know if I've not explained
> myself very well anywhere.
>
> Regards,
>
> Tom
>

--
Piotr Grabowski

Piotr Grabowski

May 12, 2012, 11:15:26 AM
to django-d...@googlegroups.com
Hi,

This week I thought about the internal API for the Serializer. I want
developers to eventually be able to use it for better customization of their
solutions.

Next week I must study for my exams, so I suppose I will not do much on the
serialization project. I will try to resolve some of the issues with my API
that Tom Christie pointed out.

I know that I haven't done much, but at the end of the semester I have many
tasks related to my studies. After the end of May I will have much more time.

--
Piotr Grabowski

Piotr Grabowski

May 20, 2012, 6:34:46 PM
to django-d...@googlegroups.com
Hi,

During this week I was focused on my exams. Now I have more time for the
serialization project. Sadly, the API isn't finished yet. In the GSoC
calendar, 21 May is when coding starts. Tomorrow I will send updates to the API
proposal and present the idea of the algorithm (maybe "list of steps" would
be a better name) used for serialization. On Wednesday 23 May I want to start
coding, and on Saturday 27 May I will write the next check-in and present my
initial code.

The first thing I want to code is the basis for the serializers.serialize method
and the Serializer and Field classes. After the first two weeks I want to be able to
serialize very simple objects to json. Like I wrote in my first
proposal, I'm ready to spend 20 hours per week on this. In the first two
weeks it will be less, due to my studies.


--
Piotr Grabowski

Piotr Grabowski

May 21, 2012, 6:59:40 PM
to django-d...@googlegroups.com
I've made some changes to my previous API (https://gist.github.com/2597306
<- the changes are included there):

* Which fields of an object are serialized by default. It depends on
include_default_fields, but in contrast to Tom Christie's solution its default
value is True, so all fields (possibly restricted by Meta.model_fields)
are present.

* A follow_object attribute. In short: which object a Serializer's child
Serializer should work on. Tom wrote about this in a previous mail, but
I didn't fully understand the problem, so I gave him a bad answer. It's
better described in the algorithm I present below.

* Got rid of the aliases and preserve_field_ordering options.

* Changed the class hierarchy:

class Serializer(object)           # base class for serializing
class Field(Serializer)            # class for serializing fields in objects
class ObjectSerializer(Serializer) # class for serializing objects
class ModelSerializer(Serializer)  # class for serializing Django models


I've prepared a list of steps for the first phase of serialization. It's written
in English-Python pseudo code :) I hope the indentation will be preserved.
Serializer.serialize is a function that, for an object, returns a dict of
python native datatypes.

(Object|Model)Serializer.serialize(object, field_name (can be None), **options)
1. Get the object
   1.1. If the object is iterable, then apply this algorithm to all elements
        and return the list of returned values
   1.2. If field_name for the object is set from the upper level, we have object Obj:
        1.2.1. If Meta.follow_object == True, then work on the object Obj.field_name
        1.2.2. Else work on Obj

2. Find all fields Fs that should be serialized
   2.1. Get all fields declared on the Serializer
   2.2. Get all fields from Meta.fields
   2.3. If Meta.include_default_fields == True, then get all fields whose
        type is valid in Meta.model_fields and which are not in Meta.exclude

3. Create dictionary A and for F in Fs:
   3.1. Find the serializer for F
        3.1.1. If F is declared on the Serializer, then the serializer is
               the explicitly declared one
        3.1.2. Else get the serializer for F's type (m2m, related, etc.)
   3.2. Save in the dictionary: A[field_name] = serialized_value
        3.2.1. If the field has a label set, then field_name = label
        3.2.2. If the field has attribute=True, then add this to the
               dictionary: A[__attributes__][field_name] = serialized_value

4. Return A


Field.serialize(object, field_name (can be None), **options)
1. Get the object
   1.1. If it is iterable, then apply this algorithm to all elements
   1.2. Work on the object Obj passed from the upper level

2. Find all fields Fs that should be serialized
   2.1. Get all fields from the declared fields

3. Create dictionary A and for F in Fs:
   3.1. Find the serializer for F
        3.1.1. F is in the declared fields, so the serializer is explicitly declared
   3.2. Save in the dictionary: A[field_name] = serialized_value
        3.2.1. If the field has a label set, then field_name = label
        3.2.2. If the field has attribute=True, then add this to the
               dictionary: A[__attributes__][field_name] = serialized_value

4. Resolve the serialized_value function
   4.1. If Fs (and therefore A) is empty:
        4.1.1. If the field_name function returns None, then return serialized_value()
        4.1.2. Else return {field_name() : serialized_value()}
   4.2. Else:
        4.2.1. If the field_name function returns None, then raise an Exception
        4.2.2. Else A.update({field_name() : serialized_value()})

5. Return A

We now have a dict (or list of dicts) from the first phase of serialization. Next,
__attributes__ must be resolved (this depends on the format and strategy).
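
For example (the exact shape is only illustrative), the first phase might hand
the renderer something like:

{
    'fields': {'username': 'alice'},
    '__attributes__': {'pk': 1, 'model': 'auth.user'},
}

A JSON renderer could inline the attributes as ordinary keys (or drop them),
while an XML renderer would emit them as tag attributes, e.g.
<object pk="1" model="auth.user">...</object>.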


Deserialization (an early idea):

SomeSerializer.deserialize(D - python native datatype objects (a dict or
list of dicts), instance=None, field_name=None, class_name=None, **options)

1. Get the object instance  # Resolving this may be more complicated than I
                            # wrote below (e.g. based on D's fields - duck typing)
   1.1. If instance is not None, then use it
   1.2. Else try to resolve class_name
        1.2.1. If class_name is a class object, instantiate it.
        1.2.2. If class_name is a string, then find the value for that
               key in D and instantiate it
        1.2.3. If class_name is None, raise an Exception

2. Find all fields in D and find the fields in the Serializer for deserializing them
   2.1. Resolve the label attribute for the fields

3. Pass the instance, the data D and field_name to all the fields' Serializers

4. Return the instance


I'm aware that there will be a lot of small issues, but I believe the
ideas are good.

--
Piotr Grabowski

Piotr Grabowski

May 27, 2012, 12:37:08 PM
to django-d...@googlegroups.com
Hi,

This week I started coding my project. It's available on the branch
soc2012-serialization at https://github.com/grapo/django.

I'm not very familiar with git, so I'm not sure that I'm doing it right:
* I forked the django repo on github
* cloned it to my computer
* created a new branch, soc2012
* work in this branch
* push it to origin

When I want to synchronize my branch with django trunk, I will fetch
master from upstream (django/django) and merge master into my branch.
Is this flow good?

So far I have coded the base for Serializers and Fields. I haven't included any
tests or documentation yet, so it can be hard to try it out. I am pretty sure that
writing appropriate docstrings will be a challenge for me :) I copied
some metaclass code from django forms and models. You can instantiate
ObjectSerializer and try to serialize some simple python objects with
it. It will serialize all fields present in object.__dict__ and
return a python native datatype. The code is still at an early stage, so it's
not polished and needs some refactoring, but if you have any tips for me
I will be very grateful.
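
Roughly, usage at this point looks something like the following (the exact
import path and call may differ on my branch):

# from wherever ObjectSerializer ends up living, e.g.:
# from django.core.serializers.serializers import ObjectSerializer

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

serializer = ObjectSerializer()
native = serializer.serialize(Point(1, 2))
# expected: something like {'x': 1, 'y': 2} -- every attribute in obj.__dict__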

Next week I will fix some issues, code the ModelSerializer and write
documentation and tests for what I have done so far. I must also think about
renaming some functions so the API will be more convenient.

--
Piotr Grabowski

Anssi Kääriäinen

May 27, 2012, 1:18:13 PM
to Django developers
On May 27, 7:37 pm, Piotr Grabowski <grabowski...@gmail.com> wrote:
> Hi,
>
> This week I started coding my project. It' available on branch
> soc2012-serialization onhttps://github.com/grapo/django.
>
> I'm not very familiar with git so I'm not suer that I do it right:
> * I forked django repo from github
> * clone it to my computer
> * create new branch soc2012
> * work in this branch
> * push it to origin
>
> When I want to synchronize my branch with django trunk I will fetch
> master from upstream (django/django) and  merge master to my branch.
> It's this flow good?

I think that is a good way to go. It might be that the branch will need
some history rewriting when it is otherwise ready for commit, but
until then keeping your history intact so that others can easily
follow your work is good. One piece of advice I have seen is that you should not
merge upstream changes too often; it will just mess up the history.
You can easily enough create another branch where you test how your
work interacts with the master branch. Only merge into your soc2012 branch if
upstream changes are such that your work needs major changes because of them.
Trivial merge conflicts do not require merging upstream back.

Another option is a rebase workflow for the branch, but in this case you
should make it absolutely clear that others should not consider your
github branch as anything other than a convenient way to publish
your work as a patch series. The good thing about this way of working is
that your changes will be on top of the commit log all the time, and
thus it is very easy to see what you have done in your branch.

- Anssi

Russell Keith-Magee

May 28, 2012, 8:28:48 PM
to django-d...@googlegroups.com
Hi Piotr;

Apologies for the delay in responding to your updated API.

On Tue, May 22, 2012 at 6:59 AM, Piotr Grabowski <grabow...@gmail.com> wrote:
> I do some changes to my previous API: (https://gist.github.com/2597306 <-
> change are included)
>
>  * which fields of object are default serialized. It's depend on
> include_default_field but opposite to Tom Christie solution it's default
> value is True so all fields (eventually specified in Meta.model_fields) are
> present

Field options:
~~~~~~~~~~

* There's a complication here that doesn't make sense to me.
Following your syntax, the following would appear to be legal:

class FieldA(Field):
    def serialize(…):
    def deserialize(…):

class FieldB(Field):
    to = FieldA()

    def serialize(…):
    def deserialize(…):

class FieldC(Field):
    to = FieldB(attribute=True)

    def serialize(…):
    def deserialize(…):

i.e., if Field allows declaration style definitions, and Field can be
*used* in declaration style definitions, then it's possible to define
them in a nested fashion -- at which point, it isn't clear to me what
is going to be output.

It seems to me that "attribute" shouldn't be an option on a field
declaration; it should either be something that's encompassed in a
similar way to serialise/deserialize (i.e., either additional
input/output from the serialise methods, or a parallel pair of
methods), or the use of a Field as a declarative definition implies
that it is of type attribute, and prevents the use of field types that
themselves have attributes.

Field methods:
~~~~~~~~~~~

* serialize_value(), deserialize_value(); this is bike shedding, but
is there any reason to not use just "serialize() and deserialize()"?

ObjectSerializer methods:

* Why does ObjectSerializer have options at all? How can it be "meta"
operating on a generic object? Consider -- if you pass in an instance
of an object, you'll need to use obj.field_name to access fields; if
you pass in a dictionary, you'll need to use obj['field_name']. And if
you're given a generic object what's the list of default fields to
serialize?

Like I said last time, ObjectSerializer should be completely
definition based. Look at Django's Form base class - it has no "meta"
concept -- it's fully declaration based. Then there's ModelForm, which
has a meta class; but the output of the ModelForm could be completely
manually generated using a base Form.

* I mentioned this last time -- why is class_name a meta option,
rather than a method on the base class with a default implementation?
Having it as an Meta attribute

* I'm not wild about the way related_serializer seems to work,
either. Again, like class_name, it seems like it should be a method,
not an option. By making it an option, you're assuming that it will
have a single obvious value, which definitely won't be true -- e.g., I
have an object with relations to users, groups and permissions; I want
to output users as a list of nested objects, permissions as a list of
natural keys, and groups as a list of primary keys.

* I'm not sure I see why include_default_fields is needed. Isn't this
implied by the values for "fields" and "exclude"? i.e., if fields or
exclude is defined, you're not including everything by default;
otherwise you are. Why the additional setting? What's the interaction
of include_default_fields with fields and exclude?

* I don't understand what follow_object is trying to do. Isn't the
issue here whether you use a serializer that just outputs a primary
key, or an object that outputs field values? And if it's the latter,
the sub-serializer determines how deep things go?

ModelSerializer options:

* I'm really not a fan of model_fields. This seems like a short cut
that will make the implementation a whole lot more complex, and
ultimately is much less explicit than just naming the fields that you
want to serialize.

> I'm aware that there will be lot of small issues but I believe that ideas
> are good.

I'm still optimistic, but there's still some fundamental issues here
-- in particular, the existence of Meta on ObjectSerializer, and the
way that attributes on XML tags are being handled. I don't think we've
hit any blockers, but we need to get these sorted out before you start
producing too much code.

Yours,
Russ Magee %-)

Piotr Grabowski

May 30, 2012, 6:52:00 PM
to django-d...@googlegroups.com
On 29.05.2012 02:28, Russell Keith-Magee wrote:
> Hi Piotr;
>
> Apologies for the delay in responding to your updated API.
>
> On Tue, May 22, 2012 at 6:59 AM, Piotr Grabowski<grabow...@gmail.com> wrote:
>> I do some changes to my previous API: (https://gist.github.com/2597306<-
>> change are included)
>>
>> * which fields of object are default serialized. It's depend on
>> include_default_field but opposite to Tom Christie solution it's default
>> value is True so all fields (eventually specified in Meta.model_fields) are
>> present
> Field options:
> ~~~~~~~~~~
>
> * There's a complication here that doesn't make sense to me.
> Following your syntax, the following would appear to be legal:
>
> class FieldA(Field):
>     def serialize(…):
>     def deserialize(…):
>
> class FieldB(Field):
>     to = FieldA()
>
>     def serialize(…):
>     def deserialize(…):
>
> class FieldC(Field):
>     to = FieldB(attribute=True)
>
>     def serialize(…):
>     def deserialize(…):
>
> i.e., if Field allows declaration style definitions, and Field can be
> *used* in declaration style definitions, then it's possible to define
> them in a nested fashion -- at which point, it isn't clear to me what
> is going to be output.
>
> It seems to me that "attribute" shouldn't be an option on a field
> declaration; it should either be something that's encompassed in a
> similar way to serialise/deserialize (i.e., either additional
> input/output from the serialise methods, or a parallel pair of
> methods), or the use of a Field as a declarative definition implies
> that it is of type attribute, and prevents the use of field types that
> themselves have attributes.
In the example that you present, I thought about raising an exception when FieldC is defined. Another option is to define the class itself as being an attribute:

class FieldB(Field):
    to = FieldA()

    def serialize(…):
    def deserialize(…):

    class Meta:
        attribute = True

Then raise an exception when FieldB is defined, because of the 'to' field. Still, one of my principles is to have one Serializer for all formats (or at least the possibility to serialize a Serializer in each format), and attribute is something really problematic.

About the value returned by Field.serialize (Serializer.serialize in general) - now it is a dict with the key __attributes__; maybe it would be better to return a tuple (dict/field_value, attributes_dict), because of the issues when there is no field_name but attributes are present.



>
> Field methods:
> ~~~~~~~~~~~
>
> * serialize_value(), deserialize_value(); this is bike shedding, but
> is there any reason to not use just "serialize() and deserialize()"?
I'm using serialize and deserialize in my code.
Serializer.serialize(...) returns a native python datatype. It's a matter
of naming, but in my opinion serialize is the method that should return
the serialized Field/ObjectSerializer, not only part of the result
(serialized_value returns only the part of the data needed for Field serialization).

>
> ObjectSerializer methods:
>
> * Why does ObjectSerializer have options at all? How can it be "meta"
> operating on a generic object? Consider -- if you pass in an instance
> of an object, you'll need to use obj.field_name to access fields; if
> you pass in a dictionary, you'll need to use obj['field_name']. And if
> you're given a generic object what's the list of default fields to
> serialize?
>
> Like I said last time, ObjectSerializer should be completely
> definition based. Look at Django's Form base class - it has no "meta"
> concept -- it's fully declaration based. Then there's ModelForm, which
> has a meta class; but the output of the ModelForm could be completely
> manually generated using a base Form.
Ok, I think I finally get this idea. Before, I thought about class Meta
more as options for the class it is on. ObjectSerializer is now more
like ModelForm than like Form. I have an idea for how to rewrite it, and I will
notify you when it is done.
> * I mentioned this last time -- why is class_name a meta option,
> rather than a method on the base class with a default implementation?
> Having it as an Meta attribute
I answered you last time; I should add this to the proposal. Probably I
don't understand the issue.

def get_class(self, data):
    if self._meta.class_name is not None:
        if isinstance(self._meta.class_name, str):
            # class_name names the data key that holds the class to create
            return object_from_string(data[self._meta.class_name])
        else:
            return self._meta.class_name
    raise Exception('No class for deserialization provided')

If someone wants more sophisticated resolution of the class from the data, then he
can override get_class.

When I rewrite ObjectSerializer it will be different from this, but my
idea is to have class_name as a shortcut for writing the get_class method.

>
> * I'm not wild about the way related_serializer seems to work,
> either. Again, like class_name, it seems like it should be a method,
> not an option. By making it an option, you're assuming that it will
> have a single obvious value, which definitely won't be true -- e.g., I
> have an object with relations to users, groups and permissions; I want
> to output users as a list of nested objects, permissions as a list of
> natural keys, and groups as a list of primary keys.
related_serializer is the default way to serialize a related field:

class MySerializer(Serializer):
    users = NestedSerializer()
    permissions = NaturalKeyField()

    class Meta:
        related_serializer = PkField()

In your example, where there are only 3 related fields, it is unnecessary, but
what if I have an object with many related fields, each of which should be
serialized as a pk and only a few as, for example, natural keys?
related_serializer is quite useful in that case.

In Serializer it is actually a method for getting a field's serializer. (Of
course it will be expanded.)

def get_serializer_for_field(self, field_name):
    # for all fields that have no explicitly declared serializer
    return self.opts.field_serializer()


>
> * I'm not sure I see why include_default_fields is needed. Isn't this
> implied by the values for "fields" and "exclude"? i.e., if fields or
> exclude is defined, you're not including everything by default;
> otherwise you are. Why the additional setting? What's the interaction
> of include_default_fields with fields and exclude?
Yes, you're right. I will change that.
>
> * I don't understand what follow_object is trying to do. Isn't the
> issue here whether you use a serializer that just outputs a primary
> key, or an object that outputs field values? And if it's the latter,
> the sub-serializer determines how deep things go?
Tom Christie asked me a question:
> In your dump data serializer, how do you distinguish that the 'fields'
> field is the entire object being serialized rather than the 'fields'
> attribute of the object being serialized?
It was related to:
class YJDumpDataSerializer(ModelSerializer):
    pk = PkField(attribute=True)
    model = ModelNameField(attribute=True)
    fields = ModelFieldsSerializer()

I didn't understand him and gave the wrong answer.
Suppose YJDumpDataSerializer is serializing object X. The question is which
object ModelFieldsSerializer should work on - the object returned by
X.fields, or X itself. By default it is X.fields; in this case
ModelFieldsSerializer should work on X. How can I tell
ModelFieldsSerializer that it should work on X itself?
follow_object=False is for that. Maybe it should be given at
instantiation: ModelFieldsSerializer(follow_object=False)?
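
Something like this (only a sketch of the proposed flag, nothing is implemented yet):

    class DumpDataSerializer(ModelSerializer):
        pk = PkField(attribute=True)
        model = ModelNameField(attribute=True)
        # with follow_object=False the sub-serializer would get X itself,
        # not the value of X.fields
        fields = ModelFieldsSerializer(follow_object=False)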

> ModelSerializer options:
>
> * I'm really not a fan of model_fields. This seems like a short cut
> that will make the implementation a whole lot more complex, and
> ultimately is much less explicit than just naming the fields that you
> want to serialize.
>
I thought it would be a useful shortcut, but now I agree with you. I
will change that.
>> I'm aware that there will be lot of small issues but I believe that ideas
>> are good.
> I'm still optimistic, but there's still some fundamental issues here
> -- in particular, the existence of Meta on ObjectSerializer, and the
> way that attributes on XML tags are being handled. I don't think we've
> hit any blockers, but we need to get these sorted out before you start
> producing too much code.
>
> Yours,
> Russ Magee %-)
>

--
Piotr Grabowski

Piotr Grabowski

unread,
Jun 4, 2012, 4:21:08 PM6/4/12
to django-d...@googlegroups.com
Hi,

Sorry for being late with the weekly update. Due to some issues with Meta
and my incorrect understanding of metaclasses that Russell pointed out, I spent
time improving my knowledge of this. I also rewrote some of the
code that I had written the week before.
This week I will do what I was supposed to do last week - initial tests
and documentation. After this week, serialization should work with simple
objects.


--
Piotr Grabowski

Piotr Grabowski

unread,
Jun 11, 2012, 3:12:55 PM6/11/12
to django-d...@googlegroups.com
Hi!

This week I managed to write deserialization functions and tests.

Issues with deserialization

Working on deserialization gave me a lot of thoughts about the previous concepts. I rewrote the Field class so that now a Field can't be nested. A Field can only have subfields if the subfields are attributes:

class ContentField(Field):
    title = Field(attribute=True)   # valid
    content = Field()               # invalid -> raises an exception at class declaration time

    def serialized_value(...):
    ...

Of course, if ContentField is itself initialized as an attribute and has subfields, an exception is raised (when ContentField is initialized).

I changed the Python datatype format returned from the serializer.serialize method. Previously it was a dict with the serialized fields (label or field name as key) and a special __attributes__ key with a dict of attributes. Now it is a tuple (native, attributes), where native is a dict with the serialized fields (or a generator of dicts).

serializer.deserialize always returns an object instance.

After the first phase of serialization, python_serialized_object will be serialized by a NativeFormat instance. Each format (json, xml, yaml, ...) has one NativeFormat that translates python_serialized_object into serialized_string. I want to be able to do this:
object -> python_serial = object_serializer.serialize(object) -> string_serial = native_format.serialize(python_serial) -> python_deserial = native_format.deserialize(string_serial) -> object2 = object_serializer.deserialize(python_deserial)
object2 has the same content as object.
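
As a sketch, the whole round trip would look like this (object_serializer and native_format are just the hypothetical objects named above, not a finished API):

    # sketch of the intended round trip
    def round_trip(obj, object_serializer, native_format):
        python_serial = object_serializer.serialize(obj)         # object -> native datatypes
        string_serial = native_format.serialize(python_serial)   # native datatypes -> e.g. a json string
        python_deserial = native_format.deserialize(string_serial)
        return object_serializer.deserialize(python_deserial)    # should have the same content as obj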

Now I only have:
object -> python_serial = object_serializer.serialize(object) -> object2 = object_serializer.deserialize(python_serial)

Tests

I wrote some tests (NativeSerializersTests) for ObjectSerializer in django/tests/modeltests/serializers/tests.py, but I'm not sure this is a good place for them. I used a model (Article) defined in models.py, but I used it like a normal object. Relation fields aren't serialized properly yet.

So far I have tested the most important functions of ObjectSerializer: creating custom fields, attributes, and renaming fields (using labels).

Next I want to resolve issues with:
  • Instance creation when deserializing. I have a create_instance method and Meta.class_name; I must turn them into a public API.
  • Ensuring that the Field serialize method always returns simple native Python datatypes.
  • Writing a NativeFormat for (at least) json.
  • Finding better names for the already defined classes, methods and files.
  • More tests and documentation.

When I have done this, serialization and deserialization will be more or less done for (non-model) Python objects.


--
Piotr Grabowski





Piotr Grabowski

unread,
Jun 19, 2012, 4:48:37 PM6/19/12
to django-d...@googlegroups.com
Hi!

This week I wrote simple serialization and deserialization for the json format, so it's now possible to encode objects to and from json:


import django.core.serializers as s

class Foo(object):
    def __init__(self):
        self.bar = [Bar(), Bar(), Bar()]
        self.x = "X"

class Bar(object):
    def __init__(self):
        self.six = 6

class MyField2(s.Field):
    def deserialized_value(self, obj, instance, field_name):
        pass

class MyField(s.Field):
    x = MyField2(label="my_attribute", attribute=True)

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name, "No field like this")

    def deserialized_value(self, obj, instance, field_name):
        pass

class BarSerializer(s.ObjectSerializer):
    class Meta:
        class_name = Bar

class FooSerializer(s.ObjectSerializer):
    my_field = MyField(label="MYFIELD")
    bar = BarSerializer()

    class Meta:
        class_name = Foo


foos = [Foo(), Foo(), Foo()]
ser = s.serialize('json', foos, serializer=FooSerializer, indent=4)
new_foos = s.deserialize('json', ser, deserializer=FooSerializer)


There are cases that I don't like:
  • the deserialized_value function with an empty body - what to do with fields that we don't want to deserialize? There should be a better way to handle this;
  • I pass in the list foos but get back the generator new_foos; also, bar in the Foo object comes back as a generator, not a list like in the input. Generators are better for performance, but if I put a list in, I want a list out, not a generator. I don't know what to do about this.


Next week I will handle the rest of the issues that I mentioned in last week's check-in and refactor the json format (de)serialization - usage of streams and proper parameter handling (indent, etc.).

--
Piotr Grabowski




Tom Christie

unread,
Jun 20, 2012, 7:50:45 AM6/20/12
to django-d...@googlegroups.com
if I put list in input I want list in output, not generator

I wouldn't worry about that.  The input and output should be *comparable*, but it doesn't mean they should be *identical*.
A couple of cases for example:

*) You should be able to pass both lists and generator expressions to a given serializer, but they'll end up with the same representation - there's no way to distinguish between the two cases and deserialize accordingly. 
*) Assuming you're going to maintain backwards compatibility, model instances will be deserialized into django.core.serializer.DeserializedObject instances, rather than deserializing directly back into complete model instances.  These match up with the original serialized instances, but they are not identical objects. 

deserialized_value function with empty content

Are you asking about how to be able to differentiate between a field that deserializes to `None`, and a field that doesn't deserialize a value at all?  I'd suggest that the deserialization hook for a field needs to take eg. the dictionary that the value should be deserialized into, then it can determine which key to deserialize the field into, or simply 'pass' if it doesn't deserialize a value.

> I changed python datatype format returned from serializer.serialize method.  Now it is tuple (native, attributes)

I'm not very keen on either this, or on the way that attributes are represented as fields.
To me this looks like taking the particular requirements of serializing to xml, and baking them deep into the API, rather than treating them as a special case, and dealing with them in a more decoupled and extensible way.

For example, I'd rather see an optional method `attributes` on the `Field` class that returns a dictionary of attributes.  You'd then make sure that when you serialize into the native python datatypes prior to rendering, you also have some way of passing through the original Field instances to the renderer in order to provide any additional metadata that might be required in rendering the basic structure.

Wiring up things this way around lets you support other formats that have extra information attached to the basic structure of the data.  As an example use-case - In addition to json, yaml and xml, a developer might also want to be able to serialize to say, a tabular HTML output.  In order to do this they might need to be able attach template_name or widget information to a field, that'd only be used if rendering to HTML.

It might be that it's a bit late in the day for API changes like that, but hopefully it at least makes clear why I think that treating XML attributes as anything other than a special case isn't quite the right thing to do.  - Just my personal opinion of course :)

Regards,

  Tom

Piotr Grabowski

unread,
Jun 20, 2012, 11:28:51 AM6/20/12
to django-d...@googlegroups.com
On 20.06.2012 13:50, Tom Christie wrote:

deserialized_value function with empty content

Are you asking about how to be able to differentiate between a field that deserializes to `None`, and a field that doesn't deserialize a value at all?
No :) I had this problem before and I managed to resolve it - the default deserialized_value doesn't return anything; it sets the field value:
    def deserialized_value(self, obj, instance, field_name):
        setattr(instance, field_name, obj)

This is the way I am doing deserialization - pass the instance to the subfields, retrieve it from them (it should be the same instance, but in specific cases, e.g. an immutable instance, I can imagine that another instance of the same class is returned) and return it.

If I don't declare a deserialized_value function, the function from the base class is used. That's the expected behaviour. So how do I say "this field shouldn't be deserialized"? Right now I declare:
    def deserialized_value(self, obj, instance, field_name):
        pass
Admittedly, I can do anything in this function except set some value on the instance, but declaring a function only to say "do nothing" isn't a good solution for me.



> I changed python datatype format returned from serializer.serialize method.  Now it is tuple (native, attributes)

I'm not very keen on either this, or on the way that attributes are represented as fields.
To me this looks like taking the particular requirements of serializing to xml, and baking them deep into the API, rather than treating them as a special case, and dealing with them in a more decoupled and extensible way.

For example, I'd rather see an optional method `attributes` on the `Field` class that returns a dictionary of attributes.  You'd then make sure that when you serialize into the native python datatypes prior to rendering, you also have some way of passing through the original Field instances to the renderer in order to provide any additional metadata that might be required in rendering the basic structure.

Wiring up things this way around lets you support other formats that have extra information attached to the basic structure of the data.  As an example use-case - In addition to json, yaml and xml, a developer might also want to be able to serialize to say, a tabular HTML output.  In order to do this they might need to be able attach template_name or widget information to a field, that'd only be used if rendering to HTML.

It might be that it's a bit late in the day for API changes like that, but hopefully it at least makes clear why I think that treating XML attributes as anything other than a special case isn't quite the right thing to do.  - Just my personal opinion of course :)

Regards,

  Tom


You're right that I shouldn't treat attributes so specially. I have an idea how to fix this. Where I returned (native, attributes) I will return (native, metainfo). It's only a matter of renaming, but metainfo will be more than attributes: in xml, metainfo can contain attributes for a field; in html it can be the template_name or widget used for rendering. If I don't use metainfo in my serializer class then it's still universal - it can be used for serialization to any format.

How to create metainfo? Having a `metainfo` method on the `Field` class that returns a dictionary seems like a good idea, and it covers use-cases like html. But what to do with xml attributes again? :) They aren't only field metadata; they can also contain instance information valuable in deserialization (like the instance pk in the current Django solution), so they should be treated as fields and should have access to the instance during serialization and deserialization.

My latest thought is that attributes should be treated as normal fields and live in the tuple's native object, and metainfo will carry the information telling xml which fields in native should be rendered as attributes.

After the first phase:

native == {
    'field_1': value1,
    'field_2': value2,
    'field_3': value3,
}
metainfo == {
    'as_attributes': ['field_2', 'field_3'],
    'template_name': 'my_template',
}

So if we use json in the second phase, field_2 and field_3 will be rendered the same way as field_1, because json doesn't read metainfo. Xml will render the fields according to metainfo['as_attributes']. Html will render the native dict using my_template.

--
Piotr Grabowski


Tom Christie

unread,
Jun 26, 2012, 5:52:27 AM6/26/12
to django-d...@googlegroups.com
> default deserialized_value don't returns anything. It sets the field value.

Cool, that's exactly what I meant.

> but declaring function only to say "do nothing" isn't good solution for me.

<shrug> Declaring a method to simply 'pass' seems fine to me if you want to override it to do nothing.

> It is the way I am doing deserialization - pass instance to subfields

Seems fine.  It's worth keeping in mind that there's two ways around of doing this.

1. Create an empty instance first, then populate it with the field values in turn.
2. Populate a dictionary with the field values first, and then create an instance using those values.

The current deserialization does something closer to the second.
I don't know if there's any issues with doing things the other way around, but you'll want to consider which makes more sense.
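
In plain Python the difference is roughly this (helper names invented, just to illustrate the ordering):

    # 1. create an empty instance first, then populate its fields in turn
    def deserialize_by_populating(cls, data):
        instance = cls()
        for name, value in data.items():
            setattr(instance, name, value)
        return instance

    # 2. collect the field values first, then construct the instance from them
    def deserialize_by_constructing(cls, data):
        return cls(**data)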

> Where I returned (native, attributes) I will return (native, metainfo). It's only matter of renaming but metainfo will be more than attributes.

Again, there's two main ways around I can think of for populating metadata such as xml attributes.

1. Return the metadata upfront to the renderer.
2. Include some way for the renderer to get whatever metadata it needs at the point it's needed.

This is one point where what I'm doing in django-serializers differs from your work, in that rather than return extra metadata upfront, the serializers return a dictionary-like object (that e.g. can be directly serialized to json or yaml), that also includes a way of returning the fields for each key (so that e.g. the xml renderer can call field.attributes() when it's rendering each field.)

Again, you might decide that (1) makes more sense, but it's worth considering.
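
Roughly, the idea is something like this (a much simplified sketch, not the actual django-serializers classes):

    # a dict-like result that also lets the renderer look up the Field
    # responsible for each key
    class FieldDict(dict):
        def __init__(self, values, fields):
            super(FieldDict, self).__init__(values)
            self.fields = fields  # mapping of key -> Field instance

        def field_for(self, key):
            return self.fields[key]

    # json/yaml can treat it as a plain dict, while an xml renderer could
    # call result.field_for('title').attributes() as it renders each key.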

As ever, if there's any of this you'd like to talk over off-list, feel free to drop me a mail - t...@tomchristie.com

Regards,

  Tom

Piotr Grabowski

unread,
Jun 28, 2012, 10:38:46 AM6/28/12
to django-d...@googlegroups.com
On 26.06.2012 11:52, Tom Christie wrote:
> > It is the way I am doing deserialization - pass instance to subfields
>
> Seems fine. It's worth keeping in mind that there's two ways around
> of doing this.
>
> 1. Create an empty instance first, then populate it with the field
> values in turn.
> 2. Populate a dictionary with the field values first, and then create
> an instance using those values.
>
> The current deserialization does something closer to the second.
> I don't know if there's any issues with doing things the other way
> around, but you'll want to consider which makes more sense.
>
The second approach assumes that every field returns some value. But what if
we don't want to deserialize some field? In my deserialization the instance
is passed to the field, and the field will eventually fill it with some value:

    def deserialize_value(self, obj, instance, field_name):
        setattr(instance, field_name, obj)

If we don't want to deserialize a field we simply do nothing in
deserialize_value.
If the second approach is used we must return a value. One idea is to mark
the field as not deserializable:

    class MyField(Field):
        deserializable = False


> > Where I returned (native, attributes) I will return (native,
> metainfo). It's only matter of renaming but metainfo will be more than
> attributes.
>
> Again, there's two main ways around I can think of for populating
> metadata such as xml attributes.
>
> 1. Return the metadata upfront to the renderer.
> 2. Include some way for the renderer to get whatever metadata it needs
> at the point it's needed.
>
> This is one point where what I'm doing in django-serializers differs
> from your work, in that rather than return extra metadata upfront, the
> serializers return a dictionary-like object (that e.g. can be directly
> serialized to json or yaml), that also includes a way of returning the
> fields for each key (so that e.g. the xml renderer can call
> field.attributes() when it's rendering each field.)
>
> Again, you might decide that (1) makes more sense, but it's worth
> considering.
>
> As ever, if there's any of this you'd like to talk over off-list, feel
> free to drop me a mail - t...@tomchristie.com
>
> Regards,
>
> Tom
>
I rewrote this so it's more similar to django-serializers.
But from the beginning - what did I do this week? :)
I agreed that the xml attributes in my solution were overstated, so I want
to modify this. Attributes in xml are one of (two) ways of presenting
information. I still want to have fields for attributes, but declared in
this way:

class MyField(Field):
    attr1 = Field()
    attr2 = Field()

    def serialized_value(self, obj, field_name):
        return field_value

    def metainfo(self):
        return {'attributes': ['attr1', 'attr2']}


JSON will skip the attributes entirely:

some_field : field_value

XML will render them as attributes:

<some_field attr1="val1" attr2="val2">
    field_value
</some_field>

If metainfo doesn't return a dict with attributes, XML will render this:

<some_field>
    <attr1>val1</attr1>
    <attr2>val2</attr2>
    field_value
</some_field>

I coded it like django-serializers' DictWithMeta, but I added one more
piece of functionality to represent a Field that has subfields and one extra
value. I'm still not convinced it is a good solution, so I have rewritten it
several times but always end up with something like this :)
I will push the code tomorrow because I still want to do some tweaks.

--
Piotr Grabowski






Piotr Grabowski

unread,
Jul 10, 2012, 8:18:59 PM7/10/12
to django-d...@googlegroups.com
Hi,

It is time for the midterm evaluation of my participation in GSoC, so I want
to summarize in this check-in what I have done in the last month.
https://gist.github.com/3085250 - here is something that can serve as
"documentation". I wrote some examples of ModelSerializer usage and how
it should work.
https://github.com/grapo/django - the code I wrote is in the
soc2012-serialization branch.

There are still problems with the API and with how to do some things, but in my
opinion it's going in the right direction.

Serialization and deserialization of Python objects is almost done.
There is a fairly stable API; I used some ideas (and a little code) from
https://github.com/tomchristie/django-serializers
Objects are serialized to metadicts, which are dicts with additional
data. This additional data can be used by the format serializer to change
the presentation of the data (e.g. attributes in xml).

Serialization of Django models has been started. I don't know which model
fields should be serialized by default: certainly all fields declared on the
model. What about the pk field and reverse related fields?

The json dumpdata serializer is more or less written - I have not done
field sorting yet.

I am sure that I can finish all this work by the end of GSoC.

Sadly, not everything is going well. In particular, my communication on this list
and with my mentor should be improved; that's entirely my fault. I should write
check-ins more regularly and meet the deadlines that I set. I am not very
satisfied with the progress I have made - much more could have been done in
about a month and a half.

Regards,
Piotr Grabowski





Russell Keith-Magee

unread,
Jul 11, 2012, 8:04:13 AM7/11/12
to django-d...@googlegroups.com
On Wed, Jul 11, 2012 at 8:18 AM, Piotr Grabowski <grabow...@gmail.com> wrote:
> Hi,
>
> It is time to midterm evaluation of my participation in gsoc so I want to
> summarize in this check-in what I have done in last month.
> https://gist.github.com/3085250 - here is something that can be
> "documentation". I wrote some examples of ModelSerializer usage and how it
> should work.
> https://github.com/grapo/django - in branch soc2012-serialization is code
> that I wrote.

It's good that you're starting to work on some documentation -- my
feedback is that you need to think about the purpose of this
documentation -- I can discover the API myself with Python's
interactive shell; what that won't tell me is what output I will
expect.

For example, you give an example of how to define a 'metadata'
method, but you don't show the effect of adding that declaration on
the output serialised object. In fact, there doesn't seem to be a
single example of serialised *output* in the whole docs.

Giving lots of code examples of input doesn't really help me unless I
know how that input will shape the output. This is especially
important when we're dealing with serializers.

> There is still problem with API and how to do some things but in my opinion
> it's going in right direction.

Generally, I agree. I still have some concerns however; mostly around
the things that you're putting onto the Meta class.

related_serializer, for example -- Why is this a single attribute in
the meta, rather than a method? By using an attribute, you're saying
that on any given serializer, *all* related objects will be serialised
the same, and I don't see why that should be the case.

The same argument goes for class_name (which I think I've mentioned
before), field_serializer, and so on. The only fields that I can see
that *should* be declarative are 'fields' and 'exclude' -- and if
you've been tracking django-dev recently, there's been a discussion
about whether the idea of 'exclude' should be deprecated from Django
APIs (due to potential security issues -- explicit inclusion is safer
than implicit inclusion, because you can accidentally forget to
exclude sensitive data from an output list)

Some other API questions:

Why is deserialized_value decoupled from set_object? It isn't obvious
to me why this separation exists.

I see where you're going with metainfo on fields (and that's a
reasonably elegant way of tackling the problem of XML needing
additional info to serialize), but what is the purpose of metadata on
Serializers?

> Serialization and deserialization of Python objects is almost done. There is
> quite stable API, i used some ideas (and little code) from
> https://github.com/tomchristie/django-serializers
> Objects are serialized to metadicts which are dicts with additional data.
> this additional data can be used by format serializer to change presentation
> of data (e.g. attributes in xml)
>
> Serialization of Django models is started. I don't know what fields of model
> should be serialized by default: for sure all declared in model fields. What
> with pk field, reverse related fields?

Your goal here should be to exactly replicate Django's existing
serializers. That means serialising all local model fields, with the
PK being handled as a special case; reverse related fields aren't
included.

> Json dumpdata serializer is more or less written - I have not done fields
> sorting yet.
>
> I am sure that I can finish all this work until gsoc end.
>
> Sadly not all is going well. Especially my communication in this list and
> with my mentor should be improved. It's all by my fault. I should wrote
> check-ins more regularly and meet the deadlines that I set. I am not very
> satisfied with progress I have made. It can be done much more in about one
> and a half month.

My sincere apologies for not responding as often as I should. I
haven't been a very good mentor for this project. I'll try and improve
for the second half of the GSoC.

I can see you've been getting some feedback from Tom Christie; the
good news is that I'm generally in agreement with the feedback he's
been giving you, so he hasn't been leading you astray :-)

If you ever want to get my attention for a solid block of time to kick
around an idea, you can always grab me on IRC. I lurk in #django-dev
most of the time.

Yours,
Russ Magee %-)

Piotr Grabowski

unread,
Jul 12, 2012, 8:59:01 AM7/12/12
to django-d...@googlegroups.com
On 11.07.2012 14:04, Russell Keith-Magee wrote:
>> There is still problem with API and how to do some things but in my opinion
>> it's going in right direction.
> Generally, I agree. I still have some concerns however; mostly around
> the things that you're putting onto the Meta class.
>
> related_serializer, for example -- Why is this a single attribute in
> the meta, rather than a method? By using an attribute, you're saying
> that on any given serializer, *all* related objects will be serialised
> the same, and I don't see why that should be the case.
Not *all* related objects, only those that aren't declared in the class
definition. I think the related_serializer attribute is useful when you want
to serialize all related objects in one way: to their primary key
value, to their natural key value, or to dumpdata format. If you want to
make an exception for some fields, you declare them in the class definition.


class MySerializer(ModelSerializer):
    special_object = SpecialSerializer()

    class Meta:
        related_serializer = PkSerializer

In this case all related objects except special_object will be
serialized to their pk value.

What more would you do with a related_serializer method? If you want to
serialize some related objects with one serializer and some with another, the
simplest way to do it is to declare this in the class definition.
I see only two cases where a method would be needed: if you want to pick the
serializer based on some pattern in the field name, or based on the
related object type (m2m, fk). Then you can override the
get_object_field_serializer(self, obj, field_name) method to do it.
By default this method returns related_serializer or field_serializer based
on the field type. Maybe it would be a good idea to split this method in two, one
for related objects and one for non-related ones. Then overriding it would be
very similar to setting an attribute in Meta, but I think attributes are more
"declarative".
>
> The same argument goes for class_name (which I think I've mentioned
> before), field_serializer, and so on.
And there is a method for that :)

def create_instance(self, serialized_obj):
    if self.opts.class_name is not None:
        if isinstance(self.opts.class_name, str):
            return _get_model(serialized_obj[self.opts.class_name])()
        else:
            return self.opts.class_name()
    raise base.DeserializationError(u"Can't resolve class for object creation")

Maybe it isn't the proper way to do this - there are two ways of doing the same
operation - but I think this is the simplest solution for the end user.

> The only fields that I can see
> that *should* be declarative are 'fields' and 'exclude' -- and if
> you've been tracking django-dev recently, there's been a discussion
> about whether the idea of 'exclude' should be deprecated from Django
> APIs (due to potential security issues -- explicit inclusion is safer
> than implicit inclusion, because you can accidentally forget to
> exclude sensitive data from an output list)
I have read this discussion. I'm +1 to deprecate 'exclude' :) Personally
I almost never use it.

>
> Some other API questions:
>
> Why is deserialized_value decoupled from set_object? It isn't obvious
> to me why this separation exists.
It's possible that I overcomplicated this. There are three methods:
set_object, deserialize and deserialize_value. When you want to
deserialize an object you should:
* Ensure that it is a proper object, not a list of objects or a dict (dicts in
deserialization are another problem - I present it below) - the 'deserialize'
method handles this; it recursively deserializes lists and dicts.
* Do some processing on the object you get (e.g. change a string to an int) -
the 'deserialize_value' method handles this.
* Set this object on the upper-level object - the 'set_object' method handles
this. There shouldn't be a reason to override it very often.

I think deserialize_value will be the method that users most often
need to override.
I would be willing to merge deserialize and deserialize_value, but
set_object should be left as is.
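
Roughly, the split looks like this (the method names are the ones described above; the bodies here are only illustrative defaults, not the real code):

    class Field(object):
        def deserialize(self, obj, instance, field_name):
            # handles lists recursively, then hands off to deserialize_value
            if isinstance(obj, list):
                obj = [self.deserialize_value(o, instance, field_name) for o in obj]
            else:
                obj = self.deserialize_value(obj, instance, field_name)
            self.set_object(obj, instance, field_name)

        def deserialize_value(self, obj, instance, field_name):
            # per-value processing, e.g. converting a string to an int
            return obj

        def set_object(self, obj, instance, field_name):
            # attach the deserialized value to the upper-level object
            setattr(instance, field_name, obj)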

The problem with deserializing dicts:
In the current implementation there is no way, during deserialization, to guess
whether a given dict is a serialized object or a dict of objects. So it
might be better not to deserialize dicts automatically but to leave that to
the user's decision?

>
> I see where you're going with metainfo on fields (and that's a
> reasonably elegant way of tackling the problem of XML needing
> additional info to serialize), but what is the purpose of metadata on
> Serializers?
>
> Yours, Russ Magee %-)

Because the Serializer should also have the possibility of giving additional info
to the format serializer - for example, which fields should be treated as
attributes (pk and model in dumpdata).


--
Piotr Grabowski

Piotr Grabowski

unread,
Aug 6, 2012, 7:13:25 PM8/6/12
to django-d...@googlegroups.com
Hi,

In the past 3 weeks, my project has changed a lot. First of all I
changed the output of the first phase of serialization. Previously it was python
native datatypes; at some point I added a dictionary of metadata to it, and
this metadata was used in the second phase of serialization. Now, after the first
phase, I return ObjectWithMetadata, which is a wrapper around python native
datatypes. It's a bit hackish, so I don't know if it is a good solution:

class ObjectWithMetadata(object):
    def __init__(self, obj, metadata=None, fields=None):
        self._object = obj
        self.metadata = metadata or {}
        self.fields = fields or {}

    def get_object(self):
        return self._object

    def __getattribute__(self, attr):
        if attr not in ['_object', 'metadata', 'fields', 'get_object']:
            return self._object.__getattribute__(attr)
        else:
            return object.__getattribute__(self, attr)

    # there are a few more methods like this (for acting like a
    # MutableMapping and Iterable) and all are similar
    def __getitem__(self, key):
        return self._object.__getitem__(key)

    ...

Thanks to this solution, ObjectWithMetadata acts like the object stored
in _object in almost all cases (even in isinstance tests), and there is a
place for storing additional data.

I didn't change deserialization, so its output is python native
datatypes without the wrapping. I don't know if this is good, because there
is no symmetry in this:
Django object -> python native datatypes packed in ObjectWithMetadata ->
json -> python native datatypes -> Django object


I have all dumpdata formats working now (xml, json, yaml). All tests
pass, but there is a problem with the order of fields in yaml. It will be
fixed soon.
I made a new format, new_xml, which is similar to json and yaml. It's easier
to parse.

Old:

<object pk="1" model="serializers.article">
    <field to="serializers.author" name="author" rel="ManyToOneRel">1</field>
    <field to="serializers.category" name="categories" rel="ManyToManyRel">
        <object pk="1"></object>
        <object pk="2"></object>
    </field>
</object>

New:

<object pk="1" model="serializers.article">
    <fields>
        <author to="serializers.author" rel="ManyToOneRel">1</author>
        <categories to="serializers.category" rel="ManyToManyRel">
            <object>1</object>
            <object>2</object>
        </categories>
    </fields>
</object>

There is also a problem with json and serialization to a stream, because the
json module uses extensions written in C (_json) for performance, and this
leads to exceptions when ObjectWithMetadata is used; so before passing objects
to the json encoder, they have to be unpacked from ObjectWithMetadata.


There is probably no chance of achieving one of the most important requirements
that I specified - using only one Serializer to serialize Django
models to multiple formats:

serializers.serialize('json', objects, serializer=MySerializer)
serializers.serialize('xml', objects, serializer=MySerializer)

The trouble is with xml (as always ;). In xml every (model) field must be
converted to a string before being serialized by the xml serializer. In json and
yaml, if a field has a protected type (string, int, datetime etc.), then
nothing is done with it. The conversion is done in the first phase, because only
there is field.value_to_string accessible - the field method that is used to
convert a field value to a string. It can be overridden by the user, so simply
doing smart_unicode in the second phase instead isn't enough.
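
The conflict in a nutshell (a simplified illustration, not the actual code; the text_only flag is just one possible way for the format to give that context):

    def serialize_field(field, obj, text_only):
        if text_only:
            # what an xml FormatSerializer needs
            return field.value_to_string(obj)
        # json/yaml can keep the native value
        return field.value_from_object(obj)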


Most important tasks in the TODO:
* handling natural keys
* tests
  - correctness
  - performance (I suspect my solution will be slower than the one currently
    used in Django, but by how much?)
* documentation

https://github.com/grapo/django/tree/soc2012-serialization/django/core/serializers
--
Piotr Grabowski

Piotr Grabowski

unread,
Aug 22, 2012, 8:14:26 PM8/22/12
to django-d...@googlegroups.com
Hi,

Google Summer of Code has almost ended. I was working on customizable
serialization. This project was a lot harder than I expected, and sadly,
in my opinion, I failed to do it right. I want to apologize for that, and
especially for my poor communication with this group and my mentor. I
wanted to improve it after the midterm evaluation, but it only got worse.

I don't think my project is all wrong, but there are a lot of things that
turned out differently from how I planned. How it looks (I wrote more in the
documentation):
There is a Serializer class that is made up of two classes: NativeSerializer
and FormatSerializer.
NativeSerializer serializes and deserializes python objects
from/to native python datatypes.
FormatSerializer serializes and deserializes python native
datatypes to/from some format (xml, json, yaml).

I wanted NativeSerializer to be fully independent of FormatSerializer
(and vice versa), but this isn't possible. Either NativeSerializer must
return some additional data, or FormatSerializer must give
NativeSerializer some context. For example, in xml all python native
datatypes must be serialized to strings before serializing to xml. Some
custom model fields can have a more sophisticated way of serializing to a
string than unicode(), so `field.value_to_string` must be called, and
`field` is only accessible in the NativeSerializer object. So either
NativeSerializer also returns the `field`, or FormatSerializer
informs NativeSerializer that it handles only text data.

Backwards-compatible dumpdata is almost working. Only a few tests
fail, and I am not sure why.

Nested serialization of fk and m2m related fields, which was the main
functionality of this project, is working but not well tested. There are
some issues, especially with xml; I must write a new xml format because the
old one won't work with nested serialization.

I didn't do any performance tests. Running the full test suite takes about 40
seconds more with my serialization (about 1500s in total), if I remember
correctly.

I will try to complete this project so it is at least bug-free and
usable. If someone is interested in using nested serialization there is
another great project: https://github.com/tomchristie/django-serializers

Code: https://github.com/grapo/django/tree/soc2012-serialization
Documentation: https://gist.github.com/3085250

--
Piotr Grabowski

Tom Christie

unread,
Aug 24, 2012, 6:24:02 AM8/24/12
to django-d...@googlegroups.com
Thanks Piotr,

  It's been interesting and helpful watching your progress on this project.
I wouldn't worry too much about not quite meeting all the goals you'd hoped for - it's a deceptively difficult task.
In particular it's really difficult trying to maintain full backwards compatibility with the existing fixture serialization implementation,
whilst also totally redesigning the API to support the requirements of a more flexible serialization system.
Like you say, I think the overall direction of your project is right, and personally I've found it useful for my own work watching how you've tackled various parts of it.

All the best,

  Tom

Russell Keith-Magee

unread,
Aug 24, 2012, 10:42:49 PM8/24/12
to django-d...@googlegroups.com
Hi Piotr,

Thank you so much for your efforts over the summer.

I'd also like to apologise for my lack of communication; I certainly
haven't been a model mentor over the course of the program.

Although we may not have achieved all the goals we set out to achieve
at the start of the program, I don't think it's been a complete loss
-- we've certainly thrashed out some interesting ideas, and between
your work and Tom's, I'm sure we can salvage something that the
community can make use of.

Now that the program has finished, if you have any feedback about what
we could do differently next year, I'd love to hear it. Obviously,
we'd like every SoC student to be a complete success, so if there's
anything the Django team could do to improve the chances of success
for next year's program, I'd like to be able to learn from the
mistakes of this year.

Yours,
Russ Magee %-)