Hi Vivek - sorry it's taken me so long to reply to this! My initial thoughts
are below.
> The user can define methods beginning with “meta_” to add metadata
> about each field. And functions starting with “meta2_” can be used to
> add metadata at the model level. Here is an example:
>
> ...
>
> The existing implementation of ``model.name`` and ``model.pk`` can be
> described using “meta2_” functions. These will be provided as
> ``meta2_name`` and ``meta2_pk`` to facilitate loading and dumping of
> fixtures.
I'm unclear about what meta2_ accomplishes - is it for things that are
not fields, but still serialisable? Surely there's a better way to go
about this?
> Permission Framework
> =================
> While creating an API, there may arise a need to give varying levels
> of access to data to different people. For this I propose a permission
> framework, where the user can choose to restrict data to certain
> groups while defining a model.
I'm not entirely sure this is something that should be in the same scope
as the main project - adding user permissions into a serialisation
framework feels a bit ugly, especially when it's relatively easy for
people to implement themselves (with the exclude arguments, etc.).
> • An extras argument, which would allow properties and data returned
> by some methods to be serialized.
How, exactly? What do you pass in this argument?
> -----------------------------------------------------------------
> Representing the existing serialization model
> -----------------------------------------------------------------
> Here is an implementation of the existing serialization format in
> JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
Presumably you're planning to leave the existing fixture-loading code as
it currently is, given that there's no mention of it here? Are the
customisable serialisers purely for use by other, non-Django
applications in your plan?
> ===================
> Deliverables and Timeline
> ===================
>
> I would be working for about 40-45 hours each week and I would be
> writing tests, exceptions and error messages along with development.
> This would more or less be my timeline:
>
Are you really going to be able to commit 40-45 hours a week? That's a
significant commitment, and more than many full-time jobs. In addition,
I don't see this being 400 man-hours' worth of work - not that that's a
bad thing; we'd rather it was less, as that's a lot of work to commit to.
I also haven't seen any proposals or examples of how I'd use the API as
an end user - are people going to be able to register serialisers to
models (since they're apparently tied to specific models anyway)?
How about if I just want to customise how the serialiser outputs
DateTimeFields, or tell it how to serialise my new, shiny, custom field
- does your proposal have any way to override things on a field type basis?
Those are my initial reactions on the first reading of the proposal with
your change to authentication added in - don't take any criticism too
harshly, we just have to be thorough.
Andrew
What if you need to support both? e.g.,
<field foo="the foo value">
<bar>the bar value</bar>
</field>
It seems to me that you would be better served providing a way to
annotate each individual metadata value as (and I'm bikeshedding a
name here) 'major' or 'minor'. JSON would render all metadata as
key-values, and XML can make the distinction and render minor metadata
as attributes, and major metadata as tags.
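To make the suggestion concrete, here is a minimal standard-library sketch. The names `MAJOR`, `MINOR`, `to_json` and `to_xml` are hypothetical, and metadata is assumed to be a plain dict mapping each key to a (value, kind) pair; it is not an existing Django API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical annotation: each metadata value is a (value, kind) pair,
# where kind is MAJOR or MINOR.
MAJOR, MINOR = "major", "minor"


def to_json(tag, metadata):
    """JSON has one way to express a key/value pair, so every value,
    major or minor, becomes a key in the same object."""
    return json.dumps({tag: {key: value for key, (value, _kind) in metadata.items()}})


def to_xml(tag, metadata):
    """XML can honour the distinction: minor metadata becomes attributes,
    major metadata becomes child tags."""
    element = ET.Element(tag)
    for key, (value, kind) in metadata.items():
        if kind == MINOR:
            element.set(key, str(value))
        else:
            ET.SubElement(element, key).text = str(value)
    return ET.tostring(element, encoding="unicode")


metadata = {
    "foo": ("the foo value", MINOR),
    "bar": ("the bar value", MAJOR),
}
print(to_xml("field", metadata))   # <field foo="the foo value"><bar>the bar value</bar></field>
print(to_json("field", metadata))  # {"field": {"foo": "the foo value", "bar": "the bar value"}}
```

The annotation lives with the data, and each renderer decides what (if anything) the distinction means for its format.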
I think I see where you're going here. However, I'm not sure it
captures the entire problem.
Part of the problem with the existing serializers is that they don't
account for the fact that there are actually two subproblems to
serialization:
1) How do I output the value of a specific field?
2) What is the gross structure of an object (a collection of fields,
plus metadata about the object, plus ...)?
So, for a single object the JSON serializer currently outputs:
{
    "pk": 1,
    "model": "myapp.mymodel",
    "fields": {
        "foo": "foo value",
        "bar": "bar value"
    }
}
Implicit in this format is a bunch of assumptions:
* That the primary key should be rendered in a different way to the
rest of the fields
* That I actually want to include model metadata like the model name
* That the list of fields is an embedded structure rather than a list
of top-level attributes.
* That I want to include all the fields on the model
* That I don't have any non-model or computed metadata that I want to include
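Written as code, the two subproblems separate cleanly. In this minimal sketch (the function names are hypothetical, and a `SimpleNamespace` stands in for a model instance), `render_value` answers the first question once, and each layout function is one possible answer to the second; the current fixture format is just one of them.

```python
from types import SimpleNamespace


def render_value(value):
    """Subproblem 1: output the value of a single field (identity here)."""
    return value


def fixture_layout(obj, model_label, field_names):
    """Subproblem 2, answered the way Django's fixtures do today:
    pk and model name at the top level, fields embedded under "fields"."""
    return {
        "pk": obj.pk,
        "model": model_label,
        "fields": {n: render_value(getattr(obj, n)) for n in field_names},
    }


def flat_layout(obj, field_names):
    """Subproblem 2, answered differently: pk renamed, no model name,
    fields at the top level, plus one computed value."""
    data = {"id": obj.pk}
    data.update({n: render_value(getattr(obj, n)) for n in field_names})
    data["field_count"] = len(field_names)
    return data


obj = SimpleNamespace(pk=1, foo="foo value", bar="bar value")
print(fixture_layout(obj, "myapp.mymodel", ["foo", "bar"]))
print(flat_layout(obj, ["foo"]))
```

Both layouts share the same value rendering; only the structure differs.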
When you start dealing with the XML serializer, you have all these
problems and more (because you have the attribute/tag distinction for
each of these decisions, too -- for example, I may want some fields to
be rendered as attributes, and some as tags).
When you start dealing with foreign keys and m2m, you have an
additional set of assumptions --
* How far should I traverse relations?
* Do I traverse reverse relations?
* How do I represent traversed objects? As FK values? As embedded objects?
* If they're embedded objects, how do I represent *their* traversed values?
* What happens with circular relations?
* If I have two foreign keys on the same model, are they both
serialized the same way?
And so on.
There are some promising aspects to your proposal -- for example, the
datatype conversion and field output ideas seem sound (although as
Andrew noted, they may need a little more elaboration with regards to
non-simple datatypes -- especially datetimes and Geo values).
However, I'm not sure you've fully captured the gross serialization
structure problem. This is the real driving reason for introducing a
broader serialization framework -- to give complete flexibility of the
serialization process to the end user.
> ---------------------------------------------------------
> New features in the serialize() function
> ---------------------------------------------------------
> Apart from the changes I’ve proposed for the ``fields`` argument of
> serialize, I would like to add a couple of features:
>
> • An exclude argument, which would be a list of fields to exclude from
> the model, this would also contain the fields to exclude in related
> models.
>
> • An extras argument, which would allow properties and data returned
> by some methods to be serialized.
For me, the goal should be to deprecate these sorts of arguments. The
decision to include (or exclude) a particular field is a feature of
serialization that is intimately tied to the serialization format, not
something that is an external argument.
> -----------------------------------
> Permission Framework
> -----------------------------------
I'm not sure I see the value in this bit -- at least, not as a
baked-in feature of the serialization framework. A serialization
format encompasses "what should I output"; if you've defined a
sufficiently flexible framework, it should be possible to introduce
permission checks without needing to embed them into the base
serialization framework -- they should just be a set of specific
decisions made by a specific serializer.
In fact -- this may be a good test of your proposed API: Could a third
party write a serializer that prohibited serialization of certain
attributes, or modified the serialization of certain attributes, based
on a check of Django's permissions? Personally, I don't see this as a
core requirement, but demonstrating that it is possible in principle
would be a compelling argument for your API.
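As a rough illustration of what passing that test might look like, here is a sketch assuming a hypothetical base class that exposes an `include()` hook. Only `has_perm(perm, obj=None)` mirrors a real Django method (on `User`); the stub user, the base class and the `library.view_price` permission are assumptions made so the example runs on its own.

```python
from types import SimpleNamespace


class FieldListSerializer:
    """Hypothetical base class: serializes the listed attributes of an object."""

    def __init__(self, field_names):
        self.field_names = list(field_names)

    def include(self, obj, name):
        # Extension hook; the base class includes every listed field.
        return True

    def serialize(self, obj):
        return {name: getattr(obj, name)
                for name in self.field_names if self.include(obj, name)}


class PermissionFilteredSerializer(FieldListSerializer):
    """Third-party policy built on the hook, with no changes to the base class.

    `restricted` maps a field name to the permission a user needs to see it.
    """

    def __init__(self, field_names, user, restricted):
        super().__init__(field_names)
        self.user = user
        self.restricted = restricted

    def include(self, obj, name):
        perm = self.restricted.get(name)
        return perm is None or self.user.has_perm(perm)


class StubUser:
    """Stand-in for Django's User so the sketch runs without Django."""

    def __init__(self, perms):
        self.perms = set(perms)

    def has_perm(self, perm, obj=None):
        return perm in self.perms


book = SimpleNamespace(title="Django for Dummies", price=25)
restricted = {"price": "library.view_price"}  # hypothetical permission name
staff = PermissionFilteredSerializer(["title", "price"], StubUser({"library.view_price"}), restricted)
guest = PermissionFilteredSerializer(["title", "price"], StubUser(()), restricted)
print(staff.serialize(book))  # {'title': 'Django for Dummies', 'price': 25}
print(guest.serialize(book))  # {'title': 'Django for Dummies'}
```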
> -----------------------------------------------------------------
> Representing the existing serialization model
> -----------------------------------------------------------------
> Here is an implementation of the existing serialization format in
> JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
I think these examples demonstrate what I said earlier -- your
proposed framework allows me to customize the name given to a field in
XML, but doesn't allow me to change the parent of that field within
the broader XML structure.
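To illustrate the distinction, here is a minimal standard-library sketch in which the output path of each field, not just its final tag name, is configuration. The `build_xml` function and the slash-separated path convention are hypothetical.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def build_xml(root_tag, obj, layout):
    """Build XML where each field's name *and* parent are configurable.

    `layout` maps an output path such as "details/name" to the attribute
    of `obj` that supplies its text. Intermediate elements are created on
    demand and reused if they already exist.
    """
    root = ET.Element(root_tag)
    for path, attr in layout.items():
        *parents, leaf = path.split("/")
        parent = root
        for tag in parents:
            found = parent.find(tag)  # direct child with this tag, if any
            parent = found if found is not None else ET.SubElement(parent, tag)
        ET.SubElement(parent, leaf).text = str(getattr(obj, attr))
    return ET.tostring(root, encoding="unicode")


book = SimpleNamespace(title="Django for Dummies", isbn="12345")
# Same fields, two different structures:
print(build_xml("book", book, {"title": "title", "isbn": "isbn"}))
print(build_xml("book", book, {"details/name": "title", "details/isbn": "isbn"}))
```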
> ===================
> Deliverables and Timeline
> ===================
>
> I would be working for about 40-45 hours each week and I would be
> writing tests, exceptions and error messages along with development.
> This would more or less be my timeline:
Broadly, this timeline looks like a good start. It certainly
provides enough detail to demonstrate that you've thought about your
project and its needs and dependencies.
One suggestion -- what isn't clear from this timeline is when we will
start to see concrete evidence of your progress. From a broad project
management perspective, it would be good to see some concrete
deliverables in your timeline -- e.g., at the end of week 2, it will
be possible to serialize a simple object with integer and string
attributes into a configured JSON structure; by week 4, it will be
possible to use the same structure with XML; and so on.
So -- in summary -- this is a promising start. You've clearly given
the problem some serious thought, but some more serious thought is
needed. I look forward to seeing what you can do with the next
iteration.
Yours,
Russ Magee %-)
For a suitably relaxed definition of "field". Remember, serialized
data doesn't necessarily have to come from the model -- it could come
from a related model, or be a constant, or be a computed field, or
many other options.
>> When you start dealing with foreign keys and m2m, you have an
>> additional set of assumptions --
>>
>> * How far should I traverse relations?
>
> The user can specify a limit to the levels of nesting through
> variable ``max_nesting_depth``.
A simple "nesting depth" approach won't work. You really need to
handle this on a per-model basis; Mode…
It might be possible to automate some of this with a simple nesting
depth definition, but there will always be a need to define the exact
rollout of a tree of serialization options.
This is also a case where being explicit makes your life easier. If
you stop looking at "depth" as a single number specified at the top of
the tree, it becomes a lot easier to handle recursive or…
>> * Do I traverse reverse relations?
>
> In my opinion, traversing reverse relations can get really ugly at
> times, especially when there are M2M fields, foreign keys or circular
> relations involved. But there are some scenarios where the data is in
> a relatively simpler format and serializing them would be useful. To
> support this, I thought of something like this:
>
> class Srz(Serializer):
>     ...
>     reverse_relations = [(from_model_type, to_model_type), ...]
>
> But this should be used with caution and avoided when possible.
>
>> * How do I represent traversed objects? As FK values? As embedded objects?
>
> As embedded objects, if the nesting depth limit is reached, then as FK
> values.
My point is that this is a serialization option. You're dictating a
policy here, rather than allowing it to be a configuration option.
>> * If they're embedded objects, how do I represent *their* traversed values?
>
> Their traversed values would be represented just as a normal model
> would be, with field-value mappings. The user can choose which fields
> to dump.
Again -- you're dictating a policy, not allowing the user to define one.
>> * What happens with circular relations?
>
> For all model type objects, like the base model in the query set and
> all FK and M2M fields, some uniquely identifying data (like the
> primary key, content type) will be stored in a list as each one of
> them is processed. Before serializing a model, it would be checked if
> the model is already on the list or not. If it is there, it is a
> circular reference and that model would be ignored.
>
>> * If I have two foreign keys on the same model, are they both
>> serialized the same way?
>
> Yes.
Why should this be the case? Again, you are dictating policy, not
allowing policy to be defined.
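A minimal sketch of the alternative, with the policy supplied per relation rather than fixed by the framework: two foreign keys to the same (stand-in) Person model are rendered differently. `as_pk`, `as_name`, `serialize` and the `illustrator` field are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def as_pk(related):
    """Policy: represent a related object by its primary key."""
    return related.pk


def as_name(related):
    """Policy: represent a related object by a display string."""
    return f"{related.first_name} {related.last_name}"


def serialize(obj, fields, relations):
    """Serialize plain `fields` directly, and each relation with its own policy.

    `relations` maps a foreign-key attribute name to the callable that
    decides how that particular relation is rendered.
    """
    data = {name: getattr(obj, name) for name in fields}
    for name, policy in relations.items():
        related = getattr(obj, name)
        data[name] = None if related is None else policy(related)
    return data


alice = SimpleNamespace(pk=5, first_name="Alice", last_name="Watson")
john = SimpleNamespace(pk=3, first_name="John", last_name="Smith")
book = SimpleNamespace(title="Django for Dummies", editor=alice, illustrator=john)

# Two foreign keys to the same model, serialized two different ways.
print(serialize(book, ["title"], {"editor": as_name, "illustrator": as_pk}))
# {'title': 'Django for Dummies', 'editor': 'Alice Watson', 'illustrator': 3}
```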
>> When you start dealing with the XML serializer, you have all these
>> problems and more (because you have the attribute/tag distinction for
>> each of these decisions, too -- for example, I may want some fields to
>> be rendered as attributes, and some as tags).
>>
>
> For XML, I thought of using an intermediary container for a node that
> would store all these details.
>
>> > ---------------------------------------------------------
>> > New features in the serialize() function
>> > ---------------------------------------------------------
>> > Apart from the changes I’ve proposed for the ``fields`` argument of
>> > serialize, I would like to add a couple of features:
>>
>> > • An exclude argument, which would be a list of fields to exclude from
>> > the model, this would also contain the fields to exclude in related
>> > models.
>>
>> > • An extras argument, which would allow properties and data returned
>> > by some methods to be serialized.
>>
>> For me, the goal should be to deprecate these sorts of arguments. The
>> decision to include (or exclude) a particular field is a feature of
>> serialization that is intimately tied to the serialization format, not
>> something that is an external argument.
>>
> Initially, I thought the goal was not to tie down a serializer to any
> model. In that case, I can integrate these features into the serializer class.
My point is that it should be *possible* to define a "generic"
serialization strategy -- after all, that's what Django does right
now. If arguments like this do exist, they should essentially be
arguments used to instantiate a specific serialization strategy,
rather than something baked into the serialization API.
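A sketch of that arrangement, assuming a strategy is simply any callable that turns one object into Python data: `exclude` and `extras` survive, but only as constructor arguments of one hypothetical generic strategy, while `serialize()` itself takes no such arguments. None of these names are Django's API.

```python
class FieldListStrategy:
    """Hypothetical generic strategy, close in spirit to today's behaviour.

    `exclude` and `extras` configure this one strategy instance; the
    serialize() function below knows nothing about them.
    """

    def __init__(self, field_names, exclude=(), extras=()):
        excluded = set(exclude)
        self.field_names = [n for n in field_names if n not in excluded]
        self.extras = list(extras)

    def __call__(self, obj):
        data = {name: getattr(obj, name) for name in self.field_names}
        for name in self.extras:
            # Extras may be properties or zero-argument methods.
            value = getattr(obj, name)
            data[name] = value() if callable(value) else value
        return data


def serialize(objects, strategy):
    """The serialization API only needs a strategy to apply to each object."""
    return [strategy(obj) for obj in objects]


class Book:
    def __init__(self, title, price, authors):
        self.title, self.price, self.authors = title, price, authors

    @property
    def author_count(self):
        return len(self.authors)

    def summary(self):
        return f"{self.title} ({self.author_count} authors)"


books = [Book("Django for Dummies", 25, ["John Smith", "Bob Jones"])]
strategy = FieldListStrategy(["title", "price"], exclude=["price"],
                             extras=["author_count", "summary"])
print(serialize(books, strategy))
```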
>> > -----------------------------------------------------------------
>> > Representing the existing serialization model
>> > -----------------------------------------------------------------
>> > Here is an implementation of the existing serialization format in
>> > JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
>>
>> I think these examples demonstrate what I said earlier -- your
>> proposed framework allows me to customize the name given to a field in
>> XML, but doesn't allow me to change the parent of that field within
>> the broader XML structure.
>
> I'm not sure that I follow this; it would be great if you could give
> an example. As I mentioned earlier, there will be an option to
> 'flatten' the nested models, provide alternate names to the fields,
> and wrap fields into a group. Initially I had thought of adding this
> to the external ``fields`` argument in ``serialize()``, but I can add
> them to the specification object too.
The recurring theme in my comments is that you are dictating policy,
not allowing users to define policy. We already have the former; we
need the latter.
Here's an example of the problem as I see it. Consider the
serialization of a "book" model.
Option 1: Django's current serializer:
{
    "pk": 1,
    "model": "library.book",
    "fields": {
        "authors": [3, 4],
        "editor": 5,
        "title": "Django for Dummies"
    }
}
No real surprises here.
Option 2: the format required for a particular book publishing API
{
    "book details": {
        "book title": "Django for Dummies",
        "authors": [
            "John Smith",
            "Bob Jones"
        ]
    },
    "author_count": 2,
    "editor": {
        "firstname": "Alice",
        "lastname": "Watson",
        "coworkers": [1, 5],
        "contact_phone": "98761234",
        "company": {
            "name": "Mega publishing corp",
            "founded": 2010
        }
    }
}
Notable features:
* Authors and editor are both "Person" objects, but
- We need to serialize editors in detail, including recursive calls
- Authors are serialized using their combined first+last name
* "book title" is a rename of the native model field
* "author_count" isn't on the model at all.
* "book details" doesn't reflect any aspect of model structure --
it's entirely decoration.
Now - show me how both of these serializers are defined using your
proposed API.
Yours,
Russ Magee %-)
Damn... pressed send before I finished editing my thoughts here.
A simple "nesting depth" approach won't work. You really need to be
able to handle this on a per-model basis.
It might be possible to automate some of this with a simple nesting
depth definition, but there will always be a need to define the exact
rollout of a tree of serialization options.
This is also a case where being explicit makes your life easier.
Resolving loops and cycles on a tree of arbitrary depth is hard;
resolving loops and cycles independently on each level of a
serialization structure is much easier.
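A minimal sketch of the per-level approach, with plain functions standing in for serializers (all names hypothetical): each level declares exactly what its nested level outputs, so a cycle in the data (Alice and Bob as each other's coworkers) cannot cause unbounded recursion, and going one level deeper is an explicit choice rather than a depth counter.

```python
from types import SimpleNamespace


def coworker_summary(person):
    # Innermost level: primary key only, so recursion stops here.
    return person.pk


def person_detail(person):
    return {
        "name": person.name,
        "coworkers": [coworker_summary(c) for c in person.coworkers],
    }


def person_two_levels(person):
    # One extra level, declared explicitly rather than via a depth number.
    return {
        "name": person.name,
        "coworkers": [person_detail(c) for c in person.coworkers],
    }


alice = SimpleNamespace(pk=1, name="Alice", coworkers=[])
bob = SimpleNamespace(pk=5, name="Bob", coworkers=[alice])
alice.coworkers.append(bob)  # Alice -> Bob -> Alice: a cycle in the data

print(person_detail(alice))      # {'name': 'Alice', 'coworkers': [5]}
print(person_two_levels(alice))  # {'name': 'Alice', 'coworkers': [{'name': 'Bob', 'coworkers': [1]}]}
```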
Yours,
Russ Magee %-)
So, by my count, here's a list of your implicit assumptions in this
syntactical expression:
* the fields are embedded in a substructure within the main model serialization
* ... and this is something that is sufficiently common that it
deserves first-class representation in your serializer syntax
* The pk is serialized as a top-level attribute
* ... and it's serialized using the attribute name 'pk'
* The model name is serialized as a top-level attribute
* ... and it's serialized using the attribute name 'model'
* indentation is something that has been defined as an attribute of
the serializer, rather than as a configurable item decided at
rendering time (like it is currently)
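As a minimal sketch of the alternative, the serializer can produce plain Python data and leave layout to the renderer. Here the standard library's `json.dumps` plays that role, much as Django's `serializers.serialize("json", ..., indent=...)` accepts indentation as a rendering option today; `book_structure` is a hypothetical name.

```python
import json


def book_structure(title, authors):
    # The serializer decides *what* to output, as plain Python data.
    return {"title": title, "authors": authors}


data = book_structure("Django for Dummies", ["John Smith", "Bob Jones"])

# *How* it is laid out is decided when rendering, not in the serializer.
compact = json.dumps(data)
pretty = json.dumps(data, indent=4)
print(compact)
print(pretty)
```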
There are also a bunch of implicit rules about the ways foreign keys,
m2ms and so on are rendered. These aren't so concerning, since there
needs to be some defaults if the syntax is going to be even remotely
terse. However, it bears repeating that they are implicit.
> """
> The second option.
> """
>
> class Base(JSONSerializer):
>     """
>     A base serializer with formatting options.
>     """
>     indent = 4
>
> class Editor(Base):
>     """
>     This can be used to serialize a Person model.
>     Note the company field, if no such Serializer object is specified,
>     the foreign key is returned.
>
>     By specifying the ``attrs`` argument while initializing the
>     object, one can restrict the fields
>     being serialized. Otherwise, all available fields and metadata
>     will be serialized.
>     """
>     company = Company(from_field='company', attrs=['name', 'founded'])
ok - so:
* Where is "Company" defined?
* Why aren't the attributes of Company embedded one level deep?
* Why is the attribute named company, and then an argument is passed
to the Company class with the value of 'company'?
> class Author(Base):
>     """
>     The dump_object method is called when a serializer is plugged into
>     another and returns a python object,
>     this is normally a dict but this can be overridden.
>     """
>     def dump_object(self, obj):
>         return obj.first_name + ' ' + obj.last_name
* Why is there a difference between a serializer that returns a list,
a serializer that returns a dictionary, and a serializer that returns
a string?
I'm sorry, but this seems extremely confused to me.
You have fields with names like "wrap_all" and "wrap_fields". These
are presumably fixed names that will be interpreted by the parser in a
particular way.
You have fields like "author" and "editor" in the same namespace that
correspond to model attributes.
In the same namespace, you also have functions like meta2_account --
names derived from model attributes.
You also have functions like dump_object().
So - four different types of attribute, all in the same namespace, and
all with different behaviors. What happens with name collisions (e.g.,
a model field named "aliases")? What precedence rules exist?
This proposal had a promising start, but to me, it seems to have gone
massively off the rails. At its core, serialization should be a very
simple process:
* I have an object.
* That object has attributes.
* Turn that list of attributes into a list of serialized properties
* Each of those serialized properties may be:
- a flat value
- a nested value whose structure is determined by the object itself
- a nested value whose structure is determined by a related object.
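The process described above can be sketched in a few lines of plain Python. The helper names (`serializer`, `attr`, `related`, `related_many`) are hypothetical and belong neither to Django nor to the proposal; `SimpleNamespace` objects stand in for model instances. As a check, the sketch defines both book formats from earlier in this thread.

```python
from types import SimpleNamespace


def serializer(*properties):
    """A serializer is just an ordered list of (output name, getter) pairs."""
    def serialize(obj):
        return {name: getter(obj) for name, getter in properties}
    return serialize


def attr(name):
    """Flat value: read one attribute straight off the object."""
    return lambda obj: getattr(obj, name)


def related(name, nested):
    """Nested value whose structure comes from a serializer for a related object."""
    return lambda obj: nested(getattr(obj, name))


def related_many(name, nested):
    """As related(), but for a list of related objects (e.g. an m2m)."""
    return lambda obj: [nested(item) for item in getattr(obj, name)]


def full_name(person):
    return f"{person.first_name} {person.last_name}"


# Option 1: Django's current fixture format.
fixture_book = serializer(
    ("pk", attr("pk")),
    ("model", lambda book: "library.book"),
    ("fields", serializer(  # nested value built from the book itself
        ("authors", related_many("authors", attr("pk"))),
        ("editor", related("editor", attr("pk"))),
        ("title", attr("title")),
    )),
)

# Option 2: the publishing API format.
company = serializer(("name", attr("name")), ("founded", attr("founded")))
editor = serializer(
    ("firstname", attr("first_name")),
    ("lastname", attr("last_name")),
    ("coworkers", related_many("coworkers", attr("pk"))),
    ("contact_phone", attr("phone")),
    ("company", related("company", company)),
)
api_book = serializer(
    ("book details", serializer(  # pure decoration, no model counterpart
        ("book title", attr("title")),
        ("authors", related_many("authors", full_name)),
    )),
    ("author_count", lambda book: len(book.authors)),  # not on the model
    ("editor", related("editor", editor)),
)

# Stand-in data matching the book example in this thread.
corp = SimpleNamespace(name="Mega publishing corp", founded=2010)
alice = SimpleNamespace(pk=5, first_name="Alice", last_name="Watson",
                        phone="98761234", company=corp,
                        coworkers=[SimpleNamespace(pk=1), SimpleNamespace(pk=5)])
john = SimpleNamespace(pk=3, first_name="John", last_name="Smith")
bob = SimpleNamespace(pk=4, first_name="Bob", last_name="Jones")
book = SimpleNamespace(pk=1, title="Django for Dummies",
                       authors=[john, bob], editor=alice)

print(fixture_book(book))
print(api_book(book))
```

Because each output property is an explicit (name, getter) pair, configuration keywords and model attribute names never share a namespace, so the collision question does not arise.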
And that's it. There are some minor complications in dealing with the
fact that XML has two different ways of rendering attributes, but that
should be a fairly minor extension of the "get me a list of
attributes" use case. I don't see the mapping between the examples
you've given, and this basic set of principles.
It feels to me like you're looking at a specific example that I've
given you, and every time you've found an exception, you've invented a
new keyword or access mechanism to resolve that use case. What I don't
see is a coherent whole -- a thread of similarity in the way you've
approached the problem.
The deadline for GSoC applications is rapidly approaching; while we
don't need to see an absolutely final API proposal, we do at least
need to see promise that you're moving in the right direction. If you
want to pursue this project, I suggest you take another pass at the
test case I gave you, and try to dramatically simplify your proposed