Proposal for discussion about Serialization requirements and requesting for Review
From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Thu, 26 Mar 2009 22:18:34 +0530
Local: Thurs, Mar 26 2009 12:48 pm
Subject: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hi all,
    After some discussions with Malcolm on this list, and some research
based on the pointers he gave me, I have come up with a rough plan of
what I want to do this summer for Django. Since we are running out of
time, I have written a *rough draft* of the proposal without a full
discussion with the Django community about the features that can be
implemented. So this is in no way a *Complete Proposal*, and I don't
want to submit it until some discussion has happened. Also, the
required proposal format asks for links to the devel list discussions
that led to the proposal, which I don't have apart from Malcolm's
mails. So I kindly request you all to review my proposal thoroughly
and suggest what I can add to or remove from it. Please tell me
whether my propositions and assumptions are correct, and how I can
correct them, so that I can submit my proposal to Google.

*Note:*
  Django doesn't serialize inherited Model fields in the Child Model. I
asked on IRC why this decision was taken but got no response. I
searched the devel list too, but did not find anything on it. I want
to add it to my proposal, but before doing so I wanted to know why
this decision was taken. Would it be a workable and necessary
addition to my proposal?
The same goes for Ticket #10201. Can someone please tell me why
microsecond data was dropped?

  Also, I am leaving out adding an extras option to the serializers,
since a patch for it has already been submitted (Ticket #5711) and
looks like a working solution. If you want something extra done there
before it can be committed to Django trunk, please tell me; I will
work on that a bit and add it to the proposal.

Here is my long long long proposal:

Title: Restructuring of the existing Serialization format and
improvement of the APIs

~~~~~~~~~
Abstract
~~~~~~~~~

Greetings!

   I wish to provide Django with better support for Serialization by
building upon the existing Serialization framework. This project
includes extending the format of the serialized output that the
existing Serializer produces by allowing in-depth traversal of
Relation Fields in a given Model. The project also includes extending
the existing API to specify the depth of the relations to be
serialized and the names of the related models to be serialized. The
API also provides for backwards compatibility, allowing older versions
of serialized output to work with the to-be-introduced changes. All
the changes will be made keeping 2 important things in mind:
   1. All the changes should be backwards compatible (they may break
      compatibility only when a very important requirement, one that
      improves serialization manyfold, cannot be implemented without
      backwards incompatible changes and the Django community gives
      the go-ahead for doing so).
   2. The serialized data should be useful not just within Django apps
      but also for exporting the data for external use and processing.

~~~~~~~
Why?
~~~~~~~

- The existing format of the serialized output doesn't specify the
  name of the Primary Key (PK henceforth), which is a problem for
  fields that are implicitly set as PKs (Ticket #10295).
- The existing format only specifies the PK of the related field, but
  doesn't traverse it in depth to specify its fields (Ticket #4656).
- There are no APIs for the above requirement.
- The inherited models' fields are not serialized.

Situations/problems arising from attempting to fix the above problems:
- When we allow Serialization to follow relations, it becomes
  unnatural if the related Model is included in every relating model's
  data. The data becomes extremely redundant. Consider the following
  example.

  from django.db import models

  class Poll2(models.Model):
      question = models.CharField(max_length=200)
      pub_date = models.DateTimeField('date published')

      def __unicode__(self):
          return self.question

  class Choice2(models.Model):
      poll = models.ForeignKey(Poll2)
      choice = models.CharField(max_length=200)
      votes = models.IntegerField()

      def __unicode__(self):
          return self.choice

  Serializing the Choice2 Model might look something like below if we
allow following of relations:
[
    {
        "pk": 1,
        "model": "testapp.choice2",
        "fields": {
            "votes": 1,
            "poll": [
                {
                    "pk": 1,
                    "model": "testapp.poll2",
                    "fields": {
                        "question": "What's Up?",
                        "pub_date": "2009-03-01 06:00:00"
                    }
                }
            ],
            "choice": "Django"
        }
    },
    {
        "pk": 2,
        "model": "testapp.choice2",
        "fields": {
            "votes": 2,
            "poll": [
                {
                    "pk": 1,
                    "model": "testapp.poll2",
                    "fields": {
                        "question": "What's Up?",
                        "pub_date": "2009-03-01 06:00:00"
                    }
                }
            ],
            "choice": "Python"
        }
    },
    {
        "pk": 3,
        "model": "testapp.choice2",
        "fields": {
            "votes": 4,
            "poll": [
                {
                    "pk": 1,
                    "model": "testapp.poll2",
                    "fields": {
                        "question": "What's Up?",
                        "pub_date": "2009-03-01 06:00:00"
                    }
                }
            ],
            "choice": "Others are useless"
        }
    }
]
  which clearly shows the redundant Poll data. Here we are serializing
  Choice2, of course, but serializing Poll2 doesn't give the natural
  serialized output either. In fact, serializing Poll2 doesn't give
  anything pertaining to the Choice2 Model instances. A more natural
  serialization would result from serializing a Poll2 Model instance
  that includes within itself all the Choice2 Model instances related
  to it. This is an obvious consequence of how database schemas are
  designed by applying Normalization rules.
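To make the redundancy argument concrete, here is a minimal sketch in plain Python (no Django; the dicts are hypothetical stand-ins for the Poll2 and Choice2 rows above) that builds the Poll-centric output described here: each Poll2 appears once and carries its Choice2 instances nested under a "testapp.choice2" key. The key name and the choice to drop the now-implied "poll" field from each nested choice are illustrative assumptions.

```python
import json

# Hypothetical flat rows standing in for Poll2 and Choice2 records.
polls = [{"pk": 1, "question": "What's Up?", "pub_date": "2009-03-01 06:00:00"}]
choices = [
    {"pk": 1, "poll": 1, "choice": "Django", "votes": 1},
    {"pk": 2, "poll": 1, "choice": "Python", "votes": 2},
    {"pk": 3, "poll": 1, "choice": "Others are useless", "votes": 4},
]


def nest_choices_under_polls(polls, choices):
    """Serialize each poll once, with its choices nested inside it."""
    by_poll = {}
    for c in choices:
        # The "poll" FK is implied by the nesting, so it is not repeated.
        by_poll.setdefault(c["poll"], []).append({
            "pk": c["pk"],
            "model": "testapp.choice2",
            "fields": {"choice": c["choice"], "votes": c["votes"]},
        })
    return [
        {
            "pk": p["pk"],
            "model": "testapp.poll2",
            "fields": {"question": p["question"], "pub_date": p["pub_date"]},
            "testapp.choice2": by_poll.get(p["pk"], []),
        }
        for p in polls
    ]


output = nest_choices_under_polls(polls, choices)
print(json.dumps(output, indent=4))
```

The poll's question and date now appear once instead of once per choice.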

- The way loaddata and dumpdata are handled will change. The new
  versions of loaddata and dumpdata may not be compatible with
  fixtures generated by older versions.

Most of the above problems have been addressed in the tickets
specified, but the patches need to be dealt with more thoroughly after
discussing them with the Django community in general. So design
decisions need to be taken for fixing most of the tickets (which I
will do in the community bonding phase).

~~~~~~~
How?
~~~~~~~

  The project begins with implementing a version-id field in the
serialized output. This field is provided for backwards compatibility.
It then proceeds by converting the existing PK field, which appears as
{
    "pk": 1,
    "model": "testapp.choice2",
    #...
to serialize the name of the PK field. I propose it to be presented as:
{
    "pk": {
        "id": 1
    },
    "model": "testapp.choice2",
    #...
  This change is being proposed keeping in mind that David Cramer's
patches for Ticket #373 will get into Django trunk sooner or later,
since it is a long-standing requirement. This representation allows
multiple PK fields to exist in the model and be serialized correctly.
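A rough sketch of the proposed PK change, assuming records are plain dicts in the current format and that the PK field is called "id" (both assumptions): for single-column keys, converting the old "pk": 1 into the named form and back is a small, lossless transformation. The helper names name_pk and unname_pk are hypothetical.

```python
def name_pk(record, pk_name="id"):
    """Old style {"pk": 1, ...} -> proposed style {"pk": {"id": 1}, ...}."""
    new = dict(record)  # shallow copy; the input record is left untouched
    new["pk"] = {pk_name: record["pk"]}
    return new


def unname_pk(record):
    """Proposed style -> old style, for single-column keys only."""
    new = dict(record)
    pk = record["pk"]
    if isinstance(pk, dict):
        if len(pk) != 1:
            raise ValueError("a multi-column key has no old-style equivalent")
        (new["pk"],) = pk.values()
    return new


old = {"pk": 1, "model": "testapp.choice2", "fields": {"votes": 1}}
print(name_pk(old))
```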

  The corresponding changes in the deserializers to process this data
will also be made at this stage. The implementation touches the
following parts of Django:
django.core.serializers.python.Serializer.end_object()
django.core.serializers.xml_serializer.start_serialization() [It
already implements a version.]
and related methods and files.

  The project proceeds by splitting the serializer into 2 versions to
handle the older version and the current version of the serialized
output. The decision as to which version of the serializer to use will
be taken by adding an "old_version=True" API parameter to the
serialize method. The deserialize method, however, can decide this by
looking at the new version-id. Options for the django-admin.py
loaddata and dumpdata commands will also be provided with
--old_version.
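The version-id idea could look roughly like the following sketch. Where the version-id lives is not specified above, so this assumes (hypothetically) that new-style output wraps the objects as {"version": "2", "objects": [...]} while old-style output stays a bare list with no version. The deserialize() here is a stand-alone illustration, not Django's serializers.deserialize().

```python
import json


def _from_v1(obj):
    # Old format: "pk" is a bare value.
    return obj["model"], obj["pk"], obj["fields"]


def _from_v2(obj):
    # Assumed new format: "pk" names the key, e.g. {"id": 1}.
    (pk,) = obj["pk"].values()
    return obj["model"], pk, obj["fields"]


def deserialize(text):
    """Pick a parser based on the (hypothetical) version-id field."""
    data = json.loads(text)
    if isinstance(data, list):  # no version-id: treat as the old format
        return [_from_v1(obj) for obj in data]
    version = data.get("version")
    if version == "2":
        return [_from_v2(obj) for obj in data["objects"]]
    raise ValueError("unknown serialization version: %r" % (version,))


old_text = '[{"pk": 1, "model": "testapp.poll2", "fields": {}}]'
new_text = ('{"version": "2", "objects": '
            '[{"pk": {"id": 1}, "model": "testapp.poll2", "fields": {}}]}')
print(deserialize(old_text), deserialize(new_text))
```

Because old fixtures carry no version-id at all, they keep loading unchanged, which is what makes the version field useful for backwards compatibility.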

  The second phase, the biggest one, starts by implementing in-depth
serialization of relations. The APIs will be implemented hand-in-hand
with the features. An API to specify which relations to serialize will
be provided with a "relations=(rel1, rel2, ...)" parameter to
serialize. A parameter "relation_depth=(N1, N2, ...)" will also be
provided to serialize the related models recursively up to the
specified depth N. Skipping "relations=" means serializing all the
related models in a given model, and skipping "relation_depth=" means
serializing to full depth. Skipping both serializes just the PK of the
related models (old style). Further selection of the fields to be
serialized in the individual related models is provided with a
DjangoFullSerializers-like syntax, using dictionaries. An exclude
fields option will be given, similar to DjangoFullSerializers.
Link to DjangoFullSerializers:
http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers
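A minimal sketch of how "relations=" and "relation_depth=" could interact, under simplifying assumptions: Obj is a hypothetical stand-in for a model instance, relation_depth is a single integer rather than the per-relation tuple (N1, N2, ...) proposed above, relation names are matched at every level, and the object graph has no cycles (full-depth traversal of a cyclic graph would not terminate).

```python
class Obj:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, model, pk, fields, relations=None):
        self.model = model
        self.pk = pk
        self.fields = fields
        self.relations = relations or {}  # relation name -> related Obj


def serialize(obj, relations=None, relation_depth=None, _level=0):
    # Neither option given: keep the old behaviour (related PKs only).
    follow = relations is not None or relation_depth is not None
    fields = dict(obj.fields)
    for name, related in obj.relations.items():
        wanted = relations is None or name in relations
        within_depth = relation_depth is None or _level < relation_depth
        if follow and wanted and within_depth:
            fields[name] = serialize(related, relations, relation_depth, _level + 1)
        else:
            fields[name] = related.pk
    return {"pk": obj.pk, "model": obj.model, "fields": fields}


city = Obj("testapp.city", 7, {"name": "Pune"})
author = Obj("testapp.author", 3, {"name": "Ann"}, {"city": city})
book = Obj("testapp.book", 1, {"title": "Django Tips"}, {"author": author})
print(serialize(book, relation_depth=1))
```

Calling serialize(book) with neither argument keeps the old PK-only output, which matches the backwards-compatibility goal stated in the abstract.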

  This phase proceeds by providing the optional API parameter
"reverse_relation=[rel1, rel2]" on a Related Model (Poll2 in the
example), rather than on the Model that relates to it (Choice2). This
does a reverse relation lookup, and for each Related Model instance it
serializes all the reverse relations pointing to that instance, which
solves the data redundancy problem described above. The output looks
something like below if serialized as:
serializers.serialize('json', Poll2.objects.all(),
reverse_relation=('choice2',))
[
    {
        "pk": 1,
        "model": "testapp.poll2",
        "fields": {
            "question": "What's Up?",
            "pub_date": "2009-03-01 06:00:00"
        },
        "testapp.choice2": [
            {
                "pk": 2,
                "model": "testapp.choice2",
                "fields": {
...

From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Fri, 27 Mar 2009 00:23:58 +0530
Local: Thurs, Mar 26 2009 2:53 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hi all,

What a blunder :( I submitted my proposal the way I will have to
submit it to socghop.appspot.com, with lines manually wrapped at 80
chars per line, and the group wraps it at 75 chars, making my proposal
look as ugly as possible. I did not realize that it was 75 chars here.
Please excuse me, and tell me if my proposal is unreadable; I will
resubmit it with lines wrapped at 70 chars or so.

--
Thanks and regards,
 Madhusudan.C.S

Blogs at: www.madhusudancs.info
Official Email ID: madhusu...@madhusudancs.info


 
From: Malcolm Tredinnick <malc...@pointy-stick.com>
Date: Fri, 27 Mar 2009 14:33:11 +1100
Local: Thurs, Mar 26 2009 11:33 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Most likely because it will lead to duplicate data when you dump the
models for a particular app. Often the parent and child are in the same
application and you'll see the data from the parent in two places.
*Very* fiddly to untangle. It might be possible to add an option so that
parent data is optionally dumped when you dump a specific model (as
opposed to the whole app).

>  Will it be a workable and necessary solution to add that to my
> proposal?
> Same is the case for Ticket #10201. Can someone please tell me why
> microsecond data was dropped?

Quite probably because MySQL (or possibly just the MySQLdb wrapper)
sucks and can't support microsecond data when it's in a datetime value.
So reinserting that data requires yet another place where we have to set
microseconds to 0. It's one of those cases where we've adopted the
lowest common denominator.

Regards,
Malcolm


 
From: Malcolm Tredinnick <malc...@pointy-stick.com>
Date: Fri, 27 Mar 2009 14:35:11 +1100
Local: Thurs, Mar 26 2009 11:35 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

On Fri, 2009-03-27 at 00:23 +0530, Madhusudan C.S wrote:
> Hi all,

> What a blunder :( I submitted my proposal the way I will
> have to submit to socghop.appspot.com with lines manually wrapped
> at 80 chars per line and the groups wrapp it at 75 chars making
> my proposal look as ugly as possible. Did not realize that it was
> 75 chars here. Please excuse me, tell me if my proposal is
> unreadable I will resubmit it with lines wrapped at 70 chars
> or so.

Why manually wrap it at all? Email clients have been able to handle
wrapping lines sensibly on behalf of the sender for about 20 years now.
Just type normally and only hit Return between paragraphs.

Regards,
Malcolm


 
From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Sat, 28 Mar 2009 00:12:49 +0530
Local: Fri, Mar 27 2009 2:42 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hi Malcolm,

On Fri, Mar 27, 2009 at 9:05 AM, Malcolm Tredinnick <

  Right. I get it now. Won't make that blunder again :( Some of my
friends who participated in previous years of GSoC had told me to
manually wrap the text, since they felt it would look ugly after
submission to Google's app if it were not wrapped, with small
paragraphs appearing as single huge lines, and also since wrapping
gives a neatly presented look :(

--
Thanks and regards,
 Madhusudan.C.S

Blogs at: www.madhusudancs.info
Official Email ID: madhusu...@madhusudancs.info


 
From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Sat, 28 Mar 2009 00:42:18 +0530
Local: Fri, Mar 27 2009 3:12 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hi Malcolm,
        Thanks for the response. I am looking forward to hearing reviews
from other prospective mentors too :(

On Fri, Mar 27, 2009 at 9:03 AM, Malcolm Tredinnick <

I personally think this may be required in some situations, but I
agree that those situations may be rare, and it would be better to
require it explicitly by passing a parameter to the serializer. I am
still wondering why no one has opened a ticket on this. Do you think it
would be a good idea to add it to the proposal? (If so, I will also
open a ticket on the issue and do some preliminary work on how it may
be implemented.)

> >  Will it be a workable and necessary solution to add that to my
> > proposal?
> > Same is the case for Ticket #10201. Can someone please tell me why
> > microsecond data was dropped?

> Quite probably because MySQL (or possibly just the MySQLdb wrapper)
> sucks and can't support microsecond data when it's in a datetime value.
> So reinserting that data requires yet another place where we have to set
> microseconds to 0. It's one of those cases where we've adopted the
> lowest common denominator.

But do you think you may add it sooner or later, since it is one of
those tickets lying there? Maybe you haven't done it now because you
have better things to concentrate on ATM? I feel it may be unfair to
the other backends that support microsecond info.

--
Thanks and regards,
 Madhusudan.C.S

Blogs at: www.madhusudancs.info
Official Email ID: madhusu...@madhusudancs.info


 
From: Russell Keith-Magee <freakboy3...@gmail.com>
Date: Sat, 28 Mar 2009 15:47:09 +0900
Local: Sat, Mar 28 2009 2:47 am
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

On Fri, Mar 27, 2009 at 1:48 AM, Madhusudan C.S <madhusuda...@gmail.com> wrote:
> Hi all,
> *Note: *
>   Django doesn't serialize inherited Model fields in the Child Model. I
> asked
> on IRC why this decision was taken but got no response. I searched the
> devel list too, but did not get anything on it. I want to add it to my
> proposal, but before doing it I wanted to know why this decision was
> taken. Will it be a workable and necessary solution to add that to my
> proposal?

Malcolm has already addressed this, and his analysis is pretty much
spot on. I would only add that the current behaviour can also be
explained by looking at the heritage of the fixture system.
Historically, Django's fixtures have been used as a way of serializing
output for transfer between two Django installations (for example, as
test fixtures). To this end, the serializers have concentrated on
replicating a very database-like structure - that is, the structures
that are serialized closely match the underlying database structures.
In an inheritance situation, child tables don't contain all the data
from the parent table; hence, neither do the serialized structures.

Obviously, this focus on representing the database misses an obvious
alternate use case - occasions where serialization is required to
communicate to some other data consumer, such as an AJAX framework. In
my 'big picture' of the ideal serialization SoC project, this is the
problem that needs to be fixed. More on this in later comments.

> Same is the case for Ticket #10201. Can someone please tell me why
> microsecond data was dropped?

Again, Malcolm is on the money. If you can come up with a fix that
enables non-millisecond deprived databases to maintain microseconds,
I'm sure it would be a welcome inclusion. Thinking about it, this
shouldn't actually be that hard to achieve.

>   Also I am leaving adding extras option to serializers since a patch for it
> has already been submitted(Ticket #5711) and looks like a working
> solution. If you all want something extra to be done there to
> commit it to django trunk, please tell me, I will work on that a bit
> and add it to the proposal.

If you are intending to take on "updating the serializers" as a SoC
project, I would encourage you to include #5711 as part of your
proposal. There may be a patch on #5711, and it may be the right
solution, but the patch isn't even close to being ready for trunk -
for one thing, there are no tests or documentation. Finishing the work
on this ticket would be a very worthwhile contribution.

> Here is my long long long proposal:
...
> ~~~~~~~
> Why?
> ~~~~~~~

> - The existing format of the serialized output firstly doesn't specify the
> name of the
>   Primary Key(PK henceforth), which is a problem for fields which are
> implicitly set
>   as PKs (Ticket #10295).

This ticket is a very small part of a bigger problem. The fact that
the primary key isn't named in the serialization format is of no
consequence to the 'database replication' role for serializers,
evidenced by the extensive test suite that demonstrates round trip
fixture loading. It is only significant when you need to support some
alternate data consumer that needs to know the name of the primary
key.

The bigger issue is that we need to be able to easily reconfigure the
output format of serializers to suit the specific requirements of
other data consumers.

> - The existing format only specifies the PK of the related field, but
> doesn't traverse it
>   in depth to specify its fields (Ticket #4656).
> - There are no APIs for the above said requirement.
> - The inherited models fields are not serialized.

Again, these are just variations on the same theme. The real problem
is being able to easily reconfigure the output format.

> Situations/problems arising from attempting to fix the above problems
> - When we allow Serialization to follow relations, it becomes unnatural if
>   the related Model is included in every relating model data. The data
>   becomes extremely redundant. Consider the following example.

It may be redundant, but it may also be required, depending on
circumstance. This is something that needs to be left in the hands of
the end-user.

> - The way loaddata and dumpdata are handled is changed. The new version of
> this
>   loaddata and dumpdata may not be compatible with the fixtures generated
> from
>   older versions.

The current serialization format is well known, well understood, and
well suited to the task it was designed to perform. As a result, I
would expect that this format would remain as the 'default' format for
Django fixtures, and be entirely backwards compatible without extra
options/flags.

However, there is an obvious need for alternate formats. These formats
may be dramatically different from the current serialization formats,
and certain output formats may not contain enough data to be used for
later loading - for example, consider the case where you want an AJAX
response that contains a list of (author_name, book_title) tuples.
This structure may be useful to your AJAX application, but won't be
useful for recreating a list of Author and Book records in your
database.
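As a sketch of the kind of one-way format described here, the snippet below produces the (author_name, book_title) list from hypothetical in-memory Author and Book data. The output carries no primary keys or model labels, which is exactly why it could not be fed back through loaddata.

```python
import json

# Hypothetical records; in Django these would come from querysets.
authors = {1: {"name": "Jane Austen"}, 2: {"name": "Mark Twain"}}
books = [
    {"pk": 10, "title": "Emma", "author": 1},
    {"pk": 11, "title": "Persuasion", "author": 1},
    {"pk": 12, "title": "Roughing It", "author": 2},
]


def author_title_pairs(books, authors):
    """Consumer-specific output: PKs are discarded, so it is not round-trip."""
    return [(authors[b["author"]]["name"], b["title"]) for b in books]


payload = json.dumps(author_title_pairs(books, authors))
print(payload)
```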

My point is that a serialization format doesn't necessarily have to be
'round-trip'. The existing default format is, and needs to remain that
way. However, the corollary of this is that 'new format' serializers
don't necessarily need to be made available to loaddata/dumpdata.

This isn't a problem you need to worry about. If/when #373 lands, the
default serialization format will also have to change. We don't need
to pre-emptively change it, and the existing serialization format
would be entirely compatible with a world where multiple primary key
models exist. Remember - in the 'database serialization' case, we can
introspect model definitions to see when multiple primary keys exist,
so on deserialization, we will know when "pk" is a single value and
when it is a list/dict/whatever format eventuates to support multiple
primary keys.
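A sketch of the introspection point, assuming a hypothetical registry that maps each model label to its primary-key field names (in Django this information would come from the model definitions) and assuming, purely for illustration, that a composite key would be serialized as a list in field order. Since #373 has not landed, the real composite format is unknown; the list stands in for "list/dict/whatever".

```python
# Hypothetical registry: model label -> primary-key field names.
PK_FIELDS = {
    "testapp.poll2": ("id",),
    "testapp.membership": ("person", "group"),  # imagined composite key
}


def read_pk(model_label, raw_pk):
    """Interpret the "pk" value according to the model's definition."""
    names = PK_FIELDS[model_label]
    if len(names) == 1:
        return {names[0]: raw_pk}  # today's format: "pk": 1
    # Assumed composite format: a list of values in field order.
    if len(raw_pk) != len(names):
        raise ValueError("expected %d key values for %s" % (len(names), model_label))
    return dict(zip(names, raw_pk))


print(read_pk("testapp.poll2", 1), read_pk("testapp.membership", [4, 9]))
```

The existing single-key fixtures need no change: the same "pk" value is read differently only when the model says it has more than one key field.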

I can see how your proposal addresses #10295, but as I said earlier,
that's a very small scope version of a larger problem. I would suggest
addressing your efforts at fixing the bigger problem, rather than a
cosmetic approach to #10295 that is backwards incompatible with all
existing fixtures.

>   The second phase, the biggest phase, starts by implementing serializing of
> relations in
> depth. The APIs will be implemented for these things hand-in-hand as the
> features are
> being implemented. An API to specify, what relations to serialize, will be
> provided with
> "relations=(rel1, rel2, ...)" parameter to serialize. Also a parameter to
> specify
> "relation_depth=(N1, N2, ...)" will be provided to serialize the related
> models recursively

There are two ways to interpret ticket #4656, and you have picked my
least favourite of the two. :-)

Option 1 (my preferred interpretation) is to look at this as a
'gathering dependencies' interface to the existing serializers.

For example, you pass a Book object to the serializer. That book
contains a reference to an Author. That author contains a reference to
a City.

In the current serializers, your output fixture only contains the Book
object. This may be a useful fixture, but it has referential integrity
problems - the Book contains a FK reference to a non-existent author.
It would also be nice to be able to say "and also serialize all the
other objects that are required in order to reproduce this full
object" - that is, by serializing the Book, you automatically get all
the related Author and City records. At the moment, the only way to do
this is to dump the entire Book, Author and City tables, and prune out
any data you don't want.

Adding a 'select related' option to the existing serializers would
make it much easier to generate fixtures, or dump parts of a database,
and it requires no changes to the output format at all.
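Option 1 can be sketched without touching the output format at all: walk the foreign keys from the objects you were given, collect everything reachable, and emit each object once, referenced objects first so the fixture loads cleanly in order. Record and collect_with_dependencies() are hypothetical names, and real models would have their FK fields discovered by introspection rather than stored in a dict.

```python
class Record:
    """Hypothetical stand-in for a model instance with forward FKs."""

    def __init__(self, model, pk, fields, fks=None):
        self.model = model
        self.pk = pk
        self.fields = fields
        self.fks = fks or {}  # FK field name -> referenced Record


def collect_with_dependencies(roots):
    """Return roots plus everything they reference, in the existing flat
    format, each object once, referenced objects before referrers."""
    seen = set()
    ordered = []

    def visit(rec):
        key = (rec.model, rec.pk)
        if key in seen:
            return
        seen.add(key)
        for target in rec.fks.values():
            visit(target)  # emit dependencies first
        fields = dict(rec.fields)
        fields.update({name: target.pk for name, target in rec.fks.items()})
        ordered.append({"pk": rec.pk, "model": rec.model, "fields": fields})

    for root in roots:
        visit(root)
    return ordered


city = Record("testapp.city", 7, {"name": "Pune"})
author = Record("testapp.author", 3, {"name": "Ann"}, {"city": city})
book1 = Record("testapp.book", 1, {"title": "Django Tips"}, {"author": author})
book2 = Record("testapp.book", 2, {"title": "More Tips"}, {"author": author})
fixture = collect_with_dependencies([book1, book2])
print(fixture)
```

The seen set guarantees termination even with cyclic references, though a cycle cannot be ordered so that every target precedes its referrer.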

Option 2 (your interpretation), is to allow for inline serialization
of related models. All my previous arguments about output format
apply, along with all your arguments about redundant encoding of data.
If you solve the bigger problem of allowing flexible output formats,
then the need to hard-code embedded child model data goes away.

>   The project is planned to be completed in 9 phases.
...
>   2. Finalizing Design and Coding Phase I (May 22nd – May 31st)
>   3. Testing Phase I (June 1st – June 5th)

As a prior warning - I'm very skeptical of anyone that proposes a
"test" phase that isn't integrated with the "build" phase. If you're
not testing at the same time you are building, then how do you know you
have the right result? If you test after you build, what happens when
your test reveals a problem with your implementation?

I know line items like this make accountant types happy, but it just
doesn't wash with me. If your implementation, including tests, will
take 3 weeks, then say three weeks. Don't say 2 weeks implementation
followed by a 1 week test.

...

From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Sun, 29 Mar 2009 22:21:40 +0530
Local: Sun, Mar 29 2009 12:51 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hi Russell,
   I am extremely thankful to you for spending your invaluable time on
doing a review (err... should I say post-mortem? ;-) ) of my complete
proposal. I had kept my fingers crossed for someone who knew about the
technical aspects of it to do it since most of my friends did only a
language review (some of them even gave up seeing the length :( ). I am also
equally thankful to Malcolm for it.

After a lot of thinking, reviewing and studying how other serializers, apart
from Django serializers, in different languages and frameworks such as PHP,
Python(pickle), Java, Turbogears(TurboJSON) and Boost work, the whole of
yesterday, I have come up with some ideas which mostly depart from what I
have proposed earlier. From the top view I still propose to solve the same
problems I suggested in my initial proposal along with considering the
bigger problems you suggested. Again this is a very rough draft of my ideas
and requires a lot of refining by discussing with you and rest of the
community.

Thanks to ideas on the Wiki. Reference to ModelAdmin there gave me some
ideas to think further. Though this is not a copy, I have borrowed some
ideas from other serializers I studied yesterday. Also I have ensured as far
as possible that this doesn't break the existing Serializer and fixtures in
any way, but only adds on to it. Please point out if I have gone against
this somewhere.

> The bigger issue is that we need to be able to easily
> reconfigure the output format of serializers to suit the
> specific requirements of other data consumers.

The idea that I propose below is mostly to tackle this bigger issue which
you pointed out throughout.

Let us consider the same 2 models as before:

class Poll2(models.Model):
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice2(models.Model):
    poll = models.ForeignKey(Poll2)
    choice = models.CharField(max_length=200)
    votes = models.IntegerField()

The user will now be able to construct a class along the lines of
ModelAdmin for specifying custom serialization formats. I propose the
API based on the following ideas.
The user will be given an option to define a Serializer class that
inherits from the framework's serializer classes: Base, XML, Python,
YAML and JSON. For the moment, to avoid confusion, let me call the new
Serializer newserializer (but this is only tentative; whether we must
rename the framework or just the classes can be decided later). From
what I understand, Python data mainly consists of basic single-value
datatypes or data structures like List, Tuple and Dictionary. Most
other complex data types/structures are derived from these types and
are thus represented with those notations.

So our base class defines a set of class attributes that define the
notation for these structures, which is the same as the Python
notation. For example, ListSeparators will be a 3-tuple containing the
enclosing notations and the List item separator ('[', ']', ',').
Similarly, Dictionary Separators is a 4-tuple ('{', '}', ',', ':'),
where the last item is the key:value separator. More specialized cases
will be defined for the YAML and JSON classes. We can use this approach
for XML too. In that case we can pass a tuple of strings in this
format:
list_separator = ('<list-name>', '</list-name>', '<>list-value</>')
dict_separator = ('<dict-name>', '</dict-name>', '<dict-key=dict-value></>')
It is important to note here that list-name, dict-name, list-value,
dict-value and dict-key are all indicative names that are part of the
API (a better naming convention will be developed); they are not
placeholders for some other value. That is, those are the names that
must always be used consistently, as will be evident from the examples
below.

The user can now inherit from one of these classes in his app,
depending on his requirements, and override these class attributes to
get the format he wants. The API roughly looks like this for
serializing the Poll class in a format similar to JSON notation:

class PollSerializer(newserializer.JSONSerializer):
    list_separator = ('{%', '%}', ':')
    dict_separator = ('{{', '}}', ':', '|')
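To make the separator tuples concrete, here is a minimal sketch of how a serializer might render nested lists and dictionaries from them. The function `render` and the two constants are hypothetical names, and quoting scalar values with `json.dumps` is an assumption; the proposal does not say how scalars are written.

```python
import json

# Default separator tuples mirroring Python notation (hypothetical constants).
LIST_SEPARATOR = ('[', ']', ',')        # (open, close, item separator)
DICT_SEPARATOR = ('{', '}', ',', ':')   # (open, close, item separator, key:value separator)


def render(value, list_sep=LIST_SEPARATOR, dict_sep=DICT_SEPARATOR):
    """Recursively render lists/tuples and dicts using the given separator tuples."""
    if isinstance(value, (list, tuple)):
        open_, close, sep = list_sep
        return open_ + sep.join(render(v, list_sep, dict_sep) for v in value) + close
    if isinstance(value, dict):
        open_, close, sep, key_sep = dict_sep
        items = (render(k, list_sep, dict_sep) + key_sep + render(v, list_sep, dict_sep)
                 for k, v in value.items())
        return open_ + sep.join(items) + close
    # Scalars: JSON quoting is an assumption; the proposal leaves this open.
    return json.dumps(value)


# Using the separators from the PollSerializer example.
print(render({"question": "What's Up?", "ids": [1, 2]},
             ('{%', '%}', ':'), ('{{', '}}', ':', '|')))
```

With the PollSerializer separators, that dictionary comes out as `{{"question"|"What's Up?":"ids"|{%1:2%}}}`.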

In addition, the user can specify the fields to be selected by overriding
a class attribute, fields. This attribute is a tuple of strings where each
item is the name of a field to be serialized. The above class can now be
written as follows:

class PollSerializer(newserializer.JSONSerializer):
    list_separator = ('{%', '%}', ':')
    dict_separator = ('{{', '}}', ':', '|')
    fields = ('question', 'pub_date')

Additionally, a class attribute named exclude_fields, also a tuple of
strings, is added, which is simply the complement of the fields attribute
(thanks to DjangoFullSerializers for this idea).
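How `fields` and `exclude_fields` might combine can be sketched as a plain function over a dictionary of field values. Representing the model instance as a dict and the name `select_fields` are assumptions made to keep the sketch self-contained; the proposal does not say what happens when both attributes are set, so here exclusion is simply applied after selection.

```python
def select_fields(field_values, fields=None, exclude_fields=None):
    """Return the field values restricted to `fields` (if given), minus `exclude_fields`."""
    if fields is not None:
        # Keep the order given in `fields`; unknown names are skipped (an assumption).
        selected = {name: field_values[name] for name in fields if name in field_values}
    else:
        selected = dict(field_values)
    for name in exclude_fields or ():
        selected.pop(name, None)
    return selected


poll = {"question": "What's Up?", "pub_date": "2009-03-01 06:00:00", "creator": "admin"}
print(select_fields(poll, fields=("question", "pub_date")))
print(select_fields(poll, exclude_fields=("creator",)))
```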

To solve ticket #5711, I propose a method extra_fields() which returns a
dictionary. It must return a dictionary instead of a tuple because most of
the time the extra fields are computed/derived fields, which need a name as
well as a value. Example below:

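import datetime

class PollSerializer(newserializer.JSONSerializer):
    #...
    def extra_fields(self):
        # Assumes the fields of the Poll being serialized are reachable
        # through self (how this is exposed is not settled yet).
        pub_date_recent = self.pub_date > datetime.datetime(2009, 3, 15)
        return {'is_recent': pub_date_recent}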

One can also specify how a primary key is serialized by overriding the
method pk_serialize(), which returns a dictionary. This should address
ticket #102. Example below:

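class PollSerializer(newserializer.JSONSerializer):
    #...
    def pk_serialize(self):
        # pk_value is the proposed API name standing for the object's
        # primary key value (how it is made available is not settled).
        return {'pk': pk_value, 'pk name': 'id'}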

The dictionary can contain any number of items, but pk_value must be used at
least once so that the PK value is serialized somewhere. I am still unsure
whether I should make this a method or an attribute. Could someone kindly
give suggestions?

The serialized output after overriding the pk_serialize() method looks
something like this:
 {
     "pk": 1,
     "pk name": "id",
     "model": "testapp.poll2",
     "fields": {
         "pub_date": "2009-03-01 06:00:00",
         "question": "What's Up?"
     }
 }

An additional model_extras() method can be overridden, which by default
returns nothing in the parent classes. In the overriding method of the
derived class, it can return a dictionary of values which are added to the
model's serialized data. An example of this is the version number of the
serialized format. API example:

class PollSerializer(newserializer.JSONSerializer):
   #...
   def model_extras(self):
       return {'version': '2.1'}
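One way the pk_serialize(), extra_fields() and model_extras() hooks could combine into a record like the JSON shown earlier is sketched below. It is only a sketch under stated assumptions: the class names are hypothetical, model instances are plain dicts, the hooks receive the object explicitly because the proposal has not yet settled how they reach it, and model-level extras are placed at the top level of the record.

```python
import json


class BaseSerializerSketch:
    """Hypothetical stand-in for the proposed newserializer base class."""

    model_label = None

    def pk_serialize(self, obj):
        # Default: mirror the "pk" key used by Django's existing serializers.
        return {"pk": obj["id"]}

    def extra_fields(self, obj):
        # Default: no computed fields.
        return {}

    def model_extras(self):
        # Default: nothing extra at the model level.
        return {}

    def build(self, obj):
        """Assemble one serialized record from the overridable hooks."""
        record = dict(self.pk_serialize(obj))
        record["model"] = self.model_label
        fields = {name: value for name, value in obj.items() if name != "id"}
        fields.update(self.extra_fields(obj))
        record["fields"] = fields
        # Assumption: model-level extras sit at the top level of the record.
        record.update(self.model_extras())
        return record


class PollSerializerSketch(BaseSerializerSketch):
    model_label = "testapp.poll2"

    def pk_serialize(self, obj):
        return {"pk": obj["id"], "pk name": "id"}

    def extra_fields(self, obj):
        # ISO-format timestamp strings of equal length compare chronologically.
        return {"is_recent": obj["pub_date"] > "2009-03-15 00:00:00"}

    def model_extras(self):
        return {"version": "2.1"}


if __name__ == "__main__":
    poll = {"id": 1, "question": "What's Up?", "pub_date": "2009-03-01 06:00:00"}
    print(json.dumps(PollSerializerSketch().build(poll), indent=4))
```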

Finally, coming to the big thing, ticket #4656: I propose three class
attributes for this. The first is select_related (as per your suggestion),
which is a dictionary. Each key of the dictionary is the name of a relation
attribute and each value is a dictionary. This inner dictionary can have the
keys 'fields' or 'exclude_fields', whose values are tuples of strings
naming the fields of that model to be selected or excluded. If the inner
dictionary is empty, the entire related model is serialized, using its own
serializer class like this one if one is defined, or else the existing
serializers.

Example:
class ChoiceSerializer(newserializer.JSONSerializer):
    #...
    # Note the trailing comma: ('question') would just be a string.
    select_related = {'poll': {'fields': ('question',)}}

NOTE: I am not very sure I can implement this within the SoC timeline, but I
will include it in the API proposal; if I run out of time I will continue
with it after GSoC, and if time permits I will implement it during GSoC too.
The value of the 'fields' key in the above dictionary is a tuple of strings,
which means that with strings alone I cannot follow a further relation from
that model. So I wish to also allow dictionaries in this tuple along with
the strings. Such a dictionary is again a nested select_related-style
dictionary, which can follow a relation on that related model, and so on.
For the Book, Author, City example you gave, it could look like this:
class BookSerializer(newserializer.JSONSerializer):
    #...
    select_related = {
        'author': {
            'fields': ('name', 'age', {
                'city':{
                    'fields': ('cityname', ...)
                }
            })
        }
    }
 *END NOTE*
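A minimal sketch of how a serializer might walk such a nested select_related specification. Related objects are modelled as nested dicts rather than Django model instances, the function name `follow_related` is hypothetical, and the exclusion key is spelled `exclude_fields` here; the sample data is made up for illustration.

```python
def follow_related(obj, select_related):
    """Return {relation: selected data} for each relation named in select_related."""
    result = {}
    for relation, spec in select_related.items():
        related = obj[relation]
        fields = spec.get("fields")
        if fields is None:
            # No 'fields' key: start from every field of the related object.
            selected = dict(related)
        else:
            selected = {}
            for item in fields:
                if isinstance(item, dict):
                    # A nested select_related-style dict follows a further relation.
                    selected.update(follow_related(related, item))
                else:
                    selected[item] = related[item]
        for name in spec.get("exclude_fields", ()):
            selected.pop(name, None)
        result[relation] = selected
    return result


book = {"title": "T", "author": {"name": "A", "age": 40, "email": "a@example.com",
                                 "city": {"cityname": "C", "population": 5}}}
spec = {"author": {"fields": ("name", "age", {"city": {"fields": ("cityname",)}})}}
print(follow_related(book, spec))
```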

The rest of the following is within the SoC timeline.
The second of the three attributes is inline_related, which can be set to
True; in the parent class it is False. If it is set to True, the serializer
will serialize the select_related relations inline.

The third attribute is reverse_related. It is again a dictionary, similar
in structure to the select_related dictionary, with the keys being the names
of the models that relate to this model. For example:

class PollSerializer(newserializer.JSONSerializer):
   #...
   reverse_related = {'choice': {
       'fields': ('choice', 'votes')
   }}

And, as always, there is a last but not least ;-)

The user registers this PollSerializer class with our serializer framework,
similar to ModelAdmin, as:
serializer.register.model(Poll, PollSerializer)

Now a question arises: what if the user wants to change only the
serialization format, i.e. the notation, and nothing else in the entire app?
Should he do the donkey work of copy-pasting list_separator and
dict_separator into every class? I feel he need not. For that I propose the
following: define a Serializer class, say AppnameSerializer, with whatever
app-specific customization he wants (as provided by the API), and then call
serializer.register.app(AppName, AppnameSerializer).

This can be extended to multiple apps too. If he wants to customize a set of
apps, he can pass a tuple of apps in place of a single app:
serializer.register.app((App1Name, App2Name, ...), AppSetSerializer)
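A hypothetical sketch of the registration idea: a small registry in which a model-level registration takes precedence over an app-level one. The class name, the `lookup` method and that precedence order are assumptions for illustration, not a settled API; apps, models and serializer classes are represented by plain strings to keep it self-contained.

```python
class SerializerRegistry:
    """Hypothetical registry: a model-level entry overrides an app-level one."""

    def __init__(self):
        self._by_model = {}
        self._by_app = {}

    def model(self, model_name, serializer_class):
        self._by_model[model_name] = serializer_class

    def app(self, apps, serializer_class):
        # Accept a single app label or a tuple of app labels.
        if isinstance(apps, str):
            apps = (apps,)
        for app_label in apps:
            self._by_app[app_label] = serializer_class

    def lookup(self, model_name, app_label, default=None):
        """Return the serializer for a model, falling back to its app's."""
        if model_name in self._by_model:
            return self._by_model[model_name]
        return self._by_app.get(app_label, default)


# Strings stand in for serializer classes in this sketch.
register = SerializerRegistry()
register.app(("polls", "blog"), "AppSetSerializer")
register.model("Poll", "PollSerializer")
```

Giving the model-level entry priority lets an app-wide notation be set once while individual models still customize their fields.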

On Sat, Mar 28, 2009 at 12:17 PM, Russell Keith-Magee <

...

Madhusudan C.S  
From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Sun, 29 Mar 2009 23:38:27 +0530
Local: Sun, Mar 29 2009 2:08 pm
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hello all,
    I would also like to mention again that I am madrazr on #django-dev.
So far, whenever I have asked something there, I have not got any response.
I am not complaining; I understand it is mainly because of timezone
differences. I just want to let anyone who wants to tell me anything about
my proposal directly to my face ;-) know that I am available for that :D
I will be around whenever I am logged into the channel.

--
Thanks and regards,
 Madhusudan.C.S

Blogs at: www.madhusudancs.info
Official Email ID: madhusu...@madhusudancs.info


 
Madhusudan C.S  
From: "Madhusudan C.S" <madhusuda...@gmail.com>
Date: Wed, 1 Apr 2009 13:55:35 +0530
Local: Wed, Apr 1 2009 4:25 am
Subject: Re: [GSoC] Proposal for discussion about Serialization requirements and requesting for Review

Hi Russell,

  After some more thinking, I have reworked my proposal and come up
with the following idea. Here is my draft proposal. I have also submitted it
to socghop.appspot.com.

Let us consider the following two models for discussion throughout:
  from django.db import models

  class Poll(models.Model):
      question = models.CharField(max_length=200)
      pub_date = models.DateTimeField('date published')
      creator = models.CharField(max_length=200)
      # max_length has no effect on an IntegerField, so it is dropped here.
      valid_for = models.IntegerField()

      def __unicode__(self):
          return self.question

  class Choice(models.Model):
      poll = models.ForeignKey(Poll)
      choice = models.CharField(max_length=200)
      votes = models.IntegerField()

      def __unicode__(self):
          return self.choice

  This project begins by providing ModelAdmin- and Feeds-framework-like
APIs for serializers, where the user will be able to construct a class
for specifying custom serialization formats. I propose the API based on
the following ideas.

  The user will first define a class inheriting from the serializer
framework's generic base Serializer class. The user-defined class is then
passed as a parameter to the serialize method we call when we want to
serialize the models. Within this class the user will be able to specify
the customized format in which he wants the output. Since Python's main
built-in data structures are lists, tuples and dictionaries, this format
can contain any of these data structures, nested in any order. Examples:

Example 1:
  class PollSerializer(Serializer):
      custom_format = [("question", "valid_for", "id")]

The output in this case will be a list of tuples containing the values
of question, valid_for and id fields. Here the strings are the names
of the fields in the model.

                        OR
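Example 2:
  class PollSerializer2(Serializer):
      # The trailing comma makes the outer container a one-element tuple;
      # without it the parentheses would just wrap the list.
      custom_format = (["question", {
          "valid_for_number_of_days": "valid_for",
          "Poll ID": "id"
      }],)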

The output in this case will be a tuple of lists, each containing the value
of question and a dictionary which holds the valid_for and id values,
keyed by their descriptions.

The implementation, although not trivial, will work as follows
(this is not final; the final implementation will be worked out by
discussing with the community):
- The custom_format will be checked for its type. The top-level
  structure will be decided from this type: "{}" if dictionary, "()"
  if tuple and "[]" if list. In case of XML, the root tag will be
  django-objects. Its children will have the tag name "object"
  and include model="Model Name" in the tag. Up to this point this is
  the same as the existing XML serializer.

- Next, the type of the only item within the top-level structure
  is determined. All the Django objects serialized will be of this
  type. In case of XML, the children of the "object" tag will be tags
  named "field". These tags will also have name="fieldname" and
  type="FieldType" attributes. Additionally, if these field tags are
  items of a dictionary, they will have a description="dictionary_key"
  attribute.

- Then each item within the inner object ("question", "valid_for"
  and "id" in the first example) is checked for its type, and the
  serialized output will have the corresponding type. This is
  implemented recursively from this level on. In case of XML, however,
  the tag names for further levels of grouping will have to be chosen
  in some consistent way. My suggestion for now is to name the tags
  "field1" for the third level in the original custom format structure,
  "field2" for the fourth level, and so on.
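The steps above can be sketched as a small recursive function over plain Python data. Objects are dicts of field values rather than model instances, the names `apply_format` and `serialize_objects` are hypothetical, and Example 2's format is written with a trailing comma so that its outer container really is a tuple. Because every bare string is looked up as a field here, this sketch raises KeyError for names that are not fields.

```python
def apply_format(fmt, obj):
    """Mirror the container structure of `fmt`, replacing field names with values."""
    if isinstance(fmt, dict):
        # Keys are descriptions; values are (possibly nested) field specifications.
        return {key: apply_format(value, obj) for key, value in fmt.items()}
    if isinstance(fmt, (list, tuple)):
        # type(fmt) keeps a list a list and a tuple a tuple.
        return type(fmt)(apply_format(item, obj) for item in fmt)
    # A bare string names a field of the object.
    return obj[fmt]


def serialize_objects(custom_format, objects):
    """Top level: the outer container holds one formatted item per object."""
    if not isinstance(custom_format, (list, tuple)):
        raise TypeError("this sketch only handles a list or tuple at the top level")
    (item_format,) = custom_format  # the proposal allows exactly one inner item
    return type(custom_format)(apply_format(item_format, obj) for obj in objects)


polls = [{"id": 1, "question": "What's Up?", "valid_for": 30},
         {"id": 2, "question": "Elections 2009", "valid_for": 60}]
print(serialize_objects([("question", "valid_for", "id")], polls))
print(serialize_objects((["question", {"valid_for_number_of_days": "valid_for",
                                       "Poll ID": "id"}],), polls))
```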

For the second example above, we call the serializer as follows:

  serializer.serialize("json", Poll.objects.all(),
      custom_serializer=PollSerializer2)

The output looks as follows:
(
    ["What's Up?", {
        "valid_for_number_of_days": "30",
        "Poll ID": "1"
        }
    ],
    ["Elections 2009", {
        "valid_for_number_of_days": "60",
        "Poll ID": "2"
        }
    ]
)

Similarly, if we use XML:
  serializer.serialize("xml", Poll.objects.all(),
      custom_serializer=PollSerializer2)

The output looks as follows:

<django-objects version="1.0">
    <object pk="1" model="testapp.poll2">
        <field type="CharField" name="question">What's Up?</field>
        <field>
            <field1 type="IntegerField" name="valid_for" description="valid_for_number_of_days">
                30
            </field1>
            <field1 type="AutoField" name="id" description="Poll ID">
                1
            </field1>
        </field>
    </object>
    <object pk="2" model="testapp.poll2">
        <field type="CharField" name="question">Elections 2009</field>
        <field>
            <field1 type="IntegerField" name="valid_for" description="valid_for_number_of_days">
                60
            </field1>
            <field1 type="AutoField" name="id" description="Poll ID">
                2
            </field1>
        </field>
    </object>
</django-objects>
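A minimal sketch of producing the XML layout above with the standard library's `xml.etree.ElementTree`. Since a standalone snippet has no model metadata, field type names are passed in as a dict and the `pk` attribute is read from an `id` entry; both are assumptions, as are the function names. The output differs from the listing above only in whitespace, because ElementTree does not pretty-print.

```python
import xml.etree.ElementTree as ET


def _tag(depth):
    # "field" for the object's direct children, then "field1", "field2", ...
    return "field" if depth == 0 else "field%d" % depth


def add_items(parent, fmt, obj, field_types, depth=0, description=None):
    """Append XML for one piece of the item format under `parent`."""
    if isinstance(fmt, str):
        el = ET.SubElement(parent, _tag(depth), type=field_types[fmt], name=fmt)
        if description is not None:
            el.set("description", description)
        el.text = str(obj[fmt])
    elif isinstance(fmt, dict):
        # A grouping element; each child records its dictionary key.
        group = ET.SubElement(parent, _tag(depth))
        for key, value in fmt.items():
            add_items(group, value, obj, field_types, depth + 1, description=key)
    else:
        # A list or tuple: a grouping element without descriptions.
        group = ET.SubElement(parent, _tag(depth))
        for value in fmt:
            add_items(group, value, obj, field_types, depth + 1)


def to_xml(item_format, objects, model_label, field_types):
    root = ET.Element("django-objects", version="1.0")
    for obj in objects:
        obj_el = ET.SubElement(root, "object", pk=str(obj["id"]), model=model_label)
        # Items of the inner container become the object's direct children.
        for item in item_format:
            add_items(obj_el, item, obj, field_types)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(["question", {"valid_for_number_of_days": "valid_for", "Poll ID": "id"}],
                 [{"id": 1, "question": "What's Up?", "valid_for": 30}],
                 "testapp.poll2",
                 {"question": "CharField", "valid_for": "IntegerField", "id": "AutoField"}))
```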

  Further, when a user wants to include extra fields in the serialized
data, such as additional non-model fields or computed fields, he
specifies the method of the class that returns the value of the field
as that item in his format. It must be the method itself, not its name
as a string, so that we can check whether the item is callable and, if
so, call it and use its return value for serialization. For example:

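Example 3:
  import datetime

  class PollSerializer(Serializer):
      # till_date must be defined before custom_format refers to it;
      # otherwise the name is not yet bound in the class body.
      def till_date(self):
          # Assumes self.pk is the primary key of the Poll being
          # serialized (how the object is exposed is not settled yet).
          poll = Poll.objects.get(pk=self.pk)
          return poll.pub_date + datetime.timedelta(days=poll.valid_for)

      custom_format = [("question", "valid_for", till_date)]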

  Another important thing to note here is that whenever a string passed
as an item anywhere in the custom_format does not correspond to any
field of the model, it is serialized as the same string in the final
output. This allows the addition of non-model static data, such as the
version number of the format, among other things.
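The leaf-level rules described above (callables are called, field names are looked up, other strings pass through unchanged) can be sketched as follows. Objects are dicts, `resolve_item` and the dict-based `till_date` are hypothetical, and passing the object to the callable is an assumption; the proposal's own till_date uses `self.pk` instead.

```python
import datetime


def resolve_item(item, obj):
    """Resolve one leaf of custom_format for a single object (a dict here).

    - a callable is called and its return value used (computed fields);
    - a string naming a field of obj is replaced by that field's value;
    - any other string is emitted unchanged as static data.
    """
    if callable(item):
        # Assumption: computed fields receive the object being serialized.
        return item(obj)
    if isinstance(item, str) and item in obj:
        return obj[item]
    return item


def till_date(poll):
    # Hypothetical dict-based counterpart of the till_date method in Example 3.
    return poll["pub_date"] + datetime.timedelta(days=poll["valid_for"])


poll = {"question": "What's Up?", "valid_for": 30,
        "pub_date": datetime.datetime(2009, 3, 1, 6, 0)}
row = tuple(resolve_item(item, poll)
            for item in ("question", "valid_for", till_date, "format-1.0"))
print(row)
```

One consequence of the pass-through rule is that a static string which happens to match a field name is always replaced by that field's value.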

  Another point to note is that the strings specified in the custom
format can also name fields from parent models, thereby allowing parent
model fields to be serialized as well.

  Further, the docs will clearly state that the user cannot pass
arbitrary Django objects when calling the serialize() method with the
custom_serializer parameter, but only objects of the model for which the
custom_format is defined in his Serializer subclass. If he does, it will
be flagged as an error.

  Also, last but not least, a select_related parameter will be added to
the serialize method which, when set to True, will automatically
serialize all the related models for this model. Serializing the related
models makes it possible to reconstruct the database tables for the given
model when constraints exist between them. The related models will be
serialized in a default format.

  Further, if the user knows which models might be selected when
select_related is True, he can provide a parameter like the following:

  related_custom_serializers={
      "Model1": Model1Serializer,
      "Model2": Model2Serializer,
  }

  While serializing the related models, the serializer checks whether
related_custom_serializers has an item for the selected model and, if
so, serializes it in that format. Example:
  serializer.serialize("json", Poll.objects.all(),
      custom_serializer=PollSerializer2, select_related=True,
      related_custom_serializers={
          "Model1": Model1Serializer,
          "Model2": Model2Serializer,
      }
  )

(I am quite skeptical about the use cases for the above feature, since
select_related is usually needed for round trips and rarely for
external applications. Nevertheless I propose it here, pending further
discussion.)

NOTE: I must also admit that I have been following the other proposal on
the same idea; I felt there was no point in hiding it. But providing a
custom format was my idea too: I believe I had started along these lines
in my previous proposal itself. I had a very similar idea in mind when I
used list and dict separators, but got it wrong. After thinking for a day
or so about the weaknesses you pointed out, I came up with the same idea,
but was unfortunately late in sending it, since, as you know, I had
already got it wrong twice :( I wanted to say something sensible the
third time and was preparing a more comprehensive solution. I hope it
answers almost all the questions you raised in your braindump on the
other proposal.

- Thanks and regards,
  Madhusudan.C.S


 