Hi all,

After some discussions with Malcolm on this list, and some research based on the pointers he gave me, I have come up with a rough plan of what I want to do this summer for Django. Since we are running out of time, I have written a *rough draft* of the proposal without a full discussion with the Django community about the features that can be implemented. So this is in no way a *Complete Proposal*, and I don't want to submit it until some discussion has happened. Also, the required proposal format asks for links to the devel list discussions that led to the proposal, which I don't have apart from Malcolm's mails. So I kindly request you all to review my proposal thoroughly, suggest what I can add to or remove from it, and tell me whether my propositions and assumptions are correct and how I can fix them where they are not, so that I can submit my proposal to Google.
*Note: * Django doesn't serialize inherited Model fields in the Child Model. I asked on IRC why this decision was taken but got no response. I searched the devel list too, but did not get anything on it. I want to add it to my proposal, but before doing it I wanted to know why this decision was taken. Will it be a workable and necessary solution to add that to my proposal? Same is the case for Ticket #10201. Can someone please tell me why microsecond data was dropped?
Also, I am leaving out the extras option for serializers, since a patch for it has already been submitted (Ticket #5711) and looks like a working solution. If you all want something extra to be done there to get it committed to Django trunk, please tell me; I will work on that a bit and add it to the proposal.
Here is my long long long proposal:
Title: Restructuring of the existing Serialization format and improvement of APIs
~~~~~~~~~ Abstract ~~~~~~~~~
Greetings!
I wish to provide Django with better support for serialization by building upon the existing serialization framework. This project includes extending the format of the serialized output that the existing serializer produces by allowing in-depth traversal of relation fields in a given model. The project also includes extending the existing API to specify the depth of the relations to be serialized and the name of the related model to be serialized. The API also provides for backwards compatibility, so that older versions of serialized output keep working with the changes to be introduced. All the changes will be made keeping in mind 2 important things:

1. All the changes should be backwards compatible (this can only break when a very important requirement that improves serialization manyfold cannot be implemented without backwards-incompatible changes, and the Django community gives a green signal for doing so).
2. The serialized data should be useful not just within Django apps but also for exporting the data for external use and processing.
~~~~~~~ Why? ~~~~~~~
- The existing format of the serialized output doesn't specify the name of the Primary Key (PK henceforth), which is a problem for fields which are implicitly set as PKs (Ticket #10295).
- The existing format only specifies the PK of the related field, but doesn't traverse it in depth to specify its fields (Ticket #4656).
- There are no APIs for the above said requirement.
- The inherited model's fields are not serialized.

Situations/problems arising from attempting to fix the above problems:

- When we allow serialization to follow relations, it becomes unnatural if the related model is included in the data of every relating model. The data becomes extremely redundant. Consider the following example.
    class Poll2(models.Model):
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')
Serializing the Choice2 model might look something like the following if we allow following of relations:

    [
        {
            "pk": 1,
            "model": "testapp.choice2",
            "fields": {
                "votes": 1,
                "poll": [
                    {
                        "pk": 1,
                        "model": "testapp.poll2",
                        "fields": {
                            "question": "What's Up?",
                            "pub_date": "2009-03-01 06:00:00"
                        }
                    }
                ],
                "choice": "Django"
            }
        },
        {
            "pk": 2,
            "model": "testapp.choice2",
            "fields": {
                "votes": 2,
                "poll": [
                    {
                        "pk": 1,
                        "model": "testapp.poll2",
                        "fields": {
                            "question": "What's Up?",
                            "pub_date": "2009-03-01 06:00:00"
                        }
                    }
                ],
                "choice": "Python"
            }
        },
        {
            "pk": 3,
            "model": "testapp.choice2",
            "fields": {
                "votes": 4,
                "poll": [
                    {
                        "pk": 1,
                        "model": "testapp.poll2",
                        "fields": {
                            "question": "What's Up?",
                            "pub_date": "2009-03-01 06:00:00"
                        }
                    }
                ],
                "choice": "Others are useless"
            }
        }
    ]

which clearly shows the redundant Poll data. Here we are serializing Choice2, of course, but that doesn't mean serializing Polls will give the natural serialized output. In fact, serializing a Poll doesn't give anything pertaining to the Choice model instances. A more natural serialization should result from serializing a Poll model instance which includes within itself all the Choice model instances that are related to it. This is an obvious consequence of how database schemas are designed by applying normalization rules.
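To make the redundancy concrete, here is a minimal sketch in plain Python (no Django involved). The record shapes mirror the output above; the helper name `group_choices_by_poll` and the `"choices"` key are assumptions made up for this illustration, not part of any existing serializer.

```python
import json

# One Poll2 record, inlined into every Choice2 record as in the output above.
poll = {
    "pk": 1,
    "model": "testapp.poll2",
    "fields": {"question": "What's Up?", "pub_date": "2009-03-01 06:00:00"},
}
nested_choices = [
    {"pk": pk, "model": "testapp.choice2",
     "fields": {"votes": votes, "poll": [poll], "choice": choice}}
    for pk, votes, choice in [
        (1, 1, "Django"), (2, 2, "Python"), (3, 4, "Others are useless"),
    ]
]


def group_choices_by_poll(choices):
    """Return one record per poll, with the choices that point at it listed once."""
    polls = {}
    for choice in choices:
        fields = dict(choice["fields"])      # copy so the input is not mutated
        related = fields.pop("poll")[0]      # the inlined Poll2 record
        entry = polls.setdefault(related["pk"], {**related, "choices": []})
        entry["choices"].append(
            {"pk": choice["pk"], "model": choice["model"], "fields": fields}
        )
    return list(polls.values())


grouped = group_choices_by_poll(nested_choices)
print(len(json.dumps(nested_choices)), "characters with the poll inlined per choice")
print(len(json.dumps(grouped)), "characters with choices grouped under the poll")
```

The grouped form carries the poll data once, which is the shape a reverse-relation lookup would be expected to produce.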
- The way loaddata and dumpdata are handled is changed. The new version of this loaddata and dumpdata may not be compatible with the fixtures generated from older versions.
Most of the above said problems have been addressed in the tickets specified, but the patches need to be dealt with more thoroughly after discussing with the Django community in general. So design decisions need to be taken for fixing most of the tickets (which I will do in the community bonding phase).
~~~~~~~ How? ~~~~~~~
The project begins with implementing a version-id field in the serialized output. This field is provided for backwards compatibility. Then it proceeds by converting the existing PK field, which appears as

    {
        "pk": 1,
        "model": "testapp.choice2",
        #...

to serialize the name of the PK field. I propose it to be presented as:

    {
        "pk": {
            "id": 1
        },
        "model": "testapp.choice2",
        #...

This change is being proposed on the assumption that David Crammer's patches for Ticket #373 get into Django trunk sooner or later, since it is a long-standing requirement. This representation allows for multiple PK fields to exist in the model and be serialized correctly.
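The two PK representations can be converted into each other mechanically as long as the PK is a single column. A minimal sketch in plain Python, assuming records are plain dicts; `name_pk` and `unname_pk` are hypothetical helper names, not Django functions:

```python
def name_pk(record, pk_name="id"):
    """Old format -> proposed format: {"pk": 1} becomes {"pk": {"id": 1}}."""
    upgraded = dict(record)
    upgraded["pk"] = {pk_name: record["pk"]}
    return upgraded


def unname_pk(record):
    """Proposed format -> old format; only possible for a single-column PK."""
    pk = record["pk"]
    if not isinstance(pk, dict):
        return dict(record)                  # already in the old format
    if len(pk) != 1:
        raise ValueError("a multi-column PK has no old-format equivalent")
    downgraded = dict(record)
    downgraded["pk"] = next(iter(pk.values()))
    return downgraded


old_record = {"pk": 1, "model": "testapp.choice2", "fields": {"votes": 1}}
new_record = name_pk(old_record)
print(new_record["pk"])                      # {'id': 1}
```

The `ValueError` branch marks the one case where the conversion loses information, which is where a version-id field would matter.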
The corresponding changes in the deserializers to process this data will also be made at this stage. The implementation touches the following parts of Django: django.core.serializers.python.Serializer.end_object() django.core.serializers.xml_serializer.start_serialization() [It already implements version.] and related methods and files.
The project proceeds by splitting the serializer into 2 versions to handle the older version and this current version of the serialized output. The decision as to which version of the serializer to use will be taken by adding an API option "old_version=True" parameter to serialize method. The deserialize method can however decide this by looking at the new version-id. Also options for django-admin.py loaddata and dumpdata commands will be provided with --old_version.
The second phase, the biggest phase, starts by implementing serializing of relations in depth. The APIs will be implemented for these things hand-in-hand as the features are being implemented. An API to specify, what relations to serialize, will be provided with "relations=(rel1, rel2, ...)" parameter to serialize. Also a parameter to specify "relation_depth=(N1, N2, ...)" will be provided to serialize the related models recursively till the specified depth N. Skipping "relations=" implies to serialize all the related models in a given model and skipping "relation_depth=" implies serializing to full depth. Skipping both serializes just the PK of the related models(old style). Further selection of fields in the individual related models to be serialized is provided with a DjangoFullSerializers like syntax, using dictionaries. An exclude fields option will be given similar to DjangoFullSerializers. Link to DjangoFullSerializers: http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers
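The traversal described above can be sketched without Django. This is a simplified illustration only: `Record` is a stand-in for a model instance, and `relations` and `depth` are assumed parameter names. Unlike the proposal, a single integer depth is used, and the default of 0 means "PKs only" (the old style).

```python
class Record:
    """Stand-in for a model instance: a label, a PK and a dict of field values."""

    def __init__(self, model, pk, **fields):
        self.model = model
        self.pk = pk
        self.fields = fields


def serialize(record, relations=None, depth=0):
    """Serialize a record, following relations at most `depth` levels deep.

    relations=None follows every relation; otherwise only the named ones.
    A relation that is not followed is written as the related PK.
    """
    out = {}
    for name, value in record.fields.items():
        if isinstance(value, Record):
            follow = depth > 0 and (relations is None or name in relations)
            out[name] = serialize(value, relations, depth - 1) if follow else value.pk
        else:
            out[name] = value
    return {"pk": record.pk, "model": record.model, "fields": out}


poll = Record("testapp.poll2", 1, question="What's Up?")
choice = Record("testapp.choice2", 1, votes=1, poll=poll, choice="Django")

flat = serialize(choice)              # old style: "poll" is just the PK
deep = serialize(choice, depth=1)     # "poll" is expanded one level
```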
This phase proceeds by providing the optional API parameter "reverse_relation=[rel1, rel2]" on the related model (Poll2 in the example), rather than on the model that relates to it (Choice2). This does a reverse relation lookup, and for each related model instance it serializes all the reverse relations that point to that instance, which solves the data redundancy problem described above. The output looks something like the following if serialized as:

    serializers.serialize('json', Poll2.objects.all(), reverse_relation=('choice2',))

    [
        {
            "pk": 1,
            "model": "testapp.poll2",
            "fields": {
                "question": "What's Up?",
                "pub_date": "2009-03-01 06:00:00"
            },
            "testapp.choice2": [
                {
                    "pk": 2,
                    "model": "testapp.choice2",
                    "fields": {
What a blunder :( I posted my proposal the way I will have to submit it to socghop.appspot.com, with lines manually wrapped at 80 chars per line, and the groups wrap it at 75 chars, making my proposal look as ugly as possible. I did not realize that it was 75 chars here. Please excuse me; tell me if my proposal is unreadable and I will resubmit it with lines wrapped at 70 chars or so.
On Thu, 2009-03-26 at 22:18 +0530, Madhusudan C.S wrote: > Hi all, > After some discussions with Malcolm on this list and doing some > research based on the pointers he gave me I have come up with a > rough plan of what I want to do this summer for Django. Since we > are running out of time, I have come up with a *rough draft* of the > proposal without full discussion with the Django community about the > features that can be implemented. So this is in no way a *Complete > Proposal* and I don't want to submit until some discussion on this > happens really. Also the required proposal format asks to put the > links of the devel list discussions that led to the proposal, which I > don't > have except Malcolm's mails. So I kindly request you all to review my > proposal thoroughly and suggest me what I can add or subtract from > the proposal. If my propositions and assumptions are true and how I > can correct myself, so that I can submit my proposal to Google.
> *Note: * > Django doesn't serialize inherited Model fields in the Child Model. > I asked > on IRC why this decision was taken but got no response. I searched the > devel list too, but did not get anything on it. I want to add it to > my > proposal, but before doing it I wanted to know why this decision was > taken.
Most likely because it will lead to duplicate data when you dump the models for a particular app. Often the parent and child are in the same application and you'll see the data from the parent in two places. *Very* fiddly to untangle. It might be possible to add an option so that parent data is optionally dumped when you dump a specific model (as opposed to the whole app).
> Will it be a workable and necessary solution to add that to my > proposal? > Same is the case for Ticket #10201. Can someone please tell me why > microsecond data was dropped?
Quite probably because MySQL (or possibly just the MySQLdb wrapper) sucks and can't support microsecond data when it's in a datetime value. So reinserting that data requires yet another place where we have to set microseconds to 0. It's one of those cases where we've adopted the lowest common denominator.
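For illustration only, this is what lowest-common-denominator truncation looks like with the standard library's `datetime`; it is a sketch of the effect, not a copy of Django's actual serializer code.

```python
from datetime import datetime

stamp = datetime(2009, 3, 1, 6, 0, 0, 123456)

# Dropping the microseconds makes the value safe for backends that cannot store them.
truncated = stamp.replace(microsecond=0)

print(stamp.isoformat(sep=" "))      # 2009-03-01 06:00:00.123456
print(truncated.isoformat(sep=" "))  # 2009-03-01 06:00:00
```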
On Fri, 2009-03-27 at 00:23 +0530, Madhusudan C.S wrote: > Hi all,
> What a blunder :( I submitted my proposal the way I will > have to submit to socghop.appspot.com with lines manually wrapped > at 80 chars per line and the groups wrapp it at 75 chars making > my proposal look as ugly as possible. Did not realize that it was > 75 chars here. Please excuse me, tell me if my proposal is > unreadable I will resubmit it with lines wrapped at 70 chars > or so.
Why manually wrap it at all? Email clients have been able to handle wrapping lines sensibly on behalf of the sender for about 20 years now. Just type normally and only hit Return between paragraphs.
> On Fri, 2009-03-27 at 00:23 +0530, Madhusudan C.S wrote: > > Hi all,
> > What a blunder :( I submitted my proposal the way I will > > have to submit to socghop.appspot.com with lines manually wrapped > > at 80 chars per line and the groups wrapp it at 75 chars making > > my proposal look as ugly as possible. Did not realize that it was > > 75 chars here. Please excuse me, tell me if my proposal is > > unreadable I will resubmit it with lines wrapped at 70 chars > > or so.
> Why manually wrap it at all? Email clients have been able to handle > wrapping lines sensibly on behalf of the sender for about 20 years now. > Just type normally and only hit Return between paragraphs.
Right. I get it now. Won't do that blunder again :( Some of my friends who participated in previous years of GSoC had told me to manually wrap the text since they felt the text would look ugly after submission to Google's app if it is not wrapped with some small paragraphs appearing as a single huge line and also since wrapping gives a neatly presented look too :(
> On Thu, 2009-03-26 at 22:18 +0530, Madhusudan C.S wrote: > > Hi all, > > After some discussions with Malcolm on this list and doing some > > research based on the pointers he gave me I have come up with a > > rough plan of what I want to do this summer for Django. Since we > > are running out of time, I have come up with a *rough draft* of the > > proposal without full discussion with the Django community about the > > features that can be implemented. So this is in no way a *Complete > > Proposal* and I don't want to submit until some discussion on this > > happens really. Also the required proposal format asks to put the > > links of the devel list discussions that led to the proposal, which I > > don't > > have except Malcolm's mails. So I kindly request you all to review my > > proposal thoroughly and suggest me what I can add or subtract from > > the proposal. If my propositions and assumptions are true and how I > > can correct myself, so that I can submit my proposal to Google.
> > *Note: * > > Django doesn't serialize inherited Model fields in the Child Model. > > I asked > > on IRC why this decision was taken but got no response. I searched the > > devel list too, but did not get anything on it. I want to add it to > > my > > proposal, but before doing it I wanted to know why this decision was > > taken.
> Most likely because it will lead to duplicate data when you dump the > models for a particular app. Often the parent and child are in the same > application and you'll see the data from the parent in two places. > *Very* fiddly to untangle. It might be possible to add an option so that > parent data is optionally dumped when you dump a specific model (as > opposed to the whole app).
I personally think this may be required in some situations, but I agree that those situations may be rare, and it will be better to enable it explicitly by passing a parameter to the serializer. I am still wondering why no one has opened a ticket on this. Do you think it will be a good idea to add it to the proposal? (If so, I will also open a ticket on the issue and do some preliminary work on how it may be implemented.)
> > Will it be a workable and necessary solution to add that to my > > proposal? > > Same is the case for Ticket #10201. Can someone please tell me why > > microsecond data was dropped?
> Quite probably because MySQL (or possibly just the MySQLdb wrapper) > sucks and can't support microsecond data when it's in a datetime value. > So reinserting that data requires yet another place where we have to set > microseconds to 0. It's one of those cases where we've adopted the > lowest common denominator.
But do you think you may add it sooner or later, since it is one of those tickets lying there? And maybe you haven't done it yet because you have better things to concentrate on ATM? I feel it may be unfair to other backends that support microseconds info.
On Fri, Mar 27, 2009 at 1:48 AM, Madhusudan C.S <madhusuda...@gmail.com> wrote: > Hi all, > *Note: * > Django doesn't serialize inherited Model fields in the Child Model. I > asked > on IRC why this decision was taken but got no response. I searched the > devel list too, but did not get anything on it. I want to add it to my > proposal, but before doing it I wanted to know why this decision was > taken. Will it be a workable and necessary solution to add that to my > proposal?
Malcolm has already addressed this, and his analysis is pretty much spot on. I would only add that the current behaviour can also be explained by looking at the heritage of the fixture system. Historically, Django's fixtures have been used as a way of serializing output for transfer between two Django installations (for example, as test fixtures). To this end, the serializers have concentrated on replicating a very database-like structure - that is, the structures that are serialized closely match the underlying database structures. In an inheritance situation, child tables don't contain all the data from the parent table; hence, neither do the serialized structures.
Obviously, this focus on representing the database misses an obvious alternate use case - occasions where serialization is required to communicate to some other data consumer, such as an AJAX framework. In my 'big picture' of the ideal serialization SoC project, this is the problem that needs to be fixed. More on this in later comments.
> Same is the case for Ticket #10201. Can someone please tell me why > microsecond data was dropped?
Again, Malcolm is on the money. If you can come up with a fix that enables non-microsecond-deprived databases to maintain microseconds, I'm sure it would be a welcome inclusion. Thinking about it, this shouldn't actually be that hard to achieve.
> Also I am leaving adding extras option to serializers since a patch for it > has already been submitted(Ticket #5711) and looks like a working > solution. If you all want something extra to be done there to > commit it to django trunk, please tell me, I will work on that a bit > and add it to the proposal.
If you are intending to take on "updating the serializers" as a SoC project, I would encourage you to include #5711 as part of your proposal. There may be a patch on #5711, and it may be the right solution, but the patch isn't even close to being ready for trunk - for one thing, there are no tests or documentation. Finishing the work on this ticket would be a very worthwhile contribution.
> Here is my long long long proposal: ... > ~~~~~~~ > Why? > ~~~~~~~
> - The existing format of the serialized output firstly doesn't specify the > name of the > Primary Key(PK henceforth), which is a problem for fields which are > implicitly set > as PKs (Ticket #10295).
This ticket is a very small part of a bigger problem. The fact that the primary key isn't named in the serialization format is of no consequence to the 'database replication' role for serializers, evidenced by the extensive test suite that demonstrates round trip fixture loading. It is only significant when you need to support some alternate data consumer that needs to know the name of the primary key.
The bigger issue is that we need to be able to easily reconfigure the output format of serializers to suit the specific requirements of other data consumers.
> - The existing format only specifies the PK of the related field, but > doesn't traverse it > in depth to specify its fields (Ticket #4656). > - There are no APIs for the above said requirement. > - The inherited models fields are not serialized.
Again, these are just variations on the same theme. The real problem is being able to easily reconfigure the output format.
> Situations/problems arising from attempting to fix the above problems > - When we allow Serialization to follow relations, it becomes unnatural if > the related Model is included in every relating model data. The data > becomes extremely redundant. Consider the following example.
It may be redundant, but it may also be required, depending on circumstance. This is something that needs to be left in the hand of the end-user.
> - The way loaddata and dumpdata are handled is changed. The new version of > this > loaddata and dumpdata may not be compatible with the fixtures generated > from > older versions.
The current serialization format is well known, well understood, and well suited to the task it was designed to perform. As a result, I would expect that this format would remain as the 'default' format for Django fixtures, and be entirely backwards compatible without extra options/flags.
However, there is an obvious need for alternate formats. These formats may be dramatically different from the current serialization formats, and certain output formats may not contain enough data to be used for later loading - for example, consider the case where you want an AJAX response that contains a list of (author_name, book_title) tuples. This structure may be useful to your AJAX application, but won't be useful for recreating a list of Author and Book records in your database.
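A small sketch of such a one-way format, using only the standard `json` module. The book data and the helper name `author_title_pairs` are made up for the illustration.

```python
import json

# Made-up sample data standing in for Book records with a related Author.
books = [
    {"pk": 1, "title": "Book One", "author": {"pk": 10, "name": "Author A"}},
    {"pk": 2, "title": "Book Two", "author": {"pk": 10, "name": "Author A"}},
]


def author_title_pairs(book_records):
    """Flatten books into (author_name, book_title) tuples for an AJAX consumer."""
    return [(book["author"]["name"], book["title"]) for book in book_records]


payload = json.dumps(author_title_pairs(books))
print(payload)
```

The payload has no PKs and no model names, so it cannot be loaded back into Author and Book tables, which is exactly why it would not belong in loaddata/dumpdata.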
My point is that a serialization format doesn't necessarily have to be 'round-trip'. The existing default format is, and needs to remain that way. However, the corollary of this is that 'new format' serializers don't necessarily need to be made available to loaddata/dumpdata.
> The project begins with implementing a version-id field in the serialized > output. This > field is provided for backwards compatibility. Then it proceeds by > converting the existing > PK field which appears as > { > "pk": 1, > "model": "testapp.choice2", > #... > to serialize the name of the PK field. I propose it to be presented as: > { > "pk": { > "id": 1 > }, > "model": "testapp.choice2", > #... > This change is being proposed keeping in mind that David Crammer's patches > for > Ticket #373 gets into Django trunk sometime or the other, since it should > happen as > it is a long standing requirement. This representation allows for multiple > PK fields to > exist in the model and be serialized correctly.
This isn't a problem you need to worry about. If/when #373 lands, the default serialization format will also have to change. We don't need to pre-emptively change it, and the existing serialization format would be entirely compatible with a world where multiple primary key models exist. Remember - in the 'database serialization' case, we can introspect model definitions to see when multiple primary keys exist, so on deserialization, we will know when "pk" is a single value and when it is a list/dict/whatever format eventuates to support multiple primary keys.
I can see how your proposal addresses #10295, but as I said earlier, that's a very small scope version of a larger problem. I would suggest addressing your efforts at fixing the bigger problem, rather than a cosmetic approach to #10295 that is backwards incompatible with all existing fixtures.
> The second phase, the biggest phase, starts by implementing serializing of > relations in > depth. The APIs will be implemented for these things hand-in-hand as the > features are > being implemented. An API to specify, what relations to serialize, will be > provided with > "relations=(rel1, rel2, ...)" parameter to serialize. Also a parameter to > specify > "relation_depth=(N1, N2, ...)" will be provided to serialize the related > models recursively
There are two ways to interpret ticket #4656, and you have picked my least favourite of the two. :-)
Option 1 (my preferred interpretation) is to look at this as a 'gathering dependencies' interface to the existing serializers.
For example, you pass a Book object to the serializer. That book contains a reference to an Author. That author contains a reference to a City.
In the current serializers, your output fixture only contains the Book object. This may be a useful fixture, but it has referential integrity problems - the Book contains a FK reference to a non-existent author. It would also be nice to be able to say "and also serialize all the other objects that are required in order to reproduce this full object" - that is, by serializing the Book, you automatically get all the related Author and City records. At the moment, the only way to do this is to dump the entire Book, Author and City tables, and prune out any data you don't want.
Adding a 'select related' option to the existing serializers would make it much easier to generate fixtures, or dump parts of a database, and it requires no changes to the output format at all.
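The 'gathering dependencies' idea can be sketched in plain Python. `Obj` stands in for a model instance and `collect_with_dependencies` is a hypothetical helper name; the point is that the output keeps the existing flat format, with each related record appearing once and before the record that references it.

```python
class Obj:
    """Stand-in for a model instance; field values may be other Obj instances."""

    def __init__(self, model, pk, **fields):
        self.model = model
        self.pk = pk
        self.fields = fields


def collect_with_dependencies(obj, seen=None, ordered=None):
    """Depth-first walk: referenced objects are listed before their referrers."""
    seen = set() if seen is None else seen
    ordered = [] if ordered is None else ordered
    key = (obj.model, obj.pk)
    if key in seen:
        return ordered
    seen.add(key)
    for value in obj.fields.values():
        if isinstance(value, Obj):
            collect_with_dependencies(value, seen, ordered)
    ordered.append(obj)
    return ordered


def dump(objs):
    """Write objects in the existing flat format: relations are stored as PKs."""
    return [
        {"pk": o.pk, "model": o.model,
         "fields": {k: (v.pk if isinstance(v, Obj) else v) for k, v in o.fields.items()}}
        for o in objs
    ]


city = Obj("library.city", 3, name="City X")
author = Obj("library.author", 2, name="Author A", city=city)
book = Obj("library.book", 1, title="Book One", author=author)

fixture = dump(collect_with_dependencies(book))
print([record["model"] for record in fixture])
```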
Option 2 (your interpretation), is to allow for inline serialization of related models. All my previous arguments about output format apply, along with all your arguments about redundant encoding of data. If you solve the bigger problem of allowing flexible output formats, then the need to hard-code embedded child model data goes away.
> The project is planned to be completed in 9 phases. ... > 2. Finalizing Design and Coding Phase I (May 22th – May 31st ) > 3. Testing Phase I (June 1st – June 5th )
As a prior warning - I'm very skeptical of anyone that proposes a "test" phase that isn't integrated with the "build" phase. If you're not testing at the same time you are building, then how do you know you have the right result? If you test after you build, what happens when your test reveals a problem with your implementation?
I know line items like this make accountant types happy, but it just doesn't wash with me. If your implementation, including tests, will take 3 weeks, then say three weeks. Don't say 2 weeks implementation followed by a 1 week test.
> Lastly I want to express my deep commitment for this project and Django. > I'm fully > available this summer without any other commitments, will tune my day/night > rhythm
Hi Russell, I am extremely thankful to you for spending your invaluable time for doing a review (err... should I say post-mortem? ;-) ) of my complete proposal. I had kept my fingers crossed for someone who knew about the technical aspects of it to do it since most of my friends did only a language review (some of them even gave up seeing the length :( ). I am also equally thankful to Malcolm for it.
I spent the whole of yesterday thinking, reviewing and studying how other serializers, apart from Django's, work in different languages and frameworks such as PHP, Python (pickle), Java, TurboGears (TurboJSON) and Boost. As a result I have come up with some ideas which mostly depart from what I proposed earlier. From the top view, I still propose to solve the same problems I suggested in my initial proposal, along with considering the bigger problems you suggested. Again, this is a very rough draft of my ideas and requires a lot of refining by discussing with you and the rest of the community.
Thanks to ideas on the Wiki. Reference to ModelAdmin there gave me some ideas to think further. Though this is not a copy, I have borrowed some ideas from other serializers I studied yesterday. Also I have ensured as far as possible that this doesn't break the existing Serializer and fixtures in any way, but only adds on to it. Please point out if I have gone against this somewhere.
> The bigger issue is that we need to be able to easily reconfigure the output format of serializers to suit the specific requirements of other data consumers.
The idea that I propose below is mostly to tackle this bigger issue which you pointed out throughout.
Let us consider same 2 models as before:
    class Poll2(models.Model):
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')
The user will now be able to construct a class along the lines of ModelAdmin for specifying custom serialization formats. I propose the API based on the following ideas. The user will be given an option to define a Serializer class that inherits from the framework's serializer classes: Base, XML, Python, YAML and JSON. For the moment, to avoid confusion, let me call the new Serializer newserializer (but this is only tentative; the decision as to whether we must rename the framework or just the classes can be finalized later). From what I have understood, Python mainly consists of basic single-value datatypes and data structures like List, Tuple and Dictionary. Most other complex data types/structures are derived from these types and thus represented with those notations.

So our base class defines a set of class attributes that define the notation for these types, which are the same as the Python notations. For example, the list separators will be a 3-tuple containing the enclosing notations and the list item separator: ('[', ']', ','). Similarly, the dictionary separators will be a 4-tuple: ('{', '}', ',', ':'). The last item is for key:value separation. Similarly, more specialized cases will be defined for the YAML and JSON classes. We can use this approach for XML too. For this case we can pass a tuple of strings with this format:

    list_separator = ('<list-name>', '</list-name>', '<>list-value</>')
    dict_separator = ('<dict-name>', '</dict-name>', '<dict-key=dict-value></>')

It is important to note here that list-name, dict-name, list-value, dict-value and dict-key are all indicative and are a part of the API (a better naming convention will be developed); they are not placeholders for some other value. That is, those are the names that must always be used consistently, which will be evident from the examples below.

The user can now inherit from one of these classes in his app, depending upon his requirements, and override these class attributes as per the format he wants. The API roughly looks like this for serializing the Poll class, in a format similar to JSON notation.
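The separator idea for the non-XML notations can be sketched as follows. Everything here is an assumption made for illustration: the class names `BaseNotation` and `ArrowNotation`, the attribute names `list_separators` and `dict_separators`, and the `render` method are not part of Django.

```python
class BaseNotation:
    # (open, close, item separator), same as Python's own list notation
    list_separators = ("[", "]", ",")
    # (open, close, item separator, key/value separator)
    dict_separators = ("{", "}", ",", ":")

    def render(self, value):
        """Render nested dicts/lists/tuples using the class-level separators."""
        if isinstance(value, dict):
            open_, close, item_sep, kv_sep = self.dict_separators
            items = (self.render(k) + kv_sep + self.render(v) for k, v in value.items())
            return open_ + item_sep.join(items) + close
        if isinstance(value, (list, tuple)):
            open_, close, item_sep = self.list_separators
            return open_ + item_sep.join(self.render(v) for v in value) + close
        return repr(value)                   # basic single-value types


class ArrowNotation(BaseNotation):
    """A subclass only overrides the notation; the traversal is inherited."""
    dict_separators = ("<", ">", ";", "=>")


sample = {"pk": 1, "tags": ["a", "b"]}
print(BaseNotation().render(sample))    # {'pk':1,'tags':['a','b']}
print(ArrowNotation().render(sample))   # <'pk'=>1;'tags'=>['a','b']>
```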
In addition to this, the user can specify the fields to be selected by overriding a class attribute, fields. This attribute is a tuple of strings where each item is the name of a field to be serialized. The above class can now be written as follows:

Additionally, a class attribute named exclude_fields, a tuple of strings, is added, which is just the complement of the fields attribute (thanks to DjangoFullSerializers for this idea).
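A minimal sketch of how fields and exclude_fields could interact, in plain Python. `SketchSerializer` and its `select` method are hypothetical stand-ins for the proposed base class, and records are plain dicts rather than model instances.

```python
class SketchSerializer:
    fields = None          # None means "serialize every field"
    exclude_fields = ()    # names to drop after the selection above

    def select(self, record):
        """Return only the fields this serializer class asks for."""
        names = record.keys() if self.fields is None else self.fields
        return {name: record[name] for name in names if name not in self.exclude_fields}


class PollSerializer(SketchSerializer):
    fields = ("question", "pub_date")


class PollWithoutDateSerializer(SketchSerializer):
    exclude_fields = ("pub_date",)


poll_row = {"id": 1, "question": "What's Up?", "pub_date": "2009-03-01 06:00:00"}
selected = PollSerializer().select(poll_row)
without_date = PollWithoutDateSerializer().select(poll_row)
```

Note the trailing comma in `("pub_date",)`: without it Python treats the value as a plain string, not a one-item tuple.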
To solve the ticket #5711, I propose a method extra_fields() which returns a dictionary. It must return dictionary instead of a tuple because most of the times the extra fields are computed/derived fields. Example below:
One can also specify how a Primary Key is serialized with the method pk_serialize(), which returns a dictionary. This should address the ticket #102. Example below:
The dictionary can contain any number of items, but the stress is for the use of *pk_value* at least once to serialize the PK value somewhere. I am still unsure, if I should make this a method or an attribute. Can some one kindly give suggestions?
The serialized output after overriding the pk_serialize() method looks something like this:

    {
        "pk": 1,
        "pk name": "id",
        "model": "testapp.poll2",
        "fields": {
            "pub_date": "2009-03-01 06:00:00",
            "question": "What's Up?"
        }
    }
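One possible shape for pk_serialize() and extra_fields(), sketched in plain Python. The method signatures (in particular the `pk_name`, `pk_value` and `record` arguments), the base class `SketchSerializer` and the derived `question_length` field are all assumptions for this illustration; the proposal itself leaves them open.

```python
class SketchSerializer:
    def pk_serialize(self, pk_name, pk_value):
        """Default: the existing format, which does not name the PK."""
        return {"pk": pk_value}

    def extra_fields(self, record):
        """Default: no computed fields."""
        return {}

    def serialize(self, model, record, pk_name="id"):
        fields = {k: v for k, v in record.items() if k != pk_name}
        fields.update(self.extra_fields(record))
        out = self.pk_serialize(pk_name, record[pk_name])
        out["model"] = model
        out["fields"] = fields
        return out


class PollSerializer(SketchSerializer):
    def pk_serialize(self, pk_name, pk_value):
        # pk_value must appear at least once so the PK is not lost.
        return {"pk": pk_value, "pk name": pk_name}

    def extra_fields(self, record):
        # A derived value, which is why a dictionary is returned rather than a tuple.
        return {"question_length": len(record["question"])}


row = {"id": 1, "question": "What's Up?", "pub_date": "2009-03-01 06:00:00"}
default_output = SketchSerializer().serialize("testapp.poll2", row)
custom_output = PollSerializer().serialize("testapp.poll2", row)
```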
An additional model_extras() method can be overridden, which by default returns nothing in the Parent classes. But in the over-ridden method of the derived class this can return a dictionary of values which are added to the Model's serialized data. An example of this can be version number of the serialized format. API example:
    class PollSerializer(newserializer.JSONSerializer):
        #...
        def model_extras(self):
            return {'version': '2.1'}
Finally coming to the big thing, Ticket #4656, I propose 3 Class attributes for this. First one being select_related (as per your suggestion) which is a dictionary. The key of the dictionary being the name of the Relation Attribute and the value is a dictionary. This dictionary can have keys - 'fields' or 'excludefields', whose values are tuples of strings, which indicate the name of the fields in that model to be selected or excluded. If this dictionary is empty, it serializes the entire model, by using its Serialization class similar to this one, if at all defined or using the existing serializers.
Example: class ChoiceSerializer(newserializer.JSONSerializer): #... select_related = {'poll': {'fields': ('question')}}
NOTE: I am not very sure if I can implement this within the SoC timeline, but I will include it in the API proposal; if I run out of time I will continue with it after GSoC. If time permits, well and good, I will implement this too. The value of the 'fields' key in the above dictionary is a tuple of strings, which means I cannot follow a relation on that model. So I also wish to allow dictionaries in this tuple along with the strings. Each such dictionary is again a select_related-style nested dictionary which can follow a relation within that relation, and so on. For the Book, Author, City example you gave, it can look like this:

    class BookSerializer(newserializer.JSONSerializer):
        #...
        select_related = {
            'author': {
                'fields': ('name', 'age', {
                    'city': {
                        'fields': ('cityname', ...)
                    }
                })
            }
        }

*END NOTE*
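To make the nesting concrete, here is a minimal sketch of how such a select_related dictionary could be walked recursively. It runs on plain objects instead of models, handles only the 'fields' key (not 'excludefields' or the empty-dictionary case), and the helper name serialize_related and the sample data are assumptions for illustration.

```python
from types import SimpleNamespace


def serialize_related(obj, spec):
    """Serialize obj according to a {'fields': (...)} spec.

    Strings name plain attributes; a dictionary inside the tuple names
    a relation to follow, together with its own nested spec.
    """
    out = {}
    for item in spec["fields"]:
        if isinstance(item, dict):
            for relation, nested_spec in item.items():
                related_obj = getattr(obj, relation)
                out[relation] = serialize_related(related_obj, nested_spec)
        else:
            out[item] = getattr(obj, item)
    return out


# Plain objects standing in for Book, Author and City instances.
city = SimpleNamespace(cityname="Bangalore")
author = SimpleNamespace(name="A. Writer", age=40, city=city)
book = SimpleNamespace(title="Some Book", author=author)

select_related = {
    "author": {"fields": ("name", "age", {"city": {"fields": ("cityname",)}})}
}

result = {
    relation: serialize_related(getattr(book, relation), spec)
    for relation, spec in select_related.items()
}
print(result)
```

The recursion depth is simply the depth of the dictionary, so no separate depth parameter is needed in this form.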
The rest of what follows is within the SoC timeline. The second of the 3 attributes is inline_related, which can be set to True. In the parent class it is False. If it is set to True, the Serializer will serialize the select_related relations inline.
The third attribute is reverse_related. It is again a dictionary, similar in structure to the select_related dictionary, with the keys being the names of the models that relate to this model. For example:
The user registers this PollSerializer class with our serializer framework, similar to ModelAdmin, as: serializer.register.model(Poll, PollSerializer)
Now a question arises: what if the user wants to change only the serialization format, i.e. the notation, and nothing else in the entire app? Should he do the donkey work of copy-pasting list_separator and dict_separator into every class? I feel he need not. For that I propose the following: define a Serializer class, say AppnameSerializer, with whatever app-specific customization he wants (provided by the API), and then call serializer.register.app(AppName, AppnameSerializer).
This can be extended to multiple apps too. If he wants to customize a set of apps, he can say: serializer.register.app(AppSetSerializer, multiple_apps=(App1Name, App2Name, ...)).
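A small sketch of what such a registry could look like. The Registry class, its lookup() method and the rule that a model-level registration wins over an app-level one are assumptions, not part of the proposal; for brevity app() here accepts either a single app name or a tuple of names as its first argument.

```python
class Registry:
    """Hypothetical registry; names mirror the proposal, behaviour is assumed."""

    def __init__(self):
        self._by_model = {}
        self._by_app = {}

    def model(self, model, serializer_class):
        self._by_model[model] = serializer_class

    def app(self, apps, serializer_class):
        # Accept one app name or a tuple of app names.
        if isinstance(apps, str):
            apps = (apps,)
        for app_name in apps:
            self._by_app[app_name] = serializer_class

    def lookup(self, model, app_name, default=None):
        # Assumption: the most specific registration wins.
        if model in self._by_model:
            return self._by_model[model]
        return self._by_app.get(app_name, default)


# Empty placeholder classes standing in for models and serializers.
class Poll: pass
class Choice: pass
class PollSerializer: pass
class PollsAppSerializer: pass


register = Registry()
register.app(("polls", "surveys"), PollsAppSerializer)
register.model(Poll, PollSerializer)

print(register.lookup(Poll, "polls").__name__)
print(register.lookup(Choice, "polls").__name__)
```

With this precedence the user customizes the notation once per app and still overrides individual models where needed.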
On Sat, Mar 28, 2009 at 12:17 PM, Russell Keith-Magee <
> On Fri, Mar 27, 2009 at 1:48 AM, Madhusudan C.S <madhusuda...@gmail.com> wrote:
> > Hi all,
> > *Note: *
> > Django doesn't serialize inherited Model fields in the Child Model. I asked
> > on IRC why this decision was taken but got no response. I searched the
Hello all, I would also like to mention again that I am madrazr on #django-dev. Whenever I have tried to ask something there, I haven't got any response so far. I am not complaining; I understand it is mainly because of timezone differences. I just want to let anyone who wants to tell me anything about my proposal directly to my face ;-) know that I am available for that :D I will be around whenever I am logged into the channel.
After some more thinking, I have reworked my proposal and come up with the following idea. Here is my draft proposal. I have also submitted it to socghop.appspot.com
Let us consider the following two models for discussion throughout:

    class Poll(models.Model):
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')
        creator = models.CharField(max_length=200)
        valid_for = models.IntegerField(max_length=200)
This project begins by providing ModelAdmin- and Feeds-framework-like APIs for Serializers, where the user will be able to construct a class specifying a custom serialization format. I propose the API based on the following ideas.
The user will first define a class inherited from the Serializer framework. The parent class is a generic base Serializer class. The user-defined class is then passed as a parameter to the serialize method we call when we want to serialize the models. Within this class the user will be able to specify the customized serialization format in which he desires the output. Since Python has three main built-in container types (lists, tuples and dictionaries), this format can contain any of these data structures nested in any order. Examples:
Example 1:

    class PollSerializer(Serializer):
        custom_format = [("question", "valid_for", "id")]

The output in this case will be a list of tuples containing the values of the question, valid_for and id fields. Here the strings are the names of the fields in the model.

OR Example 2:

    class PollSerializer2(Serializer):
        custom_format = (["question", {
            "valid_for_number_of_days": "valid_for",
            "Poll ID": "id"
        }],)

The output in this case will be a tuple of lists, each containing the value of question and a dictionary which has the valid_for and id values as its values and their descriptions as its keys.
The implementation, although not trivial, will work as follows. (This is not final. The final implementation will be worked out by discussing with the community.)

- The custom_format will be checked for its type. The top-level structure will be decided from this type: "{}" if dictionary, "()" if tuple and "[]" if list. In the case of XML, the root tag will be django-objects. Its children will have the tag name "object" and include model="Model Name" in the tag. Up to here this is the same as the existing XML Serializer.
- Next, the type of the only item within the top-level structure is determined. All the Django objects serialized will be of this type. In the case of XML, the children of the "object" tag will be tags named "field". These tags will also have name="fieldname" and type="FieldType" attributes. Additionally, if these field tags are items of a dictionary, they will have a description="dictionary_key" attribute in the field tag.
- Then each item within the inner object ("question", "valid_for" and "id" in the first example) is checked for its type, and the serialized output will have the corresponding type. This is implemented recursively from this level. In the case of XML, however, the name of the tag for further levels of grouping will have to be chosen in some consistent way. My suggestion for now is to name the tags "field1" for the third level in the original custom format structure, "field2" for the fourth level, and so on.
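The steps above can be sketched for the Python-structure case as follows. This is only an illustration under assumptions: render() and serialize() are hypothetical helper names, the objects are plain stand-ins for Poll instances, only list and tuple top-level containers are handled, and the XML naming scheme is not covered.

```python
from types import SimpleNamespace


def render(fmt, obj):
    """Recursively build a value with the same shape as fmt."""
    if isinstance(fmt, dict):
        # Keys are descriptions, values are nested format items.
        return {key: render(value, obj) for key, value in fmt.items()}
    if isinstance(fmt, list):
        return [render(item, obj) for item in fmt]
    if isinstance(fmt, tuple):
        return tuple(render(item, obj) for item in fmt)
    # A leaf string names a field on the object.
    return getattr(obj, fmt)


def serialize(custom_format, objects):
    # The top-level container holds exactly one item: the per-object template.
    (template,) = custom_format
    # The output uses the same container type as the top level of the format.
    return type(custom_format)(render(template, obj) for obj in objects)


polls = [
    SimpleNamespace(id=1, question="What's Up?", valid_for=7),
    SimpleNamespace(id=2, question="Favourite colour?", valid_for=3),
]

as_list_of_tuples = serialize([("question", "valid_for", "id")], polls)
as_tuple_of_lists = serialize(
    (["question", {"valid_for_number_of_days": "valid_for", "Poll ID": "id"}],),
    polls,
)
print(as_list_of_tuples)
print(as_tuple_of_lists)
```

A JSON or XML backend would then only need to emit this intermediate structure in its own notation.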
For the second example above, we call the serializer as follows:

    serializer.serialize("json", Poll.objects.all(), custom_serializer=PollSerializer2)
When a user wants to include extra fields in the serialized data, such as additional non-model fields or computed fields, he needs to give the method in the class that returns the value of this field as the value of that item in his format. It should be the method itself, not a string, so that we can check whether the item is callable and, if so, call it and use the return value for serialization. The method (till_date below) has to be defined in the class before custom_format refers to it. For example:

Example 3:

    class PollSerializer(Serializer):
        custom_format = [("question", "valid_for", till_date)]

An important thing to note here is that whenever a string passed as an item anywhere in the custom_format doesn't match any field in the model, it is serialized as that same string in the final output, thereby allowing the addition of non-model static data, such as the version number of the format, among other things.
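The three kinds of leaf items (callable, field name, static string) could be resolved roughly as below. This is a sketch under assumptions: resolve_leaf() is a hypothetical helper, till_date is written as a plain function that receives the object being serialized, and the set of field names is passed in explicitly instead of being read from the model.

```python
import datetime
from types import SimpleNamespace


def resolve_leaf(item, obj, field_names):
    """Resolve one leaf of a custom_format for a single object."""
    if callable(item):
        # Computed field: call it with the object being serialized.
        return item(obj)
    if item in field_names:
        # Ordinary model field: read its value from the object.
        return getattr(obj, item)
    # Not a field name: emit the string itself as static data.
    return item


def till_date(poll):
    # Hypothetical computed field: last day on which the poll is valid.
    last_day = poll.pub_date + datetime.timedelta(days=poll.valid_for)
    return last_day.isoformat()


poll = SimpleNamespace(
    question="What's Up?", valid_for=7, pub_date=datetime.date(2009, 3, 1)
)
field_names = {"question", "valid_for", "pub_date"}
template = ("question", "valid_for", till_date, "format v2.1")

row = tuple(resolve_leaf(item, poll, field_names) for item in template)
print(row)
```

One consequence of the static-string rule is that a misspelled field name is silently emitted as text, so a strict mode that rejects unknown names might be worth discussing.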
Another point to note here is that the strings specified in the custom format can also name fields from the parent models, thereby allowing even parent model fields to be serialized.
The user will be clearly informed in the docs that he cannot pass arbitrary Django objects when calling the serialize() method with the custom_format parameter, but only objects of the type for which the custom_format is defined using the ModelSerializer class. If he does so, it will be flagged as an error.
Last but not least, a select_related parameter will be added to the serialize method; setting it to True will automatically serialize all the models related to this model. Serializing the related models facilitates the reconstruction of the database tables for the given model in case any constraints exist. The related models will be serialized in a default format.
Further, if the user knows which models might be selected when select_related is True, he can provide the parameter as below:
While serializing the related models, the serializer checks whether related_custom_serializers has an item for the selected model and, if so, serializes it in that format. Example:

    serializer.serialize("json", Poll.objects.all(),
                         custom_serializer=PollSerializer2,
                         select_related=True,
                         related_custom_serializers={
                             "Model1": Model1Serializer,
                             "Model2": Model2Serializer
                         })
(I am very skeptical about the use cases for the above feature, since select_related is usually needed for round trips and rarely needed for external applications. Nevertheless I propose it here, pending further discussion.)
NOTE: I must also admit that I am following the other proposal on the same idea. I felt there was no point in hiding it. But providing a custom format was my idea too; I feel I had started with it in my previous proposal itself. I had a very similar idea in mind when I used list and dict separators, but got it wrong. After thinking for a day or so about the weaknesses you pointed out, I came up with the same idea, but was unfortunately late in sending it, since, as you know, I had already got it wrong 2 times :( I wanted to say something sensible the 3rd time and was preparing a more comprehensive solution. I hope it answers almost all the questions you gave as a braindump on the other proposal.