Google SoC: Introduction and Ideas

Jason Ledbetter

Mar 18, 2008, 1:15:49 PM
to Django developers
Good morning!

My name is Jason Ledbetter and I'm a student hoping to participate in
Summer of Code 2008 through the Django team. I thought I should post
here first to introduce myself and throw out some of the ideas I have
baking to make sure I'm not a) replicating someone else's work, b)
completely on the wrong track or c) missing something obvious(ly bad)
about my design approach.

Just a smidge about me: I've been using Django for around 2 years to
design and maintain CMS tools for my company, Commonwealth
Underwriters. At 28 years old, I'm a little old to be a college senior
just getting my bachelor's degree. I took the long way. ;)

Django appeals to me on a few levels: it's in python, which is the
only language other than C or Erlang that I enjoy using. It's designed
by journalists; I became a programmer after I dropped out of Ball
State's journalism school. There's an obvious obsession with doing
things the Right Way, even if that means scrapping a bunch of code
that's shown itself to be the Wrong Way.

On to the actual content! My first impulse for a project is to add a
series of tools, each somewhat small in scope, that could smooth out
certain aspects of development. I figure if I tackle four month-long
projects in somewhat diverse areas, I could learn more about the code
base and gain a broader range of experience.

(Don't let the style of any of my sample code scare you; when writing
project code I'll have a list of the Django style guidelines next to
my keyboard.)

Here's what I have so far:


1. Batch Test Data Creation

Unless I'm missing something, there's no clean/easy *pythonic* way to
generate sets of test data for a given application. A large part of
what I do at work involves models with a lot of little details, a lot
of models strung together in complex ways, and a need to scale when
confronted with large data sets and lots of connections.

Between testing the models themselves and testing the viability of
certain reporting methods, it's important to have a sizable data set
to get various levels of real-world performance metrics.

Granted, there are ways to generate the data either by hand-writing
python scripts or by taking it down to the database level. But part of
django's philosophy (as I see it) is to make simple, monotonous things
easy to write, read and execute.

On my CMS I store report definitions as django models. I wrote a hard-
coded library to help me generate a given report in python code, so
that I can quickly create a given base set of reports on a new
installation. Using it looks like this:

http://python.pastebin.com/m2c9e1f17

The way this works is relatively simple: the outer "Report" is
actually a function, and everything else is an atom defined in my
easydefine.py. I'm not hung up on this particular syntax per se, just
on the concept.

My code automatically links each model to the next one up on the tree
(e.g., each page is linked to the report). It also lazily links the
many-to-many objects together through refhandle and bound_page atoms.

I want to take this library and make it generic. Using introspection
(_meta) info, any given model could be registered with an 'easy
creator' system which could create the necessary atoms automatically.
Or batches could be implemented as a custom manager or part of the
default manager. Here's roughly what I'm envisioning:

http://python.pastebin.com/m365c09de

(I just noticed one of my sample fields conflicts with python reserved
words; ignore plz)

When you execute batch.run(), this would create 15 policies with a
random insured name pulled from an included dictionary of test human
names. Premium and class are randomly generated integers. Coverage
names are created by randomly pairing words from different lists; I'd
include a utility object to easily create new sets of words to string
together which can be passed to the batch creator. The claim's loss
amount is set to the same amount every time; each setting atom will
accept standard python data objects.

Each policy would have one coverage which is automatically linked to
that policy; the batch creator will look for foreign keys which point
to the model one step up on the tree. The 5 claims made for each
coverage will also automatically link upward. In the case of multiple
keys to the surrounding object type, the batch creator will raise a
warning.

That's my idea in rough. As I mentioned earlier, I'm not hell-bent on
that particular syntax. I think the utility would be very useful,
however.

I'd make it easy to create other random dicts or other content
generators to pass to a given attribute atom. Say, if someone wanted to
create a LoremIpsum() generator to fill a char field with lorem ipsum
sample text.
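
To make the batch idea above a little more concrete, here is a minimal
sketch in plain Python rather than Django. Every name in it (Batch,
random_name, paired_words, the word lists) is hypothetical and is not the
code behind the pastebin links; it only illustrates nested batches whose
children link upward to the enclosing record, with each attribute given
either a fixed value or a generator.

```python
import random
from dataclasses import dataclass, field

# Hypothetical word lists standing in for the bundled test dictionaries.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia"]
ADJECTIVES = ["General", "Extended", "Basic"]
NOUNS = ["Liability", "Property", "Casualty"]


def random_name(rng):
    """Pick a random test name for an insured person."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def paired_words(rng, *word_lists):
    """Build a label by picking one word from each list."""
    return " ".join(rng.choice(words) for words in word_lists)


@dataclass
class Batch:
    """Describes how to build `count` records of one kind, plus child batches."""
    kind: str
    count: int
    fields: dict  # attribute name -> constant value or callable(rng)
    children: list = field(default_factory=list)

    def run(self, rng, parent=None):
        created = []
        for _ in range(self.count):
            record = {"kind": self.kind, "parent": parent}
            for name, spec in self.fields.items():
                # Callables act as generators; anything else is a fixed value.
                record[name] = spec(rng) if callable(spec) else spec
            created.append(record)
            for child in self.children:
                # Children link "upward" to the record that encloses them.
                created.extend(child.run(rng, parent=record))
        return created


batch = Batch(
    kind="policy",
    count=15,
    fields={
        "insured": random_name,
        "premium": lambda rng: rng.randint(1000, 100000),
    },
    children=[
        Batch(
            kind="coverage",
            count=1,
            fields={"name": lambda rng: paired_words(rng, ADJECTIVES, NOUNS)},
            children=[Batch(kind="claim", count=5, fields={"loss_amount": 2500})],
        )
    ],
)

# A seeded generator keeps the test data reproducible between runs.
records = batch.run(random.Random(42))
```

A real version would presumably build model instances and resolve the
foreign key from _meta instead of storing a "parent" entry, but the
nesting and the upward linking would follow the same shape.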


2. Type Coercion for Fields

Right now fields aren't enforced or coerced at all except to
(de)serialize for database communication or if a validation check is
run. I've yet to find a consistently implemented way to make sure the
value I pull from a given object is in the form that the given field
expects.

For example, using our above models, if I have a policy and I set its
premium to the string '100000', then it'll save fine. But if, while
iterating through its fields, I run
field.to_python(field.value_from_object(policy)), it doesn't return
int('100000'); it just returns '100000'. Granted, for complex types
like Decimal the to_python method works fine.

The reason I'm asking is, I don't understand why all fields don't
enforce such coercion. It'd be useful for certain tools, such as (for
example) a changelog that needs to compare the current values assigned
to an object instance versus the last known good values of said object
instance. If you don't make sure that the two values are of the same
basic type, the comparison won't be valid.
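
As a rough illustration of why the comparison needs coercion first, here
is a small sketch in plain Python. The COERCERS mapping and the coerce()
and changed() helpers are hypothetical names made up for this example;
they are not Django APIs.

```python
from decimal import Decimal

# Hypothetical mapping from a field's declared type name to a coercion function.
COERCERS = {
    "integer": int,
    "decimal": Decimal,
    "char": str,
}


def coerce(field_type, value):
    """Return `value` converted to the Python type that `field_type` expects."""
    if value is None:
        return None
    return COERCERS[field_type](value)


def changed(field_type, old, new):
    """Compare two values only after coercing both to the field's type."""
    return coerce(field_type, old) != coerce(field_type, new)


# Without coercion the string and the integer look like different values.
naive_changed = "100000" != 100000
# With coercion they compare equal, so no change would be recorded.
coerced_changed = changed("integer", "100000", 100000)
```

Here naive_changed is True while coerced_changed is False, which is the
difference a changelog would care about.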

I'd be lying if I claimed I knew all the issues involved off-hand. So
on this one, I'm looking for more information, and to find out about
any existing work that would get us from where I think I am to where
I'd like to be.

Which brings me to my next idea.


3. The Union Field

Referenced code: http://python.pastebin.com/m606219e5

In a few situations (like the above change log) I've known that I want
to store a given value but it's impossible to determine until run time
what sort of value I want to store. If we have a changelog object like
the first model then we have to coerce to and from a string value in
order to do the proper comparison. Which isn't terrible, but it tends
to fill code with a lot of ugly double-checking, which is boilerplate,
which violates DRY.

What if instead we could use the second model's method? This would
work a lot like generic relations. Now when each change object is
saved, the value types can be set to a field type on the fly.
old_value and new_value could be properties which handle the coercion
to and from string values to python types automatically. It could also
be a property that automatically contributes the correct field type to
the object upon instancing.

One thing I haven't decided: if it's implemented as a property,
setting the value to a given python object type could automatically
set the value_type field. So if you run "ch.old_value =
datetime.date(2006, 1, 1)" then it'd automatically set the value_type
to be DateField.


I have other ideas that I'd like to run by you guys but I'll stop here
for now; I don't want my first post to be a novella. In general, I
thought that four fairly hefty and useful development tools written
one per month could make for a pretty successful Summer of Code. This
is my idea for two such tools.

Thanks in advance for your feedback. If I'm totally on the wrong
track, slap me onto the right one!


- Jason Ledbetter

Russell Keith-Magee

Mar 19, 2008, 10:00:19 PM
to django-d...@googlegroups.com
On Wed, Mar 19, 2008 at 2:15 AM, Jason Ledbetter
<sarcast...@gmail.com> wrote:
>
> Good morning!
>
> My name is Jason Ledbetter and I'm a student hoping to participate in
> Summer of Code 2008 through the Django team.

Hi Jason,

Apologies for taking so long to respond. You have some good ideas
here. A few comments:

> On to the actual content! My first impulse for a project is to add a
> series of tools, each somewhat small in scope, that could smooth out
> certain aspects of development. I figure if I tackle four month-long
> projects in somewhat diverse areas, I could learn more about the code
> base and gain a broader range of experience.

While I understand what you're trying to do here, I'm not sure it is
the best strategy for getting accepted by GSOC as a student. The
banner title for your project will effectively be "Lots of Django
Stuff" - it's a little difficult to state in one line what it is you
are trying to achieve.

I suspect you may get more traction if you pick several small projects
in a similar area (for example, a few small test system improvements),
or pick a single problem that spans multiple parts of the Django core
(no immediate suggestions here - unicode conversion would have been a
good suggestion, but it's been done already :-) ). This makes your end
goal more obvious, and makes it easier to rally support for your
application.

> 1. Batch Test Data Creation

From your initial description, this (or something like it) sounds like
it would be a useful contribution. It is somewhat related to:

http://code.djangoproject.com/ticket/5419

which has already been accepted, but no work is underway to the best
of my knowledge.

If you're looking for some other test-related suggestions, try:
http://code.djangoproject.com/query?status=new&status=assigned&status=reopened&component=Unit+test+system&order=priority


> 2. Type Coercion for Fields
>

> I'd be lying if I claimed I knew all the issues involved off-hand. So
> on this one, I'm looking for more information and to find out any work
> to get us from where I think I am to where I'd like to be.

I think what you are referring to here is what the Django community
generally calls "model validation". Model validation is a work in
progress - I believe Jacob has some prototype code. I have heard
noises that it might be a pre-v1.0 feature, but I will stand corrected
on this.

> 3. The Union Field

I'm not sure I see the value in this. It looks like what you're aiming
at is the equivalent of a C void *, or a CORBA Any type. Storage types
like this are a delightful way to shoot yourself in the foot, and I
can't say I consider it a major design flaw in Django that we don't
support them. In this case, I suspect that your original solution
(nullable columns, and an accessor property) is the right solution.

Anyway - best of luck with your GSOC application.

Yours,
Russ Magee %-)

alex....@gmail.com

Mar 19, 2008, 10:49:57 PM
to Django developers
Model validation is listed on the Version One Features page. It was
being discussed by Jacob, Adrian, and others at the sprint on Monday,
and they definitely have an "architecture".

On Mar 19, 9:00 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:
> On Wed, Mar 19, 2008 at 2:15 AM, Jason Ledbetter
>
> If you're looking for some other test-related suggestions, try: http://code.djangoproject.com/query?status=new&status=assigned&status...

Jason Ledbetter

Mar 20, 2008, 2:32:50 PM
to Django developers
I appreciate the input!

> While I understand what you're trying to do here, I'm not sure it is
> the best strategy for getting accepted by GSOC as a student. The
> banner title for your project will effectively be "Lots of Django
> Stuff" - it's a little difficult to state in one line what it is you
> are trying to achieve.

Very good point. I'm going to refocus specifically on methods of
testing models. Being able to create complex, linked, arbitrarily
large data sets could be combined with tests to throw random,
sometimes incorrect data at fields. A mix of the library I'm talking
about and some automatically generated unit tests could handle a large
part of such a system.

I could also flesh out a large set of automatic content creators, such
as the aforementioned "Lorem Ipsum" generator, to give more heft to
the library.

I'll brainstorm some ideas and wander around the ticket system a bit.

> If you're looking for some other test-related suggestions, try: http://code.djangoproject.com/query?status=new&status=assigned&status...

Thanks a ton, this is exactly the sort of information I needed since I
don't have my finger on the pulse of django's development yet.

> > 3. The Union Field
>
> Storage types
> like this are a delightful way to shoot yourself in the foot, and I
> can't say I consider it a major design flaw in Django that we don't
> support them.

I see your point. However, my idea isn't to recreate C's void pointers
(god no, I don't miss segfaults) but rather to make such a field that
*wouldn't* shoot its owner in the foot.

Once explicitly set to a certain data type, the field would behave
only and exactly like that data type, so that setting the field to
"Integer" for a given model instance would make the union field
perfectly mimic an IntegerField; only the invisible method of storage
on the DB back end would be different. In other words, even as a
native Django field, my idea of a UnionField would store itself in two
columns in the database, one holding an explicit value type and one
that holds the value itself.

In other words, my *intended* sacrifice wouldn't be less safety for
more flexibility (like void pointers), but rather less speed and more
storage size for more flexibility.

Even the admittedly wild idea of letting the data type change on the
fly (which would be a field argument, e.g., ducktype=True) would use
python's type checking to enforce cleanliness. The field would only
store a piece of data if it recognized the given data's type/class
("hey, this is an int/str/Decimal!") or if the data's class provided a
(unionize?) method. Otherwise the field would raise a ValueError
exception.
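
Something along these lines is what the two-column idea could look like,
sketched in plain Python with no Django involved. The UnionValue class
and the TYPES registry are hypothetical names for illustration only, and
the sketch models the ducktype=True variant, where assigning a value of a
recognized type also sets value_type.

```python
import datetime
from decimal import Decimal

# Hypothetical registry: type name -> (Python type, to-string, from-string).
TYPES = {
    "integer": (int, str, int),
    "decimal": (Decimal, str, Decimal),
    "char": (str, str, str),
    "date": (datetime.date, lambda d: d.isoformat(), datetime.date.fromisoformat),
}


class UnionValue:
    """Holds one value as two 'columns': a type name and a string payload."""

    def __init__(self):
        self.value_type = None  # would be stored in the first column
        self.raw = None         # would be stored in the second column

    @property
    def value(self):
        if self.value_type is None:
            return None
        _, _, from_string = TYPES[self.value_type]
        return from_string(self.raw)

    @value.setter
    def value(self, obj):
        for name, (py_type, to_string, _) in TYPES.items():
            # Exact type match, so a bool is not silently stored as an integer.
            if type(obj) is py_type:
                self.value_type = name
                self.raw = to_string(obj)
                return
        # Unrecognized types are rejected instead of being stored blindly.
        raise ValueError(f"unsupported type: {type(obj).__name__}")


ch = UnionValue()
ch.value = datetime.date(2006, 1, 1)
# ch.value_type is now "date" and ch.raw is "2006-01-01"
```

Reading ch.value back returns a datetime.date again, and assigning
something outside the registry (a list, say) raises ValueError, which is
the safety property I'm after.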

However, since this doesn't directly correlate to my new focus on
model testing, I'll probably just throw something together outside of
SoC and if it seems useful, I'll submit it through the usual channels.
I'm fairly sure I can make a safe, well-designed union field that
balances flexibility and safety; what I'm *not* sure of is if anyone
but me would ever need to use it.

Regardless, further commentary on the idea is useful since I'll be
taking a whack at it, SoC or not. :)

> Anyway - best of luck with your GSOC application.

Thanks, you've been very helpful!


-Jason L.

Jason Ledbetter

Mar 20, 2008, 2:35:50 PM
to Django developers
On Mar 19, 10:49 pm, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:
> Model validation is listed on the Version One Features page. It was
> being discussed by Jacob, Adrian, and others at the sprint on Monday,
> and they definitely have an "architecture".

Thanks! I'll read what they have online so far. If they're actively
developing this then I'll leave it to the masters. If they're not,
then this could easily fit into my "testing models for integrity"
themed SoC project.

-Jason L.

Russell Keith-Magee

Mar 20, 2008, 11:11:19 PM
to django-d...@googlegroups.com
On Fri, Mar 21, 2008 at 3:32 AM, Jason Ledbetter
<sarcast...@gmail.com> wrote:
>
> > While I understand what you're trying to do here, I'm not sure it is
> > the best strategy for getting accepted by GSOC as a student. The
> > banner title for your project will effectively be "Lots of Django
> > Stuff" - it's a little difficult to state in one line what it is you
> > are trying to achieve.
>
> Very good point. I'm going to refocus specifically on methods of
> testing models. Being able to create complex, linked, arbitrarily
> large data sets could be combined with tests to throw random,
> sometimes incorrect data at fields. A mix of the library I'm talking
> about and some automatically generated unit tests could handle a large
> part of such a system.

Sounds like a good focus.

> I could also flesh out a large set of automatic content creators, such
> as the aforementioned "Lorem Ipsum" generator, to give more heft to
> the library.

Just to help out - Django already has the core of a Lorem Ipsum
algorithm as part of the contrib.webdesign application. It isn't
currently linked to testing, but the algorithm for producing dummy
content is already in place.

Yours,
Russ Magee %-)
