GSOC 2015: Improving the less popular database backends


Yichun Duan

Mar 9, 2015, 7:17:24 AM
to django-d...@googlegroups.com
Hi,

I'm Yichun, and I'm interested in the project 'Improving the less popular database backends' for GSoC 2015. I major in computer science at Peking University. I've worked with C++, Java, and Python, and I've written several web apps using Django.

I have some questions about this project, and your answers would be a great help.

(1) I'd like to make sure that I've understood this project correctly. For example, if I choose to improve the Oracle backend, can I pick some related issues from Trac's list of Oracle issues <https://code.djangoproject.com/query?status=assigned&status=new&summary=~oracle&or&status=assigned&status=new&keywords=~oracle&col=id&col=summary&col=status&col=owner&col=type&col=component&order=priority> and plan to work on them for the whole term? I haven't seen a mentor listed for this project in the GSoC guide, so I hope someone here can help me.
(2) I'm familiar with MySQL and SQL Server, but I have never used Oracle. Will that be a big obstacle if I choose to work on the Oracle backend? Or will it be fine if I start studying Oracle right now?
(3) I'm a new contributor to Django. Before I start working on my project, should I start by fixing simple bugs, or with something else?

Thanks,
Yichun

Russell Keith-Magee

Mar 9, 2015, 8:10:49 PM
to Django Developers
Hi Yichun,

On Mon, Mar 9, 2015 at 4:13 PM, Yichun Duan <yichu...@gmail.com> wrote:
> (1) I'd like to make sure that I've understood this project correctly. For example, if I choose to improve the Oracle backend, can I pick some related issues from Trac's list of Oracle issues and plan to work on them for the whole term? I haven't seen a mentor listed for this project in the GSoC guide, so I hope someone here can help me.

I don't think anyone has specifically stepped forward to mentor an Oracle-based project, but if you put forward a strong proposal, we will find someone to mentor you. 

As for what to tackle for the project itself - one approach for your project would be to just tackle a collection of Oracle-related tickets. A better approach (more likely to be accepted) would be for you to analyse the tickets that have been reported, and see if you can find a common theme (or themes) - and then propose a way that we can eliminate those problems at a higher level.
 
> (2) I'm familiar with MySQL and SQL Server, but I have never used Oracle. Will that be a big obstacle if I choose to work on the Oracle backend? Or will it be fine if I start studying Oracle right now?

A history of Oracle experience would definitely work in your favour, but *not* having that experience won't necessarily work against you. The fact that you've got experience with multiple SQL flavours means you should be aware of some of the interesting ways SQL is interpreted. Highlight that experience in your application, and that should be enough.

Alternatively, you could try to work with the developers of the SQLServer backend and see if you could improve support for that backend.
 
> (3) I'm a new contributor to Django. Before I start working on my project, should I start by fixing simple bugs, or with something else?
 
You don't have to, but it would certainly be looked upon favourably. In particular, if you were to tackle one or two small Oracle bugs, that would be very helpful to prove that you're going to be able to quickly pick up the new skills you require for the GSoC term.

Yours,
Russ Magee %-)

Yichun Duan

Mar 9, 2015, 10:46:28 PM
to django-d...@googlegroups.com
Thanks a lot for your suggestions. I'll try.

On Tuesday, March 10, 2015 at 8:10:49 AM UTC+8, Russell Keith-Magee wrote:

Shai Berger

Mar 10, 2015, 7:39:35 PM
to django-d...@googlegroups.com
One point which may be relatively easy and valuable is to get Django to
support backend-provided tests, and do the final clean-up of backend-specific
tests from the general test suite. That is, put all the postgres-specific tests
somewhere in django.db.backends.postgresql_psycopg2 etc, and have the Django
test runner pick them up from there, only if that backend is actually specified
in settings. There are some open questions about this -- e.g. proper handling
of multiple databases -- but in general, this will both make it easier for all
backends to pass the Django test suite, and make it easier for each backend to
improve its own quality.
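
A minimal sketch of the discovery idea described above, in plain Python with no Django dependency. The `backend_test_labels` helper and the `<engine>.tests` naming convention are assumptions made for illustration only; nothing in this thread settles where such tests would live. It also takes just one possible stance on the multiple-database question: each distinct engine contributes its tests once.

```python
def backend_test_labels(databases):
    """Return one hypothetical test-module label per distinct engine.

    `databases` has the same shape as Django's DATABASES setting:
    a dict mapping an alias to a dict with an "ENGINE" dotted path.
    """
    labels = []
    for alias in sorted(databases):
        engine = databases[alias].get("ENGINE")
        if not engine:
            continue
        # Assumed convention, not an existing Django feature.
        label = engine + ".tests"
        if label not in labels:
            labels.append(label)
    return labels


DATABASES = {
    "default": {"ENGINE": "django.db.backends.postgresql_psycopg2"},
    "replica": {"ENGINE": "django.db.backends.postgresql_psycopg2"},
    "legacy": {"ENGINE": "django.db.backends.oracle"},
}

labels = backend_test_labels(DATABASES)
```

With these settings, `labels` holds one entry for the PostgreSQL backend and one for Oracle; a test runner could then try to import only those modules.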


Yichun Duan

Mar 14, 2015, 9:45:16 PM
to django-d...@googlegroups.com
Thank you. When you say "backend-provided tests", do you mean MySQL scripts, Oracle scripts, or something similar? Or do you mean standard Django tests that are marked to tell the test runner which backend they suit?

On Wednesday, March 11, 2015 at 7:39:35 AM UTC+8, Shai Berger wrote:

Shai Berger

Mar 15, 2015, 7:07:11 AM
to django-d...@googlegroups.com
On Sunday 15 March 2015 03:45:16 Yichun Duan wrote:
> Thank you. When you say "backend-provided tests", do you mean MySQL
> scripts, Oracle scripts or something else like these? Or just some tests
> based on django standard, which have marks to tell test runner the backend
> they suite?
>

I mean standard Django tests, which are located in folders inside the backend
source. That is, Oracle-specific tests in django.db.backends.oracle.tests etc.
Today, the test-runner will not discover such tests and so will not run them
even if the specific backend is in use.

In the current Django test-suite, there are several tests marked to run only
on a specific backend. I'd also like to remove them from there, and put them
into the backends. That's the "clean-up".
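
For readers unfamiliar with how such tests are marked, here is a small sketch of the pattern. It uses only the standard library: the `connection` object below is a stand-in for `django.db.connection` (an assumption so the example runs without Django or a database), and the test names are hypothetical.

```python
import unittest
from types import SimpleNamespace

# Stand-in for `django.db.connection`, so the sketch runs without Django.
# In a real test module the real connection object would be imported.
connection = SimpleNamespace(vendor="postgresql")


class BackendSpecificTests(unittest.TestCase):
    @unittest.skipUnless(connection.vendor == "oracle", "Oracle-specific test")
    def test_oracle_only(self):
        self.assertEqual(connection.vendor, "oracle")

    @unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL-specific test")
    def test_postgresql_only(self):
        self.assertEqual(connection.vendor, "postgresql")


# Run the tests programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BackendSpecificTests)
result = unittest.TestResult()
suite.run(result)
```

Here the Oracle test is reported as skipped and the PostgreSQL test runs. Moving such tests into the backends would make the skip markers unnecessary, since a backend's tests would only be collected when that backend is configured.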

Shai.

Yichun Duan

Mar 15, 2015, 7:09:02 AM
to django-d...@googlegroups.com
Got it. Thanks.

On Sunday, March 15, 2015 at 7:07:11 PM UTC+8, Shai Berger wrote:

Yichun Duan

Mar 15, 2015, 8:03:44 AM
to django-d...@googlegroups.com
Another question: how long do you think this task will take? I can't really estimate it yet.


On Sunday, March 15, 2015 at 7:07:11 PM UTC+8, Shai Berger wrote:

Shai Berger

Mar 16, 2015, 7:22:40 AM
to django-d...@googlegroups.com
I can't answer for sure because there are still open issues there (e.g. what
do you do in the face of multiple databases using different engines). But for
the "clear cut" issues, my estimate is that they shouldn't take more than a
few days.

Shai.

Tim Graham

Mar 16, 2015, 11:04:51 AM
to django-d...@googlegroups.com
I like the idea of establishing a convention for discovering backend-specific tests. As it is now, it seems that third-party backends need to have a customized version of Django's own runtests.py, which duplicates a lot of code. Example for django-mssql:
https://bitbucket.org/Manfre/django-mssql/src/1d8da0ee6dd4c1f1e32d29bf405dd8c7e682fba8/tests/?at=master

I'm not thrilled with the convention of storing them in a subdirectory of the backend itself (easier to keep code and tests separate, I think; also we moved contrib tests out of contrib for similar reasons), but that seems like an implementation detail that could be sorted out.

Yichun, you can grep Django's tests for "connection.vendor" to see the backend-specific tests we have now. There are only about 35 occurrences, so, as Shai alluded to, there wouldn't be a huge amount of code to shuffle.
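
If grep isn't at hand, the same search can be sketched in Python. The `count_occurrences` helper is a hypothetical name, and the temporary directory below only stands in for a Django checkout so the example is self-contained.

```python
import os
import tempfile


def count_occurrences(root, needle="connection.vendor"):
    """Count how often `needle` appears in each .py file under `root`."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                hits = handle.read().count(needle)
            if hits:
                counts[path] = hits
    return counts


# Demo on a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "test_a.py"), "w", encoding="utf-8") as f:
        f.write("if connection.vendor == 'oracle':\n    pass\nprint(connection.vendor)\n")
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("connection.vendor is ignored here\n")
    demo_counts = count_occurrences(tmp)
    demo_total = sum(demo_counts.values())
```

Run from the root of a Django checkout, `count_occurrences("tests")` would list the files that would need attention in the clean-up.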

Yichun Duan

Mar 17, 2015, 8:31:48 AM
to django-d...@googlegroups.com
I've written a draft proposal based on your idea.
I look forward to your feedback. Thank you for your help.

On Monday, March 16, 2015 at 7:22:40 PM UTC+8, Shai Berger wrote:

Yichun Duan

Mar 17, 2015, 8:34:19 AM
to django-d...@googlegroups.com
Thank you for your suggestions. I'll try.
I've also written a draft proposal based on Shai's idea. I'd appreciate it if you could take a look at it.

On Monday, March 16, 2015 at 11:04:51 PM UTC+8, Tim Graham wrote:

Russell Keith-Magee

Mar 17, 2015, 6:39:46 PM
to Django Developers
Hi Yichun Duan,

Thanks for submitting a proposal - what you've submitted is a good start, but it needs some more detail and clarification.

4.1: You say you're going to save the database-specific tests "to another place" - can you give specifics, or at least an idea of your current thinking? How will that chosen location allow external database backends to provide tests?

4.1 (the second one): What backend specific tests are you proposing to write? We've got a fairly well tested code base; if there's a specific area where you think more tests are required, you should be explicit about the nature of those tests.

4.2: This isn't really true. We already have database-specific tests, using the @skipUnless(connection.vendor == '...') decorator on tests. What were you thinking of introducing in this step?

4.3: If I understand this, you're allocating a week to *run* a test suite? This sounds... very well padded. Unless you've got something specific in mind, I would consider "running the test suite" to be just part of normal development, not something that requires a specific time allocation.

4.4: "Solve the database problem". Sorry, but this sounds like an "underpants gnome" answer: Step one - steal the underpants; Step two - ......; Step three - profit! It's not clear from this proposal that you've described a single problem, let alone proposed a solution that can clearly be implemented in 2 weeks.

4.5: What constitutes a "more complex environment"? You either have a MySQL database, or you don't. What "complexity" are you referring to? 

4.6-4.8 are broadly Ok - having a couple of weeks for cleanup and extension tasks at the end of your project is a good idea. However, you've allocated 1/3 of your project time to "cleanup" tasks. That seems a little excessive.

Item 4.4 is the biggest issue in my opinion. It feels like this is the real "meat" in your proposal, but it is not clear to me at all what you're planning to do here. You don't have to provide detailed specifics, and what you propose here may not end up being what is accepted -- but some broad architectural ideas would be very helpful.

It would also be helpful to think about what you can provide as interim deliverables for this project. Ideally, the result of this project wouldn't be "one big PR at the end" - it would be much better to see a number of smaller pull requests. Smaller PRs are more likely to get merged, and reduce the need for any time allocation for merge tasks.

Yours,
Russ Magee %-)


Yichun Duan

Mar 22, 2015, 8:32:46 AM
to django-d...@googlegroups.com
Thanks. Updated. :)

On Wednesday, March 18, 2015 at 6:39:46 AM UTC+8, Russell Keith-Magee wrote:

Russell Keith-Magee

Mar 22, 2015, 7:22:30 PM
to Django Developers
Hi Yichun Duan,

I've taken another look; many of the same comments from last time still apply - especially those related to time estimates. 

When we say we want a detailed plan, we mean *detailed*. *Nobody* - I don't care how long they've been working with software - *Nobody* can estimate "3 weeks" for anything and be remotely accurate. In my experience, the accuracy of any estimate greater than 2-3 days is usually "a complete guess".  Building an estimate at this level of granularity takes time - but it's also part of making sure you actually understand the problem. 

Historically, we've preferred estimates that are mostly at the level of 1 week granularity; 2 week blocks are *occasionally* allowed for a particularly complex piece of work. However, that 2 week block must then be accompanied by a *detailed* description of exactly what is going to be done.

To pick on one particular estimation task as an example: 5.2 "write backend specific tests". Why is this a 2 week task? That's 10 working days, or 80 working hours (assuming a standard work week). Let's say that it takes 10 minutes to write an individual test case - that means you're proposing to add 480 new tests to the Django test suite. Where is the inspiration for these new tests coming from? There are only 195 open tickets on database backends - and you won't be able to use those tickets for inspiration, because those problems haven't been fixed yet. Even if the same test applies to 5 different backends, you're still looking at almost 100 unique tests.
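
The arithmetic behind those figures, written out as a short sketch. The constant names are illustrative; the numbers are the ones assumed in the paragraph above (8-hour days, 10 minutes per test, one test shared across 5 backends).

```python
WORK_DAYS = 10               # two working weeks
HOURS_PER_DAY = 8            # standard work week assumption
MINUTES_PER_TEST = 10        # assumed time to write one test case
BACKENDS_SHARING_A_TEST = 5  # same test applied to 5 backends

hours = WORK_DAYS * HOURS_PER_DAY                # 80 working hours
tests = hours * 60 // MINUTES_PER_TEST           # 480 tests
unique_tests = tests // BACKENDS_SHARING_A_TEST  # 96, i.e. "almost 100"
```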

Looking at that math, your estimate of 2 weeks to "write tests" looks *incredibly* padded. That said, you may have something in mind. You may have ideas about what you're going to test, and why. However, none of this is communicated by your proposal. 

The process of estimation isn't to write down a series of bullet points that adds up to 12 weeks. It's to convince us that you have enough work to fill 12 weeks. And, to be clear - we already believe that there is 12 weeks' worth of work here - the problem is that you need to convince us that *you* know where those 12 weeks of work are.

Yours
Russ Magee %-)



Yichun Duan

Mar 22, 2015, 11:07:18 PM
to django-d...@googlegroups.com
Got it. Thanks.

On Monday, March 23, 2015 at 7:22:30 AM UTC+8, Russell Keith-Magee wrote: