WOULD LIKE TO CONTRIBUTE TO DJANGO

Muhereza Herman

Oct 3, 2017, 10:31:56 AM
to Django developers (Contributions to Django itself)
Hello, anyone reading this, please help me out.
I am a new developer, but I would like to contribute to Django.
Please guide me on how to do that. Thank you.

Дилян Палаузов

Oct 3, 2017, 11:23:28 AM
to django-d...@googlegroups.com, Muhereza Herman
Hello Muhereza,

I assume that by now you understand Django quite well and are willing to give
something "in return".

Currently QuerySet.get_or_create() issues two SQL statements: a SELECT
followed by an optional INSERT. This causes a concurrency problem if another
get_or_create() runs between the two statements.
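
To make the race concrete, here is a minimal sketch in Python. It uses the standard sqlite3 module rather than Django or PostgreSQL, purely so that it runs anywhere; that substitution is an assumption, not how Django does it. The names naive_get_or_create, between_statements and competing_insert are hypothetical. The hook stands in for a second client whose INSERT lands between our SELECT and our INSERT.

```python
import sqlite3


def naive_get_or_create(conn, name, comment, between_statements=None):
    """SELECT first, then INSERT only if nothing was found (two statements)."""
    row = conn.execute(
        "SELECT id, name, comment FROM t WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row, False
    if between_statements is not None:
        # Stand-in for another client acting between the two statements.
        between_statements()
    cur = conn.execute(
        "INSERT INTO t (name, comment) VALUES (?, ?)", (name, comment)
    )
    return (cur.lastrowid, name, comment), True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "id INTEGER PRIMARY KEY, "
    "name VARCHAR(10) NOT NULL UNIQUE, "
    "comment VARCHAR(10))"
)


def competing_insert():
    # The "other" get_or_create() wins the race and inserts the same name.
    conn.execute("INSERT INTO t (name, comment) VALUES ('nameD', 'other')")


# Without interference: created on the first call, found on the second.
first = naive_get_or_create(conn, "nameA", "comment1")
second = naive_get_or_create(conn, "nameA", "comment2")

# With interference: our INSERT violates the UNIQUE constraint on name.
try:
    naive_get_or_create(conn, "nameD", "comment13", competing_insert)
    race_error = None
except sqlite3.IntegrityError as exc:
    race_error = exc
```

The second call returns the existing row with created=False, while the interleaved call ends in an IntegrityError that the caller has to handle.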

With the PostgreSQL backend it is possible to reduce the queries to a
single one.

Consider this table:

CREATE TABLE t (
id SERIAL PRIMARY KEY,
name VARCHAR(10) NOT NULL UNIQUE,
comment VARCHAR(10));

The following query can do what get_or_create() currently achieves:

WITH
maybe_found AS (SELECT * FROM t WHERE t.name='nameD'),
to_be_inserted AS (SELECT 'nameD' as "name", 'comment13' as "comment"),
just_inserted AS (
INSERT INTO t (name, comment) SELECT * FROM to_be_inserted
WHERE NOT EXISTS(SELECT * FROM maybe_found)
RETURNING *)
SELECT *, FALSE as "created" FROM maybe_found UNION ALL
SELECT *, TRUE AS "created" FROM just_inserted LIMIT 2;

where "to_be_inserted" contains the values for the new object (the "defaults"
parameter of get_or_create()) and 'nameD' in maybe_found is the criterion
passed to get().
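
As a rough illustration of that mapping, the sketch below keeps the query text as a Python constant with placeholders instead of literals. It is not executed against a database here. The "%s" placeholder style is the one used by PostgreSQL drivers such as psycopg2, and the names GET_OR_CREATE_SQL and get_or_create_params are hypothetical, chosen only for this example.

```python
# The CTE from the message above, with the three literals replaced by
# placeholders: one for the get() criterion, two for the new object's values.
GET_OR_CREATE_SQL = """
WITH
maybe_found AS (SELECT * FROM t WHERE t.name = %s),
to_be_inserted AS (SELECT %s AS "name", %s AS "comment"),
just_inserted AS (
    INSERT INTO t (name, comment) SELECT * FROM to_be_inserted
    WHERE NOT EXISTS (SELECT * FROM maybe_found)
    RETURNING *)
SELECT *, FALSE AS "created" FROM maybe_found UNION ALL
SELECT *, TRUE AS "created" FROM just_inserted LIMIT 2;
"""


def get_or_create_params(name, defaults):
    """Return the parameter tuple for GET_OR_CREATE_SQL (hypothetical helper).

    `name` is the lookup criterion; it is also the name of the row to insert.
    `defaults` supplies the remaining column values, here only "comment".
    """
    return (name, name, defaults.get("comment"))


params = get_or_create_params("nameD", {"comment": "comment13"})
```

With a PostgreSQL cursor one would presumably pass both to cursor.execute(GET_OR_CREATE_SQL, params) and read the "created" column from the single returned row.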

The challenge is to integrate the WITH ... SELECT query in Django. As
guidance I can only suggest looking at the existing code.

Regards
Дилян


On 10/03/2017 10:24 AM, Muhereza Herman wrote:
> Hello, anyone reading this please help me out.
> am a new developer but i would like to contribute to_*django*_.
> please guide me on how to do that. thank you.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com
> <mailto:django-develop...@googlegroups.com>.
> To post to this group, send email to django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/e6d798ac-4ede-45c4-9f20-ca62f0595131%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/e6d798ac-4ede-45c4-9f20-ca62f0595131%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Aymeric Augustin

Oct 3, 2017, 12:00:44 PM
to django-d...@googlegroups.com
Hello,

Since I haven't seen positive feedback from a committer, I'm not convinced there's consensus about this change.

Also this doesn't look like a particularly easy topic for a beginner. It raises the following questions:

- INSERT ... ON CONFLICT DO UPDATE would likely be a better option on PostgreSQL ≥ 9.5, wouldn't it?
- what about other databases with built-in backends? third-party backends?
- is this implementation really appropriate at any isolation level?
- what are the consequences for performance?

Best regards,

-- 
Aymeric.



Adam Johnson

Oct 3, 2017, 12:43:25 PM
to django-d...@googlegroups.com
Also, you posted this in https://groups.google.com/forum/#!msg/django-developers/r7JP_YHsIhM/jmof68XFBAAJ . Please don't repost; keep the conversation in one place.


--
Adam

Дилян Палаузов

Oct 3, 2017, 3:17:39 PM
to django-d...@googlegroups.com, Aymeric Augustin, Muhereza Herman
Hello,

Qui tacet consentire videtur (he who is silent is taken to agree).

I agree there is still a concurrency problem with the approach I
suggested. INSERT ... ON CONFLICT DO UPDATE might not work if the
simultaneously inserted object has, for some reason, different default
values than the current object, because DO UPDATE would then overwrite
the data. I am not talking about other backends.

Another idea to start with:

1. Add on_conflict=IGNORE parameter to bulk_create() -
https://code.djangoproject.com/ticket/28668

The problem is that, in order to insert a lot of data into a database
quickly, a single INSERT must be sent, but it fails if any of the records
are already in the database. If that can happen, one has to iterate over
the input and do a separate INSERT for each object, which is slower than
a single INSERT.

However, with something like INSERT ... ON CONFLICT DO NOTHING (the syntax
varies between RDBMSs), the database is instructed to absorb all
the data that is not already there.

2. Afterwards, extend bulk_create(..., on_conflict=IGNORE) to detect, for
PostgreSQL, which objects were actually added, as described at
https://groups.google.com/forum/#!topic/django-developers/wdHIYdQHO_0 ,
and return only those.
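
A small sketch of the first step, assuming SQLite through Python's sqlite3 module as a stand-in database (PostgreSQL would spell it INSERT ... ON CONFLICT DO NOTHING). It shows only the database behaviour and not the proposed bulk_create() parameter; the table and values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, comment TEXT)"
)
conn.execute("INSERT INTO t (name, comment) VALUES ('nameA', 'already here')")

rows = [("nameA", "dup"), ("nameB", "new"), ("nameC", "new")]
params = [value for row in rows for value in row]  # flatten for one statement

# A plain multi-row INSERT fails as a whole because 'nameA' already exists.
try:
    conn.execute(
        "INSERT INTO t (name, comment) VALUES (?, ?), (?, ?), (?, ?)", params
    )
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True
count_after_plain = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# SQLite's "ignore conflicts" spelling: conflicting rows are skipped and
# the remaining rows are inserted by the same single statement.
conn.execute(
    "INSERT OR IGNORE INTO t (name, comment) VALUES (?, ?), (?, ?), (?, ?)", params
)
names = [r[0] for r in conn.execute("SELECT name FROM t ORDER BY name")]
kept_comment = conn.execute(
    "SELECT comment FROM t WHERE name = 'nameA'"
).fetchone()[0]
```

The existing 'nameA' row keeps its original comment, which is the difference from a DO UPDATE style upsert.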

Regards
Дилян


Tom Forbes

Oct 3, 2017, 3:36:45 PM
to django-d...@googlegroups.com
Your idea about ignoring conflicts in bulk_create is a great one. I've made a merge request that attempts to implement this. If anyone has time to review it, it would be much appreciated.

I couldn't find a very elegant way of adding it though: SQLite and MySQL need something added just after the INSERT keyword, whereas PostgreSQL needs something added after the values but before the RETURNING clause. Oracle doesn't support this as far as I'm aware.
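
To illustrate why the placement is awkward, here is a rough sketch of a helper that builds the statement per backend. It is not the code from the merge request; insert_ignore_sql and its vendor labels are hypothetical. It uses "%s" placeholders (the sqlite3 module itself expects "?") and interpolates identifiers unquoted, so it is only suitable for trusted names.

```python
def insert_ignore_sql(vendor, table, columns, returning=None):
    """Build a single-row INSERT that skips conflicting rows (hypothetical)."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    values = f"{table} ({cols}) VALUES ({placeholders})"
    if vendor == "sqlite":
        # The modifier goes right after the INSERT keyword.
        return f"INSERT OR IGNORE INTO {values}"
    if vendor == "mysql":
        # Same position, different spelling.
        return f"INSERT IGNORE INTO {values}"
    if vendor == "postgresql":
        # The clause goes after the values but before RETURNING.
        sql = f"INSERT INTO {values} ON CONFLICT DO NOTHING"
        if returning:
            sql += " RETURNING " + ", ".join(returning)
        return sql
    raise NotImplementedError(f"no ignore-conflicts syntax known for {vendor!r}")


sqlite_sql = insert_ignore_sql("sqlite", "t", ["name", "comment"])
mysql_sql = insert_ignore_sql("mysql", "t", ["name", "comment"])
postgres_sql = insert_ignore_sql("postgresql", "t", ["name", "comment"], ["id"])
```

The three results differ in both the keyword and its position, which is why a single hook in the SQL compiler does not cover all backends.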

> Qui tacet consentire videtur.

Silence is rarely agreement on a mailing list, certainly not regarding a proposed new feature or a potentially breaking change. Code speaks louder than words though :)



Brice Parent

Oct 3, 2017, 4:02:37 PM
to django-d...@googlegroups.com

I'm not sure I got this right, but it seems to me that get_or_create() is not the same thing as attempting an insert that may be ignored and then fetching the row, which by then always exists.

If there is no uniqueness constraint on the fields used for the lookup (because of bad design, an uncommon scenario, a database created elsewhere or imported with inspectdb, etc.), get_or_create() does not create a new entry when a match exists, whereas with on_conflict=IGNORE we do (correct me if I'm wrong or if I missed something), because the insertion does not cause any conflict.
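
A minimal sketch of that difference, assuming SQLite via Python's sqlite3 module and a made-up table that deliberately has no UNIQUE constraint on name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Note: name is NOT unique in this table.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL, comment TEXT)"
)
conn.execute("INSERT INTO t (name, comment) VALUES ('nameD', 'first')")


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM t WHERE name = 'nameD'").fetchone()[0]


# get_or_create-style logic: look first, insert only if nothing matches.
found = conn.execute("SELECT id FROM t WHERE name = ?", ("nameD",)).fetchone()
if found is None:
    conn.execute("INSERT INTO t (name, comment) VALUES (?, ?)", ("nameD", "second"))
count_after_lookup = count_rows()

# Ignore-conflicts insert: no constraint is violated, so nothing is ignored
# and a second 'nameD' row is created.
conn.execute(
    "INSERT OR IGNORE INTO t (name, comment) VALUES (?, ?)", ("nameD", "second")
)
count_after_ignore = count_rows()
```

The lookup-based path leaves one row, while the ignore-conflicts insert leaves two, because without a constraint the database has nothing to conflict on.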

Also, with this, would we learn whether the element was inserted or already existed? The return value of the insert seems to provide that information, but I'm not sure.



Дилян Палаузов

Oct 3, 2017, 4:27:36 PM
to django-d...@googlegroups.com, Brice Parent
Hello Brice,

Here is the query I sent today, which improves my first suggestion:

1. WITH
2. maybe_found AS (SELECT * FROM t WHERE t.name='nameD'),
3. to_be_inserted AS (SELECT 'nameD' as "name", 'comment13' as "comment"),
4. just_inserted AS (
5. INSERT INTO t (name, comment) SELECT * FROM to_be_inserted
6. WHERE NOT EXISTS(SELECT * FROM maybe_found)
7. RETURNING *)
8. SELECT *, FALSE as "created" FROM maybe_found UNION ALL
9. SELECT *, TRUE AS "created" FROM just_inserted LIMIT 2;

The INSERT on line 5 is only performed if the data is not already in the
database, as stated on line 6 (WHERE NOT EXISTS).

The "created" column of the result states whether the data was inserted or
was already there before the call.

Regards
Дилян