Proposal for better managed raw SQL migrations

106 views
Skip to first unread message

Petr Přikryl

unread,
Mar 11, 2020, 5:59:45 AM3/11/20
to Django developers (Contributions to Django itself)
I have only proposal for making Django migrations better handling raw SQL migrations. I like the way how django-migrate-sql do that https://github.com/festicket/django-migrate-sql. And it would be interested if Django has this feature build-in.

We use a lot of raw SQL objects in our apps and time to time we are cleaning migrations so storing to SQL itself in migrations isn't right way for us. I like the idea, looking on SQL objects like on models and generate migrations for them automatically based on their definition changes. What do you think?
Message has been deleted

Adam Johnson

unread,
Mar 11, 2020, 6:20:56 AM3/11/20
to django-d...@googlegroups.com
Hi Petr

I too often end up managing some database objects like triggers, normally in order to support database migrations. I have always been happy using RawSQL migration operations though.

What types of database objects are you mostly using? And how?

I think django-migrate-sql is a neat idea, but since it requires full authorship of the SQL it doesn't provide the most ORM-like experience. It's also not backend agnostic, which is something preferable for anything we add to Django. I'd be more interested in implementing classes that represent the underlying object type, like Models represent tables, and translating changes to them into migration operations.

Thanks,

Adam

On Wed, 11 Mar 2020 at 09:04, Petr Přikryl <nic...@gmail.com> wrote:
In our apps we have a lot of database objects which are hard to manage in classic Django migrations. Next, we clean our migrations time to time to speed up deployment process. And it would be awesome if Django would have system for raw SQL "models" and handle migrations and dependencies automatically in makemigrations and migrate commands like django-migrate-sql-deux https://pypi.org/project/django-migrate-sql-deux/. What do you think?

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/c078a0ae-9643-4d5c-b1de-f82cec2a7f33%40googlegroups.com.


--
Adam

Petr Přikryl

unread,
Mar 24, 2020, 11:15:48 AM3/24/20
to Django developers (Contributions to Django itself)
Hi Adam,
thank you for your reply.

We usually have few indices, functions and triggers. But the most used database object is view. We used them for data synchronizing from some third party databases. These databases have complex schema which we want simplify. So we are building low-level DB API via views. Then we create Django models for these views. Then is easy to use ORM for data access or sync operations. 

I can show you some example from our code for one third party database:

sql_items = [
    datetime_function_sql_item,
    is_seo_valid_function_sql_item,
    quality_view_sql_item,
    article_category_view_sql_item,
    base_article_view_sql_item,
    article_view_sql_item,
    article_b2b_view_sql_item,
    master_article_view_sql_item,
    reclamation_mat_view_sql_item,
    purchase_mat_view_sql_item,
    purchase_id_index_sql_item,
    purchase_date_of_purchase_index_sql_item,
    sale_mat_view_sql_item,
    sale_id_index_sql_item,
    sale_date_of_sale_index_sql_item,
    incentive_sale_mat_view_sql_item,
    incentive_item_mat_view_sql_item,
    product_view_sql_item,
    color_master_view_sql_item,
    product_special_offers_view_sql_item,
    series_view_sql_item,
    stock_products_view_sql_item,
    pl_master_product_view_sql_item,
    pl_master_product_quality_view_sql_item,
    pl_product_view_sql_item,
    pl_product_variant_view_sql_item,
    pl_stock_products_sold_view_sql_item,
]

#
#  module containing `datetime_function_sql_item` and `
is_seo_valid_function_sql_item` follows
#

from migrate_sql.config import SQLItem

"""
This third party app stores datetime as DATE column and INT column with seconds passed that day.
This function converts these columns into PostgreSQL datetime type.
"""
datetime_function_sql_item = SQLItem(
    name='datetime',
    sql='''
        CREATE OR REPLACE FUNCTION datetime(day date, seconds numeric) returns timestamp without time zone
        LANGUAGE plpgsql AS $$
          BEGIN
            RETURN day + (floor(seconds / 3600) || ':' || floor(seconds % 3600 / 60) || ':' || seconds % 60)::TIME;
          END;
        $$;
    ''',
    reverse_sql='DROP FUNCTION IF EXISTS datetime(day date, seconds numeric)',
    replace=True,
)


is_seo_valid_function_sql_item = SQLItem(
    name='is_seo_valid',
    sql='''
        CREATE OR REPLACE FUNCTION is_seo_valid(seo text) returns boolean
        LANGUAGE plpgsql AS $$
          BEGIN
            RETURN seo ~ '^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$';
          END;
        $$;
    ''',
    reverse_sql='DROP FUNCTION IF EXISTS is_seo_valid(seo text)',
    replace=True,
)

#
#  module containing `article_view_sql_item`
follows
#

article_view_sql_item = SQLItem(
    name=Article._meta.db_table,
    sql='''
        CREATE OR REPLACE VIEW {view_name} AS
        SELECT
          sq.*,
          CASE
            WHEN sq.buying_price <> 0 THEN round((sq.price / sq.buying_price - 1) * 100, 2)
            ELSE 0
          END                                                                             AS margin,
          GREATEST(
            LEAST(
              NULLIF(sq.price_for_installment_calculation, 0),  -- zero turns off this price
              sq.eshop_price
            ),
            0
          )                                                                               AS installment_price
        FROM (
          SELECT
            DISTINCT ON (a.id)
            a.id                      AS id,
            a.name                    AS "name",
            a.code                    AS code,
            COALESCE(a.master, FALSE) AS master,
            a.created                 AS created,
            a.fk_id_article_producer  AS fk_id_article_producer,
            a.fk_id_master_product    AS fk_id_master_product,
            a.fk_id_color_rgb         AS fk_id_color_rgb,
            a.fk_id_vat               AS fk_id_vat,
            a.cf_article_name         AS cf_article_name,
            a.cf_article_short_name   AS cf_article_short_name,
            a.cf_seo_name             AS cf_seo_name,
            a.cf_article_type         AS cf_article_type,
            q.cf_article_quality      AS cf_article_quality,
            COALESCE(a.active2, FALSE)                                                    AS active2,
            COALESCE(a.cf_www_visible, FALSE)                                             AS cf_www_visible,
            COALESCE(a.cf_bo_for_sell, FALSE)                                             AS cf_bo_for_sell,
            COALESCE(a.cf_buy_up_recommended, FALSE)                                      AS cf_buy_up_recommended,
            COALESCE(a.cf_clearance_sale, FALSE)                                          AS cf_clearance_sale,
            COALESCE(a.cf_article_short_name, a.name)                                     AS short_name,
            COALESCE(a.cf_article_name, a.name)                                           AS long_name,
            COALESCE(q.uniform_id, 1)                                                     AS quality_uniform_id,
            COALESCE(repurchase_prices.repurchase_price, 0)                               AS repurchase_price,
            COALESCE(
              CASE
                WHEN a.is_spare_part THEN spare_parts_buying_prices.buying_price
                WHEN q.is_new THEN buying_prices.buying_price
                ELSE repurchase_prices.repurchase_price
              END,
              0
            )                                                                             AS buying_price,
            COALESCE(pd_store.price_with_vat, 0)                                          AS price,
            COALESCE(pd_eshop.price_with_vat, pd_store.price_with_vat, 0)                 AS eshop_price,
            COALESCE(pd_common.price_with_vat, 0)                                         AS common_price,
            COALESCE(pd_installment.price_with_vat, 0)                                    AS price_for_installment_calculation,
            q.is_vat_version                                                              AS is_vat_version,
            a_vat_0.id                                                                    AS fk_id_article_zero_vat_version,
            CASE WHEN q.is_new THEN COALESCE(a.cf_bo_cheque, 0) ELSE 0 END                AS cheque_value,
            CASE WHEN q.is_new THEN COALESCE(a.cf_cheque_explicit, FALSE) ELSE FALSE END  AS gold_cheque
          FROM {base_article_view} a
            LEFT JOIN article_variant av ON av.fk_id_article = a.id
            LEFT JOIN {quality_view} q ON q.cf_article_quality = COALESCE(a.cf_article_quality, '{new_quality_code}')
            LEFT JOIN
              (
                SELECT
                  DISTINCT ON (c.fk_id_article_variant)
                  c.fk_id_article_variant       AS fk_id_article_variant,
                  round(bp.price::NUMERIC, 2)   AS repurchase_price
                FROM price bp
                  JOIN conditions c ON bp.fk_id_conditions = c.id
                  JOIN chain ch ON c.fk_id_chain = ch.id
                WHERE ch.code='some-id' AND bp.valid_from <= now()
                ORDER BY c.fk_id_article_variant, bp.valid_from DESC
              ) AS repurchase_prices ON repurchase_prices.fk_id_article_variant = av.id
            LEFT JOIN
              (
                SELECT
                  DISTINCT ON (asi.fk_id_article_variant)
                  asi.fk_id_article_variant                               AS fk_id_article_variant,
                  ceil(asi.last_base_buying_price::NUMERIC * 1.21)        AS buying_price
                FROM article_store_info asi
                  JOIN store s on asi.fk_id_store = s.id
                WHERE s.code in ('some-id', 'some-id')
                ORDER BY asi.fk_id_article_variant, asi.date2 DESC, s.code
              ) AS buying_prices ON buying_prices.fk_id_article_variant = av.id
            LEFT JOIN
              (
                SELECT
                  DISTINCT ON (asi.fk_id_article_variant)
                  asi.fk_id_article_variant                               AS fk_id_article_variant,
                  ceil(asi.last_base_buying_price::NUMERIC * 1.21)        AS buying_price
                FROM article_store_info asi
                  JOIN store s on asi.fk_id_store = s.id
                WHERE s.code = 'some-id'
                ORDER BY asi.fk_id_article_variant, asi.date2 DESC
              ) AS spare_parts_buying_prices ON spare_parts_buying_prices.fk_id_article_variant = av.id
            LEFT JOIN price pd_eshop ON
              pd_eshop.fk_id_article_variant = av.id AND
              pd_eshop.fk_id_chain = 'some-id' AND
              pd_eshop.fk_id_price_zone = 'some-id' AND
              pd_eshop.valid_from < extract(EPOCH FROM now()) * 1000 AND
              pd_eshop.valid_to > extract(EPOCH FROM now()) * 1000
            LEFT JOIN price pd_store ON
              pd_store.fk_id_article_variant = av.id AND
              pd_store.fk_id_chain = 'some-id' AND
              pd_store.fk_id_price_zone = 'some-id' AND
              pd_store.valid_from < extract(EPOCH FROM now()) * 1000 AND
              pd_store.valid_to > extract(EPOCH FROM now()) * 1000
            LEFT JOIN price pd_common ON
              pd_common.fk_id_article_variant = av.id AND
              pd_common.fk_id_chain = 'some-id' AND
              pd_common.fk_id_price_zone = 'some-id' AND
              pd_common.valid_from < extract(EPOCH FROM now()) * 1000 AND
              pd_common.valid_to > extract(EPOCH FROM now()) * 1000
            LEFT JOIN price pd_installment ON   -- used for Cetelem payments
              pd_installment.fk_id_article_variant = av.id AND
              pd_installment.fk_id_chain = 'some-id' AND
              pd_installment.fk_id_price_zone = 'some-id' AND
              pd_installment.valid_from < extract(EPOCH FROM now()) * 1000 AND
              pd_installment.valid_to > extract(EPOCH FROM now()) * 1000
            LEFT JOIN {quality_view} q_vat_0
              ON q_vat_0.uniform_id = q.uniform_id AND NOT q_vat_0.is_vat_version AND q.is_vat_version
            LEFT JOIN article a_vat_0  -- #26502
              ON a_vat_0.id <> a.id
                AND a_vat_0.fk_id_vat = 'some-id'  -- zero VAT
                AND a.fk_id_vat = 'some-id'  -- 21 VAT
                AND a.fk_id_master_product = a_vat_0.fk_id_master_product
                AND a.fk_id_color_rgb = a_vat_0.fk_id_color_rgb
                AND q_vat_0.cf_article_quality = COALESCE(a_vat_0.cf_article_quality, '{new_quality_code}')
          WHERE
            a.fk_id_vat IS NOT NULL
            AND a.fk_id_article_producer IS NOT NULL
            AND (
              a.is_spare_part  -- spare part exception #27658
              OR a.master  -- masters exception (do not have cf_bo_for_sell, cf_www_visible and active2 filled)
              OR (
                NOT a.is_spare_part  -- article is not spare part (must have active2)
                AND a.active2 IS TRUE
              )
            )
          ORDER BY
            a.id,
            q.cf_article_quality,
            a_vat_0.id
        ) sq
    '''.format(
        view_name=Article._meta.db_table,
        base_article_view=base_article_view_sql_item.name,
        quality_view=quality_view_sql_item.name,
        new_quality_code=NEW_QUALITY_CODE,
    ),
    reverse_sql='DROP VIEW IF EXISTS {view_name}'.format(view_name=Article._meta.db_table),
    replace=True,
    dependencies=(
        ('my_django_app', quality_view_sql_item.name),
        ('my_django_app', base_article_view_sql_item.name),
    ),
)


You can see that SQL can be very complex. And if it is changing very often than is easier to manage it in one place with definition instead of volatile migrations. In your case you must go through all your migrations to find the final definition of some object. And with so many objects it could be complex. Next if we remove (clean) migrations in our project than we don't have easy way how to re-generate fake-able intial migrations from our code base. I know that cleaning migrations is a edge case but I think it would be nice if you could re-generate the whole schema also with low level SQL objects into migrations.

Next I think that some simple views could be generated from Django ORM instead of writing full SQL. But I am afraid of that ORM can't cover all situations for example sophisticated SQL functions. Because the database differences are too big. So from my perspective it is fine to have high abstraction ORM classes (Model, Index) but it would be even better to have low abstraction classes for functions, triggers, views and indices. Than you could easily manage the whole application database layer using same programming API in one place with final Python/SQL definition.

SQLItem objects could have definitions for multiple backends. So the re-usable Django apps could prepare custom SQLItems for all supported vendor backends.

Thanks,
Petr


Dne středa 11. března 2020 11:20:56 UTC+1 Adam Johnson napsal(a):
Hi Petr

I too often end up managing some database objects like triggers, normally in order to support database migrations. I have always been happy using RawSQL migration operations though.

What types of database objects are you mostly using? And how?

I think django-migrate-sql is a neat idea, but since it requires full authorship of the SQL it doesn't provide the most ORM-like experience. It's also not backend agnostic, which is something preferable for anything we add to Django. I'd be more interested in implementing classes that represent the underlying object type, like Models represent tables, and translating changes to them into migration operations.

Thanks,

Adam

On Wed, 11 Mar 2020 at 09:04, Petr Přikryl <nic...@gmail.com> wrote:
In our apps we have a lot of database objects which are hard to manage in classic Django migrations. Next, we clean our migrations time to time to speed up deployment process. And it would be awesome if Django would have system for raw SQL "models" and handle migrations and dependencies automatically in makemigrations and migrate commands like django-migrate-sql-deux https://pypi.org/project/django-migrate-sql-deux/. What do you think?

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-d...@googlegroups.com.


--
Adam

schinckel

unread,
Mar 24, 2020, 11:49:42 PM3/24/20
to Django developers (Contributions to Django itself)


On Wednesday, March 25, 2020 at 1:45:48 AM UTC+10:30, Petr Přikryl wrote:
Hi Adam,
thank you for your reply.

We usually have few indices, functions and triggers. But the most used database object is view. We used them for data synchronizing from some third party databases. These databases have complex schema which we want simplify. So we are building low-level DB API via views. Then we create Django models for these views. Then is easy to use ORM for data access or sync operations. 


Hi Petr,

I too have a bunch of database Raw SQL. I came up with a mechanism for doing this that allows for/generates numbered versions of each file.


There's no way to hook in to the migrations framework to get this to happen during `makemigrations`, but I have used the checks framework to examine the project for any files that appear to be out of date, and moan about those, as well as a custom management command that generates the migrations for any that have changed, as well as the versions.

There's also a command that copies the current version of the file to the "newest" migration which is useful for development.

I'll try to publish the actual code that does it soon.

Matt.

Petr Přikryl

unread,
Mar 25, 2020, 5:57:57 AM3/25/20
to Django developers (Contributions to Django itself)
Hi Matt,
It looks nice and the idea is very similar to https://github.com/festicket/django-migrate-sql. Problems you are noticing are same as my. Easy tracking Git changes, code reviews, one place where the final definitions live (like models). It would be nice if Django can do that.

Petr


Dne středa 25. března 2020 4:49:42 UTC+1 schinckel napsal(a):
Reply all
Reply to author
Forward
0 new messages