Help wanted testing proposal for the Migration Graph algorithm

閲覧: 255 回
最初の未読メッセージにスキップ

Carlton Gibson

未読、
2018/07/26 9:45:452018/07/26
To: Django developers (Contributions to Django itself)
Hi all.

Do you have a project with a slow and complicated migration history?

If so, any chance you could lend a brief hand testing?


This refactors the migration graph algorithm and in principle should make it significantly faster. 
(More or less linear vs currently exponential.)

Two questions: 

1. Is it correct with very complicated histories?
2. Is it a lot quicker? 

To answer this could you profile the following script on your complicated project against the PR vs against master
(or any recent version of Django really)?

Are there any errors with the PR that you don't see otherwise? 
What's the speedup, if any?

Thanks for your help! 

Kind Regards,

Carlton



The scipt: 
=======

All it does is calculate the full forwards and backwards migration plan for 
each app in your project. It doesn't apply anything. 

Dev environment should be fine. (It's the migrations, not the applied state that's in play.) 

As ever, please read it before running it. You can do anything similar if you want.

I used IPython from `django-admin shell`: 

%timeit %run ./migration_graph_timing.py 

If that doesn't work for you and you want me to make it actually run as a stand-alone, let me know. 

----
import sys

from django.db import connection
from django.db.migrations.loader import MigrationLoader

loader = MigrationLoader(connection)

backwards = loader.graph.root_nodes()
forwards = loader.graph.leaf_nodes()

print('Calculating backward plans:')
for root in backwards:
    loader = MigrationLoader(connection)
    sys.stdout.write('.')
    #print(loader.graph.backwards_plan(root))
sys.stdout.write('\n')
print('Calculating forward plans:')
for leaf in forwards:
    loader = MigrationLoader(connection)
    sys.stdout.write('.')
    # print(loader.graph.forwards_plan(leaf))

sys.stdout.write('\nRun Done\n\n')
----

Jeffrey Yancey

未読、
2018/08/14 10:55:522018/08/14
To: Django developers (Contributions to Django itself)
Hi Carlton,

I think I have a reasonable candidate for this and will try to find some time to do the comparison this week. I'll post results on the PR, or here if you prefer, upon completion.

Jeff

Carlton Gibson

未読、
2018/08/14 11:39:532018/08/14
To: Django developers (Contributions to Django itself)
Thanks Jeff. Super.

(PR is fine 🙂)

Carlton Gibson

未読、
2018/08/20 5:54:292018/08/20
To: Django developers (Contributions to Django itself)
Making the script less noisy here I commented out the actual work.
(So it didn't function as a test.) 
(No comment. 😐)

Corrected version:

```
import sys

from django.db import connection
from django.db.migrations.loader import MigrationLoader

loader = MigrationLoader(connection)

backwards = loader.graph.root_nodes()
forwards = loader.graph.leaf_nodes()

print('Calculating backward plans:')
for root in backwards:
    loader = MigrationLoader(connection)
    sys.stdout.write('.')
    plan = loader.graph.backwards_plan(root)
    #print(plan)
sys.stdout.write('\n')
print('Calculating forward plans:')
for leaf in forwards:
    loader = MigrationLoader(connection)
    sys.stdout.write('.')
    plan = loader.graph.forwards_plan(leaf)
    # print(plan)

```

d...@thread.com

未読、
2018/08/28 11:28:322018/08/28
To: Django developers (Contributions to Django itself)
Hi Carlton,

Adam asked me to take a look at this. I've run it on our codebase with ~1100 migrations and ~380 apps.

Running on Django 2.1, the results of this script (the updated version) are:

> 1min 23s ± 1.1 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


Running on the branch, installed with [1], the results are:

> 1min 20s ± 524 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There were no exceptions thrown - the script completed cleanly, although I haven't actually migrated with it or checked that the migrations work. I assume the script is enough to ensure consistent migrations? Around a 3.5% speedup.

I hope this is what you were looking for, apologies if I've got something wrong - I'm lacking full context on this. Hope this helps!


Dan


Carlton Gibson

未読、
2018/08/30 4:55:042018/08/30
To: Django developers (Contributions to Django itself)
Hi Dan. 

Super! 


On Tuesday, 28 August 2018 17:28:32 UTC+2, d...@thread.com wrote:
I've run it on our codebase with ~1100 migrations and ~380 apps.

Yes! This is what I was looking for. 
 
There were no exceptions thrown - the script completed cleanly, although I haven't actually migrated with it or checked that the migrations work. I assume the script is enough to ensure consistent migrations?

Perfect. The changes just relate to how the graph of migrations is built. This feeds into forward_plan, backwards_plan etc. (The individual migrations remain the same; theres no real need to migrate.) 

That the script executes without error on a large project is more or less what we want to see here. (There was additional input on the PR confirming the same, so that's good.) 

There's a second version of the script that writes the generated plans to a file. If you were to run this with v2.1 or master and the PR branch, you should find the generated plans to be same (i.e. the files to be identical). I'll paste this script at the bottom. If you have time to run it, as a extra check, that would be cool.

Around a 3.5% speedup.

OK. Perfect.

It looks like the kind of cycles that would lead to worst case performance don't really come up. (Most migrations just have a single dependency on the previous one, and that's it.) 
As such the performance boost of this patch is not massive. (Still worth having, other things being equal.)

We would expect slightly better improvement in real-life, since the script here recreates the loader (which regenerates the graph) each time through the loop, whereas `migrate` etc re-use the loader (which in the patch, vs master, only does the graph validation a single time).

However, still, performance will not be the key issue. 

Currently, with the feedback, my thinking is that the patch is a good refactoring (that will likely lead to further improvements). So I'm keen.
I just need to go over the code and tests again. 

Second version of script to follow. 

Thanks for the input. It's really helpful! 

Kind Regards,

Carlton



# Save as e.g. migration_graph.py
# Set DJANGO_SETTINGS_MODULE
# From project dir run: python migration_graph.py
# compare output from v2.1 or higher and PR. 
#
import sys

import django
django.setup()

from django.db import connection
from django.db.migrations.loader import MigrationLoader

f = open('plans-{}'.format(django.__version__), 'w')
print('Writing plans for v{}'.format(django.__version__))

loader = MigrationLoader(connection)
backwards = loader.graph.root_nodes()
forwards = loader.graph.leaf_nodes()

print('Calculating backwards plans:')
for root in backwards:
    loader = MigrationLoader(connection)
    sys.stdout.write('.')
    plan = loader.graph.backwards_plan(root)
    f.write(str(plan))
    f.write('\n')

f.write('\n' * 3)
sys.stdout.write('\n')

print('Calculating forwards plans:')
for leaf in forwards:
    loader = MigrationLoader(connection)
    sys.stdout.write('.')
    plan = loader.graph.forwards_plan(leaf)
    f.write(str(plan))
    f.write('\n')

f.close()
全員に返信
投稿者に返信
転送
新着メール 0 件