What you've described is documented behavior for PostgreSQL (try
googling for "commands ignored until end of transaction block") --
Postgres does not allow *any* commands in a transaction after a
database error; you have to commit or roll back first.
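For anyone who hasn't hit it, a minimal psycopg2 session showing the
behaviour (the DSN and the demo table are made up; assume demo.id has a
unique constraint):

import psycopg2

conn = psycopg2.connect("dbname=example")           # hypothetical DSN
cur = conn.cursor()
try:
    cur.execute("INSERT INTO demo (id) VALUES (1)")
    cur.execute("INSERT INTO demo (id) VALUES (1)")  # duplicate key -> error
except psycopg2.Error:
    pass
# Every further statement on this connection now fails with
# "current transaction is aborted, commands ignored until end of
# transaction block"...
try:
    cur.execute("SELECT 1")
except psycopg2.Error:
    pass
# ...until the transaction is finished:
conn.rollback()
cur.execute("SELECT 1")                              # works again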
Jacob
--
Collin Grady
That documentation says it is similar to auto-commit. It doesn't say it
actually is auto-commit. There are explicit transaction commits in the
code (grep for commit_unless_managed()).
> This is what I see from MySQL InnoDB.
>
> My problem is that I can't get this auto-commit behavior from
> PostgreSQL with Django - either in the default mode or even when I
> explicitly use TransactionMiddleware with the @transaction.autocommit
> decorator or set DISABLE_TRANSACTION_MANAGEMENT to True.
>
>
> My suggestion is that changing the psycopg isolation level to zero (as
> per ticket 3460) will give true auto-commit behaviour for PostgreSQL,
> since otherwise psycopg wraps the Django SQL in a transaction block.
Whilst it's true that this would give auto-commit behaviour, there's a
valid opposing view that doesn't seem to have been addressed: whether
that's actually a good idea or not.
One reason not to do this is that it simply isn't standard behaviour for
Python database adaptors (they must be in non-autocommit mode
initially). So there's a principle of least-surprise thing going on.
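(As a tiny illustration of that default -- a sketch with a made-up DSN and
table -- per the DB API nothing a connection does is persisted until
commit() is called on it:)

import psycopg2

conn = psycopg2.connect("dbname=example")            # hypothetical DSN
cur = conn.cursor()
cur.execute("INSERT INTO demo (id) VALUES (1)")
conn.close()   # closed without commit() -- the INSERT is rolled back,
               # so no other connection ever sees the row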
As I mentioned to Collin at the code sprint in Portland, I think it'd be
a good idea to make sure we expose the ability to turn on auto-commit,
but I don't really like making it the default. In any case, providing
the ability in a form that can be controlled via, for example, a setting
is certainly the first step here. That's pretty independent of whatever
the default might end up being. That's really the improvement needed to
#3460 at the moment -- separating adding functionality from changing the
default.
Regards,
Malcolm
I don't want to get into the argument about ticket 3460 itself, but I
just don't get this paragraph... What is special about web applications
that makes them not require transactions? And moreover, how can
transactions -- a mechanism for preventing the DB from ending up in an
inconsistent state -- get you into error cases?
FWIW, we're developing a typical web service[1] that, basically, lets
people say "I'm going to this event". And we rely on transactions very
much because we heavily use denormalized data, like keeping a count of
people for an event. The only sane way of making sure that those stored
counts stay in sync with the real counts of people in another table is to
make changes to them in a single transaction. It's just one example anyway...
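Roughly the shape of it, as a sketch (the models here are made up for
illustration, not our actual code; commit_on_success is the stock Django
decorator):

from django.db import models, transaction

class Event(models.Model):
    # denormalized counter, kept in sync with the Attendance table
    attendee_count = models.IntegerField(default=0)

class Attendance(models.Model):
    event = models.ForeignKey(Event)
    user = models.ForeignKey('auth.User')

@transaction.commit_on_success
def attend(event, user):
    Attendance.objects.create(event=event, user=user)
    event.attendee_count += 1
    event.save()
    # if anything above raises, both writes are rolled back together,
    # so the stored count never drifts from the real number of rows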
When it comes to overhead... As far as I know PostgreSQL in autocommit
mode will wrap each statement (even SELECT) in an implicit transaction.
From
http://it.toolbox.com/blogs/database-soup/postgresql-application-performance-tips-part-1-13172:
> While you may think that you are not using transactions for singleton
> read-only SELECT statement, in fact every single statement in
> PostgreSQL is in a transaction. In the absence of an explicit
> transaction, the statement itself is an implicit transaction.
So what Django does now (and what the DB API suggests) by starting an
explicit transaction that runs up to the first save():
BEGIN;
SELECT ...
UPDATE ...
COMMIT;
is actually more efficient than the equivalent in auto-commit mode:
implicit begin;
SELECT ...
implicit commit;
implicit begin;
UPDATE ...
implicit commit;
And the more statements you have the worse auto-commit is.
So if this ticket should be fixed, it should be only for consistency with
other DBs, not for performance reasons. But I actually think that
performance is more important here. It's not hard to do an explicit
rollback in your view code anyway if you're recovering from an exception.
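For instance (a sketch only; the helper and response functions are made
up), with manual transaction management in a view:

from django.db import transaction

@transaction.commit_manually
def my_view(request):
    try:
        do_several_saves()        # hypothetical helper doing several writes
    except Exception:
        transaction.rollback()    # throw away the partial work
        return error_response()   # hypothetical
    else:
        transaction.commit()
        return ok_response()      # hypothetical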
> I suspect that this will significantly simplify the Django codebase
> vs.
> trying to simulate both auto-commit and transaction behaviors on top
> of a transactional connection - it would certainly remove most of the
> mechanisms in django/db/transaction.py which track state, dirty, etc.
It will also leave things vulnerable to having only some of your objects
updated, some of them deleted, some of them saved, with no way to restore
to a known state. This proposed change is a bit more complicated than
the patches in #3460, since it's kind of a good thing to preserve
integrity of operations for things like saving, deleting and updating.
All of those involve more than one SQL operation in the general case.
As I mentioned previously, we're always going to need some transactional
management for stuff like that, so this is going to go a lot further, a
lot more easily, when we have a patch that allows the behaviour to be
controlled. Then we can worry about what the default setting might be.
At the moment #3460 provides a good way to end up with inconsistent data
due to interactions with other transactions and an inability to roll
back to the start. The fact that none of the many patches on that ticket
even touch save/update/delete handling has to be an error.
Malcolm
I've done a quick non-scientific test. Here's the code:
import psycopg2
import psycopg2.extensions
from datetime import datetime

c = psycopg2.connect(...)  # connection parameters elided
# swap in ISOLATION_LEVEL_READ_COMMITTED here for the second run
c.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = c.cursor()
now = datetime.now()
for x in range(1000):
    cursor.execute('select created from cicero_article where id = %s', [x])
    res = cursor.fetchall()
c.commit()  # a no-op in autocommit mode; needed for the read committed run
print datetime.now() - now
For the record, 'select' there does a fetch by primary key from a ~6500
record table (i.e. -- fast). The results of executing 1000 selects are:
ISOLATION_LEVEL_AUTOCOMMIT: ~0.29 sec
ISOLATION_LEVEL_READ_COMMITTED: ~0.26 sec
So the default level is indeed faster but I have to confess I'm a lousy
tester so I encourage anybody to conduct their own experiment.
Also this observation matches psycopg2 docs[1]:
> `ISOLATION_LEVEL_READ_COMMITTED` This is the default value. A new
> transaction is started at the first `.execute()` command on a cursor
> and at each new `.execute()` after a `.commit()` or a `.rollback()`.
Which conflicts with what 3460 states[2]:
> The difference between autocommit and read committed is that read
> committed in psycopg2 puts all statements inside a BEGIN/END block
> (or BEGIN/ROLLBACK or BEGIN/COMMIT).
Jack, can you comment?
[1]: http://www.initd.org/svn/psycopg/psycopg2/trunk/doc/extensions.rst
[2]: http://code.djangoproject.com/ticket/3460
> Django with #3460 is _significantly_ faster when the query load is
> high. If I ran stock Django on Chesspark, the database overhead would
> cripple the site.
I'd really like to see a benchmark of some kind that I can try out to check your measurements.
A previous poster found your claims to be unjustified.
I'm very curious where the overhead comes from. The database more or less
needs to use transactions internally anyway, even for a select, to give you
a consistent view. It cannot know in advance that you will get only one (1)
result. So, is the overhead just from parsing the transaction statements?
Michael
I didn't see anywhere that Malcolm claimed a transaction will
magically make concurrency issues go away all on its own, and I'm a
bit put off by the fact that you're:
1. Overloading the word "integrity" with multiple meanings and
choosing which one you want to use at a given moment, and
2. Doing a lot of dancing from one issue to another (one moment you're
talking about BEGIN/COMMIT causing overhead, the next you're arguing
about transaction isolation).
Can we figure out what this thread is about, and stick to it, please?
--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
>
> > Can we figure out what this thread is about, and stick to it, please?
>
> As the person who originally started the thread ;-)
>
> I'd like to agree a design for a patch which will be accepted into
> Django trunk and will enable us to close the #3460 ticket.
There is no way to agree about the "#3460 ticket", because the ticket does
not really state what you want to solve, but how you want to change Django.
It's like going to the dentist and telling him to extract the upper-left 5
(or however that terminology works). Well ... in the role of the dentist,
we're asking you: Where does it really hurt?
The ticket and the following discussion claim that:
(a) Django uses the wrong transaction isolation level
(b) it's inefficient that there are so many BEGIN/COMMITs
(c) PostgreSQL does not allow any further statements after an
error until you finish the transaction
Now, please, pick only one pain, and try not to confuse us with too much
stuff at once. We can deal with the other teeth later.
I've snipped the explanation of serialization levels. That's known
stuff. It's also pretty much irrelevant to this discussion, but since
you insist, I've addressed why the current behaviour is also a valid
possibility, below.
You completely missed my point that if there's some integrity or other
database error in the sequence of multiple updates or deletes (and when
Django does updates to multiple tables or deletes, it extracts the id
values first), we can currently roll back to the start of the sequence
and people don't have half-updated or half-deleted sequences of objects.
Straight auto-commit behaviour loses that unless there are a lot more
changes to the code than any patch I've seen currently.
> This means that a
> transaction like:
>
> BEGIN;
> SELECT username, password FROM auth_user WHERE id = 1;
> UPDATE auth_user SET username = 'foo', password = 'bar' WHERE id = 1;
> COMMIT;
>
> will do the wrong thing. Note that this is basically how Django does
> all its attribute updates.
You say "wrong", but relating that to how Django's save() works and
saying that therefore loading/saving is wrong is drawing an unwarranted
conclusion. Django's saving works exactly as intended with the current
code: all the models attribute values at the time of saving are stored
into the database. Sure, you'd like the option to only have the changed
values saved, which is something the rest of us are certainly interested
in implementing as well. Again, it probably won't be the default, since
it would change behaviour in ways that are almost impossible to detect
for thousands of users, but as an option it's not a bad idea. I have a
couple of different implementations of save-only-changes code floating
around that I've been thinking about (incorporating the various attempts
in the ticket addressed to this) and I'm sure something will land before
1.1
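(For what it's worth, one way to write just the changed column with the
existing API is an explicit queryset update -- a sketch, using the model
and field from the example under discussion:)

from django.contrib.auth.models import User

# Issues a single "UPDATE auth_user SET password = ... WHERE id = 1",
# so a concurrent change to username is left alone.
User.objects.filter(id=1).update(password='bar')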
> Let's assume that the Django code that
> generated this was
>
> user.password = 'bar'
> user.save()
>
> The problem here is that the SELECT statement will read all
> attributes, and before UPDATE is executed, another transaction may
> have changed the username column. Now the UPDATE actually reverts
> such a change because Django does not keep track of which attributes
> have changed.
In other words, the object doing the second write wrote the values it
had. Once you accept that saving a model writes out all the attribute
values, this is pretty easy to understand. I agree we could document
this more clearly, but it's hardly mysterious behaviour. (If anyone says
"I assumed differently", that doesn't make it mysterious. It means that
making assumptions is usually a bad idea.)
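(To spell the interleaving out, a rough sketch with two raw connections at
the default read committed level; the DSN is made up and the SQL mirrors
the quoted example:)

import psycopg2

a = psycopg2.connect("dbname=example")   # session A -- hypothetical DSN
b = psycopg2.connect("dbname=example")   # session B
ca, cb = a.cursor(), b.cursor()

# Both sessions read the full row.
ca.execute("SELECT username, password FROM auth_user WHERE id = 1")
cb.execute("SELECT username, password FROM auth_user WHERE id = 1")

# B changes the username and commits first.
cb.execute("UPDATE auth_user SET username = 'new' WHERE id = 1")
b.commit()

# A then writes back *all* the attributes it read earlier, which
# silently reverts B's username change -- last write wins.
ca.execute("UPDATE auth_user SET username = 'old', password = 'bar' WHERE id = 1")
a.commit()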
> (Collin advocated very strongly for this patch in
> Portland, but it was also not included in 1.0).
If you look at a calendar, you shouldn't be at all surprised by this.
The feature freeze date was weeks before the Portland sprint. Again,
this is an option for alternative behaviour that you're talking about,
not a no-brainer bug in Django. Acknowledge that there's a second side
to this, please.
> So here Django
> actually causes data loss due to the race condition, even though
> transactions are used.
It's not data loss if the intended mode is that the save() call writes
out everything. That works correctly and reliably under the given
hypothesis.
> So please don't assume you're safe just because you are in a
> transaction.
As mentioned elsewhere in other messages in this thread, you're
overloading the term "safe" here and it's taking the thread way off
track.
All through this conversation you are continually failing to acknowledge
that there is another side to these technical issues, as with all
non-trivial changes. At no point is anybody saying that things like
auto-commit shouldn't be possible. But it's not "the wrong serialisation
level" or anything like that. It's one way of working that happens to be
different from what Django makes possible at the moment.
Every single non-trivial decision has both advantages and disadvantages.
Trade-offs are all over the place and to successfully work out what to
do requires a proper examination of both sides.
> READ COMMITTED mode clearly does not work as Django
> database developers imagined.
Really?! This is just getting silly. Your mind-reading qualification is
from which university? I cannot speak for any other developer (since I
sadly don't have such a qualification), but read committed works
exactly as I understand it -- no imagination involved, since it's been
pretty well documented for years in PostgreSQL. It can produce
non-repeatable reads but doesn't allow uncommitted reads.
You can foam all you like about how incompetent all the developers are
(and that's really what this is coming down to, since nothing at all
here is really new information), but that simply isn't the case. It's
simply unprofessional, as maintainers and contributors (i.e. anybody
contributing to any thread on django-dev), to go jumping around saying
"it's all wrong, we must change", when a bit of thought shows that it
isn't all wrong. The current behaviour does make
sense in a number of situations. Alternate behaviour also makes sense
(and that's never been in doubt), so creating extra possibilities
without harming existing code -- which is why changing the default is a
separate issue and something I'm pretty strongly against -- requires
examining why and how the current behaviour does what it does and what
the consequences for everybody would be if that changed.
Regards,
Malcolm
>
> > You completely missed my point that if there's some integrity or other
> > database error in the sequence of multiple updates or deletes (and when
> > Django does updates to multiple tables or deletes, it extracts the id
> > values first), we can currently roll back to the start of the sequence
> > and people don't have half-update or half-deleted sequences of objects.
> > Straight auto-commit behaviour loses that unless there are a lot more
> > changes to the code than any patch I've seen currently.
>
> We're all agreed that #3460 needs to support explicit Django
> transactions. I never intended otherwise. I always thought I had
> added this to the patch, but apparently that didn't happen. I never
> advocated removing transactions from Django altogether. I only want
> things that aren't explicitly in a transaction to really not be in a
> transaction. Richard has taken up the flag here, and I'll let him run
> with it.
Yeah, but you propose to make autocommit mode the default mode,
i.e. an incompatible change that can affect existing applications in a very
subtle way that could be very hard for some users to find.
>
> > You say "wrong", but relating that to how Django's save() works and
> > saying that therefore loading/saving is wrong is drawing an unwarranted
> > conclusion.
>
> Django's saving can trample data unintentionally, and indeed, we've
> seen this in production. Trampling data is an extremely hard to find
> problem since it amounts to a race condition. Since this could be
> easily avoided in several different ways, I consider this
> implementation incorrect and therefore wrong. From an outside
> perspective it appears that Django's "mental" model of a transactional
> database is a database in serialized mode. If put in serialized mode,
> Django doesn't trample data and is fine.
So, is ticket #3460 about the lost update problem, i.e. one user saves data
and another user overwrites it unintentionally? Or, is it about performance?
About transaction handling in general? About PostgreSQL insisting on
rolling back a transaction after an error?
If you want to solve the lost update problem, fine. Let's discuss it. But,
please, make up your mind what issue you want to solve. And then let's find
out how to solve it, and don't assume a priori that autocommit transactions
are necessarily the solution (which they clearly are *not* if applied alone,
and not even when save() updates only modified data).
We really need you and Richard to agree on an issue before it makes sense
to continue this thread.
The benefit is having a known state of single objects in the database.
Not a great benefit, I agree - but it's not wrong, or even unexpected.
You tell a model to save itself. It does so.
Even if you perceive this as misbehavior, wrong, ill-written and
whatnot, keep it to yourself. From an objective point of view, it is an
implementation solving the problem with storage of objects in a
database. It might not do so in a way that accommodates your needs, nor
mine to be quite frank - but I accept this as an implementation.
It'd seem the discussion has taken a turn towards semantics, hence this
message.
I'd be amazed (and fairly disappointed) if two copies of the code in
transaction.py were required. I'm fairly sure it can all be done by
examining the current transaction handling state inside existing
methods. But give it a shot however you like. As usual with patches, no
guarantees apply beforehand as to whether they'll be committed or not
because we often/usually have to look at an implementation before
knowing if it's going to work long-term. However, if I was doing this
(and if no reasonable alternative pops up in the next couple of months I
probably will), I'd be trying very hard to avoid any code duplication at
all. I realise it sounds fuzzy to say "my intuition says", but it looks
like it should be possible to do checks for whether we're in auto-commit
mode or not and have all the existing calls just behave normally --
including allowing manual transaction starts regardless of the mode.
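Something along these lines, perhaps (purely a sketch, not an actual
patch; the TRANSACTIONS_AUTOCOMMIT setting is made up, and the rest
follows the shape of the existing commit_unless_managed()):

from django.conf import settings
from django.db import connection
from django.db.transaction import is_managed, set_dirty

def commit_unless_managed():
    # Hypothetical: when a (made-up) TRANSACTIONS_AUTOCOMMIT setting says
    # the adapter is already committing each statement, there is nothing
    # to do here.
    if getattr(settings, 'TRANSACTIONS_AUTOCOMMIT', False):
        return
    # Otherwise fall through to the current behaviour.
    if not is_managed():
        connection._commit()
    else:
        set_dirty()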
The implementation here will probably turn out to be easy. The design
will be the hard bit.
Regards,
Malcolm