critical problem - race condition in "check is in db" / "insert in db" (IS_NOT_IN_DB)

72 views
Skip to first unread message

Ivan Velev

unread,
May 6, 2008, 2:23:49 AM5/6/08
to web2py Web Framework
hi,

I have identified a race condition problem in the way IS_NOT_IN_DB is
impemented.

Source code:

class SQLFORM(FORM):
...

def accepts(self,vars,session=None,formname='%
(tablename)s',keepvalues=False):
...

ret=FORM.accepts(self,vars,session,formname,keepvalues)
if not ret: return ret
...

if vars.has_key('id'):
...
self.table._db(self.table.id==id).update(**fields)
else:
self.vars.id=self.table.insert(**fields)

Because of lack of any locking, it is possible for several parallel
requests to execute FORM.accepts at the same time, they all will be
accepted and later inserted in database, leaving the db in
inconsistent state (more than one record with the same name).

I was able to exploit this race condition and create two identical
user records in SQLDB (In test environment, I had to add a small
time.sleep(), but in real world such problems DO happen - for example
when server load is higher). Similar problem may exist in other places
where check + modification is done.

You can find some background info about race conditions here:
http://en.wikipedia.org/wiki/Race_condition#Computing

This is a critical problem that need to be addressed before web2py is
can be used in production quality applications. (Unfortunatelly I am
not familiar with web2py codebase, so I cannot help much ... yet)

Massimo Di Pierro

unread,
May 6, 2008, 9:46:58 AM5/6/08
to web...@googlegroups.com
Yes this race condition exists but

1) it is responsibility of the database to guarantee data integrity,
The validation process is there only to try notify the user that
something is wrong. The validation cannot check all possible cases
before the record goes in the database without locking the database
thus having no concurrency. To do database validation just set the
field attribute to "unique" along with IS_NOT_IN_DB().

2) since accepts(...,session,...) prevents double form submission
this is very unlikely to happen anyway. A lock of any kind would slow
down everything in order to take care of a very rare event.

3) Nothing prevents the use from locking himself using (for example)
cache.locker.acquire() and cache.locker.release().

I do not believe IS_NOT_IN_DB() needs to be changed.

Massimo

Ivan Velev

unread,
May 10, 2008, 12:38:20 AM5/10/08
to web2py Web Framework
> To do database validation just set the  
> field attribute to "unique" along with IS_NOT_IN_DB().

Is this supposed to work with SQLDb ?

> 2) since accepts(...,session,...) prevents double form submission  
> this is very unlikely to happen anyway. A lock of any kind would slow  
> down everything in order to take care of a very rare event.

The problem is not related to one browser submiting one form several
times - real world applications are used by many users and framework/
application need to be designed to handle all concurency issues.

What in your opinion is the correct application behaviour in this
scenario:
- user 1 fills registration form with nick "Massimo" and submits it
- at the same timeuser 2 fills registration form with nick "Massimo"
and submits it

> 3) Nothing prevents the use from locking himself using (for example)  
> cache.locker.acquire() and cache.locker.release().

My point is that this should be part of the framework "check-insert"
worklow.

It does not make sense to have to wrap each call to form.accepts()
manually.

cache.locker.acquire()
res = form.accepts()
cache.locker.release()
if res:
....

Ivan

On May 6, 4:46 pm, Massimo Di Pierro <mdipie...@cs.depaul.edu> wrote:
> Yes this race condition exists but
>
> 1) it is responsibility of the database to guarantee data integrity,  
> The validation process is there only to try notify the user that  
> something is wrong. The validation cannot check all possible cases  
> before the record goes in the database without locking the database  
> thus having no concurrency. To do database validation just set the  
> field attribute to "unique" along with IS_NOT_IN_DB().
>
> 2) since accepts(...,session,...) prevents double form submission  
> this is very unlikely to happen anyway. A lock of any kind would slow  
> down everything in order to take care of a very rare event.
>
> 3) Nothing prevents the use from locking himself using (for example)  
> cache.locker.acquire() and cache.locker.release().
>
> I do not believe IS_NOT_IN_DB() needs to be changed.
>
> Massimo
>
> On May 6, 2008, at 1:23 AM, Ivan Velev wrote:
>
>
>
> > hi,
>
> > I have identified a race conditionproblemin the way IS_NOT_IN_DB is
> > impemented.
>
> > Source code:
>
> > class SQLFORM(FORM):
> >   ...
>
> >   def accepts(self,vars,session=None,formname='%
> > (tablename)s',keepvalues=False):
> >       ...
>
> >       ret=FORM.accepts(self,vars,session,formname,keepvalues)
> >       if not ret: return ret
> >       ...
>
> >       if vars.has_key('id'):
> >           ...
> >           self.table._db(self.table.id==id).update(**fields)
> >       else:
> >           self.vars.id=self.table.insert(**fields)
>
> > Because of lack of any locking, it is possible for several parallel
> > requests to execute FORM.accepts at the same time, they all will be
> > accepted and later inserted in database, leaving the db in
> > inconsistent state (more than one record with the same name).
>
> > I was able to exploit this race condition and create two identical
> > user records in SQLDB (In test environment, I had to add a small
> > time.sleep(), but in real world such problems DO happen - for example
> > when server load is higher). Similarproblemmay exist in other places
> > where check + modification is done.
>
> > You can find some background info about race conditions here:
> >http://en.wikipedia.org/wiki/Race_condition#Computing
>
> > This is acriticalproblemthat need to be addressed before web2py is

carlo

unread,
May 10, 2008, 3:06:38 AM5/10/08
to web2py Web Framework
I think the key is in the first Massimo's reply: it is responsibility
of the database to guarantee data integrity. And I will add it is up
to the database designer to set the right primary (and unique) keys.
If a database engine was not designed to take care of high concurrency
situations there is nothing a framework can do.

carlo

Ivan Velev

unread,
May 10, 2008, 11:12:49 PM5/10/08
to web2py Web Framework
> it is responsibility > of the database to guarantee data integrity.

I am not talking about database integirty, I am talking about form
validation.

What in your opinion is the expected behaviour/semantic of this
framework call ?

form.accept()

Ivan

mdipierro

unread,
May 10, 2008, 11:27:28 PM5/10/08
to web2py Web Framework
The function accepts has to perform two distinct operations:

1) validate the input
2) eventually insert/update in the database

if you require IS_IN_DB or IS_NOT_IN_DB, 1) and 2) will require two
distinct DB IO operations. As I explained before it is not
conceptually possible to lock them at the framework level without
serializing all database access. This is no more a problem in web2py
that it is in any other web framework.

I think that serializing all database access is the worse solution to
this problem because concurrency is the reason people use a client/
server relational database.

In the current implementation, if you use the unique attribute, the
worst case scenario is that once in a million requests (or less) a
form submission passes validation but it is rejected by the database
(raise OperationalError). If you do not catch the exception the user
gets a ticket. If you catch the exception you can notify the user to
go back and try submit again. There will be no data corruption anyway.
If the user goes back and submits the form again, he will get the
correct error message.

I do not think there is a problem here.

Massimo

mdipierro

unread,
May 10, 2008, 11:31:57 PM5/10/08
to web2py Web Framework
BTW . Web2py has transactions automatically on. This issue is normally
unrelated to transactions but it can be resolved asking the database
to serialize transactions. This would be almost equivalent to what
Ivan suggests, probably more efficient because done directly by the
DB, and would work in a multi-process environment. Yet I would not
suggest it because I think it is unnecessary.

Massimo

Ivan Velev

unread,
May 11, 2008, 12:19:51 AM5/11/08
to web2py Web Framework
What you are saying is that:

1. form.accept() semantic has 4 different exit paths which should be
handled by developer:
- accepts returns true
- accepts returns false and there are form.errors
- accepts returns false and there are no form.errors
- accepts throws exception

2. you recognize that web2py framework does have a typical race
condition problem in it's core and you find this perfectly acceptable.


I do not agree with both points:
My problem with 1. - There are two possible exit paths for duplicate
value (Error message "value already in database!" vs Exception). IMO
framework should have an unified way to report such problems - some
possible options for web2py:
- try {} catch db.insert exception and return the standard error
message "value already in database!"
- do not make extra check for uniques (after all it does not work
100%) and rely entirely on database

My problem with 2. - race condition is a ** real world problem **.
Some smart guys have done pretty nasty stuff by exploiting race
conditions. I agree that for the usual home-grown application it's not
an issue, but I cannot accept such problem in bussiness application.
And I'd definitely not accept if some guy from my team says: "Yeah, I
know that this code has race condition problem, but I'll just leave it
there" :)

Anyway, I am just trying to help, it's up to you what will be
introduced in codebase.

What I do know is that if I start using web2py for some real projects,
I'd have to design a workaround for this problem (if it is still not
resolved in the framework itself)

Ivan Velev

unread,
May 11, 2008, 12:25:01 AM5/11/08
to web2py Web Framework
> Web2py has transactions automatically on.

Does they work with SQLDb out-of-the-box ?

> it is responsibility of the database to guarantee data integrity.

Again - does this work in SQLDb ?

i.e. does following statement

db.user.email.requires=[IS_NOT_EMPTY(),IS_NOT_IN_DB(db,'user.email')]

guarantee that there won't be duplicate records in SQLDb ?

mdipierro

unread,
May 11, 2008, 12:34:35 AM5/11/08
to web2py Web Framework


On May 10, 11:25 pm, Ivan Velev <sir.hey...@gmail.com> wrote:
> > Web2py has transactions automatically on.
>
> Does they work with SQLDb out-of-the-box ?

YES, commit on return or rollback if an exception it raised and not
caught.

> Again - does this work in SQLDb ?
> i.e. does following statement
> db.user.email.requires=[IS_NOT_EMPTY(),IS_NOT_IN_DB(db,'user.email')]
> guarantee that there won't be duplicate records in SQLDb ?

NO that just does validation but the following YES will do what you
ask

SQLField('email',unique=True)

carlo

unread,
May 11, 2008, 9:09:48 AM5/11/08
to web2py Web Framework
just adding that SQLite is definetely not a client/server engine for
business applications.

carlo

carlo

unread,
May 11, 2008, 9:20:55 AM5/11/08
to web2py Web Framework
you mean that validation should be resposible for checking unique
keys? And what about foreign keys? Do you think frameworks should be
also in charge of checking consistency? I would be curious to know how
other frameworks solved race conditions..

carlo

Massimo Di Pierro

unread,
May 11, 2008, 8:19:17 PM5/11/08
to web...@googlegroups.com
Hi Ivan,

validation checks and database integrity checks and not the same
thing. Not all input comes from forms and form validation is not
always the same (for example forms may be validated in a user
dependent basis). Thus collapsing form validation with database
integrity checks is not a good idea because it would not increase but
decrease functionality. I agree it would be nice to have a way to
capture database errors and explain them to the user but this is not
technically possible because: 1) the database API only raises
OperationalError and there is no way for web2py to know if this is a
database integrity problem or not; 2) the database insert may not be
performed by a form.accepts.

Let me reiterate that this is not a problem, this how all frameworks
handle it and it cannot cause data integrity problems. The worst case
scenario is that the user gets a ticket (and the event is so rare
that you should not worry about it, it is going to be less common
that tickets caused by network problems).

You can also do

try: r=form.accepts(request.vars)
except OperationalError:
form.errors['fieldname']='database problem, perhaps this
value is already in db'
r=False

which is very close to what you asked.

Massimo

PS. Sorry my emails are out of order but I have been out of town with
spotty connection.

Ivan Velev

unread,
May 12, 2008, 12:34:18 AM5/12/08
to web2py Web Framework

> you mean that validation should be resposible for
> checking unique keys?

No.

What I wanted to say is that at the moment (quote)

"output and/or result of form.accepts() is unexpectedly and critically
dependent on the sequence or timing of other events"

Ivan

Massimo Di Pierro

unread,
May 12, 2008, 12:39:21 AM5/12/08
to web...@googlegroups.com
I disagree. If you use both IS_IN_DB and unique=True there no
unexpected behavior only a small probability that accepts raises an
exception (which you can catch) iff database changes between
validation and insert and the form that passed validation is not
unique anymore.

Please show me any framework that does not have this problem and does
not serializes database access.

Massimo

Ivan Velev

unread,
May 12, 2008, 12:51:08 AM5/12/08
to web2py Web Framework
> validation checks and database integrity checks and not the same  
> thing.

Agree. My original problem is related mostly to validation (and the
way it was done by form.accepts())

> 2) the database insert may not be  
> performed by a form.accepts.

My main problem is with the current implementation of form.accepts().

What I am saying is that current implementation of form.accepts()
(which combines form validation & insertion in db) does have clear
semantic.

It's trivial to fix it - all you need to do is to add some locking
around the two SQLs (SELECT / INSERT) and it's gone :)

> scenario is that the user gets a ticket (and the event is so rare  
> that you should not worry about it, it is going to be less common  
> that tickets caused by network problems).

Tickets are for extra-ordinary situations - during normal usage,
system should not generate tickets. And yes - I know that I can
prevent such kinds of problems form happening using custom code
outside of the framework - my point was that this better be solved
inside the framework.

I think that we can consider this discussion closed now (everybody has
shared his opinion and knows what others are thinking).

Peace :)
Ivan
P.S. Problems that happen "so rare" when one person is browsing dev
version of the app tend to become real problems when you have a couple
of thousand users are using the production system at the same time :)


Ivan Velev

unread,
May 12, 2008, 12:53:43 AM5/12/08
to web2py Web Framework
> I disagree.

ok, I won't post any more objections.

> Please show me any framework that does not have this problem and does  
> not serializes database access.

I will put this in my to do list ... I will let you know once I find
(or do not find) something.
Reply all
Reply to author
Forward
0 new messages