newsessions


SmileyChris

Mar 13, 2007, 3:21:27 PM
to Django developers
Anton Khalikov opened a new ticket pointing out a hole in the current
session framework which can cause session id collisions.

He's put together a newsessions package which could be good, but
obviously needs some discussion (here). So, discuss away!

http://code.djangoproject.com/ticket/3716

James Bennett

Mar 13, 2007, 3:52:29 PM
to django-d...@googlegroups.com
On 3/13/07, SmileyChris <smile...@gmail.com> wrote:
> Anton Khalikov opened a new ticket pointing out a hole in the current
> session framework which can cause session id collisions.

The proposed solution for collision (more on that in a moment) seems
good -- the bug seems to happen solely because we don't immediately
instantiate and save a Session, so that'd work around it.

I'm more wary of tying sessions to an IP address, because I don't think
that really buys any security; there are still ways to spoof or set up
man-in-the-middle attacks, so the only real solution would be to tell
paranoid people to use encrypted connections (which they should be
doing anyway if they're paranoid :)).

Though I have to say that I'm rather shocked by the ease of session
collision -- it requires the same pair of random numbers (out of a
range of sys.maxint) to come up, in the same order, within a fairly
short period of time, something that just doesn't seem likely to
happen a lot (given sys.maxint on my MacBook, such a collision should
happen roughly once for every 4.6 *quintillion* trials). I know the
'random' module isn't industrial-strength, but surely it's better than
that? And if it's not, then let's start salting it with more stuff.

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Michael Radziej

Mar 13, 2007, 4:04:44 PM
to django-d...@googlegroups.com
Hi,

As far as I can see, newsessions checks REMOTE_IP. Is this a good idea?

a) the client can sit behind a NAT that might hand out different IPs
b) the server can be behind NAT and not see the true IP at all. It
might see different IPs for the same client over time.
c) a crazy load balancer might get in the way

I wouldn't use REMOTE_IP.

In Ticket #1180, Anton said that users reported they were able to
get someone else's session. I wonder whether that might be related to
something quite different. Anton, was this by any chance on a threaded
server with python-mysql < 1.2.1p2?

Here's the (old) code that generates a session:

def get_new_session_key(self):
    "Returns session key that isn't being used."
    # The random module is seeded when this Apache child is created.
    # Use person_id and SECRET_KEY as added salt.
    while 1:
        session_key = md5.new(str(random.randint(0, sys.maxint - 1))
                              + str(random.randint(0, sys.maxint - 1))
                              + settings.SECRET_KEY).hexdigest()
        try:
            self.get(session_key=session_key)
        except self.model.DoesNotExist:
            break
    return session_key

The chance that two consecutive session creations generate the same
session id is 1 : sys.maxint^2, on my system sys.maxint is 2^31. So
we're at 2^62. If the internal state of the random generator is big
enough, it might make sense to hash 4 randints instead of only 2. Or to
check for collisions at the same time as you save the session. But the
chances are good that Anton experiences a different problem.
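For intuition, the pairwise odds above understate the risk across many session creations; a birthday-style union bound (a sketch, taking the 2^62 space from the analysis above at face value) makes that concrete:

```python
def collision_bound(n, space=2**62):
    # Union bound on the probability that any two of n independently
    # drawn key inputs collide: n*(n-1)/2 pairs, each with chance 1/space.
    return n * (n - 1) / 2 / space

# Two consecutive creations: exactly the 1 : sys.maxint^2 figure above.
p_pair = collision_bound(2)
```

Even so, for realistic n the bound stays tiny, which supports the suspicion that Anton is seeing a different problem.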

Michael


James Bennett

Mar 13, 2007, 4:59:44 PM
to django-d...@googlegroups.com
On 3/13/07, Michael Radziej <m...@noris.de> wrote:
> a) the client can sit behind a NAT that might hand out different IPs
> b) the server can be behind NAT and not see the true IP at all. It
> might see different IPs for the same client over time.
> c) a crazy load balancer might get in the way

(or any proxy, in which case REMOTE_IP will be the same for every
request and you'd want to be looking at the 'X-Forwarded-For' header
-- this is one of the problems with IP-based sessions)

Simon G.

Mar 13, 2007, 5:49:08 PM
to Django developers
I actually passed this onto the security mailing address, as I thought
it was better to be safe than sorry. Adrian's response was that tying
things to the IP address is not a good idea (for the reasons that
others have stated in this thread).

One thing that I would like to suggest is that we do link the SID to a
user-agent (see Chris Shiflett's excellent article on this at [1]), as
this provides some protection (-via-obscurity) against these
collisions, since the attacker not only needs to get a valid SID, but
match it to the correct UA string. There's also no reason for a
"normal" user to change UA strings without needing to login again.

This could easily be added to the existing session framework by
including it in the hashing.
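A minimal sketch of what folding the user agent into the key hash might look like (hashlib stands in for the old md5 module here; the function name, salting scheme, and SECRET_KEY constant are illustrative, not Django's actual code):

```python
import hashlib
import random

SECRET_KEY = "example-secret"  # stands in for settings.SECRET_KEY

def new_session_key(user_agent):
    # Hash the random salt, the secret key, and the UA string together,
    # so an attacker needs the correct UA string as well as a valid SID.
    salt = str(random.randint(0, 2**31 - 1)) + str(random.randint(0, 2**31 - 1))
    return hashlib.md5((salt + SECRET_KEY + user_agent).encode()).hexdigest()
```

On each request the framework would then compare against the UA recorded with the session, rejecting a SID presented with a different UA.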

--Simon

[1] http://shiflett.org/articles/the-truth-about-sessions

David Danier

Mar 13, 2007, 6:50:12 PM
to django-d...@googlegroups.com
SmileyChris wrote:
> Anton Khalikov opened a new ticket pointing out a hole in the current
> session framework which can cause session id collisions.

This could easily be fixed by providing Model._update() and Model._insert(),
as proposed here:
http://groups.google.com/group/django-developers/browse_thread/thread/6ddfb7da8319c6df/c2d3671b6a6fec5f
abstract example code:
http://groups.google.com/group/django-developers/browse_thread/thread/6ddfb7da8319c6df/c2d3671b6a6fec5f#msg_56de2a0ef2b884cc
(first code block; in addition to needing fewer queries, this would
prevent race conditions)

SessionManager could then calculate a session key and try to save an
object with it (allowing only INSERT, not UPDATE). If the key already
exists, the DB backend will raise an exception, since the key is unique
(primary_key). If it is free, the INSERT ensures that no other
session can use the key.
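The INSERT-only idea can be illustrated outside the ORM; a sketch using sqlite3 (the table layout and function name are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_session (session_key TEXT PRIMARY KEY, session_data TEXT)"
)

def try_create_session(key, data):
    # A plain INSERT: if the key is already taken, the unique primary key
    # makes the database raise, instead of silently UPDATEing the row.
    try:
        with conn:
            conn.execute(
                "INSERT INTO django_session VALUES (?, ?)", (key, data)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The caller loops, generating a fresh key whenever the INSERT reports a duplicate; the database itself arbitrates the race.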

If I understand it correctly, the newsessions patch is trying to do
this, but it will fail, because s.save()
(django/contrib/newsessions/models.py, line 29) will overwrite
(SQL UPDATE) existing sessions instead of failing to create them.
One tiny advantage remains: there is less code (= time) between
calculating the session key and saving it, so fewer chances of a race
condition.
But I don't see a real fix there.

Simon G. wrote:
> One thing that I would like to suggest is that we do link the SID to a

> user-agent [...]

I've heard of some proxies and anonymizer services that actually do
change the user agent, but I haven't seen this myself, nor searched for
it. ;-)

Greetings, David Danier


Jeremy Dunck

Mar 13, 2007, 8:09:31 PM
to django-d...@googlegroups.com
On 3/13/07, Simon G. <d...@simon.net.nz> wrote:
> There's also no reason for a
> "normal" user to change UA strings without needing to login again.

UA strings change on every minor rev of some browsers, plus various
tool versions (.net, etc.). Having those users logged out would suck
for my purposes.

Perhaps a setting (sorry!)?

ak

Mar 14, 2007, 12:47:06 AM
to Django developers
Guys I would like to explain some things and decisions in my code.

1. REMOTE_IP: I think it is a good idea to check it. Load balancers
act as proxy servers, and every web server has a module that can
restore REMOTE_IP from X-Forwarded-For; Apache's, for example, is
called mod_rpaf. And of course it's very easy to make the REMOTE_IP
check something that can be enabled or disabled via settings.py, just
as in PHP; such a modification is quite simple.
2. In my installation I use lighttpd and Django as a preforked FastCGI
server, with Python 2.4. It's hard to reproduce such things, but they
_do_ happen. One could say my installation is broken, but I think it's
rather a hole in Django if it allows such things to happen at all.
3. Perhaps the current sessions could be patched too; what is needed is
to force Django to try an INSERT into the table, but that looks like a
dirty hack to me, because the documentation says that if an object's pk
evaluates to anything other than True, Django checks whether such a row
already exists in the DB, and while Django does that, another process
may get the same session id and try it too, and we get a collision
again. My approach works rock solid because of a different table
design, and this is the only reason why I used another package name
(newsessions) instead of just patching the existing package.

ak

Mar 14, 2007, 1:08:47 AM
to Django developers
Even more on remote IP checking: it can be done in a flexible way,
letting the user choose to never use it, to check the remote IP, to
check the user agent, or even both remote IP and user agent for the
paranoid :) I think everyone understands that this is about 2-3 more
lines of code.

James Bennett

Mar 14, 2007, 1:54:40 AM
to django-d...@googlegroups.com
On 3/13/07, ak <an...@khalikov.ru> wrote:
> 1. REMOTE_IP. I think it is a good idea to check it. Load balancers
> acts as proxy servers. Every web server has a module that allow to
> restore REMOTE_IP from X-FORWARDED-FOR, for example apache's module is
> called mod_rpaf. And ofcourse it's very easy to make REMOTE_IP
> checking code to be disabled or enabled via settings.py, just like it
> is in php, such modification is quite simple.

I'm still wary of this -- again, it doesn't do anything to prevent
man-in-the-middle, and it introduces complexity and -- if we're
worried about session collisions -- introduces a whole new class of
bugs when you have a bunch of people behind a NAT or firewall who all,
from the POV of your server, have the same IP address.

> 2. In my installation I use lighttpd and django as preforked fastcgi
> server, python 2.4. It's hard to reproduce such things but they _do_
> happen. One can say my installation is broken but I think it's rather
> a hole in django if it allows such things to be ever happened.

I think we definitely need to tighten up the time between selecting a
session key and shoving the session into the DB -- I'm thinking along
the lines of using get_or_create and retrying until you get back a
True creation value from it (that way you don't have to do anything
else -- the session is already in the DB when get_or_create returns).

But I'm still worried about the apparent fact that this is all coming
from key collisions, because what it points to is not a race condition
at the database level -- what it points to is bad randomness.

Python's random-number generator uses the Mersenne Twister, and in
Python 2.3 it seeds itself with the current system time (and in 2.4 it
tries system-supplied randomness facilities first, then falls back to
system time).

I'm not an expert on randomness by any means, but the problem we have
here is the same pair of random numbers being generated at the same
time or within a very short time of one another, and my first
inclination is to look at the fact that we're using a deterministic
RNG seeded -- by default -- with the system time.

So instead of trying to ensure safety at the database level (because,
frankly, unless we go to extremes we're not going to manage that), why
not look at better seeds for the RNG and solve the real problem?

ak

Mar 14, 2007, 2:16:57 AM
to Django developers
> I'm still wary of this -- again, it doesn't do anything to prevent
> man-in-the-middle, and it introduces complexity and -- if we're
> worried about session collisions -- introduces a whole new class of
> bugs when you have a bunch of people behind a NAT or firewall who all,
> from the POV of your server, have the same IP address.


Well, I don't agree with you here:
1. This option can be disabled by default and only enabled optionally
by the developer (like in PHP, as I said).
2. I have an ethernet connection at home and I sometimes log in to our
private web apps from there. Any 'c00l hacker' is able to sniff network
traffic, get my session id and use it to join my session, just
because there is absolutely no check on who is using a session. We
added IP checking at our application level, but why not implement this
on the Django side as an option? No one says this is a 100% guarantee
against man-in-the-middle, but at least it prevents successful hacks by
script kiddies on untrusted networks (i.e. public wi-fi, ethernet,
etc.).

> I think we definitely need to tighten up the time between selecting a
> session key and shoving the session into the DB -- I'm thinking along
> the lines of using get_or_create and retrying until you get back a
> True creation value from it (that way you don't have to do anything
> else -- the session is already in the DB when get_or_create returns.
>
> But I'm still worried about the apparent fact that this is all coming
> from key collisions, because what it points to is not a race condition
> at the database level -- what it points to is bad randomness.


Let's separate the problems, because there are two:
1. Somehow the same ids are generated within a short period of time, so
id generation needs to be improved.
2. Django allows session collisions.

I am not an expert on the first problem, but I definitely think the
second must be solved the way I suggested, instead of by shortening the
time between checking and storing the new record in the DB. Just
imagine: you have a gun that sometimes fires at its holder. I suggest
changing its construction to make that impossible in any way; you
suggest changing how the holder should pull the trigger, hold the gun,
and so on. But if we don't change the construction, whatever we do,
there still IS a possibility of shooting yourself :) Hope I explained
it well enough.

James Bennett

Mar 14, 2007, 2:38:59 AM
to django-d...@googlegroups.com
On 3/14/07, ak <an...@khalikov.ru> wrote:
> 2. I have an ethernet connection @home and I sometimes log in to our
> private web apps from home. Any 'c00l hacker' is able to scan network
> traffic, get my session id and use it to join to my session too just
> because there is absolutely no checking who uses session. We added ip
> checking at our application level code but why don't implement this at
> django side as an option ? Noone says this is 100% guarantee of
> getting rid of man-in-the-middle, but at least it prevents from
> successful hacks from kiddies when using untrusted networks (ie public
> wi-fi, ethernet, etc).

I guess I just feel that someone who's determined enough to sit around
and sniff for your session ID is also going to be determined enough to
defeat IP checking (which wouldn't be that hard -- again, the real
solution is encrypted connections, not an ever-growing list of
"sessions are tied to x, y and z" checks), so it doesn't get us any
real improvement in security. But I'll bow out of that debate.

> Let's separate problems because there are 2 problems:
> 1. Somehow same ids are generated in a short period of time. So id
> generation needs to be improved.
> 2. Django allows session collisions.

Problem 2 is a direct consequence of Problem 1. In fact, Problem 2 is,
I think, just a symptom of Problem 1 -- you can't have a DB race
condition with sessions unless you're generating identical session
keys for different people in the first place.

I'm not saying I'm against separating insert and update in the ORM,
though -- I'm just trying to think pragmatically: this is a nasty bug
that needs to be fixed as soon as possible, and separate insert/update
would be a non-trivial refactoring of the Django ORM and wouldn't
solve the underlying source of this bug, which is duplicated session
keys. Meanwhile, coming up with a better way to generate session keys
-- probably through seeding the RNG -- is easy to do and solves the
underlying problem.

Which means my vote is going to be for better session keys now, and a
design decision about insert/update when we do the QuerySet refactor
post-0.96.

> I suggest to change construction to make it impossible in any way, you
> suggest to change the way how holder should pull the trigger, hold the
> gun in arms etc but if we don't change the construction, whatever we
> do there still IS a possibility to shoot yourself

Again, for me it's a question of priority -- if we can solve the
key-collision problem by seeding the RNG, that's a better solution for
right now than overhauling the ORM's object-saving methods. To run
with your analogy, we've already been shot and we're bleeding, and the
gun is still going off. Our priority right now should be to stop the
shooting and tend to our wounds, or at least to point the gun away from
us, not to go shopping for a better gun.

ak

Mar 14, 2007, 3:02:47 AM
to Django developers
> so it doesn't get us any real improvement in security

James, there is a concept of 'fool-proofing'. Real hackers can do many
things, but the current model lets even 10-year-old kids be hackers,
and this is aimed at them. There is no way to protect all sites
with SSL, and I would not like a neighbor's son to be able to read my
email because he got my session id by running a simple Ethereal capture.


> I'm not saying I'm against separating insert and update in the ORM,
> though

Have you ever looked at my code? There are no hacks of the ORM; there
is just a different sessions table design where the session id is no
longer a primary key but just a unique key. The primary key is a
separate field, and this approach allows us to use the existing ORM to
do things the correct way, where a session collision is impossible at
the table level, and so impossible by design.
And again: I am not voting against improving id generation; I think
both problems should be solved.

ak

Mar 14, 2007, 3:13:05 AM
to Django developers
The default Django installation uses a session lifetime of 2 weeks,
with no session expiration at browser close. Just calculate how strong
the session id generator needs to be to successfully generate unique
IDs over 2 weeks for a web app with 1,000,000 unique visitors per day.
As far as I understood, the current generator was only checked for
duplicates over 2,000,000 iterations.
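A back-of-the-envelope version of this calculation, taking the 2^62 key space discussed earlier in the thread at face value and assuming a perfectly behaving RNG:

```python
n = 14 * 10**6        # 1,000,000 unique visitors/day for 14 days of live keys
space = 2**62         # two randints in [0, sys.maxint) hashed together
p = n * (n - 1) / (2 * space)   # birthday-style union bound on any collision
# p is on the order of 2e-5: tiny per fortnight, but nonzero -- and any
# RNG misbehaviour (identical seeds across processes) inflates it hugely.
```

So even under ideal assumptions the chance is not literally zero, which is the point being made.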

Michael Radziej

Mar 14, 2007, 4:06:30 AM
to django-d...@googlegroups.com
Hi Anton,

ak schrieb:

I like that your code guarantees that you don't issue a session id that
is still in use. I don't care how this is achieved, be it with a
separate id or with using get_or_create, as long as it works without
leaving holes as the current implementation does. Is there a good reason
for the current implementation?

But I don't like that newsession insists on checking the IP.

You argue that it protects against an attacker who can sniff the session
id and replay it. But wouldn't a hacker who is able to observe the
traffic simply sniff the password, too? I see problems with IP checking
(as mentioned in posts before), but no real added security.

But something else must be going wrong in your case, as the chance of
getting the same session ids should be so tiny that you really
shouldn't be getting reports from your users. Comments say that the
random number generator is seeded when the server process starts, and
that it's seeded with the current time. Isn't that bound to fail with
multiple processes that start at the same time? I don't know much about
the random module, though.

Michael

ak

Mar 14, 2007, 4:26:56 AM
to Django developers
Michael, I really don't know how it could be possible; I only know the
following:
1. My production web server restarts a few times per day, sometimes
twice within a really short period (i.e. 2-3 minutes).
2. I got a report from a _new_ user who received an e-mail with a login
and password, clicked the link, and became logged in as another user
without entering the login and password. Our auth is based on putting
user_id in the session, so a new user must get a new, clean session
without any data. Any other ideas how he could be logged in as another
user, other than a session duplicate?
3. After that report I cleaned the django_session table, so all
existing sessions should have expired immediately. After this I got one
more report, from another user, who said that he was working in his app
(logged in) and then somehow became another logged-in user. I opened
the logs and found that these users had logged in to the system at the
same time.

And one more thing about IP checking: OK, I agree about network
sniffing (though there is still the possibility that the sniffer was
started after I logged in, so the attacker could not see my password
but could see my session id :) ). But please don't forget that session
cookies may also be stolen via XSS.

Ivan Sagalaev

Mar 14, 2007, 4:45:16 AM
to django-d...@googlegroups.com
Michael Radziej wrote:
> But something else must go wrong in your case, as the chance to get the
> same session ids should be so tiny that you really shouldn't reports
> from your users. Comments say that the random number generator is
> seeded when the server process starts, and it's seeded with the current
> time. Isn't that bound to fail with multiple processes that start at the
> same time? I don't know much about the random module, though.

I've just recalled a case... I was using Django's randomly generated
passwords, and it worked fine under mod_python. Then I switched to
FastCGI, and from some moment on *all* passwords were generated
exactly the same. The reason was that mod_python was reloaded from time
to time by Apache (via MaxRequestsPerChild), while the flup-based
FastCGI processes were never reloaded. Over a long time Python's random
generator just degenerated into giving out identical values (this was
Python 2.4 on Debian).

I solved the problem by manually re-seeding the generator based on the
time and a couple of other request-related values. This has now worked
for about half a year without problems. In our session case this could
be as simple as:

random.seed(str(datetime.now()) + str(id(request)))

ak

Mar 14, 2007, 5:02:15 AM
to Django developers
Btw, we use Debian with python 2.4 too

Michael Radziej

Mar 14, 2007, 6:28:25 AM
to django-d...@googlegroups.com
On Wed, Mar 14, Ivan Sagalaev wrote:

> I've just recalled a case... I was using Django's randomly generated
> passwords and it worked fine under mod_python. Then I switched to
> FastCGI and from some moment *all* passwords started being generated
> absolutely equal. The thing was that mod_python was reloaded from time
> to time by Apache (using MaxRequestsPerChild) and flup-based FastCGI
> were never reloaded. Over long time Python's random generator just
> degenerated into giving out exact values (this was Python 2.4 on Debian).

Fascinating. `manage.py runfcgi` now has a parameter `maxrequests`, so you
can force it to restart, too.

In Python 2.3, the random generator seems to be seeded when the
`random` module is imported for the first time. I suspect that when all
the Apache child processes are forked, it's possible that a few get
seeded with the same system time. Same for FastCGI.

Beginning with python2.4, the seed also uses os.urandom() when available, so
it starts to get safer.

How about this: we could fix this for Python 2.3 by seeding the random
generator on Django startup, using the random function of the database
together with urandom (when available) and the system time.
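A sketch of such a startup-time seeding helper (the function name is made up; modern Python is used here, and the database's random function is left out for brevity):

```python
import os
import random
import time

def seed_on_startup():
    # Prefer OS-supplied randomness, mixed with the current time, so two
    # processes forked in the same clock tick still diverge when urandom
    # is available, and degrade to time-only seeding when it is not.
    try:
        material = os.urandom(16)
    except NotImplementedError:
        material = b""
    random.seed(material + repr(time.time()).encode())
```

Called once at process start, this replicates what later Pythons do by default while remaining explicit for Python 2.3-era deployments.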

I'm not sure whether it's a good idea to add periodic reseeding.

Michael


Afternoon

Mar 14, 2007, 7:03:57 AM
to Django developers

> 2. I have an ethernet connection @home and I sometimes log in to our
> private web apps from home. Any 'c00l hacker' is able to scan network
> traffic, get my session id and use it to join to my session too just
> because there is absolutely no checking who uses session. We added ip
> checking at our application level code but why don't implement this at
> django side as an option ? Noone says this is 100% guarantee of
> getting rid of man-in-the-middle, but at least it prevents from
> successful hacks from kiddies when using untrusted networks (ie public
> wi-fi, ethernet, etc).

This is what SSL is for.

Tying the session to IP is a usability nightmare: what if I'm halfway
through a long application process when my wifi goes out of range for
a moment and the DHCP server gives me a new IP, or I move from wifi to
cable because I'm moving to my desk, or my DSL provider decides it's
time to give me a new IP, or I'm browsing from my phone and my IP
changes each time I connect?

Tying the session to IP provides no extra security and degrades the
experience in the common case. Applying SSL adds security and does not
degrade the experience.

Entropy:

Could the RNG be re-seeded when a new session is created? Is it an
expensive operation?

Could the entire request header be used as entropy? That doesn't
necessarily tie the session model to the request, since the entropy
could be an argument to get_new_session_key, and I would imagine 99% of
calls to this method come from the process_response method of
SessionMiddleware.
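One way to read that suggestion, sketched with an illustrative signature (headers passed in as a plain dict; this is not Django's actual API):

```python
import hashlib
import random
import time

def get_new_session_key(headers):
    # Mix the full header set, the clock, and the RNG's own prior output
    # into the seed before drawing the salt, so identically-seeded
    # processes diverge as soon as their requests differ.
    random.seed(
        repr(sorted(headers.items()))
        + repr(time.time())
        + str(random.getrandbits(64))
    )
    salt = str(random.getrandbits(128))
    return hashlib.md5(salt.encode()).hexdigest()
```

Folding the previous RNG state into the new seed hedges against two requests arriving with identical headers in the same clock tick.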

Henrique Ser

Mar 14, 2007, 7:05:28 AM
to django-d...@googlegroups.com
Hello all,

I am quite new to Django and I've been trying to get all the info I can
by, amongst other means, reading through django-dev and django-users. I
don't remember where I read this, but FastCGI, the way it is designed,
doesn't work well with threads. The actual details escape me, but I'm
certain the thing that brought this up had to do with information from
one user being magically overwritten by something submitted by another
user... which sounds terribly like your point #2.

The quick fix on that occasion was using a forking model in Apache
instead of multithreading, IIRC.

Just a thought,

henrique

ak

Mar 14, 2007, 7:07:09 AM
to Django developers
As I said, we restart our Django FastCGI server a few times per day.
I think it is more related to:

> Beginning with python2.4, the seed also uses os.urandom() when available, so it starts to get safer.

and our app runs under a Xen domU Debian installation, so maybe there's
something related to the Xen kernel's entropy.

ak

Mar 14, 2007, 7:24:40 AM
to Django developers
Henrique, you missed the fact that we are using lighttpd, not Apache,
though our Django does run as preforked FastCGI.
I can only say that I will NEVER use a potentially vulnerable session
implementation where a not-logged-in user may get superuser
permissions, even if the chance of this happening is as little as 1 in
10,000,000. "Shit happens", as Forrest Gump said.

Ivan Sagalaev

Mar 14, 2007, 6:49:07 AM
to django-d...@googlegroups.com
Michael Radziej wrote:
> Beginning with python2.4, the seed also uses os.urandom() when available, so
> it starts to get safer.

Uhm... but it still happens only once, on first module import. So it's
not safer at all.

> I'm not sure whether it's a good idea to add periodic reseeding.

Why not? I'm not an expert either, but I just don't see any
disadvantages besides speed, which is still incomparable to the cost of
INSERTing a session into the database.

ak

Mar 14, 2007, 8:02:54 AM
to Django developers
Ivan, reseeding still doesn't guarantee that there will be no
duplicates, the way INSERTing does, does it?
And the insert only needs to be done once, at session start, and is
performed ANYWAY at session start. The difference in my approach is
that the insert is done when the new session is created, while in
Django's approach it's done when the session first needs to be saved
(in the same request anyway), so what is the problem?

Michael Radziej

Mar 14, 2007, 9:12:59 AM
to django-d...@googlegroups.com
On Wed, Mar 14, Ivan Sagalaev wrote:

>
> Michael Radziej wrote:
> > Beginning with python2.4, the seed also uses os.urandom() when available, so
> > it starts to get safer.
>
> Uhm... But it still happens only once on first module import. So it's
> not safer at all.

If you have multiple processes which initialize their random module at
exactly the same system time (how granular is it?), you'll get the same
seed and therefore the same sequence of pseudorandom numbers in
Python 2.3.

In Python 2.4 the seeds will be different, so you'll get different
numbers.

Just to be clear: something that prevents handing out a session id
that is already in use should be installed in addition.

>
> > I'm not sure whether it's a good idea to add periodic reseeding.
>
> Why not? I'm not an expert also but I just don't see any disadvantages
> beside speed which is still incomparable to INSERTing a session into
> database.

The sequence should obey certain other statistical rules, and I am not
sure whether reseeding could break that. I simply know enough about it
to be cautious, but I'm not deep enough into it to recommend for or
against reseeding ;-)

Benjamin Slavin

Mar 14, 2007, 10:33:48 AM
to django-d...@googlegroups.com
Man, there's been a lot of movement in this thread.

I'd like to just throw out a few comments.

1) The current session framework unquestionably needs some attention.
The way in which sessions are being generated isn't terribly
efficient, and can certainly lead to collisions.

2) James has suggested that the answer is, perhaps, better seeding. I
really don't think that this is an issue. Regardless of the seed,
there is still a mathematical chance that a collision will occur. The
very definition of random means that you can't predict what will come
next. There's a very real chance that any random number generator
will give you the exact same number 100 times in a row. It's a small
probability, but it is completely possible... any random number
generator that tries to avoid this is no longer completely random.

The default Python implementation of 'randomness' should be just fine
for our purposes. To my knowledge, Mersenne is not cryptographically
secure; however, we're not using it for cryptographic purposes...
we're using it as input to a hashing function.

This brings me to my next point -- hashing. Hashing converts input to
a fixed bit-length field. Even if we have truly random and truly
unique input to the hashing function, it's possible (again, highly
unlikely, but possible) that the result of hashing two inputs will
yield the same session key.

Unless we're enforcing this at the database level, there's still the
possibility of collision.

3) In terms of seeding, the default Python approach should be fine (I
can't speak at all to Python 2.3's suitability). It's based on
start-up time of the process. It's unlikely that these seed values
will occur at the -exact- same moment in time (it is, you guessed it,
still possible).

Continual reseeding will actually probably make the problem worse, and
will degrade performance. In a large deployment with lots of
processes continually reseeding, there's a much higher likelihood that
they will hit the same seed, and thus generate the same random number
pattern. I'm fairly certain that the likelihood of collision here is
much higher than using the default Python on-import seeding mechanism.

Again, I see the best solution is to move final collision avoidance to the DB.

4) In terms of tying sessions to an IP, I am very much against this.
It creates complexity, degrades user experience for some users, and
doesn't really add any security. If someone is in a location where
they can sniff your traffic, they can very easily spoof your IP. You
should be using SSL to encrypt this traffic if you are concerned about
this possibility.

5) Thank you to Anton and SmileyChris for bringing this issue
front-and-center again.

Recap:
+1 giving some love to the current session framework
+1 ensuring uniqueness at the DB level
-1 Giving Django a new RNG
-1 Re-seeding on every session generation
-1 Tying sessions to IP

- Ben

Ivan Sagalaev

Mar 14, 2007, 11:58:45 AM
to django-d...@googlegroups.com
ak wrote:
> Ivan, although reseeding still not guarantees that there may be no
> duplicates as INSERTing does, isn't it ?

Yeah... I actually don't object to your approach at all, I'm just
guessing why random numbers could become... uhm... not that random.

James Bennett

Mar 14, 2007, 3:13:31 PM
to django-d...@googlegroups.com
On 3/14/07, Benjamin Slavin <benjami...@gmail.com> wrote:
> 2) James has suggested that the answer is, perhaps, better seeding. I
> really don't think that this is an issue. Regardless of the seed,
> there is still a mathematical chance that a collision will occur.

This is true, but given the range the numbers are being pulled from,
the odds of a collision should be around one in four quintillion.
Obviously, we're nowhere near that, so something's up with the
"random" numbers we're getting -- occasional flukes are one thing, but
the apparently consistent collisions people are reporting are quite
another.
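(Editorial aside: the rough arithmetic behind James's estimate, assuming
a 32-bit platform's sys.maxint:)

```python
# The session key is built from a pair of draws in [0, sys.maxint), so
# a collision needs the same ordered pair to come up twice. Assuming a
# 32-bit sys.maxint, that is one chance in (2**31 - 1)**2 trials.
maxint = 2**31 - 1
ordered_pairs = maxint * maxint
print(ordered_pairs)  # about 4.6e18, i.e. ~4.6 quintillion
```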

> The default Python implementation of 'randomness' should be just fine
> for our purposes. To my knowledge, Mersenne is not cryptographically
> secure; however, we're not using it for cryptographic purposes...
> we're using it as input to a hashing function.

Yes. And for a given input, md5 will always produce the same hash.

And -- most importantly for our purposes -- Python's RNG is completely
deterministic: given the same seed, the underlying Mersenne
implementation will produce the same sequence of numbers. And by
default on Python 2.3, the seed is the startup time, which means that
processes spawned at the same time (say, by a web server starting up)
can get the same seeds and thus will produce the same sequence of
"random" numbers, like clockwork.

I'm not 100% certain, but I'd bet that's why we're seeing more
collisions than we'd expect.

> Unless we're enforcing this at the database level, there's still the
> possibility of collision.

We are "enforcing it" at the database level -- the session key is the
primary key of the table and thus must be unique. The problem, in a
nutshell, seems to be this:

1. User A visits the site and server process A generates a session
key, checks the database to see whether that key is in use, finds that
it is not, and assigns that key to User A.
2. At or around the same time, User B visits the site and server
process B generates the same session key (probably because both
processes started out with the same seed for Python's RNG). Because
User A's session has not yet been serialized and stored, server
process B also finds that the key is not in use, and assigns that key
to User B.
3. At some later point, both sessions are serialized and stored; the
first one executes an INSERT -- because Django does not distinguish
INSERT from UPDATE for purposes of a model's 'save' method -- and the
second executes an UPDATE on that row. Assume for the moment that User
B's session gets the INSERT and User A's session gets the UPDATE.
4. When User B comes back to the site, he finds that the Django
session framework thinks he is User A.
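(Editorial aside: the four steps above can be sketched as a toy model,
with a plain dict standing in for the django_session table; the names
are illustrative, not Django's:)

```python
# Toy model of the check-then-save race described above. save() can't
# tell an INSERT from an UPDATE, so the second writer silently clobbers
# the first.
sessions = {}

def key_is_unused(key):
    return key not in sessions

def save(key, data):
    sessions[key] = data  # INSERT and UPDATE look identical here

key = "c0ffee"             # both processes "randomly" generate this key
assert key_is_unused(key)  # process A checks: looks free
assert key_is_unused(key)  # process B checks before A saves: also free
save(key, "User A's session")
save(key, "User B's session")  # an UPDATE in disguise
print(sessions[key])  # User A's data is gone
```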

There are several possible approaches here:

1. Manually seeding the RNG with a value unique to the server process,
or to the request, would move the probability of key collision closer
to what it "should" be, thus dramatically reducing the problem.
2. Changing the session framework's behavior so that it stores the
session in the database as soon as it finds an unused key (e.g., by
looping with 'get_or_create' until it gets a new object) would
drastically reduce the odds of two independent processes with the same
key each thinking that their key is unused.
3. Changing Django's ORM to separate INSERT from UPDATE would make it
impossible for two processes to each execute a 'save' on sessions with
the same key and think that they each created a new row.

Options 1 and 2 are relatively easy and could be implemented quickly.
Option 3 is attractive, but would be harder to do.

(though, technically, it would also be possible to use locking or
transaction isolation to achieve the database-level enforcement, but
I'm wary of doing so)

Personally I'd vote for a combination of options 1 and 2 for now, and
a long, hard look at option 3 when we do the QuerySet refactor.

James Bennett

Mar 14, 2007, 9:03:50 PM
to django-d...@googlegroups.com
On 3/14/07, Michael Radziej <m...@noris.de> wrote:
> If you have multiple processes which initialize their random
> module at exactly the same system time (how granular is it?), you'll get the
> same seed and therefore the same sequence of pseudorandom numbers in
> python2.3
>
> In python2.4, the seeds will be different, thus you'll get different
> numbers.

Out of curiosity, how many people who are seeing this bug happen are
also running on Windows? On Unix we can count on fine-grained
timestamps for 2.3 and the presence of /dev/random for 2.4, but on
Windows... who knows?

> The sequence should obey certain other statistical rules. I am not sure
> if reseeding could break it. I simply know enough about it to be cautious,
> and I'm not deep enough into it to recommend reseeding or not reseeding ;-)

In the case of Python, the initial seed doesn't affect the quality of
the randomness too much, but identical seeds will produce identical
sequences of numbers -- you can verify this yourself by manually
calling random.seed(), passing in a seed of your choice, and watching
the sequence of random numbers it produces afterward. If you start a
different Python process and feed it the same seed, you'll see the
same sequence of numbers come out.
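(Editorial aside: James's claim is easy to verify without a second
process, by constructing two independent generators with the same seed;
the seed value here is arbitrary:)

```python
import random

# Two Mersenne Twister instances given the same seed emit the same
# sequence -- the heart of the same-seed collision theory above.
a = random.Random(20070315)
b = random.Random(20070315)
seq_a = [a.randrange(2**31) for _ in range(5)]
seq_b = [b.randrange(2**31) for _ in range(5)]
assert seq_a == seq_b  # identical "random" numbers, in order
```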

Arvind Singh

Mar 14, 2007, 9:34:11 PM
to django-d...@googlegroups.com
> 1. Manually seeding the RNG with a value unique to the server process,
> or to the request, would move the probability of key collision closer
> to what it "should" be, thus dramatically reducing the problem.

Problem is reduced... but still remains.

> 2. Changing the session framework's behavior so that it stores the
> session in the database as soon as it finds an unused key (e.g., by
> looping with 'get_or_create' until it gets a new object) would
> drastically reduce the odds of two independent processes with the same
> key each thinking that their key is unused.

Very inefficient for a generic module widely in use.

> 3. Changing Django's ORM to separate INSERT from UPDATE would make it
> impossible for two processes to each execute a 'save' on sessions with
> the same key and think that they each created a new row.

Ideal solution, exactly what it should have been in the first place.
Instead of ugly reinventions of integrity enforcements, this should be
left to be handled by the database -- where exactly it belongs.

> Options 1 and 2 are relatively easy and could be implemented quickly.
> Option 3 is attractive, but would be harder to do.

I don't think this would be any harder than:

while True:
    session.id = generate_new_session_id()
    try:
        session.insert()
    except IntegrityError:  # duplicate key -- try again with a new id
        continue
    break

What I don't understand is: why don't the developers do the correct
thing by providing separate insert() and update() in addition to
save()? All it requires is a trivial refactoring, and it actually
simplifies the code (also clarifying the intent of the programmer).
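(Editorial aside: a hedged sketch of the refactoring Arvind describes --
not Django's actual code. `Table` is a stand-in for a database table,
and the names insert(), update(), and save() are the proposed API:)

```python
class DuplicateKey(Exception):
    pass

class Table:
    """Stand-in for a database table with a unique primary key."""

    def __init__(self):
        self.rows = {}

    def insert(self, pk, data):
        # Fails loudly instead of silently overwriting.
        if pk in self.rows:
            raise DuplicateKey(pk)
        self.rows[pk] = data

    def update(self, pk, data):
        self.rows[pk] = data

    def save(self, pk, data):
        # save() keeps its convenient semantics -- create if new,
        # otherwise overwrite -- built from the two primitives.
        if pk in self.rows:
            self.update(pk, data)
        else:
            self.insert(pk, data)
```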

The problem is not alone with sessions but any CRUD view.

Say you get input from normal users. And your current policy is that
users may only add new data but can't change existing data (typical
for comments, forums, feedbacks etc.). Now, if you just construct an
object from user supplied data and simply call save() -- on
maliciously crafted data -- you may end up overwriting existing data!

Without availability of INSERT, the only way you can save yourself is
by doing get_or_create() (or something similar) -- which is twice as
slow. Why imitate C when we have exceptions in Python? And I heard
something about being "perfectionist"...

Silently overwriting existing data has always been stupid, especially
when the data has come from humans!

Regards,
Arvind

Malcolm Tredinnick

Mar 14, 2007, 11:03:12 PM
to django-d...@googlegroups.com
On Thu, 2007-03-15 at 07:04 +0530, Arvind Singh wrote:
[...]

> What I don't understand is: Why the developers don't do the correct
> thing by providing separate insert() and update() in addition to
> save() ? All it requires is a trivial refactoring and actually
> simplifies the code (also clarifying the intent of the programmer).

One reason it isn't there is because many of the use cases that require
differentiating between updating and creating a new object can already
be implemented without extra code. Requiring an application developer to
always specify update or create anew when it can usually be inferred
just adds to their workload without reason.

> The problem is not alone with sessions but any CRUD view.
>
> Say you get input from normal users. And your current policy is that
> users may only add new data but can't change existing data (typical
> for comments, forums, feedbacks etc.). Now, if you just construct an
> object from user supplied data and simply call save() -- on
> maliciously crafted data -- you may end up overwriting existing data!

If you create a new object, leaving the primary key field empty and call
save() on it, Django will never overwrite existing data. You will only
overwrite existing data if you also specify the primary key field. So we
have a way to enforce "create" already for the typical uses (e.g.
comments, forums, feedbacks).

The problem being discussed in this thread is not about update vs.
creation, it's about enforcing uniqueness and how best to ensure that
when we save the object we won't have a validity constraint violation.

> Without availability of INSERT, the only way you can save yourself is
> by doing get_or_create() (or something similar) -- which is twice as
> slow. Why imitate C when we have exceptions in Python? And I heard
> something about being "perfectionist"...
>
> Silently overwriting existing data has always been stupid, especially
> when the data has come from humans!

You might find your concerns listened to a bit more sympathetically if
you adopted a less confrontational tone in your email. Calling the
developers stupid and being sarcastic about perfectionism is probably
not the best approach to suggesting an alternative (particularly when,
despite your belief, it's not necessarily the unique solution). There
are arguments for your approach, but it's not a slam-dunk and is also
not particularly germane to this thread. It's a discussion worth having
in a clam manner in another thread.

Thank you for your future consideration,
Malcolm

Jon Colverson

Mar 14, 2007, 11:10:07 PM
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> It's a discussion worth having in a clam manner in another thread.

Did anyone else picture mollusks having a conversation when they read
that? ;-)

--
Jon

Benjamin Slavin

Mar 14, 2007, 11:30:44 PM
to django-d...@googlegroups.com
On 3/14/07, James Bennett <ubern...@gmail.com> wrote:
> 1. Manually seeding the RNG with a value unique to the server process,
> or to the request, would move the probability of key collision closer
> to what it "should" be, thus dramatically reducing the problem.

Do we really need to re-seed the RNG? What if we did something like:
session_key = md5.new("%x%s%s" % (random.getrandbits(64),
datetime.datetime.utcnow(), settings.SECRET_KEY)).hexdigest()

datetime.datetime.utcnow could be replaced with whatever would be used
for the seed. You could include os.getpid if you desire.

This brings the differentiation into the key generation directly, not
the seed. If two seeds are the same, the result will be an identical
random number; however, this way, the two RNGs would have to be in
perfect sync -and- the functions would have to be called at the exact
same time to generate a pre-hash collision. This seems far less
likely than seeding at the same instant.
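(Editorial aside: Benjamin's one-liner, assembled into a helper. This is
a sketch, not Django's implementation: hashlib stands in for the old md5
module, SECRET_KEY is a placeholder for settings.SECRET_KEY, and the
os.getpid() mixing is the optional extra he mentions:)

```python
import hashlib
import os
import random
import time

SECRET_KEY = "example-secret"  # stands in for settings.SECRET_KEY

def new_session_key():
    # Mix the RNG output with per-call and per-process data before
    # hashing, so two workers whose RNGs are in lockstep would still
    # have to call this at the same instant, in the same pid, to
    # produce a pre-hash collision.
    material = "%x%r%d%s" % (random.getrandbits(64), time.time(),
                             os.getpid(), SECRET_KEY)
    return hashlib.md5(material.encode()).hexdigest()
```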


> 2. Changing the session framework's behavior so that it stores the
> session in the database as soon as it finds an unused key (e.g., by
> looping with 'get_or_create' until it gets a new object) would
> drastically reduce the odds of two independent processes with the same
> key each thinking that their key is unused.

A quick look at the code shows that "get_or_create" has the exact same problem.
It checks the DB, then, if the entry doesn't exist, follows the same
"obj.save()" path.
As a result, this solution would still leave a race condition very
similar to the one in the current implementation.

> Options 1 and 2 are relatively easy and could be implemented quickly.

I think that the general spirit of option 1 is in the right direction.
Option 2 might be an acceptable stop-gap, but it's still not
bulletproof.

> Option 3 is attractive, but would be harder to do.

I agree it would be harder, but it is probably the best option (and
the only bulletproof one we've discussed to date). I understand that
some of the refactoring work taking place will make it easier, so it
may be worth waiting on those changes.

- Ben

ak

Mar 15, 2007, 1:44:36 AM
to Django developers
Guys
The only thing I don't understand at the moment is: everyone
understands that the current implementation of the ORM, together with
the current sessions table, doesn't allow forcing an INSERT and
throwing an exception if it fails. So I can state this as a special
case. Every framework has limitations and exceptions; here we get one
of them. So why not just redesign the sessions table and change the
session implementation a bit to make it match the current abilities of
the ORM? It's so simple, and even more: it's already done!
The IP-checking stuff touched just 3 lines of code; it requires just
one minute to remove, or 4 minutes to make optional. We've already
spent much more time discussing this than it took me to write and
debug the code.

Malcolm Tredinnick

Mar 15, 2007, 2:12:07 AM
to django-d...@googlegroups.com

Design discussion is healthy. For example, there has been quite a good
amount of feedback on why IP-locking is not really a sensible default.
Of course, some of the discussion has gone around in circles with the
same points being raised again and again, but that's normal for any
thread longer than four or five posts. :-)

Most of your proposed changes seem reasonable. At the moment, it's
post-0.96 from my perspective, since the patch needs to be rewritten,
reviewed, etc., and hopefully we are really close to a 0.96 release.

The current version (in #3716) is not a 3 line patch. We're not likely
to commit an entire parallel version of session management. Either the
current version is sufficient or it should be fixed. If you could
rewrite it as a small fix against the existing session module, making
sure it remains backwards compatible, that would be great and help move
it forwards.

Regards,
Malcolm


ak

Mar 15, 2007, 6:09:48 AM
to Django developers
Malcolm, of course I can rewrite it as a patch to the current session
module, but I think you should understand that anyone who does svn
update would get a broken session module because of the extra field in
the django_session table.
If that is ok, please answer and I will prepare the patch in an hour
(without the IP-checking stuff).

Michael Radziej

Mar 15, 2007, 8:08:29 AM
to django-d...@googlegroups.com

The point is: it's not possible to extend the session in an
upwards-compatible way, because there's no way to make sure
that INSERT is used when you need to set the pk before save().

get_or_create() first tries a get(), and calls save() if the get() failed.
This still leaves an opportunity for a different process to add the same
session key in the time between get() and save().

It's possible to get this done with the help of another table, but
that would still need a syncdb() after an update, and it's also
a rather dirty hack.

So, it's either a change in the database API or a change in the
database table for sessions.

I'm +1 for extending the API to allow explicit inserts.

Arvind Singh

Mar 15, 2007, 11:54:57 AM
to django-d...@googlegroups.com
> One reason it isn't there is because many of the use cases that require
> differentiating between updating and creating a new object can already
> be implemented without extra code. Requiring an application developer to
> always specify update or create anew when it can usually be inferred
> just adds to their workload without reason.

I never suggested throwing away save(). save() (in semantics) should
remain as it is. Just its code should be broken into insert() and
update(), and it can insert() or update() depending on whether it got
a primary key as an argument or not.

Check here for simple example:
http://groups.google.com/group/django-developers/browse_thread/thread/6ddfb7da8319c6df/c2d3671b6a6fec5f#msg_56de2a0ef2b884cc


> If you create a new object, leaving the primary key field empty and call
> save() on it, Django will never overwrite existing data. You will only
> overwrite existing data if you also specify the primary key field. So we
> have a way to enforce "create" already for the typical uses (e.g.
> comments, forums, feedbacks).

I do understand that, but when one gets input from the user, one
typically iterates through the keys in the REQUEST dictionary and
populates the necessary fields. In the present way of doing things,
the application developer has to check and ignore id values if he
doesn't want an update in a view. This is unnecessary manual checking.
Please think about it for a moment.

> The problem being discussed in this thread is not about update vs.
> creation, it's about enforcing uniqueness and how best to ensure that
> when we save the object we won't have a validity constrain violation.

IMHO, the core problem is the absence of insert() and update(). Without
the availability of insert()-like semantics, one has to do inefficient
(and still not perfect) manual checking to enforce uniqueness of
primary keys. This problem occurs with any model that wants to use a
custom primary key.

> You might find your concerns listened to a bit more sympathetically if
> you adopted a less confrontational tone in your email. Calling the
> developers stupid and being sarcastic about perfectionism is probably
> not the best approach to suggesting an alternative (particularly when,
> despite your belief, it's not necessarily the unique solution). There
> are arguments for your approach, but it's not a slam-dunk and is also
> not particularly germane to this thread. It's a discussion worth having
> in a clam manner in another thread.

Sorry for the sarcasm. But if you check the previous thread:
http://groups.google.com/group/django-developers/browse_thread/thread/6ddfb7da8319c6df/c2d3671b6a6fec5f
you may understand my frustration. I love Django and that's why I
cared to report back the problem (with solution).

The design of the ORM (in this regard) is unnecessarily restrictive and
leads to uniqueness problems whenever custom primary keys (anything
other than auto-incrementing integers supplied by the underlying
database) are used. The other proposed solutions to the sessions
problem are either very inefficient or not perfect.

IMHO, the problem should be fixed at the ORM level (especially, when
the fix is trivial). It would also be useful for people (like me) who
want to use custom primary keys in their models.


Arvind

Waylan Limberg

Mar 15, 2007, 12:42:02 PM
to django-d...@googlegroups.com
On Wed, 14 Mar 2007 20:34:11 -0500, Arvind Singh <arvind...@gmail.com>
wrote:

Because doing so would introduce some backward incompatibilities.
Therefore, if this solution is considered, it will *only* be considered
*after* 0.96 is released. That said, this is still a bug that needs to
be addressed at some level prior to 0.96, and that is where options 1
& 2 come in.

>
> The problem is not alone with sessions but any CRUD view.

And that is why it has been said a few times already that this will be
looked at when QuerySet is refactored -- again, *after* 0.96. I'd really
like to see separate insert() and update() as well, but to push for them
before 0.96 would be wasting my breath, as the devs have made it very
clear (here and elsewhere) that that is not going to happen; they've
promised backward compatibility for 0.96.

>
> Say you get input from normal users. And your current policy is that
> users may only add new data but can't change existing data (typical
> for comments, forums, feedbacks etc.). Now, if you just construct an
> object from user supplied data and simply call save() -- on
> maliciously crafted data -- you may end up overwriting existing data!
>
> Without availability of INSERT, the only way you can save yourself is
> by doing get_or_create() (or something similar) -- which is twice as
> slow. Why imitate C when we have exceptions in Python? And I heard
> something about being "perfectionist"...
>
> Silently overwriting existing data has always been stupid, especially
> when the data has come from humans!
>
> Regards,
> Arvind
>
> >

--
Waylan Limberg
way...@gmail.com

David Danier

Mar 15, 2007, 1:01:30 PM
to django-d...@googlegroups.com
> Because doing so would introduce some backward incompatibilities.

That's not true, if you provide a save() method that calls either
update() or insert(). save() should not be removed; it makes many
things very easy. So only additional flexibility is added, without
removing or changing any interface.

Greetings, David Danier

James Bennett

Mar 15, 2007, 12:47:13 PM
to django-d...@googlegroups.com
On 3/14/07, Arvind Singh <arvind...@gmail.com> wrote:
> What I don't understand is: Why the developers don't do the correct
> thing by providing separate insert() and update() in addition to
> save() ? All it requires is a trivial refactoring and actually
> simplifies the code (also clarifying the intent of the programmer).

I can't speak for anyone else, but...

Notice how I haven't said "I don't think we should do this". I've been
saying "I think the real source of the sessions bug is somewhere else,
and we should fix that first, then look at INSERT vs. UPDATE when we
do the QuerySet refactoring".

I agree that the INSERT/UPDATE distinction is something we should look
at, but I feel like this isn't really the right time; making an API
change in response to a bug -- even when it seems like a trivial
change -- often has unpleasant consequences, simply because it means
making the change quickly and under pressure.

So I'd prefer to deal with the key-collision problem at the source by
doing what we can to stop it from happening in the first place, and
then deal with INSERT/UPDATE when we don't have the specter of a nasty
bug hanging over our heads and pushing us to rush things. And since
we're already going to be looking at QuerySet after 0.96, I feel like
that's a natural time to sit down and look at model saving and INSERT
versus UPDATE.

Tom Barta

Mar 15, 2007, 1:25:55 PM
to django-d...@googlegroups.com
I only get digest messages from this group, so I've responded to
several messages inline. Apologies in advance if this breaks
threading.

James Bennett said:
> On 3/13/07, Michael Radziej <m...@noris.de> wrote:
> > a) the client can sit behind a NAT that might hand out different IPs
> > b) the server can be behind NAT and not see the true IP at all. It
> > might see different IPs for the same client over time.
> > c) a crazy load balancer might get in the way
>
> (or any proxy, in which case REMOTE_IP will be the same for every
> request and you'd want to be looking at the 'X-Forwarded-For' header
> -- this is one of the problems with IP-based sessions)

Also, X-Forwarded-For is not a standard header - it's something Squid
does. Granted, Squid is the most common proxy, and most other proxies
will follow its lead in supplying this header, but there is absolutely
no guarantee that an HTTP/1.1 proxy will send this header.

Simon G. said:
> I actually passed this onto the security mailing address, as I thought
> it was better to be safe than sorry. Adrian's response was that tying
> things to the IP address is not a good idea (for the reasons that
> others have stated in this thread).

Agreed.

> One thing that I would like to suggest is that we do link the SID to a
> user-agent (see Chris Shiflett's excellent article on this at [1]), as
> this provides some protection (-via-obscurity) against these
> collisions, since the attacker not only needs to get a valid SID, but
> match it to the correct UA string. There's also no reason for a
> "normal" user to change UA strings without needing to login again.
>
> This can easily be added to the existing session framework by adding
> it to the hashing

I think the problem with doing that is a bit of a false sense of
security about collisions. With a full apartment complex behind a
NAT, there are easily 100 people surfing the net using the same
version of Internet Explorer, so adding user-agent doesn't solve
anything, it just pushes it down to "fewer people get screwed".

James Bennett said:
> Python's random-number generator uses the Mersenne Twister, and in
> Python 2.3 it seeds itself with the current system time (and in 2.4 it
> tries system-supplied randomness facilities first, then falls back to
> system time).
>
> I'm not an expert on randomness by any means, but the problem we have
> here is the same pair of random numbers being generated at the same
> time or within a very short time of one another, and my first
> inclination is to look at the fact that we're using a deterministic
> RNG seeded -- by default -- with the system time.
>
> So instead of trying to ensure safety at the database level (because,
> frankly, unless we go to extremes we're not going to manage that), why
> not look at better seeds for the RNG and solve the real problem?

If this race condition occurs between different processes, os.getpid()
will provide randomness. If this race condition occurs between
different threads in the same process, the file descriptor (integer)
of the connected socket will be different, because they are different
connections. Does Django have access to the Python socket object so
it can get sock.fileno() ?

Some combination of current time, current process id, and current file
descriptor should be enough to avoid MT seed race conditions.
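(Editorial aside: a sketch of that seeding recipe -- wall-clock time
plus process id. The socket's fileno() is omitted here, since getting
at it depends on the server's internals:)

```python
import os
import random
import time

# Seed a per-worker generator from data unlikely to coincide across
# processes: current time plus current pid. Two workers spawned in the
# same tick still get different pids, hence different seeds.
seed_material = "%r:%d" % (time.time(), os.getpid())
rng = random.Random(seed_material)
print("%08x" % rng.getrandbits(32))
```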

ak said:
> 1. This option [IP-based session checking]
> can be disabled by default and only enabled optionally
> by developer (like in php, as i said)

One of the benefits I see in Django is allowing non-web-application
developers to "get things right" without really having to understand
everything that they're doing. Giving developers an option to use
IP-based checking has several downsides:
- it encourages using obscurity as an alternative to security
- it encourages developers to do something that is known to have very
real problems (the old AOL-proxy argument).

Just because PHP does something doesn't mean that it's a good idea.

If you really are worried about XSS attacks and the like, you should
look at Wordpress. All of their administrative actions are protected
by nonces, mitigating the damage that can be done by a successful
cookie theft (and it doesn't depend on proxies or clients "following
the rules"). Not that WP is a shining example of secure programming,
but their strict adherence to nonces does provide a good second layer
of defense.

> 2. I have an ethernet connection @home and I sometimes log in to our
> private web apps from home. Any 'c00l hacker' is able to scan network
> traffic, get my session id and use it to join to my session too just
> because there is absolutely no checking who uses session. We added ip
> checking at our application level code but why don't implement this at
> django side as an option ? Noone says this is 100% guarantee of
> getting rid of man-in-the-middle, but at least it prevents from
> successful hacks from kiddies when using untrusted networks (ie public
> wi-fi, ethernet, etc).

HTTPS for public sites or a secure VPN for company intranets.
Anything else I've seen suggested in this thread is defeatable by
someone sufficiently dedicated.

--
Tom Barta

Malcolm Tredinnick

Mar 15, 2007, 3:04:07 PM
to django-d...@googlegroups.com

It would be better to be backwards compatible. Certainly no backwards
incompatible solution will go in before 0.96 and once we make that
decision, it's worth taking the time required to make sure we have
examined all options.

Maybe we need to look at explicit insert() vs. update(), maybe one of
the alternatives is possible. Adding an extra field to the sessions
table would be a last resort, since no real extra information is
carried in that field -- it's an implementation workaround, by the
looks of it. So it looks like there is still some design work and
implementation experimenting that needs to be done here.

Regards,
Malcolm


James Bennett

Mar 15, 2007, 3:14:26 PM
to django-d...@googlegroups.com
On 3/14/07, Benjamin Slavin <benjami...@gmail.com> wrote:
> Do we really need to re-seed the RNG? What if we did something like:
> session_key = md5.new("%x%s%s" % (random.getrandbits(64),
> datetime.datetime.utcnow(), settings.SECRET_KEY)).hexdigest()

That works too :)

I'm just wanting anything which reduces the odds of a collision,
because it feels like some deployments are getting lots of them (I
have a suspicion this is system-specific because of how Python
initially seeds its RNG).

> This brings the differentiation into the key generation directly, not
> the seed. If two seeds are the same, the result will be an identical
> random number; however, this way, the two RNGs would have to be in
> perfect sync -and- the functions would have to be called at the exact
> same time to generate a pre-hash collision. This seems far less
> likely than seeding at the same instant.

I think the collision problems are largely coming from groups of
server processes being spawned at the same time (anecdotally, someone
mentioned to me that on Windows Python can't get sub-second precision
for the initial time seed which, if correct, may explain a lot of
this).

> A quick look at the code shows that "get_or_create" has the exact same problem.
> It checks the DB, then, if the entry doesn't exist, follows the same
> "obj.save()" path.
> As a result, this solution would still result in a race condition very
> similar to the current implementation.

Yes and no. Part of the problem right now, I think, is that we wait to
store the session, which increases the window in which a collision can
cause problems -- immediately saving the session as soon as we hit an
unused key reduces that window to, most likely, a fraction of a
second, which requires stupendous coincidences to screw up. I'd feel
comfortable with a combination of this and the above option as the
initial fix.

> I agree it would be harder, but it is probably the best option (and
> the only bulletproof one we've discussed to date). I understand that
> some of the refactoring work taking place will make it easier, so it
> may be worth waiting on those changes.

Agreed; again, I feel like trying to change the API or the mechanics
of model saving while we have a bug like this hanging over our heads
is a recipe for trouble down the line -- better to solve the
key-collision problem right now, and then have a little breathing room
to work out a model-save solution we can hang on to.

James Bennett

Mar 15, 2007, 10:49:11 PM
to django-d...@googlegroups.com
On 3/15/07, Michael Radziej <m...@noris.de> wrote:
> The point is: It's not possible to extend the session in a
> upwards-compatible way, because there's no way to make sure
> that INSERT is used when you need to set the pk before save().

Again, I'm not disagreeing with the INSERT/UPDATE suggestion, I'm just
saying that a change to the model API should not take place while we
have a bug hanging over our heads; let's fix the random-number
collision, reduce the window between getting a session key and saving
the session, and then look at model saving once we can breathe again.
Making a design decision like this under the pressure of a bug that we
all want fixed is not a good idea ;)
