Identity, tgrepozewho user permissions

Janzert

unread,

Jul 3, 2008, 12:57:46 AM7/3/08

to turbo...@googlegroups.com

Today while working on a project not using TG I was poking around the
identity code looking to see what the method was to get all of a user's
permissions. I was rather surprised to see that every group the user is
a member of is iterated through to build up the permissions[1]. Is there
a reason that a sql join isn't used instead[2]? I would have thought a
join was better performing.

Just wondering what I'm overlooking or misunderstanding,
Janzert

1. All the TG related code I found uses something similar to this:
@property
def permissions(self):
p = set()
for g in self.groups:
p |= set(g.permissions)
return p

2. Why not something like the following (untested code)?
@property
def permissions(self):
p_q = session.query(Permission).join(['tg_group', 'tg_user'])
p_q.filter_by(user_id = self.user_id)
return p_q.all()

Christopher Arndt

unread,

Jul 3, 2008, 8:48:37 AM7/3/08

to turbo...@googlegroups.com

Janzert schrieb:

> 1. All the TG related code I found uses something similar to this:
> @property
> def permissions(self):
> p = set()
> for g in self.groups:
> p |= set(g.permissions)
> return p
>
> 2. Why not something like the following (untested code)?
> @property
> def permissions(self):
> p_q = session.query(Permission).join(['tg_group', 'tg_user'])
> p_q.filter_by(user_id = self.user_id)
> return p_q.all()

No reason than simplicity and readability, I think. There's always a
tradeoff between optimization and readable code. Patches are welcome,
though. But please make sure that it works with all SQLAlchemy version
>=0.3.

Chris

Florent Aide

unread,

Jul 3, 2008, 9:04:42 AM7/3/08

to turbo...@googlegroups.com

I'll read that one and test, I like the idea of sql optimizing the
identity check sounds interesting for tg1 and tg2.... :)
Could you provide a patch in our trac system so that I don't lose that nugget?

Florent.

Dean Landolt

unread,

Jul 3, 2008, 11:31:38 AM7/3/08

to turbo...@googlegroups.com

For what it's worth, I would argue the generative SA query should be more legible to a TG coder than the ior logic (which, frankly, I'm still a little iffy on).

Christoph Zwerschke

unread,

Jul 3, 2008, 5:25:57 PM7/3/08

to turbo...@googlegroups.com

Janzert schrieb:

> 2. Why not something like the following (untested code)?
> @property
> def permissions(self):
> p_q = session.query(Permission).join(['tg_group', 'tg_user'])
> p_q.filter_by(user_id = self.user_id)
> return p_q.all()

I would write something like that in a long statement, without the
arbitrary p_q variable and avoid using hard coded table names.

The other problem is that this code doesn't work at all.

Also, the method must return a set, not a list with duplicates.

There are basically two ways to collect the permissions via SQL.

Either you make the two joins, creating all possible ways a user can
have a certain permission, and then remove the duplicates, like that:

def permissions(self):
return set(Permission.query.join(Permission.groups,
Group.users).filter_by(user_id=self.user_id).all())

Or you create the query using exist clauses, so that the query does not
produce any duplicates in the first place:

def permissions(self):
return set(Permission.query.filter(Permission.groups.any(
Group.users.any(User.user_id==self.user_id))).all())

The problem with this latter solution only works with SA >= 0.4 (and the
former solution even only with SA >= 0.5).

(Also, both solutions do a last join with the user table that is
actually unnecessary. I currently see no elegant way to avoid this. It
does not harm much, though.)

-- Christoph

Janzert

unread,

Jul 4, 2008, 4:22:20 AM7/4/08

to turbo...@googlegroups.com

Christoph Zwerschke wrote:
> Janzert schrieb:
>> 2. Why not something like the following (untested code)?
>> @property
>> def permissions(self):
>> p_q = session.query(Permission).join(['tg_group', 'tg_user'])
>> p_q.filter_by(user_id = self.user_id)
>> return p_q.all()
>
> I would write something like that in a long statement, without the
> arbitrary p_q variable and avoid using hard coded table names.
>
> The other problem is that this code doesn't work at all.
>

Like I said it was untested. :/

> Also, the method must return a set, not a list with duplicates.
>
> There are basically two ways to collect the permissions via SQL.
>
> Either you make the two joins, creating all possible ways a user can
> have a certain permission, and then remove the duplicates, like that:
>
> def permissions(self):
> return set(Permission.query.join(Permission.groups,
> Group.users).filter_by(user_id=self.user_id).all())
>
> Or you create the query using exist clauses, so that the query does not
> produce any duplicates in the first place:
>
> def permissions(self):
> return set(Permission.query.filter(Permission.groups.any(
> Group.users.any(User.user_id==self.user_id))).all())
>
> The problem with this latter solution only works with SA >= 0.4 (and the
> former solution even only with SA >= 0.5).
>
> (Also, both solutions do a last join with the user table that is
> actually unnecessary. I currently see no elegant way to avoid this. It
> does not harm much, though.)
>
> -- Christoph
>

My understanding and testing seem to show that an inner join doesn't
return duplicates. Wrapping the result in a set doesn't seem necessary.

After playing around with SA 0.3.10, 0.4.6 and 0.5.0beta1 and taking
into account your comments I came up with this.

def permissions(self):
return Permission.query.join(["groups",
"users"]).filter_by(user_id=self.user_id).all()
permissions = property(permissions)

I tested by setting up a user belonging to two groups with each of them
containing two permissions. One of the permissions was unique to the
group and one was the same in both groups.

I put a diff from a TG 1.0.5 quickstarted project to what I used for
testing at http://paste.turbogears.org/paste/3056 and the full project
at http://janzert.com/files/jointest.zip

If this seems the right way to go I can create a trac issue and try to
work up a real patch.

Janzert

Christoph Zwerschke

unread,

Jul 4, 2008, 5:29:23 AM7/4/08

to turbo...@googlegroups.com

Janzert schrieb:

> My understanding and testing seem to show that an inner join doesn't
> return duplicates. Wrapping the result in a set doesn't seem necessary.

The query creates duplicates, and they are pulled from the database, but
they are removed by SQLAlchemy because you are querying object instances
and it makes no sense for SQLAlchemy to deliver duplicates in this case.

Nevertheless, the result should be converted to a set for backward
compatibility and performance reasons - you usually check for a certain
permission with "in", and that will be faster for a set than for a list.

> After playing around with SA 0.3.10, 0.4.6 and 0.5.0beta1 and taking
> into account your comments I came up with this.
>
> def permissions(self):
> return Permission.query.join(["groups",
> "users"]).filter_by(user_id=self.user_id).all()
> permissions = property(permissions)

Yes, this seems to work with all these SA versions. And the names in the
join here are attribute names, not table names, so this is ok.

Still, this join query produces more results than necessary. You can
verify this by using count() instead of all() - you will see more
results counted than elements in the list. As I explained above, the
duplicates are filtered away by SQLAlchemy because the query requests
object instances, that's why you don't see them. So a solution using
"any" (exist clauses) may still be more performant.

(Of course ehis is all a bit theoretical since users are usually not in
hundreds of groups, but we should try to do it right anyway.)

-- Christop

Christoph Zwerschke

unread,

Jul 4, 2008, 6:03:45 AM7/4/08

to turbo...@googlegroups.com

Christoph Zwerschke wrote:
> Still, this join query produces more results than necessary.

As a compromize we could eliminate the duplicates on the database level
already so that they are not returned by the query:

def permissions(self):
return set(Permission.query.distinct().join(['groups',
'users']).filter_by(user_id=self.user_id))

Note that I also removed the all(): It is not necessary if we convert to
a set, since the query itself is iterable.

-- Christoph

Janzert

unread,

Jul 4, 2008, 6:33:08 AM7/4/08

to turbo...@googlegroups.com

Christoph Zwerschke wrote:
> Janzert schrieb:
>> My understanding and testing seem to show that an inner join doesn't
>> return duplicates. Wrapping the result in a set doesn't seem necessary.
>
> The query creates duplicates, and they are pulled from the database, but
> they are removed by SQLAlchemy because you are querying object instances
> and it makes no sense for SQLAlchemy to deliver duplicates in this case.

Ah yes, by running the generated sql directly I see that now. I'll try
and see if there is a way to generate the exists clause that is
compatible with 0.3 on up.

Janzert

Christopher Arndt

unread,

Jul 4, 2008, 8:00:21 AM7/4/08

to turbo...@googlegroups.com

Janzert schrieb:

> I put a diff from a TG 1.0.5 quickstarted project to what I used for
> testing at http://paste.turbogears.org/paste/3056 and the full project
> at http://janzert.com/files/jointest.zip
>
> If this seems the right way to go I can create a trac issue and try to
> work up a real patch.

Please provide a patch to the quickstart templates resp. the identity
module then. Please also provide tests and mention in your ticket if
anything needs to be added / altered in the Identity docs

http://docs.turbogears.org/1.0/UsingIdentity

Thx, Chris

Janzert

unread,

Jul 4, 2008, 11:36:54 AM7/4/08

to turbo...@googlegroups.com

I created a ticket at http://trac.turbogears.org/ticket/1879 no patch or
anything yet though.

One thing looking at the current identity tests, 1.1 is only testing the
soprovider. Should a second ticket be opened to create saprovider tests?

There shouldn't need to be any documentation changes (or really even
test changes if the current tests had covered it) because the change is
functionally equivalent just better performing.

Janzert

unread,

Jul 4, 2008, 11:38:09 PM7/4/08

to turbo...@googlegroups.com

Before working on getting a SA 0.3 compatible version I decided to take
a look at the speed differences. I initialized the database with a user
that belonged to 20 groups with each group having 11 out of 60
permissions. The timing below were done with SA 0.5 and postgresql
8.3.1. I also tried sqlite and the timings were basically the same.

python -m timeit -s"import perm_test;perm_test.iter()"
"perm_test.iter()" 10000 loops, best of 3: 94.9 usec per loop

python -m timeit -s"import perm_test;perm_test.join()"
"perm_test.join()" 100 loops, best of 3: 13.1 msec per loop

python -m timeit -s"import perm_test;perm_test.exist()"
"perm_test.exist()" 100 loops, best of 3: 14.7 msec per loop

As you can see the current method of iterating through the groups is
actually 130 to 150 times faster than single query methods.

I've attached my test project and scripts to the trac issue* if someone
wants to double check my methods.

Just now I also tried a test with 500 groups, 800 permissions and 250
permissions per group, for what should be a torture test to the current
method.

python -m timeit -s"import perm_test;perm_test.iter()" "perm_test.iter()"
100 loops, best of 3: 16.9 msec per loop

python -m timeit -s"import perm_test;perm_test.join()" "perm_test.join()"
10 loops, best of 3: 808 msec per loop

python -m timeit -s"import perm_test;perm_test.exist()" "perm_test.exist()"
10 loops, best of 3: 256 msec per loop

While a little closer still the current method is much better. As a side
note while watching the process usage during these test, the iterative
methods has most of the cpu usage in python, the join query mostly in
postgres and the exist query is split about evenly.

Janzert

* http://trac.turbogears.org/ticket/1879

Christoph Zwerschke

unread,

Jul 5, 2008, 10:23:56 AM7/5/08

to turbo...@googlegroups.com

Janzert schrieb:

> I've attached my test project and scripts to the trac issue* if
> someone wants to double check my methods.

I'm using the following add-on, maybe it's useful for you, too ;-)
https://addons.mozilla.org/en-US/thunderbird/addon/5759

> As you can see the current method of iterating through the groups is
> actually 130 to 150 times faster than single query methods.

I think this is because the current method uses the group.permissions
lists in the database session if possible. I.e. if you run it in a loop,
it will become very efficient. However, in reality this function will be
called only once or a few times per request, always in a fresh session.
You can simulate this with turbogears.database.session.expire_all().
Doing so, I measured that the join method is about 3 times faster and
the exists method is about 2 times faster than the old method (similar
for both SQLite and Postgres 8.3).

So it depends on how people use the permissions attribute. If used more
than 2 times per request, the old method is faster. One solution for
getting good performance in both cases is using the join method and
caching the permissions in the User instance:

def permissions(self):
try:
p = User._permissions
except AttributeError:
p = set(Permission.query.distinct().join(['groups',
'users']).filter_by(user_id=self.user_id))
User._permissions = p
return p

Another interesting question is why the join method is in fact faster
than the exists method, though you would expect the opposite. See also:

http://wrschneider.blogspot.com/2004/12/postgresql-performance-of-where-exists.html

In this case we have a chained exists, maybe this makes it difficult to
optimize for the database.

-- Christoph

Janzert

unread,

Jul 5, 2008, 9:00:16 PM7/5/08

to turbo...@googlegroups.com

Christoph Zwerschke wrote:
> Janzert schrieb:
> > I've attached my test project and scripts to the trac issue* if
> > someone wants to double check my methods.
>
> I'm using the following add-on, maybe it's useful for you, too ;-)
> https://addons.mozilla.org/en-US/thunderbird/addon/5759
>

Yeah, I probably would find that useful. But in this case I really did
mean "I've attached ... to the trac issue*". With the url at the end of
the message. :)

* http://trac.turbogears.org/ticket/1879

>> As you can see the current method of iterating through the groups is
>> actually 130 to 150 times faster than single query methods.
>
> I think this is because the current method uses the group.permissions
> lists in the database session if possible. I.e. if you run it in a loop,
> it will become very efficient. However, in reality this function will be
> called only once or a few times per request, always in a fresh session.
> You can simulate this with turbogears.database.session.expire_all().
> Doing so, I measured that the join method is about 3 times faster and
> the exists method is about 2 times faster than the old method (similar
> for both SQLite and Postgres 8.3).
>

Ah, yes some sort of caching going on would certainly make sense. I'll
take another look at it.

Janzert

Christoph Zwerschke

unread,

Jul 6, 2008, 5:58:52 AM7/6/08

to turbo...@googlegroups.com

Janzert schrieb:

> Christoph Zwerschke wrote:
>> Janzert schrieb:
>> > I've attached my test project and scripts to the trac issue* if
>> > someone wants to double check my methods.
>>
>> I'm using the following add-on, maybe it's useful for you, too ;-)
>> https://addons.mozilla.org/en-US/thunderbird/addon/5759
>
> Yeah, I probably would find that useful. But in this case I really did
> mean "I've attached ... to the trac issue*". With the url at the end of
> the message. :)

Oh well :) I think I need another add-on to help me not reading over
such important details. Somehow I totally missed you created a ticket.
So I suggest we continue the discussion there.

-- Chris

Reply all

Reply to author

Forward