How to map pg_user and pg_group?

430 views
Skip to first unread message

Christoph Zwerschke

unread,
Oct 16, 2006, 12:16:26 PM10/16/06
to sqlal...@googlegroups.com
I'm trying to tweak the TurboGears identity management to use the
PostgreSQL users and groups instead of its own user and group tables.
The users/groups must be mapped to a "User"/"Group" class with standard
attribute names and additional properties "groups"/"users" to determine
the groups of which a user is a member / the members of a group.

I have realized it as follows, but the artificial association table
pg_groupusers feels a bit awkward. Is there an easier way?

pg_user = Table('pg_user', metadata,
Column('usesysid', Integer, primary_key=True),
Column('usename', String, unique=True),
Column('usesuper', Boolean))

pg_group = Table('pg_group', metadata,
Column('grosysid', Integer, primary_key=True),
Column('groname', String, unique=True),
Column('grolist', String))

pg_groupusers = select(
[pg_user.c.usesysid, pg_group.c.grosysid],
pg_user.c.usesysid==func.any(pg_group.c.grolist)
).alias('pg_groupusers')

class User(object):
def __init__(self, user_id, user_name, is_super):
self.user_id = user_id
self.user_name = user_name
self.is_super = is_super

class Group(object):
def __init__(self, group_id, group_name, group_list):
self.group_id = group_id
self.group_name = group_name
self.group_list = group_list

mapper(User, pg_user, properties={
'user_id': pg_user.c.usesysid,
'user_name': pg_user.c.usename,
'is_super': pg_user.c.usesuper,
'groups': relation(Group, secondary=pg_groupusers,
primaryjoin=pg_user.c.usesysid==pg_groupusers.c.usesysid,
secondaryjoin=pg_group.c.grosysid==pg_groupusers.c.grosysid)})

mapper(Group, pg_group, properties={
'group_id': pg_group.c.grosysid,
'group_name': pg_group.c.groname,
'users': relation(User, secondary=pg_groupusers,
primaryjoin=pg_group.c.grosysid==pg_groupusers.c.grosysid,
secondaryjoin=pg_user.c.usesysid==pg_groupusers.c.usesysid)})

Michael Bayer

unread,
Oct 16, 2006, 12:31:20 PM10/16/06
to sqlal...@googlegroups.com
in theory you should be able to do this:

'groups':relation(Group, primaryjoin=pg_user.c.usesysid==func.any
(pg_group.c.grolist), viewonly=True)

notice the "viewonly" flag which is in the trunk only, which will
tell SA not to try persisting that mapping (since it cant)...i would
gather that was the problem you had if you had tried it this way before.

Christoph Zwerschke

unread,
Oct 17, 2006, 5:38:26 AM10/17/06
to sqlal...@googlegroups.com
Michael Bayer schrieb:

> in theory you should be able to do this:
>
> 'groups':relation(Group, primaryjoin=pg_user.c.usesysid==func.any
> (pg_group.c.grolist), viewonly=True)
>
> notice the "viewonly" flag which is in the trunk only, which will
> tell SA not to try persisting that mapping (since it cant)...i would
> gather that was the problem you had if you had tried it this way before.

Without the "viewonly" flag, I got "No syncrules generated" errors.
These have disappeared, but there are still problems here.

Again, assume the following setup:

pg_user = Table('pg_user', metadata,
Column('usesysid', Integer, primary_key=True),
Column('usename', String, unique=True),
Column('usesuper', Boolean))

pg_group = Table('pg_group', metadata,
Column('grosysid', Integer, primary_key=True),
Column('groname', String, unique=True),
Column('grolist', String))

class User(object):
pass

class Group(object):
pass

Now when I map as follows:

mapper(User, pg_user, properties={
'user_id': pg_user.c.usesysid,
'user_name': pg_user.c.usename,
'is_super': pg_user.c.usesuper,

'groups': relation(Group, viewonly=True,
primaryjoin=pg_user.c.usesysid==func.any(pg_group.c.grolist))})

I get this error:

sqlalchemy.exceptions.ArgumentError: On relation 'groups', can't figure
out which side is the foreign key for join condition 'pg_user.usesysid =
any(pg_group.grolist)'. Specify the 'foreignkey' argument to the relation.

Shouldn't it be clear what the foreign key is in this situation?

Now when I explicitly specify the foreign key (as pg_group.grosysid or
pg_group.grolist, doesn't matter),

mapper(User, pg_user, properties={
'user_id': pg_user.c.usesysid,
'user_name': pg_user.c.usename,
'is_super': pg_user.c.usesuper,

'groups': relation(Group, viewonly=True,
primaryjoin=pg_user.c.usesysid==func.any(pg_group.c.grolist),
foreignkey=pg_group.c.grosysid)})

Then the groups property returns all existing groups, not the groups of
the corresponding user (the following query is echoed by the engine):

SELECT pg_group.grolist, pg_group.grosysid, pg_group.groname
FROM pg_user, pg_group
WHERE pg_user.usesysid = any(pg_group.grolist)
ORDER BY pg_group.grosysid

-- Christoph

Michael Bayer

unread,
Oct 17, 2006, 6:43:39 AM10/17/06
to sqlal...@googlegroups.com

On Oct 17, 2006, at 5:38 AM, Christoph Zwerschke wrote:
> mapper(User, pg_user, properties={
> 'user_id': pg_user.c.usesysid,
> 'user_name': pg_user.c.usename,
> 'is_super': pg_user.c.usesuper,
> 'groups': relation(Group, viewonly=True,
> primaryjoin=pg_user.c.usesysid==func.any
> (pg_group.c.grolist))})
>
> I get this error:
>
> sqlalchemy.exceptions.ArgumentError: On relation 'groups', can't
> figure
> out which side is the foreign key for join condition
> 'pg_user.usesysid =
> any(pg_group.grolist)'. Specify the 'foreignkey' argument to the
> relation.
>
> Shouldn't it be clear what the foreign key is in this situation?

to a human, maybe. to a python interpreter the right side of the
binary '==' expression is just a sqlalchemy.sql.Function, which looks
nothing like the sqlalchemy.schema.Column type which it expects to
locate as a foreign key.

> Now when I explicitly specify the foreign key (as pg_group.grosysid or
> pg_group.grolist, doesn't matter),
>
> mapper(User, pg_user, properties={
> 'user_id': pg_user.c.usesysid,
> 'user_name': pg_user.c.usename,
> 'is_super': pg_user.c.usesuper,
> 'groups': relation(Group, viewonly=True,
> primaryjoin=pg_user.c.usesysid==func.any(pg_group.c.grolist),
> foreignkey=pg_group.c.grosysid)})
>
> Then the groups property returns all existing groups, not the
> groups of
> the corresponding user (the following query is echoed by the engine):
>
> SELECT pg_group.grolist, pg_group.grosysid, pg_group.groname
> FROM pg_user, pg_group
> WHERE pg_user.usesysid = any(pg_group.grolist)
> ORDER BY pg_group.grosysid

for the "lazy clause" generation, which is when it takes "x=y" and
converts it into "x=?", this is the same issue as the foreign key.
but for this, i have committed in rev 2001 a more thorough search for
a "Column" in each side of the clause so that it can identify which
side of a "=" operation it can apply a bind parameter to, so a test
program can now generate:

SELECT pg_group.groname AS pg_group_groname, pg_group.grolist AS
pg_group_grolist, pg_group.grosysid AS pg_group_grosysid
FROM pg_catalog.pg_group
WHERE %(lazy_bcba)s = any(pg_group.grolist) ORDER BY pg_group.grosysid

a similar approach might apply to the "foreignkey" detection issue
although i like to keep these "clause analysis" functions as
conservative as possible, since forcing explicitness in the case of
unusual configurations (joining on a function is an unusual
configuration) reduces the chances of surprise behavior. in the case
of the "lazy clause" i would rather not have users start to worry
about manually defining those.


Christoph Zwerschke

unread,
Oct 17, 2006, 7:34:42 AM10/17/06
to sqlal...@googlegroups.com
Michael Bayer wrote:
> Christoph Zwerschke wrote:
>> mapper(User, pg_user, properties={
>> 'user_id': pg_user.c.usesysid,
>> 'user_name': pg_user.c.usename,
>> 'is_super': pg_user.c.usesuper,
>> 'groups': relation(Group, viewonly=True,
>> primaryjoin=pg_user.c.usesysid==func.any
>> (pg_group.c.grolist))})
>>
>> I get this error:
>>
>> sqlalchemy.exceptions.ArgumentError: On relation 'groups', can't
>> figure out which side is the foreign key for join condition
>> 'pg_user.usesysid = any(pg_group.grolist)'. Specify the
>> 'foreignkey' argument to the relation.
>>
>> Shouldn't it be clear what the foreign key is in this situation?
>
> to a human, maybe. to a python interpreter the right side of the
> binary '==' expression is just a sqlalchemy.sql.Function, which looks
> nothing like the sqlalchemy.schema.Column type which it expects to
> locate as a foreign key.

But that expression needs to be parsed to SQL anyway, and doing so all
involved columns could be tracked. Another clue is that the class for
the relation object has been stated as 'Group' which is mapped to
pg_group. Why do you actually need a Foreign*Key*, not a Foreign*Table*?

> for the "lazy clause" generation, which is when it takes "x=y" and
> converts it into "x=?", this is the same issue as the foreign key.
> but for this, i have committed in rev 2001 a more thorough search for
> a "Column" in each side of the clause so that it can identify which
> side of a "=" operation it can apply a bind parameter to, so a test
> program can now generate:
>
> SELECT pg_group.groname AS pg_group_groname, pg_group.grolist AS
> pg_group_grolist, pg_group.grosysid AS pg_group_grosysid
> FROM pg_catalog.pg_group
> WHERE %(lazy_bcba)s = any(pg_group.grolist) ORDER BY pg_group.grosysid

Thanks, this works great now!

-- Christoph

Michael Bayer

unread,
Oct 17, 2006, 12:35:32 PM10/17/06
to sqlalchemy

Christoph Zwerschke wrote:

> But that expression needs to be parsed to SQL anyway, and doing so all
> involved columns could be tracked. Another clue is that the class for
> the relation object has been stated as 'Group' which is mapped to
> pg_group. Why do you actually need a Foreign*Key*, not a Foreign*Table*?

yes, i could absolutely parse it out. you should dig around the source
code a little bit to see how that works, including the diffs for the
revsion i just made for this one which illustrate how its done.

with regards to "foreignkey", i am just not totally comfortable
throwing some more "automatic" behavior in there just yet, for reasons
already stated, since its easy enough to state it explicitly.

the "foreignkey" parameter is still undergoing an evolution and a
clarification, which is one reason its still a little murky as to what
its used for and what it needs to know. it accomplishes multiple
things and i have not yet worked out a really good way to clarify its
individual roles.

currently, heres why its a list of columns:

- for a self-referential mapper or other self-referring table
relationship, where theres only one table anyway...we need to know
specific columns in that case in order to determine what kind of
relationship we are looking at.

- it can account for all the foreign key columns in a more complicated
join condition, where maybe only one of those columns actually points
to the target table and therefore has meaning.

- it is used in the determination of the "lazy clause" in a
self-referential table relationship so that it can determine what
columns in the clause get converted into a bind parameter for a lazy
load.

- i have recently stepped up its role to also indicate which
column-mapped attributes should actually be "synchronized" when objects
are connected together or detached during flush time. here is a
snippet of the recent example someone had (which is also present in the
"relationships.py" unit test if you want to play with it):

pageversions = Table("pageversions", metadata,
Column("jobno", Unicode(15), primary_key=True),
Column("pagename", Unicode(30),
primary_key=True),
Column("version", Integer, primary_key=True,
default=1),
)
)
pages = Table("pages", metadata,
Column("jobno", Unicode(15),
ForeignKey("jobs.jobno"), primary_key=True),
Column("pagename", Unicode(30),
primary_key=True),
Column("current_version", Integer))

mapper(Page, pages, properties={
'currentversion': relation(PageVersion,
foreignkey=pages.c.current_version,

primaryjoin=and_(pages.c.jobno==pageversions.c.jobno,

pages.c.pagename==pageversions.c.pagename,

pages.c.current_version==pageversions.c.version),
post_update=True),
'versions': relation(PageVersion, cascade="all,
delete-orphan",

primaryjoin=and_(pages.c.jobno==pageversions.c.jobno,

pages.c.pagename==pageversions.c.pagename),
order_by=pageversions.c.version,
backref=backref('page', lazy=False,

primaryjoin=and_(pages.c.jobno==pageversions.c.jobno,

pages.c.pagename==pageversions.c.pagename)))
})

above, the "Page" object points to a list of PageVersion objects, via
the "versions" relation. however, there is also a second relationship
between them called "currentversion", which refers to the "current"
entry in the "versions" relation. When a given PageVersion becomes the
"currentversion", the "current_version" column in Page gets set to its
"version" value, and when it is removed as the "currentversion", the
"version" value gets removed. However, the other columns in the join
condition (jobno and pagename) do not change. In this example they are
primary key columns, and nulling them out raises an error on flush.
the "foreignkey" here refers only to "current_version" and indicates
the only column that actually needs to be changed when
attaching/detaching the "currentversion" to "Page".

its pretty impressive to me how far SA has managed to go with its
current relationship-definition model, which is literally a SQL
expression and a list of columns...because it is so sparse yet it is
more powerful than any other system I have ever seen (including
hibernate by a mile). it might be time for a less sparse approach to
the whole thing, for reasons of clarity, but it hasnt really dawned on
me yet what that might look like.

Reply all
Reply to author
Forward
0 new messages