Postgresql and reflection: system tables VS information_schema

460 views
Skip to first unread message

rdunklau

unread,
Oct 4, 2011, 12:17:49 PM10/4/11
to sqlalchemy
Hello.

I'm having a problem with sqlalchemy and postgresql, and I really
don't know what should I do to correct it.

I'm using sqlalchemy. 0.7.3 with postgresql 2.7.2, with the psycopg2
driver and dialect.

I'm using the ORM with reflection, and as I renamed a primary key
column (from id to product_id), I'm now getting the following
exception:


sqlalchemy.exc.ArgumentError: Mapper Mapper|Product|product could not
assemble any primary key columns for mapped table 'product'

Looking through the logs, I've found that the following query is used
to find primary keys:


SELECT attname FROM pg_attribute
WHERE attrelid = (
SELECT indexrelid FROM pg_index i
WHERE i.indrelid = %(table_oid)s
AND i.indisprimary = 't')
ORDER BY attnum

I've tried to run this query myself, and the result is 'id', and not
'product_id'.

However, when querying the information schema instead of the system
tables, I've got the following results:

select cu.column_name
from information_schema.table_constraints tc
inner join information_schema.key_column_usage cu
on cu.constraint_name = tc.constraint_name and
cu.table_name = tc.table_name and
cu.table_schema = tc.table_schema
where cu.table_name = 'product' and
constraint_type = 'PRIMARY KEY'
and cu.table_schema = 'public';


column_name
-------------
product_id

Is this a postgresql bug for not updating the attribute name in the
system catalog ?
Why did sqlalchemy choose to query the system tables instead of the
information_schema ?
I'm not really sure, but it looks like the system tables are not meant
to be used for this kind of introspection.
Does anyone have any workaround (besides the obvious: remap the table
manually) ?

Thank you.

--
Ronan Dunklau


Michael Bayer

unread,
Oct 4, 2011, 12:48:04 PM10/4/11
to sqlal...@googlegroups.com

On Oct 4, 2011, at 12:17 PM, rdunklau wrote:

> Hello.
>
> I'm having a problem with sqlalchemy and postgresql, and I really
> don't know what should I do to correct it.
>
> I'm using sqlalchemy. 0.7.3 with postgresql 2.7.2, with the psycopg2
> driver and dialect.

I'm not aware of what versioning system "2.7.2" might be a part of, that is not any such version I'm aware of. PG was at version 6 in 1997 and the current version is 9 - the oldest PGs in use would be version 8 or 7 (see http://en.wikipedia.org/wiki/PostgreSQL#Major_releases for a full history).

Can you confirm the version of Postgresql in use ? Data catalog errors tend to be version dependent.

>
> Looking through the logs, I've found that the following query is used
> to find primary keys:
>
>
> SELECT attname FROM pg_attribute
> WHERE attrelid = (
> SELECT indexrelid FROM pg_index i
> WHERE i.indrelid = %(table_oid)s
> AND i.indisprimary = 't')
> ORDER BY attnum
>
> I've tried to run this query myself, and the result is 'id', and not
> 'product_id'.
>

While this kind of error has been seen in other areas, currently I'm not able to reproduce - PG has altered how they handle their data catalogs across versions so the exact version of PG here is of key importance.

I've also committed the corresponding test for below to the current tip:

CREATE TABLE t (
id SERIAL NOT NULL,
PRIMARY KEY (id)
)
COMMIT

alter table t rename id to t_id
COMMIT

SELECT attname FROM pg_attribute
WHERE attrelid = (
SELECT indexrelid FROM pg_index i
WHERE i.indrelid = %(table_oid)s
AND i.indisprimary = 't')
ORDER BY attnum

INFO:sqlalchemy.engine.base.Engine:{'table_oid': 11096193}
DEBUG:sqlalchemy.engine.base.Engine:Col ('attname',)
DEBUG:sqlalchemy.engine.base.Engine:Row (u't_id',)

the correct value "t_id" is returned.

> Is this a postgresql bug for not updating the attribute name in the
> system catalog ?

probably

> Why did sqlalchemy choose to query the system tables instead of the
> information_schema ?

we used information_schema many years ago. At least back then, its performance was orders of magnitude slower and it also does not return complete data.

> I'm not really sure, but it looks like the system tables are not meant
> to be used for this kind of introspection.

The information schema views are extremely slow for big reflection jobs. The PG tables also have information on PG specific features, which we need to get at:

"The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views. "

Most articles/howtos/help/documentation I've ever seen on the subject use the pg_ tables, some examples:

http://crafted-software.blogspot.com/2011/03/get-all-column-names-of-table-in.html
http://www.polak.ro/postgresql-select-tables-names.html
http://www.alberton.info/postgresql_meta_info.html (illustrates using both INFO_SCHEMA and pg_ tables)

I've also analyzed the queries emitted by the pgAdmin3 tool and it uses the pg_ tables. We've had lots of bugs with PG reflection and the system tables have always ultimately had the right information, though sometimes very hard to get at.

> Does anyone have any workaround (besides the obvious: remap the table
> manually) ?

SQLAlchemy can attempt to work around the issue if a known workaround for the PG version in question is present, or alternatively you can provide the primary key column of the Table manually using the technique described at http://www.sqlalchemy.org/docs/core/schema.html#overriding-reflected-columns .


>
> Thank you.
>
> --
> Ronan Dunklau
>
>

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Michael Bayer

unread,
Oct 5, 2011, 10:39:58 AM10/5/11
to rdunklau, sqlalchemy
On Oct 5, 2011, at 2:58 AM, rdunklau wrote:


the correct value "t_id" is returned.

I ran this exact same test on my postgresql 9.1.1 install, and it
fails.

OK then its a PG 9 thing.  We had an almost identical issue involving indexes and this is likely the same thing, this bug can be tracked at http://www.sqlalchemy.org/trac/ticket/2291 .

For the moment which you can work around with the manual Column.





we used information_schema many years ago.  At least back then, its performance was orders of magnitude slower and it also does not return complete data.
The information schema views are extremely slow for big reflection jobs.   The PG tables also have information on PG specific features, which we need to get at:

"The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views.  "

Most articles/howtos/help/documentation I've ever seen on the subject use the pg_ tables, some examples:


I've also analyzed the queries emitted by the pgAdmin3 tool and it uses the pg_ tables.   We've had lots of bugs with PG reflection and the system tables have always ultimately had the right information, though sometimes very hard to get at.

Thank you for this comprehensive explanation.


Does anyone have any workaround (besides the obvious: remap the table
manually) ?

SQLAlchemy can attempt to work around the issue if a known workaround for the PG version in question is present, or alternatively you can provide the primary key column of the Table manually using the technique described athttp://www.sqlalchemy.org/docs/core/schema.html#overriding-reflected-....

I'll try that.

--
Ronan Dunklau

rdunklau

unread,
Oct 5, 2011, 11:08:49 AM10/5/11
to sqlalchemy
Thanks for everything.

On 5 oct, 16:39, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Oct 5, 2011, at 2:58 AM, rdunklau wrote:
>
>
>
> >> the correct value "t_id" is returned.
>
> > I ran this exact same test on my postgresql 9.1.1 install, and it
> > fails.
>
> OK then its a PG 9 thing.  We had an almost identical issue involving indexes and this is likely the same thing, this bug can be tracked athttp://www.sqlalchemy.org/trac/ticket/2291.
>
> For the moment which you can work around with the manual Column.
>
>
>
>
>
>
>
>
>
> >> we used information_schema many years ago.  At least back then, its performance was orders of magnitude slower and it also does not return complete data.
> >> The information schema views are extremely slow for big reflection jobs.   The PG tables also have information on PG specific features, which we need to get at:
>
> >> "The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views.  "
>
> >> Most articles/howtos/help/documentation I've ever seen on the subject use the pg_ tables, some examples:
>
> >>http://crafted-software.blogspot.com/2011/03/get-all-column-names-of-...using both INFO_SCHEMA and pg_ tables)

Michael Bayer

unread,
Oct 5, 2011, 11:56:09 AM10/5/11
to sqlal...@googlegroups.com
that bug is fixed, you can get the tip at http://hg.sqlalchemy.org/sqlalchemy/archive/default.tar.gz .

Jon Nelson

unread,
Oct 5, 2011, 11:58:35 AM10/5/11
to sqlal...@googlegroups.com
On Wed, Oct 5, 2011 at 10:56 AM, Michael Bayer <mik...@zzzcomputing.com> wrote:
> that bug is fixed, you can get the tip at http://hg.sqlalchemy.org/sqlalchemy/archive/default.tar.gz .

Cool.
Regarding the issues surrounding postgresql introspection and the
information_schema -- perhaps if patches were offered to
correct/improve information_schema the postgresql folks would accept
them? It seems to me that the current approach of tweaking sqlalchemy
for each postgresql release is ultimately a losing game, and that
getting the /right/ functionality into upcoming releases and versions
of postgresql is a win (for everybody).

--
Jon

Michael Bayer

unread,
Oct 5, 2011, 12:07:40 PM10/5/11
to sqlal...@googlegroups.com
its not clear to me that we're continuously tweaking things to suit new releases, as much as the PG folks considered the query we had in place to be incorrect in the first place (there's a thread somewhere to this effect I'd have to search for). The info schema views can't be relied upon 100% in any case as we need enums and domains too, they are still specific to PG (we tried, years ago, to share the same info schema tables with MS SQL and PG - there's no chance of that, they are still vendor-specific in a big way), and I'm not sure how their performance can be improved if they rely upon very awkward views on top of PG tables that aren't at all in the same structure.

It would also have to be re-established what the performance issues were in the first place with our old reflection code - someone would need to pull up old 0.2 or 0.3 code, whenever it was, see what we were doing, re-analyze all the issues involved, etc.

Reply all
Reply to author
Forward
0 new messages