I'm considering a project to model genomic variants and their associated
phenotypes. (Phenotype is a description of the observable trait, such as
disease or hair color.) There are many types of variation, many types of
phenotypes, and many types of association. By "type", I mean that they
have distinct structure (column names and inter-row dependencies). The
abstract relations might look like this:
variant association phenotype
------- ----------- ---------
variant_id --------- variant_id +------- phenotype_id
genome_id phenotype_id -----+ short_descr
strand origin_id (i.e., who) long_descr
start_coord ts (timestamp)
stop_coord
There are several types of variants, such as insertions, deletions,
inversions, copy-number variants, single nucleotide polymorphisms,
translocations, and unknowable future genomic shenanigans.
Phenotypes might come from ontologies or controlled vocabularies that
need a graph structure, others domains might be free text. Each is
probably best-served by a subclass table.
Associations might be quantitative or qualitative, and would come from
multiple origins.
The problem that arises is the combinatorial nature of the schema design
coupled with the lack of inherited primary keys. In the current state
of PG, one must (I think) make joining tables (association subclasses)
for every combination of referenced foreign keys (variant and phenotype
subclasses).
So, how would you model this data? Do I ditch inheritance?
Thanks,
Reece
--
Sent via pgsql-general mailing list (pgsql-...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Is an association, for example, an experiment that establishes a
dependent relationship? So could there be multiple associations
between variant and phenotype?
> The problem that arises is the combinatorial nature of the schema design
> coupled with the lack of inherited primary keys. In the current state
> of PG, one must (I think) make joining tables (association subclasses)
> for every combination of referenced foreign keys (variant and phenotype
> subclasses).
>
Is your concern that the number of joins will grow exponentially in
the number of variants and phenotypes?
> So, how would you model this data? Do I ditch inheritance?
I've put some thought into representing an ontology via table
inheritance, and I've never been able to figure out a good way ( of
course, that's not to say one doesn't exist... ) .
If I understand your problem correctly, I would use composite keys (
ie ( variant type, id ) ) and then use an extra join to separate
ontology tables to restrict searches to specific branches. So all
variants would be stored in the variants table, all phenotypes are in
the phenotypes table, and you join through association. It's not as
elegant as inheritance, but it will be much more flexible in both the
types of queries that you can write and in case your requirements
change.
-Nathan
I'm missing what you're doing here that foreign keys don't cover.
Could you send along your DDL?
Just generally, I've only found table inheritance useful for
partitioning. "Polymorphic" foreign key constraints can be handled
other ways such as the one sketched out below.
http://archives.postgresql.org/sfpug/2005-04/msg00022.php
Cheers,
David.
--
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david....@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Is an association, for example, an experiment that establishes a dependent relationship? So could there be multiple associations between variant and phenotype?
Is your concern that the number of joins will grow exponentially in the number of variants and phenotypes?
So all variants would be stored in the variants table, all phenotypes are in the phenotypes table, and you join through association.
I'm missing what you're doing here that foreign keys don't cover.
Could you send along your DDL?
Just generally, I've only found table inheritance useful for
partitioning. "Polymorphic" foreign key constraints can be handled
other ways such as the one sketched out below.
How many (order of magnitude) are we talking about here?
> > Just generally, I've only found table inheritance useful for
> > partitioning. "Polymorphic" foreign key constraints can be
> > handled other ways such as the one sketched out below.
>
> That answers the question -- I do want polymorphic foreign keys.
> Dang.
It solved some real-world problems I had at the time, mostly from the
game space. My biology is a little rusty :/
Cheers,
David.