How to find gene families in a plant genome

Qiang Lin

Sep 8, 2011, 7:41:31 PM
to bioinformatics_linq26
I have searched the web on this topic for help with our work.
Here are some original and summarized findings:

1. Definitions: Homologous proteins share a common ancestry and may be
characterized as either
orthologs (which arise by speciation only; these can be used for
function inference)
or
paralogs (which arise by gene duplication).
Orthologs typically retain similar domain architecture and occupy the
same functional niche following speciation, while (functionally
redundant) paralogs are likely to diverge and acquire new functions
through point mutations and domain recombination.
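To make the definitions concrete, here is a minimal Python sketch (not taken from any of the tools listed in this thread) that classifies two genes by the event at their last common ancestor in a gene tree whose internal nodes are already labelled "speciation" or "duplication". The `Node` class, the function names and the gene IDs (At1, Os1, ...) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A gene-tree node; internal nodes carry the (assumed known) event label."""
    name: str = ""
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, leaf: str) -> List[Node]:
    """Nodes from the root down to the named leaf (empty list if absent)."""
    if not root.children:
        return [root] if root.name == leaf else []
    for child in root.children:
        sub = path_to(child, leaf)
        if sub:
            return [root] + sub
    return []


def relationship(root: Node, a: str, b: str) -> str:
    """Classify two genes by the event at their last common ancestor."""
    lca = None
    for x, y in zip(path_to(root, a), path_to(root, b)):
        if x is not y:  # identity check: paths diverge below the LCA
            break
        lca = x
    if lca is None or lca.event is None:
        raise ValueError("genes not found, or their common ancestor is unlabelled")
    return "ortholog" if lca.event == "speciation" else "paralog"


# Hypothetical tree: an ancestral duplication, then an Arabidopsis/rice speciation.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("At1"), Node("Os1")]),
    Node(event="speciation", children=[Node("At2"), Node("Os2")]),
])
print(relationship(tree, "At1", "Os1"))  # ortholog
print(relationship(tree, "At1", "At2"))  # paralog
print(relationship(tree, "At1", "Os2"))  # paralog
```

Note that At1 and Os2 come from different species yet are paralogs, because they diverged at the duplication node rather than at a speciation node.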

2. “we not only recognized the usefulness of multiple and at times
contradictory criteria, but also the need for a common understanding
on their usage and interpretation. Finally, we agreed that adopting a
common dataset would eliminate inconsistent use of splicing variants,
IDs or data sources and, therefore, greatly facilitate benchmarking.”
So a tool's output is not always reliable.
"Joining forces in the quest for orthologs", Genome Biol. 2009

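Splicing variants are one source of such inconsistency. A common precaution is to keep a single representative protein per gene (often the longest isoform) before any all-against-all comparison. The sketch below assumes TAIR-style IDs in which the text after the last dot is the isoform number (e.g. AT1G01010.1); the function names and the toy sequences are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {protein_id: sequence}; the ID is the first word after '>'."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def longest_isoforms(records):
    """Keep one protein per gene: the longest isoform (the first one seen wins ties).

    Assumes IDs look like GENE.N, i.e. the gene ID is everything before the last dot.
    """
    best = {}
    for pid, seq in records.items():
        gene = pid.rsplit(".", 1)[0]
        if gene not in best or len(seq) > len(records[best[gene]]):
            best[gene] = pid
    return {pid: records[pid] for pid in best.values()}


# Hypothetical toy proteome: GENE1 has two splice variants.
toy = """>GENE1.1
MKV
>GENE1.2
MKVLA
>GENE2.1
MSS
"""
print(longest_isoforms(read_fasta(toy.splitlines())))
# {'GENE1.2': 'MKVLA', 'GENE2.1': 'MSS'}
```
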
3. Points that are easily misunderstood
a. Orthology is a purely evolutionary concept; some orthologs can
become functionally different through neo- or subfunctionalization.
“the most important factor in the evolution of function is not
amino acid sequence, but rather the cellular context in which
proteins act.” (Testing the Ortholog Conjecture with Comparative
Functional Genomic Data from Mammals)

b. Orthology and paralogy relationships can be extended to the
descendants of the genes involved. But orthology, in contrast to
homology, is not transitive: if gene A is orthologous to B and B to C,
A and C are not necessarily orthologous to each other.
c. Proteins in orthology relationships often comprise distinct
domains that may have followed different evolutionary histories.
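The non-transitivity in point b is easy to check on a toy gene tree. If a duplication in species 1 produced A and C after species 1 split from species 2 (which carries B), then A and B are orthologs and B and C are orthologs, but A and C are paralogs. The sketch assumes a rooted binary tree written as nested tuples `(event, left, right)` with gene names as leaves; all names are hypothetical.

```python
def leaves(tree):
    """All gene names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    _, left, right = tree
    return leaves(left) | leaves(right)


def lca_event(tree, a, b):
    """Event label at the last common ancestor of genes a and b.

    Assumes a and b are distinct genes that both occur in the tree.
    """
    _, left, right = tree
    for child in (left, right):
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca_event(child, a, b)  # both genes lie in one subtree: go deeper
    return tree[0]


def orthologous(tree, a, b):
    """Two genes are orthologs if they diverged at a speciation node."""
    return lca_event(tree, a, b) == "speciation"


# Hypothetical tree: A and C are duplicates within species 1; B is from species 2.
tree = ("speciation", ("duplication", "A_sp1", "C_sp1"), "B_sp2")

print(orthologous(tree, "A_sp1", "B_sp2"))  # True
print(orthologous(tree, "B_sp2", "C_sp1"))  # True
print(orthologous(tree, "A_sp1", "C_sp1"))  # False
```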

4. Methods
1) Similarity-based: the bidirectional best hits (BBH, also called
reciprocal best hits) approach can only account for one-to-one
orthology relationships. Therefore, if gene duplications have taken
place in either of the two compared lineages after their divergence,
a one-to-many or many-to-many relationship is needed to describe
their orthology properly. In such cases the BBH approach will miss
many true orthologs.
Methods that avoid these pitfalls and extend the procedure to
multiple-genome comparisons:
EGO; STRING
InParanoid; OrthoMCL (extends to multi-genome comparison)
COCO-CL; OrthoDB
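A minimal sketch of the BBH idea in Python, assuming two BLAST+ tabular reports (`-outfmt 6`, where the bit score is the 12th column) for proteome A searched against B and for B against A. The toy hits are hypothetical: two Arabidopsis in-paralogs (At1, At2) both match one rice gene (Os1), and BBH keeps only one of the pair, which illustrates the one-to-one limitation described above.

```python
def best_hits(blast_lines):
    """Best subject (highest bit score) for each query in BLAST -outfmt 6 lines."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        query, subject, bits = cols[0], cols[1], float(cols[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def toy_row(query, subject, bits):
    """Hypothetical 12-column hit line; only query, subject and bit score matter here."""
    return "\t".join([query, subject, "90.0", "300", "0", "0",
                      "1", "300", "1", "300", "1e-80", str(bits)])


# At1 and At2 are hypothetical in-paralogs that are both co-orthologs of Os1.
a_vs_b = [toy_row("At1", "Os1", 300), toy_row("At2", "Os1", 280)]
b_vs_a = [toy_row("Os1", "At1", 300), toy_row("Os1", "At2", 280)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('At1', 'Os1')]; At2 is missed
```

Tools such as InParanoid use BBH pairs only as seeds and then add in-paralogs around them, which is one way of recovering the At2 relationship that plain BBH drops.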
2) Phylogeny-based methods
Main limitation: some species trees are unreliable.
"As a result, these methods are very sensitive to slight
variations in the topology or the rooting of the gene tree and, when
applied at a large scale they perform similarly to and even worse than
standard pairwise methods and need manual curation."
OrthologID; RIO
NOTUNG; "soft parsimony"
(Species-overlap methods) LOFT; PhylomeDB
“Recent analyses show that phylogeny-based methods are
less prone to error than similarity-based approaches”

"Large-scale assignment of orthology: back to phylogenetics?", Genome
Biol. 2008
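Species-overlap methods infer events without a full gene tree/species tree reconciliation: an internal node of a rooted gene tree is called a duplication if the species sets of its two child subtrees overlap, and a speciation otherwise. Below is a minimal Python sketch of that rule, assuming a rooted binary tree given as nested 2-tuples and a hypothetical gene-to-species mapping. It is not the actual LOFT or PhylomeDB code.

```python
def label_events(tree, species_of):
    """Label internal nodes with the species-overlap rule.

    `tree` is a rooted binary tree of nested 2-tuples with gene names as leaves.
    Returns (labelled_tree, species_set); internal nodes become
    (event, left, right), where event is "duplication" or "speciation".
    """
    if isinstance(tree, str):
        return tree, {species_of[tree]}
    left, right = tree
    left_lab, left_sp = label_events(left, species_of)
    right_lab, right_sp = label_events(right, species_of)
    # A species present under both children implies a duplication at this node.
    event = "duplication" if left_sp & right_sp else "speciation"
    return (event, left_lab, right_lab), left_sp | right_sp


# Hypothetical genes: two from Arabidopsis thaliana, one from rice.
species_of = {"At1": "A. thaliana", "At2": "A. thaliana", "Os1": "O. sativa"}

labelled, _ = label_events((("At1", "At2"), "Os1"), species_of)
print(labelled)  # ('speciation', ('duplication', 'At1', 'At2'), 'Os1')

# The same three genes rooted differently give different event labels.
rerooted, _ = label_events(("At1", ("At2", "Os1")), species_of)
print(rerooted)  # ('duplication', 'At1', ('speciation', 'At2', 'Os1'))
```

The second call shows why such methods are sensitive to the topology and rooting of the gene tree: the inferred duplication moves when the root moves.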


References:
http://biostar.stackexchange.com/questions/6788/popular-methods-or-tools-to-determine-gene-families-in-a-newly-sequenced-genome
"OPTIC: orthologous and paralogous transcripts in clades", Nucleic Acids Res. 2008
"Assessing Performance of Orthology Detection Strategies Applied to
Eukaryotic Genomes", PLoS ONE 2007
http://orthomcl.org/common/downloads/software/

Dawei Huang

Sep 9, 2011, 2:57:20 AM
to bioinforma...@googlegroups.com
Great work!

DaWei Huang
