Dear zebrafish biologists,
For those of you interested in using or constructing more accurate gene
models for zebrafish, please consider this new set from the EvidentialGene
project. If you, or anyone you know, would like to help me finish this
gene set or check its quality, please contact me (Don Gilbert).
I've reconstructed an improved zebrafish gene set using EvidentialGene's new
automated SRA2Genes pipeline. This collects several tested EvidentialGene
methods into a complete, nearly automated gene set reconstruction
pipeline: fetching public RNA-seq from NCBI SRA, over-assembling, then
reducing to an accurate non-redundant gene set, with annotation and
formatting for public database submission.
Preliminary zebrafish17evigene gene set info is at
http://eugenes.org/EvidentialGene/vertebrates/zebrafish/
The EvidentialGene package, including the omnibus script
evgpipe_sra2genes.pl, is available at
http://arthropods.eugenes.org/EvidentialGene/other/evigene_old/
evigene18jan01.tar (draft2 of evgpipe_sra2genes)
Zebrafish is a good case for automated gene reconstruction from RNA, as it
is in the top 10 species by public RNA-seq studies, and my prior work with
fish genes suggested the published zfish genes could be improved.
That proved true: this Evigene draft set is more complete
and accurate in representing zebrafish genes than the Ensembl or NCBI gene
sets, by measures of gene orthology. The Evigene set is built from RNA
assembly only, without using chromosomes or other species' genes to
reconstruct.
This draft gene set is missing some genes and is inaccurate for others,
which I hope to have time or help to correct. When one has *too much data*,
as in this case of zebrafish RNA, selecting the best subsets for a given
need can take real effort. Michael Metzger raised the same sort of question
here, about "tons of zebrafish datasets in the SRA..":
> zebrafish line-specific genome resources
> Fri Dec 15 17:33:04 EST 2017
>
> Does anyone know of any line-specific genome resources?...
> There are tons of zebrafish datasets in the SRA
> databases, but very little documentation, even at the level of what strain
> was used or what study the reads come from. ..
>
> Michael J Metzger
As for documenting the strain of zfish data sets in SRA, my suggestion is
to do it yourself to some degree: there appear to be strain-level (TL, AB,
other) chromosome assemblies at NCBI GenBank for zfish, so one could
quick-align any SRA read set against those and count identities to decide
the strain of that read set. NCBI now has a species-level identity check of
SRA read sets, but that won't get you the strain level.
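
The counting step might look something like the rough Python sketch below.
It assumes you have already aligned a sample of the reads to each strain
assembly with whatever aligner you prefer, and tallied matched and aligned
bases per strain. The function names, the min_margin cutoff and the example
numbers are all hypothetical, made up for illustration, not measured values.

```python
from typing import Dict, Optional, Tuple


def percent_identity(matches: int, aligned: int) -> float:
    """Percent of aligned bases that match the reference (0.0 if nothing aligned)."""
    return 100.0 * matches / aligned if aligned > 0 else 0.0


def call_strain(
    counts: Dict[str, Tuple[int, int]],
    min_margin: float = 0.1,  # hypothetical cutoff, in percentage points
) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the strain assembly with the highest read identity.

    counts maps strain name -> (matched bases, aligned bases).
    Returns (best strain or None if the call is ambiguous, identities).
    """
    identities = {s: percent_identity(m, a) for s, (m, a) in counts.items()}
    ranked = sorted(identities.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None, identities
    # Refuse to call a strain when the top two are too close to separate.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None, identities
    return ranked[0][0], identities


# Invented example tallies for one SRA read set against two assemblies.
example_counts = {
    "AB": (995_000, 1_000_000),
    "TL": (989_000, 1_000_000),
}
best, identities = call_strain(example_counts)
print(best, identities)
```

A None result means the read set did not separate the strains clearly;
a larger read sample, or counting only at sites that differ between the
assemblies, would likely be needed in that case.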
- Don Gilbert, gilbertd At indiana.edu or gilbert.bionet At gmail.com
BTW, yes, I maintain this zbra...@net.bio.net mail list, along with a few
score other Bionet biology groups. I also collaborated with ZFIN and
other model organism groups to build the first cross-model genome/gene
summary database, MEOW, which became eugenes.org (not that Eugenes in
Oregon). The ZFIN folks worked well in that initial collaboration, which
did not fully succeed, but good luck with the new alliancegenome.org.
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--
http://marmot.bio.indiana.edu/