Cavanaugh, Mark (NIH/NLM/NCBI) [E]
Dec 21, 2012, 1:18:58 PM
to 'genbankb@net.bio.net' (genbankb@net.bio.net)
Greetings GenBank Users,
As described in the announcement for GenBank 193.0 availability,
we are providing files which catalog the contents of a release.
The genbank/catalog directory at the NCBI FTP site now contains
these files:
gb193.catalog.est.txt.gz
gb193.catalog.gss.txt.gz
gb193.catalog.other.txt.gz
gb193.gene_list.gss.txt.gz
gb193.gene_list.other.txt.gz
gb193.pmid_list.est.txt.gz
gb193.pmid_list.gss.txt.gz
gb193.pmid_list.other.txt.gz
The format and content of these files are described in Section 1.3.4
of the GenBank 193.0 release notes (gbrel.txt).
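
For anyone who wants to look at one of these files programmatically, here is a minimal Python sketch that streams a gzipped catalog file one row at a time. It assumes the file is tab-delimited text; that delimiter is an assumption made for illustration, and the actual column layout should be taken from Section 1.3.4 of gbrel.txt. The function name iter_catalog_rows is hypothetical.

```python
import gzip


def iter_catalog_rows(path, delimiter="\t"):
    """Yield each non-empty line of a gzipped text file as a list of fields.

    The tab delimiter is an assumption; check Section 1.3.4 of gbrel.txt
    for the real field layout before relying on column positions.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                yield line.split(delimiter)
```

Because the file is read lazily, a large file such as gb193.catalog.est.txt.gz does not need to fit in memory: for row in iter_catalog_rows("gb193.catalog.est.txt.gz"): ...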
Note that there is no gene_list file for EST, because EST records
at the NCBI are not annotated with anything other than source
features.
There is one known issue involving the Division-Code field
of the catalog: Finished sequence records that originated
in clone-based high-throughput genome sequencing (HTG) projects
have a division code of "HTG", even though those sequence
records may have moved to (for example) the PRI division,
upon completion. We're considering a change that would make
this column contain multiple values, to reflect the fact
that a sequence can be categorized in multiple ways. For
example: "HTG,PRI" or "GSS,ENV".
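
If the column does become multi-valued, code that reads it could handle both the current single-value form and the proposed comma-separated form in the same way. The sketch below is only an illustration of that idea, assuming a comma separator as in the examples above; the helper names parse_division_codes and in_division are hypothetical and not part of any NCBI tool.

```python
def parse_division_codes(field):
    """Split a Division-Code value such as "HTG,PRI" into a list of codes.

    A single-valued field such as "HTG" yields a one-element list, so
    callers do not need to care which form a given release uses.
    """
    return [code.strip() for code in field.split(",") if code.strip()]


def in_division(field, division):
    """Return True if the record is categorized under the given division."""
    return division in parse_division_codes(field)
```

With this approach a finished HTG record that has moved to PRI would match both in_division("HTG,PRI", "HTG") and in_division("HTG,PRI", "PRI").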
These products are clearly still in flux, so now would be a
good time to pass along any suggestions that you might have
for the content and structure of the catalog and related
files.
Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS