Hi Marco,
Thank you for your question about the difference between the GENCODE
Basic and Comprehensive annotation sets. As you noted, the Basic
annotation set is a subset of the Comprehensive set. GENCODE
describes this Basic annotation set as:
What is the "basic" annotation in the GTF/GFF3?
The transcripts tagged as "basic" form part of a subset
of representative transcripts for each gene. This subset
prioritises full-length protein coding transcripts over partial
or non-protein coding transcripts within the same gene, and
intends to highlight those transcripts that will be useful to
the majority of users.
This description comes from the GENCODE FAQ:
http://www.gencodegenes.org/faq.html. The GENCODE track
description page also explains the selection criteria used for
the Basic annotation set:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=wgEncodeGencodeV22#basicSetSelection.
If you have questions about these selection criteria, I would
recommend contacting GENCODE at
http://www.gencodegenes.org/contact.html.
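If it helps to see the subset relationship in practice, here is a
minimal Python sketch that pulls the Basic transcripts out of
Comprehensive GTF lines. It assumes the GTF marks these transcripts
with a tag "basic"; entry in the attributes column, as the FAQ text
above suggests. The function names are hypothetical and only for
illustration; this is not how GENCODE or UCSC builds the set.

```python
import re

# Assumption: basic transcripts carry `tag "basic";` in GTF column 9.
TAG_RE = re.compile(r'\btag "([^"]+)"')


def is_basic_transcript(gtf_line):
    """Return True for a transcript feature line that is tagged "basic"."""
    if gtf_line.startswith("#"):
        return False  # header or comment line
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "transcript":
        return False  # malformed line, or a gene/exon/CDS feature
    # A transcript can carry several tags; collect them all.
    return "basic" in TAG_RE.findall(fields[8])


def basic_transcripts(lines):
    """Keep only the transcript lines that belong to the Basic subset."""
    return [line for line in lines if is_basic_transcript(line)]
```

Passing an open file handle for an uncompressed Comprehensive GTF to
basic_transcripts() would return the matching transcript lines. Exon
and CDS lines are skipped here for brevity.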
I hope this is helpful. If you have any further questions, please
reply to gen...@soe.ucsc.edu. All messages sent to that address are
archived on a publicly accessible Google Groups forum. If your
question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group