gtfToGenePred error: Unpaired type(FLH191188.01X;)/val on end of gtf line[...]

Christian Brueffer

Aug 5, 2016, 1:25:51 PM
to gen...@soe.ucsc.edu
Hi everyone,

I have a GTF file which produces the following error when fed into
gtfToGenePred (latest from the UCSC website, so v336 I suppose):

~/gtfToGenePred -allErrors -ignoreGroupsWithoutExons -genePredExt ref-transcripts.gtf ref-transcripts.genePred
Unpaired type(FLH191188.01X;)/val on end of gtf line 126843 of ref-transcripts.gtf

The line in question is the following, and an input file containing
solely this line also provokes the problem:

chr1 hg38_knownGene exon 206586988 206589275 0 + . knownCanonicalChrom "NA"; RZPDo839A0777D "Ras association (RalGDS/AF-6) domain family 5 (RASSF5) gene, encodes complete protein.""; description ""Synthetic construct clone IMAGE:100005591"; protAcc; knownCanonicalChromStart "NA"; knownToRefSeq "NA"; knownIsoformsClusterID "3367"; gene_id "uc031vlt.1"; alias "DQ892961"; transcript_id "uc031vlt.1"; mRNA "DQ892961"; knownCanonicalChromEnd "NA"; knownCanonicalClusterId "NA"; refseq; locusLink "NA"; geneSymbol "RASSF5"; FLH191188.01X;

The line looks like valid GTF to me. Any idea what is wrong?

Cheers,

Chris

Christopher Lee

Aug 11, 2016, 3:07:28 PM
to Christian Brueffer, UCSC Genome Browser Discussion List

Hi Chris,

Thank you for your question about gtfToGenePred.

The 9th field of your GTF file has to start with: 'gene_id "value"; transcript_id "value";'. I rearranged your problematic GTF line like so:

chr1    hg38_knownGene  exon    206586988       206589275       0       +       .       gene_id "uc031vlt.1"; transcript_id "uc031vlt.1"; knownCanonicalChrom "NA"; RZPDo839A0777D "Ras association (RalGDS/AF-6) domain family 5 (RASSF5) gene, encodes complete protein.""; description ""Synthetic construct clone IMAGE:100005591"; protAcc; knownCanonicalChromStart "NA"; knownToRefSeq "NA"; knownIsoformsClusterID "3367";  alias "DQ892961";  mRNA "DQ892961"; knownCanonicalChromEnd "NA"; knownCanonicalClusterId "NA"; refseq; locusLink "NA";

and gtfToGenePred worked correctly:
$ gtfToGenePred -allErrors -ignoreGroupsWithoutExons -genePredExt refTranscripts.gtf refTranscripts.genePred
$ cat refTranscripts.genePred 
uc031vlt.1    chr1    +    206586987    206589275    206589275    206589275    1    206586987,    206589275,    0    uc031vlt.1    none    none    -1,
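
For a file with many such lines, the rearrangement described above could be scripted. The following is only a minimal sketch, assuming a tab-separated GTF file whose 9th column holds the attributes and whose gene_id and transcript_id values are double-quoted. The function name move_ids_first is hypothetical and not part of the UCSC tools. The sketch only reorders attributes. It does not drop or repair anything else, whereas the hand-edited line above also no longer contains the trailing 'geneSymbol "RASSF5"; FLH191188.01X;', so the result may still need checking with gtfToGenePred.

```python
import re

# Matches a quoted gene_id or transcript_id attribute, including its
# trailing semicolon and any following whitespace. The lookbehind makes
# sure the key starts at the beginning of the column or after whitespace.
_ID_ATTR = re.compile(r'(?<!\S)(gene_id|transcript_id)\s+"([^"]*)"\s*;\s*')


def move_ids_first(line: str) -> str:
    """Return the GTF line with gene_id and transcript_id leading column 9.

    Hypothetical helper (an assumption for illustration, not a UCSC tool).
    Lines with fewer than 9 tab-separated fields, or lacking either
    attribute, are returned unchanged apart from the stripped newline.
    """
    stripped = line.rstrip("\n")
    fields = stripped.split("\t", 8)
    if len(fields) < 9:
        return stripped  # comment, header, or malformed line

    attrs = fields[8]
    found = {m.group(1): m.group(2) for m in _ID_ATTR.finditer(attrs)}
    if "gene_id" not in found or "transcript_id" not in found:
        return stripped

    gene = found["gene_id"]
    tx = found["transcript_id"]
    # Remove the two attributes from wherever they were, keep the rest in order.
    rest = _ID_ATTR.sub("", attrs).strip()
    lead = f'gene_id "{gene}"; transcript_id "{tx}";'
    fields[8] = f"{lead} {rest}".strip()
    return "\t".join(fields)
```

To use it, one could apply move_ids_first to every line of the input file, write the results to a new file, and run gtfToGenePred on that file.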

For more information about the GTF format please see the following:
http://genome.ucsc.edu/FAQ/FAQformat.html#format4
http://mblab.wustl.edu/GTF2.html

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further
questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly accessible forum. If your question includes sensitive data, you may send it instead
to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute





Christian Brueffer

Aug 15, 2016, 10:47:57 AM
to UCSC Genome Browser Discussion List
Hi Christopher,

That was too obvious, thanks a lot!

Cheers,

Chris

