Warning when indexing genome


Zoe Ward

Mar 6, 2017, 1:18:08 AM
to rna-star
Hi Alex,

I'm trying to index the human Ensembl GRCh38 FASTA with the NONCODE GTF, and I get several warnings that the gene/transcript IDs in the GTF file are not recognised by the FASTA file.
Do you know how I can get the two formats to recognise each other? If I don't, any reads that should map to these features will go unmapped, is that correct?

Thanks,

Zoe

Zoe Ward

Mar 6, 2017, 3:14:36 PM
to rna-star
Apologies, I meant to attach the Log.out file.
Log.out.gz

Alexander Dobin

Mar 7, 2017, 12:25:43 PM
to rna-star
Hi Zoe,

Those warnings are about transcripts that reside on chromosomes not present in the FASTA file, such as
      3 10_GL383546v1_alt
      1 10_KI270825v1_alt
     18 11_KI270721v1_random
      2 11_KI270827v1_alt
      8 11_KI270829v1_alt
            ............
You can get them with
$ awk '$2=="Cufflinks" {print $1}' Log.out | sort  | uniq -c
You can, in principle, rename at least some of the chromosome IDs in the GTF file to match those in the FASTA file.
I think it's also relatively safe to just ignore them, since they do not reside on major chromosomes, except for two transcripts on "X", which I guess should correspond to chrX.
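
The renaming idea could be sketched roughly as below, assuming you first prepare a two-column, tab-separated table of old and new names by hand. The function name rename_gtf_chroms and the file names in the usage comment are hypothetical, and nothing here knows which scaffold names actually have a counterpart in the FASTA; that has to be checked separately.

```shell
# rename_gtf_chroms MAP GTF
#   MAP: tab-separated table "old_name<TAB>new_name" (a file you prepare yourself)
#   GTF: annotation file whose first column should be renamed
# Prints the GTF to stdout; lines whose chromosome is not in MAP pass through unchanged.
rename_gtf_chroms() {
    awk -v mapfile="$1" '
        BEGIN {
            FS = OFS = "\t"
            # Load the old -> new name table before reading the GTF
            while ((getline line < mapfile) > 0) {
                split(line, f, "\t")
                map[f[1]] = f[2]
            }
            close(mapfile)
        }
        /^#/        { print; next }      # keep comment/header lines untouched
        ($1 in map) { $1 = map[$1] }     # rename only IDs listed in the table
                    { print }
    ' "$2"
}

# Hypothetical usage:
#   rename_gtf_chroms chrom_map.tsv noncode.gtf > noncode.renamed.gtf
```

Writing to a new file rather than editing in place keeps the original GTF available for comparison.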

Cheers
Alex

Zoe Ward

Mar 7, 2017, 3:56:52 PM
to rna-star
Thanks for that. I suspected this but wanted a second opinion. Regarding the two transcripts on the X chromosome, do you have any advice on how to get around this? The X chromosome is obviously present in the FASTA file (with a matching name):

cat GRCh38_all.fa | grep '^>'

>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
>10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
>11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
>12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF
>13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF
>14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF
>15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF
>16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF
>17 dna:chromosome chromosome:GRCh38:17:1:83257441:1 REF
>18 dna:chromosome chromosome:GRCh38:18:1:80373285:1 REF
>19 dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF
>2 dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF
>20 dna:chromosome chromosome:GRCh38:20:1:64444167:1 REF
>21 dna:chromosome chromosome:GRCh38:21:1:46709983:1 REF
>22 dna:chromosome chromosome:GRCh38:22:1:50818468:1 REF
>3 dna:chromosome chromosome:GRCh38:3:1:198295559:1 REF
>4 dna:chromosome chromosome:GRCh38:4:1:190214555:1 REF
>5 dna:chromosome chromosome:GRCh38:5:1:181538259:1 REF
>6 dna:chromosome chromosome:GRCh38:6:1:170805979:1 REF
>7 dna:chromosome chromosome:GRCh38:7:1:159345973:1 REF
>8 dna:chromosome chromosome:GRCh38:8:1:145138636:1 REF
>9 dna:chromosome chromosome:GRCh38:9:1:138394717:1 REF
>MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF
>X dna:chromosome chromosome:GRCh38:X:1:156040895:1 REF
>Y dna:chromosome chromosome:GRCh38:Y:2781480:56887902:1 REF
Plus all of the scaffolds.
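
To compare the two files directly rather than by eye, something like the following sketch may help. It assumes the sequence name is the first whitespace-delimited token after ">" in each FASTA header (as in the listing above) and that the chromosome is the first tab-separated column of the GTF. The function name missing_chroms is hypothetical.

```shell
# missing_chroms FASTA GTF
# Prints "count<TAB>name" for every GTF chromosome name that has no
# matching sequence name in the FASTA headers.
missing_chroms() {
    awk -v fasta="$1" '
        BEGIN {
            FS = "\t"
            # Collect sequence names: first token after ">" on each header line
            while ((getline line < fasta) > 0) {
                if (line ~ /^>/) {
                    split(substr(line, 2), tok, /[ \t]/)
                    seen[tok[1]] = 1
                }
            }
            close(fasta)
        }
        /^#/ || NF == 0 { next }          # skip comments and blank lines
        !($1 in seen)   { count[$1]++ }   # tally GTF lines on unknown sequences
        END { for (c in count) printf "%d\t%s\n", count[c], c }
    ' "$2" | LC_ALL=C sort -k2,2
}

# Hypothetical usage:
#   missing_chroms GRCh38_all.fa noncode.gtf
```

Scanning a whole genome FASTA line by line this way can take a little while; an empty output would mean every chromosome name in the GTF was found in the FASTA.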

Alexander Dobin

Mar 7, 2017, 4:14:17 PM
to rna-star
Aah, sorry, those two warnings were for a different problem:

WARNING: while processing sjdbGTFfile=NONCODE_gtf/noncode.gtf: no gene_id for line:
X       Cufflinks       exon    73501540        73501683        0       -       .       gene_id ""; transcript_id "NONHSAT147739.1"; FPKM "0"; exon_number 2;
WARNING: while processing sjdbGTFfile=NONCODE_gtf/noncode.gtf: no gene_id for line:
X       Cufflinks       exon    73504578        73504697        0       -       .       gene_id ""; transcript_id "NONHSAT147739.1"; FPKM "0"; exon_number 2;

Not sure why they have an empty gene_id. You can try to check which gene the transcript NONHSAT147739.1 corresponds to in the GTF file.
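
One way to do that check is sketched below: it tallies the gene_id values found on all lines carrying a given transcript_id. It assumes the attribute layout shown in the warnings above (gene_id "..."; transcript_id "...";) in the ninth tab-separated column. The function name gene_ids_for_transcript is hypothetical, and whether other lines for this transcript carry a non-empty gene_id is not known from the log.

```shell
# gene_ids_for_transcript GTF TRANSCRIPT_ID
# Prints "count<TAB>gene_id" for each distinct gene_id seen on lines that
# carry the given transcript_id. An empty gene_id "" is reported as "(empty)",
# a missing gene_id attribute as "(none)".
gene_ids_for_transcript() {
    awk -v tid="$2" '
        BEGIN { FS = "\t" }
        index($9, "transcript_id \"" tid "\"") {
            gid = "(none)"
            if (match($9, /gene_id "[^"]*"/))
                # skip the 9 characters of: gene_id "   and drop the closing quote
                gid = substr($9, RSTART + 9, RLENGTH - 10)
            if (gid == "") gid = "(empty)"
            count[gid]++
        }
        END { for (g in count) printf "%d\t%s\n", count[g], g }
    ' "$1" | LC_ALL=C sort -k2,2
}

# Hypothetical usage:
#   gene_ids_for_transcript NONCODE_gtf/noncode.gtf NONHSAT147739.1
```

If the output shows only "(empty)", the GTF itself gives no gene for that transcript and the answer would have to come from elsewhere.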

Cheers
Alex