PASA annotation questions and error


reema...@gmail.com

Feb 26, 2015, 2:31:44 PM
to trinityrn...@googlegroups.com
Hello All,

I am using PASA to annotate two de novo Trinity assemblies from two different but related species. The PASA tools work well in both cases. However, I am currently facing a small problem, so here are my question and error:

1) Question: After running "Launch_PASA_pipeline.pl", a number of files are generated, and one of them is "alignment.validations.output". While examining this file, I found some "ERROR" entries in the 9th column, i.e. "alignment_valid coord_span". What does this ERROR mean here?

2) Error: The "Loading pre-existing protein-coding gene annotation" section says that before loading your gff3 file into the database, you should check its validity with "pasa_gff3_validator.pl". I get errors when I run this on both of my gff3 files. Here are the errors:

Assembly_Species1 = "Fatal Error: cannot parse ID from entry chrM dictyBase Curator CDS 36 1658 . + . Parent=DDB0201582 at /sw/opt/PASA/misc_utilities/pasa_gff3_validator.pl line 54, <$fh> line 1."

Assembly_Species2 = "Fatal Error, cannot locate data entry for ID: [PPA1271346] at /sw/opt/PASA/misc_utilities/pasa_gff3_validator.pl line 119"

Here is the gene at line 119:

"GL290983	GenBank	gene	43626	45778	.	-	.	ID=PPA_G1268120;Name=PPL_00094
GL290983	GenBank	mRNA	43626	45778	.	-	.	ID=PPA1268122;Name=PPL_00094.t00;Parent=PPA_G1268120
GL290983	GenBank	exon	43626	43645	.	-	.	ID=PPA1268124;Name=exon-auto1268124;Parent=PPA1268122
GL290983	GenBank	exon	43760	43798	.	-	.	ID=PPA1268126;Name=exon-auto1268126;Parent=PPA1268122
GL290983	GenBank	exon	43926	44396	.	-	.	ID=PPA1268128;Name=exon-auto1268128;Parent=PPA1268122
GL290983	GenBank	exon	44604	44752	.	-	.	ID=PPA1268130;Name=exon-auto1268130;Parent=PPA1268122
GL290983	GenBank	exon	44860	45138	.	-	.	ID=PPA1268132;Name=exon-auto1268132;Parent=PPA1268122
GL290983	GenBank	exon	45254	45326	.	-	.	ID=PPA1268134;Name=exon-auto1268134;Parent=PPA1268122
GL290983	GenBank	exon	45416	45592	.	-	.	ID=PPA1268136;Name=exon-auto1268136;Parent=PPA1268122
GL290983	GenBank	exon	45733	45778	.	-	.	ID=PPA1268138;Name=exon-auto1268138;Parent=PPA1268122"

I tried deleting the gene completely (gene, mRNA, exon) from line 119, but the validator still failed at line 119. Searching the web for the Assembly_Species2 error, I found this post: http://sourceforge.net/p/pasa/mailman/message/32505345/. I tried the solution Brian suggested there, "misc_utilities/gff3_file_to_proteins.pl gff3_file genome_db", but I am not sure what genome_db is here. Is it the original genome FASTA file? I also followed Jessica's suggestion to use the gff3 files directly without validating them. The problem is that the final output shows no modifications: every entry says "original gene structure, not modified by PASA". So I am not sure about the output.
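One point that may explain why deleting the gene changed nothing: in both messages, "line 54" and "line 119" are positions in pasa_gff3_validator.pl where Perl's die was raised, not lines of the gff3 file; only "<$fh> line 1" in the Species1 message refers to the input file. Both messages also point at ID/Parent bookkeeping: the quoted Species1 row is a CDS with Parent= but no ID=, and the Species2 message names an ID it could not locate (plausibly a Parent= reference to a missing feature, which is an assumption). Below is a minimal sketch, not the PASA validator itself, that reports those two kinds of problems; the rules it applies are assumed, not taken from PASA's source.

```python
# Sketch only: a simplified re-implementation of two checks, not pasa_gff3_validator.pl.
# Assumed rules: every feature needs ID=, and every Parent= must name an ID in the file.


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3(lines):
    """Return (line_number, problem) tuples for a tab-separated GFF3 file."""
    problems = []
    known_ids = set()
    parent_refs = []  # checked after the full pass, since parents may appear later
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            problems.append((n, f"expected 9 tab-separated columns, got {len(fields)}"))
            continue
        attrs = parse_attributes(fields[8])
        if "ID" in attrs:
            known_ids.add(attrs["ID"])
        else:
            problems.append((n, f"{fields[2]} feature has no ID attribute"))
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((n, parent))
    for n, parent in parent_refs:
        if parent not in known_ids:
            problems.append((n, f"Parent {parent} does not match any ID"))
    return problems
```

Passing the lines of an open gff3 file to check_gff3 lists every row that would trip either rule, rather than stopping at the first one.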

Could I ask everyone for their views on what's going on here? I would highly appreciate any suggestions or help.

Many Thanks,

Reema Singh
Post-doctoral Research Assistant
The Pauline Schaap Lab and The Barton Group
Division of Cell and Developmental Biology and Division of Computational Biology
College of Life Sciences University of Dundee, Dundee, Scotland, UK
www.lifesci.dundee.ac.uk/groups/pauline_schaap/
www.compbio.dundee.ac.uk
twitter : @ReemaSingh28


Brian Haas

Feb 28, 2015, 2:32:50 PM
to Reema Singh, trinityrn...@googlegroups.com
Hi Reema,

responses included below:

On Thu, Feb 26, 2015 at 2:31 PM, <reema...@gmail.com> wrote:
1) Question: After running "Launch_PASA_pipeline.pl", a number of files are generated, and one of them is "alignment.validations.output". While examining this file, I found some "ERROR" entries in the 9th column, i.e. "alignment_valid coord_span". What does this ERROR mean here?



Each of the initial transcript alignments is assessed for proper consensus splice sites, percent identity, and percent aligned. If the minimum thresholds aren't met, the alignment is marked invalid, and the 'error' term records the reason why. No invalid alignments are assembled by PASA.
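A minimal sketch for summarizing that file, assuming (from the question above and the grep output later in this thread) that columns are whitespace-separated and that the 9th column reads "OK" for valid alignments and an "ERROR"-style value otherwise. It counts the status values and keeps the invalid rows so their remaining columns can be inspected; the function name is hypothetical.

```python
from collections import Counter

# Sketch only. Assumed layout (inferred from this thread, not PASA docs):
# whitespace-separated columns, 9th column "OK" for valid alignments and
# an "ERROR"-style value otherwise; header or malformed rows are skipped.


def summarize_validations(lines):
    """Return (Counter of status values, list of invalid rows as field lists)."""
    statuses = Counter()
    invalid_rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue
        status = fields[8]
        if status == "OK":
            statuses["OK"] += 1
        elif status.upper().startswith("ERROR"):
            statuses[status] += 1
            invalid_rows.append(fields)  # assumption: the reason is in later columns
    return statuses, invalid_rows
```

Called on an open handle to alignment.validations.output, it returns the status counts plus the invalid rows, whose first two fields are the aligner and the transcript accession.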

 
I tried deleting the gene completely (gene, mRNA, exon) from line 119, but the validator still failed at line 119. Searching the web for the Assembly_Species2 error, I found this post: http://sourceforge.net/p/pasa/mailman/message/32505345/. I tried the solution Brian suggested there, "misc_utilities/gff3_file_to_proteins.pl gff3_file genome_db", but I am not sure what genome_db is here. Is it the original genome FASTA file? I also followed Jessica's suggestion to use the gff3 files directly without validating them. The problem is that the final output shows no modifications: every entry says "original gene structure, not modified by PASA". So I am not sure about the output.



The validation tools still need some updating. Try running misc_utilities/gff3_file_to_proteins.pl, using your genome FASTA file for the 'genome_db' parameter, and see whether it works or errors out.

best,

~brian



 
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Reema Singh

Feb 28, 2015, 11:13:57 PM
to Brian Haas, trinityrn...@googlegroups.com
Thanks very much for the quick response, Brian.

I tried that using "misc_utilities/gff3_file_to_proteins.pl Dicty.gff DictyBase28Aug2014.fas", but it gives the error "Error, no gene feature found for DDB0237470.... ignoring feature - Error, cannot find sequence for DDB0232432 at /sw/opt/PASA/misc_utilities/gff3_file_to_proteins.pl line 53" for a number of entries. I have attached the gff3 file I used. Likewise, for the second assembly, the gff3 [misc_utilities/gff3_file_to_proteins.pl P_Pal.gff3 P_Pal.fa] gives the same error. I also have a related question about UTR regions: do I need to set separate parameters for extended UTRs during the annotation update/comparison?
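One possible cause of "cannot find sequence for ..." (a guess, not confirmed anywhere in this thread) is that the sequence names in column 1 of the gff3 file do not match the FASTA headers given as genome_db. A minimal sketch that lists column-1 IDs with no FASTA record, assuming sequences are looked up by the first word of each ">" header line:

```python
# Sketch only: lists GFF3 seqids (column 1) with no matching FASTA record.
# Assumption: sequences are looked up by the first word of each ">" header.


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of every '>' header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def missing_seqids(gff_lines, fasta_lines):
    """Return sorted GFF3 seqids that do not appear among the FASTA IDs."""
    seqids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; stop reading features
        if line.strip() and not line.startswith("#"):
            seqids.add(line.split("\t", 1)[0])
    return sorted(seqids - fasta_ids(fasta_lines))
```

An empty result means every seqid has a FASTA record, in which case the cause lies elsewhere.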

Related to my first question in the previous email, I have one more query. After the transcriptome assembly, we aligned the transcripts to the known cDNAs and the existing genome, and exported all of this alignment information as gff [I call this A]. We also generated PASA assemblies from the newly assembled transcriptomes [B]. When we load these gff files into IGB and compare them with the existing annotation, we find, in one assembly, a UTR extension in A that is missing in B [Figure 1], whereas in the other assembly a UTR extension present in A is completely missing from B [Figure 2]. Why do they differ? Which one, A or B, is correct? Why are the extended UTR regions in A [assembled transcripts aligned to the existing cDNAs using BLAT] not supported by PASA, even though PASA also uses BLAT for alignment?

Figure 1 [ Species1 Assembly] and Figure 2 [ Species 2 assembly]
 Blue = annotation after aligning assembled transcript with the already existing cDNA
 Green = Existing curated annotation
 Red = annotation generated by PASA.

Many Thanks,
Reema, 

--
test.gff
Figure1.jpeg
Figure2.jpeg

Brian Haas

Mar 1, 2015, 9:16:08 AM
to Reema Singh, trinityrn...@googlegroups.com
Hi Reema,

For importing your own annotations to be used as part of the annotation update functionality, be sure to use gff3 format that contains only 'gene', 'mRNA', 'exon', and 'CDS' features, as per:


and look at the example data that comes with PASA to examine the gff3 formatting used.
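Following that advice, a minimal sketch that keeps only gene, mRNA, exon and CDS rows (plus comment, directive and blank lines) from a tab-separated gff3 file. It assumes there is no embedded ##FASTA section and it does not repair missing ID= or Parent= attributes; the function name is hypothetical.

```python
# Sketch only: keeps gene/mRNA/exon/CDS rows plus comment and blank lines.
# Assumes a tab-separated GFF3 with no embedded ##FASTA section; it does not
# add missing ID= attributes or repair Parent= links.

ALLOWED_TYPES = {"gene", "mRNA", "exon", "CDS"}


def filter_feature_types(lines, allowed=ALLOWED_TYPES):
    """Yield lines whose 3rd column is an allowed feature type."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] in allowed:
            yield line
```

Comparing the filtered output against PASA's bundled example gff3 is still worthwhile, since feature types are only one part of the expected formatting.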

With regard to the images, you should see PASA assemblies where you have 'valid' alignments.  Any alignments that weren't considered valid will not be incorporated into pasa assemblies and will not be used for annotation updates.

The valid and invalid alignments are provided by PASA in separate gff3/gtf/bed files so you can easily import them into your browser for comparison.  Again, for every valid alignment, there should be a pasa assembly that incorporates it.

best,

~brian

Reema Singh

Mar 2, 2015, 6:18:03 AM
to Brian Haas, trinityrn...@googlegroups.com
Hello Brian,

Thanks for pointing this out; I didn't realize there was a problem with my gff3 file format. I will change the format accordingly and try again. I have also checked the invalid/failed gff3 file for the example in Figure 1 [attached to my last email], but that region seems to be missing from the invalid gff3 file as well. I will get back to the thread after cross-checking all these files.

Thanks very much, Brian

Cheers,
Reema,


Reema Singh

Mar 5, 2015, 1:03:37 PM
to Brian Haas, trinityrn...@googlegroups.com
Hi Brian,

I was comparing the PASA annotation with the existing curated gene set and the Trinity-assembled transcripts. I found an example where Trinity assembled an existing transcript with an extended UTR, shown in this figure [ http://www.compbio.dundee.ac.uk/user/rsingh/Figure_PASA.jpeg ]. The transcript "comp5893_c34_seq1" aligns very well to chr1. Here's the description:


comp5893_c34_seq1 (a): length 1400 bp [dark blue in figure]

(a) 1-62      = chr1 2639972-2640033, 100% identity
(a) 63-381    = chr1 2640305-2640623
(a) 382-1400  = chr1 2640791-2641809
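A quick arithmetic check of those blocks, written as a sketch that treats both coordinate ranges as 1-based inclusive (an assumption): if the alignment is ungapped, each transcript block should have the same length as its genome block, and the genomic gaps between consecutive blocks give the implied intron lengths.

```python
# Sketch only: both coordinate systems assumed 1-based inclusive.
# Blocks copied from the message: (tx_start, tx_end, chr1_start, chr1_end).
BLOCKS = [
    (1, 62, 2639972, 2640033),
    (63, 381, 2640305, 2640623),
    (382, 1400, 2640791, 2641809),
]


def block_lengths(blocks):
    """Return (transcript_length, genome_length) for each aligned block."""
    return [(tx_end - tx_start + 1, g_end - g_start + 1)
            for tx_start, tx_end, g_start, g_end in blocks]


def intron_lengths(blocks):
    """Return lengths of the genomic gaps between consecutive blocks."""
    return [nxt[2] - cur[3] - 1 for cur, nxt in zip(blocks, blocks[1:])]


for tx_len, g_len in block_lengths(BLOCKS):
    print(tx_len, g_len, "ok" if tx_len == g_len else "MISMATCH")
print("introns:", intron_lengths(BLOCKS))
```

For these blocks the lengths match (62, 319 and 1019 bp, summing to the stated 1400), and the implied introns are 271 and 167 bp.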


The curated gene DDB0191262 [orange] [DDB_G0269146] has exactly the same chromosome coordinates; see [http://dictybase.org/gene/DDB_G0269146/feature/DDB0191262] for details. As further evidence, I also checked the similarity between the DDB0191262 protein sequence and the longest ORF encoded by the comp5893_c34_seq1 transcript: they are the same protein, with 100% similarity. The read depth for this gene [green and dark red for the control and knockout samples] adds more confidence that this new transcript is accurate. But when I look at the PASA annotation, it shows only a fragment [red], and the rest of the transcript appears in neither the valid alignments [blue fragments] nor the failed alignments. This makes me curious about what's happening here. Any ideas or opinions?


Also, if I go ahead with the annotation comparison and update, the existing annotation would not be updated at all, even though the figure clearly shows the extended UTR [if I am interpreting this correctly]. Why is there this difference?


Thanks,

Reema,


 

Brian Haas

Mar 5, 2015, 1:24:21 PM
to Reema Singh, trinityrn...@googlegroups.com
Hi Reema,

Can you tell me what the different tier labels are in the picture?  (ie. what's light blue and dark blue)?  Are they all files generated by PASA?

I suspect the 'good' alignments you're seeing were put in the invalid pile for some reason, and we can figure out what that reason was once I know for sure.

best,

~brian

Reema Singh

Mar 5, 2015, 5:56:06 PM
to Brian Haas, trinityrn...@googlegroups.com
Hello Brian,

The light blue is the PASA valid alignments, and the dark blue is the Trinity transcripts aligned to the genome [the alignment file generated by aligning the Trinity transcripts to the genome with BLAT]. Please find attached the picture with labels.

Thanks,
Reema,
Figure_PASA_Label.jpeg

Brian Haas

Mar 5, 2015, 6:37:13 PM
to Reema Singh, trinityrn...@googlegroups.com
I see.  Can you load up the 'failed' alignments?  I expect that's where you'll see the transcripts of interest.   Right now, the only 'valid' gmap or blat alignments involve those little blue regions, that aren't so helpful.

If you 'grep' out the relevant transcripts from the 'alignment.validations.output' file, we'll be able to see what the reason was for the alignments not being considered valid, and excluded by pasa.


Reema Singh

Mar 6, 2015, 10:32:15 AM
to Brian Haas, trinityrn...@googlegroups.com
Please find the attached image. I have highlighted the BLAT and GMAP failed alignments in cyan. This transcript is absent from both the BLAT and GMAP failed files.

Here's the grep output from  'alignment.validations.output' :-

-bash-4.1$ less alignment.validations.output | grep "comp5893_c34_seq1"
blat    comp5893_c34_seq1       15867   15749   chr1    3       +       +       OK      2639991-2641809 100100     138100  orient(a+/s+) align: 2639991(1)-2640033(43)>GT....AG<2640305(44)-2640623(362)>GT....AG<2640791(363)-2641809(1381)
gmap    comp5893_c34_seq1       15867   49257   chr1    3       +       +       OK      2639991-2641809 100100     138100  orient(a+/s+) align: 2639991(1)-2640033(43)>GT....AG<2640305(44)-2640623(362)>GT....AG<2640791(363)-2641809(1381)
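The align: field packs the exon segments and splice-site dinucleotides into a single string. A minimal parsing sketch, assuming the pattern seen in these two lines (genomeStart(txStart)-genomeEnd(txEnd) segments joined by >XX....YY< junctions); the pattern is inferred from this output, not from PASA documentation.

```python
import re

# Sketch only: pattern inferred from the two lines above, not from PASA docs.
# Segments look like genomeStart(txStart)-genomeEnd(txEnd); junctions like >GT....AG<.
SEGMENT = re.compile(r"(\d+)\((\d+)\)-(\d+)\((\d+)\)")
JUNCTION = re.compile(r">([^.<>]+)\.+([^.<>]+)<")


def parse_align(align):
    """Return ([(g_start, tx_start, g_end, tx_end), ...], [(donor, acceptor), ...])."""
    segments = [tuple(int(x) for x in m.groups()) for m in SEGMENT.finditer(align)]
    junctions = JUNCTION.findall(align)
    return segments, junctions


align = ("2639991(1)-2640033(43)>GT....AG<2640305(44)-2640623(362)"
         ">GT....AG<2640791(363)-2641809(1381)")
segments, junctions = parse_align(align)
for g_start, tx_start, g_end, tx_end in segments:
    print(f"tx {tx_start}-{tx_end} -> chr1:{g_start}-{g_end}")
print("junctions:", junctions)
```

For this line it yields three segments covering transcript positions 1 to 1381 and two GT...AG junctions.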

It actually looks like a nice long gene in "pasa_dicty_Alt1.pasa_assemblies.gff3" as well; please see:

"chr1    assembler-pasa_dicty_Alt1       cDNA_match      2639991 2640033 .       +       .       ID=align_68908;Target=asmbl_1441 1 43 +
chr1    assembler-pasa_dicty_Alt1       cDNA_match      2640305 2640623 .       +       .       ID=align_68908;Target=asmbl_1441 44 362 +
chr1    assembler-pasa_dicty_Alt1       cDNA_match      2640791 2641809 .       +       .       ID=align_68908;Target=asmbl_1441 363 1381 +
"


Figure_PASA_Failed.jpeg
Figure_PASA_Failed1.jpeg

Reema Singh

Mar 6, 2015, 10:58:10 AM
to Brian Haas, trinityrn...@googlegroups.com
Hi Brian,

I think I have an idea of what the problem is [if I am not getting it wrong]. The problem appears when I import the PASA-generated gff3 file into IGB. When I load the .bed and .gtf files it works well [see the attached figure]:
bright green = Pasa annotation .bed format
light blue = pasa annotation .gtf format
red = pasa annotation .gff3 format.

This raises one more question: if this problem is specific to .gff3, should I use the .gtf or .bed file instead of .gff3 for the "annotation comparison and updates"?

Many Thanks,
Reema,


Figure_PASA_Solved.jpeg

Brian Haas

Mar 6, 2015, 11:33:22 AM
to Reema Singh, trinityrn...@googlegroups.com
It might be something about the gff3 that is incompatible with IGB for some reason... I don't know.  It's hard to find a tool that always does what you want with these different formats. BED is definitely more reliable based on my experiences with other viewers (I haven't tried IGB yet).
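PASA already writes its own BED files, as noted earlier in the thread, so this is only an illustration of what that conversion involves: a minimal sketch that groups cDNA_match rows by ID= and emits one BED12 line per group. It assumes all rows of one alignment share chromosome and strand and that Target= begins with the assembly accession. GFF3 is 1-based inclusive while BED is 0-based half-open, hence the start - 1.

```python
from collections import defaultdict

# Sketch only: groups cDNA_match rows by ID= and writes one BED12 line per group.
# Assumes each group shares chromosome and strand; GFF3 is 1-based inclusive,
# BED is 0-based half-open, so starts are shifted by one.


def attributes(column9):
    """Parse 'key=value;key=value' into a dict (values may contain spaces)."""
    return dict(item.split("=", 1) for item in column9.strip().split(";") if "=" in item)


def cdna_match_to_bed12(lines):
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "cDNA_match":
            continue
        attrs = attributes(fields[8])
        name = attrs.get("Target", attrs["ID"]).split()[0]  # e.g. asmbl_1441
        groups[attrs["ID"]].append(
            (fields[0], int(fields[3]) - 1, int(fields[4]), fields[6], name))
    bed = []
    for rows in groups.values():
        rows.sort(key=lambda r: r[1])
        chrom, _, _, strand, name = rows[0]
        start, end = rows[0][1], max(r[2] for r in rows)
        sizes = ",".join(str(r[2] - r[1]) for r in rows)
        starts = ",".join(str(r[1] - start) for r in rows)
        bed.append("\t".join(map(str, [chrom, start, end, name, 0, strand,
                                        start, end, 0, len(rows), sizes, starts])))
    return bed
```

Applied to the three asmbl_1441 rows quoted earlier, it produces one three-block line starting at 2639990 with block sizes 43, 319 and 1019.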

For the annotation update, you need to provide a very different kind of gff3 file (gene, mRNA, exon, and CDS) as compared to this alignment-style gff3 file.

It's reassuring that the pasa assembly is showing up where it's supposed to now. It would be a shame to miss such nice-looking gene structure. :)

best,

~brian




Reema Singh

Mar 7, 2015, 5:38:11 PM
to Brian Haas, trinityrn...@googlegroups.com
Thanks Brian. PASA is definitely picking up nice gene structures. 

I will write back to the thread once I have finished comparing the PASA-generated .gff3 file with the existing annotation and updating it.

Many Thanks,
Reema,


Reema Singh

Mar 23, 2015, 9:14:44 AM
to Brian Haas, trinityrn...@googlegroups.com
Hello Brian,

I got a very good annotation after finishing the PASA assembly comparison and update against the existing annotation. The updated annotation shows extended UTRs and corrected gene models. However, I still have some questions about the update:

Question 1: Why did PASA join genes after the comparison and update with the existing annotation, even though the rest of the evidence (read depth, Trinity transcripts, and the existing curated/predicted gene models) shows that these are two genes? [Attached figures Updation_Join_1 and Updation_Join_2]


Question 2: Why did PASA miss some annotations? All the features are highlighted in the attached figure, and the missed annotation is highlighted with a black box [attached figure Updation_Missing_1]


Best,

Reema,


Updation_Join_1.jpg
Updation_Join_2.jpg
Updation_Missing_1.jpg

Brian Haas

Mar 23, 2015, 9:41:23 AM
to Reema Singh, trinityrn...@googlegroups.com
Hi Reema,

responses below

On Mon, Mar 23, 2015 at 9:14 AM, Reema Singh <reema...@gmail.com> wrote:

Question 1: Why did PASA join genes after the comparison and update with the existing annotation, even though the rest of the evidence (read depth, Trinity transcripts, and the existing curated/predicted gene models) shows that these are two genes? [Attached figures Updation_Join_1 and Updation_Join_2]




To fully be able to interpret what's going on, I generally need to see the following tiers of data:   1. valid alignments, 2. pasa assemblies of those alignments, 3.  existing coding annotations, and 4.  the PASA-updated gene annotations

However, where there are alignments that provide read-through transcripts involving neighboring genes, PASA (under default parameters) can generate fusion assemblies. If you have a gene-dense genome such as this one, I can recommend other parameters, such as gene-guided clustering/assembly instead of the PASA defaults (which cluster/assemble simply based on alignment overlaps). Also, be sure to use the --TRANSDECODER option, which has PASA examine the input transcript sequences to determine whether any appear to encode full-length ORFs; such transcripts are privileged in certain ways that can lead to further annotation improvements.
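A toy illustration of the fusion effect described above, using made-up coordinates and plain single-linkage overlap clustering on one strand. This is a simplification, not PASA's actual clustering code, but it shows how a single read-through transcript overlapping two loci is enough to merge them into one cluster.

```python
# Toy illustration with made-up coordinates; NOT PASA's clustering code.
# Single-linkage clustering of (name, start, end) intervals on one strand:
# any overlap with the running cluster extends it.


def cluster_by_overlap(intervals):
    """Group intervals into clusters of transitively overlapping members."""
    clusters = []
    current, current_end = [], None
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        if current and start <= current_end:
            current.append(name)
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [name], end
    if current:
        clusters.append(current)
    return clusters


gene_a = [("geneA_tx1", 100, 900), ("geneA_tx2", 150, 950)]
gene_b = [("geneB_tx1", 1200, 2000)]
read_through = [("readthrough_tx", 500, 1500)]

print(cluster_by_overlap(gene_a + gene_b))                 # two clusters
print(cluster_by_overlap(gene_a + gene_b + read_through))  # one fused cluster
```

Gene-guided clustering avoids this by anchoring clusters on existing gene models rather than on overlaps alone.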



 

Question 2: Why did PASA miss some annotations? All the features are highlighted in the attached figure, and the missed annotation is highlighted with a black box [attached figure Updation_Missing_1]



Here I'm not seeing any PASA assemblies. Perhaps the alignments at that location were deemed invalid?

Reema Singh

Mar 23, 2015, 12:41:28 PM
to Brian Haas, trinityrn...@googlegroups.com
Hello Brian,

Please see the attached figures showing the same tiers of data you mentioned. The first attachment [Updation_Join_1_1] is the same as the previously attached Updation_Join_1, and likewise for the other file. In all figures: 1) valid alignments [BLAT and GMAP] are shown in blue, 2) PASA assemblies of those alignments in red, 3) existing coding annotation in pink, and 4) PASA-updated gene annotation in green.

Generating another PASA assembly with the --TRANSDECODER option is a very good point. I will certainly repeat both PASA assemblies with this parameter and compare the fusion cases. However, we didn't use gene-guided clustering because we are quite sure that a few of the gene models in the existing annotation are wrong, so we use that information at a later stage, during the update.

For question 2: both BLAT and GMAP show valid alignments in this region, but when I load the failed alignments for both, they look a bit different. The attached figure [Updation_Missing_1_1_1] shows both sets of failed alignments in cyan.

Many Thanks,

Updation_Join_1_1.jpeg
Updation_Missing_1_1.jpeg
Updation_Missing_1_1_1.jpeg

Brian Haas

Mar 23, 2015, 2:03:40 PM
to Reema Singh, trinityrn...@googlegroups.com
Hi Reema,

Thanks for the extra images.  In general, it's curious that your existing annotations (pink) aren't showing introns in your viewer, so I'm wondering if there's a format recognition issue here... and hopefully it's not causing PASA trouble in addition to the viewer you're using.

In reference to your pictures:

updated_join_1_1:  I'm going to make a guess here that your original annotation is actually this merged structure, and you're just not seeing it that way because the pink sections aren't being shown with linked introns in your viewer.    PASA is likely not updating this annotation for several reasons. If each of the PASA assemblies represented full-length structures (and you ran the --TRANSDECODER option), then PASA should hopefully split that merged gene structure into two separate genes.

Updated_missing_1_1: the original annotations corresponding to those short pink pieces on the right side (assuming they're linked by introns) are on the opposite strand from the PASA assembly. If the PASA assembly is multi-exon, then it would be evidence for antisense, but it's probably more likely that your original annotation represents some false positives or a pseudogene (you'd need to look at protein homology data to know for sure). In any case, PASA will only update the structure of an existing annotation if it's making modest changes to the existing open reading frame. If the change is rather drastic, as it would be in this case, it'll put it in a 'fail' category, requiring manual inspection and gene modeling.

Updated_missing_1_1_1: similar arguments could be made for this one as in _1_1 above. It is curious that there are so many cyan (invalid) alignments that place this region within an intron. It's hard to know what's going on in this picture without seeing 'all' the data, but then again, that's what manual annotation would involve, and in this case it could be quite subjective depending on what other evidence you might have. The transcription data is certainly not boding well for your existing annotation structures.

I hope this helps,

~brian

Reema Singh

Mar 26, 2015, 8:14:40 AM
to Brian Haas, trinityrn...@googlegroups.com
Hello Brian,

Thanks for pointing this out. I didn't realize there was a problem with the visualization. I checked this by loading the same existing annotation that I used for the PASA comparison and update, but the picture is still the same. And yes, you are right: in both examples [mentioned in my last email] the genes were falsely fused in the original annotation. However, I reran the PASA assembly with the --TRANSDECODER option, and the results for these two examples are the same. At the moment I am trying to understand what's happening here and will write back to the thread.

Thanks again.

Reema,

wangzhe...@gmail.com

Sep 17, 2016, 10:29:15 AM
to trinityrnaseq-users
Hi Brian,
   
I get an error when I convert my GTF file into one PASA can work with. Can you help me solve this problem?
My gene-model file is:
Contig262670    AUGUSTUS        gene    21054   22623   0       -       .       MMa09781
Contig262670    AUGUSTUS        CDS     21198   21271   0       -       0       transcript_id "MMa09781.t1"; gene_id "MMa09781";
Contig262670    AUGUSTUS        CDS     21775   21893   0       -       0       transcript_id "MMa09781.t1"; gene_id "MMa09781";
Contig326292    AUGUSTUS        gene    34142   35109   0       +       .       MMa25842
Contig326292    AUGUSTUS        CDS     34142   34218   0       +       0       transcript_id "MMa25842.t1"; gene_id "MMa25842";
Contig326292    AUGUSTUS        CDS     34373   34688   0       +       0       transcript_id "MMa25842.t1"; gene_id "MMa25842";
Contig326292    AUGUSTUS        CDS     34783   35109   0       +       0       transcript_id "MMa25842.t1"; gene_id "MMa25842";
Contig315689    AUGUSTUS        gene    201     10140   0       +       .       MMa16518
Contig315689    AUGUSTUS        CDS     201     239     0       +       0       transcript_id "MMa16518.t1"; gene_id "MMa16518";
Contig315689    AUGUSTUS        CDS     590     894     0       +       0       transcript_id "MMa16518.t1"; gene_id "MMa16518";
Contig315689    AUGUSTUS        CDS     9777    10140   0       +       0       transcript_id "MMa16518.t1"; gene_id "MMa16518";

I used this command line:
$PASA_HOME/misc_utilities/gtf_to_gff3_format.pl genes.gtf genome.fa > genes.converted.gff3

There is an error:
-parsing GTF file: gene-models-v1.0.gff
of lineannot get gene_id from MMa09781
at /home/bio_soft/PASApipeline-2.0.2/misc_utilities/../PerlLib/GTF_utils.pm line 85, <$fh> line 1.
GTF_utils::GTF_to_gene_objs('gene-models-v1.0.gff') called at /home/bio_soft/PASApipeline-2.0.2/misc_utilities/../PerlLib/GTF_utils.pm line 30
GTF_utils::index_GTF_gene_objs_from_GTF('gene-models-v1.0.gff', 'HASH(0x1bd1a68)') called at ../misc_utilities/gtf_to_gff3_format.pl line 27

Can you help me to solve it? I do not know how to convert my file. Thank you very much!
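The error text suggests the parser tried to read a gene_id from the bare MMa09781 in the gene row's 9th column; only the CDS rows carry gene_id and transcript_id. A minimal sketch, assuming that is the cause, that drops rows without a gene_id attribute before conversion. Whether gtf_to_gff3_format.pl then builds the models you expect from CDS-only rows is not confirmed in this thread.

```python
import re

# Sketch only. Assumption: the converter fails on the AUGUSTUS "gene" rows,
# whose 9th column is a bare name (e.g. MMa09781) with no gene_id "..." field.
GENE_ID = re.compile(r'gene_id\s+"[^"]+"')


def keep_rows_with_gene_id(lines):
    """Yield comment/blank lines and tab-separated rows that carry gene_id."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and GENE_ID.search(fields[8]):
            yield line
```

The filtered lines can be written to a new file and passed to gtf_to_gff3_format.pl in place of the original.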

On Friday, February 27, 2015 at 3:31:44 AM UTC+8, Reema Singh wrote: