[maker-devel] substr outside of string in PhatHits_utils.pm

Ole Kristian Tørresen

Nov 9, 2017, 4:44:31 AM
to maker...@yandell-lab.org
Dear all,
I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.

Thank you.

Sincerely,
Ole K. Tørresen

Error message:

#--------- command -------------#
Widget::augustus:
/projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus
#-------------------------------#
deleted:0 genes
begin called get_best_alt_splices1
...processing 0 of 2
...processing 1 of 2
end called get_best_alt_splices1
...processing 0 of 20
...processing 1 of 20
...processing 2 of 20
...processing 3 of 20
...processing 4 of 20
...processing 5 of 20
...processing 6 of 20
...processing 7 of 20
...processing 8 of 20
...processing 9 of 20
...processing 10 of 20
...processing 11 of 20
...processing 12 of 20
...processing 13 of 20
...processing 14 of 20
...processing 15 of 20
...processing 16 of 20
...processing 17 of 20
...processing 18 of 20
...processing 19 of 20
substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
--> rank=NA, hostname=compute-31-18.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:GmG20150304_scaffold_8692

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:GmG20150304_scaffold_8692

examining contents of the fasta file and run log
_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Carson Holt

Nov 9, 2017, 11:28:41 AM
to Ole Kristian Tørresen, maker...@yandell-lab.org
My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.

—Carson

Ole Kristian Tørresen

Nov 21, 2017, 8:58:12 AM
to Carson Holt, maker...@yandell-lab.org
Thank you Carson.

After a bit of struggling, I can confirm that the same error occurs in MAKER 3.01.2 (I guess you meant that version, couldn’t find 3.02.02).

I am providing a GFF to est_gff, with match and match_part entries. For at least one of the scaffolds, the last coordinate (column 5) is the same number as the length of the scaffold. That should be allowed by the GFF3 standard, right?
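
On the coordinate question: GFF3 coordinates are 1-based and inclusive, so an end coordinate equal to the scaffold length is within bounds. A minimal sketch for ruling out features that really do run past their sequence, assuming an uncompressed FASTA file and a tab-separated GFF3 file. The function name `gff_out_of_bounds` and the file name `genome.fa` in the usage line are placeholders, not part of MAKER, and a clean result only excludes this one class of input problem.

```shell
#!/bin/sh
# gff_out_of_bounds GENOME_FASTA GFF3
# Print every GFF3 feature whose coordinates do not fit its sequence.
gff_out_of_bounds() {
    awk -F '\t' '
        # First file (FASTA): accumulate the length of each sequence.
        NR == FNR {
            if (/^>/) { id = substr($0, 2); sub(/[ \t].*/, "", id) }
            else      { len[id] += length($0) }
            next
        }
        # Second file (GFF3): skip comments and lines without 9 columns.
        /^#/ || NF < 9 { next }
        # Feature refers to a sequence that is not in the FASTA file.
        !($1 in len) { print "unknown_seq\t" $0; next }
        # 1-based inclusive coordinates: end == length is still valid.
        $4 < 1 || $5 > len[$1] || $4 > $5 { print "out_of_bounds\t" $0 }
    ' "$1" "$2"
}
```

Running `gff_out_of_bounds genome.fa ests.score.gff3` prints nothing when every feature fits inside its scaffold.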

How can I troubleshoot this? The error message is not so informative. It seems that PhatHit_utils.pm tries to find a stop codon. Snipped from that file, lines 849-850:
#fix stop codon by walking downstream
my $has_stop = $tM->is_ter_codon(substr($transcript_seq, $end-1-3, 3));

The GFF I am using was the output of Mikado (https://www.biorxiv.org/content/early/2017/11/09/216994), which is GFF3, and then processed a bit to make it suitable for MAKER. I first converted it to GTF with 'mikado util convert mikado.loci.gff3 mikado.loci.gtf'

Then I selected only mRNA and exon entries, and changed mRNA to transcript to make it look like cufflinks output (and set a dummy score):
grep -P "\tmRNA\t|\texon\t" mikado.loci.gtf |sed "s/mRNA/transcript/g" |awk -F "\t" '{$9=$9"cov \"10.0\";"; OFS="\t"; print $1, $2, $3, $4, $5, $6, $7, $8, $9}' > mikado.loci.score.gtf

Before converting with cufflinks2gff3:
cufflinks2gff3 mikado.loci.score.gtf > ests.score.gff3

Thank you.

Ole

Jacques Dainat

Dec 11, 2017, 11:49:45 AM
to Carson Holt, maker...@yandell-lab.org
Dear Carson,

I got exactly the same problem as Ole Kristian Torresen.
substr outside of string at /projects/cees/bin/maker/maker-3.01.1/bin/../lib/PhatHit_utils.pm line 850.
I also tried with the version 3.00.0 and got the same problem.

My run:
I was using GFF3 alignments as GFF ESTs and proteins, with repeat masking and tRNA prediction activated.

The code line in PhatHit_utils.pm is the following:
      my $transcript_seq  = maker::auto_annotator::get_transcript_seq($hit, $seq); # few line before
      …..
  #fix stop codon by walking downstream 
  my $has_stop = $tM->is_ter_codon(substr($transcript_seq, $end-1-3, 3));   # <= line 850

So, I relaunched using the --debug option but didn’t find anything useful.

I then modified the code to print $transcript_seq.
I attached the log, where you can see the printed sequence from line 10955.
I also printed the length of the sequence (length($transcript_seq)) and $end.
It crashes because the sequence is 1369 bp long and the code tries to extract a stop codon starting at position 1372 (1376-1-3).
log_bug.txt
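
The arithmetic reported above can be reproduced outside MAKER. This is only a sketch of the index check, not MAKER's code: `check_stop_window` is a hypothetical helper that repeats the `$end-1-3` offset calculation from the quoted line and compares the 3-base window against the sequence length. Negative offsets, which Perl's substr counts from the end of the string, are not modelled.

```shell
#!/bin/sh
# check_stop_window SEQ_LEN END
# Repeats the offset arithmetic of substr($transcript_seq, $end-1-3, 3)
# and reports whether the 3-base window fits inside the sequence.
check_stop_window() {
    seq_len=$1
    end=$2
    offset=$((end - 1 - 3))   # 0-based start of the 3-base window
    if [ "$offset" -gt "$seq_len" ]; then
        # Start lies beyond the end of the string.
        echo "outside: offset $offset, sequence length $seq_len"
        return 1
    elif [ $((offset + 3)) -gt "$seq_len" ]; then
        # Start is inside, but fewer than 3 bases remain.
        echo "truncated: offset $offset, sequence length $seq_len"
        return 1
    fi
    echo "ok: offset $offset, sequence length $seq_len"
}

# Values from the log: sequence of 1369 bp, $end = 1376.
check_stop_window 1369 1376 || true
```

With the values from the log, the start offset (1372) is already larger than the sequence length (1369), so the window cannot be read.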

Carson Holt

Apr 17, 2018, 11:58:43 AM
to Jacques Dainat, Maker Mailing List
It runs fine to completion for me (on both 3.01.01 and 3.01.02). Since I’m using your output, no external tools are called, it just parses the reports already written in the directory and finishes.

This suggests that any issue is either with your version of Perl or a component MAKER is using (such as BioPerl). I am using Perl 5.16.3 and BioPerl 1.007002 (the CPAN version).

Note if you are using BioPerl live or the GitHub release it still shows version 1.007002 but will not necessarily match the CPAN version as the GitHub version counter does not get iterated with each commit. So make sure you are not accidentally using BioPerl live from GitHub (only use CPAN or let MAKER do the install of BioPerl if it’s not system wide). Also you are using Exonerate 2.4 instead of the stable 2.2 release. That shouldn’t make any difference since I am just parsing your output that is already in the folder and not running exonerate. But it may be worth an off chance look.

Finally, what you may want to do is download a new version of Perl, install it, and run MAKER with that version, just to make sure Perl or something installed inside your Perl is not generating the issue.

I would try the current stable release of perl (since it comes all ready to go - no compiling needed). Alternatively you can also try perlbrew to get a specific version (but it will have to compile against local libraries).

—Carson

On Apr 17, 2018, at 6:54 AM, Jacques Dainat <jacques...@nbis.se> wrote:

I tried with the latest version, 3.01.02-beta, and still get the same error.

I agree that it could be something wrong with the input GFF3 file, but I can’t find what it could be.

I have uploaded the whole folder as user guest_5602.

I’m looking forward to hearing from you.

/Jacques

On 16 Apr 2018, at 22:11, Carson Holt <cars...@gmail.com> wrote:

I can’t replicate your failure. Normally this error indicates there is something wrong with the input GFF3 file.

If you want, you can run this on your own machine to generate the failure, then tarball up the complete maker directory for the failure and upload it here —> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Also try the most current version of MAKER (3.01.02 - October 2017). See if it happens for you there.

—Carson




On Apr 13, 2018, at 6:52 AM, Jacques Dainat <jacques...@nbis.se> wrote:

Dear Carson,

I am coming back to you with the same problem: substr outside of string at /sw/bioinfo/maker/3.01.1-beta-OMPI/bin/../lib/PhatHit_utils.pm line 850.
Since our last conversation in January, I have seen on the MAKER mailing list that one more person (seoanezonjic) has had this issue.
In January, the only way I found to avoid the problem was to remove the GFF files that triggered it.

I have the problem again in a new annotation project. Of the 6 `EST` GFF files I’m using (all produced the same way, with StringTie and converted to GFF3 alignment style), 2 raise the error.
To narrow down where the problem comes from, I minimised the tools used within MAKER: no repeat masking and no ab initio tools, only proteins in FASTA format and one EST file in GFF format.
Using only the GFF EST file with est2genome=1 works.
Using only the proteins in FASTA format with protein2genome works.
Using the GFF EST file and the proteins in FASTA format, with protein2genome and est2genome or with protein2genome only, doesn’t work.

The problem occurs when MAKER tries to extend the protein alignments with the EST information.
I tried using the same tool versions as you (BioPerl 1.007002, BLAST+ 2.7.1, Exonerate 2.2.0) but still got the same problem.
One interesting thing is that the problem does not occur when I provide the proteins in GFF format.

Here is one GFF file raising the error. If you want to try under the same conditions, you will have to download the Swiss-Prot database (UniProt, reviewed only).


<9529.1.136329.CGATGT.gff3>
<scaffold_1017.fa>

I hope we will finally find a solution to this problem…

Best regards,

Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service

Address:
Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden




Jacques Dainat

Apr 19, 2018, 5:15:48 AM
to Carson Holt, Maker Mailing List, seoane...@hotmail.com
Moving from Perl 5.10.1 to 5.16.3 seems to have fixed the "substr outside of string at PhatHit_utils.pm line 850" issue.
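
Since the Perl version turned out to matter here, a job script could refuse to start MAKER under an older interpreter. A minimal sketch, assuming 5.16.3 as the threshold purely because it is the version reported to work in this thread (versions between 5.10.1 and 5.16.3 were not tested here). `version_ge` and `perl_version` are hypothetical helper names.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A >= B, compared
# numerically field by field (missing fields count as 0).
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the leading field; a version without a dot becomes empty.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Print the version of a perl binary as X.Y.Z
# (default: the first perl found on PATH).
perl_version() {
    "${1:-perl}" -e 'printf "%vd\n", $^V'
}

# Example guard for a job script (threshold is an assumption, see above):
# version_ge "$(perl_version)" 5.16.3 || {
#     echo "perl is older than 5.16.3; not starting MAKER" >&2
#     exit 1
# }
```

The comparison is kept in plain shell so the guard does not depend on the interpreter it is checking, apart from asking it for its own version.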

Thank you again for your help.

/Jacques
