Hi Carson,
A little while back you provided me with a script to edit my gff files in order to update them with IPRscan results. It seems I am able to do so when only maker models are present in the gff file; however, when I try to update an analysis/evidence based gff file I receive the errors below. I have attached the files for you to look at, and any help is appreciated. (I attached all sets of files: those that worked, 'models', and those that did not, 'abinit'.)
Thanks in advance,
Claudia
ERROR (reported for every line in the IPRSCAN file):
Use of uninitialized value $gene_id in hash element at /usr/local/genome/maker/maker-2.10/bin/ipr_update_gff line 163, <$IN> line 61.
blastp uniprot-sprot.fasta maker_proteins.fasta -mformat-=2 > uniprot-sprot.blast.out
(The order of the files is important: maker_proteins must come second.)
Hi Carson,
Thanks for the reply, I see how it works now, I was generating a wu-blast output from the actual website.
One thing: if I am using NCBI blast, will this script still work? I think the formatting is different from wu-blast.
Thanks, Claudia
On 11/05/2011 2:23 PM, Carson Holt wrote:
Re: updating gff3 note attribute with wu-blast hits using maker_functional_gff
Usage:
>sp|Q76N59|ZYG1_DICMU Zygote formation protein zyg1 OS=Dictyostelium mucoroides GN=zyg1 PE=2 SV=1
[cholt@garrucha blastdb]$ /usr/local/ncbi-blast-2.2.23+/bin/makeblastdb -in uniprot_sprot.fasta
Building a new DB, current time: 05/11/2011 16:32:20
New DB name: uniprot_sprot.fasta
New DB title: uniprot_sprot.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 495880 sequences in 27.28 seconds
Hi,
So I am running NCBI blastp locally, and the script asks you to specify the database on the command line. When I tried to specify the uniprot-swissprot fasta file, it returned an error saying no alias or index id specified. To fix this, I downloaded the swissprot db from the ftp site at NCBI (which has a particular index format that the script understands). This db requires the parent database (non-redundant proteins) to also be present, so the end result is that I get both the NCBI and Swiss-Prot identifiers in the blast output. I could not run blastp with just swissprot as the db, which I assume would have given me just the sp identifier, so I am not sure how to get around this.
And, yes I am using that uniprot_sprot fasta file you have attached.
Claudia
On 11/05/2011 6:19 PM, Carson Holt wrote:
Re: updating gff3 note attribute with wu-blast hits using maker_functional_gff
I’m not sure how you’re running this. Your results show a hit named “gi|19924280|sp|P49693.3|RL193_ARATH”.
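The difference between the two identifier styles in this thread (the UniProt-style `sp|Q76N59|ZYG1_DICMU` header and the NCBI-style `gi|19924280|sp|P49693.3|RL193_ARATH` hit) can be illustrated with a small parser. This is only a sketch in Python and is not part of MAKER or maker_functional_gff; the function name `swissprot_fields` is hypothetical, and it assumes pipe-delimited IDs in which the `sp` tag is followed by the accession and then the entry name.

```python
def swissprot_fields(hit_id):
    """Return (accession, entry_name) from a pipe-delimited BLAST hit ID.

    Handles both the UniProt style  sp|Q76N59|ZYG1_DICMU
    and the NCBI style              gi|19924280|sp|P49693.3|RL193_ARATH
    Returns None when no 'sp' tag is found.
    """
    fields = hit_id.split()
    if not fields:
        return None
    # Keep only the ID token and drop a leading FASTA '>' if present.
    parts = fields[0].lstrip(">").split("|")
    for i, tag in enumerate(parts):
        if tag == "sp" and i + 2 < len(parts):
            accession = parts[i + 1].split(".")[0]  # drop the version suffix
            return accession, parts[i + 2]
    return None
```

Normalizing hits this way before matching them against the UniProt FASTA headers is one possible way to make NCBI-formatted results comparable; whether the script itself needs this is not something the thread confirms.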
Hi,
So I have attempted to use the add_utr_to_gff3 script and it seems to return many errors; see below. (I have attached the gff3 file I was trying to update.)
Does anyone have any idea how to control this so that my IDs don't change, and/or how to use the other script without getting the errors?
Thanks in advance,
Claudia
Hi Carson
I’m running maker version 2.09 and would like to add UTRs to the gff output. I’m getting errors for both the add_utr_start_stop_gff and add_utr_to_gff3 scripts.
I see from your previous post that an upgrade is recommended but I don’t think our license includes upgrades.
I was just wondering if you have any ideas on how to obtain UTR info using v2.09? The script errors are pasted below.
Cheers
Anar
add_utr_to_gff3 test_maker.orig.gff
Global symbol "$s" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 26.
Global symbol "@files" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 34.
Global symbol "@files" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 40.
Global symbol "@files" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 48.
Global symbol "@stuff" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 63.
Global symbol "@stuff" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 86.
Global symbol "@stuff" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 87.
Global symbol "@stuff" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 88.
Global symbol "$header" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 93.
Global symbol "$footer" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 96.
Global symbol "@stuff" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 111.
Global symbol "$footer" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 187.
Global symbol "@regions" requires explicit package name at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 445.
Missing right curly or square bracket at /usr/local/maker_2.09beta/bin/add_utr_to_gff3 line 489, at end of line
/usr/local/maker_2.09beta/bin/add_utr_to_gff3 has too many errors.
--------------
add_utr_start_stop_gff test_maker.orig.gff
Use of uninitialized value in split at /usr/local/maker_2.09beta/bin/add_utr_start_stop_gff line 89.
Use of uninitialized value in substr at /usr/local/maker_2.09beta/bin/add_utr_start_stop_gff line 305.
substr outside of string at /usr/local/maker_2.09beta/bin/add_utr_start_stop_gff line 305.
Use of uninitialized value in substr at /usr/local/maker_2.09beta/bin/add_utr_start_stop_gff line 306.
substr outside of string at /usr/local/maker_2.09beta/bin/add_utr_start_stop_gff line 306.
Use of uninitialized value in print at /usr/local/maker_2.09beta/bin/add_utr_start_stop_gff line 180.
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
Hi
Thank you both for your replies.
Barry, you reminded me that I had in fact meddled with the gff output, so what I was passing to the add_utr scripts wasn’t actually bona fide MAKER output. I had stripped the fasta from the gff file, for some reason. Running add_utr_start_stop_gff on the original MAKER gff worked without errors. My mistake, I apologise! Thank you for picking this up.
Carson, thanks for explaining the workaround. We can’t afford to pay for an upgrade right now, but will obtain the new version when our current license expires and we renew. I’m happy to hear MAKER now adds UTRs by default.
Cheers
Anar
Hi
I apologise for asking another question about this script given it’s been deprecated, but I am curious about something and wonder whether you might be able to help. I ran add_utr_start_stop_gff on some maker output and no UTRs were added. For this species I have no est data, only altest data. Does this explain why no UTRs could be identified? What info is used to add UTR coordinates?
Thanks very much,
Anar
From: Carson Holt
[mailto:carso...@genetics.utah.edu]
Sent: Wednesday, 14 September 2011 2:02 p.m.
To: Barry Moore; Khan, Anar
Cc: Dinatale C; maker...@yandell-lab.org
Subject: Re: [maker-devel] usage of add_utr scripts
2.11 and above no longer have those scripts (UTR is always added by default now). So this is a roundabout way to do it: upgrade MAKER, then provide the merged GFF3 (made with gff3_merge) to genome_gff in the maker_opts.ctl file, then set all pass options to 1. Then make sure to clear all other options (repeat_protein, model_org, etc.). When you rerun MAKER, it will just update the GFF3.
--Carson
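The control-file changes described above can be sketched as a small text rewrite. This is an illustration in Python, not a MAKER tool: the function name `set_reannotation_options` is hypothetical, the `key=value #comment` line layout is assumed, and treating every key that ends in `_pass` as a pass option is an assumption, as are the option names in the default `clear` list. Only `genome_gff` and the instruction to set the pass options to 1 come from the message.

```python
import re

def set_reannotation_options(ctl_text, merged_gff,
                             clear=("model_org", "repeat_protein")):
    """Rewrite control-file text for a re-annotation run.

    - genome_gff is pointed at the merged GFF3
    - every key ending in '_pass' is set to 1 (assumed naming convention)
    - keys listed in `clear` are emptied (assumed option names)
    All other lines, including comments, pass through unchanged.
    """
    updated = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if not match:
            updated.append(line)
            continue
        key, _old_value, comment = match.groups()
        if key == "genome_gff":
            value = merged_gff
        elif key.endswith("_pass"):
            value = "1"
        elif key in clear:
            value = ""
        else:
            updated.append(line)
            continue
        suffix = f" {comment}" if comment else ""
        updated.append(f"{key}={value}{suffix}")
    return "\n".join(updated) + "\n"
```

Working on the text rather than a parsed structure keeps the original comments in place, so the edited file can still be compared line by line with the one MAKER generated.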
From: Barry Moore <bmo...@genetics.utah.edu>
Date: Tue, 13 Sep 2011 18:41:16 -0600
To: "Khan, Anar" <Anar...@agresearch.co.nz>
Cc: Carson Holt <carso...@genetics.utah.edu>,
Dinatale C <din...@uwindsor.ca>,
"maker...@yandell-lab.org"
<maker...@yandell-lab.org>
Subject: Re: [maker-devel] usage of add_utr scripts
Hi Anar,
It looks like the add_utr_to_gff3 script is completely hosed in that release. I think it's just there as a relic and add_utr_start_stop_gff is the script that you want. From those errors it looks like your gff3 file doesn't have the fasta sequence at the end of it.
If I've guessed correctly, the quick workaround would be to just add lines like the following to the end of your GFF3 file, where, of course, the fasta header and sequence are replaced with the correct data for the chromosome(s)/contig(s) represented in your GFF3 file.
##FASTA
>chr1
ATCGATACGTAG...
B
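The workaround above, appending a `##FASTA` section to a GFF3 file that lacks one, can be sketched in Python as follows. This is a minimal illustration rather than a MAKER utility: the function name `append_fasta_section`, the 60-column line width, and passing sequences as a dictionary of name to sequence are all assumptions made for the example.

```python
def append_fasta_section(gff_text, sequences, width=60):
    """Return gff_text with a ##FASTA section appended.

    `sequences` maps each chromosome/contig name used in the GFF3 to its
    sequence. If the text already has a ##FASTA line it is returned
    unchanged, so the function is safe to run on real MAKER output.
    """
    lines = gff_text.rstrip("\n").split("\n")
    if any(line.strip() == "##FASTA" for line in lines):
        return gff_text
    lines.append("##FASTA")
    for name, seq in sequences.items():
        lines.append(f">{name}")
        # Wrap the sequence at a fixed width, as is usual for FASTA.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The sequence names must match the first column of the GFF3 feature lines, otherwise a downstream script has no way to pair the features with their sequence.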
Hi Carson
Thanks a lot for the explanation, all clear now.
Cheers
Hi
I ran add_utr_start_stop_gff on some maker output and no UTRs were added. For this species I have no est data, only altest data. Does this explain why no UTRs could be identified –perhaps altest evidence isn’t used when attempting to define UTRs?
Cheers :-)
Anar
From: Khan, Anar
Sent: Wednesday, 14 September 2011 4:19 p.m.
To: 'Carson Holt'; Barry Moore
Cc: Dinatale C; maker...@yandell-lab.org
Subject: RE: [maker-devel] usage of add_utr scripts
Hi
Thank you both for your replies.
Barry, you reminded me that I had in fact meddled with the gff output, so what I was passing to the add_utr scripts wasn’t actually bona fide MAKER output. I had stripped the fasta from the gff file, for some reason. Running add_utr_start_stop_gff on the original MAKER gff worked without errors. My mistake, I apologise! Thank you for picking this up.
Carson, thanks for explaining the workaround. We can’t afford to pay for an upgrade right now, but will obtain the new version when our current license expires and we renew. I’m happy to hear MAKER now adds UTRs by default.
Cheers
Anar
On Sep 13, 2011, at 5:22 PM, Khan, Anar wrote:
From: maker-dev...@yandell-lab.org [mailto:maker-dev...@yandell-lab.org] On Behalf Of Carson Holt
To: Dinatale C; maker...@yandell-lab.org
Subject: Re: [maker-devel] usage of add_utr scripts
The add_utr_to_gff3 script should have been pulled out, and somehow was left in the 2.10 release. The add_utr_start_stop_codon_gff script changes IDs, which normally isn’t too big a problem because chado will also change them (IDs just need to be unique and are not necessarily informative).
If you need those to be unchanged, I’ll modify the script. It will be a few days though because I’m defending my thesis on Monday.
You can also use MAKER 2.11-beta. Named UTR is supposed to be optional in GFF3 format because it is always implicit anyway, but MAKER 2.11 now adds explicit UTR by default to better work with JBrowse. Just give it the other MAKER’s GFF3 file and turn all the pass options to 1. Don’t use other input files, including repeatmasker options.
Hi
I’d like to try to add UTR coordinates predicted by the ab initio gene finders, given I can’t derive them from est alignments due to a lack of EST data. I see raw output is available in “the void” folder. Before writing a rather convoluted script which attempts to match snap/augustus/fgenesh predictions with maker predictions, does a script already exist; or can anyone offer suggestions/advice?
Cheers
Anar
From: maker-dev...@yandell-lab.org
[mailto:maker-dev...@yandell-lab.org] On Behalf Of Khan, Anar
Sent: Thursday, 29 September 2011 2:28 p.m.
To: Carson Holt; Barry Moore
Cc: maker...@yandell-lab.org; Dinatale C
Subject: Re: [maker-devel] usage of add_utr scripts
Hi Carson
Thanks a lot for the explanation, all clear now.
Cheers
Anar
From: Carson Holt
[mailto:carso...@genetics.utah.edu]
Sent: Thursday, 29 September 2011 2:13 p.m.
To: Khan, Anar; Barry Moore
Cc: Dinatale C; maker...@yandell-lab.org
Subject: Re: [maker-devel] usage of add_utr scripts
ESTs that overlap the splice sites of the prediction are used to extend the gene model. You can think of it as a kind of cut and paste operation. Alt_est does not match the splice sites correctly (there are several technical reasons why). Also, since alt_est is aligned in protein space (tblastx), the UTR region may not line up: it is non-coding, so it is not under the same evolutionary constraints as the rest of the alignment.
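The "cut and paste" idea can be illustrated with a toy example. The sketch below is not MAKER's implementation; it is a simplified Python illustration under stated assumptions: plus strand only, exons given as (start, end) tuples, and a hypothetical function name `extend_with_est`. An EST exon that ends at the same donor site as the model's first exon contributes everything upstream of it, and an EST exon that starts at the same acceptor site as the model's last exon contributes everything downstream.

```python
def extend_with_est(model_exons, est_exons):
    """Extend a plus-strand gene model using an EST that shares a splice site.

    Both arguments are lists of (start, end) exon coordinates.
    The coding exons are left as they are; only the outer ends grow,
    and the added sequence is what would become UTR.
    """
    model = sorted(model_exons)
    est = sorted(est_exons)
    if len(model) < 2:
        return model  # a single-exon model has no splice site to anchor on
    first_start, first_end = model[0]
    last_start, last_end = model[-1]
    for i, (start, end) in enumerate(est):
        # Shared donor site on the first exon: paste in the upstream part.
        if end == first_end and start < first_start:
            model = est[:i] + [(start, first_end)] + model[1:]
            break
    for i, (start, end) in enumerate(est):
        # Shared acceptor site on the last exon: paste in the downstream part.
        if start == last_start and end > last_end:
            model = model[:-1] + [(last_start, end)] + est[i + 1:]
            break
    return model
```

With this framing it is easier to see why alt_est alignments add nothing: if their exon boundaries do not coincide exactly with the model's splice sites, neither condition is ever met and the model is returned unchanged.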