Trouble with GFF3 file

Dan Sapozhnikov

Nov 2, 2016, 7:22:11 PM
to igv-help
Hi,

I am trying to load a GFF3 file along with my FASTA file and create a .genome file. The resulting file does not display the genes track.
I then tried to load it as a track, and IGV gave me an error saying it must first be indexed. Indexing failed because there was a start position of 0 (positions can only start from 1) and because the start positions were not in ascending order. I fixed both of these problems and IGV then built an index, but the track is still empty. Using this fixed file in the .genome file did not work either.

This is how my gff3 data looks (only scaffolds in the FASTA file):

##gff-version 3
C3619923 cflo_OGSv3.3 gene 2 100 44 + . ID=CFLO23628;Name=CFLO23628;

Please help me get this working!

Thanks :)

James Robinson

Nov 2, 2016, 10:27:14 PM
to igv-help
I'll need to see the GFF. If it has a position of zero it is not a properly formatted GFF file, but beyond that I need to see the file, as well as the fasta, to determine the issue. Could you send me links to these files?

The most common cause of this, if you want to look yourself, is a mismatch between the sequence names in the fasta and GFF files.
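
A minimal sketch of how one might check for such a mismatch from the command line. It assumes the GFF3 is tab-delimited with the sequence name in column 1, and that only the first whitespace-delimited word of each FASTA header counts as the name. The function names are made up for illustration.

```shell
# List sequence names in a FASTA file: the first word of each '>' header line.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | LC_ALL=C sort -u
}

# List sequence names in a GFF3 file: column 1 of every line that is
# neither a comment/directive ('#') nor blank.
gff_names() {
    awk -F '\t' '!/^#/ && NF > 0 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the names used in the GFF3 that have no exact match in the FASTA.
# Usage: gff_names_missing_from_fasta genome.fa annotations.gff3
gff_names_missing_from_fasta() {
    tmpdir=$(mktemp -d)
    fasta_names "$1" > "$tmpdir/fasta.txt"
    gff_names "$2" > "$tmpdir/gff.txt"
    # comm -13 keeps only the lines that are unique to the second file.
    LC_ALL=C comm -13 "$tmpdir/fasta.txt" "$tmpdir/gff.txt"
    rm -rf "$tmpdir"
}
```

For example, `gff_names_missing_from_fasta Cflo_3.3_scaffolds.fa genes.gff3` (where `genes.gff3` is a placeholder file name) prints nothing when every GFF3 sequence name is present in the FASTA, and otherwise lists the names that need renaming or an alias.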

Jim


--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/0feac9a2-bfe2-4cb1-9a8c-49d632bde3bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Sapozhnikov

Nov 2, 2016, 11:27:09 PM
to igv-help
Sure! I don't see an issue in the naming.




Dan Sapozhnikov

Nov 2, 2016, 11:29:59 PM
to igv-help
The original link I sent still has the 0 start positions, however.

James Robinson

Nov 2, 2016, 11:48:14 PM
to igv-help
Hi,

The problem is the sequence names, but probably deeper than that. The fasta has names like this:

gnl|Cflo_3.3|scaffold9

The gff has a single sequence:

cflo_OGSv3.3

These don't match. Normally you could create an alias file mapping the two names, but these sequences don't look equivalent.

Jim








Dan Sapozhnikov

Nov 3, 2016, 12:13:36 AM
to igv-help
Do you mean the second column?
OK, I was not aware that this had to match as well.
I believe that they must be equivalent. The scaffolds themselves do match, and they are indicated as the same (and only) genome version.
Could you help me make the sequence names match? It must be more than just changing the second column to, say, Cflo_3.3, right?

There is also this GFF3 file, which has different IDs for the genes, and again a different text in the second column:

Thanks again!



James Robinson

Nov 3, 2016, 12:27:25 AM
to igv-help
Hi, sorry, but there's very little chance that the positions in that GFF3 correspond to that fasta. You would need to contact the producers of this data to sort it out. It's not as simple as a name change.


Dan Sapozhnikov

Nov 3, 2016, 12:54:50 AM
to igv-help
Thank you. I have contacted their team.
I am confused when you say that the names don't match. I clearly see for instance scaffold9, the example you mentioned, in the gff3 file.
Message has been deleted

James Robinson

Nov 3, 2016, 12:58:23 AM
to igv-help
Oh wait, the sequence name is in the first column, my bad. Still, the names don't match; look at the fasta file. You should be able to fix it by renaming, though.

Apologies for the 1st vs 2nd column confusion on my part, but the problem and the fixes are the same. Try renaming the sequences in the fasta, or create an alias file as described in the user doc.


On Wed, Nov 2, 2016 at 9:55 PM, James Robinson <jrob...@broadinstitute.org> wrote:
In the second column?  The sequence name in a gff file is in the second column.


Jim Robinson

Nov 3, 2016, 1:06:06 AM
to igv-help
The alias file is described here: http://software.broadinstitute.org/software/igv/LoadData/#aliasfile. So, for example, the first line in the file for your case <might> be

gnl|Cflo_3.3|scaffold9 <tab> scaffold9


This is a guess, of course, but if the files came from the same source it is probably correct. It is best to confirm with the data producers that this is the case, however.


Jim

Dan Sapozhnikov

Nov 3, 2016, 1:06:58 AM
to igv-...@googlegroups.com

Great, I'm happy it's a simple fix. I was under the impression that only the last part had to match (e.g. for human you only need to write the chromosome number), but now it makes sense.

I'll give it a go in the morning. Thanks for clarifying.



James Robinson

Nov 3, 2016, 1:15:35 AM
to igv-help
No problem. I was a bit distracted by the Cubs. Attached is an alias.tab file you can use in the .genome definition. It was created as shown below. Verify it yourself, it's only an example.

grep '^>' Cflo_3.3_scaffolds.fa | sed 's/>//' > fastaNames.txt

cut -d '|' -f 3 fastaNames.txt > gffNames.txt

paste fastaNames.txt gffNames.txt > alias.tab
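
One way to do the "verify it yourself" step is to confirm that every sequence name used in the GFF3 appears as an alias in alias.tab. The sketch below assumes the two-column layout produced above (FASTA name, tab, GFF name) and a tab-delimited GFF3. The function name is hypothetical.

```shell
# Print GFF3 sequence names (column 1) that never appear as an alias
# (column 2) in a two-column, tab-separated alias file.
# Usage: unaliased_gff_names alias.tab annotations.gff3
unaliased_gff_names() {
    alias_file=$1
    gff_file=$2
    # The NR == FNR idiom below misbehaves on an empty first file, so bail out early.
    [ -s "$alias_file" ] || { echo "alias file is empty or missing" >&2; return 1; }
    awk -F '\t' '
        NR == FNR { alias[$2] = 1; next }   # first file: remember every alias
        /^#/ || NF == 0 { next }            # second file: skip comments and blank lines
        !($1 in alias) && !seen[$1]++ { print $1 }   # report each missing name once
    ' "$alias_file" "$gff_file"
}
```

Empty output means every GFF3 sequence name has an entry in alias.tab. Any name that is printed would still fail to match the FASTA.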





alias.tab.zip

Dan Sapozhnikov

Nov 3, 2016, 11:24:06 AM
to igv-help
Great, thank you for doing it for me! It works!
The managers of the database will be fixing the file on their end.
Interestingly, I can still load it into a .genome file without removing the 0's.

Thanks for your help.



James Robinson

Nov 3, 2016, 11:26:19 AM
to igv-help
The "0"s wouldn't stop the file from loading, but it's concerning since they shouldn't be there. It could mean that all your features are off by 1 base pair.
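
GFF3 coordinates are 1-based, so a start of 0 is out of range and may hint that the file was written with 0-based starts. A small sketch for listing the offending lines, assuming a tab-delimited GFF3 with the start position in column 4 (the function name is made up for illustration):

```shell
# Print every GFF3 feature line whose start (column 4) is below 1,
# prefixed with its line number in the file.
# Usage: gff_zero_starts annotations.gff3
gff_zero_starts() {
    awk -F '\t' '!/^#/ && NF >= 5 && $4 < 1 { print FNR ": " $0 }' "$1"
}
```

If only a handful of features are reported, the zeros may be isolated errors. Whether the whole file is shifted by one base is something only the data producers can confirm.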


Dan Sapozhnikov

Nov 3, 2016, 11:47:18 AM
to igv-...@googlegroups.com

Yes, exactly. The folks that published this data are getting to the bottom of that.

