Visualizing blast searches in IGV

Bolger

Sep 30, 2011, 7:57:57 AM
to igv-help
We are looking for genes in the genome of a de novo sequenced organism
(no reference genomes or anything available) and regularly do blast
searches for specific genes. I have made a simple script to convert
the tabular blast output to GFF3-format, which IGV reads quite
politely, but I can't manage to display more information than the name
of the blast hit. It would be great to be able to see the e-value and
bitscore too in IGV through a pop-up window or in some other way.

My question is: how do I include this information in the GFF-formatted
file so that IGV can read and display it? I guess it needs to be done
in column 9, "attributes", but I can't figure out which tags to
use.

If I am approaching this problem from the wrong angle, please feel free
to suggest other options.

/Henrik

Jim Robinson

Sep 30, 2011, 8:02:49 AM
to igv-...@googlegroups.com
Hi Henrik,

You can add arbitrary attributes to column 9, and these should be displayed in the popup text. Also, are you aware that IGV can read PSL files directly? If you would like to post an example line of your blast output file and the converted GFF3, I can possibly advise further.

I'm assuming you are using IGV 2.0.

best,

Jim

Henrik

Sep 30, 2011, 9:42:36 AM
to igv-help
Hi Jim

Thanks for the quick reply. I wasn't aware that IGV could read .psl
files, but I just tried importing a blat .psl search and it worked
fine. I think I still might want to import the blast-results in
addition though, and will add the needed info to column 9 as you say.
I guess I could convert the blast output to .psl too, but haven't
looked closely at the .psl format so far, will do so shortly.

Anyway, just in case you or anyone else has a good suggestion, this is
what my blast output looks like:

ctg220009120605 AN7090 38.32 167 98 4 24701 24216 89 253 2e-24 108
ctg220009120651 AN7084 28.22 241 170 4 257258 257971 1695 1924 2e-17 89.4
ctg220009120651 AN6791 27.83 230 163 3 257291 257971 1868 2089 2e-15 82.8

And this is how my converted .gff3 file looks (before adding the new
info to column 9):

#GFF version 3
ctg220009120605 BLAST match_part 24216 24701 2e-24 - . Name=AN7090
ctg220009120651 BLAST match_part 257258 257971 2e-17 + . Name=AN7084
ctg220009120651 BLAST match_part 257291 257971 2e-15 + . Name=AN6791

Any suggestions are very welcome,
Henrik
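
For readers following along, here is a minimal sketch of a converter that puts the e-value and bitscore into column 9, as suggested above. It is not Henrik's actual script. It assumes the 12-column tabular layout shown in the example (contig in column 1, hit name in column 2, contig coordinates in columns 7 and 8, e-value in column 11, bitscore in column 12). The function name blast_to_gff3 and the attribute keys evalue and bitscore are illustrative choices, not tags that IGV requires.

```python
def blast_to_gff3(lines, source="BLAST", feature_type="match_part"):
    """Convert 12-column tabular BLAST lines to GFF3 lines.

    Assumption: column 1 is the contig and columns 7-8 are its coordinates,
    as in the example posted in this thread.
    """
    out = ["##gff-version 3"]  # version pragma as written in the GFF3 spec
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split()
        if len(f) < 12:
            continue  # not a complete hit line
        contig, hit = f[0], f[1]
        qstart, qend = int(f[6]), int(f[7])
        evalue, bitscore = f[10], f[11]
        # GFF3 needs start <= end; reversed coordinates mean the minus strand.
        start, end = min(qstart, qend), max(qstart, qend)
        strand = "+" if qstart <= qend else "-"
        # Extra key=value pairs in column 9 (illustrative attribute names).
        attributes = "Name=%s;evalue=%s;bitscore=%s" % (hit, evalue, bitscore)
        out.append("\t".join([contig, source, feature_type, str(start),
                              str(end), evalue, strand, ".", attributes]))
    return out


if __name__ == "__main__":
    example = [
        "ctg220009120605 AN7090 38.32 167 98 4 24701 24216 89 253 2e-24 108",
        "ctg220009120651 AN7084 28.22 241 170 4 257258 257971 1695 1924 2e-17 89.4",
    ]
    print("\n".join(blast_to_gff3(example)))
```

The e-value stays in the score column, as in the converted file above, and is repeated as an attribute so that it appears alongside the bitscore in the popup text.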

Jim Robinson

Oct 1, 2011, 10:46:45 PM
to igv-...@googlegroups.com
Hi, 

I suggested PSL because it's a common output of blat, so I thought it might be an option with blast as well. Otherwise I would not have suggested it; it's rather complex compared to GFF. You could also use BED format and shade the features by score by setting useScore=1 on a track line. Strictly speaking, BED has no equivalent of column 9, but you can use the BED "name" field for GFF3-style attributes (column 9) by adding gffTags=on to the track line. This is an IGV extension, but the file is still valid BED, as there is no restriction on the contents of the "name" field.

Jim
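
The BED route described above can be sketched in the same way. This is an illustration under assumptions rather than a tested recipe: it assumes the same 12-column tabular input as the example earlier in the thread, the function name blast_to_bed is made up, and using the rounded bitscore (clamped to the 0-1000 range that BED allows) as the shading score is only one possible choice. BED starts are 0-based, so 1 is subtracted from the BLAST start.

```python
def blast_to_bed(lines, track_name="BLAST hits"):
    """Convert 12-column tabular BLAST lines to BED lines for IGV.

    The track line uses useScore=1 and gffTags=on, as described in the
    message above; the name column carries GFF3-style key=value pairs.
    """
    out = ['track name="%s" useScore=1 gffTags=on' % track_name]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split()
        if len(f) < 12:
            continue  # not a complete hit line
        qstart, qend = int(f[6]), int(f[7])
        # BED is 0-based, half-open: subtract 1 from the 1-based start.
        start = min(qstart, qend) - 1
        end = max(qstart, qend)
        strand = "+" if qstart <= qend else "-"
        # Assumption: shade by bitscore, clamped to BED's 0-1000 score range.
        score = max(0, min(1000, int(round(float(f[11])))))
        # No spaces in the name field, so whitespace-delimited parsing is safe.
        name = "Name=%s;evalue=%s;bitscore=%s" % (f[1], f[10], f[11])
        out.append("\t".join([f[0], str(start), str(end), name,
                              str(score), strand]))
    return out


if __name__ == "__main__":
    example = [
        "ctg220009120605 AN7090 38.32 167 98 4 24701 24216 89 253 2e-24 108",
    ]
    print("\n".join(blast_to_bed(example)))
```

Bitscores above 1000 all get the same shade under this mapping, so a different scaling may suit data with very strong hits.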

Malcolm Cook

Aug 27, 2016, 12:47:18 PM
to igv-help
Hi

I'm returning to this old thread as I am today similarly hoping to display blast alignment results that were given to me coded as gff3, with ID= and Parent= column 9 attributes that, to my eye, correctly link features of type "match" with subordinate "match_part" features (blast's individual High-scoring Segment Pairs, or HSPs). I am hoping there is a way to get IGV to join the "match_part" (HSP) features with a line when they share the same Parent "match", much like exons are joined with a line when they have the same parent mRNA.

By the way, the gff3-encoded blast results are output from a run of the Maker gene annotation pipeline.

I can provide an example if that would help, but perhaps you already have a position on this, like "you should not want to connect the HSPs with a line - it is misleading to display blast results this way", or "it already does work, you must be doing something wrong".

Thanks,

Malcolm

Jim Robinson

Aug 27, 2016, 6:15:41 PM
to igv-...@googlegroups.com
Hi,

I looked up these terms in the Sequence Ontology (SO), and as they are
defined there with a "part-of" relation I will include them as
recognized terms, but I don't have an ETA. To increase the odds (i.e.
make them > 0) I would suggest opening a git issue with a small example
file attached.

Parent-child relations are undefined in GFF; they could mean
anything. Sometimes, for example, they mean "kind of". The Sequence
Ontology helps by defining terms and relationships. It so happens that
match and match_part are defined there, so you are in luck.

In the meantime, if you want to view the files, make a copy of them
and replace the terms with exon and transcript.
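
The workaround above could be scripted roughly as follows. This is a sketch under assumptions: the mapping of match to transcript and match_part to exon is inferred from the parent/child roles described in this thread, the function name rename_feature_types is made up, and the sample lines are invented for illustration rather than taken from real Maker output. Only column 3 (the feature type) is changed, so the ID= and Parent= attributes stay intact.

```python
# Assumed mapping: the parent "match" plays the role of a transcript and
# each "match_part" (HSP) plays the role of an exon.
TYPE_MAP = {"match": "transcript", "match_part": "exon"}


def rename_feature_types(lines, type_map=TYPE_MAP):
    """Return GFF3 lines with column 3 renamed according to type_map."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            out.append(line)  # keep pragmas, comments and blank lines as-is
            continue
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] in type_map:
            cols[2] = type_map[cols[2]]
        out.append("\t".join(cols))
    return out


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        "##gff-version 3",
        "ctg1\tblastx\tmatch\t100\t900\t.\t+\t.\tID=m1;Name=AN7090",
        "ctg1\tblastx\tmatch_part\t100\t300\t.\t+\t.\tID=m1.1;Parent=m1",
        "ctg1\tblastx\tmatch_part\t600\t900\t.\t+\t.\tID=m1.2;Parent=m1",
    ]
    print("\n".join(rename_feature_types(sample)))
```

Writing the result to a separate file keeps the original annotation untouched, so the renamed copy is used only for viewing.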