What are columns in BLAST output?

Stephen Turner

Nov 15, 2012, 7:57:53 AM
to metaphl...@googlegroups.com
I'm saving the intermediate BLAST output (pasted below for an example). The documentation refers to outfmt 6 format, but after Googling around I couldn't find any documentation on the web consistent with this output format. There are 12 output columns - the first two are obviously my read name and the internal clade ID, and some of the others presumably correspond to score, max score, e-value, etc. Could someone please clarify what all 12 columns here are?

1. read name
2. internal clade ID
3. ?score
4. ?score
5.
6.
7.
8.
9.
10.
11.
12.

read.4 100366371 98.53 68 1 0 1 68 301 368 2e-27 122
read.24 100365510 100.00 32 0 0 1 32 973 942 1e-08 60.2
read.31 100027883 100.00 61 0 0 1 61 205 145 1e-24 113
read.31 100171847 98.31 59 1 0 3 61 269 211 6e-22 104
read.36 100366313 98.39 62 1 0 14 75 1 62 1e-23 110
read.44 100365434 97.33 75 2 0 1 75 183 257 4e-29 128
read.50 100365288 98.67 75 1 0 1 75 408 334 8e-31 134
read.53 100366269 100.00 62 0 0 1 62 589 650 3e-25 115
read.57 100366273 98.48 66 0 1 1 66 310 246 3e-25 115
read.72 100366324 100.00 75 0 0 1 75 226 152 2e-32 139

Also trying to map back from the clade id (e.g. 100366371) to the clade name. What's the best way to do this?

Thanks.

Nicola Segata

Nov 15, 2012, 11:55:22 AM
to metaphl...@googlegroups.com
Hi Stephen,
you can obtain the description of the 12 columns with "blastn -help". It's a rather long description; the important part is the list of the twelve columns:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
with their legend
            qseqid means Query Seq-id
            sseqid means Subject Seq-id
            qstart means Start of alignment in query
              qend means End of alignment in query
            sstart means Start of alignment in subject
              send means End of alignment in subject
            evalue means Expect value
          bitscore means Bit score
            length means Alignment length
            pident means Percentage of identical matches
          mismatch means Number of mismatches
           gapopen means Number of gap openings
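
For anyone who wants to work with these columns programmatically, here is a minimal Python sketch that attaches the twelve default outfmt 6 names to each row. The function name `parse_outfmt6` and the int/float groupings are my own choices for illustration, not part of BLAST or MetaPhlAn. It splits on any whitespace, on the assumption that sequence IDs contain no spaces (BLAST itself writes tab-separated fields).

```python
# Default column names for BLAST+ tabular output (-outfmt 6), in order.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Illustrative type groupings (an assumption of this sketch).
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(lines):
    """Yield one dict per hit, keyed by the default outfmt 6 column names."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) != len(OUTFMT6_COLUMNS):
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        hit = dict(zip(OUTFMT6_COLUMNS, fields))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        yield hit


# Two rows copied from the example output earlier in this thread.
example = [
    "read.4 100366371 98.53 68 1 0 1 68 301 368 2e-27 122",
    "read.24 100365510 100.00 32 0 0 1 32 973 942 1e-08 60.2",
]
hits = list(parse_outfmt6(example))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["evalue"], hit["bitscore"])
```

In the second example row, sstart (973) is larger than send (942), which in blastn tabular output indicates a hit on the reverse strand of the subject.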

I hope this helps,
thanks
Nicola
 

EthanK Gough

Dec 5, 2013, 2:17:59 PM
to metaphl...@googlegroups.com
Hi Nicola

Does the output from using bowtie2 also contain information regarding alignment quality? I've read the bowtie2 documentation online, and it normally produces a SAM file with some alignment quality information. Is this output saved by metaphlan? If so, how can I access it?

Thanks!
E

Nicola Segata

Dec 8, 2013, 2:58:37 AM
to EthanK Gough, metaphl...@googlegroups.com
Hi Ethan,
BowTie2 does report the alignment quality, but we are currently not
saving the alignment scores in the MetaPhlAn "--bowtie2out" file, both
for consistency with the "--blastout" file and for space reasons.
However, you can definitely work with the alignment quality
information using the following procedure:
1. apply BowTie2 externally to MetaPhlAn (against the MetaPhlAn DB)
2. process the generated SAM/BAM with samtools as needed
3. obtain the reads to marker mapping ( cat file.sam | cut -f 1,3 |
grep -v "*" > reads2markers.txt )
4. provide the resulting reads-marker mapping (reads2markers.txt) to
metaphlan (--input_type bowtie2out)
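
Step 3 can also be sketched in Python, which makes it easy to filter on alignment quality at the same time. This is only an illustration: the function name `reads_to_markers` and the `min_mapq` parameter are hypothetical, and the SAM lines below are made up for the example. It relies on the standard SAM column layout (read name in column 1, reference name in column 3, MAPQ in column 5). Unlike the shell one-liner, it skips header lines explicitly and only checks the reference column for "*".

```python
def reads_to_markers(sam_lines, min_mapq=0):
    """Yield (read, marker) pairs for aligned reads in SAM-formatted lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete alignment record
        read, marker, mapq = fields[0], fields[2], int(fields[4])
        if marker == "*":
            continue  # unaligned read
        if mapq < min_mapq:
            continue  # below the requested mapping quality
        yield read, marker


# Made-up SAM records, for illustration only.
sam = [
    "@SQ\tSN:100366371\tLN:500\n",
    "read.4\t0\t100366371\t301\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read.5\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "read.6\t16\t100365510\t942\t1\t4M\t*\t0\t0\tACGT\tIIII\n",
]
all_pairs = list(reads_to_markers(sam))
confident_pairs = list(reads_to_markers(sam, min_mapq=10))

# Write the two-column, tab-separated mapping.
for read, marker in all_pairs:
    print(f"{read}\t{marker}")
```

With `min_mapq=0` this should produce the same kind of two-column mapping as the cut/grep command; raising the threshold keeps only reads whose MAPQ is at least that value.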

I hope this helps,
many thanks
Nicola