Need more information or a manual for the sam2sgr program for bisulfite alignments

changjin
Aug 19, 2011, 1:34:18 PM
to gnumap-users
Hi, Nathan

I need more information about, or a manual for, the sam2sgr program for
bisulfite alignments.

I want to convert a SAM-format file generated by the Novocraft
bisulfite aligner (or any third-party bisulfite alignment program).

I tried two things.

1)
It seems that sam2sgr reads the header of the Novocraft SAM file to
look up the genome index file that was recorded there when the SAM
file was generated. However, someone else in my lab generated that
file, and I cannot access the index file, so I could not convert it
to sgr (gmp) format.

2)
I removed the header section. The Novocraft bisulfite aligner handles
alignments to both the positive (+) and negative (-) reference strands.
In my application, I need to separate them. After doing this, I ran
sam2sgr -g myref.fa -o novoPlus -G -3 -b novoPlus.sam
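The strand split described above can be done before running sam2sgr. Below is a minimal Python sketch, assuming the strand is taken from SAM FLAG bit 0x10 (SEQ reverse-complemented, i.e. the read aligned to the reverse strand). The helper name `split_by_strand` and the output names `novoPlus.sam`/`novoMinus.sam` are hypothetical. For bisulfite data the strand you care about (which converted reference strand a read came from) may not be the same thing as this bit, and Novoalign may record it differently, so treat this as an assumption to check against the aligner's documentation.

```python
"""Hypothetical helper: split SAM records into forward- and reverse-strand sets.

Assumption: strand is read from SAM FLAG bit 0x10, which may not match the
bisulfite-converted reference strand for every library type or aligner.
"""
import sys
from typing import Iterable, List, Tuple

UNMAPPED_FLAG = 0x4   # SAM spec: segment unmapped
REVERSE_FLAG = 0x10   # SAM spec: SEQ is reverse complemented


def split_by_strand(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (plus, minus) alignment lines, dropping '@' header lines."""
    plus: List[str] = []
    minus: List[str] = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        if flag & UNMAPPED_FLAG:
            continue  # unmapped read: no strand to assign
        (minus if flag & REVERSE_FLAG else plus).append(line)
    return plus, minus


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as sam:
        plus, minus = split_by_strand(sam)
    with open("novoPlus.sam", "w") as out:   # hypothetical output name
        out.writelines(plus)
    with open("novoMinus.sam", "w") as out:  # hypothetical output name
        out.writelines(minus)
```

Saved as, say, `split_strands.py` (a hypothetical name), it would be run as `python split_strands.py novo.sam`, and each output file could then be passed to sam2sgr separately.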

novoPlus.gmp shows something like this:
chr6 61250 1.00000 1.00000 0.00000 0.00000 0.00000 0.00000 N
chr6 61251 1.00000 0.00000 0.00000 0.00000 1.00000 0.00000 N:c->t p_val=7.28e-02
chr6 61252 3.00000 0.00000 0.00000 0.00000 3.00000 0.00000 N
chr6 61253 4.00000 4.00000 0.00000 0.00000 0.00000 0.00000 N
chr6 61254 6.00000 6.00000 0.00000 0.00000 0.00000 0.00000 N
chr6 61255 6.00000 6.00000 0.00000 0.00000 0.00000 0.00000 N
chr6 61256 6.00000 0.00000 0.00000 0.00000 6.00000 0.00000 N
chr6 61257 6.00000 6.00000 0.00000 0.00000 0.00000 0.00000 N
chr6 61258 6.00000 0.00000 0.00000 0.00000 6.00000 0.00000 N
chr6 61259 6.00000 0.00000 0.00000 0.00000 6.00000 0.00000 N
chr6 61260 6.00000 0.00000 0.00000 0.00000 6.00000 0.00000 Y:c->t p_val=1.11e-05

Can you help me interpret this result? In particular: What is the last
column? What do Y and N mean? What does p_val mean here? Is sam2sgr
aware of the orientation of the reference genome strand? This output is
somewhat different from a gmp file generated by a bisulfite-enabled
GNUMAP alignment, where only rows with non-zero values in the C or T
column appear.

Nathan Clement
Aug 19, 2011, 5:10:24 PM
to gnumap...@googlegroups.com
See below.

Nathan

The output for the .gmp file is as follows:
<CHR> <POSITION> <TOTAL_RECORDS> <AMT_A> <AMT_C> <AMT_G> <AMT_T> <AMT_N> <is_snp ? "Y:REF->SNP PVAL" : "N">
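As a minimal sketch of that column layout, the Python below parses one row into named fields. It assumes columns are whitespace-separated and that any p-value appears as a separate `p_val=` token on the same line (in the output quoted above it was wrapped onto the next line, most likely by the mail client). `GmpRow` and `parse_gmp_line` are hypothetical names, not part of GNUMAP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GmpRow:
    chrom: str
    pos: int
    total: float             # TOTAL_RECORDS
    a: float                 # AMT_A
    c: float                 # AMT_C
    g: float                 # AMT_G
    t: float                 # AMT_T
    n: float                 # AMT_N
    snp_field: str           # e.g. "N", "N:c->t", "Y:c->t"
    p_value: Optional[float] # None when no p_val token is printed


def parse_gmp_line(line: str) -> GmpRow:
    """Split one whitespace-delimited .gmp row into its documented columns."""
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    total, a, c, g, t, n = (float(x) for x in fields[2:8])
    snp_field = fields[8]
    p_value = None
    for extra in fields[9:]:
        if extra.startswith("p_val="):
            p_value = float(extra[len("p_val="):])
    return GmpRow(chrom, pos, total, a, c, g, t, n, snp_field, p_value)
```

For example, parsing the last row of the output above gives `pos=61260`, `t=6.0`, `snp_field="Y:c->t"` and `p_value=1.11e-05`.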

> In particular: What is the last column? What do Y and N mean? What does p_val mean here?

If GNUMAP has detected a SNP at this genomic location, it prints "Y", followed by the reference base, an arrow, and the SNP call. For a monoploid SNP there is only one character after the arrow: "Y:a->t". For a diploid SNP there are two, in this form: "Y:a->t/c". The p-value following the SNP call is the probability of calling a SNP at this location due to error in the sequencing process (or chance). If the p-value is higher than a threshold (default: 1e-3), the position is rejected as a SNP, and the result looks like "N:c->t p_val=7.28e-02".
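The call format and the threshold rule can be illustrated with a small Python sketch. The regular expression is inferred only from the examples in this reply ("Y:a->t", "Y:a->t/c", "N:c->t", "N"), and treating a p-value exactly equal to the threshold as passing is an assumption. `parse_call` and `passes_threshold` are hypothetical helpers, not part of GNUMAP.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed pattern, based only on the examples in this thread.
CALL_RE = re.compile(
    r"^(?P<flag>[YN])(?::(?P<ref>[A-Za-z])->(?P<alts>[A-Za-z](?:/[A-Za-z])?))?$"
)


class SnpCall(NamedTuple):
    reported: bool       # True for "Y", False for "N"
    ref: Optional[str]   # reference base, if a candidate was printed
    alts: List[str]      # one base (monoploid) or two (diploid)


def parse_call(field: str) -> SnpCall:
    """Parse the last .gmp column (without the p_val token)."""
    m = CALL_RE.match(field)
    if m is None:
        raise ValueError(f"unrecognised SNP field: {field!r}")
    alts = m.group("alts").split("/") if m.group("alts") else []
    return SnpCall(m.group("flag") == "Y", m.group("ref"), alts)


def passes_threshold(p_value: Optional[float], threshold: float = 1e-3) -> bool:
    """Keep a candidate only if its p-value is not above the threshold."""
    return p_value is not None and p_value <= threshold
```

With the default threshold, `p_val=1.11e-05` passes and `p_val=7.28e-02` does not, which matches the "Y" and "N" rows in the output above.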

> Is sam2sgr aware of the orientation of the reference genome strand?

Yes. This information is included in the SAM file.

> This output is somewhat different from a gmp file generated by a bisulfite-enabled GNUMAP alignment, where
> only rows with non-zero values in the C or T column appear.

Correct. The .gmp file prints both strands, so it prints at every non-zero position.
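If you want sam2sgr output that looks more like the bisulfite-enabled GNUMAP file, one option is to post-filter it. A minimal Python sketch, assuming the column order given above (AMT_C is the fifth column and AMT_T the seventh). `c_or_t_rows` is a hypothetical name, and this only mimics the row selection, not any other processing GNUMAP may do.

```python
from typing import Iterable, Iterator


def c_or_t_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield .gmp rows whose AMT_C or AMT_T column is non-zero."""
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        amt_c, amt_t = float(fields[4]), float(fields[6])
        if amt_c != 0.0 or amt_t != 0.0:
            yield line
```

Applied to the output above, rows such as 61250 and 61253 (only A counts) would be dropped, while 61251 and 61260 (T counts) would be kept.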
