Nathan
The output for the .gmp file is as follows:
<CHR> <POSITION> <TOTAL_RECORDS> <AMT_A> <AMT_C> <AMT_G> <AMT_T> <AMT_N> <is_snp ? "Y:REF->SNP PVAL" : "N">
> Especially, what is the last column? What do Y and N mean? What does p_val mean here?
If GNUMAP has detected a SNP at this genomic location, it prints "Y", followed by the reference character and an arrow pointing to the SNP call. If this is a monoploid SNP, there will be only one character after the arrow: "Y:a->t". If it's diploid, two characters will be present, separated by a slash: "Y:a->t/c". The p-value following the SNP call is the probability that a SNP would be called at this location because of error in the sequencing process (or chance). So, if the p-value is higher than a threshold (default: 1e-3), the position is rejected as a SNP, and the result will be something like: "N:c->t p_val=7.28e-02".
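To make the column layout concrete, here is a minimal Python sketch of reading one .gmp line. It is not part of GNUMAP; the function name parse_gmp_line and the returned dictionary keys are hypothetical. It assumes the columns are whitespace-separated, that the amounts may be fractional, and that the p-value appears either bare or as "p_val=<number>" after the call, as in the examples above.

```python
import re

# Assumed shape of the last column, based on the examples in this thread:
#   "N", "Y:a->t", or "Y:a->t/c", optionally followed by a p-value that is
#   written either bare or as "p_val=<number>".
CALL_RE = re.compile(
    r"^(?P<flag>[YN])"
    r"(?::(?P<ref>\w)->(?P<alleles>\w(?:/\w)?))?"
    r"(?:\s+(?:p_val=)?(?P<pval>\S+))?$"
)

def parse_gmp_line(line):
    """Split one .gmp line into its fields (hypothetical helper)."""
    # The last column may contain a space, so split at most 8 times.
    fields = line.strip().split(None, 8)
    if len(fields) < 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))

    match = CALL_RE.match(fields[8].strip())
    if match is None:
        raise ValueError("unrecognised SNP column: %r" % fields[8])

    alleles = match.group("alleles")
    pval = match.group("pval")
    return {
        "chr": fields[0],
        "position": int(fields[1]),
        "total_records": float(fields[2]),
        # AMT_A, AMT_C, AMT_G, AMT_T, AMT_N in that order
        "amounts": dict(zip("ACGTN", (float(x) for x in fields[3:8]))),
        "is_snp": match.group("flag") == "Y",
        "ref": match.group("ref"),
        # one allele for a monoploid call, two for a diploid call
        "alleles": alleles.split("/") if alleles else [],
        "p_val": float(pval) if pval else None,
    }

# Made-up example line; only the last column is taken from the text above.
record = parse_gmp_line("chr1 1042 20 0 2 0 18 0 N:c->t p_val=7.28e-02")
print(record["is_snp"], record["ref"], record["alleles"], record["p_val"])
```

Keeping only the rows where is_snp is true gives the positions GNUMAP accepted as SNPs under its threshold.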
> Is sam2sgr aware of the orientation of a ref genome strand?
Yes. This information is included in the SAM file.
> This is somewhat different from a gmp file generated by bisulfite-enabled gnumap alignment where
> only rows having non-zero values in the column C or T appear.
Correct. The .gmp file prints both strands, so it prints at every non-zero position.