Hi Amir,
Answers below.
On 3/23/13 4:07 PM, Amir Nikooienejad wrote:
> 1. As you mentioned in answer to question 4, there are some loci in which some
> samples show many read for allele 'a', but one or two reads for 'b' but they are
> not that much to call that locus heterozygote. What is the threshold for that
> call? does it relate to 'chi square significance level' in your SNP model?
>
When originally calling sites as polymorphic or monomorphic, the SNP model uses
the chi square significance level to decide when a call is statistically
significant. This happens at each individual when processed by ustacks or
pstacks. The significance level can be made more stringent at this stage if you
like.
The automated corrections runs in the genotypes program, and it looks at calls
across the whole population. These command line options to genotypes control
these thresholds:
Automated corrections options:
--min_hom_seqs: minimum number of reads required at a stack to call a
homozygous genotype (default 5).
--min_het_seqs: below this minor allele frequency a stack is called a
homozygote, above it (but below --max_het_seqs) it is
called unknown (default 0.05).
--max_het_seqs: minimum frequency of minor allele to call a
heterozygote (default 0.1).
So, if a locus in an individual has less than 5 reads, then the automated
corrector will change its genotype to '-' instead of assuming it is a homozygote.
If there is evidence that a site is polymorphic, e.g. there is at least one read
representing the second allele at a site, then the software calculates the
frequency of the second allele. If the frequency is below 5%, the software does
not consider it to be enough evidence and calls a homozygote. If the minor
allele frequency is between 5 and 10%, it does not have enough evidence to call
a heterzygote, but it is not likely a homozygote, so it is set to unknown, '-'.
If there is the allele has a frequency of 10% or higher, it it called a het.
So, the SNP model is looking for roughly a 50% representation of the two alleles
at a polymorphic site, however, the genotypes program, because it knows the SNP
is definitely in a bunch of other individuals in the population, only requires a
10% frequency to call it.
Of course, you can adjust these as you like.
> 2. In both ./stacks/ output files and web interface, those two reads of allele
> 'b' are counted as 'a' and the counts are (instead of 25 and 2) 27 for allele
> 'a'. It seems the genotype program is aware of those little counts to call it
> '-'. Is there anyway for us to extract those small counts too just like the
> genotype program. (I know it is possible by looking at that Stacks' read
> sequences, but I mean are those 25 and 2 also printed somewhere instead of 27?)
>
Yes, you can use the
export_sql.pl program to get the raw read counts. Specify
the '-d' option.
Best,
julian
> On Fri, Mar 22, 2013 at 7:27 PM, Julian Catchen <
jcat...@uoregon.edu
> <mailto:
jcat...@uoregon.edu>> wrote:
>
> Hi Amir,
>
> Answers below.
>
>
> On 3/21/13 3:52 PM, Amir Nikooienejad wrote:
>
> I want to make an input file for MapMaker and as you suggested previously in
> this thread, I use F2 cross in rqtl. Now I have some questions regarding
> that.
>
> 1. Two sets of files are generated: *_genotypes_1 and *_genotypes_3.
> What is the
> difference between these two and which should be used?
>
>
> The "_1" and "_3" correspond to the minimum number of progeny required to
> print a marker. When you run the pipeline through
denovo_map.pl
> <
http://denovo_map.pl>, it will set this value to 1, giving your first
> assigned by
denovo_map.pl <
http://denovo_map.pl> as it reads each sample
> into the pipeline.
>