Hi Fernando,
Cortex may not make a call in a particular region of the genome if:
1) It has low coverage (possibly caused by sequencing error)
- By aligning your genome to the cortex binaries, you can check if
there is sequence coverage in the graphs.
2) The genome region is too repetitive / similar to other regions of
the genome (low complexity)
- If you map your reads to the genome, they should have low mapping
scores where the genome has low complexity.
3) There is a high rate of variation between samples in the region
- This is the hardest to test for. If you dump supernodes and map
them to the genome, these regions will be covered by zero or many
supernodes -- although this may not be a very reliable test.
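A rough sketch of that third check, assuming you have already mapped the supernodes and pulled out their alignment intervals for one chromosome. The function name, window size and max_hits threshold are made up for illustration and are not part of cortex:

```python
def flag_windows(supernode_hits, chrom_length, window=1000, max_hits=3):
    """Flag windows covered by zero or by many supernode alignments.

    supernode_hits: list of (start, end) intervals, 0-based, half-open.
    window and max_hits are arbitrary illustrative values.
    Returns a list of (window_start, window_end, hit_count).
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for start, end in supernode_hits:
        first = start // window
        last = min((end - 1) // window, n_windows - 1)
        # Count the alignment once in every window it overlaps.
        for w in range(first, last + 1):
            counts[w] += 1
    flagged = []
    for w, count in enumerate(counts):
        if count == 0 or count > max_hits:
            w_start = w * window
            w_end = min(w_start + window, chrom_length)
            flagged.append((w_start, w_end, count))
    return flagged


if __name__ == "__main__":
    hits = [(0, 500), (100, 600), (200, 700), (300, 800), (2500, 2600)]
    for w_start, w_end, count in flag_windows(hits, chrom_length=3000):
        print(w_start, w_end, count)
```

Windows that come back flagged are only candidates for high variation, since repeats and low coverage can produce the same pattern.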
Another approach to finding homozygous regions of the genome may be to:
1) Use multiple calling approaches, taking the union set of all calls.
2) Map all your reads to the genome.
Homozygous regions of the genome would then be regions that have no
variants called nearby and have high read coverage and high read
mapping scores. This isn't something I've looked into so other people
may have better suggestions.
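As a minimal sketch of that idea, assuming you have per-base read depth and per-base mapping quality for one chromosome (however you extract them from your read mapping) plus the positions of the union call set. The function name and the thresholds (min_depth, min_mapq, flank) are placeholders I made up, so tune them for your data:

```python
def confident_hom_ref(depth, mapq, variant_positions,
                      min_depth=10, min_mapq=30, flank=50):
    """Return one boolean per base: True = looks like confident hom-ref.

    depth, mapq: per-base lists of equal length (0-based positions).
    variant_positions: 0-based positions of all called variants.
    All thresholds are illustrative assumptions.
    """
    n = len(depth)
    near_variant = [False] * n
    for pos in variant_positions:
        # Mask the variant itself and `flank` bases on either side.
        for i in range(max(0, pos - flank), min(n, pos + flank + 1)):
            near_variant[i] = True
    return [
        depth[i] >= min_depth and mapq[i] >= min_mapq and not near_variant[i]
        for i in range(n)
    ]


if __name__ == "__main__":
    depth = [2] + [20] * 9
    mapq = [60] * 9 + [5]
    mask = confident_hom_ref(depth, mapq, [5], flank=1)
    print("".join("H" if ok else "N" for ok in mask))
```

Bases marked False would be treated as not callable rather than as variant sites.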
Isaac
On 19 September 2012 11:51, Fer <fernando...@gmail.com> wrote:
> Hi Isaac,
>
> PSMC needs to know whether a base is a confident homozygous reference call or
> is not callable. Is there any way to determine this using cortex?
>
> I have 11 clean/unclean binaries for 20-26X samples and have called
> genotypes and variants (using cortex_var 5.0.3.1 without a reference
> genome).
>
> Cheers,
> Fernando