Count block size with only non-N's


michael fontaine

Apr 16, 2016, 8:06:06 AM
to MafFilter
Hi Everybody,

I have been trying to use MafFilter to describe the quality of my MAF alignment, which includes 10 reference genomes.
 
1- I begin with a crude description of the number of blocks, their size and their length. Easy.

2- Then, I wish to concatenate the blocks up to 20 kb, split them into windows of 2 kb, and calculate the number of sequences aligned in each window.

After concatenation and splitting, some windows can look like this:

Seq1    AT-GCCTTTGANA
Seq2    ATAGCCTTTGATA
Seq3    ATAGCCTTTGAAA
Seq4    NNNNNNNN-NN-N

If I use the BlockSize statistic, MafFilter always returns 10 (the total number of sequences in my alignment), even if some sequences consist only of N's in a given window. Is there a way to ask MafFilter to count only the sequences that contain at least one defined (non-N, non-gap) character? In other words, I want to avoid counting the sequences that contain only N's and gaps.
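As a minimal sketch of the count described above, written in plain Python rather than as a MafFilter option: it assumes a window is available as a mapping from sequence name to aligned string, and that `N`, `n` and `-` are the only characters to ignore. The names `UNINFORMATIVE`, `is_informative` and `informative_block_size` are hypothetical and are not part of MafFilter.

```python
from typing import Dict

# Assumption: unresolved bases (N/n) and gaps (-) carry no information.
UNINFORMATIVE = set("Nn-")


def is_informative(sequence: str) -> bool:
    """Return True if the sequence has at least one character that is neither N nor a gap."""
    return any(char not in UNINFORMATIVE for char in sequence)


def informative_block_size(window: Dict[str, str]) -> int:
    """Count the sequences in a window that are not made only of N's and gaps."""
    return sum(1 for sequence in window.values() if is_informative(sequence))


# The example window from this message.
window = {
    "Seq1": "AT-GCCTTTGANA",
    "Seq2": "ATAGCCTTTGATA",
    "Seq3": "ATAGCCTTTGAAA",
    "Seq4": "NNNNNNNN-NN-N",
}

print(informative_block_size(window))  # 3: Seq4 is ignored
```

On this window the count is 3 instead of 4, because Seq4 contains only N's and gaps. The same predicate could be used to drop such sequences from a window before computing other statistics.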

Thanks in advance
Michael


michael fontaine

Apr 16, 2016, 8:28:41 AM
to MafFilter
More simply speaking, can MafFilter remove empty sequences (gap-only and N-only) from a block?

Julien Yann Dutheil

Apr 16, 2016, 3:07:37 PM
to MafFilter
Hi Michael,

Actually no, we do not currently have a filter to remove unresolved or gap-only sequences from a block, but that would be rather straightforward to implement (and a useful addition). I can do that next week; would that be OK?

J.

michael fontaine

Apr 17, 2016, 9:25:20 AM
to MafFilter
This would be awesome!
Thanks!
Michael