How to set an entropy threshold?

Karine Durand

Oct 23, 2018, 9:31:23 AM
to MafFilter
Dear members,

I would like to filter my alignment to remove positions with more than 2 alleles, but I do not know how to configure maffilter to do this.
As you said in a previous answer, entropy is calculated from the frequencies of each state. If I want only two states at each position, could I use max.ent?
How should I calculate the value of max.ent?

Here is my script :

maf.filter=EntropyFilter(            \
species=(xyl1199,xylco6c),           \
window.size=1,                       \
input.file.compression=none,         \
window.step=1,                       \
max.ent=1,                           \
max.pos=2,                           \
missing_as_gap=yes,                  \
ignore_gaps=yes,                     \
compression=none,                    \
file=$(DATA)_filtered)


I appreciate any help on this.
Best regards
Karine

Julien Y. Dutheil

Oct 23, 2018, 9:40:46 AM
to MafFilter
Dear Karine,

Very good point. I'm afraid this is currently not (easily) feasible. The entropy filter will not really help you here, because (1) there is no direct link between entropy and the number of alleles, and (2) entropy is computed per window and not per site (I have never tried setting the window size to 1 site, though).
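To illustrate point (1), here is a small Python sketch (not part of maffilter) that computes the Shannon entropy of a single alignment column. It assumes a base-2 logarithm and that gaps and unresolved characters are ignored; the thread does not say which conventions the EntropyFilter itself uses, so the absolute values are only indicative. The function names are made up for this example.

```python
import math
from collections import Counter

def shannon_entropy_bits(column):
    """Shannon entropy (base 2, an assumption) of one alignment column.

    Only A, C, G and T are counted; gaps and unresolved characters are ignored.
    """
    states = [c.upper() for c in column if c.upper() in "ACGT"]
    if not states:
        return 0.0
    total = len(states)
    counts = Counter(states)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def n_alleles(column):
    """Number of distinct nucleotide states observed in the column."""
    return len({c.upper() for c in column if c.upper() in "ACGT"})

# A bi-allelic site with balanced frequencies (50 A, 50 C).
balanced_biallelic = "A" * 50 + "C" * 50
# A tri-allelic site dominated by one allele (98 A, 1 C, 1 G).
skewed_triallelic = "A" * 98 + "C" + "G"

for name, col in [("bi-allelic", balanced_biallelic),
                  ("tri-allelic", skewed_triallelic)]:
    print(name, n_alleles(col), round(shannon_entropy_bits(col), 3))
```

Under these assumptions the balanced bi-allelic column has a higher entropy (1 bit) than the skewed tri-allelic one (about 0.16 bits), so no single entropy threshold can keep every site with at most two alleles while rejecting every site with three or more.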
Removing multi-allelic sites could be a good feature request, though. So far I have only done this type of filtering at a later step, for instance after exporting to VCF. The only current way to do something equivalent with maffilter would be to first compute all allele frequencies (SequenceStatistics with BlockCounts, in 1 nt windows), then process the (huge) result table to identify multi-allelic positions, list them in a BedGraph file, and finally run a second pass of maffilter to filter out these positions (using FeatureFilter).
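The intermediate step (going from the per-site counts table to a BedGraph file) could look roughly like the Python sketch below. The column names `Chr`, `Start`, `Stop`, `A`, `C`, `G` and `T` are hypothetical placeholders, as is the assumption that coordinates can be copied unchanged; check the header and coordinate convention of the table that maffilter actually writes and adapt accordingly.

```python
import csv
import io

BASES = ("A", "C", "G", "T")  # hypothetical names of the count columns

def multiallelic_sites(table, max_alleles=2):
    """Yield (chrom, start, stop, n_alleles) for sites with too many alleles.

    `table` is any iterable of tab-separated lines with a header row,
    one row per 1 nt window.
    """
    reader = csv.DictReader(table, delimiter="\t")
    for row in reader:
        # An allele is "present" if its count is strictly positive.
        n = sum(1 for base in BASES if int(row[base]) > 0)
        if n > max_alleles:
            yield row["Chr"], int(row["Start"]), int(row["Stop"]), n

def to_bedgraph(sites):
    """Format sites as four-column BedGraph lines (chrom, start, end, value)."""
    return "".join(f"{chrom}\t{start}\t{stop}\t{n}\n"
                   for chrom, start, stop, n in sites)

# Tiny made-up table: one monomorphic, one bi-allelic, one tri-allelic site.
example = io.StringIO(
    "Chr\tStart\tStop\tA\tC\tG\tT\n"
    "chr1\t0\t1\t10\t0\t0\t0\n"
    "chr1\t1\t2\t5\t5\t0\t0\n"
    "chr1\t2\t3\t8\t1\t1\t0\n"
)
bedgraph = to_bedgraph(multiallelic_sites(example))
print(bedgraph, end="")
```

For a real run, an open file handle on the statistics table would replace the in-memory example, and the resulting text would be written to the file that the second maffilter pass reads. Because the table is streamed row by row, its size should not be a memory problem.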

Best regards,

Julien.