Order of filtering

Nigus Belay

Apr 7, 2026, 3:14:40 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi, is it possible to filter SNP data by individuals first and then by loci?


Thanks

Nigus

Brandon Monier

Apr 8, 2026, 11:22:21 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Nigus,

Yes, this is possible. TASSEL provides two separate filter plugins:
  • FilterTaxaBuilderPlugin - filters individuals/taxa (by name list, missing data proportion, heterozygosity, etc.)
  • FilterSiteBuilderPlugin - filters loci/sites (by position, MAF, site count, heterozygosity, indels, etc.)
In the GUI, apply Filter Genotype Table Taxa first to filter individuals, then apply Filter Genotype Table Sites to the resulting dataset to filter loci. In the CLI pipeline, place the taxa filter plugin and its arguments before the site filter plugin and its arguments, so that the taxa-filtered output feeds the site filter. Note that filtering individuals first means site-level statistics (such as MAF) are calculated from the retained individuals only, which is often the desired behavior. Here's a CLI example:

./run_pipeline.pl -Xmx4g \
    -fork1 \
        -h genotypes.hmp.txt \
        -FilterTaxaBuilderPlugin \
            -minNotMissing 0.8 \
            -minHeterozygous 0.0 \
            -maxHeterozygous 0.05 \
        -endPlugin \
        -FilterSiteBuilderPlugin \
            -siteMinAlleleFreq 0.05 \
            -siteMaxAlleleFreq 0.95 \
            -siteMinCount 100 \
            -maxHeterozygous 0.9 \
            -removeSitesWithIndels true \
        -endPlugin \
        -export filtered_output.hmp.txt \
    -runfork1
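
To make the point about recalculated site statistics concrete, here is a small Python sketch. It is not TASSEL code: the toy genotypes, the 0/1/2 allele-count coding, and the helper names taxon_call_rate and site_maf are all made up for illustration. It only shows how removing a poorly genotyped individual can change the MAF at a site before the site filter ever sees it.

```python
# Toy illustration only; none of this is TASSEL's API.
# Genotypes are coded as the count of the alternate allele (0, 1, 2),
# with None standing in for a missing call.

def taxon_call_rate(calls):
    """Proportion of sites with a non-missing call for one individual."""
    return sum(c is not None for c in calls) / len(calls)

def site_maf(genotypes, site):
    """Minor allele frequency at one site over the given individuals."""
    calls = [g[site] for g in genotypes.values() if g[site] is not None]
    if not calls:
        return None  # no data at this site
    alt_freq = sum(calls) / (2 * len(calls))  # diploid: 2 alleles per call
    return min(alt_freq, 1 - alt_freq)

# Hypothetical data: four individuals, four sites.
genotypes = {
    "taxon_A": [0, 0, 1, 0],
    "taxon_B": [0, 2, 0, 0],
    "taxon_C": [0, 0, 0, 1],
    "taxon_D": [2, None, None, None],  # poorly genotyped individual
}

# MAF at the first site using every individual.
maf_before = site_maf(genotypes, 0)

# Taxa filter first: keep individuals with at least 80% of sites called
# (the same idea as the 0.8 threshold in the CLI example).
kept = {name: calls for name, calls in genotypes.items()
        if taxon_call_rate(calls) >= 0.8}

# MAF at the same site, now computed from the retained individuals only.
maf_after = site_maf(kept, 0)

print(maf_before)  # 0.25
print(maf_after)   # 0.0
```

In this toy data the first site would pass a 0.05 MAF cutoff when all four individuals are counted, but it is monomorphic once taxon_D is dropped, so a site filter applied after the taxa filter would remove it.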