filtering on command line


Jeff Ross-Ibarra

Feb 21, 2016, 1:02:58 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Apologies if I've missed this somewhere, but I want to run filtering on the command line. In the GUI, I filter sites on min count and then taxa on min proportion of sites present. I'd like to do this on the command line but can't seem to figure out which of

net.maizegenetics.analysis.filter.FilterAlignmentPlugin
net.maizegenetics.analysis.filter.FilterDataSetPlugin
net.maizegenetics.analysis.filter.FilterSiteBuilderPlugin
net.maizegenetics.analysis.filter.FilterSiteNamePlugin
net.maizegenetics.analysis.filter.FilterSubsetPlugin
net.maizegenetics.analysis.filter.FilterTaxaAlignmentPlugin
net.maizegenetics.analysis.filter.FilterTaxaPropertiesPlugin
net.maizegenetics.analysis.filter.FilterTraitsPlugin

I need to run and what the command line options are.  Did I miss this somewhere obvious?

Thanks!

-Jeff

Terry Casstevens

Feb 21, 2016, 9:21:48 AM
to Tassel User Group
Hi Jeff,

FilterSiteBuilderPlugin is the preferred way to filter sites now.
There will be a FilterTaxaBuilderPlugin in the future also. But for
now, FilterTaxaPropertiesPlugin can be used.

./run_pipeline.pl -importGuess mdp_genotype.hmp.txt \
  -FilterSiteBuilderPlugin -siteMinCount 10 -endPlugin \
  -FilterTaxaPropertiesPlugin -minNotMissing 0.1 -endPlugin \
  -export temp -exportType Hapmap

Best,

Terry
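For batch runs it can help to keep the two thresholds from the command above in variables. The sketch below is a plain POSIX shell wrapper around that same command; the variable names INPUT, OUTPUT, SITE_MIN_COUNT, MIN_NOT_MISSING and DRY_RUN are hypothetical, and it assumes run_pipeline.pl sits in the current directory. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch: the two-step filter above with its thresholds pulled out into variables.
# INPUT, OUTPUT, SITE_MIN_COUNT, MIN_NOT_MISSING and DRY_RUN are hypothetical names.
set -eu

INPUT=${INPUT:-mdp_genotype.hmp.txt}
OUTPUT=${OUTPUT:-temp}
SITE_MIN_COUNT=${SITE_MIN_COUNT:-10}     # passed to -siteMinCount
MIN_NOT_MISSING=${MIN_NOT_MISSING:-0.1}  # passed to -minNotMissing, a proportion in [0.0, 1.0]

# Collect the pipeline arguments as positional parameters so quoted values stay intact.
set -- -importGuess "$INPUT" \
  -FilterSiteBuilderPlugin -siteMinCount "$SITE_MIN_COUNT" -endPlugin \
  -FilterTaxaPropertiesPlugin -minNotMissing "$MIN_NOT_MISSING" -endPlugin \
  -export "$OUTPUT" -exportType Hapmap

if [ "${DRY_RUN:-1}" = 1 ]; then
  # Dry run: show the command instead of running it.
  echo ./run_pipeline.pl "$@"
else
  ./run_pipeline.pl "$@"
fi
```

Saved as, say, filter.sh (a hypothetical name), `DRY_RUN=0 SITE_MIN_COUNT=20 sh filter.sh` would run the pipeline with a stricter site filter.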



./run_pipeline.pl -FilterSiteBuilderPlugin -help

./lib/ahocorasick-0.2.4.jar:./lib/batik-awt-util.jar:./lib/batik-css.jar:./lib/batik-dom.jar:./lib/batik-ext.jar:./lib/batik-gui-util.jar:./lib/batik-gvt.jar:./lib/batik-parser.jar:./lib/batik-svg-dom.jar:./lib/batik-svggen.jar:./lib/batik-util.jar:./lib/batik-xml.jar:./lib/biojava-alignment-4.0.0.jar:./lib/biojava-core-4.0.0.jar:./lib/biojava-phylo-4.0.0.jar:./lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:./lib/colt.jar:./lib/commons-codec-1.10.jar:./lib/commons-math3-3.4.1.jar:./lib/ejml-0.23.jar:./lib/forester_1034.jar:./lib/geoip2-0.9.0.jar:./lib/geronimo-spec-activation-1.0.2-rc4.jar:./lib/gson-2.3.1.jar:./lib/guava-14.0.1.jar:./lib/htsjdk-1.138.jar:./lib/itextpdf-5.1.0.jar:./lib/jackson-annotations-2.4.0.jar:./lib/jackson-core-2.4.2.jar:./lib/jackson-databind-2.4.2.jar:./lib/jargon-core-3.3.2-beta2.jar:./lib/javax.json-1.0.4.jar:./lib/javax.json-api-1.0.jar:./lib/jcommon-1.0.6.jar:./lib/je-6.0.11.jar:./lib/jfreechart-1.0.3.jar:./lib/jfxrt.jar:./lib/json-org.jar:./lib/json-simple-1.1.1.jar:./lib/junit-4.10.jar:./lib/log4j-1.2.13.jar:./lib/mail-1.4.jar:./lib/maxmind-db-0.3.4.jar:./lib/poi-3.0.1-FINAL-20070705.jar:./lib/postgresql-9.4-1201.jdbc41.jar:./lib/slf4j-api-1.7.10.jar:./lib/slf4j-simple-1.7.10.jar:./lib/snappy-java-1.1.1.6.jar:./lib/sqlite-jdbc-3.8.5-pre1.jar:./lib/trove-3.0.3.jar:./lib/xercesImpl.jar:./lib/xml.jar:./lib/xmlParserAPIs.jar:./dist/sTASSEL.jar

Memory Settings: -Xms512m -Xmx1536m

Tassel Pipeline Arguments: -FilterSiteBuilderPlugin -help

[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.20 Date: February 4, 2016
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 1365 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_40
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Mac OS X
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 8
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -FilterSiteBuilderPlugin, -help, -runfork1]
[main] INFO net.maizegenetics.plugindef.AbstractPlugin - setParameters: for: net.maizegenetics.analysis.filter.FilterSiteBuilderPlugin Starting: Feb 21, 2016 9:13:39
[main] ERROR net.maizegenetics.plugindef.AbstractPlugin - Unrecognized argument: -help

[main] INFO net.maizegenetics.plugindef.AbstractPlugin -

Usage:

FilterSiteBuilderPlugin <options>

-filterName <Filter Name> : Filter Name (Default: Filter)
-siteMinCount <Site Min Count> : Site Minimum Count of Alleles not Unknown [0‥+∞) (Default: 0)
-siteMinAlleleFreq <Site Min Allele Freq> : Site Minimum Minor Allele Frequency [0.0‥1.0] (Default: 0.0)
-siteMaxAlleleFreq <Site Max Allele Freq> : Site Maximum Minor Allele Frequency [0.0‥1.0] (Default: 1.0)
-removeMinorSNPStates <true | false> : Remove Minor S N P States (Default: false)
-siteRangeFilterType <Site Range Filter Type> : True if filtering by site numbers. False if filtering by chromosome and position (Default: NONE)
-startSite <Start Site> : Start Site [0‥+∞) (Default: 0)
-endSite <End Site> : End Site [0‥+∞) (Default: 0)
-startChr <Start Chr> : Start Chr
-startPos <Start Pos> : Start Pos [0‥+∞) (Default: 0)
-endChr <End Chr> : End Chr
-endPos <End Pos> : End Pos [0‥+∞) (Default: 0)
-includeSites <true | false> : Include Sites (Default: true)
-positionList <Position List> : Filter based on position list.
-siteNames <Site Names> : Filter based on site names.
-bedFile <Bed File> : Filter based on BED file.
-chrPosFile <Chr Pos File> : Filter based on list of chromsome / position in file.



./run_pipeline.pl -FilterTaxaPropertiesPlugin -help

Tassel Pipeline Arguments: -FilterTaxaPropertiesPlugin -help

[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -FilterTaxaPropertiesPlugin, -help, -runfork1]
[main] INFO net.maizegenetics.plugindef.AbstractPlugin - setParameters: for: net.maizegenetics.analysis.filter.FilterTaxaPropertiesPlugin Starting: Feb 21, 2016 9:15:22
[main] ERROR net.maizegenetics.plugindef.AbstractPlugin - Unrecognized argument: -help

[main] INFO net.maizegenetics.plugindef.AbstractPlugin -

Usage:

FilterTaxaPropertiesPlugin <options>

-minNotMissing <Min Proportion of Sites Present> : Min Proportion of Sites Present [0.0‥1.0] (Default: 0.0)
-minHeterozygous <Min Heterozygous Proportion> : Min Heterozygous Proportion [0.0‥1.0] (Default: 0.0)
-maxHeterozygous <Max Heterozygous Proportion> : Max Heterozygous Proportion [0.0‥1.0] (Default: 1.0)

Elhan Ersoz

Mar 20, 2017, 10:11:29 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi all,
I am having trouble filtering with a BED file. Would you mind posting an example of the BED file format here?

I am using the standard BED format described here: https://genome.ucsc.edu/FAQ/FAQformat#format1

I tried it both with column headers (chrom\tchromStart\tchromEnd) and without, and in either case it doesn't read the files.
Here is what my BED file looks like:

chr1    3190712         3200233
chr1    3699243         3708857
chr1    3731400         3742325
chr1    4055044         4062718
chr1    4445279         4453412
chr1    4545557         4557453
chr1    4741870         4766734
chr1    4994210         5014193
chr1    5525767         5542479

I tried adding and removing the "chr" before the chromosome ID.
I tried adding and removing the column headers.
I checked the spacing to make sure it is tab-delimited; I even converted it from tab-delimited to space-delimited and tried it that way.
I also made sure the file was converted for Linux with dos2unix, and still no go.

It just won't read the BED files I created. It stops with the error "filterSitesbyBedFile: problem reading <insert file name and path here> line <first line of the file>".

Any insights into this issue would be appreciated.

-Elhan

Terry Casstevens

Mar 20, 2017, 10:23:58 PM
to Tassel User Group
These fields need to be separated by tabs. I'm guessing you have a space somewhere.

Best,

Terry
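To find the offending lines, one option is to split each line on tabs only and flag any line whose start and end columns are not plain integers. The sketch below assumes a three-column BED file like the one above; `check_bed` is a hypothetical helper, not part of TASSEL. It flags lines containing spaces, leftover carriage returns from DOS line endings, header rows and blank lines; whether TASSEL rejects every one of those cases is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch: report BED lines that do not split into chrom, start, end on tabs.
# check_bed is a hypothetical helper, not part of TASSEL.
# Prints file:line: text for each suspect line; returns non-zero if any were found.
check_bed() {
  awk '
    BEGIN { FS = "\t" }  # split on tabs only, so space-separated lines stay one field
    # Start and end must be plain digits; spaces, a header row,
    # or a trailing carriage return from DOS line endings all fail this test.
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      printf "%s:%d: %s\n", FILENAME, FNR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_bed regions.bed` (a hypothetical file name) prints nothing and returns 0 when every line is tab-separated chrom, start and end.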

Elhan Ersoz

Mar 21, 2017, 9:49:36 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Terry,
Thanks for the help. I did a search and replace for spaces in my BED files, and it works now.
-Elhan
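The search-and-replace fix can also be scripted. A minimal sketch, assuming the BED file has only the three columns shown above: `bed_to_tabs` (a hypothetical helper) strips DOS carriage returns and rewrites each line with a single tab between fields. Because awk's default field splitting treats any run of spaces or tabs as one separator, it would also break apart a name column that contains spaces, so it is not suited to BED files with extra free-text columns.

```shell
#!/bin/sh
# Sketch: normalize a whitespace-separated BED file to tab-separated output.
# bed_to_tabs is a hypothetical helper, not part of TASSEL.
bed_to_tabs() {
  awk '
    BEGIN { OFS = "\t" }   # join output fields with single tabs
    {
      sub(/\r$/, "")       # drop a trailing carriage return (DOS line ending)
      $1 = $1              # force awk to rebuild the line using OFS
      print
    }
  ' "$1"
}
```

Typical use would be `bed_to_tabs regions.bed > regions.tab.bed` (hypothetical file names), then pointing `-bedFile` at the new file.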