Question regarding top-level nets


Zev Kronenberg

Aug 24, 2016, 3:46:19 PM
to gen...@soe.ucsc.edu
Greetings,  

I’ve been working on a pipeline for species-level pairwise alignments, and I’ve gone through the standard chaining and netting steps.

There are two steps I cannot figure out from the documentation:

1. I want to remove interlaced chain fills. Let's say I have two chains, A and B, that look something like:

  AAAA BBB AAAA BB AAAAAAA BB AAAAAAAAAA

2. After netting, I’d like to have strictly increasing query coordinates, i.e. no overlapping query sequences. Is that even possible while maintaining inversion information?
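One way to make item 1 concrete is to treat each chain as a list of target-coordinate blocks and ask whether every block of B falls in a gap of A, inside A's overall span. The following is a minimal, hypothetical Python helper, not a UCSC utility. The names `blocks_from_diagram` and `is_interlaced` are made up for illustration, and character positions in the diagram above stand in for target coordinates. In a net, a lower-scoring chain like B that only fills A's gaps would typically appear as a fill nested inside a gap of A. Keeping only top-level fills is therefore one way to drop it.

```python
from typing import List, Tuple

# Half-open (start, end) interval in target coordinates.
Block = Tuple[int, int]

# Toy layout copied from the question; character positions stand in for
# target coordinates (hypothetical, for illustration only).
DIAGRAM = "AAAA BBB AAAA BB AAAAAAA BB AAAAAAAAAA"


def blocks_from_diagram(diagram: str, label: str) -> List[Block]:
    """Collect runs of `label` characters as half-open blocks."""
    blocks: List[Block] = []
    start = None
    for i, ch in enumerate(diagram + " "):  # trailing space closes the last run
        if ch == label and start is None:
            start = i
        elif ch != label and start is not None:
            blocks.append((start, i))
            start = None
    return blocks


def is_interlaced(outer: List[Block], inner: List[Block]) -> bool:
    """True if every inner block lies inside the outer chain's overall span
    without overlapping any outer block, i.e. inner only fills outer's gaps."""
    span_start = min(s for s, _ in outer)
    span_end = max(e for _, e in outer)
    for s, e in inner:
        if s < span_start or e > span_end:
            return False
        if any(s < o_end and o_start < e for o_start, o_end in outer):
            return False
    return True


if __name__ == "__main__":
    a = blocks_from_diagram(DIAGRAM, "A")
    b = blocks_from_diagram(DIAGRAM, "B")
    print(a)  # [(0, 4), (9, 13), (17, 24), (28, 38)]
    print(b)  # [(5, 8), (14, 16), (25, 27)]
    print(is_interlaced(a, b), is_interlaced(b, a))  # True False
```

The check is asymmetric on purpose: B sits entirely in A's gaps, but A extends beyond B's span, so `is_interlaced(b, a)` is false.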


Essentially, I just want netting to give me a one-to-one mapping of my query chains to the target genome.
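A minimal sketch of that one-to-one idea assumes the usual chainNet text layout. In that layout, indentation encodes nesting depth, top-level fills carry exactly one leading space, and the fields after `fill tStart tSize qName strand qStart qSize` are key/value pairs such as `score`. The sketch works in two steps. First, it keeps only top-level fills, which drops nested fills like the B-inside-A case. Second, it greedily keeps the highest-scoring fills whose query spans do not overlap. Top-level fills in a target-side net already avoid overlapping on the target, so the result is one-to-one at the level of fill spans, not base by base. Greedy-by-score is a swapped-in heuristic; it does not guarantee the maximum total score. The function names and the toy net text are hypothetical. The overlap test also assumes that query coordinates are directly comparable across strands, which is worth verifying for minus-strand fills in your own nets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fill:
    t_name: str
    t_start: int
    t_size: int
    q_name: str
    strand: str
    q_start: int
    q_size: int
    score: float


# Made-up net text for illustration; real nets carry more fields per line.
EXAMPLE_NET = """\
net chr1 1000
 fill 0 300 chrA + 0 300 id 1 score 900 ali 280
  gap 100 50 chrA + 100 40
   fill 110 20 chrB + 500 20 id 2 score 30 ali 20
 fill 400 200 chrA - 150 200 id 3 score 500 ali 190
 fill 700 100 chrA + 600 100 id 4 score 200 ali 95
"""


def top_level_fills(net_text: str) -> List[Fill]:
    """Return fills at depth 1 (one leading space); nested fills and gaps are skipped."""
    fills: List[Fill] = []
    t_name: Optional[str] = None
    for line in net_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        depth = len(line) - len(line.lstrip(" "))
        if tokens[0] == "net":
            t_name = tokens[1]
        elif tokens[0] == "fill" and depth == 1:
            # Fields after the seven positional ones are assumed to be key/value pairs.
            extra = dict(zip(tokens[7::2], tokens[8::2]))
            fills.append(Fill(t_name or "", int(tokens[1]), int(tokens[2]),
                              tokens[3], tokens[4], int(tokens[5]), int(tokens[6]),
                              float(extra.get("score", 0))))
    return fills


def one_to_one(fills: List[Fill]) -> List[Fill]:
    """Greedily keep the highest-scoring fills whose query spans do not overlap."""
    kept: List[Fill] = []
    for f in sorted(fills, key=lambda x: x.score, reverse=True):
        q_end = f.q_start + f.q_size
        clash = any(
            k.q_name == f.q_name and f.q_start < k.q_start + k.q_size and k.q_start < q_end
            for k in kept
        )
        if not clash:
            kept.append(f)
    return sorted(kept, key=lambda x: (x.t_name, x.t_start))


if __name__ == "__main__":
    for f in one_to_one(top_level_fills(EXAMPLE_NET)):
        print(f.t_start, f.q_name, f.q_start, f.score)
    # 0 chrA 0 900.0
    # 700 chrA 600 200.0
```

In the toy input, the nested chrB fill is ignored as non-top-level. The minus-strand fill at target 400 is dropped because its query span (150 to 350) overlaps the higher-scoring fill at target 0.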

A pipeline excerpt is shown below:



# Rules are listed from the final step (SAM output) back to the first (LASTZ alignment).

# Convert the final MAF to SAM.
rule finalMafToSam:
    input: MAF="all_maf/all.maf"
    output: temp("all_sam/all.sam")
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "maf-convert sam {input.MAF} > {output}"

# Convert the net-derived AXT to MAF.
rule finalAxtToMaf:
    input: AX="all_axt/all_filt_no_class.axt", TS="stats/target.sizes", QS="stats/query.sizes"
    output: temp("all_maf/all.maf")
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "axtToMaf {input.AX} {input.TS} {input.QS} {output}"

# Extract AXT alignments for the (unfiltered) net from the pre-net chains and 2bit sequences.
rule netToAxt:
    input: NET="net/all-no-class.net", CHAIN="merged_chains/all.chain.filter.prenet.chain", TA2="2bits/target.2bit", QU2="2bits/query.2bit"
    output: temp("all_axt/all_filt_no_class.axt")
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "netToAxt {input.NET} {input.CHAIN} {input.TA2} {input.QU2} {output}"

# Filter the net with netFilter -chimpSyn and also write it out as BED
# (neither output is consumed by the other rules shown here).
rule filtNet:
    input: "net/all-no-class.net"
    output: "filt_net/all-no-class.filt.net"
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "netFilter -chimpSyn {input} > {output} && netToBed {output} filt_net/all-no-class-filter.bed"

# Build the target-side net (query-side net discarded) and annotate it with netSyntenic.
rule net:
    input: CHAIN="merged_chains/all.chain.filter.prenet.chain", TS="stats/target.sizes", QS="stats/query.sizes"
    output: "net/all-no-class.net"
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "chainNet {input.CHAIN} -minSpace=1 {input.TS} {input.QS} stdout /dev/null | netSyntenic stdin {output}"

# Drop chains that cannot contribute to the net.
rule filter:
    input: CHAINS="merged_chains/all.chain", TS="stats/target.sizes", QS="stats/query.sizes"
    params: sge_opts=config["cluster_settings"]["light"]
    output: "merged_chains/all.chain.filter.prenet.chain"
    shell: "chainPreNet {input.CHAINS} {input.TS} {input.QS} {output}"

# Merge and sort the per-contig chain files.
rule merge:
    input: expand("chained_psl/{contig}.chained.psl", contig=REGIONS)
    output: "merged_chains/all.chain"
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "chainMergeSort {input} > {output}"

# Chain the per-contig PSL alignments; axtChain writes chain format despite the .psl name.
rule chain:
    input: PSL="raw_psl/{contig}.psl", T2BIT="2bits/target.2bit", Q2BIT="2bits/query.2bit"
    output: temp("chained_psl/{contig}.chained.psl")
    params: sge_opts=config["cluster_settings"]["light"]
    shell: "axtChain -linearGap=medium -psl {input.PSL} {input.T2BIT} {input.Q2BIT} {output}"

# Convert LAV output to PSL.
rule lav_to_psl:
    input: "raw_lav/{contig}.lav"
    output: "raw_psl/{contig}.psl"
    params: sge_opts="-l mfree=10G -l h_rt=24:00:00 -q eichler-short.q"
    shell: "lavToPsl {input} {output}"

# Extract one query contig with {FAIDX} and align it with {LZ}, writing LAV.
rule runLastZ:
    input: T={TARGET}, Q={QUERY}
    output: "raw_lav/{contig}.lav"
    params: sge_opts=config["cluster_settings"]["heaviest"]
    shell: "{RS_LT} ; {RS_LQ} ; {RS_LTI} ; {RS_LQI} ; {FAIDX} {wildcards.contig} > $TMPDIR/q.fasta ; {LZ} $TMPDIR/q.fasta {LQ} {POST} > {output}"







Zev Kronenberg Ph.D.





Matthew Speir

Sep 1, 2016, 10:29:32 AM
to Zev Kronenberg, gen...@soe.ucsc.edu
Hi Zev,

Thank you for your question about top-level nets.

One of our engineers notes that you may be able to use the utility "netFilter" to extract this information. You will have to experiment with the parameters and options to see which combination will extract what you're interested in.
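When experimenting with netFilter settings, it can help to measure each filtered net against the two goals in the question: no overlapping query spans, and query starts that strictly increase along the target. Below is a minimal, hypothetical pair of Python helpers for that comparison. They take `(tName, tStart, qName, qStart, qEnd)` tuples that you would extract from the top-level fills of each filtered net; how you extract them is left open. Both functions judge each fill only by its span, so an inverted (minus-strand) fill is assessed by where it sits, not by its internal orientation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (tName, tStart, qName, qStart, qEnd); the query span is half-open.
Record = Tuple[str, int, str, int, int]


def count_query_overlaps(records: Iterable[Record]) -> int:
    """Count fills whose query span overlaps an earlier fill on the same query sequence."""
    by_query: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for _, _, q_name, q_start, q_end in records:
        by_query[q_name].append((q_start, q_end))
    overlaps = 0
    for spans in by_query.values():
        spans.sort()
        max_end = spans[0][1]
        for start, end in spans[1:]:
            if start < max_end:
                overlaps += 1
            max_end = max(max_end, end)
    return overlaps


def query_strictly_increasing(records: Iterable[Record]) -> bool:
    """True if, walking each target sequence in order, qStart strictly increases per query sequence."""
    last: Dict[Tuple[str, str], int] = {}
    for t_name, _, q_name, q_start, _ in sorted(records):
        key = (t_name, q_name)
        if key in last and q_start <= last[key]:
            return False
        last[key] = q_start
    return True


if __name__ == "__main__":
    # Toy records, made up for illustration.
    recs = [("chr1", 0, "chrA", 0, 300),
            ("chr1", 400, "chrA", 150, 350),
            ("chr1", 700, "chrA", 600, 700)]
    print(count_query_overlaps(recs), query_strictly_increasing(recs))  # 1 True
```

The two checks are independent: the toy records overlap on the query yet still increase in order. A strict-increase requirement will also flag fills that reflect genuine rearrangements, so it works better as a diagnostic than as a rule to enforce blindly.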

You can download the "netFilter" utility for various operating systems here: http://hgdownload.soe.ucsc.edu/admin/exe/.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--

