Hey Paolo,

Greetings from Warsaw! I've just rediscovered Nextflow, and it's great!

I'd like to run a pipeline on many samples simultaneously, but at some steps each sample is "scattered", and afterwards I need to gather the results again per sample. This is where I'm confused about how to proceed.
nLinesInBed=((bedFile.readLines().size()+30)/25).toInteger()
bedSplits = Channel.from(bedFile).splitText( by: nLinesInBed, file: true )
cegatBams = Channel.fromPath("/mnt/ssd_01/productionTest/cegatBams.csv").splitCsv()

process haplotypeCaller {
    maxForks 30
    tag "$name "
    echo true
    input:
    set name, bam, bai from cegatBams
    each bed from bedSplits
    output:
    file "${name}.part.vcf" into vcfs
    """
    java -Xmx3000m -jar /PROGRAMS/gatk-local.jar HaplotypeCaller -R ${params.ref} -I ${bam} -O "${name}.part.vcf" -L $bed
    """
}
...
then somehow:

java -Xmx3000m -jar /PROGRAMS/gatk-local.jar GatherVcfs \
    -I sample1.part1.vcf -I sample1.part2.vcf -I sample1.part3.vcf ... -I sample25.part4.vcf \
    -O sample1.raw.vcf
So, per sample:
1) I divide the BED file into 25 smaller .bed files
2) I run HaplotypeCaller 25 times in parallel, obtaining 25 .vcf files per sample
3) somehow I need to combine these 25 .vcf files into 1 .vcf, while still keeping track of which sample is which

Does it make sense to run the analysis like this? Should I create a kind of "channel of channels" or a "channel with a list of lists"? Or should I run a Nextflow script from a Nextflow script?

Thanks for any hints!
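Step 1 can be tried outside Nextflow with a small shell sketch. It splits a BED file into at most N chunk files named <prefix>.<k>.bed.

A few caveats apply:
- `split_bed` is a hypothetical helper.
- It uses ceiling division instead of the ((lines+30)/25) formula in the script above.
- It assumes the BED file has no header or track lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a BED file into at most n roughly equal chunks,
# written as <outdir>/<prefix>.<k>.bed with k = 1, 2, ...
# Assumes the BED file has no header/track lines.
split_bed() {
  local bed=$1 n=$2 outdir=$3
  local prefix total per
  prefix=$(basename "$bed" .bed)
  total=$(wc -l < "$bed")
  # Ceiling division, so that no more than n chunks are produced.
  per=$(( (total + n - 1) / n ))
  [ "$per" -gt 0 ] || per=1
  mkdir -p "$outdir"
  awk -v per="$per" -v out="$outdir" -v pre="$prefix" '
    {
      k = int((NR - 1) / per) + 1          # 1-based chunk index for this line
      f = out "/" pre "." k ".bed"
      if (f != last) { if (last != "") close(last); last = f }
      print > f
    }
  ' "$bed"
}
```

For example, `split_bed test_head.bed 25 splits` would write splits/test_head.1.bed, splits/test_head.2.bed, and so on. Keeping the chunk index in the file name is what later allows the parts to be put back in order.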
--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.
bedFile=file("/productionTest/test_head.bed")
nLinesInBed=((bedFile.readLines().size()+30)/25).toInteger()
bedSplits = Channel.from(bedFile).splitText( by:nLinesInBed, file:file("/productionTest/splits/" ))
bams = Channel.fromPath("/productionTest/Bams.csv").splitCsv()

process haplotypeCaller {
    maxForks 5
    tag "$name, $bn"
    echo true
    input:
    set name, bam, bai from bams
    each bed from bedSplits
    output:
    set val(name), file("${name}.${bn}.vcf") into vcfs
    script:
    bn=bed.baseName
    """
    java -Xmx3000m -jar gatk.jar HaplotypeCaller -R ${params.ref} -I ${bam} -O ${name}.${bn}.vcf -L $bed
    """
}

process gatherVcfs {
    maxForks 5
    publishDir "productionTest/results/"
    echo true
    tag "$name"
    input:
    set val(name), file(vcf) from vcfs.groupTuple(sort:true)
    output:
    file("${name}.ready.vcf") into finalVCFs
    script:
    ins=vcf.join(" -I ")    // <-- how can I force this to list the files in order?
    """
    java -Xmx3000m -jar gatk.jar \
        GatherVcfs \
        -I $ins \
        -O ${name}.ready.vcf
    """
}
The generated command lists the parts out of order:

java -jar gatk.jar GatherVcfs -I sample.test_head.6.vcf -I sample.test_head.17.vcf -I sample.test_head.7.vcf -I sample.test_head.1.vcf -I sample.test_head.8.vcf -I sample.test_head.12.vcf -I sample.test_head.13.vcf -I sample.test_head.10.vcf -I sample.test_head.11.vcf -I sample.test_head.2.vcf -I sample.test_head.14.vcf -I sample.test_head.3.vcf -I sample.test_head.16.vcf -I sample.test_head.9.vcf -I sample.test_head.15.vcf -I sample.test_head.4.vcf -I sample.test_head.5.vcf -O sample.ready.vcf

but I need them in chunk order:

java -jar gatk.jar GatherVcfs -I sample.test_head.1.vcf -I sample.test_head.2.vcf -I sample.test_head.3.vcf -I sample.test_head.4.vcf ...

Suggestion: sort the list before joining it:

ins=vcf.sort().join(" -I ")

or sort numerically on the chunk index (the third dot-separated token of the file name):

ins=vcf.sort { it.tokenize('.')[2].toInteger() }.join(" -I ")
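The numeric sort key suggested above (the third dot-separated token of the file name, read as an integer) can be checked in the shell.

The sketch makes two assumptions:
- File names have the form <sample>.<bedprefix>.<index>.vcf[.gz].
- There are no extra dots in the sample name or the BED prefix.

`order_parts` and `gather_args` are hypothetical helpers, not part of Nextflow or GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: order part files by their numeric chunk index
# (3rd dot-separated field), e.g. s.test_head.2.vcf before s.test_head.10.vcf.
order_parts() {
  printf '%s\n' "$@" | sort -t. -k3,3n
}

# Hypothetical helper: build the "-I a -I b ..." argument string in chunk order.
gather_args() {
  local args="" f
  while IFS= read -r f; do
    args+=" -I $f"
  done < <(order_parts "$@")
  printf '%s\n' "${args# }"   # drop the leading space
}
```

Plain `sort` would place test_head.10 before test_head.2, because it compares characters rather than numbers.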
Caused by:
Cannot invoke method join() on null object
Source block:
ins=vcf.sort().join(" -I ")
"""
vcf-concat $ins > ${name}.ready.vcf
"""

Caused by:
No signature of method: _nf_script_525f7296$_run_closure2$_closure19$_closure20.doCall() is applicable for argument types: (sun.nio.fs.UnixPath, sun.nio.fs.UnixPath) values: [52123MA.test_head.2.vcf.gz, 52123MA.test_head.3.vcf.gz]
Possible solutions: doCall(), doCall(java.lang.Object), findAll(), findAll()
Source block:
ins=vcf.sort { it.name.tokenize('.')[2].toInteger() }.join(" -I ")

Still doesn't work...
Caused by:
No signature of method: _nf_script_4c92845f$_run_closure2$_closure19$_closure20.doCall() is applicable for argument types: (sun.nio.fs.UnixPath, sun.nio.fs.UnixPath) values: [22789KS.test_head.2.vcf.gz, 22789KS.test_head.5.vcf.gz]
Possible solutions: doCall(), doCall(java.lang.Object), findAll(), findAll()
Source block:
ins=vcf.sort { it.name.tokenize('.')[2].toInteger() }.join(" -I ")
"""
echo $ins

Again:
Caused by:
No signature of method: _nf_script_a9bfb6a3$_run_closure2$_closure19$_closure20.doCall() is applicable for argument types: (sun.nio.fs.UnixPath, sun.nio.fs.UnixPath) values: [22789KS.test_head.10.vcf.gz, 22789KS.test_head.2.vcf.gz]
Possible solutions: doCall(), doCall(java.lang.Object), findAll(), findAll()
Source block:
ins = vcf instanceof Path ? vcf.name : vcf.sort { it.name.tokenize('.')[2].toInteger() }.join(" -I ")
"""
echo $ins

Caused by:
No signature of method: _nf_script_e5d98854$_run_closure2$_closure19$_closure20.doCall() is applicable for argument types: (sun.nio.fs.UnixPath, sun.nio.fs.UnixPath) values: [52123MA.test_head.3.vcf.gz, 52123MA.test_head.2.vcf.gz]
Possible solutions: doCall(), doCall(java.lang.Object), findAll(), findAll()
Source block:
ins = vcf instanceof Path ? vcf : vcf.sort { it.tokenize('.')[2].toInteger() }.join(" -I ")
"""
echo $ins

ERROR ~ Error executing process > 'gatherVcfs (name1)'
Caused by:
Process `gatherVcfs` input file name collision -- There are multiple input files for each of the following file names: name1.splits.vcf.gz.tbi, name1.splits.vcf.gz
OK, the example is in the attachment. You should be able to run it and get a similar error.

Thanks!
process haplotypeCaller {
    maxForks 5
    tag "$name, $bn"
    echo true
    input:
    set name, bam, bai from someBams
    each bed from bedSplits
    output:
    set val(name), file("${name}.vcf.gz"), file("${name}.vcf.gz.tbi") into vcfParts
    set val(name), file("${name}.HC.bam"), file("${name}.HC.bai") into hcBamParts
    script:
    bn=bed.baseName
    """
    echo "uuu${bam}aaa" > ${name}.vcf.gz.tbi
    echo "uuu${bam}aaa" > ${name}.HC.bam
    echo "uuuuu" > ${name}.vcf.gz
    echo "kkkkk" > ${name}.HC.bai
    """
}

[89/b79c4d] Submitted process > haplotypeCaller (name3, test_head.17)
ERROR ~ Error executing process > 'gatherVcfs (name3)'
Caused by:
Process `gatherVcfs` input file name collision -- There are multiple input files for each of the following file names: name3.vcf.gz, name3.vcf.gz.tbi
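As the error message says, the gather task received several input files with the same name. Every chunk wrote name3.vcf.gz and name3.vcf.gz.tbi, so after grouping the names clash.

A quick shell sketch can spot such clashes in a list of paths. `duplicate_names` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every base name that occurs more than once
# among the given paths (these are the names that would collide when
# staged side by side in one directory).
duplicate_names() {
  local p
  for p in "$@"; do
    basename "$p"
  done | sort | uniq -d
}
```

Including the chunk base name in every output name, as in the next version of the process, makes this list come back empty.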
process haplotypeCaller {
    maxForks 5
    tag "$name, $bn"
    echo true
    input:
    set name, bam, bai from someBams
    each bed from bedSplits
    output:
    set val(name), file("${name}.${bn}.vcf.gz"), file("${name}.${bn}.vcf.gz.tbi") into vcfParts
    set val(name), file("${name}.${bn}.HC.bam"), file("${name}.${bn}.HC.bai") into hcBamParts
    script:
    bn=bed.baseName
    """
    echo "uuu${bam}aaa" > ${name}.${bn}.vcf.gz.tbi
    echo "uuu${bam}aaa" > ${name}.${bn}.HC.bam
    echo "uuuuu" > ${name}.${bn}.vcf.gz
    echo "kkkkk" > ${name}.${bn}.HC.bai
    """
}

The split BED files:

test_head.10.bed test_head.12.bed test_head.14.bed test_head.16.bed test_head.1.bed test_head.3.bed test_head.5.bed test_head.7.bed test_head.9.bed
test_head.11.bed test_head.13.bed test_head.15.bed test_head.17.bed test_head.2.bed test_head.4.bed test_head.6.bed test_head.8.bed

[f9/37aef8] Submitted process > haplotypeCaller (name3, test_head.17)
ERROR ~ Error executing process > 'gatherVcfs (name3)'
Caused by:
Cannot invoke method join() on null object
Source block:
ins = vcf.sort().join(" -I ")
"""
echo $ins
bams = Channel.fromPath("/path/Bams.csv").splitCsv()

process proces1 {
    input:
    set name, bam, bai from bams
    output:
    set val(name), file("${name}.${bn}.vcf.gz"), file("${name}.${bn}.vcf.gz.tbi") into vcfParts
    ...
}

process proces2 {
    input:
    set val(name), file(vcf), file(vcfIdx) from vcfParts.groupTuple()
    script:
    ins=vcf.sort { it.name.tokenize('.')[2].toInteger() }.join(" -I ")
    ...
}

set name, vcf, vcfIdx from vcfParts.groupTuple()

Bams.csv:
name1,/path/name1.bam,/path/name1.bam.bai
name2,/path/name2.bam,/path/name2.bam.bai
name3,/path/name3.bam,/path/name3.bam.bai
OK, I've solved it this way:

script:
"""
# write the part VCFs to a list file; sort -V puts .2. before .10.
echo "${vcf.join('\n')}" | sort -V > ${name}.vcf.list
java -jar /path/gatk.jar GatherVcfs \
    -I ${name}.vcf.list \
    -O ${name}.raw.NF.vcf
"""
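The same list-file approach can be sketched as a standalone shell function, so the ordering can be checked without running GATK. `make_vcf_list` is hypothetical and /path/gatk.jar is the placeholder path from the message above. The function only prints the GatherVcfs command instead of executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one sample's part VCFs to <name>.vcf.list in
# version order (sort -V compares digit runs numerically), then print the
# GatherVcfs command that would consume the list. Nothing is executed.
make_vcf_list() {
  local name=$1
  shift
  printf '%s\n' "$@" | sort -V > "${name}.vcf.list"
  printf 'java -jar /path/gatk.jar GatherVcfs -I %s -O %s\n' \
    "${name}.vcf.list" "${name}.raw.NF.vcf"
}
```

Passing a .list file avoids building a long "-I a -I b ..." string in Groovy, so the ordering is handled entirely by `sort -V` inside the task.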