--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.
Hi,

Does the VCF header matter? I mean, is it a problem if only the first chunk contains the VCF header?

p
On 6 Apr 2017 12:31 p.m., "Marc Hoeppner" <mphoe...@gmail.com> wrote:
Hi,

this is a technical/design question, hoping to get a few pointers. I have a folder full of VCF files (let's assume they are uncompressed) and want to

a) process all files in parallel
b) split each VCF file into chunks of 5000 lines for parallelism
c) annotate each chunk with e.g. VEP
d) merge the chunks for each input file and create one output file per input file.

I cannot seem to figure out how to do this with Channels or processes - I always lose the reference to the original input file for naming the output.

Cheers,
Marc
params.in_vcf = 'data/vcf'
params.out_dir = 'default/out/path'

out_dir = file(params.out_dir)
out_dir.mkdirs()

// Pair every file with its name so each chunk stays traceable to its source file
Channel
    .fromPath(params.in_vcf)
    .map { file -> tuple(file.name, file) }
    .splitText(by: 5000, file: true)
    .set { chunks_ch }

process annotate {
    input:
    set id, file(chunk) from chunks_ch

    output:
    set id, file('chunk.vep') into vep_ch

    script:
    """
    VEP_annotation_command --in $chunk --out chunk.vep
    """
}

// collectFile() groups on the first tuple element, giving one merged file per input name
vep_ch
    .collectFile()
    .subscribe { merged_file -> merged_file.copyTo(out_dir) }
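
On the header question raised earlier in the thread: splitText(by: 5000, file: true) cuts purely by line count, so the '#' header lines end up only in the first chunk. If the annotation tool needs a header on every chunk, one option is to do the split and merge in shell inside a process script instead. The following is only a minimal sketch under assumptions: the input is an uncompressed VCF, and the function names split_vcf and merge_vcf are hypothetical helpers, not part of Nextflow or VEP.

```shell
#!/bin/sh
set -eu

# split_vcf INPUT.vcf LINES PREFIX   (hypothetical helper)
# Writes PREFIXaa.vcf, PREFIXab.vcf, ... each starting with the full header.
split_vcf() {
    local vcf="$1" lines="$2" prefix="$3"
    local header part
    header=$(mktemp)
    # Header = every line starting with '#'; tolerate a file without one
    grep '^#' "$vcf" > "$header" || true
    # Split only the records; split names the pieces PREFIXbody.aa, .ab, ...
    { grep -v '^#' "$vcf" || true; } | split -l "$lines" - "${prefix}body."
    for part in "${prefix}body."*; do
        [ -e "$part" ] || continue          # no records at all: nothing to do
        cat "$header" "$part" > "${prefix}${part##*.}.vcf"
        rm "$part"
    done
    rm "$header"
}

# merge_vcf OUT.vcf CHUNK...   (hypothetical helper)
# Takes the header from the first chunk only, then appends the records
# of every chunk in the order given.
merge_vcf() {
    local out="$1" chunk
    shift
    grep '^#' "$1" > "$out" || true
    for chunk in "$@"; do
        grep -v '^#' "$chunk" >> "$out" || true
    done
}

# Example use (paths are placeholders):
#   split_vcf sample.vcf 5000 sample.chunk_
#   merge_vcf sample.merged.vcf sample.chunk_*.vcf
```

The merge takes the header from the first chunk rather than from the original input because an annotator may add its own header lines to its output. The alphabetic suffixes produced by split sort in creation order, so a shell glob passes the chunks to merge_vcf in their original order.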
nextflow -c nextflow.config run main.nf --vcf '/path/to/files/*.vcf' --chunkSize 500

This seems to produce all the expected chunked outputs - the error seems to be with how things are published in the collectVep and collectAnnovar processes - those seem to fail to run for all "branches".