Hi,
I am struggling to merge FastQ files from libraries that were split across multiple lanes. I would like to:
- group all fastq files by library
- concatenate the forward reads into one file and the reverse reads into another (something like: zcat $fw_reads | gzip > $merged_fw)
- and then continue my "normal" processing of PE reads
Read groups and the like do not need to be preserved, as I am not using this for e.g. GATK variant calling.
I am guessing that the "groupBy" operator plays a role, but I cannot get it to work. Could someone put me on the right track? As an example of input files:
F15657-L1_S11_L001_R1_001.fastq.gz F15657-L1_S11_L004_R1_001.fastq.gz F15658-L1_S12_L003_R1_001.fastq.gz F15659-L1_S13_L002_R1_001.fastq.gz
F15657-L1_S11_L001_R2_001.fastq.gz F15657-L1_S11_L004_R2_001.fastq.gz F15658-L1_S12_L003_R2_001.fastq.gz F15659-L1_S13_L002_R2_001.fastq.gz
F15657-L1_S11_L002_R1_001.fastq.gz F15658-L1_S12_L001_R1_001.fastq.gz F15658-L1_S12_L004_R1_001.fastq.gz F15659-L1_S13_L003_R1_001.fastq.gz
F15657-L1_S11_L002_R2_001.fastq.gz F15658-L1_S12_L001_R2_001.fastq.gz F15658-L1_S12_L004_R2_001.fastq.gz F15659-L1_S13_L003_R2_001.fastq.gz
F15657-L1_S11_L003_R1_001.fastq.gz F15658-L1_S12_L002_R1_001.fastq.gz F15659-L1_S13_L001_R1_001.fastq.gz F15659-L1_S13_L004_R1_001.fastq.gz
F15657-L1_S11_L003_R2_001.fastq.gz F15658-L1_S12_L002_R2_001.fastq.gz F15659-L1_S13_L001_R2_001.fastq.gz F15659-L1_S13_L004_R2_001.fastq.gz
These should produce something like the following, grouped into a channel as PE reads with the library ID as the "key":
F15657-L1_S11_R1.fastq.gz
F15657-L1_S11_R2.fastq.gz
F15658-L1_S12_R1.fastq.gz
F15658-L1_S12_R2.fastq.gz
F15659-L1_S13_R1.fastq.gz
F15659-L1_S13_R2.fastq.gz
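
To make the intended result concrete, here is a rough plain-shell sketch of the grouping and concatenation, outside of Nextflow. It is only an illustration, not a finished solution: merge_lanes, in_dir and out_dir are names I made up, and it assumes that every file is named <library>_L<3-digit lane>_R<1|2>_001.fastq.gz, that names contain no whitespace, and that at least one matching file exists. It uses plain cat rather than zcat | gzip, since concatenated gzip files are themselves valid gzip.

```shell
# Sketch: merge per-lane FASTQ files into one R1 and one R2 file per library.
# Assumed naming scheme: <library>_L<lane>_R<1|2>_001.fastq.gz
# merge_lanes, in_dir and out_dir are hypothetical names used for illustration.
merge_lanes() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    # List the R1 files, strip directory and lane/read suffix to get unique library IDs.
    printf '%s\n' "$in_dir"/*_L[0-9][0-9][0-9]_R1_001.fastq.gz |
        sed -e 's|.*/||' -e 's/_L[0-9][0-9][0-9]_R1_001\.fastq\.gz$//' |
        sort -u |
        while read -r lib; do
            for mate in R1 R2; do
                # Glob expansion is sorted, so R1 and R2 are joined in the same lane order.
                cat "$in_dir/$lib"_L[0-9][0-9][0-9]_"$mate"_001.fastq.gz \
                    > "$out_dir/${lib}_${mate}.fastq.gz"
            done
        done
}
```

Calling merge_lanes raw_fastq merged (both directory names are placeholders) would write files such as merged/F15657-L1_S11_R1.fastq.gz and merged/F15657-L1_S11_R2.fastq.gz. What I am after is the equivalent of this inside the pipeline, with the library ID as the channel key.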
/M