I have a stage for alignment:
align = {
output.dir="/nesi/nobackup/uoa02461/data/intermediateFiles"
exec """bwa mem -t 12 -R '@RG\\tID:$SAMPLE\\tSM:$SAMPLE\\tLB:$SAMPLE\\tPL:ILLUMINA' $bwaIndex $input1 $input2 | samtools sort -@12 -O BAM -o $output.bam""", "align"
}
The samples come into the stage as $input1 and $input2. Echoing $input1 and $input2 gives the full file paths, i.e.
/nesi/nobackup/uoa02461/data/intermediateFiles/sample1_1.fastq.gz.trim
/nesi/nobackup/uoa02461/data/intermediateFiles/sample1_2.fastq.gz.trim
I'd like the sample name to be recorded inside the BAM files, in the @RG read group (not just in the BAM file names), which is what $SAMPLE is meant for. How do I get from:
$input1 = /nesi/nobackup/uoa02461/data/intermediateFiles/sample1_1.fastq.gz.trim
to $SAMPLE = sample1?
I've been trying to exec various bash pattern-matching approaches, but either my bash regex skills have deteriorated or I'm integrating them into the stage incorrectly. And I never could grok pattern matching in Java.
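For concreteness, here is a minimal sketch of the transformation in plain bash, run outside the pipeline. It assumes the file names always follow the <sample>_<read>.fastq.gz.trim pattern shown above. The variable name "file" is just illustrative. It uses parameter expansion rather than a regex.

```shell
#!/usr/bin/env bash
# Example path copied from the question; in the pipeline this would come from $input1.
input1="/nesi/nobackup/uoa02461/data/intermediateFiles/sample1_1.fastq.gz.trim"

# Strip the directory part: remove the longest prefix ending in '/'.
file="${input1##*/}"

# Strip the read suffix: remove the shortest trailing match of '_<anything>.fastq.gz.trim'.
# Because the shortest match is removed, underscores inside the sample name are kept.
SAMPLE="${file%_*.fastq.gz.trim}"

echo "$SAMPLE"   # prints: sample1
```

This only covers the shell side. Inside the exec string the pipeline framework presumably interpolates $ itself, so the bash variables would likely need escaping (e.g. \$) to reach the shell intact. That part is an untested assumption.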
Cheers
Ben.