Suggestion for additional section in the YAML


Paulo Borges

Dec 15, 2020, 8:02:39 AM
to pigx
Hi all!

I would like to suggest some extensions to the YAML file, even though I am not sure whether this is already handled by the pigx pipeline.

Would it be possible to add a new section to the YAML file, such as the following?
#Patterns detection:
ptrn1: _R1
ptrn2: _R2
ext: .fq
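If such keys existed, a Snakefile could read them with fallbacks so that existing configs keep working. A minimal sketch in plain Python, assuming the YAML has already been loaded into a dict (as Snakemake does with its `config` object); the function name `pattern_settings` and the default values are hypothetical, not part of PiGx:

```python
def pattern_settings(config):
    """Return (ptrn1, ptrn2, ext) from a config dict, falling back to defaults.

    The keys mirror the proposed YAML section; the defaults are illustrative only.
    """
    defaults = {"ptrn1": "_R1", "ptrn2": "_R2", "ext": ".fastq"}
    settings = {key: config.get(key, value) for key, value in defaults.items()}
    # Identical mate patterns would make mate 1 and mate 2 indistinguishable
    if settings["ptrn1"] == settings["ptrn2"]:
        raise ValueError("ptrn1 and ptrn2 must differ to tell the mates apart")
    # Require the leading dot so paths can be built by simple concatenation
    if not settings["ext"].startswith("."):
        raise ValueError("ext should include the leading dot, e.g. '.fq'")
    return settings["ptrn1"], settings["ptrn2"], settings["ext"]


# Values from the proposed YAML section
print(pattern_settings({"ptrn1": "_R1", "ptrn2": "_R2", "ext": ".fq"}))
# Missing keys fall back to the defaults
print(pattern_settings({"ext": ".fq"}))
```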

The reason I am suggesting this is that, while trying to set up my own pipeline with snakemake, I ran into issues with the file extensions *.fq and *.fastq. I don't remember exactly which tool (one of those used in pigx) could not detect one of the extensions. I had to rename the files manually, which was fine because there were only a few, but this can be extremely tedious and error-prone.
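Instead of renaming files by hand, the extension can be recognised programmatically. A minimal sketch (the helper name `fastq_stem` is hypothetical) that treats `.fastq` and `.fq`, gzipped or not, as equivalent and returns the file name without its extension:

```python
import re

# Recognised FASTQ extensions, each optionally gzipped (an assumption for this sketch)
FASTQ_EXT = re.compile(r"\.(?:fastq|fq)(?:\.gz)?$")


def fastq_stem(filename):
    """Strip a .fastq/.fq extension (with or without .gz) from a file name.

    Returns None when the name has none of the recognised extensions,
    so the caller can report the file instead of silently mishandling it.
    """
    match = FASTQ_EXT.search(filename)
    if match is None:
        return None
    return filename[: match.start()]


for name in ["UHR_Rep1_R1.fastq.gz", "UHR_Rep1_R1.fq.gz", "UHR_Rep1_R1.fq", "notes.txt"]:
    print(name, "->", fastq_stem(name))
```

Matching is case-sensitive here; names like `.FASTQ` would need `re.IGNORECASE`.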

Additionally, I noticed that the _R1/_R2 pattern is hard-coded (at least in pigx_rnaseq.py). I also ran into problems by assuming the same pattern: my files used _1 and _2 instead.
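One way to accept both conventions is a single regular expression that allows an optional `R` before the mate number. A sketch under that assumption (the pattern and the name `split_mate` are hypothetical); note it still would not recognise names such as `UHR_Rep1.read1.fq.gz`, which is the weakness of any file-name-based approach:

```python
import re

# Hypothetical pattern: a sample name, then "_R1"/"_R2" or "_1"/"_2",
# then a .fastq/.fq extension with optional .gz.
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$"
)


def split_mate(filename):
    """Return (sample, mate) for names such as 'X_R1.fq.gz' or 'X_2.fastq'.

    Returns None for names the pattern does not cover, e.g. 'X.read1.fq.gz'.
    """
    match = MATE_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("mate"))


for name in ["UHR_Rep1_R1.fastq.gz", "UHR_Rep1_2.fq", "UHR_Rep1.read1.fq.gz"]:
    print(name, "->", split_mate(name))
```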

My solution was to define these as variables at the beginning of my snakefile:

#File pattern detection
ptrn1 = "_1"
ptrn2 = "_2"
#File extension detection
ext = ".fq"

Usage:
rule trim_galore_pe:
    input:
        [path_reads + "{sample}" + ptrn1 + ext + ".gz", path_reads + "{sample}" + ptrn2 + ext + ".gz"]
   .....
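The same path construction can be factored into a plain Python helper. A sketch; the name `paired_inputs` and the `gz` flag are hypothetical. Using `os.path.join` means `path_reads` works with or without a trailing slash, which plain string concatenation does not guarantee:

```python
import os


def paired_inputs(path_reads, sample, ptrn1, ptrn2, ext, gz=True):
    """Build the [mate 1, mate 2] input paths, mirroring the rule's input list."""
    suffix = ext + (".gz" if gz else "")
    return [
        os.path.join(path_reads, sample + ptrn1 + suffix),
        os.path.join(path_reads, sample + ptrn2 + suffix),
    ]


print(paired_inputs("/data/reads", "UHR_Rep1", "_1", "_2", ".fq"))
```

In a Snakefile, such a helper could be called from an input function, e.g. `input: lambda wildcards: paired_inputs(path_reads, wildcards.sample, ptrn1, ptrn2, ext)`, so the `{sample}` wildcard supplies the sample name.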

I hope this is useful for improving the pipeline's usability.

Paulo

Paulo Borges

Dec 17, 2020, 8:06:58 AM
to pigx
Hi again!

I performed a test by renaming the extensions of the provided test files from *.fastq to *.fq.
Below is an example of the output; one such error is generated for each sample dataset.

*************************
[Thu Dec 17 13:58:01 2020]
Error in rule trim_galore_pe:
    jobid: 15
    output: /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep1_R1.fastq.gz, /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep1_R2.fastq.gz
    log: /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/logs/trim_galore_UHR_Rep1.log (check log file(s) for error message)
    shell:
        /gnu/store/lzkx6jcfgzwbm33n74lii5rk7x28jpdr-trim-galore-0.6.1/bin/trim_galore  -o /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads --paired /media/oem/Seagate/Projects/RNASeq_Test2/Data/UHR_Rep1.read1.fq.gz /media/oem/Seagate/Projects/RNASeq_Test2/Data/UHR_Rep1.read2.fq.gz >> /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/logs/trim_galore_UHR_Rep1.log 2>&1 && sleep 10 && mv /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep1.read1.fq.gz /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep1_R1.fastq.gz && mv /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep1.read2.fq.gz /media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep1_R2.fastq.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

mv: cannot stat '/media/oem/Seagate/Projects/RNASeq_Test2/PIGX_Output_test/trimmed_reads/UHR_Rep3.read1.fq.gz': No such file or directory

-------------------------------
No errors were generated when using *.fastq as the extension.


Paulo

alex....@gmail.com

Feb 2, 2021, 5:28:00 AM
to pigx
Hi Paulo,

Thank you for your suggestion to improve the handling of different fastq file extensions, but we already handle this in the pipelines. 
In general, if the given fastq files have common extensions like .fastq / .fq (with or without .gz), the pipeline should work just fine. The pattern identifying the mates of the paired reads is also not relevant, as the order is determined from the respective columns of the sample sheet.
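To illustrate why mate suffixes do not matter when a sample sheet is used: each row names both read files explicitly, so mate 1 and mate 2 come from the column, not from the file name. A minimal sketch; the column names `name`, `reads` and `reads2` and the sheet contents are assumptions for illustration, so check the sample sheet template shipped with the pipeline for the actual headers:

```python
import csv
import io

# Hypothetical sample sheet; column names and rows are assumed for this sketch.
SHEET = """name,reads,reads2
UHR_Rep1,UHR_Rep1.read1.fq.gz,UHR_Rep1.read2.fq.gz
SampleB,SampleB_1.fastq.gz,SampleB_2.fastq.gz
"""


def mates_by_sample(handle):
    """Map each sample name to its (mate 1, mate 2) files, taken from the columns."""
    return {
        row["name"]: (row["reads"], row["reads2"])
        for row in csv.DictReader(handle)
    }


for sample, (r1, r2) in mates_by_sample(io.StringIO(SHEET)).items():
    print(sample, r1, r2)
```

Both naming styles resolve the same way here because no file name is ever parsed.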

Concerning the error that you described in your second mail, it seems that your modification of the rule did not consider the filename pattern produced by trim_galore for the output files. You may check how we solved this in the pigx-rnaseq pipeline (https://github.com/BIMSBbioinfo/pigx_rnaseq/blob/1375b63be6d39a58388d8bf2adc62997645d81fa/snakefile.py#L258). 
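As an illustration of the kind of mismatch involved, here is a sketch that predicts trim_galore's paired-end output names. It assumes trim_galore's default naming: the input's `.fastq`/`.fq` extension (and `.gz`) is dropped and `_val_1.fq` / `_val_2.fq` is appended, gzipped when the input was gzipped. Verify this against the trim_galore documentation for your version; the helper name is hypothetical and this is not how pigx-rnaseq itself solves it. Under that assumption, the input `UHR_Rep1.read1.fq.gz` would yield `UHR_Rep1.read1_val_1.fq.gz`, not the name the failing `mv` looked for.

```python
import os
import re

# Assumed trim_galore convention: drop .fastq/.fq (+ .gz), then append _val_N.fq(.gz)
_FASTQ_EXT = re.compile(r"\.(?:fastq|fq)(\.gz)?$")


def trim_galore_paired_outputs(outdir, read1, read2):
    """Predict trim_galore --paired output paths for two input files.

    This sketch raises ValueError for names without a recognised extension.
    """
    outputs = []
    for mate, path in ((1, read1), (2, read2)):
        base = os.path.basename(path)
        match = _FASTQ_EXT.search(base)
        if match is None:
            raise ValueError(f"unrecognised FASTQ extension: {base}")
        stem = base[: match.start()]
        gz = ".gz" if match.group(1) else ""
        outputs.append(os.path.join(outdir, f"{stem}_val_{mate}.fq{gz}"))
    return outputs


print(trim_galore_paired_outputs(
    "trimmed_reads",
    "Data/UHR_Rep1.read1.fq.gz",
    "Data/UHR_Rep1.read2.fq.gz",
))
```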

Best,

Alex