
multiple_join_paired_ends.py change output file name


Anushka Khasnobish

Sep 11, 2019, 3:54:23 AM
to qiime...@googlegroups.com
Hello,

I have Illumina MiSeq paired-end sequencing reads for 51 samples (102 fastq files; 1 forward + 1 reverse for each sample).

Initially I was using fastq-join to join each sample's forward and reverse reads one sample at a time, but this takes a lot of time. So instead I want to use multiple_join_paired_ends.py, and this was my command:

multiple_join_paired_ends.py -i ../0_data/ -o ../1_fastq_join/ -p fastqjoin_params.txt 
 
where the forward and reverse sequences of every sample (2 .fastq files per sample) are in the 0_data folder, and the parameters file contains this information:
join_paired_ends:min_overlap 50
join_paired_ends:perc_max_diff 8
The command runs successfully; however, the output is one folder per sample, each containing fastqjoin.join.fastq, fastqjoin.un1.fastq and fastqjoin.un2.fastq files.

I want to edit multiple_join_paired_ends.py so that the output is still three files per sample, but named 'filename'.join.fastq instead of fastqjoin.join.fastq. If that is possible, I don't need the outputs in separate folders inside the output folder; they can simply be fastq files with individual filenames.

Since I am not good with programming or scripting, can anyone help me edit the script to achieve this?


In case the above is complicated, please see the Condense Paired Files subheading in the article http://ase.tufts.edu/chemistry/walt/sepa/WMresources/WM_dataWalkthrough_QIIME.pdf . The text mentions a bash script that condenses and renames the fastqjoin.join.fastq files from the multiple_join_paired_ends.py output folders. Can anybody provide this bash script?


Thank you in advance.

TonyWalters

Sep 11, 2019, 4:49:16 AM
to Qiime 1 Forum
Hello Anushka,

Unless you have specific reasons for using this approach, I would recommend using QIIME2 and DADA2 to handle the paired-end data (this will deal with stitching the reads, chimera checking, and denoising Illumina data).

Links for installation, importing paired-end data, and the denoising-with-dada2 section of an overview tutorial (it's a good idea to go through the tutorial once and visualize parts of it, such as the raw sequence quality scores, since you can optimize the dada2 parameters based on where the quality drops off):

I'm not sure what the bash script mentioned in that PDF is; I don't think it came from us.

I did find an example of a Python script that might be helpful for renaming the files, if you specifically need to use QIIME 1:
http://www.salemmarafi.com/code/recursively-rename-files-to-their-parent-folders-name/

There is also an example of a bash script, the first answer on this page: https://askubuntu.com/questions/529705/rename-files-in-subfolders-with-parent-folder-names
But you would have to be careful to run it as: find mainDirectory -name "fastqjoin.join.fastq" -exec changeName.sh {} \; so that it only picks up the joined files (and I'd recommend creating a few test folders/files first before using it on the actual data).
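
For the QIIME 1 route, the same rename-by-parent-folder idea can be sketched in Python. This is only a sketch under assumptions, not something taken from QIIME or from the linked pages: the function name flatten_join_outputs and its dry_run flag are made up for illustration, it assumes each per-sample folder is named the way you want the files prefixed, and it copies the files rather than moving them so the original output stays untouched.

```python
import shutil
from pathlib import Path

# File names reported in this thread for each per-sample output folder.
FASTQJOIN_OUTPUTS = (
    "fastqjoin.join.fastq",
    "fastqjoin.un1.fastq",
    "fastqjoin.un2.fastq",
)


def flatten_join_outputs(join_dir, flat_dir, dry_run=True):
    """Plan (and optionally perform) copies of
    <join_dir>/<sample>/fastqjoin.X.fastq to <flat_dir>/<sample>.X.fastq.

    Hypothetical helper: returns the list of (source, destination) paths.
    With dry_run=True nothing is written to disk.
    """
    join_dir, flat_dir = Path(join_dir), Path(flat_dir)
    planned = []
    for sample_dir in sorted(p for p in join_dir.iterdir() if p.is_dir()):
        for name in FASTQJOIN_OUTPUTS:
            src = sample_dir / name
            if not src.is_file():
                continue  # this sample folder lacks that output; skip it
            # "fastqjoin.join.fastq" becomes "<sample>.join.fastq"
            dest = flat_dir / (sample_dir.name + name[len("fastqjoin"):])
            planned.append((src, dest))
    if not dry_run:
        flat_dir.mkdir(parents=True, exist_ok=True)
        for src, dest in planned:
            shutil.copy2(src, dest)  # copy, so the originals are kept
    return planned
```

Calling flatten_join_outputs("../1_fastq_join/", "../2_joined_flat/") with the default dry_run=True only returns the planned source and destination paths, so the new names can be checked before anything is written; the ../2_joined_flat/ folder name is just a placeholder. Passing dry_run=False then performs the copies.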

I hope this helps,
Tony


