sdm quality filtering - output files


jobl...@gmail.com

May 15, 2017, 2:47:36 AM
to LotuS rRNA pipeline
Hi Falk,

I am trying to perform quality filtering in LotuS separately by using ./sdm. It's working great, but I have some questions about the output files.

I am starting from already demultiplexed files from MiSeq paired-end reads.
This is the command I run (I apologize for the horrible paths, but it may be clearer to you with absolute ones):

./sdm -i_path /vscmnt/gent_vulpix/_/user/data/gent/vscxx/vscxx/Input_fastq/Joe_data43_46/ \
-o_fastq /vscmnt/gent_vulpix/_/user/data/gent/vscxx/vscxx/Input_fastq/Joe_data43_46/qualfiltered/standard/f1.fq,/vscmnt/gent_vulpix/_/user/data/gent/vscxx/vscxx/Input_fastq/Joe_data43_46/qualfiltered/standard/r1.fq \
-o_demultiplex /vscmnt/gent_vulpix/_/user/data/gent/vscxx/vscxx/Input_fastq/Joe_data43_46/qualfiltered/standard \
-map /vscmnt/gent_vulpix/_/user/home/gent/vscxx/vscxx/LoTus/lotus_pipeline/automapJoe43-46.txt \
-options /vscmnt/gent_vulpix/_/user/home/gent/vscxx/vscxx/LoTus/lotus_pipeline/sdm_miSeq_withcomments_130417.txt \
-log /vscmnt/gent_vulpix/_/user/data/gent/vscxx/vscxx/Input_fastq/Joe_data43_46/qualfiltered/standard

The input folder contains forward and reverse reads from 5 samples (so 5x R1, 5x R2).

Questions:
1) Via -o_demultiplex I get 1 file for each sample (so not a separate one for read 1 and read 2): which filtered sequences are in here? (Only highqual, or both highqual and midqual? Read 1 and read 2?)
2) Via -o_fastq I get 2 files (f1, r1, as I specified in the command): are these the original fastqs? In that case, should I define separate names in the command for all files (10 names)?
3) For running LotuS on the files, should I use the output of -o_demultiplex? In that case, should I change the mapping file for the LotuS run so that the Fastq column refers to the 1 file per sample from -o_demultiplex and not to R1,R2?

If you could help me clarify this (whenever you find time), you would help me a lot!

Thanks!!

Jolien

Falk Hildebrand

May 15, 2017, 11:12:10 AM
to LotuS rRNA pipeline
Hey Jolien,
1) These are only hiQual sequences.
2) No, these are filtered. If you are only interested in the separate per-sample files, you can leave out -o_fastq. If you want the original (unfiltered) reads, use -o_demultiplex, but don't define -o_fastq and -options.
3) Depends on what you want to do. In general I do not recommend running LotuS on the demultiplexed files, as you introduce an extra possible source of errors, but in practice it shouldn't make a difference. Note that you also have the option to let LotuS create the demultiplexed files (which you might need for uploading the data to sequence archives):
  • -saveDemultiplex [1: Saves all demultiplexed & filtered reads in the [outputdir]/demultiplexed folder, for easier data upload. 2: Only saves quality filtered demultiplexed reads and continues LotuS run subsequently. 3: Saves demultiplexed file into a single fq, saving sample ID in fastq/a header. 0: No demultiplexed reads are saved. Default: 0]
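For illustration, points 2 and 3 could be sketched roughly as below. This is only a dry-run sketch under assumptions: all paths are hypothetical placeholders (without spaces), the sdm flags are simply carried over from the command in the question minus -o_fastq and -options, and the lotus.pl call assumes the usual -i / -m / -o arguments for input folder, mapping file, and output folder. The commands are printed rather than executed, so they can be checked before running.

```shell
# Hypothetical placeholder paths (assumptions, not taken from the thread)
in_dir="/path/to/Input_fastq"
raw_out="/path/to/demultiplexed_unfiltered"
map_file="/path/to/automap.txt"
lotus_out="/path/to/lotus_out"

# Point 2: per-sample files with the original reads.
# Same flags as in the question, but without -o_fastq and -options.
sdm_cmd="./sdm -i_path $in_dir/ -o_demultiplex $raw_out -map $map_file -log $raw_out"

# Point 3: let LotuS write the demultiplexed files itself.
# -saveDemultiplex 2 follows the option text quoted above;
# -i, -m and -o are assumed to be input dir, mapping file and output dir.
lotus_cmd="./lotus.pl -i $in_dir/ -m $map_file -o $lotus_out -saveDemultiplex 2"

# Dry run: print the commands instead of executing them
echo "$sdm_cmd"
echo "$lotus_cmd"
```

Once the printed commands look right, they can be pasted into the shell from the respective sdm or LotuS directory.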
Hope I could answer all your questions,
Falk