Mapping loop [main_samview] truncated file error

Flo

Jan 23, 2018, 5:39:48 AM
to Anvi'o
Dear Anvi'o masters,

I am using the following mapping loop to map the reads on the contigs, as suggested on Anvi'o website.

for sample in `awk '{print $1}' samples_all.txt`
do
    echo "###Mapping Sample" $sample "Start###"
    ls $READS_DNA/$sample*QUALITY_PASSED_R1* $READS_DNA/$sample*QUALITY_PASSED_R2*
    bowtie2 --threads $NUM_THREADS \
            -x 04_MAPPING/contigs \
            -1 $READS_DNA/$sample*QUALITY_PASSED_R1* -2 $READS_DNA/$sample*QUALITY_PASSED_R2* \
            -S 04_MAPPING/$sample.sam >04_MAPPING/$sample"_bowtie_report.txt" 2>&1
    samtools view -F 4 -bS 04_MAPPING/$sample.sam -@ $NUM_THREADS -o 04_MAPPING/$sample-RAW.bam
    anvi-init-bam 04_MAPPING/$sample-RAW.bam \
                  -o 04_MAPPING/$sample.bam
    rm 04_MAPPING/$sample.sam 04_MAPPING/$sample-RAW.bam
    echo "###Mapping Sample" $sample "done###"
done

Sometimes, for no obvious reason, I get the following error during the SAM to BAM conversion, and the resulting BAM file is tiny:
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 1190397
[main_samview] truncated file.

When I try again with the same FASTQ files and the same bowtie2 index, it works perfectly fine.

I was a bit excited to get my profiles, so I was planning to use the following command to check my BAM files before running anvi-profile and to redo the files with errors. But I do not really like that approach and would like to understand where the issue is coming from.

samtools quickcheck -v 04_MAPPING/*.bam > 04_MAPPING/bad_bams.fofn && echo 'all ok' || echo 'some files failed check, see bad_bams.fofn'
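
If the check does flag files, the paths written to bad_bams.fofn can be turned back into sample names for a second pass of the mapping loop. A minimal sketch, assuming the BAM files follow the 04_MAPPING/$sample.bam naming used above; the helper name bad_samples and the file redo_samples.txt are made up for illustration and are not part of samtools or anvi'o:

```shell
# bad_samples: hypothetical helper, not part of samtools or anvi'o.
# Reads BAM paths (one per line, as written by `samtools quickcheck -v`)
# from the file given as $1 and prints the matching sample names.
bad_samples() {
    # The `|| [ -n "$bam" ]` part keeps a last line without a trailing newline.
    while IFS= read -r bam || [ -n "$bam" ]; do
        [ -n "$bam" ] || continue      # skip empty lines
        basename "$bam" .bam           # 04_MAPPING/x.bam -> x
    done < "$1"
}

# Example use (file names are assumptions):
#   bad_samples 04_MAPPING/bad_bams.fofn > redo_samples.txt
#   then loop over redo_samples.txt instead of samples_all.txt
```

This only rebuilds the list of samples to redo; it does not explain why the files were bad in the first place.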

Bug:
(anvio3) const@S620100019620:/media/DataDrive05/Flo/EZ/all_combined$ sh mapping-loop.sh
###Mapping Sample mFMbES001 Start###
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 1190397
[main_samview] truncated file.
Sorted BAM File ..............................: /media/DataDrive05/Flo/EZ/all_combined/04_MAPPING/mFMbES001.bam
BAM File Index ...............................: /media/DataDrive05/Flo/EZ/all_combined/04_MAPPING/mFMbES001.bam.bai
rm: cannot remove ‘04_MAPPING/mFMbES001.sam’: No such file or directory
rm: cannot remove ‘04_MAPPING/mFMbES001-RAW.bam’: No such file or directory
###Mapping Sample mFMbES001 done###

Working (a few minutes later):
(anvio3) const@S620100019620:/media/DataDrive05/Flo/EZ/all_combined$ sh mapping-loop.sh
###Mapping Sample mFMbES001 Start###
/media/DataDrive05/Flo/EZ/all_combined/01_QC/mFMbES001-QUALITY_PASSED_R1.fastq.gz  /media/DataDrive05/Flo/EZ/all_combined/01_QC/mFMbES001-QUALITY_PASSED_R2.fastq.gz
Sorted BAM File ..............................: /media/DataDrive05/Flo/EZ/all_combined/04_MAPPING/mFMbES001.bam
BAM File Index ...............................: /media/DataDrive05/Flo/EZ/all_combined/04_MAPPING/mFMbES001.bam.bai
###Mapping Sample mFMbES001 done###

I am using anvio3, installed with (conda create -n anvio3 -c bioconda -c conda-forge python=3.5.4 gsl anvio), on Linux 14.04 LTS.

If you have already experienced this or have any idea, that would be helpful!

Cheers,

Flo

A. Murat Eren

Jan 23, 2018, 10:51:00 AM
to Anvi'o
Hi Flo,

There seems to be something wrong with one of the SAM files resulting from mapping, as samtools is complaining that the sequence and quality scores in it have different lengths, but I don't know what to advise. I would work on mFMbES001 step by step, without any loop, to make sure the output of every step makes sense.

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/82ee10fc-f47a-4f32-bacb-15f5ac7eac59%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Les Dethlefsen

Jan 23, 2018, 11:43:52 AM
to an...@googlegroups.com
Hi Flo,

I don’t know if this is helpful, but I’ve seen occasional SAM files out of dozens (all generated by a script applying the identical bowtie2 command) contain corrupted individual lines. I learn about it when the ‘samtools view’ command converting SAM => BAM reports errors. After a handful of lines (out of millions) are deleted, the SAM file converts to BAM without errors, so most of the file is fine. Examination of the individual deleted lines shows they are in fact flawed, with (apparently) random strings of characters missing, which would be consistent with your error of unequal lengths of SEQ and QUAL data.

Of course, I’m not satisfied with throwing away a few lines out of millions, even though there’s only a minuscule chance it would make any practical difference, so I confirmed that reapplying the same bowtie2 command to the same input files generates uncorrupted SAM output. So there is no weirdness in the *.fq.gz files either.

These incidents have happened when our servers are being used rather heavily, with IO collisions occurring while files from many processes are being written to disk…I’m sure that’s the cause of my issues.
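
Corrupted lines of this kind could be located before the conversion with a few lines of awk. A minimal sketch, assuming a tab-separated SAM file in which SEQ is field 10 and QUAL is field 11 (as in the SAM format); the function name find_bad_sam_lines is made up for illustration, and the check only covers missing fields and SEQ/QUAL length mismatches, not every way a line can be damaged:

```shell
# find_bad_sam_lines: hypothetical helper, not part of samtools.
# Prints the line number of every alignment record in the SAM file $1 that
# has fewer than the 11 mandatory fields, or whose SEQ (field 10) and
# QUAL (field 11) differ in length. Header lines (starting with "@") are
# skipped, and a QUAL of "*" (qualities not stored) is accepted.
find_bad_sam_lines() {
    awk -F '\t' '
        /^@/    { next }
        NF < 11 { print NR ": fewer than 11 fields"; next }
        $11 != "*" && length($10) != length($11) {
            print NR ": SEQ length " length($10) ", QUAL length " length($11)
        }
    ' "$1"
}

# Example use (path is an assumption):
#   find_bad_sam_lines 04_MAPPING/mFMbES001.sam
```

An empty output would suggest the file is at least consistent on these two points; any reported line numbers can be compared with the line number in the samtools parse error.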

Good luck and best wishes,
Les


Les Dethlefsen
Relman Lab
Stanford University
deth...@stanford.edu

Flo

Jan 24, 2018, 3:43:01 AM
to Anvi'o
Hi Meren, Hi Les,


Thanks a lot for your answers.
Yes, I am now checking all the BAM files of the samples I have processed, and I realized that I had missed some suspiciously tiny BAM files. So I need to map, profile, and merge again. :-( Is there an easy way to save what is printed to the screen during the loop (the SAM -> BAM conversion), so that it is easier to investigate than checking the size of each BAM or checking the files one by one?

I definitely agree with Les regarding the reason why this happened.
I am sharing the workstation over an SSH connection with someone who is accessing it through VNC. I am going to configure a proper queue system.

Thanks again  and best wishes,

Flo

Flo

Jan 24, 2018, 6:34:56 AM
to Anvi'o
I am adding this redirection after each step to have a log and easily check for errors in the mapping, the samtools BAM conversion, and so on:

# The braces in ${sample} keep the shell from reading $sample_samtools_report as one variable name.
samtools view -F 4 -bS 04_MAPPING/$sample.sam -@ $NUM_THREADS -o 04_MAPPING/$sample-RAW.bam \
    >> 04_MAPPING/${sample}_samtools_report.log 2>&1
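
The same idea can be wrapped in a small function, so that every step is logged and a failed step stops the processing of that sample instead of passing a truncated file on to anvi-init-bam. A rough sketch; run_step, the log file name, and failed_samples.txt are names invented here, and it assumes that each tool exits with a non-zero status when it hits an error:

```shell
# run_step: hypothetical helper, not part of samtools or anvi'o.
# Usage: run_step LOGFILE COMMAND [ARGS...]
# Appends the command line, then the command's stdout and stderr, to LOGFILE
# and returns the command's own exit status.
run_step() {
    log=$1
    shift
    printf '### %s\n' "$*" >> "$log"
    "$@" >> "$log" 2>&1
}

# Example use inside the mapping loop (names are assumptions):
#   log=04_MAPPING/${sample}_mapping.log
#   run_step "$log" samtools view -F 4 -bS "04_MAPPING/$sample.sam" \
#            -@ "$NUM_THREADS" -o "04_MAPPING/$sample-RAW.bam" \
#       || { echo "$sample" >> 04_MAPPING/failed_samples.txt; continue; }
```

With this, failed_samples.txt would list the samples to redo, and the per-sample log shows which step complained.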