Host sequences filtering

Lusine Khachatryan

Oct 22, 2021, 5:06:01 AM
to SAMSA bioinformatics group
Hello!

I wonder whether this pipeline somehow removes possible host contamination?

Regards,
Lusine Khachatryan

Sam Westreich

Oct 22, 2021, 3:19:30 PM
to Lusine Khachatryan, SAMSA bioinformatics group
Hi Lusine,

No, there is no built-in step for host contamination removal, but adding one is fairly straightforward. If you know your host genome, you could run an aligner like BWA-MEM, ideally after step 3 (ribodepletion) and before step 4 (annotation against the DIAMOND database). You could add it to the master script at line 222: https://github.com/transcript/samsa2/blob/master/bash_scripts/master_script.sh

You'd get your host reference genome, index it once with bwa index, and then use the bwa mem command to align your FASTQ reads against that reference.
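As a rough sketch of that alignment step, assuming bwa is installed and on your PATH and that you have a single FASTQ of reads per sample: the function name align_to_host, the file names in the usage line, and the default thread count are placeholders for illustration, not part of SAMSA2.

```shell
# Hypothetical helper: align reads against a host reference with BWA-MEM.
# Usage: align_to_host host.fa reads.fastq host_aligned.sam [threads]
align_to_host() {
    ref="$1"            # host reference genome (FASTA)
    reads="$2"          # reads to screen (FASTQ)
    out="$3"            # alignment output (SAM)
    threads="${4:-4}"   # thread count; 4 is an arbitrary default

    # bwa mem needs an index; build it only if the .bwt file is missing.
    if [ ! -f "$ref.bwt" ]; then
        bwa index "$ref" || return 1
    fi

    # bwa mem writes SAM to stdout, so redirect it to the output file.
    bwa mem -t "$threads" "$ref" "$reads" > "$out"
}
```

For example, `align_to_host host.fa sample_ribodepleted.fastq sample_host.sam 8` would leave the alignments in sample_host.sam, which samtools view can read directly.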

BWA-MEM writes its alignments in SAM format (which you can convert to BAM with SAMtools if you like); either way, the file contains both mapped and unmapped reads. You could then use SAMtools to extract the unmapped reads, with a command like:

samtools view -f 4 file.bam > unmapped.sam

You'd then convert this SAM file back into FASTQ (this can be done with standard Unix command-line tools, or with the samtools fastq subcommand), and then continue with the rest of the SAMSA2 pipeline as normal.
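One way to do that conversion with plain Unix tools is a short awk one-liner, sketched below. It assumes the reads are unmapped (so the sequences are still in their original orientation) and that the SAM file keeps SEQ and QUAL in columns 10 and 11; the function name sam_to_fastq is just a placeholder.

```shell
# Hypothetical helper: turn SAM records into 4-line FASTQ records.
# Usage: sam_to_fastq unmapped.sam > unmapped.fastq
sam_to_fastq() {
    # Skip header lines (they start with "@"), then print
    # read name (column 1), sequence (column 10), and qualities (column 11).
    awk -F '\t' '!/^@/ { printf "@%s\n%s\n+\n%s\n", $1, $10, $11 }' "$1"
}
```

The resulting unmapped.fastq should contain only the reads that did not align to the host, and can be passed on to the annotation step.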

Best,
Sam

--
Sam Westreich, PMP, PhD
Microbiome Scientist, DNAnexus

Lusine Khachatryan

Oct 25, 2021, 4:35:04 AM
to SAMSA bioinformatics group
Dear Sam,

Thank you! Yes, what you've described (mapping to the human genome after ribodepletion) is basically what I've done. However, I did not find any host contamination, so I thought there might be a "hidden" filtering step somewhere in your pipeline. Thank you once again for the explanation.

Regards,
Lusine
