Module 3: SOAP mapping step: no alignment (0%)

contact.pa...@gmail.com

May 2, 2019, 1:07:00 PM

to Metatrans Forum

Hi,

I ran the MetaTrans pipeline successfully on the test data downloaded from the website (May 2019), after updating the DESeq2 R script as described in this forum post: https://groups.google.com/forum/#!searchin/metatrans-forum/deseq2%7Csort:date/metatrans-forum/rBNqqIhWtWY/hWS_BHCEAQAJ

The pipeline worked on the test dataset all the way to doing the DE analysis and making the figures.

I'm now trying to run it on my own dataset, but something seems to go wrong at Step 3 (Mapping). For each processed sample, in the /m3-output/soap2/ folder, the part1.mh2014.igc.soap and part2.mh2014.igc.soap files are empty (0 B), while the .mh2014.igc.soap.fasta file is not (666.3 KB). As a result, the output files of the next step (/m3-output/mh14/abundance) are also empty (.tsv files).

Here are my specs:

NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Output log: output_soap2_part1.log

Begin Program SOAPaligner/soap2
Thu May 2 09:51:39 2019
Reference: /MetaTrans/0-Databases/indexed_soap_db/mh_igc_2014_dna.part1.fasta.index
Query File a: /MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/cdhit_dna/control_sample01_m3.all.distinguishpair.cdhit95ov90.withcs.fasta
Output File: /MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/soap2/control_sample01_m3.aln.part1.mh2014.igc.soap
/MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/soap2/control_sample01_m3.un.part1.mh2014.igc.soap.fasta
Load Index Table ...
Load Index Table OK
Begin Alignment ...
5092 ok 0.49 sec
Total Reads: 5092
Alignment: 0 ( 0.00%)
Total Elapsed Time: 11.66
- Load Index Table: 11.14
- Alignment: 0.52
SOAPaligner/soap2 End
Thu May 2 09:51:51 2019

Output log: output_soap2_part2.log

Begin Program SOAPaligner/soap2
Thu May 2 11:44:23 2019
Reference: /MetaTrans/0-Databases/indexed_soap_db/mh_igc_2014_dna.part2.fasta.index
Query File a: /MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/cdhit_dna/control_sample01_m3.all.distinguishpair.cdhit95ov90.withcs.fasta
Output File: /MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/soap2/control_sample01_m3.aln.part2.mh2014.igc.soap
/MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/soap2/control_sample01_m3.un.part2.mh2014.igc.soap.fasta
Load Index Table ...
Load Index Table OK
Begin Alignment ...
5092 ok 0.46 sec
Total Reads: 5092
Alignment: 0 ( 0.00%)
Total Elapsed Time: 11.18
- Load Index Table: 10.68
- Alignment: 0.49
SOAPaligner/soap2 End
Thu May 2 11:44:34 2019
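
For anyone comparing many samples, the two summary lines in these logs ("Total Reads" and "Alignment") can be pulled out with a short script. This is only a sketch based on the log format pasted above; the function name `parse_soap2_log` is made up for illustration and is not part of MetaTrans or SOAPaligner.

```python
import re

def parse_soap2_log(text):
    """Return (total_reads, aligned_reads) from a soap2 log; None for a missing field."""
    total = re.search(r"Total Reads:\s*(\d+)", text)
    # Require the "(" so the timing line "- Alignment: 0.49" is not matched.
    aligned = re.search(r"Alignment:\s*(\d+)\s*\(", text)
    return (int(total.group(1)) if total else None,
            int(aligned.group(1)) if aligned else None)

# Excerpt copied from the part2 log above.
example_log = """Total Reads: 5092
Alignment: 0 ( 0.00%)
Total Elapsed Time: 11.18
- Load Index Table: 10.68
- Alignment: 0.49
"""

total, aligned = parse_soap2_log(example_log)
print(total, aligned)  # 5092 0
```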

This did not happen with the sample test data from the MetaTrans website, and the reference databases are the same as when I ran the test data. I checked the query file (/MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/cdhit_dna/control_sample01_m3.all.distinguishpair.cdhit95ov90.withcs.fasta): it is 666.3 KB and looks like a normal FASTA file.
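
A quick way to go beyond "looks like a normal FASTA file" is to count the records and compare the number with the "Total Reads" value in the SOAP log (5092 here). The sketch below assumes a plain-text FASTA file; `fasta_summary` is a hypothetical helper, not a MetaTrans tool, and it will not by itself explain why nothing aligns.

```python
def fasta_summary(path):
    """Count records and report the shortest and longest sequence in a FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return {
        "records": len(lengths),
        "min_len": min(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
    }

# Usage (path taken from the log above):
# print(fasta_summary("/MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/"
#                     "m3-output/cdhit_dna/"
#                     "control_sample01_m3.all.distinguishpair.cdhit95ov90.withcs.fasta"))
```

If the record count matches the log but the reads are very short or otherwise unusual, that would be worth mentioning in the thread.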

Do you know what could cause this lack of alignments? Is it the reference database that I'm using? Do you have suggestions on how to proceed?

Thank you very much.

metatr...@gmail.com

May 6, 2019, 4:53:24 AM

to Metatrans Forum

Hi Patricia,

As you say, the input file for the mapping step is this one (the clustered file):

3-MAP/m3-output/cdhit_dna/control_sample01_m3.all.distinguishpair.cdhit95ov90.withcs.fasta

It is used by SOAPaligner to map against the MetaHIT-2014 human gut gene database, producing these three relevant files:

3-MAP/m3-output/soap2/
SAMPLENAME.aln.merged.mh2014.igc.soap #Sequences mapped to the database (the database is too big for SOAP, so it had to be split; this file merges the sequences aligned to both parts of the db)
SAMPLENAME.un.part1.mh2014.igc.soap.fasta #The unaligned files are not merged. This file contains the reads that did not align to part1 of the db
SAMPLENAME.un.part2.mh2014.igc.soap.fasta #This file contains the reads that did not align to part2 of the db

The aligned reads are next post-processed in MetaTrans to obtain abundance files.

The unaligned reads in FASTA format can then be processed or aligned against other databases to try to reduce the number of unknown reads.
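
To see at a glance which of these files are missing or empty for a sample, something like the following could be run on the soap2 folder. It is a sketch only: the file suffixes are taken from this thread (the per-part ".aln." names come from the log in the first post), the prefix argument is assumed to be the part of the file name before ".aln." or ".un." (for example "control_sample01_m3"), and `empty_soap_outputs` is a hypothetical name.

```python
from pathlib import Path

def empty_soap_outputs(soap_dir, prefix):
    """Return the expected soap2 output files for `prefix` that are missing or 0 bytes."""
    # Suffixes as they appear in this thread; adjust if your run names files differently.
    suffixes = [
        ".aln.part1.mh2014.igc.soap",
        ".aln.part2.mh2014.igc.soap",
        ".aln.merged.mh2014.igc.soap",
        ".un.part1.mh2014.igc.soap.fasta",
        ".un.part2.mh2014.igc.soap.fasta",
    ]
    problems = []
    for suffix in suffixes:
        path = Path(soap_dir) / f"{prefix}{suffix}"
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(path.name)
    return problems

# Usage (folder taken from the log in the first post):
# print(empty_soap_outputs(
#     "/MetaTrans/1-PROCESSED_SAMPLES/control_sample01/3-MAP/m3-output/soap2",
#     "control_sample01_m3"))
```

An empty ".aln." file next to non-empty ".un." files matches the situation described in the question: SOAP ran, but no reads aligned.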

Thus, if you were able to run the MetaTrans test from the website successfully, meaning that you got the ".aln." file and the ".un.part1"/".un.part2" files, then SOAPaligner is working. Therefore, if you get nothing in the ".aln." file and see no errors during the MetaTrans run, your sequences are not aligning to the MetaHIT14 database. If that is the case, please check whether the MetaHIT14 database suits your needs, given the samples you sequenced.

However, as has been mentioned in previous posts, I see you are using Ubuntu 16, and MetaTrans was tested on Ubuntu 14.04. Unfortunately, we do not currently support other releases or setups; sorry for the inconvenience.