Help needed! Bugs and fixes for metatrans pipeline (quality control and mapping)

92 views
Skip to first unread message

Makis Ladoukakis

unread,
Feb 20, 2018, 6:31:01 AM2/20/18
to metatra...@googlegroups.com

Hello,


I am trying to use metatrans in order to analyze 6 Samples of paired-end RNA-seq data (2 controls, 4 cases)Ā but I have encountered a number of issues for which I would like some help if possible.


Firstly these are some fixes i made by myself because of errors i encountered:


1) There was an error (not enough arguments)Ā that originated from run_qc.sh line 255:

rename 's/krakened/krakened_tally/' $DTEMP/kraken_results/*_record.txt >> $foutput 2>&1


I fixed that by commenting that line as I am under the impression that the renamed *_record.txt file is not needed anywhere further in the pipeline (correct me if i'm wrong)


2) Next issue I found was in lines 260 -263 of run_qc.sh. The mv commands seem to have the wrong filename and produce an error. I fixed that by changing theĀ 


mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_reaper_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1


to


mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1


(and did similar changes for the other 3 mv commands)


3) The same errors were encountered by lines 279-280. I fixed those by changingĀ 


mv $DOUTPUT/krakened_tally_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1


to


mv $DOUTPUT/krakened_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1


4)Next in line was the run_map.sh script. I got some errors originating from lines 251-253. I fixed those by changing the names of fin1-fin3.

eg. From


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fasta


to


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.faa

and so on.


5) In the same script the rename commandĀ in lineĀ 290 gave me an error of 'not enough arguments' so i fixed that by changing it fromĀ 


rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1


to


rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1




6)I encountered the same problem with issue 4) for amino acid filenames in lines 438-440 and fixed it by changing the names appropriately.


E.g.

from


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.aa.fasta


to


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.ffn


7)Same issue as 5) for the rename command in line 478. Fixed it by changing it fromĀ 


rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1

to

rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1

8)I also encountered some more errors from the reading of metadata.txt file. Apparently there has to be a F0.* column and at least to Fx.* columns after that.Ā  I worked around it by repeating the Fx.* columns with the same arguments ( I attach the metadata.txt file).


So far those changes worked and I got the pipeline to work properly up to differential analysis. But then I get some more errors. Apparently the DESeq2.R script fails after line 153 inĀ 
ddsMF <- DESeqDataSetFromMatrix(countData = CountTable,
Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  colData = CountData,
Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  design =Ā  as.formula(paste("~",paste(listoffactors,collapse="+"))))

The error i get isĀ 

Error: ncol(countData) == nrow(colData) is not TRUE

After searching a little bit i found out that theĀ CountTable variable is an empty data frame (0 columns and 0 rows) and I noticed that the files it's supposed to take the data from:

*_m3.mh2014.igc.cdhit95ov90.aln.orthids.abundance.tsv
*_m3.mh2014.igc.cdhit95ov90.aln.funcat.abundance.tsv
Ā 
are empty. And so are theĀ 

*m3.aln.merged.mh2014.igc.soap
*m3.aln.part2.mh2014.igc.soap
*m3.aln.part1.mh2014.igc.soap

for every sample i have. I can't think of anything else to do. Can you please help me out?

Kind regards,
EfthymiosĀ Ladoukakis

P.S.Ā Sorry for the long post



metadata.txt

makist...@gmail.com

unread,
Feb 20, 2018, 7:48:37 AM2/20/18
to Metatrans Forum
By the way I realize now that the source of this problem is the soap step. Apparently for each sample 0% of the reads are aligned. Any suggestions? Is it a database issue?

metatr...@gmail.com

unread,
Feb 22, 2018, 5:00:57 AM2/22/18
to Metatrans Forum

Hi Efthymios!



On Tuesday, February 20, 2018 at 12:31:01 PM UTC+1, Makis Ladoukakis wrote:

Hello,


I am trying to use metatrans in order to analyze 6 Samples of paired-end RNA-seq data (2 controls, 4 cases)Ā but I have encountered a number of issues for which I would like some help if possible.


Firstly these are some fixes i made by myself because of errors i encountered:


1) There was an error (not enough arguments)Ā that originated from run_qc.sh line 255:

rename 's/krakened/krakened_tally/' $DTEMP/kraken_results/*_record.txt >> $foutput 2>&1


I fixed that by commenting that line as I am under the impression that the renamed *_record.txt file is not needed anywhere further in the pipeline (correct me if i'm wrong)

Ā 

That instruction does a job whenĀ  using >1thread.


2) Next issue I found was in lines 260 -263 of run_qc.sh. The mv commands seem to have the wrong filename and produce an error. I fixed that by changing theĀ 


mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_reaper_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1


to


mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1


(and did similar changes for the other 3 mv commands)

Ā 

We have used this pipeline many times and we never found this to be a bug.

Ā 

3) The same errors were encountered by lines 279-280. I fixed those by changingĀ 


mv $DOUTPUT/krakened_tally_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1


to


mv $DOUTPUT/krakened_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1

Ā 

We have used this pipeline many times and we never found this to be a bug.

Ā 

4)Next in line was the run_map.sh script. I got some errors originating from lines 251-253. I fixed those by changing the names of fin1-fin3.

eg. From


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fasta


to


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.faa

and so on.
Ā 
We have used this pipeline many times and we never found this to be a bug.
Ā 


5) In the same script the rename commandĀ in lineĀ 290 gave me an error of 'not enough arguments' so i fixed that by changing it fromĀ 


rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1


to


rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1


We can?t understand how you fixed rename if you are not using conventional perl regexpr.


6)I encountered the same problem with issue 4) for amino acid filenames in lines 438-440 and fixed it by changing the names appropriately.


E.g.

from


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.aa.fasta


to


fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.ffn

Ā 

We have used this pipeline many times and we never found this to be a bug.
Since we do a ?rename 's/\.fgs.faa/\.aa\.fasta/'? in function ?FragGeneScan () ?, at this point we are wondering if «rename» command is working correctly. In our testing environment it is provided by Ubuntu.

Ā 

7)Same issue as 5) for the rename command in line 478. Fixed it by changing it fromĀ 


rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1

to

rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
Ā 
At this point, we think that everything is related to ?rename?.
Ā 

8)I also encountered some more errors from the reading of metadata.txt file. Apparently there has to be a F0.* column and at least to Fx.* columns after that.Ā  I worked around it by repeating the Fx.* columns with the same arguments ( I attach the metadata.txt file).
Ā 
Factor columns should never be the same, nor linear combinations. Factors starting with ?F0-whatever? are not parsed by the DESeq2 module.

So far those changes worked and I got the pipeline to work properly up to differential analysis. But then I get some more errors. Apparently the DESeq2.R script fails after line 153 inĀ 
ddsMF <- DESeqDataSetFromMatrix(countData = CountTable,
Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  colData = CountData,
Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  design =Ā  as.formula(paste("~",paste(listoffactors,collapse="+"))))

The error i get isĀ 

Error: ncol(countData) == nrow(colData) is not TRUE

After searching a little bit i found out that theĀ CountTable variable is an empty data frame (0 columns and 0 rows) and I noticed that the files it's supposed to take the data from:

*_m3.mh2014.igc.cdhit95ov90.aln.orthids.abundance.tsv
*_m3.mh2014.igc.cdhit95ov90.aln.funcat.abundance.tsv
Ā 
are empty. And so are theĀ 

*m3.aln.merged.mh2014.igc.soap
*m3.aln.part2.mh2014.igc.soap
*m3.aln.part1.mh2014.igc.soap

for every sample i have. I can't think of anything else to do. Can you please help me out?

Kind regards,
EfthymiosĀ Ladoukakis

P.S.Ā Sorry for the long post
Ā 

Ā 
Conclusion
=========

We don't think your issues have to do with SOAP.

Since everything seems related to "rename". Please check which linux distribution you have.

If it is a Debian based distro (like Ubuntu) "rename" in Perl regexp flavor should be provided and working as expected in MetaTrans. As mentioned here:
"https://unix.stackexchange.com/questions/175135/how-to-rename-multiple-files-by-replacing-string-in-file-name-this-string-conta"
"... Debian-based distributions include a rename utility with their Perl package while Red Hat-based distributions use the rename utility from the util-linux from the Linux Kernel Organization. ..."

If you are using a "RedHat/CentOS" distro, then we strongly recommend you to read the forum thread "Is the Metatrans pipeline works now in the RedHat/CentOS ?" were you fill find a patch for this distro.

If none of these cases apply to you, since "rename" is not a real POSIX command (which is a bug to take into account into future releases if such) , then you could try with an implementation of a "rename" function more POSIX complain. You can change this code:
----------------------------------------------------------------------------------------------
find . -type f
-name 'Lucky-*' | while read FILE ; do
Ā  Ā  newfile
="$(echo ${FILE} |sed -e 's/\\#U00a9/safe/')" ;
Ā  Ā  mv
"${FILE}" "${newfile}" ;
done

rename
's/\.fgs.ffn/\.fasta/' $DOUTPUT/fraggenescan/*.ffn 1>$foutput 2>&1


----------------------------------------------------------------------------------------------
to:
----------------------------------------------------------------------------------------------
function rename () {
i
=0 ;
for FILE in "$@"
Ā  Ā 
do
Ā  Ā  Ā  Ā 
if [ $i -eq 0 ]; then
Ā  Ā  Ā  Ā  Ā  Ā  pattern
="$FILE" ;
Ā  Ā  Ā  Ā 
else
Ā  Ā  Ā  Ā  Ā  Ā  newfile
="$(echo ${FILE} |sed -e $pattern)" ;
Ā  Ā  Ā  Ā  Ā  Ā  mv
"${FILE}" "${newfile}" ;
Ā  Ā  Ā  Ā 
fi
Ā  Ā  Ā  Ā  i
=$(($i+1))
Ā  Ā 
done
}


----------------------------------------------------------------------------------------------
and test with this example:
mkdir fraggenescan
Ā 
touch fraggenescan/aaa.fgs.ffn fraggenescan/bbb.fgs.ffn fraggenescan/ccc.fgs.ffn
Ā 
rename 's/\.fgs.ffn/\.fasta/' fraggenescan/*.ffn
rename 's/\.fasta/\.fgs.ffn/' fraggenescan/*.fasta


Greetings

Message has been deleted

makist...@gmail.com

unread,
Feb 23, 2018, 8:41:09 AM2/23/18
to Metatrans Forum
Thank you for your help, yes I am using a CentOS distribution so I'll make sure to check your patch for that.

Cheers,
Efthymios


Τη Τρίτη, 20 Φεβρουαρίου 2018 - 1:31:01 μ.μ. UTC+2, Īæ Ļ‡ĻĪ®ĻƒĻ„Ī·Ļ‚ Makis Ladoukakis έγραψε:
Reply all
Reply to author
Forward
0 new messages