Hello,
I am trying to use metatrans in order to analyze 6 Samples of paired-end RNA-seq data (2 controls, 4 cases)Ā but I have encountered a number of issues for which I would like some help if possible.
Firstly these are some fixes i made by myself because of errors i encountered:
1) There was an error (not enough arguments)Ā that originated from run_qc.sh line 255:
rename 's/krakened/krakened_tally/' $DTEMP/kraken_results/*_record.txt >> $foutput 2>&1
I fixed that by commenting that line as I am under the impression that the renamed *_record.txt file is not needed anywhere further in the pipeline (correct me if i'm wrong)
2) Next issue I found was in lines 260 -263 of run_qc.sh. The mv commands seem to have the wrong filename and produce an error. I fixed that by changing theĀ
mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_reaper_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1
to
mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1
(and did similar changes for the other 3 mv commands)
3) The same errors were encountered by lines 279-280. I fixed those by changingĀ
mv $DOUTPUT/krakened_tally_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1
to
mv $DOUTPUT/krakened_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1
4)Next in line was the run_map.sh script. I got some errors originating from lines 251-253. I fixed those by changing the names of fin1-fin3.
eg. From
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fasta
to
| fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.faa |
5) In the same script the rename commandĀ in lineĀ 290 gave me an error of 'not enough arguments' so i fixed that by changing it fromĀ
rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
to
rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
6)I encountered the same problem with issue 4) for amino acid filenames in lines 438-440 and fixed it by changing the names appropriately.
E.g.
from
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.aa.fasta
to
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.ffn
7)Same issue as 5) for the rename command in line 478. Fixed it by changing it fromĀ
Hello,
I am trying to use metatrans in order to analyze 6 Samples of paired-end RNA-seq data (2 controls, 4 cases)Ā but I have encountered a number of issues for which I would like some help if possible.
Firstly these are some fixes i made by myself because of errors i encountered:
1) There was an error (not enough arguments)Ā that originated from run_qc.sh line 255:
rename 's/krakened/krakened_tally/' $DTEMP/kraken_results/*_record.txt >> $foutput 2>&1
I fixed that by commenting that line as I am under the impression that the renamed *_record.txt file is not needed anywhere further in the pipeline (correct me if i'm wrong)
That instruction does a job whenĀ using >1thread.
2) Next issue I found was in lines 260 -263 of run_qc.sh. The mv commands seem to have the wrong filename and produce an error. I fixed that by changing theĀ
mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_reaper_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1
to
mv $DTEMP/kraken_results/krakened_${SAMPLE}_1/REAPER/krakened_${SAMPLE}_1.lane.clean.gz $DOUTPUT/krakened_reaper_${SAMPLE}_1.lane.clean.fastq.gz >> $foutput 2>&1
(and did similar changes for the other 3 mv commands)
We have used this pipeline many times and we never found this to be a bug.
3) The same errors were encountered by lines 279-280. I fixed those by changingĀ
mv $DOUTPUT/krakened_tally_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1
to
mv $DOUTPUT/krakened_${SAMPLE}_1.lane.tallied.fastq.gz $DOUTPUT/${SAMPLE}_1_m${module}.fastq.gz >> $foutput 2>&1
We have used this pipeline many times and we never found this to be a bug.
4)Next in line was the run_map.sh script. I got some errors originating from lines 251-253. I fixed those by changing the names of fin1-fin3.
eg. From
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fasta
to
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.faa
and so on.
We have used this pipeline many times and we never found this to be a bug.
5) In the same script the rename commandĀ in lineĀ 290 gave me an error of 'not enough arguments' so i fixed that by changing it fromĀ
rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
to
rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit_dna/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
We can?t understand how you fixed rename if you are not using conventional perl regexpr.
6)I encountered the same problem with issue 4) for amino acid filenames in lines 438-440 and fixed it by changing the names appropriately.
E.g.
from
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.aa.fasta
to
fin1=$DOUTPUT/fraggenescan/${SAMPLE}_j_m${module}.fgs.ffn
We have used this pipeline many times and we never found this to be a bug.
Since we do a ?rename 's/\.fgs.faa/\.aa\.fasta/'? in function ?FragGeneScan () ?, at this point we are wondering if «rename» command is working correctly. In our testing environment it is provided by Ubuntu.
7)Same issue as 5) for the rename command in line 478. Fixed it by changing it fromĀ
rename 's/\.cdhit95ov90/\.cdhit95ov90\.fasta/' $DOUTPUT/cdhit/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
to
rename .cdhit95ov90 .cdhit95ov90.fasta $DOUTPUT/cdhit/$(basename $fin .fasta).cdhit95ov90 1>>$foutput 2>&1
At this point, we think that everything is related to ?rename?.
8)I also encountered some more errors from the reading of metadata.txt file. Apparently there has to be a F0.* column and at least to Fx.* columns after that.Ā I worked around it by repeating the Fx.* columns with the same arguments ( I attach the metadata.txt file).
Factor columns should never be the same, nor linear combinations. Factors starting with ?F0-whatever? are not parsed by the DESeq2 module.
So far those changes worked and I got the pipeline to work properly up to differential analysis. But then I get some more errors. Apparently the DESeq2.R script fails after line 153 inĀddsMF <- DESeqDataSetFromMatrix(countData = CountTable,Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā colData = CountData,Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā design =Ā as.formula(paste("~",paste(listoffactors,collapse="+"))))
The error i get isĀError: ncol(countData) == nrow(colData) is not TRUE
After searching a little bit i found out that theĀ CountTable variable is an empty data frame (0 columns and 0 rows) and I noticed that the files it's supposed to take the data from:
*_m3.mh2014.igc.cdhit95ov90.aln.orthids.abundance.tsv
*_m3.mh2014.igc.cdhit95ov90.aln.funcat.abundance.tsv
Āare empty. And so are theĀ
*m3.aln.merged.mh2014.igc.soap
*m3.aln.part2.mh2014.igc.soap
*m3.aln.part1.mh2014.igc.soap
for every sample i have. I can't think of anything else to do. Can you please help me out?
Kind regards,
EfthymiosĀ Ladoukakis
P.S.Ā Sorry for the long post
ĀConclusion
=========
We don't think your issues have to do with SOAP.
Since everything seems related to "rename". Please check which linux distribution you have.
If it is a Debian based distro (like Ubuntu) "rename" in Perl regexp flavor should be provided and working as expected in MetaTrans. As mentioned here:
"https://unix.stackexchange.com/questions/175135/how-to-rename-multiple-files-by-replacing-string-in-file-name-this-string-conta"
"... Debian-based distributions include a rename utility with their Perl package while Red Hat-based distributions use the rename utility from the util-linux from the Linux Kernel Organization. ..."
If you are using a "RedHat/CentOS" distro, then we strongly recommend you to read the forum thread "Is the Metatrans pipeline works now in the RedHat/CentOS ?" were you fill find a patch for this distro.
If none of these cases apply to you, since "rename" is not a real POSIX command (which is a bug to take into account into future releases if such) , then you could try with an implementation of a "rename" function more POSIX complain. You can change this code:
----------------------------------------------------------------------------------------------find . -type f
-name 'Lucky-*' | while read FILE ; do
Ā Ā newfile="$(echo ${FILE} |sed -e 's/\\#U00a9/safe/')" ;
Ā Ā mv "${FILE}" "${newfile}" ;
done
rename 's/\.fgs.ffn/\.fasta/' $DOUTPUT/fraggenescan/*.ffn 1>$foutput 2>&1
----------------------------------------------------------------------------------------------
to:
----------------------------------------------------------------------------------------------function rename () {
i=0 ;
for FILE in "$@"
Ā Ā do
Ā Ā Ā Ā if [ $i -eq 0 ]; then
Ā Ā Ā Ā Ā Ā pattern="$FILE" ;
Ā Ā Ā Ā else
Ā Ā Ā Ā Ā Ā newfile="$(echo ${FILE} |sed -e $pattern)" ;
Ā Ā Ā Ā Ā Ā mv "${FILE}" "${newfile}" ;
Ā Ā Ā Ā fi
Ā Ā Ā Ā i=$(($i+1))
Ā Ā done
}
----------------------------------------------------------------------------------------------
and test with this example:
mkdir fraggenescan
touch fraggenescan/aaa.fgs.ffn fraggenescan/bbb.fgs.ffn fraggenescan/ccc.fgs.ffn
rename 's/\.fgs.ffn/\.fasta/' fraggenescan/*.ffn
rename 's/\.fasta/\.fgs.ffn/' fraggenescan/*.fasta
Greetings