Fatal error when running for-loop to align multiple reads to single indexed genome

Anthony Tercero

Dec 18, 2021, 10:28:58 PM
To: rna-star
Hello,

I am trying to use a for-loop to iterate through a directory of 500+ single-end (SE) sequencing files and create an output directory for each file. From my understanding, a for-loop is the only way to accomplish this. When I run the loop, I get the input error "number of read mates files > 2". It seems that STAR is receiving all the files in the directory at once rather than one file per iteration.

Here is my loop script:

for i in *.fa.gz; do STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $i ; done

Could you please review my script and help me with this? I suspect the issue is in how I define the variable i and how it relates to the --outFileNamePrefix parameter.
I appreciate your help.

Anthony 

Alexander Dobin

Dec 18, 2021, 10:39:39 PM
To: rna-star
Hi Anthony,

it seems like the $i variable contains more than one file.
I would recommend echoing the entire command line to check that it looks OK:

for i in *.fa.gz; do echo STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $i ; done


Cheers
Alex
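A variant of the echo check above that keeps argument boundaries visible: plain echo joins its arguments with single spaces, so it cannot show whether a path reached STAR as one word or as three. The helper name show_args is hypothetical, the loop is shortened to the two arguments that use $i, and nothing is executed.

```shell
#!/usr/bin/env bash
# Print every argument on its own line, wrapped in brackets, so the word
# boundaries produced by the shell are visible. show_args is a hypothetical name.
show_args() {
  printf '[%s]\n' "$@"
}

# The loop from the question, with STAR replaced by show_args and shortened
# to the two arguments that depend on $i.
for i in *.fa.gz; do
  show_args STAR --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i \
    --outFileNamePrefix $i
done
```

If --readFilesIn is followed by three bracketed paths instead of one, the shell split the argument before STAR ever saw it.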

Anthony Tercero

Jan 3, 2022, 6:47:36 PM
To: rna-star
Hello Alex,

Thank you for the timely response. I ran the echo command and my output is below:

STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG001.trim.fa.gz /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG002.trim.fa.gz /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG003.trim.fa.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix *.fa.gz

I agree that the $i variable contains more than one file, as it picks up all three files in my test directory. I am unsure where to go from here. I assumed that the loop would see all files in my test directory (matched by the *.fa.gz pattern), then iterate through the list and create an output for every filename. Do you have any advice for constructing my for-loop?

Thank you for your time; it is greatly appreciated.

Best,

Anthony
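One reading consistent with this echo output (an editor's sketch, not stated in the thread): the *.fa.gz pattern on the for line matched nothing in the directory where the loop was started, so bash left i set to the literal text *.fa.gz, which is exactly what --outFileNamePrefix shows. The unquoted $i inside the Star_subset path was then glob-expanded a second time, now in the directory that does hold the files, so --readFilesIn received all three. The sketch reproduces this in a throwaway directory; the function name reproduce and the scratch layout are hypothetical.

```shell
#!/usr/bin/env bash
# Reproduce the expansion from the echo output in a scratch directory.
# The body runs in a subshell ( ... ) so the cd does not leak to the caller.
reproduce() (
  demo=$(mktemp -d)
  trap 'rm -rf "$demo"' EXIT
  mkdir -p "$demo/Star_subset"
  for n in 1 2 3; do
    touch "$demo/Star_subset/ATMCG00$n.trim.fa.gz"
  done
  cd "$demo" || exit 1            # this directory itself holds no *.fa.gz files

  for i in *.fa.gz; do
    # No match here, so the shell keeps i as the literal text '*.fa.gz'
    echo "i=$i"
    # Unquoted $i is glob-expanded again inside the path and matches all 3 files;
    # the bare $i after --outFileNamePrefix still matches nothing and stays literal
    echo STAR --readFilesIn Star_subset/$i --outFileNamePrefix $i
  done
)

reproduce
```

Quoting the variable ("$i") or globbing directly in the data directory stops the second expansion.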

Alexander Dobin

Jan 13, 2022, 1:06:58 PM
To: rna-star
Hi Anthony,

if you need to map 3 separate experiments, you need to run STAR 3 times. In each run, specify a single FASTQ file (or 2 FASTQ files if paired-end) and a single --outFileNamePrefix.

Cheers
Alex
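A minimal sketch of the one-run-per-file approach, assuming bash and the STAR options from the original loop. The pattern is anchored to the data directory, the file variable is quoted so each run gets exactly one read file, and each sample gets its own --outFileNamePrefix directory. The function name align_each, the output directory star_out, and the rule that the sample ID is the file name up to the first dot (ATMCG001.trim.fa.gz becomes ATMCG001) are hypothetical choices. The leading echo keeps it a dry run, in line with the earlier advice in this thread; remove it to actually call STAR.

```shell
#!/usr/bin/env bash
# One STAR run per single-end file, each writing into its own directory.
# Hypothetical: function name, output layout, sample ID = name before first dot.
align_each() {
  local data_dir=$1 genome_dir=$2 out_dir=$3
  local reads sample
  # Glob inside the data directory, and quote "$reads" so each iteration
  # passes exactly one file to --readFilesIn.
  for reads in "$data_dir"/*.fa.gz; do
    [ -e "$reads" ] || continue     # pattern matched nothing: skip
    sample=${reads##*/}             # drop the directory part
    sample=${sample%%.*}            # drop everything from the first dot
    mkdir -p "$out_dir/$sample"
    # Dry run: remove "echo" to execute STAR.
    echo STAR --genomeDir "$genome_dir" --runMode alignReads --runThreadN 10 \
      --readFilesIn "$reads" --readFilesCommand zcat \
      --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \
      --outFileNamePrefix "$out_dir/$sample/"
  done
}

# Usage with the paths from the thread (star_out is a hypothetical name):
# align_each ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset Star_index/Mcal_index star_out
```

Ending the prefix with / means STAR writes that sample's output files inside its directory, because the prefix is prepended to each output file name.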
