Fatal error when running a for-loop to align multiple read files to a single indexed genome

Anthony Tercero

Dec 18, 2021, 10:28:58 PM

To: rna-star

Hello,

I am trying to use a for-loop to iterate through a directory of 500+ single-end (SE) sequencing files and create a separate output directory for each file. From my understanding, a for-loop is the only way to accomplish this. When I run the loop, I receive the input error "number of read mates files > 2". It seems that STAR is grabbing all the files in the directory at once rather than taking a single file on each iteration.

Here is my loop script:

for i in *.fa.gz; do STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $i ; done

Could you please review my script and help me with this? I suspect the issue is in how the variable i is defined and how it relates to the --outFileNamePrefix parameter.

I appreciate your help.

Anthony

Alexander Dobin

Dec 18, 2021, 10:39:39 PM

To: rna-star

Hi Anthony,

it seems like the $i variable contains more than one file.

I would recommend echoing the entire command line to check that it looks OK:

for i in *.fa.gz; do echo STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $i ; done

Cheers

Alex

Anthony Tercero

Jan 3, 2022, 6:47:36 PM

To: rna-star

Hello Alex,

Thank you for the timely response. I ran the echo command and my output is below:

STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG001.trim.fa.gz /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG002.trim.fa.gz /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG003.trim.fa.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix *.fa.gz

I agree that the $i variable contains more than one file: it is grabbing all three files in my test directory. I am unsure how to proceed from here. I assumed the loop would see all files in my test directory (matched by *.fa.gz), then iterate through the list and create an output for every file name. Do you have any advice for constructing my for-loop?

Thank you for your time; it is greatly appreciated.

Best,

Anthony
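One likely explanation, consistent with the literal `*.fa.gz` after `--outFileNamePrefix` in the echo output above: the loop's glob is evaluated in the current directory, which contains no `*.fa.gz` files. With default settings, bash then leaves the pattern unexpanded, so the loop body runs once with `$i` holding the literal text `*.fa.gz`. Because `$i` is unquoted inside the `--readFilesIn` path, it is globbed a second time, now inside `Star_subset`, and expands to every file there. The sketch below reproduces this with throwaway temporary directories; `run_dir` and `Star_subset` are hypothetical stand-ins for the real paths.

```bash
#!/usr/bin/env bash
# Reproduces the expansion seen in the echo output with throwaway directories.
# run_dir and Star_subset are hypothetical stand-ins for the real paths.
set -eu

work=$(mktemp -d)
mkdir "$work/run_dir" "$work/Star_subset"
touch "$work/Star_subset/ATMCG001.trim.fa.gz" \
      "$work/Star_subset/ATMCG002.trim.fa.gz" \
      "$work/Star_subset/ATMCG003.trim.fa.gz"

cd "$work/run_dir"   # the loop runs here, where no *.fa.gz files exist
for i in *.fa.gz; do
  # Nothing matched in the current directory, so i is the literal text '*.fa.gz'
  # and the loop body runs exactly once.
  # The unquoted $i after Star_subset/ is globbed again, now inside Star_subset,
  # and expands to all three files; the final $i still matches nothing here.
  echo STAR --readFilesIn "$work"/Star_subset/$i --outFileNamePrefix $i
done
```

The printed line has the same shape as the echo output in the thread: three input files after `--readFilesIn` and a literal `*.fa.gz` as the prefix.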

Alexander Dobin

Jan 13, 2022, 1:06:58 PM

To: rna-star

Hi Anthony,

if you need to map 3 separate experiments, you need to run STAR 3 times. In each run, specify a single FASTQ file (or 2 FASTQ files if paired-end) and a single --outFileNamePrefix.

Cheers

Alex
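Following that advice, one way to get exactly one STAR run per file is to glob over the full path of the reads directory, so each iteration receives a single file, and to derive a separate output prefix from each file name. This is a minimal sketch, assuming bash, the paths from the thread, and one output directory per file named after it (e.g. `ATMCG001.trim/`). `align_dir`, `run`, and `DRY_RUN` are hypothetical names introduced here; `DRY_RUN=1` reuses the echo check suggested earlier in the thread. The STAR options are copied unchanged from the original command.

```bash
#!/usr/bin/env bash
# Sketch: one STAR run per single-end read file, each with its own output directory.
# Assumptions: paths and STAR options copied from the thread; run, align_dir and
# DRY_RUN are hypothetical names introduced for this example.
set -eu

GENOME_DIR="Star_index/Mcal_index"   # relative to where the script is run

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Align every *.fa.gz file in directory $1 separately.
align_dir() {
  local reads sample
  for reads in "$1"/*.fa.gz; do           # the glob includes the directory path
    [ -e "$reads" ] || continue           # no matches: skip the literal pattern
    sample=$(basename "$reads" .fa.gz)    # ATMCG001.trim.fa.gz -> ATMCG001.trim
    run mkdir -p "$sample"                # one output directory per file
    run STAR --genomeDir "$GENOME_DIR" \
      --runMode alignReads \
      --runThreadN 10 \
      --readFilesIn "$reads" \
      --readFilesCommand zcat \
      --outSAMtype BAM SortedByCoordinate \
      --quantMode GeneCounts \
      --outFileNamePrefix "$sample/"
  done
}

# Usage: preview the commands, then run them.
#   DRY_RUN=1 align_dir ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset
#   align_dir ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset
```

With `DRY_RUN=1`, this should print one `STAR` line per file, each with a single `--readFilesIn` path and its own `--outFileNamePrefix`. The trailing slash on the prefix makes STAR write `Aligned.sortedByCoord.out.bam`, `ReadsPerGene.out.tab`, and the log files inside that directory; the directory is created with `mkdir -p` first in case the STAR version in use does not create it.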
