Fatal error when running for-loop to align multiple reads to single indexed genome


Anthony Tercero

Dec 18, 2021, 10:28:58 PM
to rna-star
Hello,

I am trying to use a for-loop to iterate through a directory of 500+ single-end (SE) sequencing files and create a directory for each output. From my understanding, a for-loop is the only way to accomplish this. When running my loop, I receive the input error "number of read mates files > 2". It seems that the STAR aligner is grabbing all the files in the directory at once rather than taking a single file and iterating down the list.

Here is my loop script:

for i in *.fa.gz; do STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $i ; done

Could you please review my script and help me with this? I suspect that my issue is with how I define the variable i and how it relates to the --outFileNamePrefix parameter.
I appreciate your help.

Anthony 

Alexander Dobin

Dec 18, 2021, 10:39:39 PM
to rna-star
Hi Anthony,

It seems like the $i variable contains more than one file.
I would recommend echoing the entire command line to check that it looks OK:

for i in *.fa.gz; do echo STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/$i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $i ; done


Cheers
Alex

Anthony Tercero

Jan 3, 2022, 6:47:36 PM
to rna-star
Hello Alex,

Thank you for the timely response. I ran the echo command and my output is below:

STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 --readFilesIn /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG001.trim.fa.gz /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG002.trim.fa.gz /home/terceroa/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset/ATMCG003.trim.fa.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix *.fa.gz

I agree that the $i variable contains more than one file, as it is grabbing all three files in my test directory. I am not sure how to proceed from here. I was under the assumption that the loop would see all files in my test directory (indicated by the *.fa.gz pattern), then iterate through the list and create an output for every filename. Do you have any advice for constructing my for-loop?

Thank you for your time, it is greatly appreciated

Best,

Anthony
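
The echo output above appears consistent with ordinary shell globbing: a single command was printed and --outFileNamePrefix received the literal text *.fa.gz, which is what happens when the loop is started from a directory that contains no files matching *.fa.gz. The unmatched pattern is kept as-is, the loop body runs once with i set to "*.fa.gz", and the unquoted $i after the Star_subset path is then expanded into every matching file there. A minimal sketch reproducing this, assuming default shell options (no nullglob or failglob); the two mktemp directories are throwaway stand-ins, not the real paths:

```shell
#!/usr/bin/env bash
# Throwaway directories for the demo (assumed stand-ins for the real layout).
data_dir=$(mktemp -d)    # plays the role of .../Star_subset
work_dir=$(mktemp -d)    # plays the role of the directory the loop was run from
touch "$data_dir/ATMCG001.trim.fa.gz" \
      "$data_dir/ATMCG002.trim.fa.gz" \
      "$data_dir/ATMCG003.trim.fa.gz"

cd "$work_dir"           # nothing here matches *.fa.gz
iterations=0
for i in *.fa.gz; do
    iterations=$((iterations + 1))
    literal_value=$i                    # the unmatched pattern itself
    expanded=$(echo $data_dir/$i)       # unquoted on purpose: globs to all three files
done

echo "loop iterations: $iterations"
echo "value of i:      $literal_value"
echo "path expands to: $expanded"
```

The loop runs once rather than three times, which matches the single STAR command with three read files in the echo output.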

Alexander Dobin

Jan 13, 2022, 1:06:58 PM
to rna-star
Hi Anthony,

If you need to map 3 separate experiments, you need to run STAR 3 times. In each run, specify a single FASTQ file (or 2 FASTQ files if paired-end) and a single --outFileNamePrefix.

Cheers
Alex
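
One way to get one STAR run per file, as described above, is to put the directory in the glob itself and quote the loop variable. The following is a sketch, not a tested pipeline: star_each is a hypothetical helper name, the per-sample output layout is an assumption, and the STAR options are copied unchanged from the original post. It only prints the commands, in the spirit of the echo check suggested earlier.

```shell
#!/usr/bin/env bash
# star_each is a hypothetical helper; usage: star_each INPUT_DIR OUT_ROOT
star_each() {
    input_dir=$1
    out_root=$2
    for fq in "$input_dir"/*.fa.gz; do      # glob in the directory that holds the reads
        [ -e "$fq" ] || continue            # skip the literal pattern if nothing matched
        sample=$(basename "$fq" .fa.gz)     # ATMCG001.trim.fa.gz -> ATMCG001.trim
        mkdir -p "$out_root/$sample"        # one output directory per input file
        # Dry run: prints the command. Remove "echo" to actually launch STAR.
        echo STAR --genomeDir Star_index/Mcal_index --runMode alignReads --runThreadN 10 \
            --readFilesIn "$fq" --readFilesCommand zcat \
            --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \
            --outFileNamePrefix "$out_root/$sample/"
    done
}
```

For example, `star_each ~/Projects/M.cal/Data/Tagseq/Trim/Cat_trim/Star_subset star_out` should print one command per file, each with a single --readFilesIn path and its own --outFileNamePrefix. Because the glob carries the directory, the loop no longer depends on which directory it is started from, although --genomeDir is still a relative path here.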
