question about mapping multiple samples

Jacob Musser

Mar 9, 2018, 3:45:54 PM
to rna-star
Hello,

After reading the manual and running a number of tests, I am still having difficulty mapping multiple samples with STAR. My interpretation of the manual is that you can map multiple samples in a single job and receive output files for each individual sample. Is this correct, or am I misinterpreting the manual?

Here is the command I have been using to run STAR on our cluster:
STAR --runThreadN 8 --quantMode GeneCounts --outMultimapperOrder Random --genomeDir /star_genome_index_gtf/ --sjdbGTFfile /star_genome_index_gtf/Trichoplax_scaffolds_JGI_AUGUSTUS_maxtrack2.gtf --sjdbGTFfeatureExon CDS --readFilesCommand gunzip -c --readFilesIn /picked_cell_fastq/000000000-BL84Y_10_RC2_18s000021-1-1_Varoqueaux_lane118s000021_1_sequence.txt.gz,/picked_cell_fastq/000000000-BL84Y_11_CC9_18s000022-1-1_Varoqueaux_lane118s000022_1_sequence.txt.gz /picked_cell_fastq/000000000-BL84Y_10_RC2_18s000021-1-1_Varoqueaux_lane118s000021_2_sequence.txt.gz,/picked_cell_fastq/000000000-BL84Y_11_CC9_18s000022-1-1_Varoqueaux_lane118s000022_2_sequence.txt.gz

As you can see, I have two paired-end samples, which I input as two comma-delimited lists separated by a space (sample1_reads1.fq,sample2_reads1.fq sample1_reads2.fq,sample2_reads2.fq). When I run this I get a single SAM alignment file and one ReadsPerGene file with a single column. So it looks like STAR is treating my different samples as a single sample and lumping the results together. Is there a way to produce output files for each sample?

Jake

Alexander Dobin

Mar 9, 2018, 4:28:30 PM
to rna-star
Hi Jake,

If you run multiple samples with one command, all the output will be written into the same files.
If you want separate outputs, you would have to write a loop over the input files and either start each run in a separate directory
or add a distinct --outFileNamePrefix to each run.
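
The loop described above might look roughly like the following. This is a minimal sketch, not a tested pipeline: the sample name stems (sample1, sample2), the star_out directory and the DRY_RUN switch are placeholders introduced for illustration, and the STAR options are a subset of those in the original command. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: one STAR run per sample, each writing to its own output prefix.
# All names below are placeholders; adjust them to your own layout.
GENOME_DIR=/star_genome_index_gtf/
FASTQ_DIR=/picked_cell_fastq
OUT_DIR=star_out
SAMPLES="sample1 sample2"   # hypothetical sample name stems

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

for s in $SAMPLES; do
    # Give every sample its own directory so output files cannot collide.
    run mkdir -p "$OUT_DIR/$s"
    run STAR --runThreadN 8 --quantMode GeneCounts \
        --genomeDir "$GENOME_DIR" \
        --readFilesCommand gunzip -c \
        --readFilesIn "$FASTQ_DIR/${s}_1_sequence.txt.gz" "$FASTQ_DIR/${s}_2_sequence.txt.gz" \
        --outFileNamePrefix "$OUT_DIR/$s/"
done
```

Running it with DRY_RUN=0 would launch STAR once per sample, so each sample should end up with its own alignment and ReadsPerGene files under its own prefix.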

Alternatively, you can specify multiple RG tags, one per input file, which will be added to each read; you can then split the resulting BAM file into separate files by read group, e.g. --outSAMattrRGline ID:s1 , ID:s2 , ID:s3
Note that in this list the commas have to be surrounded by spaces.
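
Splitting by read group could be done with samtools, for example. The sketch below assumes samtools is installed, that the combined run used read group IDs s1 and s2, and that it wrote the default Aligned.out.sam; the DRY_RUN switch is again only an illustrative placeholder. Note that this separates the alignments only; the ReadsPerGene table from the combined run would still be pooled.

```shell
#!/bin/sh
# Sketch: write one BAM file per read group from a combined STAR run.
# File name and read group IDs are assumptions; adjust to your run.
ALIGNMENTS=Aligned.out.sam
READ_GROUPS="s1 s2"

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

for rg in $READ_GROUPS; do
    # -r keeps only alignments in read group $rg; -b writes BAM; -o names the output.
    run samtools view -b -r "$rg" -o "$rg.bam" "$ALIGNMENTS"
done
```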

Another convenience option implemented in 2.5.4a is --readFilesPrefix, where you can define a constant file prefix for all samples and thus simplify the --readFilesIn list.
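
As an illustration of how --readFilesPrefix can shorten the command, the sketch below builds the two comma-separated mate lists from hypothetical sample name stems and prints the resulting STAR call. The stems and file name pattern are placeholders, and the remaining options from the original command are omitted for brevity.

```shell
#!/bin/sh
# Sketch: build short --readFilesIn lists and factor the directory
# into --readFilesPrefix. Sample name stems are placeholders.
PREFIX=/picked_cell_fastq/
SAMPLES="sample1 sample2"

mates1=""
mates2=""
for s in $SAMPLES; do
    # Append with a comma only when the list is already non-empty.
    mates1="${mates1:+$mates1,}${s}_1_sequence.txt.gz"
    mates2="${mates2:+$mates2,}${s}_2_sequence.txt.gz"
done

# Print the command rather than running it.
echo STAR --genomeDir /star_genome_index_gtf/ --readFilesCommand gunzip -c \
    --readFilesPrefix "$PREFIX" --readFilesIn "$mates1" "$mates2"
```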

Cheers
Alex