question about mapping multiple samples

Jacob Musser

Mar 9, 2018, 3:45:54 PM

to rna-star

Hello,

After reading the manual and running a number of tests, I am still having difficulty mapping multiple samples with STAR. My interpretation of the manual is that you can map multiple samples in a single job and receive output files for each individual sample. Is this correct, or am I misinterpreting the manual?

Here is the command I have been using to run STAR on our cluster:

STAR --runThreadN 8 --quantMode GeneCounts --outMultimapperOrder Random --genomeDir /star_genome_index_gtf/ --sjdbGTFfile /star_genome_index_gtf/Trichoplax_scaffolds_JGI_AUGUSTUS_maxtrack2.gtf --sjdbGTFfeatureExon CDS --readFilesCommand gunzip -c --readFilesIn /picked_cell_fastq/000000000-BL84Y_10_RC2_18s000021-1-1_Varoqueaux_lane118s000021_1_sequence.txt.gz,/picked_cell_fastq/000000000-BL84Y_11_CC9_18s000022-1-1_Varoqueaux_lane118s000022_1_sequence.txt.gz /picked_cell_fastq/000000000-BL84Y_10_RC2_18s000021-1-1_Varoqueaux_lane118s000021_2_sequence.txt.gz,/picked_cell_fastq/000000000-BL84Y_11_CC9_18s000022-1-1_Varoqueaux_lane118s000022_2_sequence.txt.gz

As you can see, I have two paired-end samples, which I input using two comma-delimited lists separated by a space (sample1_reads1.fq,sample2_reads1.fq sample1_reads2.fq,sample2_reads2.fq). When I run this, I get a single SAM alignment file and one ReadsPerGene file with a single column. So it looks like STAR is treating my different samples as a single sample and lumping the results together. Is there a way to produce output files for each sample?

Jake

Alexander Dobin

Mar 9, 2018, 4:28:30 PM

to rna-star

Hi Jake,

If you run multiple samples with one command, all the output will be written into the same files.

If you want separate outputs, you would have to write a loop over the input files and start each run in a separate directory, or give each run a distinct --outFileNamePrefix.
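A minimal sketch of such a loop, written in Python rather than as a shell script. The sample names, file paths and the `star_commands` helper are hypothetical; the shared options are shortened from the command in the question. Each sample gets its own STAR run with a distinct --outFileNamePrefix, so every run writes its own SAM and ReadsPerGene files.

```python
import shlex
import subprocess
import sys

# Hypothetical sample names and mate-1/mate-2 files; replace with real paths.
SAMPLES = {
    "s1": ("s1_R1.fq.gz", "s1_R2.fq.gz"),
    "s2": ("s2_R1.fq.gz", "s2_R2.fq.gz"),
}

# Options shared by every run, shortened from the command in the question;
# append the remaining options (GTF file, exon feature, etc.) as needed.
COMMON = [
    "--runThreadN", "8",
    "--quantMode", "GeneCounts",
    "--outMultimapperOrder", "Random",
    "--genomeDir", "/star_genome_index_gtf/",
    "--readFilesCommand", "gunzip", "-c",
]


def star_commands(samples, common=COMMON):
    """Return one STAR argument list per sample (hypothetical helper).

    Each run receives only that sample's two mate files and a distinct
    --outFileNamePrefix, so outputs are named e.g. s1_Aligned.out.sam
    and s1_ReadsPerGene.out.tab instead of being pooled.
    """
    return [
        ["STAR", *common,
         "--readFilesIn", r1, r2,
         "--outFileNamePrefix", f"{name}_"]
        for name, (r1, r2) in samples.items()
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv  # default is a dry run that only prints commands
    for cmd in star_commands(SAMPLES):
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # one STAR job per sample, in sequence
```

Without arguments the script only prints the per-sample commands; passing `--run` executes them one after another. On a cluster, each printed command could instead be submitted as its own job.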

Alternatively, you can specify multiple RG tags, which will be added to each read; then you can split the resulting BAM file into separate files by read group, e.g. --outSAMattrRGline ID:s1 , ID:s2 , ID:s3

Note that in this list the commas have to be separated from the entries by spaces.
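A minimal sketch of how the read-group list can be kept aligned with the pooled input lists. The sample names, files and the `pooled_args_with_rg` helper are hypothetical. The i-th RG entry should correspond to the i-th file in each comma-separated mate list, and each "," between RG entries is its own argument (" , " on the command line).

```python
# Hypothetical sample names and mate files, for illustration only.
SAMPLES = {
    "s1": ("s1_R1.fq.gz", "s1_R2.fq.gz"),
    "s2": ("s2_R1.fq.gz", "s2_R2.fq.gz"),
    "s3": ("s3_R1.fq.gz", "s3_R2.fq.gz"),
}


def pooled_args_with_rg(samples):
    """Build --readFilesIn and --outSAMattrRGline for one pooled STAR run.

    All three lists are built in the same sample order, so the i-th read
    group matches the i-th file in both comma-separated mate lists.
    Read-group entries are separated by a standalone "," argument.
    """
    names = list(samples)
    mate1 = ",".join(samples[n][0] for n in names)
    mate2 = ",".join(samples[n][1] for n in names)
    rg = []
    for i, name in enumerate(names):
        if i > 0:
            rg.append(",")  # separate word, i.e. " , " on the command line
        rg.append(f"ID:{name}")
    return ["--readFilesIn", mate1, mate2, "--outSAMattrRGline", *rg]


print(" ".join(pooled_args_with_rg(SAMPLES)))
# prints: --readFilesIn s1_R1.fq.gz,s2_R1.fq.gz,s3_R1.fq.gz s1_R2.fq.gz,s2_R2.fq.gz,s3_R2.fq.gz --outSAMattrRGline ID:s1 , ID:s2 , ID:s3
```

The pooled alignment file can then be split by read group, for example with `samtools split`. As far as I can tell, this separates only the alignments; the ReadsPerGene table from such a pooled run still holds combined counts.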

Another convenience option implemented in 2.5.4a is --readFilesPrefix, where you can define a constant file prefix for all samples and thus simplify the --readFilesIn list.
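A small sketch of the idea, assuming (as the option's description says) that the --readFilesPrefix string is simply prepended to every name in --readFilesIn. The helper `with_read_files_prefix` and the shortened file names are hypothetical; only the /picked_cell_fastq/ directory comes from the question.

```python
def with_read_files_prefix(prefix, mate1_files, mate2_files):
    """Hypothetical helper: move a shared path prefix into --readFilesPrefix.

    STAR prepends the --readFilesPrefix string to each name given in
    --readFilesIn, so every name is listed here without that prefix.
    """
    for path in (*mate1_files, *mate2_files):
        if not path.startswith(prefix):
            raise ValueError(f"{path!r} does not start with {prefix!r}")

    def strip(paths):
        return ",".join(p[len(prefix):] for p in paths)

    return ["--readFilesPrefix", prefix,
            "--readFilesIn", strip(mate1_files), strip(mate2_files)]


# Shortened, hypothetical file names in the directory used in the question.
args = with_read_files_prefix(
    "/picked_cell_fastq/",
    ["/picked_cell_fastq/s1_1.txt.gz", "/picked_cell_fastq/s2_1.txt.gz"],
    ["/picked_cell_fastq/s1_2.txt.gz", "/picked_cell_fastq/s2_2.txt.gz"],
)
print(" ".join(args))
# prints: --readFilesPrefix /picked_cell_fastq/ --readFilesIn s1_1.txt.gz,s2_1.txt.gz s1_2.txt.gz,s2_2.txt.gz
```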

Cheers

Alex
