Is there a way to add RG tags on the fly?


Raghu Prasad Rao Metpally

Jun 14, 2013, 6:17:36 PM
to rna-...@googlegroups.com
Hi Alex,

Is it possible to add RG tags automatically to the header and the alignments during STAR runs?

Thanks,
Raghu

German Leparc

Jun 17, 2013, 6:46:50 AM
to rna-...@googlegroups.com
Hi, I would also like this feature, since GATK needs the RG tag in its input BAMs.

Alternatively, Raghu, since one often converts the SAM file to BAM anyway before passing it on to other programs, you can use the AddOrReplaceReadGroups.jar program from the Picard tools package to simultaneously do a coordinate sort, add RG tags, and convert to a BAM file.
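
A minimal sketch of that Picard call, assuming the one-jar-per-tool layout Picard used at the time. The jar path, file names, and read-group values below are placeholders, not values from this thread. The command is only executed if Java, the jar, and the input file are present; otherwise it is just printed.

```shell
# Placeholder paths and read-group values (assumptions; adjust to your data).
PICARD_JAR="${PICARD_JAR:-AddOrReplaceReadGroups.jar}"
IN_SAM="Aligned.out.sam"
OUT_BAM="Aligned.sorted.rg.bam"

# One pass: add the @RG header line and per-read RG tags, coordinate-sort,
# and write BAM (Picard picks the output format from the file extension).
set -- java -jar "$PICARD_JAR" \
  INPUT="$IN_SAM" OUTPUT="$OUT_BAM" SORT_ORDER=coordinate \
  RGID=1 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=sample1

if command -v java >/dev/null 2>&1 && [ -f "$PICARD_JAR" ] && [ -f "$IN_SAM" ]; then
  "$@"
else
  # Dry run when Java, the jar, or the input is missing: just show the command.
  echo "$@"
fi
```

GATK generally expects at least ID, LB, PL, PU, and SM to be set, which is why all five appear here.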


Hope that helps,
German

Alexander Dobin

Jun 17, 2013, 10:06:32 AM
to rna-...@googlegroups.com
Hi Raghu, German,

I have already received some requests for this feature, and I guess it's time for me to make another patch, hopefully within the next week.

Cheers
Alex

Jake Freimer

Jan 19, 2014, 4:30:25 PM
to rna-...@googlegroups.com
Hi Alex,
Did this ever get added? I didn't see it in the change log.
Thanks,
Jake

Santosh Anand

Jan 21, 2014, 4:48:05 PM
to rna-...@googlegroups.com
It can be done using perl or sed; one example using perl:

 (echo -e "@RG\tID:1\tPL:illumina\tPU:NA\tLB:NA\tSM:NA" && STAR  --genomeLoad $genomeLoad --genomeDir $genomeDir --readFilesIn $INPUT_BASEDIR/$IP --outFileNamePrefix $OP --runThreadN $runThreadN --outStd SAM) | perl -pe ' !/^@/ && s/$/\tRG:Z:1/' > Me${i}_wRG.sam

In my trials, substitution using sed was considerably slower than using perl (around 5-8x), probably because perl is optimized for complex regex operations (my naive guess!). I'd be interested in knowing others' opinions and experiences regarding this.
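
For reference, a sketch of what the sed variant could look like, run on a made-up two-read SAM stream instead of real STAR output so it can be checked without a genome index. The read records and the `sam_stream` helper are invented for illustration; a literal tab is passed to sed because `\t` in the replacement is not portable across sed implementations.

```shell
# Made-up two-read SAM stream standing in for STAR's --outStd SAM output.
TAB=$(printf '\t')
sam_stream() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'read1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'read2\t16\tchr1\t5\t255\t4M\t*\t0\t0\tTTGA\tIIII\n'
}

# Same pipeline shape as the perl version: print the @RG header line first,
# then append the RG:Z:1 tag to every line that does not start with "@".
tagged=$( { printf '@RG\tID:1\tPL:illumina\tPU:NA\tLB:NA\tSM:NA\n'; sam_stream; } \
  | sed '/^@/!s/$/'"$TAB"'RG:Z:1/' )

printf '%s\n' "$tagged"
```

To compare speed on your own data, prefixing each variant of the filter with `time` is probably the quickest check.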





Alexander Dobin

Jan 24, 2014, 4:34:36 PM
to rna-...@googlegroups.com
I have just released a patch that allows adding the RG tag directly to STAR output:
Usage: --outSAMattrRGline <string>, where <string> is the RG line, with any tags you want, that goes into the SAM header (after @RG).
The first word must contain the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". Then xxx will be added as the RG tag to each output alignment. Any tag values containing spaces have to be double-quoted.
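
A small sketch of the quoting rule, using the example values from the usage line. The genome index and reads paths are placeholders, not from this thread. The loop prints each word exactly as the shell would hand it to STAR, so you can see that the double-quoted DS value arrives as a single word; STAR itself is only run if it is installed and the placeholder paths exist.

```shell
# Placeholder paths (assumptions; replace with your own genome index and reads).
GENOME_DIR="/path/to/star_index"
READS="reads.fastq"

# The RG line as in the usage example: ID: comes first, and the value
# containing spaces is double-quoted so it reaches STAR as a single word.
set -- STAR --genomeDir "$GENOME_DIR" --readFilesIn "$READS" \
  --outSAMattrRGline ID:xxx CN:yy "DS:z z z"

# Show how the shell split the command line, one bracketed word per line.
for word in "$@"; do printf '[%s]\n' "$word"; done

# Run STAR only if it is installed and the placeholder paths exist.
if command -v STAR >/dev/null 2>&1 && [ -d "$GENOME_DIR" ] && [ -f "$READS" ]; then
  "$@"
fi
```

Without the quotes, `DS:z`, `z`, and `z` would be passed as three separate words.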

Cheers
Alex

Rory Kirchner

Feb 18, 2014, 4:28:43 PM
to rna-...@googlegroups.com
Hi Alex,

Thanks for this nice feature; it saves a very time-consuming call to Picard. It introduces an issue for me, though: FastQC will not parse the header correctly. It might be an issue with FastQC, but I thought I would post it here as well:

Here is a header that is broken:

@PG ID:STAR PN:STAR VN:STAR_2.3.1z_r395 CL:/n/hsphS10/hsphfs1/chb/local/bin/STAR   --runThreadN 1   --genomeDir /n/home05/kirchner/cache/bcbio-nextgen/tests/data/automated/tool-data/../../genomes/mm9/star   --readFilesIn /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/trim/1_110907_ERP000591_1_fastq_trimmed.txt   /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/trim/1_110907_ERP000591_2_fastq_trimmed.txt      --outFileNamePrefix /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/align/Test1/1_110907_ERP000591   --outReadsUnmapped Fastx   --outSAMstrandField intronMotif   --outSAMunmapped Within   --outSAMattrRGline ID:1   PL:illumina   PU:1_110907_ERP000591   SM:Test1      --outFilterMultimapNmax 10   cl:/n/hsphS10/hsphfs1/chb/local/bin/STAR --genomeDir /n/home05/kirchner/cache/bcbio-nextgen/tests/data/automated/tool-data/../../genomes/mm9/star --readFilesIn /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/trim/1_110907_ERP000591_1_fastq_trimmed.txt /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/trim/1_110907_ERP000591_2_fastq_trimmed.txt --runThreadN 1 --outFileNamePrefix /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/align/Test1/1_110907_ERP000591 --outReadsUnmapped Fastx --outFilterMultimapNmax 10 --outSAMunmapped Within --outSAMattrRGline ID:1 PL:illumina PU:1_110907_ERP000591 SM:Test1 --outSAMstrandField intronMotif

The command line appears twice. If I remove the second copy (by deleting everything from the cl: onwards):

@PG ID:STAR PN:STAR VN:STAR_2.3.1z_r395 CL:/n/hsphS10/hsphfs1/chb/local/bin/STAR   --runThreadN 1   --genomeDir /n/home05/kirchner/cache/bcbio-nextgen/tests/data/automated/tool-data/../../genomes/mm9/star   --readFilesIn /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/trim/1_110907_ERP000591_1_fastq_trimmed.txt   /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/trim/1_110907_ERP000591_2_fastq_trimmed.txt      --outFileNamePrefix /n/home05/kirchner/cache/bcbio-nextgen/tests/test_automated_output/align/Test1/1_110907_ERP000591   --outReadsUnmapped Fastx   --outSAMstrandField intronMotif   --outSAMunmapped Within   --outSAMattrRGline ID:1   PL:illumina   PU:1_110907_ERP000591   SM:Test1      --outFilterMultimapNmax 10

It works fine. I'm not sure what it is about the new option; if I run STAR without the --outSAMattrRGline option, it works fine as well. Is there any way that second cl: entry could be dropped from the final header?
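
As a stopgap, the manual edit described above could be scripted roughly like this. This is only a sketch on a shortened, made-up @PG line, and it assumes that in the real file the header fields (including cl:) are separated by tabs, which the pasted header does not show.

```shell
TAB=$(printf '\t')

# Shortened, made-up stand-in for the header above. Assumption: in the real
# file the @PG fields (ID, PN, VN, CL, cl) are separated by tabs.
sam=$(printf '@PG\tID:STAR\tPN:STAR\tCL:STAR   --runThreadN 1   --outSAMattrRGline ID:1   PL:illumina\tcl:STAR --runThreadN 1 --outSAMattrRGline ID:1 PL:illumina\nread1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:1\n')

# On the @PG line only, delete from the tab before "cl:" to the end of line.
# The match is case-sensitive, so the uppercase CL: field is left alone.
cleaned=$(printf '%s\n' "$sam" | sed '/^@PG/s/'"$TAB"'cl:.*$//')

printf '%s\n' "$cleaned"
```

For a BAM file, the same filter could presumably be applied to the output of `samtools view -H` and the result written back with `samtools reheader`.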

Thanks, and thanks for keeping up with maintaining STAR.

Alexander Dobin

Feb 24, 2014, 7:08:07 PM
to rna-...@googlegroups.com
Hi Rory,

The two command lines are not exactly the same: 'cl' is exactly the user's command line, while 'CL' is the command line as processed by STAR (mostly the order of the options is changed).
I think the trouble could be that the ID, PL, PU, and SM tags in the --outSAMattrRGline argument are recognized as duplicate tags by FastQC; however, this should not happen, since tags are supposed to be separated by tabs, not by spaces.
I need to think about how best to deal with it; I guess I could add an option to output just CL or cl.

Cheers
Alex

Rory Kirchner

Feb 26, 2014, 11:35:06 AM
to rna-...@googlegroups.com
Hi Alex,

Great, thank you for the explanation.

Best,

Rory