Keeping barcode information in bam file


Jonathan Landry

Jan 15, 2016, 2:27:19 PM
to rna-star
Dear all,

I have FASTQ files with barcode information in the header of each read, like:
@NS500188:164:H5WKHAFXX:1:11101:20350:1043 2:N:0:0AAGAGAGCAGA

Is there a way to keep this information (2nd field after space) in the bam file produced by STAR?

Thanks for your help,
Best,
Jonathan

Felix Schlesinger

Jan 15, 2016, 2:33:29 PM
to rna-star
Something similar to the bwa-mem -C option (http://bio-bwa.sourceforge.net/bwa.shtml) could be useful. Of course, the more general way to handle this is to start alignment from (unaligned) BAM files, keeping all tags and other info.
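
For context, the bwa manual notes that -C appends the FASTQ comment to the SAM record as-is, so the comment has to look like a SAM tag already (for example BC:Z:CGTAC). A rough sketch of reshaping an Illumina-style comment into that form, assuming 4-line records and a barcode that starts at the 8th character of the comment, as in the example header above. The function name comment_to_bc_tag and the file names are made up for illustration.

```shell
# Rewrite the FASTQ comment (2nd header field) as a SAM-style BC:Z: tag.
# Assumption: 4-line FASTQ records and a 7-character prefix before the barcode.
comment_to_bc_tag() {
    # Header lines with a comment are rewritten; everything else passes through.
    awk 'NR % 4 == 1 && NF > 1 { print $1 " BC:Z:" substr($2, 8); next } { print }'
}

# Hypothetical usage:
#   comment_to_bc_tag < reads.fastq > reads.tagged.fastq
#   bwa mem -C ref.fa reads.tagged.fastq > aln.sam
```

The tag name BC is only one possible choice; any tag allowed by the SAM specification could be substituted.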

Alexander Dobin

Jan 15, 2016, 4:54:13 PM
to rna-star
Hi Jonathan, Felix,

If you want to keep the barcode info in the header, you can simply replace the space between the two fields with an underscore, passing the other three lines of each record through unchanged, for instance:
awk 'NR%4==1 {print $1 "_" $2; next} {print}'
or, to keep just the barcode, drop the first 7 characters of the 2nd field:
awk 'NR%4==1 {print $1 "_" substr($2,8); next} {print}'

To do it on the fly (without creating intermediate files) you can add this command directly to --readFilesCommand.
Or, if you have zipped files, you can make a script that you supply as --readFilesCommand /path/to/script.sh
zcat "$1" | awk 'NR%4==1 {print $1 "_" $2; next} {print}'
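
A minimal sketch of such a wrapper script, assuming gzipped FASTQ with four lines per record. The function name join_header_fields is hypothetical. It prints the sequence, "+", and quality lines unchanged so that complete FASTQ records still reach STAR, and it leaves headers without a second field alone.

```shell
#!/bin/sh
# Hypothetical wrapper for STAR's --readFilesCommand: STAR calls it with the
# FASTQ path as the first argument and reads the records from stdout.

# Join the two header fields with "_" on every 4th line (the @name line);
# all other lines (sequence, "+", qualities) are printed unchanged.
join_header_fields() {
    awk 'NR % 4 == 1 && NF > 1 { print $1 "_" $2; next } { print }'
}

# Only decompress when a file argument is given, so the function can also
# be sourced and tried out on its own.
if [ -n "${1:-}" ]; then
    gzip -dc "$1" | join_header_fields
fi
```

Saved as /path/to/script.sh and made executable, it would be supplied as --readFilesCommand /path/to/script.sh alongside the usual --readFilesIn argument.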

Felix,

it's not clear to me what output bwa-mem -C produces. Does it add a SAM attribute (tag) with the 2nd field of the name?
It should be easy to do - will add it to my list.

Cheers
Alex

Felix Schlesinger

Jan 16, 2016, 9:28:14 PM
to rna-star
Hi Alex,

Yes, it essentially takes everything after the space in the FASTQ header and makes it a SAM tag. That avoids having to parse it out of the read name in the BAM later. But it is not very general, so I actually prefer aligning from BAM files as input and copying over all tags.
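
If the barcode was glued onto the read name with "_" instead, it can still be moved into a tag afterwards. A hedged sketch that works on SAM text, assuming the barcode is everything after the last "_" in the read name and that BC:Z: is the tag wanted. name_to_bc_tag is a hypothetical helper, not part of STAR or samtools, and the file names are placeholders.

```shell
# Move a "_<barcode>" suffix from the read name (column 1) into a BC:Z: tag.
# SAM header lines (starting with "@") and names without "_" pass through as-is.
name_to_bc_tag() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }
         {
             n = split($1, parts, "_")
             if (n > 1) {
                 bc = parts[n]
                 # strip the barcode and the underscore before it from the name
                 $1 = substr($1, 1, length($1) - length(bc) - 1)
                 print $0, "BC:Z:" bc
             } else {
                 print
             }
         }'
}

# Hypothetical usage:
#   samtools view -h in.bam | name_to_bc_tag | samtools view -b -o tagged.bam -
```

This only makes sense when the name suffix is the bare barcode; if the whole second header field was appended, the prefix would need trimming first.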

Felix

Jonathan Landry

Jan 18, 2016, 8:34:57 AM
to rna-star
Hi Alex, Felix,

Thanks for your answers.
Taking out the space could work nicely. I will give it a try.
Thanks a lot again.

Jonathan

Alexander Dobin

Jan 19, 2016, 3:46:42 PM
to rna-star
Hi Felix,

I have it in my plans to implement STAR input from BAM files; a few users have requested that.
Can the Illumina pipeline output directly into BAM instead of FASTQ?

Cheers
Alex

Felix Schlesinger

Jan 19, 2016, 4:40:05 PM
to rna-star
There is no way yet to output unaligned BAMs directly instead of FASTQs, but that will probably happen.
We often store only BAM files long-term, so rerunning from BAM (instead of converting back to FASTQ) is already convenient, especially now with CRAM compression.

Felix