Creating ubams from fastq files.


Shawn Rynearson

Jan 23, 2018, 6:33:00 PM
to biogo-user
Hello,

I'm still a bit of a noob to golang, but I've gotten hung up trying to write a ubam from a pair of fastq files. The main issue I'm currently facing is that quality scores are being converted in some weird way.

If I start out with the following fastq record:
@HIDSEQ-D00784:30:HK5MNBCXX:1:1101:1163:1993 1:N:0:TGGAACAA
CGTTCCACACAAAGCTGGCTTCCCCGGAGTGACATGTGGCATTGTGCAAGGAAAGCCCCTCAGGGGTTTTTGCCAGAAAACTGCACCCACTGCATACAAA
+
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Here is a function that is doing some matching and writing out to the bam writer:

func readMatcher(file string, fq1 map[string]fqData, l string, sample string) {
	reader, err := fastx.NewDefaultReader(file)
	if err != nil {
		panic(err)
	}

	// create readgroup for header.
	rg, err := sam.NewReadGroup("8c6115b9_2", "", "", l, "", "", "", sample, "", "", time.Now(), 0)
	if err != nil {
		panic(err)
	}

	// create header and add values.
	samHeader, err := sam.NewHeader(nil, nil)
	if err != nil {
		panic(err)
	}
	samHeader.SortOrder = 1
	samHeader.Version = "1.5"

	if err := samHeader.AddReadGroup(rg); err != nil {
		panic(err)
	}

	var w io.Writer
	w, err = os.Create("bigones.bam")
	if err != nil {
		panic(err)
	}

	bamWriter, err := bam.NewWriter(w, samHeader, 0)
	if err != nil {
		panic(err)
	}
	defer bamWriter.Close()

	for {
		fq2, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}

		h := sha256.New()
		h.Write(fq2.ID)
		str := base64.StdEncoding.EncodeToString(h.Sum(nil)[:24])

		/*
			fq2Match, ok := fq1[str]
			if !ok {
				fmt.Println("Cannot find matching pair for: ", string(fq2.Name))
				os.Exit(1)
			}
		*/

		// set flag for pair data
		var flag1 sam.Flags = 77
		// var flag2 sam.Flags = 141

		samReference, err := sam.NewReference("test", "", "", 100, nil, nil)
		if err != nil {
			panic(err)
		}

		fq1Record := &sam.Record{
			Name:  string(fq1[str].id),
			Ref:   samReference,
			Flags: flag1,
			Seq:   sam.NewSeq(fq1[str].seq),
			Qual:  fq1[str].qual,
		}
		if err := bamWriter.Write(fq1Record); err != nil {
			panic(err)
		}
	}
}

The issue I'm facing is that a bam file is created but the quality scores are modified:

HIDSEQ-D00784:30:HK5MNBCXX:1:1101:2618:1999 77 * 1 0 * * 1 0 CTTCCTCATGACCACCTGGGGGTTCCAAGTCCTGGATCATTCACTCTGTGTCCCAGTGACAATGAGAACAATGTCTAGACACTCTCACCTGTGACCACGA eeeeejjjjjjjjjjjjjjjjjijjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj


I've checked all the types and sam.Record is getting what is expected, so I'm a bit puzzled here.  Any guidance could help.

Thanks,
--Shawn


Dan Kortschak

Jan 23, 2018, 6:46:12 PM
to Shawn Rynearson, biogo-user
It's difficult to know without a complete reproducer (preferably
small), but given that it's a quality rendering change I imagine that
it has to do with choice of quality encoding used by the fastx FASTQ
reader (which is not biogo BTW). The biogo fastq reader allows
specification of the quality encoding in the provided template value, I
don't know how or if fastx allows that.

Dan

Shawn Rynearson

Jan 24, 2018, 1:54:24 PM
to biogo-user
I can do a bit of testing with biogo for the fastq parsing. Do you know of, or have, any good examples of how to set up the seqio.SequenceAppender correctly?

func NewReader(r io.Reader, template seqio.SequenceAppender) *Reader

Dan Kortschak

Jan 24, 2018, 2:40:48 PM
to Shawn Rynearson, biogo-user
All the sequence types I provide satisfy that interface. In your case,
a linear.QSeq would be most appropriate.