Do you want the reads or the stats for the reads?
Assuming you want the stats for each position of the relevant reads, the choice of approach comes down to a number of factors: the size of the genome, the sparsity of the relevant read mapping and the depth of reads at the relevant sites.
If you have a small genome, just use large slices of numeric values to store the stats. If it's large and sparse, use a step vector like the one provided by biogo/store/step. If it's large and not sparse, make sure you have a large enough machine, though if the BAM is sorted you should be able to use mmapped data without too much pain, since you would expect reasonable locality.
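A minimal sketch of the dense and sparse options, assuming a single count per position is all that is tracked. The `counter` interface and the `dense` and `sparse` types are hypothetical. `sparse` uses a plain Go map as a simple stand-in; it is not the biogo/store/step API, which stores runs of equal values rather than individual positions and is therefore far more compact when long stretches share a value.

```go
package main

import "fmt"

// counter is a hypothetical interface for per-position counts.
type counter interface {
	Inc(pos int)
	Get(pos int) uint32
}

// dense stores one count per reference position; suitable when the
// genome is small enough to hold a full slice in memory.
type dense []uint32

func (d dense) Inc(pos int) { d[pos]++ }

func (d dense) Get(pos int) uint32 { return d[pos] }

// sparse stores only positions that have been touched. A map is used
// here as a simple stand-in; it is not biogo/store/step, which stores
// runs of equal values rather than individual positions.
type sparse map[int]uint32

func (s sparse) Inc(pos int) { s[pos]++ }

// Get returns zero for positions that were never incremented.
func (s sparse) Get(pos int) uint32 { return s[pos] }

func main() {
	for _, c := range []counter{make(dense, 100), make(sparse)} {
		c.Inc(42)
		c.Inc(42)
		fmt.Println(c.Get(42), c.Get(7)) // 2 0
	}
}
```

The trade-off is the one described above: the slice costs memory for every position whether or not any read lands there, while the map only pays for touched positions but with much higher per-entry overhead.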
The numeric type depends on how deep the sequencing is likely to be at the relevant sites, and on how many attributes you are following, since each attribute multiplies the per-position memory cost.
Once you've decided all of those, it's just a matter of iterating over a BAM reader's output and counting positions.
--
afk
Dan
Getting that information is, as they say, "a simple matter of programming".
The relevant helper, though, is Consume. See https://godoc.org/github.com/biogo/hts/sam#ex-Consume.
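`Consume` describes how many query and reference bases a CIGAR operation consumes, which is what you need to map a genomic coordinate back onto a read. The sketch below does not use the biogo/hts type itself. It uses a hypothetical local `consume` table filled in from the SAM specification and reports whether a read has an aligned base, a deletion or skip, or nothing at a given 0-based reference position. Telling a match from a mismatch still needs the read and reference bases, or the MD:Z tag.

```go
package main

import "fmt"

// cigarOp is a hypothetical, simplified CIGAR operation.
type cigarOp struct {
	Type byte
	Len  int
}

// consumption records how many query and reference bases one unit of an
// operation consumes, following the SAM specification.
type consumption struct{ Query, Reference int }

var consume = map[byte]consumption{
	'M': {1, 1}, '=': {1, 1}, 'X': {1, 1},
	'I': {1, 0}, 'S': {1, 0},
	'D': {0, 1}, 'N': {0, 1},
	'H': {0, 0}, 'P': {0, 0},
}

// status describes what a read has at one reference position.
type status int

const (
	notCovered status = iota
	aligned           // a read base is aligned here (match or mismatch)
	deleted           // a D or N operation spans this position
)

// statusAt reports what a read starting at 0-based reference position
// start, with the given CIGAR, has at reference position pos. When the
// status is aligned it also returns the query offset of that base.
func statusAt(start int, cigar []cigarOp, pos int) (status, int) {
	ref, qry := start, 0
	for _, op := range cigar {
		c := consume[op.Type]
		if c.Reference == 1 && pos >= ref && pos < ref+op.Len {
			if c.Query == 1 {
				return aligned, qry + (pos - ref)
			}
			return deleted, -1
		}
		ref += c.Reference * op.Len
		qry += c.Query * op.Len
	}
	return notCovered, -1
}

func main() {
	cigar := []cigarOp{{'S', 2}, {'M', 3}, {'I', 1}, {'D', 2}, {'M', 4}}
	for _, pos := range []int{9, 10, 13, 16, 20} {
		s, q := statusAt(10, cigar, pos)
		fmt.Println(pos, s, q)
	}
}
```

Insertions never cover a reference position in this scheme. If you also want to flag an insertion immediately after a site, check whether the next operation after the aligned base is an I.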
--
afk
Dan
From: Brent Pedersen
Sent: Thursday, 14 July, 07:27
Subject: Re: [biogo-user] pileup
To: Dan Kortschak
Cc: biogo...@googlegroups.com
My use-case is embedded in a streaming view of the data. With that, I can collect reads for each query site. Maybe I'm missing something in biogo, but I was wondering about getting the match/mismatch/indel status at a given genomic coordinate for each read that overlaps it.

On Wed, Jul 13, 2016 at 3:46 PM, Dan Kortschak wrote:
> Do you want the reads or the stats for the reads?
> [...]
>
> On Thu, Jul 14, 2016 at 1:55 AM +0930, "Brent Pedersen" wrote:
>
> In my case, I'd just like to get the REF+, REF-, ALT+, ALT- reads for
> given sites using the CIGAR and MD:Z tags, preferably.
>
> On Fri, Jul 8, 2016 at 6:40 PM, Dan Kortschak wrote:
>> A number of things I've written do this, but I find that a good
>> generalised solution probably does not exist; depending on what you are
>> doing you can make certain optimisations that the general case does not
>> allow.
>>
>> Dan
>>
>> On Fri, 2016-07-08 at 09:36 -0600, Brent Pedersen wrote:
>>> Just checking before I implement or (more likely) use something else;
>>> has anyone written anything resembling a pileup tool in golang?
That's the joke.
--
afk
Dan
From: Brent Pedersen
Sent: Thursday, 14 July, 07:44
Subject: Re: [biogo-user] pileup
To: Dan Kortschak
Cc: biogo...@googlegroups.com
On Wed, Jul 13, 2016 at 4:11 PM, Dan Kortschak wrote:
> Getting that information is, as they say, "a simple matter of programming".

That's what I was trying to avoid.