pileup


Brent Pedersen

Jul 8, 2016, 11:36:18 AM
to biogo...@googlegroups.com
Just checking before I implement or (more likely) use something else;
has anyone written anything resembling a pileup tool in golang?

thanks,
-Brent

Dan Kortschak

Jul 8, 2016, 8:40:10 PM
to Brent Pedersen, biogo...@googlegroups.com
A number of things I've written do this, but I find that a good
generalised solution probably does not exist; depending on what you are
doing, you can make certain optimisations that the general case does not
allow.

Dan

Brent Pedersen

Jul 13, 2016, 12:25:51 PM
to Dan Kortschak, biogo...@googlegroups.com
In my case, I'd just like to get the REF+,REF-,ALT+,ALT- reads for
given sites using the CIGAR and MD:Z tags, preferably.
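For readers unfamiliar with the MD:Z tag, here is a minimal sketch of how its value can be walked to find mismatch positions. It is plain Go and does not use biogo; the function name mismatchOffsets and the helper isBase are made up for illustration. It assumes the usual MD layout: runs of digits for matching bases, a single letter for a mismatched reference base, and '^' followed by letters for deleted reference bases. The offsets it returns are relative to the alignment start on the reference; mapping them to read offsets still needs the CIGAR, because insertions and soft clips do not appear in MD.

```go
package main

import (
	"fmt"
	"strconv"
)

// isBase reports whether c is an upper-case letter, as used for bases in MD.
func isBase(c byte) bool { return c >= 'A' && c <= 'Z' }

// mismatchOffsets (hypothetical helper) returns the 0-based reference
// offsets, relative to the alignment start, at which an MD:Z value reports
// a mismatch. Deleted reference bases advance the offset but are not
// reported.
func mismatchOffsets(md string) ([]int, error) {
	var offsets []int
	refOff := 0
	i := 0
	for i < len(md) {
		c := md[i]
		switch {
		case c >= '0' && c <= '9':
			// A run of digits is a count of matching bases.
			j := i
			for j < len(md) && md[j] >= '0' && md[j] <= '9' {
				j++
			}
			n, err := strconv.Atoi(md[i:j])
			if err != nil {
				return nil, err
			}
			refOff += n
			i = j
		case c == '^':
			// '^' introduces one or more deleted reference bases.
			i++
			for i < len(md) && isBase(md[i]) {
				refOff++
				i++
			}
		case isBase(c):
			// A bare letter is the reference base at a mismatch.
			offsets = append(offsets, refOff)
			refOff++
			i++
		default:
			return nil, fmt.Errorf("unexpected byte %q in MD value %q", c, md)
		}
	}
	return offsets, nil
}

func main() {
	offs, err := mismatchOffsets("10A5^AC6T0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(offs)
}
```

Combined with the strand flag of each read, these offsets are enough to decide whether a read supports REF or ALT at a site that falls in an aligned (non-deleted) part of the read.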

Dan Kortschak

Jul 13, 2016, 5:46:09 PM
to Brent Pedersen, biogo...@googlegroups.com

Do you want the reads or the stats for the reads?

Assuming you want the stats for each position of the relevant reads, then the choice of approach comes down to a number of factors: the size of the genome, the sparsity of the relevant read mapping, and the depth of reads at the relevant sites.

If you have a small genome, just use large slices of numeric values to store the stats. If it's large and sparse, use a step vector like the one provided by biogo/store/step. If it's large and not sparse, make sure you have a large enough machine, though if the BAM is sorted you should be able to use mmapped data without too much pain, since you would expect reasonable locality.

The numeric type depends on how deep the sequence is going to be at the relevant sites given how many attributes you are following.

Once you've decided all of those it's just a matter of iterating over a bam reader's output and counting positions.
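As a rough illustration of the small-genome case described above, the sketch below keeps one counter struct per reference position in a dense slice. It is plain Go and does not call biogo; the types siteCounts and observation and the function tally are hypothetical names, and observation stands in for whatever is derived from each record a BAM reader yields. The uint32 counters are an assumption about depth; a narrower or wider type may suit other data.

```go
package main

import "fmt"

// siteCounts holds the four counters asked for in the thread.
// uint32 is an assumption about the maximum depth at a site.
type siteCounts struct {
	RefFwd, RefRev, AltFwd, AltRev uint32
}

// observation is a hypothetical, already-decoded view of one read base
// aligned to one reference position.
type observation struct {
	Pos     int  // 0-based reference position
	Reverse bool // read is mapped to the reverse strand
	IsRef   bool // read base equals the reference base
}

// tally adds each observation to a dense, per-position slice.
// Observations outside the slice are ignored.
func tally(counts []siteCounts, obs []observation) {
	for _, o := range obs {
		if o.Pos < 0 || o.Pos >= len(counts) {
			continue
		}
		c := &counts[o.Pos]
		switch {
		case o.IsRef && !o.Reverse:
			c.RefFwd++
		case o.IsRef && o.Reverse:
			c.RefRev++
		case !o.IsRef && !o.Reverse:
			c.AltFwd++
		default:
			c.AltRev++
		}
	}
}

func main() {
	// One entry per position of a (tiny) 100 bp reference.
	counts := make([]siteCounts, 100)
	tally(counts, []observation{
		{Pos: 42, Reverse: false, IsRef: true},
		{Pos: 42, Reverse: false, IsRef: true},
		{Pos: 42, Reverse: true, IsRef: true},
		{Pos: 42, Reverse: true, IsRef: false},
	})
	fmt.Printf("%+v\n", counts[42])
}
```

For a large, sparse data set the same tally loop would apply, with the dense slice swapped for a sparse structure such as the step vector mentioned above.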

--
afk
Dan

Brent Pedersen

Jul 13, 2016, 5:57:23 PM
to Dan Kortschak, biogo...@googlegroups.com
My use-case is embedded in a streaming view of the data. With that, I
can collect reads for each query site.

Maybe I'm missing something in biogo but I was wondering about getting
the match/mismatch/indel status for a given genomic coordinate for
each read that overlaps it.


Dan Kortschak

Jul 13, 2016, 6:11:09 PM
to Brent Pedersen, biogo...@googlegroups.com

Getting that information is, as they say, "a simple matter of programming".

The relevant helper, though, is Consume. See https://godoc.org/github.com/biogo/hts/sam#ex-Consume.
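To show the idea behind a consume-based walk, here is a minimal sketch that finds what a read says about a single reference coordinate. It re-implements the standard SAM "consumes query / consumes reference" table in plain Go rather than calling biogo, so the names cigarOp, consumes, status and statusAt are all hypothetical stand-ins. It only separates aligned bases from deletions and skips; telling a match from a mismatch still means comparing the read base at the returned query offset with the reference, or consulting MD:Z.

```go
package main

import "fmt"

// cigarOp is a hypothetical stand-in for one CIGAR operation.
type cigarOp struct {
	Type byte // one of M, I, D, N, S, H, P, =, X
	Len  int
}

// consumes reports whether an operation advances the query and/or the
// reference, following the table in the SAM specification.
func consumes(t byte) (query, ref bool) {
	switch t {
	case 'M', '=', 'X':
		return true, true
	case 'I', 'S':
		return true, false
	case 'D', 'N':
		return false, true
	}
	return false, false // H and P consume neither
}

type status int

const (
	notCovered status = iota // the read does not span the coordinate
	aligned                  // a read base is aligned here (match or mismatch)
	deleted                  // the coordinate falls in a deletion
	skipped                  // the coordinate falls in a reference skip (N)
)

var statusNames = [...]string{"not covered", "aligned", "deleted", "skipped"}

// statusAt walks the CIGAR of a read starting at 0-based reference position
// start and reports the status at target. For aligned positions it also
// returns the 0-based offset into the read sequence; otherwise -1.
func statusAt(start int, cigar []cigarOp, target int) (status, int) {
	refPos, queryPos := start, 0
	for _, op := range cigar {
		q, r := consumes(op.Type)
		if r && target >= refPos && target < refPos+op.Len {
			switch {
			case q:
				return aligned, queryPos + (target - refPos)
			case op.Type == 'D':
				return deleted, -1
			default:
				return skipped, -1
			}
		}
		if q {
			queryPos += op.Len
		}
		if r {
			refPos += op.Len
		}
	}
	return notCovered, -1
}

func main() {
	// 5S10M2D8M, aligned at reference position 100.
	cigar := []cigarOp{{'S', 5}, {'M', 10}, {'D', 2}, {'M', 8}}
	for _, target := range []int{99, 105, 110, 112, 120} {
		st, off := statusAt(100, cigar, target)
		fmt.Printf("ref %d: %s (query offset %d)\n", target, statusNames[st], off)
	}
}
```

In a streaming setting, the same walk can be run once per overlapping read for each query site, and the result fed into per-site counters.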

--
afk
Dan


Brent Pedersen

Jul 13, 2016, 6:14:11 PM
to Dan Kortschak, biogo...@googlegroups.com
On Wed, Jul 13, 2016 at 4:11 PM, Dan Kortschak
<dan.ko...@adelaide.edu.au> wrote:
> Getting that information is, as they say, "a simple matter of programming".

that's what I was trying to avoid.


> The relevant helper though is through Consume. See
> https://godoc.org/github.com/biogo/hts/sam#ex-Consume.
>

ok. I'll check that out.

Dan Kortschak

Jul 13, 2016, 6:33:28 PM
to Brent Pedersen, biogo...@googlegroups.com

That's the joke.

--
afk
Dan


Brent Pedersen

Nov 9, 2016, 2:51:20 PM
to biogo...@googlegroups.com
I just made the pileup library based on biogo public here:
https://github.com/brentp/bigly
The name is less amusing today than when I started the project, but
the functionality is pretty nice.

Dan Kortschak

Nov 9, 2016, 8:53:14 PM
to Brent Pedersen, biogo...@googlegroups.com
Nice.

You could use gonum/plot for the plotting functionality.

I'll send a PR later fixing some typos I noticed.

Dan

Brent Pedersen

Nov 9, 2016, 8:59:02 PM
to Dan Kortschak, biogo...@googlegroups.com
On Wed, Nov 9, 2016 at 6:52 PM, Dan Kortschak
<dan.ko...@adelaide.edu.au> wrote:
> Nice.
>
> You could use gonum/plot for the plotting functionality.

I should, I meant the main functionality to be a pileup-like tool with
the plot showing the utility.

>
> I'll send a PR later fixing some typos I noticed.
>

Please do. I just fixed the horrible ones at the top of the README.