FRiP metric

TRypdal

Mar 6, 2013, 1:30:50 PM

to idr-d...@googlegroups.com

Hi

I'm reading the Landt 2012 paper detailing the use of IDR in ENCODE projects, and I stumbled on the definition of FRiP (fraction of reads in peaks). I was wondering if you know exactly how it is defined, and whether a suitable package to calculate it exists. I would imagine it's simply a matter of scanning the MACS output BED file to count the reads in peaks, but I still thought I'd ask whether something different is done to obtain the metric. Thanks!

Bob Thurman

Mar 6, 2013, 1:45:52 PM

to idr-d...@googlegroups.com

I believe that's just how it is calculated, so with the appropriate utilities and peak calls it is pretty easy to do. In our lab we have developed and studied another version of that metric, called SPOT (signal portion of tags), which computes the fraction of reads in "hotspots": regions of tag enrichment that are like peaks but generally more arbitrarily sized. We have an analysis (still unpublished, but hopefully on its way) showing that, for a given assay/mark, SPOT tracks the strand cross-correlation metric that Anshul has developed pretty well. Unlike the correlation metric, FRiP-like metrics such as SPOT obviously depend on the peak caller used, so I think the key is to settle on a particular peak caller and use the same settings each time in order to compare scores properly. In other words, FRiP scores computed with MACS shouldn't be compared with those computed with SPP, or with SPOT scores. However, as you've discovered, it's conceptually a pretty easy metric to understand, and we've used SPOT for our own DNase-seq/ChIP-seq data for several years and found it to be pretty reliable.

If you're interested, code to calculate hotspots and the SPOT metric is available here:

http://www.uwencode.org/proj/hotspot/

Best,

Bob
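
Written out, the definition Bob confirms is the following, where $\mathcal{M}$ is the set of mapped reads and $P$ is the union of the peak regions $p_i$ (the notation is ours, not from the thread). SPOT has the same form with the hotspot regions $H$ in place of $P$, which is why the score depends on which caller, and which settings, produced the regions:

```latex
\mathrm{FRiP} = \frac{\left|\{\, r \in \mathcal{M} : r \cap P \neq \emptyset \,\}\right|}{|\mathcal{M}|},
\qquad P = \bigcup_i p_i

\mathrm{SPOT} = \frac{\left|\{\, r \in \mathcal{M} : r \cap H \neq \emptyset \,\}\right|}{|\mathcal{M}|}
```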

TRypdal

Mar 6, 2013, 2:35:27 PM

to idr-d...@googlegroups.com

Bob

that is great info, and thank you for providing a link to your code. I will try it out for sure.

Best,

T

Anshul Kundaje

Mar 6, 2013, 11:20:42 PM

to Bob Thurman, idr-d...@googlegroups.com

Yup, I second Bob. We used the SPP+IDR peak calls to compute FRiP, as they are very stable for TF ChIP-seq data.

Anshul.

TRypdal

Mar 21, 2013, 2:53:25 PM

to idr-d...@googlegroups.com, Bob Thurman

Anshul - which IDR output provides the number of reads contributing to each peak?

Thanks

Anshul Kundaje

Mar 21, 2013, 3:27:09 PM

to idr-d...@googlegroups.com, Bob Thurman

IDR does not do that. IDR just gives you a stable set of reproducible peaks. All you need to do is take the IDR peaks, count the number of reads that fall within those peaks, and divide by the total number of mapped reads. That will give you the FRiP (fraction of reads in peaks).

-A
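
A minimal Python sketch of the computation Anshul describes, assuming the aligned reads have already been extracted from the BAM file as (chrom, start, end) intervals in BED-style 0-based, half-open coordinates (for example with `bedtools bamtobed`), and assuming a read counts as "in a peak" if it overlaps one by at least 1 bp. The function names `merge_intervals` and `frip` are hypothetical, not part of IDR. Overlapping peaks are merged first so that each read is counted at most once in the numerator.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of mapped reads overlapping at least one peak (hypothetical helper).

    reads, peaks: iterables of (chrom, start, end), 0-based half-open.
    Every read counts toward the denominator (total mapped reads).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    total = in_peaks = 0
    for chrom, start, end in reads:
        total += 1
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Merged peaks are sorted and disjoint, so the last peak starting
        # before the read ends is the only one that needs checking.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and ivs[i][1] > start:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

For example, with reads at chr1:0-50, chr1:100-150, chr1:190-240 and chr2:10-60, and peaks at chr1:120-200 and chr1:180-300, two of the four reads overlap the merged peak chr1:120-300, giving a FRiP of 0.5.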

TRypdal

Mar 21, 2013, 3:47:08 PM

to idr-d...@googlegroups.com

Right, so I guess to make it even quicker I could just go back to my peak caller's raw per-peak read counts and use those, obviously only for the intervals present in the IDR output?

Thanks

Georgi Marinov

Mar 21, 2013, 3:53:54 PM

to idr-d...@googlegroups.com

You would want to calculate that using the regions you get from IDR and the BAM file, not the output of the peak caller.

Georgi

TRypdal

Mar 21, 2013, 3:58:36 PM

to idr-d...@googlegroups.com

Georgi,

thanks. I was under the impression that the IDR-processed peaks were in all cases a subset of the peak caller's raw peak set. I will only use my BAM file and the IDR output, then.

Anshul Kundaje

Mar 21, 2013, 4:08:31 PM

to idr-d...@googlegroups.com

You are right that the IDR peaks are a subset of the relaxed peak set. But what Georgi meant is that you should use the reads from your BAM/alignment files. The peak caller does not output read counts.

-A

TRypdal

Mar 21, 2013, 4:22:47 PM

to idr-d...@googlegroups.com

Actually, some do. I might be wrong, but I think I have seen per-peak read counts in a MACS2 .xls output file.

Georgi Marinov

unread,

Mar 21, 2013, 4:25:16 PM3/21/13

to idr-d...@googlegroups.com

Some indeed do, but it's not a universal feature, so if one is to build a flexible pipeline that can incorporate different peak callers into the IDR framework, the time to calculate FRiP scores is post-IDR.

Georgi
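
Georgi's point can be sketched as a post-IDR pipeline stage that reads only the first three BED columns of the IDR output, so it works the same whichever peak caller produced the regions and whether or not that caller reported per-peak read counts. The function names `read_bed_regions` and `frip_post_idr` are hypothetical, and the reads are assumed to have been extracted from the BAM file as (chrom, start, end) intervals beforehand.

```python
def read_bed_regions(lines):
    """Parse BED-like lines into (chrom, start, end) tuples.

    Only the first three columns are used; caller-specific extra columns
    (scores, p-values, read counts) are ignored.
    """
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank, comment, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def frip_post_idr(idr_bed_lines, reads):
    """FRiP from IDR regions plus reads taken from the BAM (hypothetical helper).

    A read counts once if it overlaps any IDR region by at least 1 bp.
    The linear scan per read is only for illustration; real data needs an
    interval index or a tool such as bedtools.
    """
    peaks_by_chrom = {}
    for chrom, start, end in read_bed_regions(idr_bed_lines):
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    total = in_peaks = 0
    for chrom, start, end in reads:
        total += 1
        if any(p_start < end and start < p_end
               for p_start, p_end in peaks_by_chrom.get(chrom, [])):
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

Because the stage never looks past column 3, swapping MACS for SPP (or any other caller) upstream of IDR leaves this step unchanged.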
