When Signal Enrichment Value is absent

Dataminer

Feb 19, 2013, 7:05:24 AM
to idr-d...@googlegroups.com
Hi!

I know that if the p-value and q-value are absent you can put -1, but if the signal enrichment is absent, what value can be put in?


Please reply; I have already applied for group membership.

Thank you

Best regards

Anshul Kundaje

Feb 19, 2013, 7:13:11 AM
to idr-d...@googlegroups.com
You can put any measure that can be used to rank peaks in the signal value, p-value, or q-value column of the narrowPeak file format and then select which column you want IDR to use to analyze rank behavior. For the purposes of IDR, you don't have to use -1 when the p-values or q-values are missing; that convention has more to do with keeping the narrowPeak format compatible with the UCSC Genome Browser than with anything IDR requires. At least one of the three columns (those reserved for signal value, p-value, and q-value) must be provided, but you can put whatever ranking measure you want in any of them and specify to IDR which column to use.

The main thing is that the ranking measure can be any quantitative ranking measure for peaks, but it should have the following properties:
- Higher values of the ranking measure represent stronger/higher-confidence peaks.
- The ranking measure is more consistent for higher-confidence peaks and less consistent for lower-confidence peaks (you can visualize this as a scatter plot of the ranks of peaks common to both replicates, where the peaks are ranked by your ranking measure. You should see stronger correlation for the higher-confidence, i.e. better-ranked, peaks and worse correlation for the weaker, noisy peaks).
- The ranking measure should not have too many ties among peaks.
- Make sure you use a relatively relaxed peak-calling threshold and provide a sufficient number of peaks down the ranked list so that IDR can see a sufficient noise component.
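
As a rough way to check the tie and consistency properties listed above before running IDR, something like the following could be used. This is only a sketch: the function names are hypothetical, it assumes the peaks have already been matched between replicates (one score per replicate for each matched peak), and splitting the list into halves by replicate 1 rank is a crude stand-in for looking at the scatter plot.

```python
from collections import Counter


def tie_fraction(scores):
    """Fraction of peaks whose score is shared with at least one other peak."""
    counts = Counter(scores)
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(scores) if scores else 0.0


def average_ranks(values):
    """Rank values so the highest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def top_vs_bottom_consistency(rep1, rep2):
    """Spearman correlation for the better-ranked and worse-ranked halves.

    rep1[i] and rep2[i] are the scores of the same matched peak in each
    replicate. The split uses the replicate 1 ranking (an arbitrary choice).
    """
    order = sorted(range(len(rep1)), key=lambda i: -rep1[i])
    half = len(order) // 2
    top, bottom = order[:half], order[half:]

    def corr(idx):
        return spearman([rep1[i] for i in idx], [rep2[i] for i in idx])

    return corr(top), corr(bottom)
```

A tie fraction close to 1 (for example, when every peak has -1 in the chosen column) means the column carries no usable ranking. For a well-behaved measure, the first value returned by `top_vs_bottom_consistency` would be expected to be clearly higher than the second.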

Thanks,
-A




Dataminer

Feb 19, 2013, 7:44:14 AM
to idr-d...@googlegroups.com
Hi Anshul,

Thank you for your reply.

A small follow-up question: I have broadPeak files from MACS2, which means I do not have signal enrichment, q-values, or p-values. If I put -1 in all three columns, every peak will be tied. How can I handle this situation, or how should I tailor broadPeak files from MACS2 to fit the IDR file format?

Any suggestions?

Thank you

Anshul Kundaje

Feb 19, 2013, 8:52:33 AM
to idr-d...@googlegroups.com
MACS2 does output signal fold-enrichments, p-values, and/or q-values for narrow peaks. If you are using it in broad mode and it's chaining smaller peaks to call larger regions, you could compute the fold-change or Poisson p-value of ChIP relative to input in each of these regions, or use the ChIP read density in each region, as the ranking measure and put it in the signal value column. MACS2 can also generate signal tracks with per-base fold-change values. You could try using these to compute, for example, the mean, median, or 75th or 90th percentile of the per-base fold-change values in each region.
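
To make the two suggestions above concrete, here is a minimal sketch of both kinds of per-region ranking measure. Everything in it is an assumption for illustration: the function names are hypothetical, the library-size scaling and pseudocount are one common way to compute a ChIP/input fold-change (not necessarily what MACS2 does internally), and the per-base values are assumed to have been extracted from the signal track already.

```python
import statistics


def region_fold_change(chip_reads, input_reads, chip_total, input_total,
                       pseudocount=1.0):
    """Fold change of ChIP over input in one region.

    Input counts are scaled to the ChIP library size, and a pseudocount
    (an assumption, to avoid division by zero) is added to both sides.
    """
    scale = chip_total / input_total
    return (chip_reads + pseudocount) / (input_reads * scale + pseudocount)


def summarize_per_base(values):
    """Summaries of the per-base fold-change values inside one region.

    Needs at least two values, because statistics.quantiles requires that.
    """
    # n=100 gives 99 cut points; index 74 is the 75th percentile and
    # index 89 is the 90th percentile.
    q = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "q75": q[74],
        "q90": q[89],
    }
```

Any one of these numbers could then go into the signal value column, as long as the same choice is used for every peak in both replicates.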

Btw, the IDR pipeline has not been optimized for broad peaks so I would not recommend using it as is without very careful analysis.

Thanks,
-A

Dataminer

Feb 20, 2013, 4:18:50 AM
to idr-d...@googlegroups.com
Hi!

Yes, I know that the IDR pipeline is not optimized for broad peaks, but I would still like to give it a try.

One last query, regarding the file format.

The file format which we get from Macs2 BroadPeak is:

Chr    Strt    Stp    Name    100*-log10Pvalue   Strand   StartofFirstNarrowPeak    END   RGB   No._of_blocks   length_of_each_block   starts_of_each_blocks

1    13917    27329    peak_1    32    .    13985    27313    0    6    1,6824,1190,1767,315,1    0,68,8110,9399,13081,13411

Now, when I prepare a narrowPeak-format file for IDR analysis, can I put the following in the columns?
Chr    Strt     Stp      Name      Score          Strand    Signal_Value    -log10pvalue     -log10qvalue    peak
1      13917    27329    peak_1    20 (absent)    .         ?               0.32 (32/100)    -1              -1

To get 0.32 in the -log10pvalue column, I divided the 100*-log10Pvalue column from the broad peak file by 100. Is this OK?
And in the signal value column, can I put the ratio of treatment to control read counts for each peak?

After this, would it be better to use the signal value or the p-value to compute IDR?
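
For illustration, the conversion described above could be sketched as follows. The function name and parameters are hypothetical. The scale factor of 100 is taken from the column description given above and should be checked against the MACS2 version in use, the score column is filled with a placeholder, and the signal value has to be computed separately and passed in.

```python
def broad_to_narrowpeak(line, signal_value, score_scale=100.0, score=0):
    """Turn one MACS2 broad-peak line into a 10-column narrowPeak-style line.

    Column 5 of the input is assumed to hold score_scale * -log10(p-value).
    The q-value and peak (summit) columns are set to -1, meaning "not given".
    """
    fields = line.split()
    chrom, start, end, name, scaled, strand = fields[:6]
    neg_log10_p = float(scaled) / score_scale
    out = [
        chrom, start, end, name,
        str(score),               # placeholder score
        strand,
        f"{signal_value:.4f}",    # signalValue: ranking measure computed elsewhere
        f"{neg_log10_p:.4f}",     # pValue column (-log10 scale)
        "-1",                     # qValue not available
        "-1",                     # summit offset not available
    ]
    return "\t".join(out)
```

With the example line above and a made-up signal value of 2.5, the p-value column comes out as 0.3200 when the scale factor is 100.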


Please help me in solving this puzzle.

Thank you

Anshul Kundaje

Feb 22, 2013, 10:55:08 AM
to idr-d...@googlegroups.com
You should try both measures and see which one gives you better results, i.e., compare the IDR plots that show the IDR scores of peaks as you go down the ranked list of peaks (the batch-consistency-plot routine in the idr package). The measure that gives you better reproducibility is the one you will want to use.

It will also help to create scatter plots of the ranks of all overlapping peaks between replicates, once using signal value as the ranking measure and once using p-values. Sometimes one of these measures has strange characteristics, in which case it is easy to eliminate it. If you create these plots (the IDR plots and the scatter plots) and send them to the mailing list, I or others can advise you further.
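
A minimal sketch of how the points for such a rank scatter plot might be prepared is shown below. The function names and the `(chrom, start, end, score)` tuple layout are assumptions for illustration, ranks are assigned among all peaks of each replicate (one possible choice), and the all-against-all overlap test is quadratic, so it only suits small peak lists.

```python
def rank_desc(scores):
    """Rank 1 = highest score; ties are broken by input order (a simplification)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def overlapping_rank_pairs(rep1, rep2):
    """Return (rank in replicate 1, rank in replicate 2) for overlapping peaks.

    rep1 and rep2 are lists of (chrom, start, end, score) tuples with
    half-open coordinates, as in BED-style files.
    """
    ranks1 = rank_desc([p[3] for p in rep1])
    ranks2 = rank_desc([p[3] for p in rep2])
    pairs = []
    for i, (c1, s1, e1, _) in enumerate(rep1):
        for j, (c2, s2, e2, _) in enumerate(rep2):
            # Two intervals overlap if each starts before the other ends.
            if c1 == c2 and s1 < e2 and s2 < e1:
                pairs.append((ranks1[i], ranks2[j]))
    return pairs
```

Running this once with signal values as the score and once with p-values gives two sets of points, which can be plotted with any plotting tool and compared side by side.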

Thanks
Anshul.
