What are signal p-values for ChIP-Seq?


#TAY KAI YI#

Jul 14, 2022, 11:43:09 AM
to gen...@soe.ucsc.edu

Hello,

I wish to compare histone ChIP-Seq marks between samples and extract the respective peak values.

However, I am quite confused about what signal p-values actually represent. The ENCODE website mentioned that they are used to reject the null hypothesis that the signal at that location is present in the control.

However, I observed that the larger the peak, the greater the y-axis value (which I believe is the signal p-value).

Could you kindly help me understand more clearly what the signal p-value is and how to interpret the results?

Thank you!

Best regards,

Kai Yi

Gerardo Perez

Jul 20, 2022, 12:33:54 AM
to #TAY KAI YI#, gen...@soe.ucsc.edu

Hello Kai Yi,

Thank you for your interest in the Genome Browser and your question about signal p-values for ChIP-Seq.

Could you clarify exactly which assembly and track(s) you are looking at? We have the impression that you might be asking about hg19's Broad Histone signals track (https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHistone), which is a bigWig track. Or are you asking about a peaks track, such as the hg38 TF ChIP track (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&c=chrX&g=encTfChipPk), which is BED-based and has a pValue field: http://genome.ucsc.edu/FAQ/FAQformat.html#format13
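For a BED-based peaks file, the pValue field can be read directly. The sketch below assumes the ENCODE narrowPeak layout (BED6+4), in which column 8 (`pValue`) stores -log10(p-value) and -1 means no p-value was assigned. The track linked above may use a different BED variant, so check its format description first. The function name `read_peak_pvalues` and the example records are hypothetical.

```python
import math

def read_peak_pvalues(lines):
    """Yield (chrom, start, end, neg_log10_p, p) from narrowPeak-style lines.

    Assumes column 8 (index 7) holds -log10(p-value), with -1 meaning "not assigned".
    """
    for line in lines:
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        neg_log10_p = float(fields[7])
        # Convert back to a plain p-value; keep the -1 sentinel as None.
        p = None if neg_log10_p == -1 else 10 ** (-neg_log10_p)
        yield chrom, start, end, neg_log10_p, p

# Hypothetical records: the first has pValue 5.0, the second has no p-value (-1).
example = [
    "chr1\t100\t600\tpeak1\t0\t.\t12.5\t5.0\t3.2\t250\n",
    "chr1\t900\t1200\tpeak2\t0\t.\t3.1\t-1\t-1\t150\n",
]
for record in read_peak_pvalues(example):
    print(record)
```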

Please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.

#TAY KAI YI#

Jul 25, 2022, 4:30:56 PM
to Gerardo Perez, gen...@soe.ucsc.edu
Hi Gerardo,

Thank you for your reply. Yes, I am talking about the ChIP-Seq bigWig tracks. They are available as two different file types: signal p-value and fold change over control.

I hope to get a better understanding of what the signal p-value actually means and how it was derived. I would really appreciate it if you could explain it to me or point me to any readings.

Thank you!


Brian Lee

Jul 29, 2022, 6:16:25 PM
to #TAY KAI YI#, Gerardo Perez, gen...@soe.ucsc.edu

Hello Kai Yi,

Thank you for using the UCSC Genome Browser and your question about p-values for ChIP-Seq.

Most likely, the value graphed in the bigWig track you are looking at is -log10(p-value), which is commonly used in plots. For a plain p-value (which you can read about online, https://en.wikipedia.org/wiki/P-value), the smaller the number, the better: 0.01 is a good value (depending on the context), while a smaller value such as 0.001 is better. But without a really unusual y-axis, it is hard to make an intuitive plot where "better" means the signal increases instead of decreases. So when you see "p-value" next to a plot whose y-axis has values from about 0 to 10, what is being displayed is often not the p-value itself but -log10(p-value). For example, -log10(0.01) is 2 and -log10(0.00001) is 5. This way, the higher the plot, the smaller the real p-value, and the stronger the evidence against the null hypothesis.
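The conversion described above takes only a few lines of Python. This is a minimal sketch; the function names are illustrative, and it assumes the bigWig values are -log10(p-value) as described.

```python
import math

def to_neg_log10(p):
    """Convert a p-value (0 < p <= 1) to the -log10 scale used on plot y-axes."""
    if not 0 < p <= 1:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p)

def from_neg_log10(score):
    """Recover the plain p-value from a -log10(p-value) signal value."""
    return 10 ** (-score)

# The examples from the text: a smaller p-value gives a taller plot.
for p in (0.01, 0.001, 0.00001):
    print(p, round(to_neg_log10(p), 6))

# Going the other way: a bigWig value of 5 corresponds to p = 1e-5.
print(from_neg_log10(5))
```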

Note, however, that the precise meaning of a p-value is hard to grasp; its misuse is widespread and has been a major topic in metascience, which is the use of scientific methodology to study science itself. Even our own internal team has been tripped up by p-value pitfalls, so you are in good company when you ask about it. For instance, in one published, peer-reviewed paper, it was only understood later that two p-values placed side by side had been interpreted incorrectly, because the comparison did not take into account the relative occurrence of the two items, and the items in question did not have the same effect size.
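One way to see why p-values from different datasets are hard to juxtapose: under a Poisson count model (the kind of model peak callers such as MACS2 use for ChIP-seq, though not necessarily identical to what produced any particular ENCODE track), the same fold enrichment over control yields very different p-values depending on how many reads were sequenced. The read counts below are made-up illustrative numbers, not ENCODE data, and the function names are hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        # Poisson probability mass at i, computed in log space to avoid overflow.
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        # Past the mode (i > lam) terms only shrink, so stop once they are negligible.
        if i > lam and term <= total * 1e-17:
            return total
        i += 1

def neg_log10_pvalue(chip_reads, control_reads):
    """-log10 P(observing >= chip_reads | Poisson with mean control_reads)."""
    return -math.log10(poisson_upper_tail(chip_reads, control_reads))

# Same 2-fold enrichment over control, at a shallow and a 10x deeper sequencing depth.
for chip, control in ((20, 10), (200, 100)):
    print(f"fold change {chip / control:.1f}, "
          f"-log10(p) = {neg_log10_pvalue(chip, control):.2f}")
```

Both rows have the same fold change, yet the deeper one has a far larger -log10(p), so a taller p-value track does not by itself mean a larger effect.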

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please send new questions to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum to help others find answers to similar questions. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu, which is a private internal list to our support team.

All the best,


#TAY KAI YI#

Aug 1, 2022, 1:49:19 PM
to Brian Lee, Gerardo Perez, gen...@soe.ucsc.edu

Hello Brian,

Thank you for your reply! The interpretation of the signal p-value is much clearer to me now.

I am rather intrigued by your explanation of juxtaposing two p-values from data with different sample sizes. In your opinion, what would be the best way to compare bigWig signals from two samples (run by different laboratories, and hence with different effect sizes)?

I recently came across this ENCODE paper (https://www.nature.com/articles/s41586-020-2493-4), which uses Z-scores to compare signals within a single sample. It also mentions that:

“Z-score computation is necessary for the signals to be comparable across biosamples because the uniform processing pipelines for DNase-seq and ChIP–seq data produce different types of signals. The DNase-seq signal is in sequencing-depth-normalized read counts, whereas the ChIP–seq signal is the fold change of ChIP over input. Even for the ChIP–seq signal, which is normalized using a control experiment, substantial variation remains in the range of signals among biosamples.”

What is your opinion on using Z-scores to compare a single signal between biosamples from different laboratories? If I follow their methodology (I am not a statistician or a bioinformatician, so I might be completely wrong in my interpretation of this paper):

  1. Run bigWigAverageOverBed on the bigWig fold-change files
  2. Obtain a Z-score for each ‘peak of interest’ compared with the rest of the ‘peaks of interest’ in each biosample
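The two steps above could be sketched in Python as follows. This assumes the tab-separated output of UCSC's bigWigAverageOverBed (columns: name, size, covered, sum, mean0, mean), uses the `mean0` column as each peak's signal, and standardizes each peak against the other peaks in the same biosample. The function name and the example values are hypothetical, and the sketch does not attempt to reproduce every detail of the ENCODE paper's procedure (for example, whether any transformation is applied before computing Z-scores).

```python
import statistics

def zscores_from_bigwig_average(lines, column="mean0"):
    """Return {peak_name: Z-score} from bigWigAverageOverBed output lines.

    Each peak's average signal is compared with the mean and standard deviation
    of all peaks in the same file (i.e., the same biosample).
    """
    columns = ["name", "size", "covered", "sum", "mean0", "mean"]
    idx = columns.index(column)
    values = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        values[fields[0]] = float(fields[idx])
    vals = list(values.values())
    mu = statistics.mean(vals)
    sd = statistics.stdev(vals)  # sample standard deviation; needs at least 2 peaks
    return {name: (v - mu) / sd for name, v in values.items()}

# Hypothetical fold-change averages for four peaks in one biosample.
sample_a = [
    "peakA\t500\t500\t1000.0\t2.0\t2.0\n",
    "peakB\t500\t500\t2000.0\t4.0\t4.0\n",
    "peakC\t500\t500\t3000.0\t6.0\t6.0\n",
    "peakD\t500\t500\t4000.0\t8.0\t8.0\n",
]
for name, z in zscores_from_bigwig_average(sample_a).items():
    print(name, round(z, 3))
```

Because each biosample is standardized against its own peaks, the resulting Z-scores are unitless; whether that is enough to compare biosamples from different laboratories depends on assumptions this sketch does not test.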

Would I be able to compare the peak Z-scores directly? Or would I have to consider underlying assumptions, e.g., the same effect size?

Thank you, and I look forward to your advice.

Best regards,

Kai Yi


Daniel Schmelter

Aug 2, 2022, 8:37:45 PM
to #TAY KAI YI#, Brian Lee, Gerardo Perez, gen...@soe.ucsc.edu

Hello, Kai Yi,

Thank you for using the UCSC Genome Browser and for your questions about comparing signals and scores.

Unfortunately, this support forum is limited to questions about using the UCSC Genome Browser and its tools and data. My colleague's personal experience should be considered independent of your specific situation. We are not qualified to offer statistical or scientific methodology advice, only to advise on using our website. Your question might be better suited to a PI or the Biostars forum.

I wish you the best with your investigations. If you have any more questions, please include the email "gen...@soe.ucsc.edu" in the recipient field to reach our whole team. This email is publicly archived.

All the best,
Daniel Schmelter
UCSC Genome Browser

