performing statistics on normalized data


Michael Sullivan

Aug 7, 2013, 5:08:21 PM
to met...@magpie.bio.indiana.edu
I've been analyzing enzyme activity in crude extracts using three different substrates. My replicates for this analysis are independently prepared extracts. There is enough variation in activity between these extracts that, although the trend of substrate preference is pretty obvious (the best substrate is always the best, the worst is always the worst, and the middling is always the middling), statistical analysis (ANOVA) of the specific activity data (pkat/mg protein) does not show significant differences. However, if I normalize the data derived from each extract to the substrate with the highest activity, the data show relatively little variation between the different extracts. Of course, the normalization means that the best substrate's relative value is 1.00 +/- 0.000. Is normalizing in this way problematic for doing ANOVA or other statistical tests? It seems like it shouldn't be, but a colleague suggested it could be problematic (although he wasn't certain it was truly a problem either).

Thanks for any insight any of you can provide
---
Michael L. Sullivan, PhD
Research Molecular Geneticist
US Dairy Forage Research Center
1925 Linden Drive
Madison, WI 53706
608-890-0046 (Phone)
608-890-0076 (FAX)


Irit Rappley

Aug 7, 2013, 8:18:30 PM
to Michael Sullivan, met...@magpie.bio.indiana.edu
You could do a repeated measures ANOVA. That means that all of the samples from replicate #1 are on the same row, etc, and the analysis takes into account that they would trend together. Then, in the post-ANOVA analysis, you can compare each column (substrate) to the best substrate. This is very easy to set up in a statistical analysis software package like Prism. Not sure exactly how you would do it in Excel, but I'm sure Google could help.
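A minimal sketch of what the repeated-measures layout changes, assuming made-up activities (pkat/mg protein) for three hypothetical extracts; the numbers and the function name are illustrative only and are not data from this thread. It computes the F statistic twice: once treating all nine values as independent (one-way ANOVA), and once with the extract-to-extract variation removed from the error term.

```python
# Hypothetical activities (pkat/mg protein); rows = extracts, columns = substrates.
data = [
    [10.0, 7.0, 4.0],
    [20.0, 16.0, 13.0],
    [30.0, 27.0, 25.0],
]

def anova_f(rows):
    """Return (one-way F, repeated-measures F) for a rows-by-columns table."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    row_means = [sum(r) / k for r in rows]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_substrate = n * sum((m - grand) ** 2 for m in col_means)
    ss_extract = k * sum((m - grand) ** 2 for m in row_means)
    ms_substrate = ss_substrate / (k - 1)
    # One-way: extract variation stays in the error term, df = k * (n - 1).
    f_oneway = ms_substrate / ((ss_total - ss_substrate) / (k * (n - 1)))
    # Repeated measures: extract variation is removed, df = (k - 1) * (n - 1).
    ss_resid = ss_total - ss_substrate - ss_extract
    f_rm = ms_substrate / (ss_resid / ((k - 1) * (n - 1)))
    return f_oneway, f_rm

f_oneway, f_rm = anova_f(data)
```

With these invented numbers the one-way F is well below 1 while the repeated-measures F is far above the tabulated 5% critical value for F(2, 4), roughly 6.94 (the value for F(2, 6), used by the one-way test, is roughly 5.14). The same consistent trend is lost or found depending on whether the extract is treated as a block.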

If you have a positive control -- a substrate that you know works well but is not one of the set that you are testing -- that would be better for normalization in my opinion.

It would be great to hear from someone with more expertise in statistics, though.

Hope this helps,
Irit


Irit Rappley, PhD
The Scripps Research Institute
BCC 265
10550 North Torrey Pines Road
La Jolla, CA 92037

858-784-9608





_______________________________________________
Methods mailing list
Met...@net.bio.net
http://www.bio.net/biomail/listinfo/methods

Adrian Ardelean

Aug 8, 2013, 1:55:43 AM
to methods@magpie.bio.indiana.edu, mlsu...@wisc.edu

Hi,

I feel that the truth is in your hands!
Please excuse my ignorance!
What did you extract? What do you mean by high, middle, and low?
A correction for the extraction will give appropriate values. I have no experience with enzyme work, but you must add a correction for uncertainty. What is your recovery?

With all the best from Transylvania,

Adrian

Ardelean Adrian, DVM
Cluj-Napoca, Romania


Michael Sullivan

Aug 8, 2013, 11:20:37 AM
to met...@magpie.bio.indiana.edu
Hi Alec,

Thanks for your reply. The concern of the colleague I mentioned was similar to yours: that the normalization somehow violates one of the underlying assumptions of ANOVA. His specific concern was that it violates the independent sampling assumption. I've gone over this topic in my statistics reference (the original edition of Myra Samuels's "Statistics for the Life Sciences") and don't see that the normalization would really violate this assumption. Conceptually, it seems no different from another routine normalization, i.e., correcting for the protein concentration of the extract. In a sense, the activity measurement with the best substrate is a measure of the concentration of active enzyme. If I had an antibody recognizing the enzyme, I could measure the amount of enzyme in the extract that way and use that for normalization. That approach doesn't seem as though it would violate the independent sampling assumption, and my normalization seems like the same thing. That the SEM for that sample goes to 0 seems like a red flag, but perhaps it isn't!

Mike

On Aug 7, 2013, at 8:42 PM, Alec Morley wrote:

> You could try a non-parametric ANOVA based on ranking. Its power is probably close to that of a parametric test on normally distributed data.
> I would think that your normalisation method would upset some basic assumptions of ANOVA based on interval data.
> Alec
>
> Alec Morley
> Emeritus Professor
> Department of Haematology and Genetic Pathology
> Flinders University


Deitiker, Philip R

Aug 8, 2013, 6:04:08 PM
to Met...@magpie.bio.indiana.edu
Since he is normalizing to the highest value, at every point of measurement the best substrate always has the value 1. He is thus fixing not only the average but, more importantly, the standard deviation of the best substrate at 0. This creates an experimental artifact.
What this does in effect is increase the relative standard deviations of the other two groups while suppressing the SD of the first group. Suppose the ANOVA null hypothesis is rejected: are you then going to repeat the test on, say, the smallest and largest while still normalizing to the largest? If you do a Student's t-test and assume unequal variances, then since the variance of the highest is zero, you will almost by definition show that there is a significant difference.
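A small sketch of the zero-variance point, using invented activities for three hypothetical extracts (not data from this thread). After normalizing each extract to its best substrate, the reference group is exactly 1, 1, 1, and an unequal-variance (Welch) t statistic against that group collapses to a one-sample t statistic of the other group against the constant 1.

```python
import math
import statistics

# Hypothetical raw activities (pkat/mg); rows = extracts, columns = best, middle, worst.
raw = [[10.0, 6.6, 3.4], [20.0, 12.8, 7.4], [40.0, 26.8, 13.6]]

# Normalize each extract to its own best substrate.
norm = [[x / row[0] for x in row] for row in raw]
best = [row[0] for row in norm]
worst = [row[2] for row in norm]

def welch_t(a, b):
    """Welch (unequal-variance) t statistic for two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def one_sample_t(a, mu):
    """One-sample t statistic for the mean of a against the constant mu."""
    return (statistics.mean(a) - mu) / (statistics.stdev(a) / math.sqrt(len(a)))

sd_best = statistics.stdev(best)      # the reference group has no spread at all
t_welch = welch_t(best, worst)
t_one = -one_sample_t(worst, 1.0)     # same magnitude and sign as t_welch
```

So the "two-group" comparison against the normalized reference is really a test of whether the other substrate's ratio differs from 1, with the reference contributing no error of its own.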

The Fisher test (FTEST) for variance has some utility here as long as your sample size is sufficient. Before you start, you want to know whether the level of unnormalized variation in the groups is similar or dissimilar. If the level of group variation is similar, then in subsequent t-tests you also have to carry that same assumption forward.

Why extracts vary is not simply a function of biological variation. During the extraction process there is variation in the amount of tissue and in the relative amount of solute used to extract the tissue; in addition, there is loss due to apparatus (such as homogenizers), etc. There should be some non-enzymatic standard for normalization, such as the Bradford assay or proteins. Even better, look at a composite score such as observed over average protein + observed over expected DNA.





Adrian Ardelean

Aug 9, 2013, 5:34:16 PM
to methods@magpie.bio.indiana.edu, mlsu...@wisc.edu
Hi,

Even if I need things explained as for a novice, Michael, I need some help from you.
Can you explain what the analyte is and how it is expressed? What are the measurement units? What is the assay, and is it qualitative or quantitative?
Is the assay optimised? What is the uncertainty for your ANALYTE?
At this point I think you have some of the answers, and soon you will find the rest.
You must validate the assay first, so that you know the LOD, LOQ, recovery, etc.
I am sure you have already done this. So, can you tell us something about their values?

With all the best from Transylvania,

Adrian

Ardelean Adrian, DVM
Cluj-Napoca, Romania

PS: Please excuse my English and typing, I just want to be helpful. :-)

Jonathan Rupp

Aug 9, 2013, 9:09:01 PM
to Deitiker, Philip R, Met...@magpie.bio.indiana.edu
I've had this same question. Next time I will remember non-enzymatic
standards for normalizing, but the samples are long gone. What about
normalizing to the total signal for each extract, so that each substrate is
a percentage of the total for that day, e.g. 60%, 30%, and 10%? It's still
introducing a bias; each substrate's variation is now proportional to the
total variation, but it's probably closer to reality than normalizing to
one of the three.
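A sketch of the percent-of-total idea on invented numbers (three hypothetical extracts, not data from this thread). It shows both effects mentioned above: the best substrate now gets a non-zero spread, but every extract's shares are forced to sum to 100, so the three columns cannot vary independently.

```python
import statistics

# Hypothetical raw activities (pkat/mg); rows = extracts, columns = substrates.
raw = [[10.0, 6.6, 3.4], [20.0, 12.8, 7.4], [40.0, 26.8, 13.6]]

def percent_of_total(row):
    """Express each activity as a percentage of the extract's summed activity."""
    total = sum(row)
    return [100.0 * x / total for x in row]

shares = [percent_of_total(row) for row in raw]

# Each extract's shares add up to 100: knowing two shares fixes the third.
row_sums = [sum(row) for row in shares]

# Spread of the best substrate's share across extracts (no longer forced to 0).
sd_best_share = statistics.stdev([row[0] for row in shares])
```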

Michael Sullivan

Aug 11, 2013, 12:22:17 PM
to Jonathan Rupp, Met...@magpie.bio.indiana.edu
Your suggestion is intriguing, but my intuition tells me that approach would have all the same problems as my original approach. The only thing it would add, I think, is the ability to calculate an error for the best substrate.

My "non-normalized" data (i.e., the data that are not normalized to the best substrate value) are actually already normalized to protein content (i.e., expressed as pkatal/mg protein). In theory, this should correct for extraction efficiency, etc. However, as DK rightly pointed out, any number of things could lead to the amount of ACTIVE enzyme differing from extraction to extraction, even if the tissue source is the same and even taking into account the protein content of the extracts, for example if the enzyme is especially labile.

I think in biochemistry, the approach I've taken (i.e., setting some particular condition to "1" and normalizing to that) is used a lot. And, looking at my data, the normalization produces results that are incredibly consistent across the replicates. As somebody else suggested, I have looked at some non-parametric approaches, but my data set appears to be too small for those approaches. Given how consistent my data are with the normalization approach I've used, it does seem like the measurements reflect reality, and it would seem silly to run additional replicates (the assay is not trivial to run) just to have a higher "n" for analysis. I rather suspect that if I ran two additional replicates, n would be big enough that I wouldn't need to normalize to see significant results anyway. But it seems silly that there doesn't seem to be a statistical approach for analyzing the type of data I've generated. Seriously, the differences revealed by the normalized data are so obviously significant! Check out the attached bar graphs! You can see the trend in the raw data, but the trend is so consistent within a given extract that there's almost no variation when viewed relative to the best activity.
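On the sample-size point, a sketch of an exact (permutation) version of the Friedman rank test. The number of extracts is not stated in this thread, so the values of n below are assumptions, as is the premise that every extract ranks the three substrates in the same order. The code enumerates every possible within-extract ranking and reports the smallest p-value a perfectly consistent trend can reach.

```python
from itertools import permutations, product

def friedman_stat(rank_rows):
    """Sum of squared deviations of the column rank sums from their expectation."""
    n, k = len(rank_rows), len(rank_rows[0])
    expected = n * (k + 1) / 2
    col_sums = [sum(r[j] for r in rank_rows) for j in range(k)]
    return sum((s - expected) ** 2 for s in col_sums)

def exact_friedman_p(rank_rows):
    """Fraction of all within-extract rank shufflings whose statistic is at
    least as large as the observed one."""
    n, k = len(rank_rows), len(rank_rows[0])
    observed = friedman_stat(rank_rows)
    all_rows = list(permutations(range(1, k + 1)))
    hits = total = 0
    for combo in product(all_rows, repeat=n):
        total += 1
        if friedman_stat(combo) >= observed - 1e-9:
            hits += 1
    return hits / total

# Assumed: every extract ranks the substrates best (3) > middle (2) > worst (1).
p_two = exact_friedman_p([(3, 2, 1)] * 2)     # 6/36 = 1/6
p_three = exact_friedman_p([(3, 2, 1)] * 3)   # 6/216 = 1/36, about 0.028
p_four = exact_friedman_p([(3, 2, 1)] * 4)    # 6/1296, about 0.0046
```

Under these assumptions two extracts can never reach p < 0.05 however consistent the ranking, whereas three or more can. The test only says that the substrates differ somewhere; rank-based pairwise follow-ups would need more extracts.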

Anyway, if I happen to find a way to analyze data like this, I'll share it with the group!

Mike

Pow Joshi

Aug 11, 2013, 12:54:04 PM
to Jonathan Rupp, Met...@magpie.bio.indiana.edu
Forgive me for this rather dumb question, but doesn't comparing specific
enzyme activities indicate that you have already normalized the enzymatic
activity to the total protein content of your extract preparation, which
means you would have had to perform Bradford or Lowry protein estimates
that are independent measures for normalization?

....
Pow

Michael L. Sullivan

Aug 11, 2013, 4:58:19 PM
to Pow Joshi, Met...@magpie.bio.indiana.edu
Yes, this is correct, at least for what I had described, except we are not talking about purified protein, we are talking about activity from a crude extract. And, as you might have read in the post by DK and I have also mentioned, there could be many ways in the course of extraction that the amount of active enzyme recovered might vary, even from the same/similar tissue samples (for example, if the enzyme is labile, perhaps it warms up a little more in one extraction than another, or perhaps the frozen tissue doesn't get mixed into the buffer quite fast enough in one case). Thus, it is not unexpected that the amount of activity recovered, even normalized to protein content, might not be the same between different crude tissue extract preparations.

In my experiment, I want to see how well the enzyme of interest uses different substrates, so to me it makes the most sense to normalize to one particular substrate. That way, it doesn't matter whether there is variation in the amount of active enzyme in different extract preparations. And, in fact, in my experiments, how well the various tested substrates are utilized is VERY consistent when I normalize this way, e.g., if the best substrate is 100%, the second substrate is always around 65% and the poorest substrate is always around 35%.
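One way to test such within-extract ratios without creating a group whose SD is forced to zero is a paired analysis of log ratios. This technique was not proposed in the thread and is offered only as a hedged sketch on invented numbers. For each extract, take log(activity with substrate X / activity with the best substrate) and ask whether the mean of those logs differs from 0 (a ratio of 1).

```python
import math
import statistics

# Hypothetical raw activities (pkat/mg); rows = extracts, columns = best, middle, worst.
raw = [[10.0, 6.6, 3.4], [20.0, 12.8, 7.4], [40.0, 26.8, 13.6]]

def log_ratio_t(rows, num, den):
    """Mean, SEM, and t statistic of log(row[num] / row[den]) tested against 0."""
    logs = [math.log(r[num] / r[den]) for r in rows]
    mean = statistics.mean(logs)
    sem = statistics.stdev(logs) / math.sqrt(len(logs))
    return mean, sem, mean / sem

mean_lr, sem_lr, t_lr = log_ratio_t(raw, 1, 0)   # middle substrate vs best
ratio = math.exp(mean_lr)                        # geometric-mean ratio, about 0.66
```

With three extracts the statistic has 2 degrees of freedom, for which the tabulated two-sided 5% critical value of t is about 4.303. The extract-to-extract differences in active enzyme cancel inside each ratio, which is the effect the normalization was meant to achieve.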

Mike

Dr Engelbert Buxbaum

Aug 12, 2013, 9:44:46 AM
Your problem is that the enzyme concentration in a crude extract (with
respect to total protein) can vary quite significantly from batch to
batch. The enzymatic activity under optimal conditions is a proxy for
the actual concentration (that is, enzyme concentration times turnover
number).

So you need to find a way to ensure that the actual amount of enzyme in
each assay is the same, not just the amount of protein. If you simply
use enzymatic activity under optimal conditions, measured once, as 100%,
then that data point has no variation for all batches. However, you
could measure it repeatedly for each batch, and set the average as 100%
for that batch. The repeat measurements would give you the standard
deviation. So, say, for a given batch, you measure 9, 10 and 11 pkat/mg;
then the average is 10 +/- 1 pkat/mg, and the activity 100 +/- 10% under
optimal conditions. For another batch, the numbers may be 14, 15 and 16
pkat/mg, giving you an average of 15 +/- 1 pkat/mg or 100 +/- 7% for
that batch. Then you can reference your measurements under other
conditions to these values.
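The arithmetic in this example can be checked with a few lines; the function name is illustrative, and the sample standard deviation (n - 1 in the denominator) is assumed.

```python
import statistics

def reference_activity(measurements):
    """Mean, sample SD, and SD as a percentage of the mean for repeat measurements."""
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)
    return mean, sd, 100.0 * sd / mean

# The two batches from the example above (pkat/mg).
batch_a = reference_activity([9, 10, 11])    # mean 10, SD 1, i.e. 100 +/- 10 %
batch_b = reference_activity([14, 15, 16])   # mean 15, SD 1, i.e. 100 +/- about 7 %
```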

Alternatively, you could find a substance that binds specifically to
your enzyme (e.g., an inhibitor) and measure the binding capacity in all
batches. Then calculate turnover numbers (molecules of substrate handled
by each molecule of enzyme per second) under all conditions. No
normalization between batches would be required as you would express
your measurement in fundamental units.
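A sketch of the unit bookkeeping behind this suggestion. The figures are invented, and the sketch assumes the binding capacity is expressed as pmol of enzyme per mg protein with one binding site per enzyme molecule.

```python
def turnover_number(activity_pkat_per_mg, enzyme_pmol_per_mg):
    """Turnover number in 1/s. One pkat is one pmol of substrate per second,
    so dividing by pmol of enzyme per mg protein cancels both pmol and mg."""
    return activity_pkat_per_mg / enzyme_pmol_per_mg

# Hypothetical batches with different enzyme content but the same enzyme behaviour.
k_a = turnover_number(10.0, 0.5)    # 20 per second
k_b = turnover_number(15.0, 0.75)   # 20 per second
```

The two batches differ in specific activity only because they contain different amounts of enzyme, and the turnover number comes out the same for both.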

Adrian Ardelean

Aug 12, 2013, 12:07:47 PM
to pow....@gmail.com, mlsu...@wisc.edu, methods@magpie.bio.indiana.edu
Hi,

Can you use certified reference materials for the enzyme and for each kind of SUBSTRATE?
Can you make a calibration curve for each SUBSTRATE?
After that, you can make fortified samples for each substrate at the level at which you expect the enzyme to act.
Then you can run the fortified samples in pairs with the test samples that use the same substrate.

Please excuse my comment if it is not applicable. I have no experience with enzyme tests.

Regards,

Adrian

Ardelean Adrian, DVM
Cluj-Napoca, Romania

Pow Joshi

Aug 12, 2013, 12:45:55 PM
to Michael L. Sullivan, Met...@magpie.bio.indiana.edu
Then I am wondering if you would need a reference substrate as someone else
pointed out as well. Because, normalizing to the best substrate makes it a
reference substrate and therefore your comparisons will be only between the
remaining 2 substrates.
I am no statistician and my biochemistry is also rather old, however, it
would seem to me that the only statistical way you will be able to deal
with the number of variables within the system is to increase your N of
crude extracts to approach a Gaussian distribution. BTW, I didn't receive
the data file you had sent. I'd be curious to see your raw data if you
won't mind sending it again.

Deitiker, Philip R

Aug 12, 2013, 4:30:24 PM
to Met...@magpie.bio.indiana.edu
That would be better, but the most deviant value in the group would create partial dependencies. The reason for using a constant derived from total protein is that one assumes that any given enzyme is a small fraction of the total.
