Log-normal data: how to back transform results after analysis on log scale?


mej...@icr.ac.uk

Aug 7, 2013, 7:08:07 AM
to meds...@googlegroups.com

Greetings,

 

I have some assay results from women, taken at two points in time, with each assay measured in duplicate in the lab (i.e. I have 4 measurements per woman, 2 at one point in time and 2 at another). Using the repeated data I have estimated the Intraclass Correlation Coefficient (ICC). I used Proc Mixed in SAS: the ICC with its 95% CI can be estimated as the AR(1) (autoregressive) term, and the actual variance components can be estimated from an alternative presentation of the results using CS (compound symmetry).

 

The above analysis is straightforward and was done on the log scale because the original assay data are skewed.

 

I now want to back transform the variances to the original assay scale.

 

I understand the mathematical relation between means and variances on the log normal scale and the normal scale (i.e. to get back to the arithmetic mean on the original scale requires knowing both mean and variance from the transformed scale, and is not simply a case of “take the anti-log”).  (http://en.wikipedia.org/wiki/Log-normal_distribution ; http://mathworld.wolfram.com/LogNormalDistribution.html )
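
For reference, the standard log-normal relations mentioned above can be sketched as follows. This is a minimal illustration that assumes log(X) is exactly Normal(mu, sigma^2); the values of `mu` and `sigma` are hypothetical and are not taken from the assay data.

```python
import math

def lognormal_moments(mu, sigma):
    """Arithmetic mean and variance of X when log(X) ~ Normal(mu, sigma^2)."""
    mean_x = math.exp(mu + sigma**2 / 2)
    var_x = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean_x, var_x

# Hypothetical log-scale mean and SD, for illustration only.
mu, sigma = 1.0, 0.5

mean_x, var_x = lognormal_moments(mu, sigma)
geometric_mean = math.exp(mu)  # what a plain anti-log of the log-scale mean gives

print(f"anti-log of log-scale mean (geometric mean): {geometric_mean:.4f}")
print(f"arithmetic mean on the original scale:       {mean_x:.4f}")
print(f"SD on the original scale:                    {math.sqrt(var_x):.4f}")
```

The arithmetic mean exceeds the plain anti-log by the factor exp(sigma^2 / 2), which is why both the mean and the variance from the log scale are needed.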

 

My problem is with back transforming the three variances I have (between women, within woman and residual assay variances), and what mean to use with each.

 

I’ve kind of convinced myself that I should use a mean of zero (on the transformed scale) for the assay residual and within woman variances, and something like the actual transformed scale mean with the between woman variance.  My reasoning is that the assay residual and within woman variances are basically variation around the overall mean, so each has zero mean on the transformed scale.

 

But when I back transform in this way I actually end up with a non-zero mean for the assay residual and within woman terms, since these have non-zero variance on the transformed scale.
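
The non-zero means can be illustrated with a small sketch. It assumes the three components are independent and normal on the log scale, in which case each zero-mean term back-transforms to a multiplicative factor with mean exp(sd^2 / 2), and the factors combine by multiplication rather than addition. All numbers here are hypothetical.

```python
import math

# Hypothetical log-scale overall mean and component SDs (not from the assay data).
log_mean = 2.0
sds = {"between_women": 0.4, "within_woman": 0.3, "assay_residual": 0.2}

# A zero-mean normal term e on the log scale becomes a factor exp(e)
# on the original scale, and E[exp(e)] = exp(sd^2 / 2), which exceeds 1.
factor_means = {name: math.exp(sd**2 / 2) for name, sd in sds.items()}

# With independent components the factor means multiply ...
mean_from_factors = math.exp(log_mean) * math.prod(factor_means.values())

# ... which matches the log-normal mean computed from the total variance.
total_var = sum(sd**2 for sd in sds.values())
mean_from_total = math.exp(log_mean + total_var / 2)

for name, value in factor_means.items():
    print(f"{name}: mean multiplicative factor = {value:.4f}")
print(f"mean via product of factors: {mean_from_factors:.4f}")
print(f"mean via total variance:     {mean_from_total:.4f}")
```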

 

So now I wonder if the mean I should use when back transforming the between woman variance should be adjusted slightly so that when I get a back transformed mean, this mean and the two I get from the assay residual and within woman variances add up to the actual observed untransformed mean.

 

I’d be grateful if anyone who has been in a similar situation could give me some advice, or if anyone has any ideas about this.

 

Thank you.

 

Michael.

BXC (Bendix Carstensen)

Aug 7, 2013, 7:45:04 AM
to meds...@googlegroups.com

Hi Michael (who, where?)

 

You do not back-transform variances from analyses on the log-scale.

 

First, you should always work on the SD scale, that is sqrt(var), because this has the same units as your original measurements.

 

When you log-transform you lose the scaling of your original data as far as linear models are concerned: switching to a different scaling of your original data (say, from inches to cm) just means that you add a constant to all data points on the transformed scale, so only the intercept in the model changes.
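
A quick check of this point, using made-up measurements: converting inches to cm adds log(2.54) to every log-transformed value, so the SD on the log scale is unchanged.

```python
import math
import statistics

# Hypothetical measurements in inches, for illustration only.
inches = [60.0, 62.5, 65.0, 67.5, 70.0]
cm = [2.54 * x for x in inches]

log_in = [math.log(x) for x in inches]
log_cm = [math.log(x) for x in cm]

# Every log-transformed value moves by the same constant, log(2.54).
shift = [b - a for a, b in zip(log_in, log_cm)]

# The spread on the log scale therefore does not depend on the units.
sd_in = statistics.stdev(log_in)
sd_cm = statistics.stdev(log_cm)

print("shift per data point:", [round(s, 6) for s in shift])
print(f"SD of log(inches): {sd_in:.6f}")
print(f"SD of log(cm):     {sd_cm:.6f}")
```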

 

Moreover, the SDs derived from (natural) log transformed data are merely coefficients of variation; in general, by simple Taylor expansion (the delta method):

 

SD( log(X) ) \approx SD(X)/mean(X) = CV(X)

 

So your SDs from the log-transformed data are readily interpretable as coefficients of variation.
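
To see how good the approximation is, one can compare SD(log(X)) with the exact CV under an added assumption: if X is exactly log-normal, then CV(X) = sqrt(exp(s^2) - 1) with s = SD(log(X)). The delta-method statement above is more general; the log-normal case is used here only because it has a closed form.

```python
import math

def lognormal_cv(sd_log):
    """Exact CV of X when log(X) is normal with standard deviation sd_log."""
    return math.sqrt(math.expm1(sd_log**2))

# Compare the log-scale SD with the exact CV for a range of values.
for sd_log in (0.05, 0.1, 0.2, 0.5, 1.0):
    cv = lognormal_cv(sd_log)
    print(f"SD(log X) = {sd_log:.2f}   exact CV = {cv:.4f}   ratio = {cv / sd_log:.3f}")
```

The two agree closely for small SDs and drift apart as the SD on the log scale grows.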

 

You could argue that in the case where the CV is large, the approximation does not hold.

But in that case you may not be interested in the CV after all, because implicitly you would think of the CV as a multiplier used to construct a confidence or prediction interval as X \pm 2*CV*X. But if a log-transform is meaningful, your data are positive, so if CV > 0.5 this procedure will give you negative lower limits.

 

Instead it is better to use the SD from an analysis of log-transformed data, which gives you a confidence/prediction interval from

X / exp(2*SD(log(X))) to X * exp(2*SD(log(X))). So in that sense you could back-transform your SDs to multipliers as exp(2*SD(log(X))).

Note that if SD(log(X)) is small, then exp(2*SD(log(X))) \approx 1 + 2*SD(log(X))
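
The two kinds of interval can be compared in a short sketch. The value `x` and the SDs are hypothetical, and the log-scale SD is used in place of the CV in the additive interval purely for illustration.

```python
import math

def additive_interval(x, cv):
    """Interval of the form x -/+ 2*cv*x."""
    return x * (1 - 2 * cv), x * (1 + 2 * cv)

def multiplicative_interval(x, sd_log):
    """Interval from x / exp(2*sd_log) to x * exp(2*sd_log)."""
    multiplier = math.exp(2 * sd_log)
    return x / multiplier, x * multiplier

x = 10.0  # hypothetical measurement on the original scale
for s in (0.05, 0.2, 0.6):
    add_lo, add_hi = additive_interval(x, s)
    mul_lo, mul_hi = multiplicative_interval(x, s)
    print(f"SD = {s:.2f}")
    print(f"  additive:       ({add_lo:.3f}, {add_hi:.3f})")
    print(f"  multiplicative: ({mul_lo:.3f}, {mul_hi:.3f})")
    print(f"  exp(2*SD) = {math.exp(2 * s):.4f}   1 + 2*SD = {1 + 2 * s:.4f}")
```

For small SDs the two intervals nearly coincide; once the SD exceeds 0.5 the additive lower limit goes negative while the multiplicative one stays positive.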

 

Hope this helps,

b.r.

Bendix Carstensen

Senior Statistician

Epidemiology

Steno Diabetes Center A/S

Niels Steensens Vej 2-4

DK-2820 Gentofte

Denmark

+45 44 43 87 38 (direct)

+45 30 75 87 38 (mobile)

b...@steno.dk    http://BendixCarstensen.com

www.steno.dk

 

 

 

--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules
 
 
 

mej...@icr.ac.uk

Aug 7, 2013, 9:05:19 AM
to meds...@googlegroups.com
Dear Bendix,
 
"So your SDs from the log-transformed data are readily interpretable as coefficients of variation"
 
Thank you, you are right.  I now see that I do not need to back transform the variances, but that what I really needed was something that is 'readily interpretable' on the original scale, and the CV will do this for me.
 
Thank you for the insightful reply.
 
Best wishes,
 
Michael
 
Michael Jones
Section of Genetics and Epidemiology
Institute of Cancer Research, London, UK
 

Frank Harrell

Aug 8, 2013, 12:26:28 PM
to meds...@googlegroups.com
Check out the smearing estimator.  There is a fairly general function smearingEst in the R Hmisc package, though it doesn't know about random effects.
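
The idea behind a smearing estimator can be sketched in a few lines: back-transform the log-scale prediction and multiply it by the average of the back-transformed residuals. This is a generic illustration of the technique for an intercept-only model with made-up data; it is not the Hmisc implementation, the function name `smearing_estimate` is hypothetical, and, like smearingEst, it ignores random effects.

```python
import math
import statistics

def smearing_estimate(pred_log, residuals_log):
    """Back-transformed mean: exp(prediction) times the mean of exp(residuals)."""
    smear = statistics.fmean([math.exp(r) for r in residuals_log])
    return math.exp(pred_log) * smear

# Hypothetical skewed, positive data, for illustration only.
y = [1.2, 0.8, 3.5, 2.1, 9.4, 1.7]
log_y = [math.log(v) for v in y]

# Intercept-only "model" on the log scale: the prediction is the log-scale mean.
mean_log = statistics.fmean(log_y)
residuals = [v - mean_log for v in log_y]

naive = math.exp(mean_log)                       # plain anti-log (geometric mean)
smeared = smearing_estimate(mean_log, residuals)

print(f"naive back-transform: {naive:.4f}")
print(f"smearing estimate:    {smeared:.4f}")
print(f"arithmetic mean of y: {statistics.fmean(y):.4f}")
```

In this intercept-only case the smearing estimate reproduces the arithmetic mean of the original data, which makes it a convenient sanity check.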

Frank