Statistics (mean and standard deviation) on logarithmic data

Paul

Sep 20, 2010, 11:59:00 AM

Hello everyone,

I'm trying to do some statistics on data in MATLAB, but this is more of a fundamental mathematics question than anything else. I have a bunch of power measurements (from antennas) in dB. I'd like to get the average and standard deviation of this data between several patterns.

The purpose is to characterize the back lobes from many measurements and account for 96% of cases (2 standard deviations above the mean). So the idea is to compute the mean, add two standard deviations, get a result in dB, and use that in my subsequent calculations.

It is my understanding that I cannot directly use the mean() function as the decibel values need to be converted to base 10 first. So, I would convert to base 10, take the mean, then convert the linear mean to a dB value.

What about the std() function though? If I convert the dB values to linear, compute the standard deviation, then bring it back to dB, the answer is outright wrong. It boggles my mind as to why I can't do this. I doubt I can directly apply std() to the dB data either.

Do I need to implement the standard deviation function manually, using x and x_bar in dB? Is there a MATLAB function for this?

For what it's worth, the data is Gaussian in log scale (10*log10) when checked with the normplot() function.

I'd appreciate it if anyone could shed some light on this issue. Googling for several hours has not yielded any concrete results.

Thanks in advance!

Rich Ulrich

Sep 20, 2010, 2:43:28 PM

I'm numbering the paragraphs so that I can refer to them
out of the written order. And reformatting the ones I will
refer to.

On Mon, 20 Sep 2010 11:59:00 EDT, Paul <darkenr...@gmail.com>
wrote:

(1)

>I'm trying to do some statistics on data in MATLAB, but this is more of a fundamental mathematics question than anything else. I have a bunch of power measurements (from antennas) in dB. I'd like to get the average and standard deviation of this data between several patterns.

(2)* see below

> The purpose is to characterize the back lobes from many measurements
> and account for 96% of cases (2 standard deviations above the mean).
> So the idea is to compute the mean, add two standard deviations, get a
> result in dB, and use that in my subsequent calculations.

(3)*

> It is my understanding that I cannot directly use the mean()
> function as the decibel values need to be converted to base 10 first.
> So, I would convert to base 10, take the mean, then convert the linear
> mean to a dB value.

(4)

>What about the std() function though? If I convert the dB values to linear, compute the standard deviation, then bring it back to dB, the answer is outright wrong. It boggles my mind as to why I can't do this. I doubt I can directly apply std() to the dB data either.

(5)

>Do I need to implement the standard deviation function manually, using x and x_bar in dB? Is there a MATLAB function for this?

(6)*

> For what it's worth, the data is Gaussian in log scale (10*log10)
> when checked with the normplot() function.
>
>I'd appreciate it if anyone could shed some light on this issue. Googling for several hours has not yielded any concrete results.

I have to wonder: what does (6) denote?
Which data are "Gaussian in log scale"?
What do you have to do to your dB measures to see
numbers that are Gaussian (Normal)?

The basic data are dB, which is already the log of power.

But the assertion reads to me as if you *may* be saying that
when you take log(dB-measures), you then have a plot that
appears fairly normal.

What is it that looks normal? dB? log(dB)? I'm nearly sure
that it will not be the "linear power" represented by dB.
Whatever looks Normal, *that* is what you need to find
means and SDs for, if you are trying to place a confidence
interval or limit based on the (strong) assumption of Normality.
- If you have many hundreds of cases of data, you might
consider constructing limits based on the rank-order of outcomes,
and avoid the whole question of the underlying metric.
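
A rank-order limit of the kind mentioned above can be sketched as follows. This is only an illustration: the helper name `empirical_upper_limit`, the 96% coverage argument, and the sample values are hypothetical, not taken from the original data.

```python
import math

def empirical_upper_limit(values_db, coverage=0.96):
    """Smallest observed value with at least `coverage` of the sample at or below it."""
    if not values_db:
        raise ValueError("need at least one measurement")
    ordered = sorted(values_db)
    # 1-based rank of the order statistic that covers the requested fraction.
    rank = max(math.ceil(coverage * len(ordered)), 1)
    return ordered[rank - 1]

# Hypothetical back-lobe levels in dB, for illustration only.
sample_db = [-30.0 + 0.1 * i for i in range(200)]
limit_db = empirical_upper_limit(sample_db, coverage=0.96)
print(f"empirical 96% limit: {limit_db:.1f} dB")
```

Because only the ordering of the values is used, the same observation is picked whether the data are held in dB or as raw power, which is the sense in which the metric question is avoided.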

According to (2), you do want to place a limit. Plus/minus
2 SD includes about 95% of the data when the data are
close to normal. Two SD above the mean marks off the
extreme 2.3% of the data, leaving about 97.7% below it; so I
wonder where you get "96%" while pointing to one end alone.

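The tail areas for a Normal distribution can be checked with a short sketch like the one below, which uses `NormalDist` from Python's standard `statistics` module. The variable names are illustrative, and the calculation assumes the data really are Normal.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
z = 2.0

below_upper = std_normal.cdf(z)                        # fraction below mean + 2 SD
above_upper = 1.0 - below_upper                        # one-sided upper tail
within_both = std_normal.cdf(z) - std_normal.cdf(-z)   # fraction within +/- 2 SD

# Multiple of the SD that leaves 96% of a Normal population below it.
z_for_96 = std_normal.inv_cdf(0.96)

print(f"below mean + 2 SD : {below_upper:.4f}")
print(f"above mean + 2 SD : {above_upper:.4f}")
print(f"within +/- 2 SD   : {within_both:.4f}")
print(f"z for 96% below   : {z_for_96:.4f}")
```

If a one-sided 96% limit is what is wanted, the multiplier is roughly 1.75 SD above the mean rather than 2.
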
Your (3) says that you think you cannot take the mean of
the dB measures. Well, the mean of the dB measures will
give you the "geometric mean" of the raw power. However,
I do not see any reason at all to talk about the raw power.
Is there a reason? or is there just a random piece of odd
advice? - which seems (to me, so far) to be based on
considerations that are outside of what you have described
as the problem.
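
The geometric-mean remark can be verified numerically. A minimal sketch, assuming power levels in dB defined as 10*log10(power); the helper names and the five sample values are hypothetical.

```python
import math
from statistics import fmean, geometric_mean

def db_to_linear(level_db):
    """Convert a power level in dB to a linear power ratio."""
    return 10.0 ** (level_db / 10.0)

def linear_to_db(power):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(power)

# Hypothetical back-lobe levels in dB, for illustration only.
levels_db = [-20.0, -25.0, -30.0, -22.0, -28.0]
power = [db_to_linear(v) for v in levels_db]

mean_of_db = fmean(levels_db)                               # average taken in dB
db_of_geometric_mean = linear_to_db(geometric_mean(power))  # same number
db_of_arithmetic_mean = linear_to_db(fmean(power))          # larger number

print(f"mean of dB values        : {mean_of_db:.3f} dB")
print(f"geometric mean, in dB    : {db_of_geometric_mean:.3f} dB")
print(f"arithmetic mean, in dB   : {db_of_arithmetic_mean:.3f} dB")
```

The first two results agree, while the dB value of the arithmetic mean of raw power is higher, since an arithmetic mean is never smaller than the geometric mean of the same positive values.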

- By the way, when you say "converted to base 10", using it
in this context confuses me. Conversion "to base 10" is a term
I might use when given numbers that are in "base e", and I
want to use or ignore the constant of conversion. However,
that conversion seems to be totally irrelevant here.
The obvious, available "conversion" that would change things
is to convert dB to raw power by exponentiating the dB values,
power = 10^(dB/10).
And that will give you, I assume, a strongly skewed distribution
for which the mean and SD will *not* provide the same tail
areas that rely on Normal underlying distributions.
In sum, I do assume that you are talking about converting to
raw power, which seems to be a bad idea.
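
To illustrate the skewness point, the sketch below builds a hypothetical sample that is Gaussian in dB by construction (evenly spaced quantiles of a Normal with mean -25 dB and SD 5 dB, both values assumed) and compares a "mean + 2 SD" limit computed on the dB values with one computed on raw power and then expressed in dB.

```python
import math
from statistics import NormalDist, fmean, median, stdev

# Hypothetical levels: evenly spaced quantiles of N(-25 dB, 5 dB).
# These are constructed values, not real measurements.
n = 1000
model = NormalDist(mu=-25.0, sigma=5.0)
levels_db = [model.inv_cdf((i + 0.5) / n) for i in range(n)]
power = [10.0 ** (v / 10.0) for v in levels_db]

# Limit computed directly on the dB values.
limit_db = fmean(levels_db) + 2.0 * stdev(levels_db)

# Limit computed on raw power, then converted to dB.
limit_from_power_db = 10.0 * math.log10(fmean(power) + 2.0 * stdev(power))

def fraction_below(limit):
    """Fraction of the sample at or below a limit given in dB."""
    return sum(v <= limit for v in levels_db) / n

print(f"mean / median of raw power : {fmean(power):.5f} / {median(power):.5f}")
print(f"limit from dB values       : {limit_db:.2f} dB, "
      f"covers {fraction_below(limit_db):.3f}")
print(f"limit from raw power       : {limit_from_power_db:.2f} dB, "
      f"covers {fraction_below(limit_from_power_db):.3f}")
```

The raw power has a mean well above its median, which is the skew in question, and the two limits neither agree in dB nor cover the same fraction of the sample.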

I think I have replied implicitly to (4). I am not a MATLAB user
so I can't speak to whatever the package offers as routines.

--
Rich Ulrich

Paul

Sep 20, 2010, 2:56:48 PM

You might have a look at the geometric mean
(http://en.wikipedia.org/wiki/Geometric_mean) and geometric standard
deviation (http://en.wikipedia.org/wiki/Geometric_standard_deviation).
I don't know about MATLAB, so I can't say whether there's a built-in
function, but the formula is straightforward.
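
A minimal sketch of those two formulas, written in Python rather than MATLAB. The function name and the sample values are hypothetical, and the sample form of the standard deviation (dividing by n - 1) is an assumption; the population form divides by n instead.

```python
import math
from statistics import fmean, stdev

def geometric_mean_and_sd(values):
    """Geometric mean and geometric SD of positive values (needs at least two)."""
    logs = [math.log(v) for v in values]
    # Exponentiate the mean and the sample SD of the natural logs.
    return math.exp(fmean(logs)), math.exp(stdev(logs))

# Hypothetical linear power ratios, for illustration only.
power = [0.010, 0.0032, 0.0010, 0.0063, 0.0016]
gm, gsd = geometric_mean_and_sd(power)

# The same quantities expressed in dB match ordinary statistics on dB values.
levels_db = [10.0 * math.log10(p) for p in power]
print(f"10*log10(GM)  = {10.0 * math.log10(gm):.3f} dB, "
      f"mean of dB = {fmean(levels_db):.3f} dB")
print(f"10*log10(GSD) = {10.0 * math.log10(gsd):.3f} dB, "
      f"SD of dB   = {stdev(levels_db):.3f} dB")
```

In other words, taking the ordinary mean and standard deviation of the dB values is equivalent to working with the geometric mean and geometric standard deviation of the raw power.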

/Paul

Gantz

Sep 22, 2010, 3:20:25 PM

On Sep 20, 11:43 am, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> I'm numbering the paragraphs so that I can refer to them
> out of the written order. And reformatting the ones I will
> refer to.
>

> On Mon, 20 Sep 2010 11:59:00 EDT, Paul <darkenreape...@gmail.com>


Thank you for the response. This cleared up the issue. Briefly, the
dB power measurements are linear when plotted in a log-log plot of
probability [dB] vs. normalized gain [dB]. As a result, I'm
performing Gaussian statistics directly on the dB values, and this
should be valid.

Rich Ulrich

Sep 23, 2010, 2:34:04 PM

On Wed, 22 Sep 2010 12:20:25 -0700 (PDT), Gantz
<paulne...@gmail.com> wrote:

>On Sep 20, 11:43 am, Rich Ulrich <rich.ulr...@comcast.net> wrote:
>> I'm numbering the paragraphs so that I can refer to them
>> out of the written order. And reformatting the ones I will
>> refer to.
>>
>> On Mon, 20 Sep 2010 11:59:00 EDT, Paul <darkenreape...@gmail.com>
>> wrote:

[snip, previous]

>
>Thank you for the response. This cleared up the issue. Briefly, the
>dB power measurements are linear when plotted in a log-log plot of

I'm pleased if the issue is all cleared up. I do have
to remark on your terminology, which I don't follow.

Do you really intend to say, "log-log" plot?

I would want to see "linear" in a "linear" plot of dB, or of the
function of dB. What a straight line on a log-log plot tells
you is that when you take the logs of both variables,
you get a straight line.

I would not expect you to have to take the log of
the dB measures, which are already the "log of power".
But that is how I read the description.

>probability [dB] vs. normalized gain [dB]. As a result, I'm

If probabilities P are all small, using the log(P) is not an
unreasonable scale for many purposes. However, since
P is bounded by 0 and 1 (and not open-ended) the logistic
is the more general, acceptable model...
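
The relation between log(P) and the logistic (log-odds) scale can be sketched as below. The `logit` helper and the example probabilities are hypothetical, chosen only to show that the two scales nearly coincide when P is small.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

for p in (0.001, 0.01, 0.1, 0.5, 0.9):
    print(f"P = {p:<5}  log(P) = {math.log(p):8.4f}  logit(P) = {logit(p):8.4f}")
```

For small P the two columns are close; they separate as P approaches 1, where log(P) is bounded above by 0 while the log-odds keep growing.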

I don't remember ever seeing probabilities measured
in dB. Is this a convention in your area? A quick
Google search did show me dB used for describing
log-likelihood or log-odds. That is unusual to me, but
it is not unreasonable.

>performing Gaussian statistics directly on the dB values, and this
>should be valid.

--
Rich Ulrich