Trying to build array formula to reverse elements in a column

37 views
Skip to first unread message

Tushar Mehta

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
Impressed by the array formulae posted by various people, I decided to build my
own for a relatively simple problem by using the OFFSET function. All the
components seem to work but the final result is #VALUE! :(

Rummaging through deja.com I did find a post by Alan Beban that used the INDEX
function. Using that I can make things work. But why doesn't the OFFSET
function work?

To reverse the elements in a column range B3:B6, I created a name aRng that
refers to =Sheet1!$B$3:$B$6

In cells G3 through G6 I array-entered

=OFFSET(aRng,ROWS(aRng)-ROW(INDIRECT("$1:$"&ROWS(aRng))),0,1,1)

If I evaluate the individual components I get the right results. For example,
ROW(INDIRECT("$1:$"&ROWS(aRng)) is {1;2;3;4}
and
ROWS(aRng)-ROW(INDIRECT("$1:$"&ROWS(aRng))) is {3;2;1;0}

However, the end result is that all 4 cells contain #VALUE!

I tried transposing the {3;2;1;0} array so that I got {3,2,1,0} but the error
remains the same.

If I enter it into a single cell (i.e., not as an array-formula) I *do* get the
value in the last cell, B6.

H e l p !!!!

The formula that does work is
=INDEX(aRng,ROWS(aRng)-ROW(INDIRECT("$1:$"&ROWS(aRng)))+1,1)
and it is array-entered in a 4 row x 1 column range.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--

Bernie Deitrick

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
Tushar,

Some worksheet functions just aren't capable of returning arrays by design, and
apparently OFFSET is one of those. I don't think there is any place with a listing
of what can or can't return arrays: trial and error is the only way AFAIK.

HTH,
Bernie

Laurent Longre

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
Tushar,

Try this :

{=N(OFFSET(A1:C3,{2;1;0},{2,1,0}))}

... array entered in a 3R * 3C range.

(it assumes that A1:C3 contains only numeric values).

Why the N() function? Because OFFSET doesn't return values but range
references, like INDEX or INDIRECT. So, when the 2nd and 3rd parameters
are vectors, OFFSET doesn't return an array of values, but an array of
references. And Excel can't show directly the contents of this
(undocumented) data type. It would show #VALUE!

You have thus to dereference the array, either through N() or T(), in
order to coerce this array of references into an array of values.

Cordialement (et Joyeux Noël),

Laurent
---
Excel MVP

Tushar Mehta a écrit:

Chip Pearson

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
Tushar,

Here's one from the "Working With Lists" page on my web site,
http://www.cpearson.com/excel/lists.htm.

=OFFSET($A$3:$A$7,MAX(ROW($B$3:$B$7))-ROW(),0)

where A3:A7 is the data you want to reverse, and B3:B7 is the range
containing the formulas. Since this is an array formula, you must press
Ctrl+Shift+Enter rather than just Enter when you first enter the formula,
and whenever you edit it later.

Enter the formula in B3, and then fill down to B7.

Actually, the formula could be shortened to
=OFFSET($A$3,MAX(ROW($B$7))-ROW(),0)

where A3 is the first cell of the data, and B7 is the last cell of the
reversed list.

--
Cordially,
Chip Pearson
Pearson Software Consulting Services
Microsoft MVP - Excel
www.cpearson.com
ch...@cpearson.com


"Tushar Mehta" <ng_p...@bigfoot.com> wrote in message
news:MPG.12c42f12c...@msnews.microsoft.com...

Chip Pearson

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
And it can be further shortened to

=OFFSET($A$3,ROW($B$7)-ROW(),0)

No need for array-entry, either.

--
Cordially,
Chip Pearson
Pearson Software Consulting Services
Microsoft MVP - Excel
www.cpearson.com
ch...@cpearson.com


"Chip Pearson" <ch...@cpearson.com> wrote in message
news:emaqikLS$GA.78@cppssbbsa05...

Tushar Mehta

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
[This followup was posted to microsoft.public.excel.worksheet.functions and a
copy was sent to Bernie Deitrick <dei...@consumer.org>.]

I was about to send off an acknowledgement for the very quick response when I
saw Laurent [Longre]'s explanation of an undocumented data type and the use of
N() to dereference the result of OFFSET.

Since there is no single repository of information on how various functions
behave when used in array formulas, I will start one over the weekend.
The starting page will be tushar-mehta.com/excel/ and the founding
members will be INDEX, OFFSET, and INDIRECT. If it turns out that there
already is one, I will be more than happy to *not* duplicate the
information. I know of Chip [Pearson]'s site and will add a link to his
page but won't duplicate the information on my site.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--
In <385A6A33...@consumer.org>, Bernie Deitrick <dei...@consumer.org>
wrote


> Tushar,
>
> Some worksheet functions just aren't capable of returning arrays by design, and
> apparently OFFSET is one of those. I don't think there is any place with a listing
> of what can or can't return arrays: trial and error is the only way AFAIK.
>
> HTH,
> Bernie
>
> Tushar Mehta wrote:
>

Tushar Mehta

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
[This followup was posted to microsoft.public.excel.worksheet.functions and a
copy was sent to Laurent Longre <laurent...@free.fr>.]

Thanks for the speedy response.

If I understand this correctly, INDEX returns values while OFFSET returns a
reference. That is why I have to coerce the results of OFFSET as in
=N(OFFSET(A1:C3,{2;1;0},{2,1,0})) but can use the results of INDEX directly as
in =INDEX(A1:C3,{3;2;1},{3,2,1}).

INDIRECT seems to be similar to OFFSET in that I have to dereference the result
with N().

This is with Excel 2000.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--
In <385A6DFA...@free.fr>, Laurent Longre <laurent...@free.fr> wrote


> Tushar,
>
> Try this :
>
> {=N(OFFSET(A1:C3,{2;1;0},{2,1,0}))}
>
> ... array entered in a 3R * 3C range.
>
> (it assumes that A1:C3 contains only numeric values).
>
> Why the N() function? Because OFFSET doesn't return values but range
> references, like INDEX or INDIRECT. So, when the 2nd and 3rd parameters
> are vectors, OFFSET doesn't return an array of values, but an array of
> references. And Excel can't show directly the contents of this
> (undocumented) data type. It would show #VALUE!
>
> You have thus to dereference the array, either through N() or T(), in
> order to coerce this array of references into an array of values.
>
> Cordialement (et Joyeux Noël),
>
> Laurent
> ---
> Excel MVP
>
> Tushar Mehta a écrit:
> >

Laurent Longre

unread,
Dec 17, 1999, 3:00:00 AM12/17/99
to
Tushar Mehta a écrit :

>
> If I understand this correctly, INDEX returns values while OFFSET returns a
> reference. That is why I have to coerce the results of OFFSET as in
> =N(OFFSET(A1:C3,{2;1;0},{2,1,0})) but can use the results of INDEX directly as
> in =INDEX(A1:C3,{3;2;1},{3,2,1}).

Not exactly. Although it might sound strange, I think that INDEX can't
return any true array, even in an array formula. It returns only single
values. IMO, if it is used in a range as an array formula, it returns a
set of values which are evaluated separately by the calculation engine
and then fill separately the cells of the selected range, not a true
two-dimensional array.

{=ROWS(INDEX(A1:A3,{1;2;3}))} returns always 1 (=> single value) whereas
{=ROWS(OFFSET(A1:A3,{0;1;2},0)) returns 3 (=> array of 3 values).

Laurent

Tushar Mehta

unread,
Dec 19, 1999, 3:00:00 AM12/19/99
to
Thanks for the tips in my ongoing education on array formulas. The
three examples from Chip [Pearson] lead to the (inevitable?) query.
What's the performance characteristics on using array formulas?

The reason I am posting this rather than testing the performance myself
is that I am not sure how one would do that in the first place!
Typically, I would write some code to start a clock, do the
calculations, and stop the clock. But, I don't know how to do that with
Excel's calculation engine.

I can think of three issues related to the performance of array
formulas:

The first refers to the three variants that Chip provided, including one
that is a non-array formula.

The second refers to Laurent [Longre]'s response, again in this thread,
about the difference between INDEX and OFFSET. If I understand that
correctly, the Excel calculation engine processes the INDEX function
multiple times to get all the values while the OFFSET function returns a
true array, albiet of a strange and undocumented data type.

The third refers to Harlan [Grove]'s generalized suggestions in the
thread 'Help: MMULT(array1,array2) - Need to reverse Array2 to get
correct product' The solutions, while very general, seems to be the
most processor-intensive. They were definitely the most grey-matter-
intensive <g>

Please don't get the wrong idea about why I am asking these questions.
By and large, I am a strong proponent of using the solution that is the
most people-efficient, especially given the rate of improvement in the
price-performance ratio of computers. At the same time, I remain
acutely aware that, in many systems, there will exist critical code that
must be performance-efficient.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--
In <OTTyyoLS$GA.294@cppssbbsa05>, Chip Pearson <ch...@cpearson.com> wrote

Chip Pearson

unread,
Dec 19, 1999, 3:00:00 AM12/19/99
to
Hi Tushar,

I'd like to know more about the internals of the calculation process in
Excel, but haven't found anything from MS on the topic. I suppose a rather
crude way to time the execution of a particular function would be something
like the following code:

'---------------------------------------------------------
Public Declare Function GetTickCount Lib "kernel32" () As Long

Sub RunCalcs()

Dim Ndx As Long
Dim StartTicks As Long
Dim EndTicks As Long

Const NumCalcs = 10000

Application.Calculation = xlCalculationManual
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.ScreenUpdating = False

StartTicks = GetTickCount()
For Ndx = 1 To NumCalcs
Application.CalculateFull
Next Ndx
EndTicks = GetTickCount()

Application.Calculation = xlCalculationAutomatic
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.ScreenUpdating = True

Range("F1").Value = StartTicks
Range("F2").Value = EndTicks
Range("F3").Value = EndTicks - StartTicks
Range("F4").Value = (EndTicks - StartTicks) / NumCalcs

End Sub
'---------------------------------------------------------

Just make sure that the *only* thing on the worksheet is the formula(s) to
test, and the data required by them. While this is a rather crude way of
doing things, it may be useful for generalized comparisons. The problems
with the timing are probably going to be constant between formulas. For
example, the test result for formula #1 may be way off from the "actual"
value, but the test result for formula #2 should (?? and I'll leave it to
math/stats guys like you and Dave B to tell me if I'm wrong here) be just as
far off, so comparing the two results should give you some measure of the
relative performance of the two formulas.

I use this approach quite frequently, and the results seems to agree with
what I intuitively think that they should be (although once in a while I'm
quite surprised).

For example, I used the formula I originally posted (but extended for 100
rows)

=OFFSET($A$3:$A$102,MAX(ROW($B$3:$B$102))-ROW(),0)

and recalculated the sheet 10,000 times. This took about 70 seconds, or
about 7 sheet recalculations per second. Then, I used my revised, non-array
formula (again, extended to 100 rows)

=OFFSET($A$3,ROW($B$102)-ROW(),0)

and recalculated the sheet 10,000 times. This took about 7.5 seconds, or
about 75 sheet recalculations per second. Which tells me that for these two
formulas, for 100 rows, the second one is about 10 times faster than the
first. Sounds roughly reasonable to me, although I would have guess the
second would have been even more efficient than the first.

Of course there are better ways than GetTickCount to time a process, because
activities by the operating and other processes will affect the results.
GetProcessTime (NT only, I think) will give you the actual time used by a
process, in increments of 100 nano-seconds (!!!!). I used code like the
procedure shown above because it is simple, and I don't really care about
the absolute amount of time it takes a formula to run. I'm more interested
in comparing the times between two different formulas.

As you can well imaging, the calculation times of area formulas are not
(generally) linear with the number of rows in the arrays. Doubling the
number of rows is going to much more than double amount time it takes to
calculate the formula. I haven't done any real study of time, only some
playing around, but the results I've seen tend to agree with what
intuitively I thought they should be. (Of course, that's rather circular,
isn't it. My intuition is based on what I've seen Excel do, so of course
I'm really just remembering past performance, not really intuiting anything!
But you know what I mean.)

I'd be interested if anyone has done a more rigorous analysis of calculation
times, or has a good solid tool to do that (Stephen? Rob? Laurent?) . I
wish that MS would release some much more technical information about Excel.
Not "how-to" information, but "why" and "how it works" information.


--
Cordially,
Chip Pearson
Pearson Software Consulting Services
Microsoft MVP - Excel
www.cpearson.com ch...@cpearson.com

"Tushar Mehta" <ng_p...@bigfoot.com> wrote in message

news:MPG.12c6d1de9...@msnews.microsoft.com...

Laurent Longre

unread,
Dec 19, 1999, 3:00:00 AM12/19/99
to
Tushar,

> The second refers to Laurent [Longre]'s response, again in this thread,
> about the difference between INDEX and OFFSET. If I understand that
> correctly, the Excel calculation engine processes the INDEX function
> multiple times to get all the values while the OFFSET function returns a
> true array, albiet of a strange and undocumented data type.

Yes exactly.

More in my reply to Chip's post.

Laurent

Laurent Longre

unread,
Dec 19, 1999, 3:00:00 AM12/19/99
to
Chip Pearson a écrit :

>
> Sounds roughly reasonable to me, although I would have guess the
> second would have been even more efficient than the first.

IMO, it is normal. A "multi-cells array formula" is IMO logically slower
than the equivalent set of "single cell non-array formulae":

1) If you change a single cell of the range used by the array formula,
the whole array formula will be recalculated.

For instance, say that you have entered {=MYARRAYFORMULA(A1:A5000)} in
B1:B5000. If you enter a new value in any cell of A1:A5000 (say, in
A350), the *whole* array formula will be automatically recalculated.
Exactly as if you had changed not only the value of A350, but of the
5000 cells in A1:A5000.

On the contrary, say that you have entered =MYNONARRAYFORMULA(A1) in B1
and filled this "normal" formula in B2:B5000. If you change then the
value of A350, it will recalculate only the corresponding formula in
B350. The recalculation will be faster.

2) The array formula will need memory allocation to store the
intermediate calculated arrays and the resulting arrays. It has also to
perform a lot of copies (copy the intermediate and final results in the
allocated arrays). This will also spend time. On the contrary, single
cell non-array formulae won't need any special memory allocation and
data copies.

AFAIC, I have tested and compared the calculation speed of 1)
single-cell formulae, 2) the built-in DBCOUNT function and 3)
array-formulae like {=SUM((Range1<X)*(Range2<Y))}, all of them producing
the same results. I have presented the calculation speeds at
http://longre.free.fr/pages/form/form_mat.htm#Performances (see the
table).It shows that 1) is calculated faster than 2), which is
calculated MUCH faster than 3).

Anyway, although I think that in most cases array-formulae can't speed
up the calculations (except of course when you use them in combination
with *true* array functions like MMULT or MINVERSE), they provide:

- a better readability (for instance, 1 single array formula can replace
3000 single-cell formulae)

- and also a smaller file size.

> As you can well imaging, the calculation times of area formulas are not
> (generally) linear with the number of rows in the arrays. Doubling the
> number of rows is going to much more than double amount time it takes to
> calculate the formula. I haven't done any real study of time, only some
> playing around, but the results I've seen tend to agree with what
> intuitively I thought they should be.

I have also noted often the same. But, as you say, it's rather difficult
to understand this unless MS release more technical information about
the calculation engine.

Cordially,

Laurent

Dana DeLouis

unread,
Dec 20, 1999, 3:00:00 AM12/20/99
to
Thank you everyone, especially Laurent, for this excellent discussion!
I have some changes to make :-)
If anyone is interested...

XL: Some Worksheet Functions Don't Allow Array Constants
http://support.microsoft.com/support/kb/articles/Q124/2/16.asp

Bernie Deitrick <dei...@consumer.org> wrote in message
news:385A6A33...@consumer.org...
> <snip> I don't think there is any place with a listing

Tushar Mehta

unread,
Dec 21, 1999, 3:00:00 AM12/21/99
to
I understand both the intuitive reasoning and the numerical analysis
provided in comparing array with non-array formulas. But, that, as it
turns out, is not the whole story.

Ever since I discovered array formulas back on Excel 4 (or was it 3 or
2?) I have used them for three purposes and one of them has been to
*improve* performance. That's right, to improve performance. [The
fourth reason are the various formulas that I've discovered through
these Excel NGs and why I was inspired to ask about performance.]

The first reason for using them is that the function actually returns
just one result very much like an non-array formula. However, the
result consists of multiple elements and consequently, one must provide
an array for the result. This would include functions such as MMULT,
LINEST, etc.

The second has been for documentation - along the lines that Laurent
[Longre] noted. Actually, it goes a little beyond readability since the
line between that and defensive programming can be somewhat blurry. One
example would be comparing monthly profits across 2 years. If the
monthly profits for each of the 2 years occupy 12 cells in columns A and
B respectively and the difference is in column C, an array formula
indicates that these 12 cells constitute a single entity that I am
analyzing.

The third is the scenario where a array formula speedens up
recalculation -- often by a magnitude! Here's why. Most, if not all,
functions can be thought of as containing three code segments. The
first validates the parameters provided, the second does whatever the
function is supposed to do, and the third massages the results into a
format appropriate for output. In the case where the first (or the
third) takes more time than the second, an array formula will speeden up
processing.

Generating random numbers is one example where the first, i.e.,
validation, can be much more expensive than actually generating the
result. Consider a custom random number function in which the
parameters define a piece-wise linear distribution. In this case, the
RV generator must perform extensive validation of the data. The values
must be in ascending order and the probability values must pass a set of
tests. After that happens, the actual generation is reasonably quick.
In some tests I ran earlier today, I calculated 100 numbers from a
custom CDF with 6 points. Then, I ran the test 100 times. The
resulting average time to generate the 100 numbers:

The non-array method took 0.158 seconds
The array formula took 0.007 seconds.

I hope to post a more detailed article on the process I used, including
some bizarre results that I got with Excel's (or is it VBA's) TIMER
function and additional results comparing the significance of the three
segments that I talked about above.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--
In <385D2404...@free.fr>, Laurent Longre <laurent...@free.fr>
wrote
> Chip Pearson a écrit :


> >
> > Sounds roughly reasonable to me, although I would have guess the
> > second would have been even more efficient than the first.
>

> > As you can well imaging, the calculation times of area formulas are not
> > (generally) linear with the number of rows in the arrays. Doubling the
> > number of rows is going to much more than double amount time it takes to
> > calculate the formula. I haven't done any real study of time, only some
> > playing around, but the results I've seen tend to agree with what
> > intuitively I thought they should be.
>

Reply all
Reply to author
Forward
0 new messages