37 views

Skip to first unread message

Dec 17, 1999, 3:00:00 AM12/17/99

to

Impressed by the array formulae posted by various people, I decided to build my

own for a relatively simple problem by using the OFFSET function. All the

components seem to work but the final result is #VALUE! :(

own for a relatively simple problem by using the OFFSET function. All the

components seem to work but the final result is #VALUE! :(

Rummaging through deja.com I did find a post by Alan Beban that used the INDEX

function. Using that I can make things work. But why doesn't the OFFSET

function work?

To reverse the elements in a column range B3:B6, I created a name aRng that

refers to =Sheet1!$B$3:$B$6

In cells G3 through G6 I array-entered

=OFFSET(aRng,ROWS(aRng)-ROW(INDIRECT("$1:$"&ROWS(aRng))),0,1,1)

If I evaluate the individual components I get the right results. For example,

ROW(INDIRECT("$1:$"&ROWS(aRng)) is {1;2;3;4}

and

ROWS(aRng)-ROW(INDIRECT("$1:$"&ROWS(aRng))) is {3;2;1;0}

However, the end result is that all 4 cells contain #VALUE!

I tried transposing the {3;2;1;0} array so that I got {3,2,1,0} but the error

remains the same.

If I enter it into a single cell (i.e., not as an array-formula) I *do* get the

value in the last cell, B6.

H e l p !!!!

The formula that does work is

=INDEX(aRng,ROWS(aRng)-ROW(INDIRECT("$1:$"&ROWS(aRng)))+1,1)

and it is array-entered in a 4 row x 1 column range.

--

Regards,

Tushar Mehta

www.tushar-mehta.com

--

Dec 17, 1999, 3:00:00 AM12/17/99

to

Tushar,

Some worksheet functions just aren't capable of returning arrays by design, and

apparently OFFSET is one of those. I don't think there is any place with a listing

of what can or can't return arrays: trial and error is the only way AFAIK.

HTH,

Bernie

Dec 17, 1999, 3:00:00 AM12/17/99

to

Tushar,

Try this :

{=N(OFFSET(A1:C3,{2;1;0},{2,1,0}))}

... array entered in a 3R * 3C range.

(it assumes that A1:C3 contains only numeric values).

Why the N() function? Because OFFSET doesn't return values but range

references, like INDEX or INDIRECT. So, when the 2nd and 3rd parameters

are vectors, OFFSET doesn't return an array of values, but an array of

references. And Excel can't show directly the contents of this

(undocumented) data type. It would show #VALUE!

You have thus to dereference the array, either through N() or T(), in

order to coerce this array of references into an array of values.

Cordialement (et Joyeux Noël),

Laurent

---

Excel MVP

Tushar Mehta a écrit:

Dec 17, 1999, 3:00:00 AM12/17/99

to

Tushar,

Here's one from the "Working With Lists" page on my web site,

http://www.cpearson.com/excel/lists.htm.

=OFFSET($A$3:$A$7,MAX(ROW($B$3:$B$7))-ROW(),0)

where A3:A7 is the data you want to reverse, and B3:B7 is the range

containing the formulas. Since this is an array formula, you must press

Ctrl+Shift+Enter rather than just Enter when you first enter the formula,

and whenever you edit it later.

Enter the formula in B3, and then fill down to B7.

Actually, the formula could be shortened to

=OFFSET($A$3,MAX(ROW($B$7))-ROW(),0)

where A3 is the first cell of the data, and B7 is the last cell of the

reversed list.

--

Cordially,

Chip Pearson

Pearson Software Consulting Services

Microsoft MVP - Excel

www.cpearson.com

ch...@cpearson.com

"Tushar Mehta" <ng_p...@bigfoot.com> wrote in message

news:MPG.12c42f12c...@msnews.microsoft.com...

Dec 17, 1999, 3:00:00 AM12/17/99

to

And it can be further shortened to

=OFFSET($A$3,ROW($B$7)-ROW(),0)

No need for array-entry, either.

--

Cordially,

Chip Pearson

Pearson Software Consulting Services

Microsoft MVP - Excel

www.cpearson.com

ch...@cpearson.com

"Chip Pearson" <ch...@cpearson.com> wrote in message

news:emaqikLS$GA.78@cppssbbsa05...

Dec 17, 1999, 3:00:00 AM12/17/99

to

[This followup was posted to microsoft.public.excel.worksheet.functions and a

copy was sent to Bernie Deitrick <dei...@consumer.org>.]

copy was sent to Bernie Deitrick <dei...@consumer.org>.]

I was about to send off an acknowledgement for the very quick response when I

saw Laurent [Longre]'s explanation of an undocumented data type and the use of

N() to dereference the result of OFFSET.

Since there is no single repository of information on how various functions

behave when used in array formulas, I will start one over the weekend.

The starting page will be tushar-mehta.com/excel/ and the founding

members will be INDEX, OFFSET, and INDIRECT. If it turns out that there

already is one, I will be more than happy to *not* duplicate the

information. I know of Chip [Pearson]'s site and will add a link to his

page but won't duplicate the information on my site.

--

Regards,

Tushar Mehta

www.tushar-mehta.com

--

In <385A6A33...@consumer.org>, Bernie Deitrick <dei...@consumer.org>

wrote

> Tushar,

>

> Some worksheet functions just aren't capable of returning arrays by design, and

> apparently OFFSET is one of those. I don't think there is any place with a listing

> of what can or can't return arrays: trial and error is the only way AFAIK.

>

> HTH,

> Bernie

>

> Tushar Mehta wrote:

>

Dec 17, 1999, 3:00:00 AM12/17/99

to

[This followup was posted to microsoft.public.excel.worksheet.functions and a

copy was sent to Laurent Longre <laurent...@free.fr>.]Thanks for the speedy response.

If I understand this correctly, INDEX returns values while OFFSET returns a

reference. That is why I have to coerce the results of OFFSET as in

=N(OFFSET(A1:C3,{2;1;0},{2,1,0})) but can use the results of INDEX directly as

in =INDEX(A1:C3,{3;2;1},{3,2,1}).

INDIRECT seems to be similar to OFFSET in that I have to dereference the result

with N().

This is with Excel 2000.

--

Regards,

Tushar Mehta

www.tushar-mehta.com

--

In <385A6DFA...@free.fr>, Laurent Longre <laurent...@free.fr> wrote

> Tushar,

>

> Try this :

>

> {=N(OFFSET(A1:C3,{2;1;0},{2,1,0}))}

>

> ... array entered in a 3R * 3C range.

>

> (it assumes that A1:C3 contains only numeric values).

>

> Why the N() function? Because OFFSET doesn't return values but range

> references, like INDEX or INDIRECT. So, when the 2nd and 3rd parameters

> are vectors, OFFSET doesn't return an array of values, but an array of

> references. And Excel can't show directly the contents of this

> (undocumented) data type. It would show #VALUE!

>

> You have thus to dereference the array, either through N() or T(), in

> order to coerce this array of references into an array of values.

>

> Cordialement (et Joyeux Noël),

>

> Laurent

> ---

> Excel MVP

>

> Tushar Mehta a écrit:

> >

Dec 17, 1999, 3:00:00 AM12/17/99

to

Tushar Mehta a écrit :

>

> If I understand this correctly, INDEX returns values while OFFSET returns a

> reference. That is why I have to coerce the results of OFFSET as in

> =N(OFFSET(A1:C3,{2;1;0},{2,1,0})) but can use the results of INDEX directly as

> in =INDEX(A1:C3,{3;2;1},{3,2,1}).

>

> If I understand this correctly, INDEX returns values while OFFSET returns a

> reference. That is why I have to coerce the results of OFFSET as in

> =N(OFFSET(A1:C3,{2;1;0},{2,1,0})) but can use the results of INDEX directly as

> in =INDEX(A1:C3,{3;2;1},{3,2,1}).

Not exactly. Although it might sound strange, I think that INDEX can't

return any true array, even in an array formula. It returns only single

values. IMO, if it is used in a range as an array formula, it returns a

set of values which are evaluated separately by the calculation engine

and then fill separately the cells of the selected range, not a true

two-dimensional array.

{=ROWS(INDEX(A1:A3,{1;2;3}))} returns always 1 (=> single value) whereas

{=ROWS(OFFSET(A1:A3,{0;1;2},0)) returns 3 (=> array of 3 values).

Laurent

Dec 19, 1999, 3:00:00 AM12/19/99

to

Thanks for the tips in my ongoing education on array formulas. The

three examples from Chip [Pearson] lead to the (inevitable?) query.

What's the performance characteristics on using array formulas?

three examples from Chip [Pearson] lead to the (inevitable?) query.

What's the performance characteristics on using array formulas?

The reason I am posting this rather than testing the performance myself

is that I am not sure how one would do that in the first place!

Typically, I would write some code to start a clock, do the

calculations, and stop the clock. But, I don't know how to do that with

Excel's calculation engine.

I can think of three issues related to the performance of array

formulas:

The first refers to the three variants that Chip provided, including one

that is a non-array formula.

The second refers to Laurent [Longre]'s response, again in this thread,

about the difference between INDEX and OFFSET. If I understand that

correctly, the Excel calculation engine processes the INDEX function

multiple times to get all the values while the OFFSET function returns a

true array, albiet of a strange and undocumented data type.

The third refers to Harlan [Grove]'s generalized suggestions in the

thread 'Help: MMULT(array1,array2) - Need to reverse Array2 to get

correct product' The solutions, while very general, seems to be the

most processor-intensive. They were definitely the most grey-matter-

intensive <g>

Please don't get the wrong idea about why I am asking these questions.

By and large, I am a strong proponent of using the solution that is the

most people-efficient, especially given the rate of improvement in the

price-performance ratio of computers. At the same time, I remain

acutely aware that, in many systems, there will exist critical code that

must be performance-efficient.

--

Regards,

Tushar Mehta

www.tushar-mehta.com

--

In <OTTyyoLS$GA.294@cppssbbsa05>, Chip Pearson <ch...@cpearson.com> wrote

Dec 19, 1999, 3:00:00 AM12/19/99

to

Hi Tushar,

I'd like to know more about the internals of the calculation process in

Excel, but haven't found anything from MS on the topic. I suppose a rather

crude way to time the execution of a particular function would be something

like the following code:

'---------------------------------------------------------

Public Declare Function GetTickCount Lib "kernel32" () As Long

Sub RunCalcs()

Dim Ndx As Long

Dim StartTicks As Long

Dim EndTicks As Long

Const NumCalcs = 10000

Application.Calculation = xlCalculationManual

Application.DisplayAlerts = False

Application.EnableEvents = False

Application.ScreenUpdating = False

StartTicks = GetTickCount()

For Ndx = 1 To NumCalcs

Application.CalculateFull

Next Ndx

EndTicks = GetTickCount()

Application.Calculation = xlCalculationAutomatic

Application.DisplayAlerts = True

Application.EnableEvents = True

Application.ScreenUpdating = True

Range("F1").Value = StartTicks

Range("F2").Value = EndTicks

Range("F3").Value = EndTicks - StartTicks

Range("F4").Value = (EndTicks - StartTicks) / NumCalcs

End Sub

'---------------------------------------------------------

Just make sure that the *only* thing on the worksheet is the formula(s) to

test, and the data required by them. While this is a rather crude way of

doing things, it may be useful for generalized comparisons. The problems

with the timing are probably going to be constant between formulas. For

example, the test result for formula #1 may be way off from the "actual"

value, but the test result for formula #2 should (?? and I'll leave it to

math/stats guys like you and Dave B to tell me if I'm wrong here) be just as

far off, so comparing the two results should give you some measure of the

relative performance of the two formulas.

I use this approach quite frequently, and the results seems to agree with

what I intuitively think that they should be (although once in a while I'm

quite surprised).

For example, I used the formula I originally posted (but extended for 100

rows)

=OFFSET($A$3:$A$102,MAX(ROW($B$3:$B$102))-ROW(),0)

and recalculated the sheet 10,000 times. This took about 70 seconds, or

about 7 sheet recalculations per second. Then, I used my revised, non-array

formula (again, extended to 100 rows)

=OFFSET($A$3,ROW($B$102)-ROW(),0)

and recalculated the sheet 10,000 times. This took about 7.5 seconds, or

about 75 sheet recalculations per second. Which tells me that for these two

formulas, for 100 rows, the second one is about 10 times faster than the

first. Sounds roughly reasonable to me, although I would have guess the

second would have been even more efficient than the first.

Of course there are better ways than GetTickCount to time a process, because

activities by the operating and other processes will affect the results.

GetProcessTime (NT only, I think) will give you the actual time used by a

process, in increments of 100 nano-seconds (!!!!). I used code like the

procedure shown above because it is simple, and I don't really care about

the absolute amount of time it takes a formula to run. I'm more interested

in comparing the times between two different formulas.

As you can well imaging, the calculation times of area formulas are not

(generally) linear with the number of rows in the arrays. Doubling the

number of rows is going to much more than double amount time it takes to

calculate the formula. I haven't done any real study of time, only some

playing around, but the results I've seen tend to agree with what

intuitively I thought they should be. (Of course, that's rather circular,

isn't it. My intuition is based on what I've seen Excel do, so of course

I'm really just remembering past performance, not really intuiting anything!

But you know what I mean.)

I'd be interested if anyone has done a more rigorous analysis of calculation

times, or has a good solid tool to do that (Stephen? Rob? Laurent?) . I

wish that MS would release some much more technical information about Excel.

Not "how-to" information, but "why" and "how it works" information.

--

Cordially,

Chip Pearson

Pearson Software Consulting Services

Microsoft MVP - Excel

www.cpearson.com ch...@cpearson.com

"Tushar Mehta" <ng_p...@bigfoot.com> wrote in message

news:MPG.12c6d1de9...@msnews.microsoft.com...

Dec 19, 1999, 3:00:00 AM12/19/99

to

Tushar,

> The second refers to Laurent [Longre]'s response, again in this thread,

> about the difference between INDEX and OFFSET. If I understand that

> correctly, the Excel calculation engine processes the INDEX function

> multiple times to get all the values while the OFFSET function returns a

> true array, albiet of a strange and undocumented data type.

Yes exactly.

More in my reply to Chip's post.

Laurent

Dec 19, 1999, 3:00:00 AM12/19/99

to

Chip Pearson a écrit :

>

> Sounds roughly reasonable to me, although I would have guess the

> second would have been even more efficient than the first.

>

> Sounds roughly reasonable to me, although I would have guess the

> second would have been even more efficient than the first.

IMO, it is normal. A "multi-cells array formula" is IMO logically slower

than the equivalent set of "single cell non-array formulae":

1) If you change a single cell of the range used by the array formula,

the whole array formula will be recalculated.

For instance, say that you have entered {=MYARRAYFORMULA(A1:A5000)} in

B1:B5000. If you enter a new value in any cell of A1:A5000 (say, in

A350), the *whole* array formula will be automatically recalculated.

Exactly as if you had changed not only the value of A350, but of the

5000 cells in A1:A5000.

On the contrary, say that you have entered =MYNONARRAYFORMULA(A1) in B1

and filled this "normal" formula in B2:B5000. If you change then the

value of A350, it will recalculate only the corresponding formula in

B350. The recalculation will be faster.

2) The array formula will need memory allocation to store the

intermediate calculated arrays and the resulting arrays. It has also to

perform a lot of copies (copy the intermediate and final results in the

allocated arrays). This will also spend time. On the contrary, single

cell non-array formulae won't need any special memory allocation and

data copies.

AFAIC, I have tested and compared the calculation speed of 1)

single-cell formulae, 2) the built-in DBCOUNT function and 3)

array-formulae like {=SUM((Range1<X)*(Range2<Y))}, all of them producing

the same results. I have presented the calculation speeds at

http://longre.free.fr/pages/form/form_mat.htm#Performances (see the

table).It shows that 1) is calculated faster than 2), which is

calculated MUCH faster than 3).

Anyway, although I think that in most cases array-formulae can't speed

up the calculations (except of course when you use them in combination

with *true* array functions like MMULT or MINVERSE), they provide:

- a better readability (for instance, 1 single array formula can replace

3000 single-cell formulae)

- and also a smaller file size.

> As you can well imaging, the calculation times of area formulas are not

> (generally) linear with the number of rows in the arrays. Doubling the

> number of rows is going to much more than double amount time it takes to

> calculate the formula. I haven't done any real study of time, only some

> playing around, but the results I've seen tend to agree with what

> intuitively I thought they should be.

I have also noted often the same. But, as you say, it's rather difficult

to understand this unless MS release more technical information about

the calculation engine.

Cordially,

Laurent

Dec 20, 1999, 3:00:00 AM12/20/99

to

Thank you everyone, especially Laurent, for this excellent discussion!

I have some changes to make :-)

If anyone is interested...

I have some changes to make :-)

If anyone is interested...

XL: Some Worksheet Functions Don't Allow Array Constants

http://support.microsoft.com/support/kb/articles/Q124/2/16.asp

Bernie Deitrick <dei...@consumer.org> wrote in message

news:385A6A33...@consumer.org...

> <snip> I don't think there is any place with a listing

Dec 21, 1999, 3:00:00 AM12/21/99

to

I understand both the intuitive reasoning and the numerical analysis

provided in comparing array with non-array formulas. But, that, as it

turns out, is not the whole story.

provided in comparing array with non-array formulas. But, that, as it

turns out, is not the whole story.

Ever since I discovered array formulas back on Excel 4 (or was it 3 or

2?) I have used them for three purposes and one of them has been to

*improve* performance. That's right, to improve performance. [The

fourth reason are the various formulas that I've discovered through

these Excel NGs and why I was inspired to ask about performance.]

The first reason for using them is that the function actually returns

just one result very much like an non-array formula. However, the

result consists of multiple elements and consequently, one must provide

an array for the result. This would include functions such as MMULT,

LINEST, etc.

The second has been for documentation - along the lines that Laurent

[Longre] noted. Actually, it goes a little beyond readability since the

line between that and defensive programming can be somewhat blurry. One

example would be comparing monthly profits across 2 years. If the

monthly profits for each of the 2 years occupy 12 cells in columns A and

B respectively and the difference is in column C, an array formula

indicates that these 12 cells constitute a single entity that I am

analyzing.

The third is the scenario where a array formula speedens up

recalculation -- often by a magnitude! Here's why. Most, if not all,

functions can be thought of as containing three code segments. The

first validates the parameters provided, the second does whatever the

function is supposed to do, and the third massages the results into a

format appropriate for output. In the case where the first (or the

third) takes more time than the second, an array formula will speeden up

processing.

Generating random numbers is one example where the first, i.e.,

validation, can be much more expensive than actually generating the

result. Consider a custom random number function in which the

parameters define a piece-wise linear distribution. In this case, the

RV generator must perform extensive validation of the data. The values

must be in ascending order and the probability values must pass a set of

tests. After that happens, the actual generation is reasonably quick.

In some tests I ran earlier today, I calculated 100 numbers from a

custom CDF with 6 points. Then, I ran the test 100 times. The

resulting average time to generate the 100 numbers:

The non-array method took 0.158 seconds

The array formula took 0.007 seconds.

I hope to post a more detailed article on the process I used, including

some bizarre results that I got with Excel's (or is it VBA's) TIMER

function and additional results comparing the significance of the three

segments that I talked about above.

--

Regards,

Tushar Mehta

www.tushar-mehta.com

--

In <385D2404...@free.fr>, Laurent Longre <laurent...@free.fr>

wrote

> Chip Pearson a écrit :

> >

> > Sounds roughly reasonable to me, although I would have guess the

> > second would have been even more efficient than the first.

>

> > As you can well imaging, the calculation times of area formulas are not

> > (generally) linear with the number of rows in the arrays. Doubling the

> > number of rows is going to much more than double amount time it takes to

> > calculate the formula. I haven't done any real study of time, only some

> > playing around, but the results I've seen tend to agree with what

> > intuitively I thought they should be.

>

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu