RTD Array Formula Performance - Marshalling

1,512 views
Skip to first unread message

MK

unread,
Jan 12, 2014, 11:09:01 PM1/12/14
to exce...@googlegroups.com
Hi all,

I have a spreadsheet that I'm trying to get array formula updates working in realtime in Excel. The array however is quite long (100,000 rows, 3 columns as a rough guide).

When evaluating the formula not using RTD the performance isn't noticeable at all. However using the RX extensions and returning an object[,] array as the realtime result causes performance issues.

I've done some profiling and tracked down where the most time is spent. It seems that for some reason ExcelDNA is marshalling the same object[,] array back to native for each cell in the array causing a calculation explosion. i.e for a managed multidimensional array of 100 * 100 DNA will mashall a native version of that array 100,000 times as Excel tries to reevaluate the function for every cell. In effect DNA is marshalling the same array for Excel for every cell i.e. processing becomes exponential to the number of cells.

The method in question is : unsafe public IntPtr MarshalManagedToNative(object ManagedObj) in class XlObjectArray12MarshalerImpl.cs

Is there a way to get ExcelDNA to cache marshalled results if it knows that the array is something it has unmarshalled before? Even if the cache lives for about a second or so that would be a pretty big improvement to performance.

Would make using RTD arrays feasible.

Cheers,

MK

Govert van Drimmelen

unread,
Jan 13, 2014, 5:10:57 PM1/13/14
to exce...@googlegroups.com
Hi MK,

The behaviour that you see, where the RTD function returning an array is called once for every cell, is not peculiar to Excel-DNA. Excel does this for any RTD server - Microsoft is quite explicit: "The RTD function cannot return an array." (http://support.microsoft.com/kb/286258).

So I would discourage you from trying to combine RTD features and arrays. A more reliable approach might be to have the RTD function return a 'handle' to an internally stored data structure, that gets returned to a single cell.
Then you could have lookup functions (maybe array functions) that return the internal values. It becomes more complicated, but you'll get good performance.

If this is too terrible, you might like to experiment with the result from ```XlCall.Excel(XlCall.xlfCaller);``` in different calls to your function. I think it will be null for all the individual cell calls, and only have a valid ExcelReference for the one array call. You might then try to shortcut your calculation when it is null, and only return the full array when you have an ExcelReference. I tried this a bit some years ago, but can't remember whether there was some problem that stopped me from using this technique further.

Regards,
Govert

MK

unread,
Jan 13, 2014, 7:15:26 PM1/13/14
to exce...@googlegroups.com
Hi Govert,

Thanks for your reply. In my particular case I am already caching the result of the method call (i.e think of my function as returning a static array defined in code). Your RX examples on your site do this naturally (i.e uses the cached value until another value comes through the observable). In this case the value I'm subscribed to is the object[,]. Profiling Excel and my code shows the DNA class as taking more than 90% of the time marshalling the same array. (i.e if I alter the addin marshaller and remember the last called result and its parameter and caching the results of the function I get almost instant RTD arrays - this fix was tested against ExcelDNA source downloaded)

I can try your approach in returning only the value for that particular cell when I do the call but I've never used this technique. I'm guessing you mean something like (don't know the XlCall.xlfCaller interface yet so I'm just guessing what you are trying to tell me to do with the function).

private object[,] resultArray;

public static object[,] TestFunction()
{
var reference = XlCall.Excel(XlCall.xlfCaller);

var newArray = new object[,]
var rowOfCurrentCell;
var colOfCurrentCell;

newArray[rowOfCurrentCell, colOfCurrentCell] = this.resultArray[rowOfCurrentCell, colOfCurrentCell];

return newArray;
}

As a suggestion it might be nice to have an attribute defined on my UDF ([ImmutableReturnValue]) that allows the marshaller to weakly cache marshalling results of return objects so that if a function returns the same result over and over it doesn't have be remarshalled back to a native array.

Thanks for your reply, gives me something to go on.

MK

MK

unread,
Jan 14, 2014, 1:22:42 AM1/14/14
to exce...@googlegroups.com
For people reading this post in the future I tried the XlCall.Excel(XlCall.xlfCaller) function Govert suggested - it doesn't let me determine which cell in the array formula it is trying to evaluate (just shows me the rows and columns of the array). I was thinking of using it to pad out the returned object[,] array with nulls for everything other than the cell being evauated to reduce the bottleneck in the DNA marshalling code.

Other than modifying ExcelDNA source in the class mentioned in my first post I can't get it to be acceptable. The hack I did to get the RTD array formulas to be fast was in XlObjectArray12MarshalerImpl.MarshalManagedToNative(object ManagedObj) exploiting the fact that RTD array formulas are called in order most of the time (statistically I'm reducing ExcelDNA's marshalling substantially).

Added these fields to the XlObjectArray12MarshalerImpl class

private IntPtr? lastFunctionResultPointer;
private object lastFunctionResult;

Then in the MarshalManagedToNative method just before the Reset call:

if (this.lastFunctionResult != null && object.ReferenceEquals(ManagedObj, this.lastFunctionResult))
{
    return this.lastFunctionResultPointer.Value;
}
else
{
    this.lastFunctionResultPointer = null;
    this.lastFunctionResult = null;
}

And finally just before returning the value remembering the pointer and the input function (i.e if the function is called twice with the same input we use the last method invocation result).

NOTE: This hack is unacceptable for production use due to the fact that the object to marslal to native might have mutated since the last call. An attribute on the [ExcelFunction] indiciating immutability of the return type may allow selectively applying this optimisation.

Hope this helps someone. I can't use changed source due to policy but someone out there might find some use in this.



On Tuesday, January 14, 2014 9:10:57 AM UTC+11, Govert van Drimmelen wrote:

DC

unread,
Jan 27, 2014, 1:56:50 PM1/27/14
to exce...@googlegroups.com
Govert,

When you say :

So I would discourage you from trying to combine RTD features and arrays. A more reliable approach might be to have the RTD function return a 'handle' to an internally stored data structure, that gets returned to a single cell.

Would that apply to using the async methods in Excel-DNA as well since they use RTD under the hood?

Regards,

Dave 

Govert van Drimmelen

unread,
Jan 27, 2014, 4:39:51 PM1/27/14
to exce...@googlegroups.com
Hi Dave,

Hard to say. One issue with arrays and RTD is that the wrapped function gets called once for every cell, and once for the whole array. But the RTD-based async function will keep the returned value around, so these calls should be quite fast. But the multiple calls have some overhead if the array is large.

I've not yet managed to properly combine the Rx-style IObservables with the ArrayResizer.
You might have to try it in your context - I have little experience of using these myself.

-Govert

MK

unread,
Jan 28, 2014, 1:31:39 AM1/28/14
to exce...@googlegroups.com
Hi Dave,

When I looked at this the new ASync RX stuff did in fact cache the value called (the managed object value). The overhead comes from the marshalling from the cached value to Excel native arrays for every RTD cell + once for the array - this part is independent of the async caching that happens.

I took Govert's suggestion and used a two formula approach where I store cached results, assign a token/handle to that version of the state, and then have another formula to get the array for a given update token/handle. This second formula (the one returning an array) is a standard UDF with no RTD. The generated token value should change on each update via RTD to force Excel to evaluate dependent formulas, in this case your whole array. That effectively means that managed -> native marshalling only happens on each RTD event rather than for each (cell in array + 1). This works with at least 300,000 large cells which while a little ugly (requires two formulas) was acceptable in our case.

VB

unread,
Sep 22, 2014, 3:34:24 AM9/22/14
to exce...@googlegroups.com
Hi Govert,

What if we call RTD inside a UDF function like this:

var rtdResult = XlCall.RTD("My.Server", null, rtdParams.ToArray());

but write RTD server in such a way that it returns (in rtdResult) a key (the handle you mention) to some shared C# dictionary where result arrays are stored instead of actual data. But instead of the second function we process the handle inside UDF.

I do not know all details how RTD works, but what I do now is I pass object[,] serialized as string and deserialize it inside an UDF. Serialization is not a fast thing to do for large arrays, but the RTD update logic works as expected - that is, if UDF returns something dependent on RTD result but not the result itself the RTD server still starts and sends updates into the calling UDFs normally. 

So after we get a new rtdResult, we lookup actual data and return it:
return StaticSharedClass.SharedData[ rtdResult.ToString() ];

What are the downsides/risks in this approach?

Thanks!
VB

Marc Lefkon

unread,
Dec 10, 2014, 10:18:50 PM12/10/14
to exce...@googlegroups.com
@VB-
This approach seems to work quite well.  My other option to avoid the 'double formula' approach is to include some sort of index in the original Excel formula and avoid Excel arrays all together.  

But your approach encapsulates the array in one clean UDF.  Just increment the pointer handle each update and then you only need to call Topic.UpdateValue(handle) once.  The UDF still gets called once for each cell (ignoring the call for the entire array), but this must be much more efficient for Excel than storing an internal list of many many Topics which would all need an UpdateNotify().

Govert, can I ask VB's question again? are there any downsides/risks in the approach?

Thanks a lot.
Reply all
Reply to author
Forward
0 new messages