Fastest Way to Read a Range of Cells into C# - BUT I want the 'text' value, not the 'value' value

3,970 views
Skip to first unread message

Terry Aney

unread,
Dec 5, 2015, 7:28:05 PM12/5/15
to Excel-DNA
Is there a way to do this?  I know Interop.Range.Value2 and ExcelReference.GetValue() but both return the 'value'.

In most cases I could get away with it, but when the value/text is a 'date', those methods return me a double and I have know way of knowing if it is a date or not.

Additionally, might want to try and pull of same mechanics and get all the formulas for a given range.

Possible?

Terry Aney

unread,
Dec 14, 2015, 12:22:29 PM12/14/15
to Excel-DNA
Well, Govert helped me discover how to get 'Text' from a cell via:

(string)XlCall.Excel( XlCall.xlfGetCell, (int)GetCellType.Text, excelReference )



Only way to use this was via looping, so I didn't do that as times below show.

As a little background, I need to export 'data' from Excel into Xml...so I don't care about number formats, but I do need 'Dates' and .GetValue() returns dates as doubles.  I could probably make some rules/assumptions on an entire column by maybe checking formats of first row before doing a .GetValue then transform double->date for any columns flagged as 'Date' columns but given the speed listed below, not really worth the risk/effort/ROI.

So here are my results for reading a 15,000x100 cells (let me know if you have any questions about the 'methods' at all):

Run C API GetValue() (3) Interop Range.Value2 Interop Range.Value Interop Range.Value/Loop (1) VBA/Loop (1) C API Looping/GetCell (1) COM (Application.Run) (1) Interop Loop/Text (1)
1 0.3507217 0.7069541 0.9426819 1.6121424 20.5388871 19.6186631 22.3122662 72.5640065
2 0.3614963 0.7421369 0.9145262 1.5115977 19.9144984 19.2103384 21.2756509 70.8351868
3(2) 0.35609931 0.719151943 0.925460836 1.557100406 22.6988244 23.0396793 23.624186 82.022291
4(2) 0.351611989 0.731355898 0.923744209 1.558899742 17.7283302 21.2414558 23.3982391 81.0125688
5(2) 0.353727928 0.728213728 0.923501217 1.550801263 22.1358615 22.5800253 23.242252 81.1945079
Average 0.354731445 0.725562514 0.925982872 1.558108302 20.60328032 21.13803238 22.77051884 77.5257122

(1) Dates are proper dates instead of just double
(2) Ran in Release mode…strange that it was slower
(3) Can return ExcelMissing, ExcelEmpty, or ExcelError

C API GetValue() - Returns zero-based object[,], but dates are returned as double, also includes ExcelEmpty, ExcelMissing, ExcelError
Interop Range.Value2 - Returns one-based object[,], but dates are returned as double and empty cells are null
Interop Range.Value - Returns one-based object[,], but dates are actual DateTimes, empty cells are null
Interop Range.Value/Loop - Returns zero-based string[,] (converting Interop's one-based), dates are formatted as yyyy-mm-dd, empty cells are null
VBA/Loop - Speed test just to see how fast VBA could return a zero-based string[,], values formatted with .Text
C API Looping/GetCell - Returns zero-based string[,], values are formatted as text
COM/Loop (Application.Run) - DNA addin calls .xlam addin that returns zero-based string[,], values formatted with .Text (looping inside the .xlam, didn't use Range.Value, probably would be faster doing that)
Interop Loop/Text - Returns zero-based string[,], values are formatted with .Text

So as much as I wanted to keep 'Interop' coding out of the mix (don't know why, just something that felt right), I've settled on doing Interop.Range.Value then my DateTimes are preserved.  I couldn't find any C API equivalent to this call...

Range.Value -> No C API call that I could find
Range.Value2 -> Equivalent to ExcelReference.GetValue()

Patrick O'Beirne

unread,
Dec 14, 2015, 12:29:03 PM12/14/15
to exce...@googlegroups.com
How about

Set xlXML = CreateObject("MSXML2.DOMDocument")
xlXML.LoadXML rng.Value(xlRangeValueMSPersistXML)

Govert van Drimmelen

unread,
Dec 15, 2015, 4:42:19 AM12/15/15
to exce...@googlegroups.com

Hi Terry,

 

Thanks for posting this!

 

It does seem to confirm that the C API is significantly faster than the COM object model, where it does apply.

 

-Govert

--
You received this message because you are subscribed to the Google Groups "Excel-DNA" group.
To unsubscribe from this group and stop receiving emails from it, send an email to exceldna+u...@googlegroups.com.
To post to this group, send email to exce...@googlegroups.com.
Visit this group at https://groups.google.com/group/exceldna.
For more options, visit https://groups.google.com/d/optout.

Terry Aney

unread,
Dec 15, 2015, 3:05:50 PM12/15/15
to Excel-DNA

I had to post this.  Will be hard to get entire context around it, but here it goes.

As I said above (or maybe in other threads), I needed to export 'data' from Excel into Xml format for processing by other systems.  Our current add-in was VBA but we wanted to move to C# for following reasons:

a) Better source code management (not one huge binary .xlam file ... or hacks to attempt to auto export all the modules to text on save as a form of 'source')
b) Too many times, Excel has crashed during VBA coding and I've lost hours upon hours of work.
c) Easier to write code in C# (i.e. cleaner solutions to most problems, ability to use features not even present in VBA)
d) All our developers are C# (mostly trying to forget VBA ;) )
e) the list goes on...

Anyway, given an Excel file like this, with 26,700 rows:


And I generate an Xml structure that looks like this:



In my VBA version, it took 246 seconds to complete... in C#/DNA it took 0.6 seconds!!


I didn't even know it was possible to create a 10MB file in 0.6 seconds.  Obviously, the code was written 'better' (in general) and also better in terms of using C# functionality that isn't available in VBA, but I am still amazed.

Govert van Drimmelen

unread,
Dec 15, 2015, 3:29:34 PM12/15/15
to Excel-DNA
That's a great story Terry - 246 seconds to 0.6 seconds!

In case you haven't run into this - I do accept donations via PayPal ;-)


Cheers,
Govert

Terry Aney

unread,
Dec 15, 2015, 5:40:45 PM12/15/15
to Excel-DNA
Done.

Govert van Drimmelen

unread,
Dec 15, 2015, 6:06:18 PM12/15/15
to exce...@googlegroups.com
That's so kind - thank you Terry!

-Govert


From: exce...@googlegroups.com [exce...@googlegroups.com] on behalf of Terry Aney [terry...@gmail.com]
Sent: 16 December 2015 12:40 AM
To: Excel-DNA
Subject: Re: [ExcelDna] Re: Fastest Way to Read a Range of Cells into C# - BUT I want the 'text' value, not the 'value' value

--

Patrick O'Beirne

unread,
Dec 16, 2015, 4:34:13 AM12/16/15
to Excel-DNA
It depends on how well the VBA is written. My suggestion does 26700 rows in 0.76 seconds, but the xml is not looking the way you want it.;
I'm not familiar with the DomDocument object but perhaps setting some properties might get it the way you want. If not, then ExcelDNA and C# is still unbeatable!

Option Explicit
Declare Function GetTickCount Lib "kernel32" () As Long
Sub TA()
Dim xlXML As Object, t As Long
t = GetTickCount
Set xlXML = CreateObject("MSXML2.DOMDocument")
xlXML.LoadXML ActiveSheet.UsedRange.Value(xlRangeValueMSPersistXML)
Debug.Print GetTickCount - t ' 625 ms
xlXML.Save "F:\temp\TA.xml"
Debug.Print GetTickCount - t  ' 765 ms
End Sub

FastExcel

unread,
Dec 19, 2015, 1:35:34 PM12/19/15
to Excel-DNA
Won't help your .Text problem, but to set the record straight: reading 15000 row x 100 columns of numbers into a variant array using VBA .Value2 takes my PC 0.092 secs, and 0.3 secs using .Value, rather than the 20 seconds for your VBA loop.

Dim var as variant
var = Range("A1:CV15000").Value2

Coercing the range to a single xlOper with C should be a bit faster.

- Charles

kissinger chen

unread,
May 12, 2016, 6:47:57 PM5/12/16
to Excel-DNA
Sorry I did not catch you. Which method did you use to read the 26,700 rows? Could you give me some code samples here? Thanks.

Paul Harrington

unread,
May 14, 2016, 4:25:39 PM5/14/16
to Excel-DNA

Patrick, I was not aware of the xlRangeValueMSPersistXML enum value and it prompted me to do a couple of experiments (although I have not yet done any benchmarking. Despite messing around with ExcelDNA for some years, I don't know very much C# as I do my regular computing in Python). For convenience, I wrote the .Value(xlRangeValueMSPersistXML) to a file and experimented in LINQPad with reshaping it as a DataTable. I am very interested in figuring out how to reshape it to https://github.com/mgravell/fast-member  as it supports the IDataReader interface and is thus usable as an argument to SendToServer in the SqlBulkCopy class.I am curious if one can send a table of string values to the dataserver and have it take care of the type-coercion. On the other hand, doing it client-side is better for error-checking.

Assuming I figure out all the C# housekeeping, I will do some benchmarking and report back but in the meantime, I have included the LINQ fragments below.

pjjH


XmlDocument doc = new XmlDocument();
doc.Load(XmlReader.Create("C:/ftp/x.xml"));

 var column_names = doc.GetElementsByTagName("s:AttributeType")
                 .Cast<XmlNode>()
.Select(e => e.Attributes["rs:name"].Value);

 var rows = doc.GetElementsByTagName("z:row")
                 .Cast<XmlNode>()
.Select(e => (from XmlAttribute x in e.Attributes select x.Value).ToArray());  
var table = new DataTable(); 
foreach (var column in column_names) {
table.Columns.Add(column);
}

foreach (var row in rows) {
table.Rows.Add(row);

Paul Harrington

unread,
May 14, 2016, 7:33:08 PM5/14/16
to Excel-DNA

The results were a little surprising to me (but perhaps I am doing something wrong). For ~190K cells, the time to serialize it to XML was just over a second (which was slower than I expected) and then the performance of the LINQ queries and DataTable construction is good (128 ms). 


1010 ms to get XML selection

1139 ms to generate XML document

7 columns 27440 rows 1267 ms total


           
           
Stopwatch stopwatch = Stopwatch.StartNew();

            System.Data.DataTable dt = new System.Data.DataTable();

            dt.Clear();
            XmlDocument doc = new XmlDocument();
            string s = (string)r.get_Value(XlRangeValueDataType.xlRangeValueMSPersistXML);
            stopwatch.Stop();
            Debug.WriteLine(@"{0} ms to get XML selection", stopwatch.ElapsedMilliseconds);

            stopwatch.Start(); 
            doc.LoadXml(s);
            stopwatch.Stop();
            Debug.WriteLine(@"{0} ms to generate XML document", stopwatch.ElapsedMilliseconds);
            
            stopwatch.Start();

            var column_names = doc.GetElementsByTagName("s:AttributeType")
                          .Cast<XmlNode>()
                          .Select(e => e.Attributes["rs:name"].Value);

            var table = new System.Data.DataTable();
            foreach (var column in column_names)
            {
                table.Columns.Add(column);
            }

             var rows =  doc.GetElementsByTagName("z:row")
                 .Cast<XmlNode>()
.Select(e => (from XmlAttribute x in e.Attributes select x.Value).ToArray());

             foreach (var row in rows)
             {
                 table.Rows.Add(row);
             }
            stopwatch.Stop();
            Debug.WriteLine(@"{0} columns {1} rows {2} ms total", table.Columns.Count, table.Rows.Count, stopwatch.ElapsedMilliseconds);


Paul Harrington

unread,
May 15, 2016, 11:10:11 AM5/15/16
to Excel-DNA
BTW, the use of the Microsoft.Office.Interop.Excel.Range is due to the tests being done within the SheetSelectionChange handler for a CTP.

        private static void xlApp_SheetSelectionChange(object Sh, Range Target)

I would like to do some more systematic testing of reading/writing regions via the different mechanisms.

Terry Aney

unread,
Oct 6, 2017, 9:23:44 AM10/6/17
to Excel-DNA
Not sure why I didn't get notified of message.

I ended up just using the COM .Value property.

var reference = new ExcelReference(  ...  ); // I had a reference passed in, but just putting this here as example
var data = reference.GetRange().Value; // GetRange() was extension method to just take ExcelReference address information and grab the same cells as a MS Office Range
// Reference was only 1 cell if null, so not an object[,] but actual value, so change to array
var value = data as object[,] ?? new object[,] { { data } };
return new InteropArray( value );



... Below is just a helper class ...


 /// <summary>
 
/// Wrapper for Microsoft Interop Arrays that are 1 based to make accessing them
 
/// via C# 0 based arrays work.
 
/// </summary>
 
public class InteropArray
 
{
 
private object[,] data;


 
public int Rows { get; private set; }
 
public int Columns { get; private set; }


 
private int rowOffset;
 
private int colOffset;


 
public InteropArray( object[,] data )
 
{
 
this.data = data;
 
Rows = data.GetLength( 0 );
 
Columns = data.GetLength( 1 );


 
// If GetArray is called on single cell, c# creates the
 
// array instead of Excel and it is 0 based.
 rowOffset
= data.GetLowerBound( 0 );
 colOffset
= data.GetLowerBound( 1 );
 
}




 
public object this[ int x, int y ]
 
{
 
get
 
{
 
return data[ x + rowOffset, y + colOffset ];
 
}
 
}
 
}

Terry Aney

unread,
Oct 6, 2017, 9:29:58 AM10/6/17
to Excel-DNA
Right, I stated that above as the second and third fastest methods.
Reply all
Reply to author
Forward
0 new messages