Hi all — a question for anyone familiar with the ExcelReader class provided with Accord.NET.
In a research project where I use SVM classifiers to analyse blob shapes, I am using Excel documents for intermediate storage of property data. The data set can be large, 250,000 records or more (the final Excel file will be 100-200 MB), but the writing has to be done in batches of 2,000 records, because I load the image files for the blobs into memory to calculate the properties. I am therefore writing (appending) to the .xlsx file in batches, and this is taking a very long time. I don't understand why. I am using the ExcelReader sample from the Accord.NET web site as a template. The steps are:
1. Create the reader:
ExcelReader db = new ExcelReader(xlsFileName, true, false);
2. Create the DataSet and DataTable (the DataSet must be instantiated here, otherwise step 3 throws a NullReferenceException):
DataSet workbook = new DataSet();
DataTable worksheet = null;
3. Load the selected worksheet and add it to the DataSet:
worksheet = db.GetWorksheet(sn);
workbook.Tables.Add(worksheet);
4. Then I append the new data to the sheet:
foreach (blobProperties r in records)
{
    worksheet.Rows.Add(new object[] {
        (double)r.area,
        (double)r.perimeter,
        .......
        (bool)r.qualifies });
    line++;
}
5. Finally, I create the modified Excel file; the old file is overwritten:
bool success = CreateExcelFile.CreateExcelDocument(workbook, xlsFileName);
Steps 1-5 are repeated for as long as there are blobs left in my list of images to process.
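To make the overall structure clearer, the batch loop looks roughly like this. This is a simplified sketch, not the attached source: HasMoreBlobs and LoadNextBatch are placeholders for my batch-loading code, and only a few of the blob properties are shown.

```csharp
// Simplified sketch of the batch loop.
// HasMoreBlobs() and LoadNextBatch() are placeholders for the code that
// loads up to 2,000 blob images into memory and computes their properties.
while (HasMoreBlobs())
{
    List<blobProperties> records = LoadNextBatch(2000);

    // Step 1: re-open the existing workbook from disk
    ExcelReader db = new ExcelReader(xlsFileName, true, false);

    // Steps 2-3: load the whole worksheet into a DataSet
    DataSet workbook = new DataSet();
    DataTable worksheet = db.GetWorksheet(sn);
    workbook.Tables.Add(worksheet);

    // Step 4: append the new batch of rows
    foreach (blobProperties r in records)
    {
        worksheet.Rows.Add(new object[] {
            (double)r.area,
            (double)r.perimeter,
            // ... remaining properties ...
            (bool)r.qualifies });
    }

    // Step 5: rewrite the entire file, overwriting the old one
    bool success = CreateExcelFile.CreateExcelDocument(workbook, xlsFileName);
}
```

Note that on every iteration the whole file written so far is read back in and then rewritten in full, so each batch processes all previously written rows again, not just the new 2,000.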
This works fine for smaller data sets (<10,000 records), but for larger data sets the processing takes hours. For a data set of 250,000 records I need to iterate 125 times through this loop.
Am I doing this wrong, i.e. is this not the way to append data to an Excel file? As far as I understand, 250,000 records totalling 200 MB should not be much for the DataSet or DataTable classes, nor for .xlsx files.
The C# source for the writer method is attached.