Use of ExcelReader for large data sets and slow response


ThorPedo

Feb 24, 2017, 6:31:22 AM
to Accord.NET Framework
Hi all forum readers who may know the ExcelReader provided with Accord.NET,

In a "research" project where I use SVM classifiers to analyse blob shapes, I store the property data in Excel documents as intermediate storage. The data set can be large, 250,000 records or more (the final Excel file will be 100-200 MB), but the writing has to be done in batches of 2000 records, because I load the image files for the blobs into memory to calculate the properties. Therefore I am writing (appending) to the xlsx file in batches. This takes a very long time, and I don't understand why. I am using the ExcelReader sample from the Accord.NET web site as a template. The steps are:

1. Create the reader
        ExcelReader db = new ExcelReader(xlsFileName, true, false);
2. Create the DataSet and DataTable
        DataSet workbook = new DataSet();   // must be instantiated before Tables.Add in step 3
        DataTable worksheet = null;
3. Load the selected worksheet
        worksheet = db.GetWorksheet(sn);
        workbook.Tables.Add(worksheet);
4. Append the new data to the sheet
        foreach (blobProperties r in records)
        {
            worksheet.Rows.Add(new object[] {
                (double)r.area,
                (double)r.perimeter,
                .......
                (bool)r.qualifies});

            line++;
        }
5. Finally, create the modified Excel file. The old file is overwritten.
        bool success = CreateExcelFile.CreateExcelDocument(workbook, xlsFileName);

Steps 1 - 5 are repeated as long as there are any blobs left in my list of images to process.

This works fine for smaller data sets (< 10,000 records), but with larger data sets the processing takes hours. For a data set with 250,000 records I need to iterate through this loop 125 times.
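
To put a rough number on that loop: this is only a back-of-the-envelope count, assuming each pass really reloads and rewrites every row already in the file (which is how I read steps 1 - 5; I have not profiled it). With batches of 2000 records, pass k writes 2000k rows, so over 125 passes the total number of rows written is

\[
\sum_{k=1}^{125} 2000\,k \;=\; 2000 \cdot \frac{125 \cdot 126}{2} \;=\; 2000 \cdot 7875 \;=\; 15{,}750{,}000 ,
\]

which is 63 times the 250,000 rows that end up in the file. Under the same assumption, the rows read back from the file add up to a similar amount.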

Am I doing this wrong, i.e. is this not the way to append data to an Excel file? As far as I understand, 250,000 records totalling 200 MB is not much for the DataSet or DataTable classes, nor for xlsx files.

The C# source for the writer method is attached.
ExcelReaderMethod.cs