Hi all — a question for anyone familiar with the ExcelReader class provided with Accord.NET.
In a research project where I use SVM classifiers to analyse blob shapes, I am using Excel documents for intermediate storage of property data. The data set can be large, 250,000 records or more (the final Excel file will be 100-200 MB), but the writing has to be done in batches of 2,000 records, because I load the image files for the blobs into memory to calculate the properties. I am therefore writing (appending) to the .xlsx file in batches, and this is taking a very long time. I don't understand why. I am using the ExcelReader sample from the Accord.NET web site as a template. The steps are:
1. Create the reader:
ExcelReader db = new ExcelReader(xlsFileName, true, false);
2. Create the DataSet and DataTable (the DataSet must be instantiated here, otherwise step 3 throws a NullReferenceException):
DataSet workbook = new DataSet();
DataTable worksheet = null;
3. Load the selected worksheet and add it to the DataSet:
worksheet = db.GetWorksheet(sn);
workbook.Tables.Add(worksheet);
4. Then I append the new data to the sheet:
foreach (blobProperties r in records)
{
    worksheet.Rows.Add(new object[] {
        (double)r.area,
        (double)r.perimeter,
        .......
        (bool)r.qualifies });
    line++;
}
5. Finally, I create the modified Excel file; the old file is overwritten:
bool success = CreateExcelFile.CreateExcelDocument(workbook, xlsFileName);
Steps 1-5 are repeated for as long as there are blobs left in my list of images to process.
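To make the overall structure clearer, the batch loop looks roughly like this. This is a simplified sketch, not the attached source: HasMoreBlobs and LoadNextBatch are placeholders for my batch-loading code, and only a few of the blob properties are shown.

```csharp
// Simplified sketch of the batch loop.
// HasMoreBlobs() and LoadNextBatch() are placeholders for the code that
// loads up to 2,000 blob images into memory and computes their properties.
while (HasMoreBlobs())
{
    List<blobProperties> records = LoadNextBatch(2000);

    // Step 1: re-open the existing workbook from disk
    ExcelReader db = new ExcelReader(xlsFileName, true, false);

    // Steps 2-3: load the whole worksheet into a DataSet
    DataSet workbook = new DataSet();
    DataTable worksheet = db.GetWorksheet(sn);
    workbook.Tables.Add(worksheet);

    // Step 4: append the new batch of rows
    foreach (blobProperties r in records)
    {
        worksheet.Rows.Add(new object[] {
            (double)r.area,
            (double)r.perimeter,
            // ... remaining properties ...
            (bool)r.qualifies });
    }

    // Step 5: rewrite the entire file, overwriting the old one
    bool success = CreateExcelFile.CreateExcelDocument(workbook, xlsFileName);
}
```

Note that on every iteration the whole file written so far is read back in and then rewritten in full, so each batch processes all previously written rows again, not just the new 2,000.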
This works fine for smaller data sets (<10,000 records), but for larger data sets the processing takes hours. For a data set of 250,000 records I need to iterate 125 times through this loop.
Am I doing this wrong, i.e. is this not the way to append data to an Excel file? As far as I understand, 250,000 records totalling 200 MB should not be much for the DataSet or DataTable classes, nor for .xlsx files.
The C# source for the writer method is attached.