Also, regarding error handling. I saw another post where you responded
about all the fatal parsing errors that your library handles. I think
the point of the original question (and my current question) is how
data conversion errors at the row level are handled. I guess it's more
of a .NET question really since data type errors, key violations, etc.
would actually be thrown from SQL/.NET and not from your library, but
maybe you have some insight as to how this would be best handled in a
way that would let you continue processing while batching errors to
return at the end? Perhaps that could be added to your sample code.
Also, is your sale still on? It's past March 24, but you still have
the price posted as $75 :)
Thanks.
As for exception handling, the topic is of course very complicated. I'm
not sure which thread that you're referring to about exceptions. There
are different types of exceptions that can occur, and in no way are
they all recoverable. If the format is not correct CSV, there's no
place that I can picture for the parser to pick up at. If I remember
the thread that you're referring to, that's the situation I discussed.
Now if you're picturing maybe a value in a column that's defined as a
double, and the value at the CSV level can not be converted into a
double, I could see a way to recover. This situation is not currently
supported, but I might think about it. Now if the data can not be
inserted into the database for some reason, be it a data conversion
issue or whatever, continuing on after the exception is not currently
handled by SqlBulkCopy, so continuing on is rather difficult. I suppose
that you could create another instance of SqlBulkCopy, and pass in the
half used CsvDataReader instance, and as far as I can picture offhand,
it should continue on after the exception starting with the next
record. I suppose this solution should also handle the situation above
equally well. I try to make my code samples on my site reasonably
straightforward and relatively few of them to keep the confusion level
down. There's a million different ways these classes can be used, so
for specific situations, people just more have to request some
suggestions directly if they have questions.
I extended the sale to the 31st, so you shouldn't have any problems
there.
I figure most of the answers above are basically mute points since you
apparently are stuck in 1.1. In 1.1, I would suggest using DTS to bulk
load the data into SQL Server, and possibly use my parser to validate
the data beforehand, since you are again going to have issues trying to
recover and continue from exceptions while bulk loading with DTS.
Bruce Dunwiddie
Based on your feedback, I am thinking that in 1.1 I should stick to the
low-tech approach and loop through the import file manually and save
each row myself. This should also allow me to do any validation and
error-handling at the row level in the case of mismatched column
lengths or data type issues. I don't think performance should be a
huge issue for me -- some files could be large, but the order of
magnitude will normally only be in the hundreds of records per file
imported at any given time. I suppose I could could also execute the
upload processing on multiple threads if it became a bottleneck. So
based on that, does the following seem like a reasonable apporach to
you?
- Parse the upload file stream using your CSV library
- Output the stream to a DataTable in memory
- Open my DB connection
- Loop through the DataRows, saving each row to the DB
- Close the DB connection and clean up
Thanks again for the feedback. I look forward to trying out your
component.
Bruce Dunwiddie