I have a situation where files sent to me have a character after the
last field delimiter. I am getting back an error message:
Bulk Copy FAILED: Maximum column length of 100,000 exceeded in column 39
in record 2,477. Set the SafetySwitch property to false if you're
expecting column lengths greater than 100,000 characters to avoid this
error.
I am not exactly sure why the error message reads the way it does but I
can tell you this:
1) There are 153 field delimiters ( I use the pipe symbol | )
2) The longest line in the entire file is only 987 characters total,
including the pipe symbol field delimiters.
3) The line with the error only has 783 columns in it total including
delimiters (as counted by UltraEdit)
4) Every single line has the same number of delimiter characters in it,
so it is not that one has too few or two many.
I am importing over 50 of these files, with millions of records, most
import, but a small handful fail with this kind of error.
I had this error long ago and contacted you but never really resolved
the problem. Because of the size of these files it is difficult to
really view them with ultraedit (gigabytes of text).
This time I wrote some code to split the file apart and write just a
chunk containing the error line as reported by CSVReader and sure
enough, when I go to the line indicated there is (in this case) a single
0 at the end of the line, before the crlf.
I tried feeding just this smaller "chunk" back through my import program
and it failed with the same error message (different line number of course).
Anyway, I have the "smaller" file (7.9 megs uncompressed) if you would
like to look at it.
In the meantime I need to know if there is an easy way to get at the
line to strip anything off after the last delimiter so that I can
proceed to import my data. I already sink CSVData_ReadRecord but I
think you said that event never even fired because your code didn't
consider it a line.
Anything you can do to help me figure this out would be much appreciated.
--
John W. Colby
www.ColbyConsulting.com
I tracked it down. I use a | character as the field delimiter. I had
set "Use text qualifier" as true so as to strip out the quotes around a
field if any were encountered. Sure enough there is an "opening" text
qualifier but no "closing" text qualifier.
Is this a common problem? It certainly seems to be a common problem
with the data files from this one company I am getting data from, and it
seems that it could be encountered often in "hand entered" data where a
person could simply type the " (text qualifier) by mistake in the middle
of real data.
I am wondering if you can put in a "fix dirty data" property to tell
your code to strip the text qualifier if a field delimiter is
encountered before a close text qualifier.
IOW I chose a pipe because I have never seen it used in normal text
(non-programming language text). I need to strip ALL text qualifiers
(") out of the data but would have to pre-process every file in order to
do this.
Question, does CSVReader read an entire line (to the crlf for example)
and then process that? If so could you read the line and raise a
"preprocess" event that would hand the event handler the line of data by
reference? I could then do a simple replace(LineData,'"','') on every
line every time.
If I can't do something like that how do you recommend handling this
data. I do have other data files where the field delimiter is the more
normal comma and so would need the normal processing of text qualifiers.
IOW is there some standard programmatic way of handling this issue or am
I stuck with manually finding and fixing bad data?
Thanks for your help on this,
John W. Colby
--
John W. Colby
www.ColbyConsulting.com