On Jan 6, 2:36 am, judson <judson.morri...@gmail.com> wrote:
> I have pulled data from an Oracle database and I would like to write
> it to an Excel file. It writes to CSV just fine, but using xlwt it
> always skips a varying number of rows (6-20) after row 2544 or 2545
> and then resumes writing rows. Every time I open the file, Excel says
> it has to recover data and that some may have been lost. I have
> cell_overwrite_ok=False. I have tried writing the xls document in two
> slightly different ways with the same results. Sorry if this is a
> basic question, but I haven't seen a similar issue in the forum. Thank
> you in advance.
Hi, Judson,
Does the Excel recovery log give you any clues?
Try opening the file with as many of xlrd, OpenOffice.org Calc, and
Gnumeric as you conveniently can. Any more clues?
>
> 1st way:
>
> # i = 1 to start after headers
> i = 1
>
> # Write results of query to .XLS
> for row in res:
>     print row
Doing

    print i, row

might make it easier to compare this output with the output xls
file ...
Have you noticed anything strange about the incoming data in the
affected band of rows?
What are the Oracle types of each column? What is the maximum length
of string that you have observed?
What database-reading gadget (e.g. cx_Oracle) are you using? What
version?
What Python datatype are you getting for strings: unicode, or str
(encoded as UTF-8? something else?)?
If there are any BLOBs, what are you doing with them?
What OS, what Python version, what xlwt version, what Excel version?
What does

    python -c "import sys; print repr(sys.maxunicode)"

produce?
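If it helps, here is a rough way to dump what is actually coming out
of the database in the suspect band (a sketch only; `res` and the
row-number band are guesses from your description, and iterating
consumes the cursor, so do this in a separate run from the writing
loop):

    # Sketch: inspect types and values around the trouble spot.
    for i, row in enumerate(res):
        if 2540 <= i <= 2570:
            for col_index, datum in enumerate(row):
                print i, col_index, type(datum), repr(datum)[:80]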
>     for col_index, datum in enumerate(row):
>         ws.write(i, col_index, datum)
>     ws.flush_row_data()
What happens if you comment out the ws.flush_row_data()?
Quoting from the tutorial that you can access via
http://www.python-excel.org :
"""
If a large number of rows have been written to a Worksheet and memory
usage is becoming a problem, the flush_row_data method may be called
on the Worksheet. Once called, any rows flushed cannot be accessed or
modified.

It is recommended that flush_row_data is called for every 1000 or so
rows of a normal size that are written to an xlwt.Workbook. If the
rows are huge, that number should be reduced.
"""
Why are you calling flush_row_data() after each row, instead of doing
something like

    if i % 1000 == 0:
        ws.flush_row_data()

? Why are you calling flush_row_data() at all? Are your rows huge? If
so, what is making a 5-element row huge?
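For the record, here is how the whole loop might look with batched
flushing (a sketch only, reusing the names from your code):

    i = 1  # start after the header row
    for row in res:
        for col_index, datum in enumerate(row):
            ws.write(i, col_index, datum)
        if i % 1000 == 0:
            ws.flush_row_data()  # flushed rows can't be modified later
        i += 1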
Note that the maximum length of string that Excel 2003 and 2007
support is 32767 Unicode characters. Until a few weeks ago, I was
under the impression that the maximum was 65535 (the length is held in
what the MS docs describe as a 16-bit unsigned integer). Consequently
each xlwt release either doesn't check at all, or checks for not
exceeding 65535.
Please add some code to check for a maximum of 32767 characters; a
sketch follows.
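Something like this inner loop would do (a sketch only; whether you
truncate or raise an error is your call; `basestring` catches both str
and unicode):

    MAX_LEN = 32767  # Excel 2003/2007 cell-string limit
    for col_index, datum in enumerate(row):
        if isinstance(datum, basestring) and len(datum) > MAX_LEN:
            raise ValueError("row %d, col %d: string of length %d exceeds "
                             "Excel's limit" % (i, col_index, len(datum)))
        ws.write(i, col_index, datum)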
>     i = i + 1
>
> wb.save(file_name_date + 'Trial.xls')
>
> 2nd way:
... snipped; it's logically the same as the first code, except that it
tells us that there are 5 columns. Please confirm.
Cheers,
John