Lost Data - Missing a Chunk of rows in the middle of the data

141 views
Skip to first unread message

judson

unread,
Jan 5, 2011, 10:36:46 AM1/5/11
to python-excel
I have pulled data from an Oracle database and I would like to write
it to an Excel file. It writes to csv just fine, but using xlwt it
alway skips a varying amount of rows (6 - 20) after row 2544 or 2545
and then resumes writing rows. Every time I open the file Excel says
it has to recover data and that some may have been lost. I have
cell_overwrite_ok=False. I have tried writing the xls document in two
slightly different ways with the same results. Sorry if this is a
basic question, but I haven't seen a similar issue in the forum. Thank
you in advance.

1st way:

# i = 1 to start after headers

i = 1

# Write results of query to .XLS

for row in res:
print row
for col_index, datum in enumerate(row):
ws.write(i,col_index,datum)
ws.flush_row_data()
i = i + 1

wb.save(file_name_date + 'Trial.xls')


2nd way:

# i = 1 to start after headers

i = 1

# Write results of query to .XLS

for row in res:
print row
print row[0]
print row[1]
print row[2]
print row[3]
print row[4]
print i
value_0 = row[0]
value_1 = row[1]
value_2 = row[2]
value_3 = row[3]
value_4 = row[4]
ws.write(i, 0,value_0)
ws.write(i, 1,value_1)
ws.write(i, 2,value_2)
ws.write(i, 3,value_3)
ws.write(i, 4,value_4)
ws.flush_row_data()
i = i + 1

wb.save(file_name_date + 'Trial.xls')

John Machin

unread,
Jan 5, 2011, 2:52:33 PM1/5/11
to python-excel


On Jan 6, 2:36 am, judson <judson.morri...@gmail.com> wrote:
> I have pulled data from an Oracle database and I would like to write
> it to an Excel file. It writes to csv just fine, but using xlwt it
> alway skips a varying amount of rows (6 - 20) after row 2544 or 2545
> and then resumes writing rows. Every time I open the file Excel says
> it has to recover data and that some may have been lost. I have
> cell_overwrite_ok=False. I have tried writing the xls document in two
> slightly different ways with the same results. Sorry if this is a
> basic question, but I haven't seen a similar issue in the forum. Thank
> you in advance.

Hi, Judson,

Does the Excel recovery log give you any clues?

Try opening the file with as many of (xlrd, OpenOffice.org Calc,
Gnumeric) as you conveniently can. Any more clues?

>
> 1st way:
>
> # i = 1 to start after headers
>
> i = 1
>
> # Write results of query to .XLS
>
> for row in res:
>     print row

Doing
print i, row
might make it easier to compare this output with the output xls
file ...

Have you noticed anything strange about the incoming data in the
affected band of rows?

What are the Oracle types of each column? What is the maximum length
of string that you have observed?

What database-reading gadget (e.g. cxOracle) are you using? What
version?

What Python datatype are you getting for strings: unicode, or str
(encoded as UTF-8? something else?)?

If there are any BLOBs, what are you doing with them?

What OS, what Python version, what xlwt version, what Excel version?

What does
python -c"import sys; print repr(sys.maxunicode)"
produce?


>     for col_index, datum in enumerate(row):
>         ws.write(i,col_index,datum)
>     ws.flush_row_data()

What happens if you comment out the ws.flush_row_data()?

Quoting from the tutorial that you can access via http://www.python-excel.org
:

"""
If a large number of rows have been written to a Worksheet and memory
usage is
becoming a problem, the flush_row_data method may be called on the
Worksheet.
Once called, any rows flushed cannot be accessed or modified.
It is recommended that flush_row_data is called for every 1000 or so
rows of a normal
size that are written to an xlwt.Workbook. If the rows are huge, that
number should be
reduced.
"""

Why are you calling flush_row_data() after each row, instead of doing
something like

if i % 1000 == 0:
ws.flush_row_data()

? Why are you calling flush_row_data() at all? Are your rows huge? If
so, what is making a 5-element row huge?

Note that the maximum size of string that Excel 2003 and 2007 support
is 32767 Unicode characters. Until a few weeks ago, I was under the
impression that the maximum was 65535 (the length is held in what the
MS docs describe as a 16-bit unsigned integer). Consequently each xlwt
release either doesn't check at all, or checks for not exceeding
65535.

Please add some code to check for max 32767 characters.

>     i = i + 1
>
> wb.save(file_name_date + 'Trial.xls')
>
> 2nd way:

... snipped; it's logically the same as the first code, except that it
tells us that there are 5 columns. Please confirm.

Cheers,
John

judson

unread,
Jan 5, 2011, 4:02:13 PM1/5/11
to python-excel
John,

The error when opened by Excel 2007 says that:
Excel found unreadable content in ‘Trial.xls’. Do you want to recover
the contents of the workbook? If you trust the source of the workbook,
click yes.

XML output:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/
2006/main"><logFileName>error006600_01.xml</
logFileName><summary>Errors were detected in file 'C:
\Scheduled_Python_Output\Trial.xls'</
summary><additionalInfo><info>Excel recovered your formulas and cell
values, but some data may have been lost.</info></additionalInfo></
recoveryLog>

Print i, row does make it more readable, but I still don’t see any
difference in the data for those rows.

No, I haven’t noticed anything different about the affected rows, but
I suppose it could be possible.

Honestly, I am still learning my way around Oracle. The data comes
back as one list per row. Within the list I have:
String, String, Int, Datetime, Int

I am using cxOracle, ver:
Cx_Oracle-5.0.4

The strings are no longer than 3 characters.

I have tried Unicode and UTF-8 for xlwt and I don’t see a difference
in the output either way.

I don’t know what a BLOB is, sorry.

Unfortunately, I am on Windows XP and Excel 2007. Python version is
2.5.4. Xlwt version is 0.7.2.

print repr(sys.maxunicode) gives:
65535
No difference with or without flushing the rows. The data is not huge
at all, just thought I would try it.

I don’t think that any of my data is over 32767 characters in length.

Yes, there are five columns.

I hope I answered everything you asked. Thank you so much for replying
to my message. I hope my answers give you some idea of what it could
be; this is driving me nuts. Again, thank you.
> Quoting from the tutorial that you can access viahttp://www.python-excel.org

John Machin

unread,
Jan 5, 2011, 5:17:29 PM1/5/11
to python-excel


On Jan 6, 8:02 am, judson <judson.morri...@gmail.com> wrote:
> John,
>
> The error when opened by Excel 2007 says that:
>         Excel found unreadable content in ‘Trial.xls’. Do you want to recover
> the contents of the workbook? If you trust the source of the workbook,
> click yes.
>
> XML output:
>         <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/
> 2006/main"><logFileName>error006600_01.xml</
> logFileName><summary>Errors were detected in file 'C:
> \Scheduled_Python_Output\Trial.xls'</
> summary><additionalInfo><info>Excel recovered your formulas and cell
> values, but some data may have been lost.</info></additionalInfo></
> recoveryLog>

Hi Judson, In other words, "no clues".

>
> Print i, row does make it more readable, but I still don’t see any
> difference in the data for those rows.
>
> No, I haven’t noticed anything different about the affected rows, but
> I suppose it could be possible.
>
> Honestly, I am still learning my way around Oracle. The data comes
> back as one list per row. Within the list I have:
>         String, String, Int, Datetime, Int

Do you mean that you have Python objects whose types are:

( 2 x (str and/or unicode)), int, datetime.datetime, and int

or are they Oracle types (I've fortunately forgotten most of what I
knew about Oracle)?

>
> I am using cxOracle, ver:
>         Cx_Oracle-5.0.4
>
> The strings are no longer than 3 characters.
>
> I have tried Unicode and UTF-8 for xlwt and I don’t see a difference
> in the output either way.
>
> I don’t know what a BLOB is, sorry.

BinaryLargeOBject ... be happy, not sorry :-)

>
> Unfortunately, I am on Windows XP and Excel 2007. Python version is
> 2.5.4. Xlwt version is 0.7.2.
>
> print repr(sys.maxunicode) gives:
>         65535
> No difference with or without flushing the rows. The data is not huge
> at all, just thought I would try it.
>
> I don’t think that any of my data is over 32767 characters in length.

Don't think about your data; please, as requested, insert checking
code.

> Yes, there are five columns.
>
> I hope I answered everything you asked.

Opening the file with xlrd was inconvenient?

> Thank you so much for replying
> to my message. I hope my answers give you some idea of what it could
> be

Unfortunately, not so far. Strings over 32767 characters would have
been my best bet, but I can't create a file that would cause Excel
recovery to omit even one row. When there's only one over-sized string
out of many thousands, its "recovery" consists of treating ALL text
cells as existing but empty.

Can you send me (private email, informal "handshake NDA preferred):

(a) Your full script (you may wish to redact any userid/password/etc)
using "way 1" without the flush_row_data()
(b) copy of the xls file as it exists immediately after creation by
xlwt using that script
(c) copy of a slice of the "print i, row" output covering the rows
omitted by Excel plus a few non-omitted rows before and after

Cheers,
John

Judson Morrison

unread,
Jan 5, 2011, 6:20:05 PM1/5/11
to python...@googlegroups.com
I will send more info directly to you tomorrow if the problem still exists.

A couple of interesting things.

I am sure all of my data is below the max length limit.

If I read the file using XLRD, all data is returned without any blank rows.

The output to a csv file still works flawlessly. Thank you for all of
your help. If you'd prefer to have the direct information tonight, let
me know.

Thank you.

> --
> You received this message because you are subscribed to the Google Groups "python-excel" group.
> To post to this group, send an email to python...@googlegroups.com.
> To unsubscribe from this group, send email to python-excel...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/python-excel?hl=en-GB.
>
>

John Machin

unread,
Jan 5, 2011, 7:28:32 PM1/5/11
to python-excel


On Jan 6, 10:20 am, Judson Morrison <judson.morri...@gmail.com> wrote:
> I will send more info directly to you tomorrow if the problem still exists.

If there's a possibility that the problem may magically vanish (e.g.
due to database changes), I'd like to get something to work on as soon
as possible.

> A couple of interesting things.
>
> I am sure all of my data is below the max length limit.
>
> If I read the file using XLRD, all data is returned without any blank rows.

Is the problem "missing" rows or "blank" rows???

>
> The output to a csv file still works flawlessly. Thank you for all of
> your help. If you'd prefer to have the direct information tonight, let
> me know.

Yes, please. As soon as possible. In any case it's only 11:25 AM in my
TZ :-)

Cheers,
John

Judson Morrison

unread,
Jan 5, 2011, 7:40:19 PM1/5/11
to python...@googlegroups.com
John,

I can't thank you enough for you help.

I suppose the problem could resolve itself, but it is worse today than
yesterday.

Honestly, I am hesitant to send you the output. It isn,t personal, I'm
just a bit paranoid. Let me sleep on it, please.

And you request for clarification is tough. The total number of rows
is correct, but the missing rows are blank. So the rows that are a
problem are both blank and missing.

Look forward to (in all likelihood) sending you more information tomorrow.

Thank you for your help.

Regards,

Judson

John Machin

unread,
Jan 5, 2011, 8:14:15 PM1/5/11
to python-excel


On Jan 6, 11:40 am, Judson Morrison <judson.morri...@gmail.com> wrote:
> John,
>
> I can't thank you enough for you help.
>
> I suppose the problem could resolve itself, but it is worse today than
> yesterday.
>
> Honestly, I am hesitant to send you the output. It isn,t personal, I'm
> just a bit paranoid. Let me sleep on it, please.

So send me the (redacted) code now.

> And you request for clarification is tough. The total number of rows
> is correct, but the missing rows are blank. So the rows that are a
> problem are both blank and missing.

If the total number of rows is correct, how can there be "missing"
rows? Do you mean that rows that should contain valid data are
appearing as "blank" after Excel does its "recovery"?

What is the total number of rows, approximately?

Cheers,
John

Judson Morrison

unread,
Jan 5, 2011, 8:41:56 PM1/5/11
to python...@googlegroups.com
Yes, they appear blank.

There are about 5,500 rows.

Where in the world are you?

Thanks again?

Reply all
Reply to author
Forward
0 new messages