UTF-16 surrogate leading to out of memory fatal error


Pankaj

Feb 11, 2009, 2:49:07 AM
to Spreadsheet::ParseExcel
The Perl code is below:

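use strict;
use Spreadsheet::ParseExcel;

# Call cell_handler() for each cell as it is parsed; NotSetCell => 1
# tells the parser not to keep the cells in memory.
my $parser = Spreadsheet::ParseExcel->new(
    CellHandler => \&cell_handler,
    NotSetCell  => 1
);

my $workbook = $parser->Parse('bad.xls');

# Print the unformatted value of each cell.
sub cell_handler {

    my $workbook    = $_[0];
    my $sheet_index = $_[1];
    my $row         = $_[2];
    my $col         = $_[3];
    my $cell        = $_[4];

    print $cell->unformatted(), "\n";

}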



When I run the above code, I get the errors below and the program dies:

D:\Perl\bin\search tool>perl testa.pl
UTF-16 surrogate 0xdcfc at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xd809 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xde1b at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xda56 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdf37 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xde00 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdfb8 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xd89b at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdb79 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdbb1 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xd83e at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdff8 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdbff at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdd98 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xd9bf at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdcd7 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdde6 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdabe at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdb71 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xd912 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdab0 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
Unicode character 0xfdde is illegal at D:/Perl/site/lib/Spreadsheet/
ParseExcel/FmtDefault.pm line 81.
UTF-16 surrogate 0xdc77 at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
substr outside of string at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm
line 1011.
Use of uninitialized value in length at D:/Perl/site/lib/Spreadsheet/
ParseExcel.pm line 1947.
Use of uninitialized value $sTxt in unpack at D:/Perl/site/lib/
Spreadsheet/ParseExcel/FmtDefault.pm line 81.
substr outside of string at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm
line 1016.
UTF-16 surrogate 0xdeec at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
UTF-16 surrogate 0xdd7e at D:/Perl/site/lib/Spreadsheet/ParseExcel/
FmtDefault.pm line 81.
Out of memory!

jmcnamara

Feb 11, 2009, 8:05:58 AM
to Spreadsheet::ParseExcel


On Feb 11, 7:49 am, Pankaj <pankaj.i...@gmail.com> wrote:

> When I run the above code the below error comes and the program dies:
>
> D:\Perl\bin\search tool>perl testa.pl
> UTF-16 surrogate 0xdcfc at D:/Perl/site/lib/Spreadsheet/ParseExcel/
> FmtDefault.pm line 81.


Hi Pankaj,

Can you send me the system output from the program at the following
link:


http://groups.google.com/group/spreadsheet-parseexcel/browse_thread/thread/c759ffd42612a653?hl=en


Could you also send me a sample Excel file that generates that error?

John.
--

Pankaj

Feb 12, 2009, 1:46:50 AM
to Spreadsheet::ParseExcel
Done.



On Feb 11, 6:05 pm, jmcnamara <jmcnamar...@gmail.com> wrote:
> On Feb 11, 7:49 am, Pankaj <pankaj.i...@gmail.com> wrote:
>
> > When I run the above code the below error comes and the program dies:
>
> > D:\Perl\bin\search tool>perl testa.pl
> > UTF-16 surrogate 0xdcfc at D:/Perl/site/lib/Spreadsheet/ParseExcel/
> > FmtDefault.pm line 81.
>
> Hi Pankaj,
>
> Can you send me the system output from the program at the following
> link:
>
> http://groups.google.com/group/spreadsheet-parseexcel/browse_thread/t...

John McNamara

Feb 12, 2009, 6:27:59 AM
to spreadsheet...@googlegroups.com


On Thu, Feb 12, 2009 at 6:46 AM, Pankaj <panka...@gmail.com> wrote:

> Done.


Hi,

Do you have a smaller Excel file that demonstrates the problem?

John.
--

Pankaj

Feb 12, 2009, 8:14:23 AM
to Spreadsheet::ParseExcel
I think you can reduce the size of the file by removing some of its contents.


I am sending you another file, but it may not show the UTF problem.





jmcnamara

Feb 12, 2009, 11:24:19 AM
to Spreadsheet::ParseExcel
On Feb 12, 1:14 pm, Pankaj <pankaj.i...@gmail.com> wrote:
> I think you can reduce the size of the file by removing some contents.

Hi,

No, I can't. The worksheets in both workbooks that you sent are
password protected.


> I am sending you another file but it may not show UTF problem .


The general problem can be fixed by using the Unicode format when
parsing the file:

#!/usr/bin/perl

use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::FmtUnicode;

my $parser = Spreadsheet::ParseExcel->new(
    CellHandler => \&cell_handler,
    NotSetCell  => 1
);

# Use the Unicode formatter instead of the default one.
my $format = Spreadsheet::ParseExcel::FmtUnicode->new();

my $workbook = $parser->Parse('IPL2008.xls', $format);

sub cell_handler {

    my $workbook    = $_[0];
    my $sheet_index = $_[1];
    my $row         = $_[2];
    my $col         = $_[3];
    my $cell        = $_[4];

    print $cell->unformatted(), "\n";

}

However, there are some other problems. I don't know if these relate
to the file size or some other issues and since the worksheets are
password protected there isn't much I can do to debug it.

If you wish me to look at the problem further please send an
unprotected file with any worksheets that aren't part of the problem
removed.

Thanks,

John.
--







Pankaj

Feb 13, 2009, 1:45:31 AM
to Spreadsheet::ParseExcel
Those files are not password protected.

I don't know why they are prompting for a password.

Pankaj

Feb 13, 2009, 3:03:47 AM
to Spreadsheet::ParseExcel
OK, they may prompt for a password, but only the first time.




jmcnamara

Feb 13, 2009, 4:36:08 AM
to Spreadsheet::ParseExcel

On Feb 13, 6:45 am, Pankaj <pankaj.i...@gmail.com> wrote:
> Those files are not password protected.

Hi Pankaj,

The files aren't password protected. The workbook (the collection of
worksheets) is. So I can't modify the worksheets to isolate the
problem(s).

See the Excel menu option Tools -> Protection -> Unprotect Workbook

John.
--

Pankaj

Feb 13, 2009, 4:38:24 AM
to Spreadsheet::ParseExcel
I don't remember the password.

I downloaded the files from the internet.

Pankaj

Feb 13, 2009, 8:39:35 AM
to Spreadsheet::ParseExcel
You can remove the password from the Excel files using some software
tools.




Pankaj

Feb 17, 2009, 1:42:11 AM
to Spreadsheet::ParseExcel
I think UTF encoding is not causing the program to crash.

There is some other reason.




jmcnamara

Feb 17, 2009, 6:08:07 AM
to Spreadsheet::ParseExcel
On Feb 17, 6:42 am, Pankaj <pankaj.i...@gmail.com> wrote:
> I think UTF encoding is not causing the program to crash.
>
> There is some other reason.


Hi Pankaj,

I was aware of that. In fact in one of the above responses I said:

"The general problem can be fixed by using the Unicode format when
parsing the file ... However, there are some other problems. I don't
know if these relate to the file size or some other issues and since
the worksheets are password protected there isn't much I can do to
debug it".

That still stands.

Apart from the UTF issue (which can mainly be fixed by using
FmtUnicode as shown above), there is an issue with pagebreaks in one
of the files that you sent and there is possibly another Unicode
problem. The main "out of memory" error may just be down to the size
of the file or the number of strings in the file.

However, if I can't modify the Excel file that is producing the
problems then I cannot debug it.

So, if you wish me to help you then you are going to have to send me
an un-protected file that demonstrates the problem. Without that there
is no way that I can help you.

John.
--

Pankaj

Feb 17, 2009, 8:12:18 AM
to Spreadsheet::ParseExcel
When does the Excel file prompt for a password: when it is opened or
inside the sheets?

jmcnamara

Feb 17, 2009, 10:41:49 AM
to Spreadsheet::ParseExcel


On Feb 17, 1:12 pm, Pankaj <pankaj.i...@gmail.com> wrote:
> When is the excel file prompting for password : at the start or inside
> the sheets ?

Hi Pankaj,

I already explained that previously in this thread:

http://groups.google.com/group/spreadsheet-parseexcel/msg/8d3fcbae7984ef13

John.
--

Pankaj

Feb 18, 2009, 1:07:08 AM
to Spreadsheet::ParseExcel
I have sent you another file which is causing that error.
> http://groups.google.com/group/spreadsheet-parseexcel/msg/8d3fcbae798...
>
> John.
> --

jmcnamara

Feb 18, 2009, 8:26:16 AM
to Spreadsheet::ParseExcel
On Feb 18, 6:07 am, Pankaj <pankaj.i...@gmail.com> wrote:
> I have sent you another file which is causing that error.


Hi Pankaj,

It looks like the Parse() error is caused by the fact that the
workbook is password protected.

When Excel adds Workbook level protection it encrypts the data in the
file. ParseExcel doesn't currently handle encrypted files so it reads
in garbage-like data for row and column entries and creates enormous
arrays that eat up all of the available memory.
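
To illustrate how garbage bytes can lead to the "UTF-16 surrogate" warnings shown at the start of the thread: when arbitrary bytes are read as 16-bit code units, any value between 0xD800 and 0xDFFF is a surrogate, which is not a valid character on its own. The following is a minimal sketch only, not the code in FmtDefault.pm. The helper name surrogate_units is made up for illustration, the sample bytes are built from two values in the log above, and it assumes the text is read as big-endian 16-bit units.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Spreadsheet::ParseExcel): return every
# 16-bit code unit in $bytes that falls in the surrogate range.
sub surrogate_units {
    my ($bytes) = @_;

    # 'n*' reads the string as a list of big-endian unsigned 16-bit values.
    my @units = unpack 'n*', $bytes;

    # Surrogates occupy 0xD800-0xDFFF and are not characters by themselves.
    return grep { $_ >= 0xD800 && $_ <= 0xDFFF } @units;
}

# Three code units: 0xdcfc (surrogate), 0x0041 ("A"), 0xd809 (surrogate).
my @bad = surrogate_units("\xdc\xfc\x00\x41\xd8\x09");

printf "0x%04x\n", $_ for @bad;    # prints 0xdcfc then 0xd809
```

In encrypted data roughly one 16-bit value in 32 would be expected to land in that range, which would be consistent with the long run of warnings in the original report.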

There is an RT tracker for this and it is on my to-do list to fix it
as soon as possible.

For now try to avoid parsing password protected files.

John.
--

Pankaj

Feb 18, 2009, 8:42:29 AM
to Spreadsheet::ParseExcel
OK John.

How can I avoid parsing password-protected files?


Is the Win32::OLE module of any use?

jmcnamara

Feb 19, 2009, 6:34:23 AM
to Spreadsheet::ParseExcel
On Feb 18, 1:42 pm, Pankaj <pankaj.i...@gmail.com> wrote:
>
> How to avoid parsing password protected files ?
>


Hi,

Currently you can't. I'll try to implement something in an upcoming release.
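
As a possible stopgap in the meantime, a cell handler could sanity-check the row and column it receives and stop early when they look like garbage. This is a rough sketch under assumptions, not a tested fix: the helper name plausible_cell is made up, the limits are those of the binary .xls format (65,536 rows by 256 columns), and the parser may already run out of memory elsewhere before the handler is ever called.

```perl
use strict;
use warnings;

# Sheet limits of the binary .xls format.
use constant MAX_ROWS => 65_536;
use constant MAX_COLS => 256;

# Hypothetical helper: true if a zero-based row/column pair could exist in
# a real .xls worksheet. In practice the column check does most of the
# work, since a 16-bit row index can never exceed the row limit.
sub plausible_cell {
    my ($row, $col) = @_;
    return 0 unless defined $row && defined $col;
    return ( $row >= 0 && $row < MAX_ROWS
          && $col >= 0 && $col < MAX_COLS ) ? 1 : 0;
}

# Possible use in the cell handler from this thread (untested assumption):
#
#   sub cell_handler {
#       my ($workbook, $sheet_index, $row, $col, $cell) = @_;
#       unless ( plausible_cell($row, $col) ) {
#           warn "Implausible cell ($row, $col); file may be encrypted\n";
#           $workbook->ParseAbort(1);    # ask the parser to stop
#           return;
#       }
#       print $cell->unformatted(), "\n";
#   }

print plausible_cell(10, 3)     ? "ok\n" : "suspicious\n";    # ok
print plausible_cell(10, 40000) ? "ok\n" : "suspicious\n";    # suspicious
```
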


> Is Win32::OLE module of any use ?

I don't know.

John.
--