
Help !!! Problem about FlateDecode(using zlib)


KongHu

Dec 2, 2002, 3:18:33 PM
Hey guys,

I am working on a program to extract text from PDF files.
I use zlib to handle FlateDecode, and it works well in most
cases, but sometimes it fails.

For example, I ran into this object:

1932 0 obj
<< /Length 759 /Filter /FlateDecode >>
stream
////////////////////////////Hex Code
A3 A6 02 38 D5 19 EB FB 05 99 52 6A B7 CD 8E 3C
39 E5 35 E3 3B 76 29 6F 10 AA 3F 17 F5 82.......
///////////////////////////

The first byte is A3. According to the zlib specification,
0xA3 & 0xF is 3, not 8, so this is not a stream with a zlib header.

The first two bytes are not "1F 8B", so it is not a gzip stream either.
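
The two header checks above can be written as a small classifier. This is only a sketch: the `StreamKind` enum and the `ClassifyHeader` name are made up for illustration and are not part of zlib. The zlib test assumes the RFC 1950 rules (compression-method nibble equal to 8, window-size nibble at most 7, and the two header bytes read as a big-endian number being a multiple of 31).

```cpp
#include <cstddef>

// Hypothetical names for illustration; not part of zlib.
enum StreamKind { KIND_ZLIB, KIND_GZIP, KIND_UNKNOWN };

// Classify a buffer by looking only at its first two bytes.
StreamKind ClassifyHeader(const unsigned char* p, std::size_t n)
{
    if (n < 2)
        return KIND_UNKNOWN;

    // gzip magic number.
    if (p[0] == 0x1F && p[1] == 0x8B)
        return KIND_GZIP;

    // zlib header: CM == 8 (deflate), CINFO <= 7, and
    // (CMF * 256 + FLG) divisible by 31.
    bool cmDeflate = (p[0] & 0x0F) == 8;
    bool windowOk  = (p[0] >> 4) <= 7;
    bool checkOk   = ((p[0] << 8) | p[1]) % 31 == 0;
    if (cmDeflate && windowOk && checkOk)
        return KIND_ZLIB;

    // Could be raw deflate, encrypted data, or anything else.
    return KIND_UNKNOWN;
}
```

For the bytes A3 A6 from the object above, this returns `KIND_UNKNOWN`, which by itself does not prove the data is a raw deflate stream.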

So I assumed it is a raw deflate stream. I call inflateInit2 with
windowBits = -15 to initialize zlib and then call inflate to decode the
stream, but no luck. No error is reported: inflate returns Z_STREAM_END,
but the output buffer is empty. Since the compressed data is 759
bytes, I don't believe the decoded data is really an empty string.

Here is my code:
///////////////////////////////////
string StreamFilter::FlateDecode(string i_strData)
{
    unsigned char* compr = (unsigned char*) i_strData.c_str();
    unsigned long comprLen = i_strData.size();
    string strRslt = "";
    int err;
    z_stream d_stream; // decompression stream

    strcpy((char*)m_pUncompr, "garbage");

    d_stream.zalloc = (alloc_func)0;
    d_stream.zfree = (free_func)0;
    d_stream.opaque = (voidpf)0;

    d_stream.next_in = compr;
    d_stream.avail_in = (uInt)comprLen;

    // Parentheses are required: != binds tighter than &, so the
    // unparenthesized form tests (i_strData[0] & 1) instead.
    if ((i_strData[0] & 0xf) != Z_DEFLATED)
    {
        // No zlib header: try a raw deflate stream.
        err = inflateInit2(&d_stream, -15);
    }
    else
    {
        err = inflateInit(&d_stream);
    }
    CHECK_ERR(err, "inflateInit");

    for (;;)
    {
        // Reuse the same output buffer on every pass.
        d_stream.next_out = m_pUncompr;
        d_stream.avail_out = (uInt)m_pUncomprLen;

        err = inflate(&d_stream, Z_SYNC_FLUSH); // or Z_NO_FLUSH

        // Append only the bytes produced by this call. total_out is
        // cumulative and would over-read the buffer on later passes.
        strRslt.append((char*) m_pUncompr,
                       (uInt)m_pUncomprLen - d_stream.avail_out);

        if (err == Z_STREAM_END)
        {
            break;
        }
        else if (err != Z_OK)
        {
            break;
        }
    }

    err = inflateEnd(&d_stream);
    CHECK_ERR(err, "inflateEnd");

    return strRslt;
}

///////////////////////////////////
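
One detail worth checking in a nibble test like the one in the code above: in C++, `!=` binds more tightly than `&`, so `c & 0xf != 8` is parsed as `c & (0xf != 8)`, which reduces to `c & 1`. A minimal sketch of the difference follows; the helper names and `kDeflated` are hypothetical, with `kDeflated` assumed to have the same value (8) as zlib's `Z_DEFLATED`.

```cpp
// Hypothetical helpers for illustration only.
const int kDeflated = 8; // assumed equal to zlib's Z_DEFLATED

// What the compiler sees for `c & 0xf != kDeflated`:
// the comparison is evaluated first, so this is just `c & 1`.
bool NotZlibAsParsed(unsigned char c)
{
    return (c & static_cast<int>(0xf != kDeflated)) != 0;
}

// What was intended: compare the low nibble with the method code.
bool NotZlibIntended(unsigned char c)
{
    return (c & 0xf) != kDeflated;
}
```

The two versions agree for 0xA3 and 0x78, but disagree for any even first byte whose low nibble is not 8, for example 0xA2.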


Any comments are appreciated.

Thx,

Lino

Derek B. Noonburg

Dec 2, 2002, 4:27:53 PM
In article <tePG9.91774$ea.15...@news2.calgary.shaw.ca>, KongHu wrote:
> I am working on a program to extract text from PDF files.
> I use zlib to handle FlateDecode, and it works well in most
> cases, but sometimes it fails.
>
> For example, I ran into this object:
>
> 1932 0 obj
> << /Length 759 /Filter /FlateDecode >>
> stream
> ////////////////////////////Hex Code
> A3 A6 02 38 D5 19 EB FB 05 99 52 6A B7 CD 8E 3C
> 39 E5 35 E3 3B 76 29 6F 10 AA 3F 17 F5 82.......
> ///////////////////////////
>
> The first byte is A3. According to the zlib specification,
> 0xA3 & 0xF is 3, not 8, so this is not a stream with a zlib header.

Is the PDF file encrypted? Encryption modifies the stream data
but leaves everything around it ("<< /Length ... >> stream")
untouched.

- Derek
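
A quick way to test the encryption theory is to look for an /Encrypt entry, which an encrypted PDF carries in its trailer dictionary. The following is a rough sketch, not a PDF parser: `LooksEncrypted` is a hypothetical helper that scans the raw file bytes for the name, so it can give a false positive if the same bytes happen to appear inside stream data.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if the raw PDF bytes contain the
// name /Encrypt. Naive scan; a real reader would parse the trailer.
bool LooksEncrypted(const std::string& pdfBytes)
{
    const std::string key = "/Encrypt";
    std::string::size_type pos = 0;

    while ((pos = pdfBytes.find(key, pos)) != std::string::npos)
    {
        std::string::size_type next = pos + key.size();

        // Accept only the exact name, not longer ones such as
        // /EncryptMetadata.
        if (next >= pdfBytes.size() ||
            !std::isalnum(static_cast<unsigned char>(pdfBytes[next])))
        {
            return true;
        }
        pos = next;
    }
    return false;
}
```

If this returns true, the stream bytes have to be decrypted before they are handed to zlib.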
