API for very large content/attachments

Orin

Aug 11, 2010, 10:55:13 PM

to java-libpst-discuss

(See Issue 20.)

In my opinion, it would be impractical to read a very large message or
attachment into a byte array all at once, so I'd like to open
discussion on how we should handle such data.

I'm thinking some kind of stream interface that the user could read in
reasonably sized chunks so that all the data need not be in memory at
the same time.
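
A minimal sketch of the chunked-read idea, using only java.io. The class name ChunkedCopy, the method copyInChunks and the 8 KB buffer are illustrative choices and not part of java-libpst; a ByteArrayInputStream stands in for whatever stream the library would eventually hand out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkedCopy {
    /** Copies all bytes from in to out through a fixed-size buffer; returns the byte count. */
    static long copyInChunks(InputStream in, OutputStream out, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; only n bytes of the buffer are valid each time
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[100_000]; // stand-in for attachment data
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyInChunks(new ByteArrayInputStream(source), sink, 8192);
        System.out.println("copied " + copied + " bytes");
    }
}
```

Only one buffer of chunkSize bytes is held by the copy loop at any time, regardless of how large the source is.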

There is the associated problem of informing the user to use the
stream interface rather than the existing interfaces that return a
byte array. I see a couple of options - an exception that says the
data is too large and a separate API to return a stream, or a special
exception that contains the stream - this wouldn't need any new APIs.
The transition between just returning a byte array and throwing the
exception would be when the data is in an XXBLOCK rather than an
XBLOCK or regular block - a little under 8MB by my quick calculations.
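
The "a little under 8MB" figure can be reproduced with a short calculation. This is only a sketch: the constants below are assumptions taken from the Unicode PST layout as described in Microsoft's MS-PST documentation (8192-byte blocks with a 16-byte trailer, and an XBLOCK consisting of an 8-byte header followed by 8-byte block IDs). None of them are stated in this thread, and the class name XBlockCapacity is made up for illustration.

```java
final class XBlockCapacity {
    static final int BLOCK_SIZE = 8192;       // assumed maximum block size, trailer included
    static final int TRAILER_SIZE = 16;       // assumed Unicode block trailer size
    static final int XBLOCK_HEADER_SIZE = 8;  // assumed XBLOCK header size
    static final int BID_SIZE = 8;            // assumed width of one block ID (Unicode PST)

    /** Usable data bytes in a single block. */
    static int maxDataPerBlock() {
        return BLOCK_SIZE - TRAILER_SIZE;
    }

    /** Number of block IDs that fit in one XBLOCK. */
    static int maxBidsPerXBlock() {
        return (maxDataPerBlock() - XBLOCK_HEADER_SIZE) / BID_SIZE;
    }

    /** Largest payload reachable through one XBLOCK, i.e. without needing an XXBLOCK. */
    static long maxBytesViaXBlock() {
        return (long) maxBidsPerXBlock() * maxDataPerBlock();
    }

    public static void main(String[] args) {
        // 1021 block IDs * 8176 bytes each
        System.out.println(maxBytesViaXBlock()); // prints 8347696
    }
}
```

Under those assumptions the limit is 8,347,696 bytes, roughly 7.96 MiB, which matches the estimate above.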

Orin.

Richard Johnson

Aug 12, 2010, 6:30:05 AM

to java-libp...@googlegroups.com

Totally agree, I think that something like a PSTAttachmentInputStream would be the way to go.

I do think, however, that we would be better off removing the getFileContents function completely rather than throwing an exception. Most users will forget to test for the exception and, given the rarity of the XXBlock (they weren't even in the docs I had originally), I wouldn't be surprised if it's not picked up. I concede it is inconvenient to change the API on people, but if they have to rework their code to deal with exceptions anyway, we may as well make them move over to using an input stream.

Thoughts? If we are happy with this I could probably look into it a bit over the weekend.

Richard

Orin Eman

Aug 12, 2010, 12:26:23 PM

to java-libp...@googlegroups.com

I've been working on Issue 20 - I added a method in PSTFile to read data blocks so it can be used by all the other objects. For now, I'll just make it throw an exception if it sees an XXBLOCK - I believe the exception would be caught within libpst and the current user API would return no data. I haven't seen an XXBLOCK in the PST files I have around. I'll get these changes into svn for you to look at.

So you are thinking of replacing getFileContents() with something like getFileInputStream() that works with any size file? That would make sense to me.

Orin.

Richard Johnson

Aug 12, 2010, 1:47:56 PM

to java-libp...@googlegroups.com

With the new method you're adding to PSTFile, would this be similar to the existing PSTObject.processArray? Or am I confusing this with something else? (not too familiar with the MS terminology just yet). If this is what you had in mind, we could just modify isPSTArray to throw an exception when it hits a 0x1 0x2 signature citing the lack of support for items with XXBlocks.
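
The 0x1 0x2 signature check could be sketched roughly as follows. Everything here is hypothetical: BlockSignature, Kind, classify and requireSupported are made-up names, not existing java-libpst code, and treating 0x1 0x1 as the XBLOCK signature is an assumption based on the MS-PST description of these blocks.

```java
final class BlockSignature {
    enum Kind { DATA, XBLOCK, XXBLOCK, UNKNOWN }

    /** Classifies a block by its first two bytes; only internal blocks carry a signature. */
    static Kind classify(boolean internal, byte[] block) {
        // External blocks hold raw data, so their leading bytes mean nothing here.
        if (!internal) {
            return Kind.DATA;
        }
        if (block.length < 8 || block[0] != 0x01) {
            return Kind.UNKNOWN;
        }
        switch (block[1]) {
            case 0x01: return Kind.XBLOCK;   // assumed: a list of data blocks
            case 0x02: return Kind.XXBLOCK;  // the 0x1 0x2 signature: a list of XBLOCKs
            default:   return Kind.UNKNOWN;
        }
    }

    /** Rejects block kinds the caller cannot handle yet. */
    static void requireSupported(Kind kind) {
        if (kind == Kind.XXBLOCK) {
            throw new UnsupportedOperationException("XXBLOCKs not supported yet");
        }
    }

    public static void main(String[] args) {
        byte[] xblock = {1, 1, 0, 0, 0, 0, 0, 0};
        byte[] xxblock = {1, 2, 0, 0, 0, 0, 0, 0};
        System.out.println(classify(true, xblock));
        System.out.println(classify(true, xxblock));
    }
}
```

Keeping the classification separate from the rejection means the same check can later route XXBLOCKs to a streaming reader instead of throwing.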

I acknowledge that it's all a bit messy when it gets to these bits. I only really got the hang of reading items in arrays towards the end, hence annoying functions like getBlockOffsets, which really just duplicates the reading process; at the time I didn't realise items would use the offsets later.

Yep, that's pretty much exactly how I was going to handle it for the Attachments. As it will be a stream we should be able to support XXBlocks without a problem there.
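
As a rough illustration of why a stream copes with any number of blocks: it only needs to hold one block at a time and move on to the next when the current one is used up. The sketch below is not the actual attachment stream; BlockChainInputStream is a made-up name, and the Iterator<byte[]> is a hypothetical stand-in for "fetch the next data block from the PST file".

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.Iterator;

final class BlockChainInputStream extends InputStream {
    private final Iterator<byte[]> blocks; // hypothetical source yielding one data block at a time
    private byte[] current = new byte[0];
    private int pos = 0;

    BlockChainInputStream(Iterator<byte[]> blocks) {
        this.blocks = blocks;
    }

    /** Advances to the next non-empty block if needed; returns false when no data is left. */
    private boolean fill() {
        while (pos >= current.length) {
            if (!blocks.hasNext()) {
                return false;
            }
            current = blocks.next();
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        // Serve at most the remainder of the current block; callers loop for more.
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, dst, off, n);
        pos += n;
        return n;
    }

    public static void main(String[] args) {
        Iterator<byte[]> source = Arrays.asList(new byte[]{1, 2, 3}, new byte[]{4, 5}).iterator();
        BlockChainInputStream in = new BlockChainInputStream(source);
        int b;
        while ((b = in.read()) != -1) {
            System.out.println(b);
        }
    }
}
```

Overriding the array form of read matters for performance, since the default InputStream implementation would otherwise fall back to one call per byte.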

Richard

Orin

Aug 12, 2010, 2:19:29 PM

to java-libpst-discuss

It wraps PSTObject.processArray() and getBlockOffsets() as well as
getOffsetIndexNode(). The same sequence is used in many places:

class PSTFileBlock {
    byte[] data = null;
    int[] blockOffsets = null;
}

public PSTFileBlock readLeaf(long bid)
    throws IOException, PSTException
{
    PSTFileBlock ret = new PSTFileBlock();

    // get the index node for the descriptor index
    OffsetIndexItem offsetItem = PSTObject.getOffsetIndexNode(in, bid);
    boolean bInternal = (offsetItem.indexIdentifier & 0x02) != 0;

    ret.data = new byte[offsetItem.size];
    in.seek(offsetItem.fileOffset);
    in.read(ret.data);

    if ( bInternal &&
         offsetItem.size >= 8 && // temporary, too short should throw an exception
         ret.data[0] == 1 )
    {
        // (X)XBLOCK
        if ( ret.data[1] == 2 ) {
            throw new PSTException("XXBLOCKS not supported yet!");
        }
        ret.blockOffsets = PSTObject.getBlockOffsets(in, ret.data);
        ret.data = PSTObject.processArray(in, ret.data);
        bInternal = false;
    }

    // (Internal blocks aren't compressed)
    if ( !bInternal &&
         encryptionType == PSTFile.ENCRYPTION_TYPE_COMPRESSIBLE)
    {
        ret.data = PSTObject.decode(ret.data);
    }

    return ret;
}

It could go in PSTObject as a static method instead - but would then
need the PSTFile as a parameter. I think both processArray and
getBlockOffsets could be combined into this function, but I haven't
looked at that yet - this method is a step in that direction though.

For tables in XBLOCKS, I've made the rgbiAlloc array two dimensional
so getNodeInfo() doesn't need to treat whichBlock > 0 specially.

Orin.

Richard Johnson

Aug 12, 2010, 2:30:38 PM

to java-libp...@googlegroups.com

Sounds great and looks much cleaner. I'll throw a proposed input stream for attachments together shortly and get you to have a look over it.

Richard Johnson

Aug 15, 2010, 4:03:46 PM

to java-libp...@googlegroups.com

Okay, I've thrown some code into SVN which implements an input stream for attachments.

It supports XBLOCK and XXBLOCKs, and also works with stupidly small attachments that don't use external refs. This registry hack was useful for getting large amounts of data into the PST: http://www.slipstick.com/outlook/ol2010/attachment_size.asp

The performance seems decent (for the read functions that take an array at least). Let me know if you have feedback.

Richard

Franck

Aug 16, 2010, 4:36:00 PM

to java-libpst-discuss

Hi,

I've tested the input stream method on some of my attachments and it
works well.

Thx

Rajkumar Manuka

Jun 19, 2012, 6:36:02 AM

to java-libp...@googlegroups.com

Hi All,

I tried reading a .txt file attachment, but I get an exception like "...filesize is 0".
Formats other than .txt work fine.

Thank you in advance.

Ben Reid

Jun 28, 2012, 12:25:40 AM

to java-libp...@googlegroups.com

I am also having trouble saving text file attachments.

All my binary attachments are working fine (XLS, MDB, PNG, GIF - they all work), but saving a plain text file attachment results in garbled text.

Am I supposed to be using a method other than getFileInputStream() to save plain text attachments?
