We would like to edit the decoded bytes of a page content stream directly and want to know the most efficient and safe way to do this with PDFNet. By "safe" we mean making the fewest possible changes to the PDF as a whole, ideally changing only the stream bytes (and updating the stream's "Length" entry accordingly). We understand that it is our responsibility to ensure that the modified stream contents remain valid according to the PDF specification. Our current approach is as follows:
1. Read the decoded stream data into a byte array:
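// Requires: using System.IO; using pdftron.Filters; using pdftron.SDF;
private static byte[] GetStreamBytes(Obj stream)
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        // Decode the stream through its full filter chain.
        Filter filter = stream.GetDecodedStream();
        FilterReader filterReader = new FilterReader(filter);
        // Fixed-size buffer: filter.Size() is the size of the filter's internal buffer,
        // not the decoded length, so it is not a reliable buffer size here.
        byte[] buffer = new byte[64 * 1024];
        int length;
        while ((length = filterReader.Read(buffer)) > 0)
        {
            memoryStream.Write(buffer, 0, length);
        }
        // The using block disposes the MemoryStream; no explicit Close() is needed.
        return memoryStream.ToArray();
    }
}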
2. Modify stream bytes as desired.
3. Write the modified stream bytes back to the original stream object. We have tried three approaches for this:
a)
stream.SetStreamData(byteArray);
This approach does not appear to preserve the original stream's encoding (filter); it simply writes the data uncompressed.
b)
stream.SetStreamData(byteArray, new FlateEncode(null));
This approach always uses Flate encoding.
c)
Obj contentsNew = pdf.CreateIndirectStream(byteArray, new FlateEncode(null));
pdf.GetSDFDoc().Swap(contents.GetObjNum(), contentsNew.GetObjNum());
// TODO: Copy additional entries from old stream dictionary to new one
This approach always uses Flate encoding and also requires some manual copying of entries from one dictionary to another.
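For illustration, step 2 could be as simple as a byte-level search and replace on the decoded data. The sketch below is a minimal example under that assumption. ContentStreamEdit and ReplaceBytes are hypothetical names, not part of PDFNet. A naive replace like this can also match bytes inside string literals or inline image data, so a real edit would need to tokenize the content stream first.

```csharp
using System;
using System.Collections.Generic;

public static class ContentStreamEdit
{
    // Hypothetical helper (not part of PDFNet): returns a copy of `source`
    // with every occurrence of `pattern` replaced by `replacement`.
    public static byte[] ReplaceBytes(byte[] source, byte[] pattern, byte[] replacement)
    {
        // An empty pattern would match everywhere; return an unchanged copy instead.
        if (pattern.Length == 0) return (byte[])source.Clone();

        var result = new List<byte>(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            if (IsMatchAt(source, i, pattern))
            {
                // Emit the replacement and skip past the matched bytes.
                result.AddRange(replacement);
                i += pattern.Length;
            }
            else
            {
                result.Add(source[i]);
                i++;
            }
        }
        return result.ToArray();
    }

    // True if `pattern` occurs in `source` starting at `offset`.
    private static bool IsMatchAt(byte[] source, int offset, byte[] pattern)
    {
        if (offset + pattern.Length > source.Length) return false;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (source[offset + j] != pattern[j]) return false;
        }
        return true;
    }
}
```

With the bytes returned by GetStreamBytes, a call such as ContentStreamEdit.ReplaceBytes(bytes, Encoding.ASCII.GetBytes("1 0 0 rg"), Encoding.ASCII.GetBytes("0 0 1 rg")) would swap one fill-colour operator for another; the operator strings here are sample content only.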
Could you please comment on the best way to accomplish our overall goal: one of the approaches above, some variation of them, or something completely different?
A