Specifically, I'm trying to generate dynamic .docx files, which is going
pretty well, but I'm not able to repackage the edited files without
blowing up Word... any rel(s), image, or XML file in the package that I
change (other than the document.xml itself) is seen as corrupt by Word
when it tries to open the file. Sure enough, Word is clever enough to
recover the remaining elements, but anything I change gets nuked.
That prevents me from swapping in new images (charts in my case), or
modifying any hyperlinks (which live in the .rels files).
I found that I can open up a .docx file directly in Stuffit Archive
Manager (SAM) (without even having to change the extension), which
eliminates the need to re-zip the .docx files from scratch (which seems
to blow my entire document when I try). Using SAM, I can extract only
the file I choose, edit it, put it back, and the other elements remain
intact.
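
For what it's worth, the "extract one file, edit it, put it back" step
that SAM performs can be approximated in a script. The following is only
a minimal sketch, assuming Python 3 and its standard zipfile module
(neither is mentioned in this thread). The function name replace_part and
its parameters are made up for illustration, and it rewrites the whole
archive with Deflate rather than editing it in place. Whether Word accepts
the result has not been verified here.

```python
import zipfile

def replace_part(src_path, dst_path, part_name, new_bytes):
    """Copy every entry of src_path into dst_path, swapping in new_bytes
    for the entry called part_name. Returns True if the entry was found.
    (Hypothetical helper, not part of any tool named in this thread.)"""
    replaced = False
    with zipfile.ZipFile(src_path, "r") as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                # Skip explicit folder entries; the file entries carry
                # their full paths anyway.
                continue
            data = src.read(info.filename)
            if info.filename == part_name:
                data = new_bytes
                replaced = True
            # Keep the original entry name and order; force Deflate.
            dst.writestr(info.filename, data,
                         compress_type=zipfile.ZIP_DEFLATED)
    return replaced
```

Usage would look something like
replace_part("report.docx", "report-new.docx", "word/media/image1.jpeg", jpeg_bytes),
where the file names are hypothetical.
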
The key problem appears to be the compression technique: the docx isn't
actually a Plain Old Zip (POZ) file; it's an Open Packaging Conventions
(OPC) file:
http://en.wikipedia.org/wiki/Open_Packaging_Conventions
Now, if I could only find an OPC creator/manager for the Mac... a GUI
would be great, but a command line would do as well. I seem to have
found what at first appeared to be such a tool, Porticus, but it doesn't
seem to do anything except manage MacPorts:
http://www.versiontracker.com/dyn/moreinfo/macosx/32608
FYI, Porticus needs MacPorts installed as well:
http://www.macports.org/install.php
However, I can't see how Porticus helps me with the OPC file
management... I'm probably on a wild goose chase with that, but there
might be something there.
I'm guessing that there are Mac developers here more informed than me,
and I'm hoping someone can shed light on my OPC requirements.
thanks in advance, folks.
just a minor comment:
I can't help so much, and am by no means an expert on OPC.
but, one thing I am aware of is that, in general, Mac-produced ZIP files are
not themselves "plain old zip files", since the Mac actually hacks and
extends the format in a few ways in order to support various Mac specific
features (such as the difference between data and resource forks, ...).
and, so, it is also possible that Office is like "WTF is this?..." when
running into said Mac'isms...
I forget, but I think StuffIt and friends may also use other, non-deflate,
compression algos, which is a potential risk (if this is done, there would
likely be an option somewhere to force the use of deflate).
I say this because I think I remember there being a few 'StuffIt' entries in
the list of compression algos, but I am not certain here (and most ZIP
handlers will only accept Deflate or maybe Deflate64, but not all unzip
tools support Deflate64 either, so it is a big tradeoff of maybe a little
more compression for certain tools refusing to accept said files...).
so, the big question is if, in fact, the zipfile data you would be producing
is 'orthodox', beyond just its conformance to OPC specifics...
or such...
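
One way to check whether a given archive is 'orthodox' in the sense
above is to list the compression method of every entry and look for
Mac-specific extras. This is a rough sketch only, assuming Python 3 and
its standard zipfile module (not something mentioned in this thread).
audit_package is a made-up name, and the list of Mac artifacts it looks
for (a __MACOSX/ folder, ._* files, .DS_Store) is an assumption about
what a Mac archiver might add, not an exhaustive one.

```python
import zipfile

# ZIP method codes: 0 = stored, 8 = Deflate. Anything else is treated
# here as something a reader might refuse.
ORTHODOX_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}

def audit_package(path):
    """Return a list of (entry name, reason) pairs that look suspicious.
    (Hypothetical helper for illustration.)"""
    problems = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.compress_type not in ORTHODOX_METHODS:
                problems.append(
                    (info.filename,
                     "compression method %d" % info.compress_type))
            # Last path component, ignoring a trailing slash on folders.
            base = info.filename.rstrip("/").split("/")[-1]
            if (info.filename.startswith("__MACOSX/")
                    or base.startswith("._")
                    or base == ".DS_Store"):
                problems.append(
                    (info.filename, "Mac-specific metadata entry"))
    return problems
```

An empty list from audit_package("test.docx") (file name hypothetical)
would suggest the problem lies somewhere other than the compression
method or stray Mac files.
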
good input, thanks for the info. I hadn't thought of the resource
forks; you're right, that could be causing the problem.
I have sent a hopeful message to the good folks at Stuffit; let's hope they
can shed some light.
thanks again.
you're definitely ahead of me on ideas, B... I tried out your idea
regarding resource forks, but no go.
I expanded the .docx file to its juicy component files, then (without
changing anything) recompressed them with the command line zip tool,
which, by all accounts, does not include resource forks:
zip -X -r test test
... then renamed test.zip as test.docx and attempted to open it with
Word 2008. No luck, Word declares the document bogus.
I did attempt a zip -df, but that's long deprecated, and doesn't work.
Given that the current command line zip tool doesn't stuff resource
forks in the first place, it shouldn't be an issue. Just to make sure,
I checked the zip file and didn't see any resource-looking files:
% zip -X -r test test
adding: test/ (stored 0%)
adding: test/[Content_Types].xml (deflated 84%)
adding: test/_rels/ (stored 0%)
adding: test/_rels/.rels (deflated 66%)
adding: test/docProps/ (stored 0%)
adding: test/docProps/app.xml (deflated 73%)
adding: test/docProps/core.xml (deflated 52%)
adding: test/docProps/custom.xml (deflated 60%)
adding: test/word/ (stored 0%)
adding: test/word/_rels/ (stored 0%)
adding: test/word/_rels/document.xml.rels (deflated 85%)
adding: test/word/_rels/header2.xml.rels (deflated 38%)
adding: test/word/_rels/header3.xml.rels (deflated 38%)
adding: test/word/_rels/header4.xml.rels (deflated 38%)
adding: test/word/document.xml (deflated 83%)
adding: test/word/endnotes.xml (deflated 65%)
adding: test/word/fontTable.xml (deflated 85%)
adding: test/word/footer1.xml (deflated 65%)
adding: test/word/footer2.xml (deflated 79%)
adding: test/word/footer3.xml (deflated 81%)
adding: test/word/footer4.xml (deflated 81%)
adding: test/word/footnotes.xml (deflated 65%)
adding: test/word/header1.xml (deflated 70%)
adding: test/word/header2.xml (deflated 64%)
adding: test/word/header3.xml (deflated 64%)
adding: test/word/header4.xml (deflated 64%)
adding: test/word/media/ (stored 0%)
adding: test/word/media/image1.jpeg (deflated 72%)
adding: test/word/media/image2.jpeg (deflated 61%)
adding: test/word/numbering.xml (deflated 96%)
adding: test/word/settings.xml (deflated 59%)
adding: test/word/styles.xml (deflated 89%)
adding: test/word/theme/ (stored 0%)
adding: test/word/theme/theme1.xml (deflated 79%)
adding: test/word/webSettings.xml (deflated 34%)
:(
<snip>
also noticed:
'test' is included in the path, but was very likely not in the original...
have you tried zip'ing the files from within the same directory, such that
'test' is not part of the imported path?...
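
A quick way to tell whether the archive has an extra wrapping folder is
to look at the entry names directly. The sketch below assumes Python 3
with the standard zipfile module (not mentioned in this thread), and
wrapping_folder is a made-up name. It relies on [Content_Types].xml
sitting at the package root in a well-formed package, which is why its
absence from the top level is used as the signal.

```python
import zipfile

def wrapping_folder(path):
    """Return the folder name if every entry sits under one top-level
    folder (e.g. 'test'), otherwise None. (Hypothetical helper.)"""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    if "[Content_Types].xml" in names:
        # Content types part is at the package root, as expected.
        return None
    # First path component of every nested entry.
    tops = {n.split("/", 1)[0] for n in names if "/" in n}
    # Entries that sit directly at the top level.
    loose = [n for n in names if "/" not in n]
    if len(tops) == 1 and not loose:
        return tops.pop()
    return None
```

For an archive whose entries all start with test/, this would return
"test"; for one with [Content_Types].xml at the top level it returns None.
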
'test' was the document name, so that's consistent.
the problem seems to be with the zip creators themselves; I was able to
successfully modify a .docx file (although not create one from scratch)
with the Mac compression tool Springy.
unfortunately, there's no scripting or command line capability at the
moment, nor will there be for some time, so although it's workable as a
production tool, it won't work in a workflow. that's not as helpful as it sounds;
it's just sending me the long way around editing a document manually,
when I could just do it with Word itself.
the idea is to automate the creation of these documents, so even a
single break in that chain prevents that from happening.
you don't seem to be getting the idea here...
that 'test' is the document name does not matter; what matters is whether
'test' was part of the path within the original docx file...
often, if you unzip a file, a path may be *created* which has the name of
the original ZIP, but this path is not necessarily present *within* the
original ZIP file.
a botch this severe will almost certainly make things not work...
maybe try something from within the 'test' directory, such as:
zip -r ../test2.zip *
and, see if this makes any difference...
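
Since the goal is automation, the same "zip from inside the directory"
idea can also be scripted. This is a minimal sketch under assumptions:
Python 3 with the standard os and zipfile modules (not mentioned in this
thread), a made-up function name pack_package, and an output path that
lies outside the source folder. It stores each file under a name relative
to the folder itself, so 'test' never appears in the entry paths, and it
skips .DS_Store and ._* files. No claim is made that Word will accept the
result.

```python
import os
import zipfile

def pack_package(src_dir, dst_path):
    """Zip the contents of src_dir so that entry names are relative to
    src_dir itself, not to its parent. (Hypothetical helper.)"""
    with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder, subdirs, files in os.walk(src_dir):
            subdirs.sort()  # walk subfolders in a stable order
            for name in sorted(files):
                if name == ".DS_Store" or name.startswith("._"):
                    continue  # leave Finder/resource-fork debris out
                full = os.path.join(folder, name)
                # e.g. test/word/document.xml -> word/document.xml
                arcname = os.path.relpath(full, src_dir)
                zf.write(full, arcname.replace(os.sep, "/"))
```

Calling pack_package("test", "test2.docx") from the parent of the 'test'
folder would be the rough equivalent of the zip command above, minus the
folder entries.
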
Phantom, may I ask what your Springy settings were? I tried Springy on
a pptx file with no luck. I modified the slide xml file embedded
inside the .pptx archive of a single-slide powerpoint slide show. I
only changed the text of a textbox to keep things as simple as
possible. I did this in TextWrangler after opening the xml file from a
contextual menu in Springy. I saved the file in TextWrangler, and
Springy then asked 'if I really wanted to overwrite' the xml file
already in the archive. I said yes. I then tried to open the .pptx
file with PowerPoint, but 'there was an error opening' the file.
I don't need command line/workflow compatibility, so if I could only
get this to work that would make my day.
Cheers, Philip