How to get the Word Document Content in byte array

@discussions.microsoft.com Satyen

May 31, 2007, 9:40:01 PM

Hi All

I am writing a Word add-in and I need to get the document content as a byte
array. I tried calling SaveAs and then re-opening the saved file, but I got an
error saying the file cannot be accessed because it is being used by another
process.

I can't close the document at this point.

I am using C# to develop this add-in.

I searched and found a solution like the one below:

((Microsoft.Office.Interop.Word.Document)_document).ActiveWindow.Selection.WholeStory();

((Microsoft.Office.Interop.Word.Document)_document).ActiveWindow.Selection.Copy();

IDataObject data = Clipboard.GetDataObject();

But I am not sure whether data.GetData(DataFormats.Rtf).ToString(); will lose
some of the Word objects (tables, drawings, etc.).

Any help will be appreciated.

Cindy M.

Jun 5, 2007, 6:17:51 AM

Hi Satyen,

Which version of Word are we discussing?

RTF should give a fairly exact representation. But for newer versions of Word (from
2000 onwards) HTML will probably be better.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or reply in
the newsgroup and not by e-mail :-)

Satyen

Jun 5, 2007, 2:11:00 PM

Hi Cindy,

Thanks for replying to my question. So far I am using Word 2003 (but I just
installed Office 2007 and will play with that a bit).

I will try changing it to HTML.

Thanks again.

Satyen

Jun 6, 2007, 2:16:00 PM

Hi Cindy,

I tried HTML and it is close (as you said) but not quite the same. Do
you know how to get the content of the document as a byte array (without
closing the document)?

Thanks.

Cindy M.

Jun 6, 2007, 2:18:55 PM

Hi Satyen,

> Thanks for replying to my question. So far I am using Word 2003 (but I just
> installed Office 2007 and will play with that a bit).
>
> I will try changing it to HTML.
>

In 2003 and 2007 there's an even better possibility. Word provides a native XML
"vocabulary": WordprocessingML. You can read Document.Content.XML into a string
which will, with a couple of exceptions (some document properties), contain all
the information needed to recreate the document.
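
A minimal sketch of turning that string into a byte array in C#. It assumes
the Word 2003 primary interop assembly, where the property is reached from C#
as get_XML(bool DataOnly) on a Range. The WordXmlHelper class and the choice
of UTF-8 are illustrative assumptions and not part of Word's object model; if
the XML declaration in the string names a different encoding, use that one.

```csharp
using System;
using System.Text;

// Hypothetical helper; not part of the Word object model.
public static class WordXmlHelper
{
    // Encodes a WordprocessingML string as UTF-8 bytes.
    // In the add-in the string would come from something like:
    //   string xml = _document.Content.get_XML(false);
    // (assumes the Word 2003 PIA; false asks for the full markup,
    //  not only the data of custom XML elements)
    public static byte[] ToBytes(string wordXml)
    {
        if (wordXml == null) throw new ArgumentNullException("wordXml");
        return Encoding.UTF8.GetBytes(wordXml);
    }
}
```

Because the document is never saved or closed, this route avoids the file
lock altogether; the trade-off is that the bytes are WordprocessingML rather
than the binary .doc format.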

Satyen

Jun 7, 2007, 4:40:01 PM

Cool, but what I see is XMLNodes, so I can do something like the code below:

Microsoft.Office.Interop.Word.XMLNodes xmlNodes =
((Microsoft.Office.Interop.Word.Document)_document).ActiveWindow.Document.Content.XMLNodes;

But then do I need to iterate? That's too much. I was expecting something
like the .XML property you suggested, which gives a string representation of
the document.

- Satyen

Satyen

Jun 7, 2007, 6:32:01 PM

Hi,

I took a different approach: I save the document to a temp file, copy that
file to a second temp file, and read the content from the copy (this time I
did not get the "cannot access the file" error).

// Save the open document to objFile (remaining SaveAs arguments elided).
((Microsoft.Office.Interop.Word.Document)_document).SaveAs(ref objFile, .....);
// Copy the saved file, then read the copy; Word keeps the original locked.
FileInfo fi = new FileInfo((string)objFile);
string tempFileCopy = Path.Combine(Path.GetTempPath(), "copy_" + keyName);
fi.CopyTo(tempFileCopy, true);
byte[] contents = File.ReadAllBytes(tempFileCopy);

Thanks.
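
The copy-then-read step above can be wrapped in a small helper. This is only a
sketch: the TempCopyReader class, the GUID-based file name, and the cleanup of
the temp copy are assumptions added for illustration, and it relies on the
saved file being readable for copying while Word still has it open, as
reported in the post.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the copy-then-read approach.
public static class TempCopyReader
{
    // Copies sourcePath to a uniquely named file in the temp folder,
    // reads the copy into memory, then deletes the copy.
    public static byte[] ReadViaTempCopy(string sourcePath)
    {
        string tempCopy = Path.Combine(
            Path.GetTempPath(),
            "copy_" + Guid.NewGuid().ToString("N") + Path.GetExtension(sourcePath));

        File.Copy(sourcePath, tempCopy, true);
        try
        {
            return File.ReadAllBytes(tempCopy);
        }
        finally
        {
            // Remove the copy even if reading fails.
            File.Delete(tempCopy);
        }
    }
}
```

In the add-in this would be called as
byte[] contents = TempCopyReader.ReadViaTempCopy((string)objFile); right after
SaveAs. A unique name avoids collisions if two documents share a keyName.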