
Extract Text out of MS WORD 6.0 DOC's


Graham Powell

Sep 28, 1997, 3:00:00 AM

Unfortunately, Brian, life is not as easy as it may seem.

Word 6/7 files are stored as either simple or complex documents
embedded in an OLE compound file. Only the simple documents contain a
single contiguous block of text. Once the file has been edited, the
embedded text becomes fragmented.

Check out the following functions:
StgOpenStorage and the OpenStream method of the IStorage it returns.
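
Before reaching for those calls, it can help to confirm that the file really is an OLE compound file. The sketch below is only an illustration in Python, not the Win32 API itself: it checks the fixed 8-byte signature and reads a few header fields, using offsets as I understand them from the published compound file layout. The function name read_ole_header and the returned keys are made up for this example.

```python
import struct

# Fixed 8-byte signature found at the start of an OLE compound file.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole_header(data):
    """Return a few header fields, or None if data is not an OLE compound file.

    Assumed layout: versions, byte order and sector shifts are 16-bit
    little-endian values starting at offset 24; the first directory
    sector number is a 32-bit value at offset 48.
    """
    if len(data) < 512 or data[:8] != OLE_SIGNATURE:
        return None
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,      # usually 512 bytes
        "mini_sector_size": 1 << mini_shift,   # usually 64 bytes
        "first_dir_sector": first_dir_sector,
    }
```

Passing the first 512 bytes of a .DOC file is enough; a None result means the file is not an OLE container (for example a plain text or RTF file renamed to .DOC).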

If you get this far, you have extracted the document stream from the
OLE container.
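
For the simple/complex distinction mentioned above, here is a rough sketch in Python of what comes next, assuming you already hold the bytes of the main document stream. The offsets are assumptions taken from the commonly circulated descriptions of the Word 6/95 header, not from the Microsoft document: a flags word at offset 10 whose 0x0004 bit marks a complex file, and the start and end offsets of the text at bytes 24 and 28. The names F_COMPLEX and simple_text_range are made up for this example.

```python
import struct

# Assumed: this bit in the flags word at offset 10 marks a complex file.
F_COMPLEX = 0x0004


def simple_text_range(stream):
    """Return (start, end) byte offsets of the text in a simple document.

    Returns None when the complex flag is set, because the text is then
    fragmented and a single range does not describe it.
    """
    if len(stream) < 32:
        raise ValueError("stream too short to hold the header fields")
    (flags,) = struct.unpack_from("<H", stream, 10)
    if flags & F_COMPLEX:
        return None
    # Assumed: two 32-bit little-endian offsets, start then end of text.
    fc_min, fc_mac = struct.unpack_from("<2I", stream, 24)
    return fc_min, fc_mac
```

If a range comes back, stream[start:end] should hold the raw text bytes. For a complex file you would need the piece table, which this sketch does not attempt.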

After this, to understand the Word file format inside the stream, your
best bet is to contact Microsoft for their mind-blowing document, which
explains it all. You will have to sign a non-disclosure agreement.

Best of luck
Graham

In article <19970928150...@ladder02.news.aol.com>, BBSoft GbR
<bbso...@aol.com> writes
>I want to extract text from MS Word 6.0 files. I have seen that there
>is a plain ASCII text area in the file, but how do I find the starting
>and ending offsets of the text?
>Does anyone know how to find the programmer of CatDoc.C? He has written a
>converter, but it doesn't work, and I cannot find out how he finds the end
>of the text.
>I'm using D2.
>
>Thank you
>
>Brian
>

--
Graham Powell

BBSoft GbR

Sep 28, 1997, 3:00:00 AM

Stefan Schader

Oct 2, 1997, 3:00:00 AM

I would open the document in Word from Delphi, then use an OLE
Automation object to send WordBasic commands that select all the text
and copy it to the clipboard. From there you can do whatever you want
with it. An example of using OLE this way is the spell checker found on DSP.
