
REXX routine to read a MS Word file?


Gary Richtmeyer

Jul 6, 2004, 8:43:02 AM
Does anybody have or know of a routine that lets a REXX program read an
MS Word file (in its .doc format) in order to do some bulk data
extraction? I have thousands (literally!) of Word files that I need to get
some info from, and with that many files it's not feasible to save each
one as a txt file first in order to do the extraction.

I have both ObjRexx and Regina, and I'm on a Windows-based environment, if
that makes a difference.

Thanks,

Gary Richtmeyer


Lee Peedin

Jul 6, 2004, 9:55:31 AM
Stand by Gary,
I have a sample "somewhere" in my archives and will try to locate it.

Lee Peedin
VP RexxLA

Chris

Jul 6, 2004, 11:31:10 AM

"Gary Richtmeyer" <glricht-R...@imailbox.com> wrote in message
news:p4OdnaxXX43...@giganews.com...

On 32-bit Windows you could use the w32funcs DLL, available at
http://home.interlog.com/~ptjm/, to access the Word documents using OLE.

Here's a simple example that opens a document, steps through the 'Words'
collection and displays each word on the screen:


/* Example of OLE automation with Word. */

call rxfuncadd 'w32loadfuncs', 'w32util', 'w32loadfuncs'
call w32loadfuncs

wrd = w32CreateObject("Word.Application")
documentscollection = w32getproperty(wrd,"Documents")

myFile = "C:\Documents and Settings\Administrator\My Documents\Rexx\w32funcs\wordread.doc"
document = w32CallFunc(documentscollection,"Open",'s',myFile)
wordcount = w32getproperty(document,"Words.Count")

/* Only loop if Word returned a numeric count */
if datatype(wordCount,"N") then do i = 1 to wordCount
  wordref = w32getsubobj(document,"Words","I",i)
  say w32getproperty(wordref,"Text")
end

cleanup:
call w32CallProc wrd, "FileClose"
call w32ReleaseObject wrd
call w32olecleanup
call w32dropfuncs
/* End of example */


Lee Peedin

Jul 6, 2004, 11:37:45 AM
Gary,
Here's an example using Object Rexx

WATCH for line wraps ! ! ! !

/* testword.rex */

wrdObj = .oleObject~new("Word.Application")

text = 'Gary Richtmeyer'

doc_spec = 'c:\Documents and Settings\Administrator\My Documents\*.doc'
-- Get all Word docs in the folder. 'F' = files only, 'O' = return only
-- the fully qualified file name, so no substr() parsing is needed
call SysFileTree doc_spec, 'mydocs.', 'FO'
do aa = 1 to mydocs.0
  call Search_Doc mydocs.aa
end

wrdObj~quit
exit


Search_Doc:
  parse arg current_doc
  wrdObj~Visible = .False
  wrdObj~Documents~Open(current_doc)

  if wrdObj~Selection~Find~Execute(text) then
    aline = 'Found'
  else
    aline = 'Did NOT Find'

  aline = aline text "In" current_doc
  say aline
  wrdObj~Documents~Close()
return
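
Since the original question is about extracting data rather than just testing for a match, here is a minimal sketch of pulling a document's full text through the same OLE object. It assumes Object Rexx on Windows with Word installed. The file name is hypothetical. Content, Text, Close and Quit come from Word's object model, and the 0 passed to Close is the value of wdDoNotSaveChanges.

```rexx
/* doctext.rex - sketch: read the whole text of one Word document via OLE */

wrdObj = .OLEObject~new("Word.Application")
wrdObj~Visible = .false

doc  = wrdObj~Documents~Open("c:\docs\sample.doc")  /* hypothetical path */
body = doc~Content~Text                 /* whole document as one string */
doc~Close(0)                            /* 0 = wdDoNotSaveChanges */
wrdObj~Quit

/* Word ends each paragraph with a carriage return ('0D'x) */
do while body <> ''
  parse var body para '0D'x body
  if para <> '' then say para
end
```

Once the text is in an ordinary Rexx string, the usual parse and pos() tools apply, so the per-document extraction logic can replace the say inside the loop.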

On Tue, 6 Jul 2004 08:43:02 -0400, "Gary Richtmeyer"
<glricht-R...@imailbox.com> wrote:

Gary Richtmeyer

Jul 7, 2004, 2:46:02 PM
Thanks Lee and Chris!

I'll give both examples a try and see how it goes.

Gary Richtmeyer


"Gary Richtmeyer" <glricht-R...@imailbox.com> wrote in message
news:p4OdnaxXX43...@giganews.com...

Sebastian Schildt

Jul 9, 2004, 1:27:46 PM
Gary Richtmeyer wrote:
> Does anybody have/know of a routine to allow a REXX program to be able to
> read a MS Word (in it's .doc format) file in order to do some bulk data
> extraction? I have thousands (literally!) of Word files that I need to get
> some info, but with the number of files involved, it's not feasible to save
> each as a txt file in order to do the extraction.

Hi!

If using an external application is an option, you might consider
http://wvware.sourceforge.net/
This is a library which allows access to Word documents and runs on
quite a lot of platforms. The distribution contains a small utility named
wvText which converts Word files to plain ASCII text.

I used the library a few years ago and it worked quite well back then.

Regards

Sebastian
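
A rough sketch of how the wvText route could be driven from Rexx for a whole folder. It is written for Object Rexx (Regina would need its own RexxUtil port for SysFileTree). It assumes wvText is on the PATH and is invoked as "wvText input.doc output.txt"; depending on the build, it may need a Unix-like shell to run on Windows. The folder and the search string are hypothetical.

```rexx
/* wvbatch.rex - sketch: convert each .doc with wvText, then scan the text */

/* Register RexxUtil; harmless if the functions are already loaded */
call RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
call SysLoadFuncs

needle   = 'Gary Richtmeyer'            /* hypothetical search string */
doc_spec = 'c:\docs\*.doc'              /* hypothetical folder */
call SysFileTree doc_spec, 'docs.', 'FO'   /* files only, names only */

do i = 1 to docs.0
  /* same name with a .txt extension: keep everything up to the last dot */
  txtfile = left(docs.i, lastpos('.', docs.i)) || 'txt'

  /* hand the command to the default environment (the shell) */
  'wvText "' || docs.i || '" "' || txtfile || '"'
  if rc <> 0 then do
    say 'wvText failed with rc' rc 'for' docs.i
    iterate
  end

  /* scan the converted text line by line */
  do while lines(txtfile) > 0
    textline = linein(txtfile)
    if pos(needle, textline) > 0 then say docs.i || ':' textline
  end
  call stream txtfile, 'C', 'CLOSE'
end
```

The trade-off against the OLE examples is that Word itself does not need to be installed or started, at the cost of one temporary text file per document.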
