Here's some of what I have so far:
oWord = CreateObject("Word.Application")
oWord.Visible = True
oDoc = oWord.Documents.Open("C:\SomeWordDoc.doc", , True)
Dim rng As Word.Range
With oWord.Selection
.HomeKey(Word.WdUnits.wdStory)
rng = .Range
End With
>>> Here's a point where I'm stuck. I can find the phrase "Issue date:"
>>> but then I need to read the text AFTER that (but not including the
>>> phrase itself)
>>> For example, the line in the doc might read "Issue date: March 25, 2009"
>>> I need to extract the "March 25, 2009" part.
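rng.Find.Text = "Issue date:"
If rng.Find.Execute() Then
    'MsgBox("found")
    ' After a successful Execute, rng covers the found phrase itself.
    ' Collapse past it, then extend to the end of the paragraph
    ' (assumes the date sits in the same paragraph as the phrase).
    rng.Collapse(Word.WdCollapseDirection.wdCollapseEnd)
    rng.MoveEnd(Word.WdUnits.wdParagraph, 1)
    MsgBox(rng.Text.Trim())
Else
    MsgBox("Not found")
End If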
>>> Then the next line below that doesn't have anything to cue me into that
>>> line. I just need the entire line below the date noted above. How do
>>> I move to the next line and read the entire line?
'move to line below "Issue Date:" to get county
>>> The line below "Issue Date:" would be like this: "Orange County"
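One way to get both values is to work with paragraphs rather than screen lines. This is only a sketch: it assumes each "line" in the document ends with a paragraph mark, that the project references Microsoft.Office.Interop.Word, and that the helper name ReadIssueDateAndCounty is made up for illustration.

```vbnet
Imports Word = Microsoft.Office.Interop.Word

Module IssueDateSketch
    ' Hypothetical helper: returns {issue date, county}, or Nothing if the phrase is missing.
    Function ReadIssueDateAndCounty(ByVal doc As Word.Document) As String()
        Dim rng As Word.Range = doc.Content
        rng.Find.ClearFormatting()
        rng.Find.Text = "Issue date:"
        If Not rng.Find.Execute() Then Return Nothing

        ' After a successful Execute, rng covers the found phrase itself.
        Dim labelPara As Word.Paragraph = rng.Paragraphs(1)

        ' Date: everything from the end of the phrase to the end of its paragraph.
        Dim dateRng As Word.Range = labelPara.Range
        dateRng.Start = rng.End
        Dim issueDate As String = dateRng.Text.Trim()

        ' County: the whole following paragraph (Next returns Nothing at the end of the document).
        Dim county As String = ""
        Dim nextPara As Word.Paragraph = labelPara.Next()
        If nextPara IsNot Nothing Then county = nextPara.Range.Text.Trim()

        Return New String() {issueDate, county}
    End Function
End Module
```

Calling it as ReadIssueDateAndCounty(oDoc) after the Documents.Open call should give the date in element 0 and the county in element 1. Trim() is there to drop the trailing paragraph mark.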
Help with the above will really get me started well on this. I'd really
appreciate it.
Thanks,
Keith
"Keith G Hicks" <k...@comcast.net> wrote in message
news:Owi61T%23gKH...@TK2MSFTNGP04.phx.gbl...
> Is this not possible?
>
You probably should ask in an MS Word group:
microsoft.public.word.*
You might be using VB.Net but the code you're
working on is MS Word object model. It will only
make sense to people who use MS Word and who
have experience with MS Word/Office automation.
I'm sure it is possible within Word, but I would grab all the text and
use regular expressions to search for the pattern you want. You seem to
be using Word only because that is the format of the original doc.
--
Mike
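A rough sketch of the regular-expression idea, run on a sample string instead of a real document. The sample text and the pattern are illustrative assumptions; the pattern relies on the text using carriage returns (or CR/LF) between lines, which is how Word returns paragraph breaks in Selection.Text.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexSketch
    Sub Main()
        ' Sample text standing in for the document text; paragraphs end with vbCr.
        Dim wholeText As String = "Notice" & vbCr &
                                  "Issue date: March 25, 2009" & vbCr &
                                  "Orange County" & vbCr

        ' Group 1: the rest of the "Issue date:" line. Group 2: the next non-empty line.
        Dim pattern As String = "Issue date:[ \t]*([^\r\n]*)[\r\n]+([^\r\n]*)"
        Dim m As Match = Regex.Match(wholeText, pattern)

        If m.Success Then
            Console.WriteLine(m.Groups(1).Value)   ' March 25, 2009
            Console.WriteLine(m.Groups(2).Value)   ' Orange County
        Else
            Console.WriteLine("Not found")
        End If
    End Sub
End Module
```

This only helps where a fixed phrase marks the spot. Values that are identified purely by position in the document need a different approach.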
"Family Tree Mike" <FamilyT...@ThisOldHouse.com> wrote in message
news:O3UMaqXh...@TK2MSFTNGP04.phx.gbl...
"mayayana" <mayaX...@rcXXn.com> wrote in message
news:%23xEFHIX...@TK2MSFTNGP04.phx.gbl...
Yes, that's what I meant. MS Office automation
is COM. You've got a COM object model, which is
adaptable to any COM-centric language. VB.Net is not
COM, so there's no direct translation. If it were me
I'd ask only in the Word group, get the VB/VBA code,
then figure out how to translate that to .Net. Even if
you were using a COM-centric language like VB or
VBScript, the Word group would still be the place
to ask, because your question is not about a language.
It's about the object model of the Word.Application
automation object.
Also, this may not help, but if you're dealing
only with .doc files (not .docx) and you're considering
just dealing with the text string as Family Tree Mike
suggested -- the .doc spec. has been published.
I think this is it:
http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/Word97-2007BinaryFileFormat(doc)Specification.pdf
I downloaded it when it was first released and wrote
a VBScript to extract text from .doc files. It seems
to work quite dependably. The details of plain text
storage in .doc files (as opposed to formatting, images,
etc.) are not very complex.
Keith
"Keith G Hicks" <k...@comcast.net> wrote in message
news:Owi61T%23gKH...@TK2MSFTNGP04.phx.gbl...
From looking at the byte data of several files, I observed that
1. The body text starts at byte number 2562
2. The body text ends when you encounter the first 0 decimal value byte.
3. So simply read in the data between those two points.
I tried this on about six files. It worked for them. I can't guarantee
that it will work for all since I couldn't decipher in the Word file
documentation, for which someone posted the link, exactly where the text
began and its length. I simply looked at a few files.
"Keith G Hicks" <k...@comcast.net> wrote in message
news:Owi61T%23gKH...@TK2MSFTNGP04.phx.gbl...
It's somewhat more involved than that, but not too
bad. See here for a VBScript version:
http://www.jsware.net/jsware/scripts.php5#desk
You can pretty much see the text if you just open
a Word .doc in Notepad, but it needs to be cleaned up.
Dim oWord As Word.Application
Dim oDoc As Word.Document
oWord = CreateObject("Word.Application")
oWord.Visible = True
oDoc = oWord.Documents.Open("c:\SomeWordFile.doc", , True)
oWord.Selection.WholeStory()
Dim wholeText As String = oWord.Selection.Text
I was going to do that and use RegEx to find everything I need, but I got
answers on how to read the file as a Word doc (not as just text) in the
word.vba.general newsgroup. Reading this as text and using RegEx is a
problem because I can't use RegEx to find everything. I need to
find specific line numbers as well. I need all the info on line 4, and the info
on that line will vary to the point that RegEx would be impractical. Greg
Maxey in the other newsgroup gave me some sample code. I put it into .NET
and it got me going in the right direction.
Thanks.
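For readers after the same thing: this is not the code Greg Maxey posted, just a minimal sketch of reading a "line" by number through the Word object model. It assumes that each line of interest is its own paragraph and that the project references Microsoft.Office.Interop.Word; GetParagraphText is a made-up name.

```vbnet
Imports Word = Microsoft.Office.Interop.Word

Module LineByNumberSketch
    ' Hypothetical helper: text of the n-th paragraph (1-based), or "" if n is out of range.
    Function GetParagraphText(ByVal doc As Word.Document, ByVal n As Integer) As String
        If n < 1 OrElse n > doc.Paragraphs.Count Then Return ""
        ' Trim() drops the trailing paragraph mark.
        Return doc.Paragraphs(n).Range.Text.Trim()
    End Function
End Module
```

With that, GetParagraphText(oDoc, 4) would return the fourth paragraph, whatever its content. Empty paragraphs count too, so the number may not match what you see on screen.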
"mayayana" <mayaX...@rcXXn.com> wrote in message
news:e%23AHQoqh...@TK2MSFTNGP05.phx.gbl...
1. The body text starts at byte number 2562
2. The body text ends when you encounter the first 0 decimal value byte.
3. So simply read in the data between those two points.
4. In that data only retain those bytes that are less than 123 and greater
than 31 along with line feeds and carriage returns.
That will give you the text and show where the line breaks are. No RegEx
needed as far as I can see to identify the lines.
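The four steps could be sketched like this. The offset 2562 and the zero-byte terminator are only the observations from the handful of files mentioned above, not something taken from the published format, and the offset is treated here as a zero-based index, which is an assumption. It also assumes the text is stored one byte per character; a file that stores its text as 16-bit characters would hit a zero byte almost immediately.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module DocTextSketch
    ' Offset observed in the post above; zero-based here (an assumption).
    Const BodyStart As Integer = 2562

    ' Hypothetical helper implementing steps 1 to 4 on the raw file bytes.
    Function ExtractBodyText(ByVal data As Byte()) As String
        Dim sb As New StringBuilder()
        For i As Integer = BodyStart To data.Length - 1
            Dim b As Byte = data(i)
            If b = 0 Then Exit For      ' step 2: stop at the first zero byte
            ' step 4: keep bytes 32..122 plus line feed (10) and carriage return (13)
            If (b > 31 AndAlso b < 123) OrElse b = 10 OrElse b = 13 Then
                sb.Append(ChrW(b))
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim text As String = ExtractBodyText(File.ReadAllBytes("C:\SomeWordDoc.doc"))
        ' Split on carriage returns to get the individual lines.
        Dim lines As String() = text.Split(ControlChars.Cr)
        For i As Integer = 0 To lines.Length - 1
            Console.WriteLine((i + 1).ToString() & ": " & lines(i))
        Next
    End Sub
End Module
```

A file shorter than the offset simply yields an empty string, since the loop never runs.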
In your alternate VBA approach, you are using late binding. You might want
to modify this to use early binding as below, where you have set a reference
in your project to the .net Microsoft.Office.Interop.Word, ver. 12. Using
that, you can also read docx files.
The code below displays any word file in a rich text box.
Me.OpenFileDialog1.Title = "Select Word Document"
Me.OpenFileDialog1.FileName = ""
Me.OpenFileDialog1.Filter = "Word Doc (*.doc)|*.doc|Word docx (*.docx)|*.docx"
If Me.OpenFileDialog1.ShowDialog = Windows.Forms.DialogResult.OK Then
    Path = Me.OpenFileDialog1.FileName
End If
Dim oWord As New Microsoft.Office.Interop.Word.Application
' Documents.Open returns the document, so oDoc is declared without New
Dim oDoc As Microsoft.Office.Interop.Word.Document
oDoc = oWord.Documents.Open(Path)
oWord.Selection.WholeStory()
Me.rtbText.Text = oWord.Selection.Text