Is there a way to do this? I thought that the Parent property or ID
property would somehow do this for me.
Example of document:
This is the first paragraph
Table -
  Row 1, Cell 1 | Row 1, Cell 2 | Row 1, Cell 3 | Row 1, Cell 4 | Row 1, Cell 5
  Row 2, Cell 1
This is the second paragraph
This is the third paragraph - now I want a list:
1. This is the first item in the list
2. This is the second item in the list
3. This is the third item in the list
This is the 4th paragraph.
Thanks in advance for any help that you can provide.
You need to investigate the <range>.Information(LOTS_OF_OPTIONS)
property.
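
A minimal sketch of what that could look like, assuming you only need to know for each paragraph whether it sits in a table or belongs to a list. The macro name ReportParagraphContext is made up for illustration; Information, wdWithInTable, wdStartOfRangeRowNumber, wdStartOfRangeColumnNumber and ListFormat are part of the Word object model. Note that a list item inside a table cell is reported as a table paragraph only.

```vba
Sub ReportParagraphContext()
    ' Hypothetical helper: prints one line per paragraph to the
    ' Immediate window, saying where the paragraph lives.
    Dim para As Paragraph
    Dim rng As Range
    Dim msg As String

    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        If rng.Information(wdWithInTable) Then
            ' Row and column of the cell where the range starts
            msg = "table, row " _
                & rng.Information(wdStartOfRangeRowNumber) _
                & ", column " _
                & rng.Information(wdStartOfRangeColumnNumber)
        ElseIf rng.ListFormat.ListType <> wdListNoNumbering Then
            ' ListString is the number or bullet shown for the item
            msg = "list item " & rng.ListFormat.ListString
        Else
            msg = "plain paragraph"
        End If
        Debug.Print msg
    Next para
End Sub
```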
Have a ripper day.
Teresa Rippeon <teresa....@mantech-stc.com> was spinning a yarn
that went like this:
Steve Hudson, Word Heretic
HDK List MVP
Word tools: her...@tdfa.com
Please post replies/further questions to the newsgroup so that all may benefit.
If I don't provide enough information, please feel free to ask for more :-)
Considering that tables *contain* paragraphs, this could get mighty
complicated! And then what about collections like Lists - a paragraph may or
may not be a member of a List; and the paragraphs that make up a single List
can be non-contiguous - so your "structure" could quickly look like
spaghetti junction.
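
To see how scattered the lists in a given document really are, something like the following rough sketch may help. It assumes the Lists and ListParagraphs collections are enough for your purposes; the macro name ReportLists is made up. Gaps between the printed start positions show where other paragraphs sit between the items of one list.

```vba
Sub ReportLists()
    ' Hypothetical helper: prints every list in the document
    ' together with the paragraphs that belong to it.
    Dim lst As List
    Dim para As Paragraph
    Dim n As Long

    For Each lst In ActiveDocument.Lists
        n = n + 1
        Debug.Print "List " & n & ": " _
            & lst.ListParagraphs.Count & " paragraph(s)"
        For Each para In lst.ListParagraphs
            ' Range.Start is the character offset in the document
            Debug.Print "  starts at " & para.Range.Start _
                & ", label " & para.Range.ListFormat.ListString
        Next para
    Next lst
End Sub
```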
What are you going to use this information for?
Regards
Dave
"Teresa Rippeon" <teresa....@mantech-stc.com> wrote in message
news:3C6047A8...@mantech-stc.com...
I previously had RTF documents that I converted to XML tagged documents, but it
seemed much simpler to go the route of converting the Word Document directly to
XML tagged documents, versus going through the process of converting the Word
document to RTF then to XML. It seemed like a good idea at the beginning...
Thanks.
Teresa
Dave Rado wrote:
--
Teresa Rippeon
Mantech
9189 Red Branch Road
Columbia, MD 21045
410-772-3452
Hi Teresa,
If you tried this before, you know already that converting Word
documents to XML doesn't make much sense if the docs aren't formatted
with styles. OTOH, if you use list styles for example, you don't have
to worry about lists, because they will be tagged automatically when
you tag the paragraphs.
You'll also know already that most of the time it doesn't make much
sense to tag *everything* (else, it would be easier to save as HTML
and take it from there).
There are quite a few commercial/shareware/freeware utilities to do
the job.
If you search the Word newsgroups for "XML" with Google, you'll find
evaluation software and free downloads.
I have checked out only a few; often they seemed veeery slow, or very
limited in the features they support.
A converter by Microsoft looks promising, but I wasn't able to
evaluate it (because I still work under Win98):
Search the MSDN library
http://msdn.microsoft.com/library/default.asp
for "Export a Word Document to XML".
If you want to do it yourself, I've posted some code below that tags
simple tables, paragraph styles, character styles, bold, and italic.
The code is a shortened version; there is plenty of room for improvement
(tag foot-/endnotes, comments, sections, chapters, pictures..., change
tags so they are valid XML tags, build a DTD, tag "upper" Unicode
characters as &#xXXXX; ...).
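
For the last item in that list, here is one possible sketch of a string function that writes "upper" characters as &#xXXXX; references and escapes &, < and > along the way. The name XmlEscape is made up, and the function is not part of the macro that follows. It assumes you apply it to plain text before any tags are added, since it would otherwise escape the tags too.

```vba
Function XmlEscape(ByVal s As String) As String
    ' Hypothetical helper: escapes XML markup characters and writes
    ' every character above U+007F as a numeric character reference.
    Dim i As Long
    Dim code As Long
    Dim low As Long
    Dim result As String

    i = 1
    Do While i <= Len(s)
        ' AscW returns a signed Integer; mask it to get 0..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        ' Combine a UTF-16 surrogate pair into a single code point.
        If code >= &HD800& And code <= &HDBFF& And i < Len(s) Then
            low = AscW(Mid$(s, i + 1, 1)) And &HFFFF&
            If low >= &HDC00& And low <= &HDFFF& Then
                code = &H10000 + (code - &HD800&) * &H400& _
                    + (low - &HDC00&)
                i = i + 1
            End If
        End If
        Select Case code
            Case 38: result = result & "&amp;"
            Case 60: result = result & "&lt;"
            Case 62: result = result & "&gt;"
            Case Is > 127: result = result & "&#x" & Hex$(code) & ";"
            Case Else: result = result & ChrW$(code)
        End Select
        i = i + 1
    Loop
    XmlEscape = result
End Function
```

For example, Debug.Print XmlEscape("R&D " & ChrW$(8364)) prints R&amp;D &#x20AC; in the Immediate window.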
Good luck with your project!
Klaus
Sub WordToXML()
    ' Tags character styles, paragraph styles,
    ' and bold/italic manual formatting that isn't
    ' applied on top of character styles;
    ' puts in very simple HTML table tags.
    Dim myStyle As Style
    Dim myStyleName As String

    Call TagTables
    Call FixVbCrAndVbTab
    ActiveWindow.View.Type = wdNormalView
    Selection.HomeKey Unit:=wdStory

    ' Tag character styles first,
    ' so they are nested in paragraph style tags:
    For Each myStyle In ActiveDocument.Styles
        If myStyle.InUse = True Then
            If myStyle.Type = wdStyleTypeCharacter Then
                If myStyle <> _
                    ActiveDocument.Styles(wdStyleDefaultParagraphFont) Then
                    myStyleName = myStyle.NameLocal & ">"
                    Selection.Find.ClearFormatting
                    Selection.Find.Style = myStyle
                    Selection.Find.Replacement.ClearFormatting
                    Selection.Find.Replacement.Style = _
                        ActiveDocument.Styles(wdStyleDefaultParagraphFont)
                    With Selection.Find
                        .text = ""
                        .Replacement.text = "<" & myStyleName _
                            & "^&" & "</" & myStyleName
                        .Forward = True
                        .Wrap = wdFindContinue
                        .Format = True
                        .MatchCase = True
                        .MatchWholeWord = False
                        .MatchWildcards = False
                        .MatchSoundsLike = False
                        .MatchAllWordForms = False
                    End With
                    Selection.Find.Execute _
                        Replace:=wdReplaceAll
                End If
            End If
        End If
    Next myStyle

    ' Paragraph styles:
    For Each myStyle In ActiveDocument.Styles
        If myStyle.InUse = True Then
            If myStyle.Type = wdStyleTypeParagraph Then
                If myStyle <> _
                    ActiveDocument.Styles(wdStyleNormal) Then
                    myStyleName = myStyle.NameLocal & ">"
                    Selection.Find.ClearFormatting
                    Selection.Find.Style = myStyle
                    Selection.Find.Replacement.ClearFormatting
                    Selection.Find.Replacement.Style = _
                        ActiveDocument.Styles(wdStyleNormal)
                    With Selection.Find
                        .text = "([!^13]@)^13"
                        .Replacement.text = "<" & myStyleName _
                            & "\1" & "</" & myStyleName & "^p"
                        .Forward = True
                        .Wrap = wdFindContinue
                        .Format = True
                        .MatchCase = True
                        .MatchWholeWord = False
                        .MatchWildcards = True
                        .MatchSoundsLike = False
                        .MatchAllWordForms = False
                    End With
                    Selection.Find.Execute _
                        Replace:=wdReplaceAll
                End If
            End If
        End If
    Next myStyle

    Call TagBoldAndItalic
End Sub

Private Sub FixVbCrAndVbTab()
    ' Set para marks and tabs to DPF, so that
    ' character styles and manual font formatting
    ' are neatly nested in para styles
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    Selection.Find.Replacement.Style = _
        ActiveDocument.Styles(wdStyleDefaultParagraphFont)
    With Selection.Find
        .text = "[^13^9]"
        .Replacement.text = "^&"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = True
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub TagBoldAndItalic()
    Selection.HomeKey Unit:=wdStory
    With Selection.Find
        ' Tags should be in DPF (for proper nesting of tags)
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = True
        .MatchWholeWord = False
        .Forward = True
        .ClearFormatting
        .Replacement.ClearFormatting
        .Replacement.Style = _
            ActiveDocument.Styles(wdStyleDefaultParagraphFont)
        .text = "\<[!\<\>]@\>"
        .Replacement.text = "^&"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll

        .ClearFormatting
        .Font.Bold = True
        .Replacement.ClearFormatting
        .Replacement.Font.Bold = False
        .text = ""
        .Replacement.text = "<em>^&</em>"
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll

        .ClearFormatting
        .Font.Italic = True
        .Replacement.ClearFormatting
        .Replacement.Font.Italic = False
        .text = ""
        .Replacement.text = "<i>^&</i>"
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub TagTables()
    ' Very simple HTML table tags without colspan.
    ' Doesn't allow for different paragraph styles in a single cell.
    Dim myTable As Table
    Dim myCell As Cell
    Dim rngCell As Range
    Dim rngRow As Range
    Dim myString As String
    Dim myPara As Paragraph
    ' Row numbers and row span are whole numbers
    Dim SIwdStartOfRangeRowNumber As Long
    Dim SIwdEndOfRangeRowNumber As Long
    Dim rowspan As Long
    For Each myTable In ActiveDocument.Tables
        ' Replace paragraph marks (¶) with <CR/> tags in cells:
        With myTable.Range.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Forward = True
            .Wrap = wdFindStop
            .Format = True
            .MatchCase = True
            .MatchWholeWord = False
            .MatchWildcards = False
            .text = "^p"
            .Replacement.text = "<CR/>"
            .Execute Replace:=wdReplaceAll
        End With
        ' Tag cells (Chr(182), the pilcrow, is a temporary separator):
        For Each myCell In myTable.Range.Cells
            myCell.Select
            Set rngCell = myCell.Range
            SIwdStartOfRangeRowNumber = _
                Selection.Information(wdStartOfRangeRowNumber)
            SIwdEndOfRangeRowNumber = _
                Selection.Information(wdEndOfRangeRowNumber)
            rowspan = 0
            If SIwdStartOfRangeRowNumber <> _
                SIwdEndOfRangeRowNumber Then
                rowspan = SIwdEndOfRangeRowNumber - _
                    SIwdStartOfRangeRowNumber
            End If
            rowspan = rowspan + 1
            myString = "<td"
            If rowspan > 1 Then
                ' Attribute values must be quoted in XML
                myString = myString & " rowspan="""
                myString = myString & rowspan & """"
            End If
            myString = myString & ">"
            rngCell.InsertBefore myString & Chr(182)
            rngCell.InsertAfter "<CR/>" & "</td>"
        Next myCell
        With ActiveDocument.Bookmarks
            .Add Range:=myTable.Range, Name:="table"
        End With
        myTable.ConvertToText Separator:=Chr(182)
        ' Tag rows:
        Selection.GoTo What:=wdGoToBookmark, _
            Name:="table"
        For Each myPara In Selection.Paragraphs
            Set rngRow = myPara.Range.Duplicate
            rngRow.MoveEnd wdCharacter, -1
            rngRow.InsertBefore "<tr>" & Chr(182)
            rngRow.InsertAfter Chr(182) & "</tr>"
        Next myPara
        ' Tag table:
        Selection.InsertBefore "<table>" & Chr(182)
        Selection.MoveEnd wdCharacter, -1
        Selection.InsertAfter "</table>" & vbCr
    Next myTable
    Selection.WholeStory
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = False
        .text = "<CR/>"
        .Replacement.text = "^p"
        .Execute Replace:=wdReplaceAll
        .Replacement.ClearFormatting
        .Replacement.Style = _
            ActiveDocument.Styles(wdStyleNormal)
        .text = Chr(182)
        .Replacement.text = "^p"
        .Execute Replace:=wdReplaceAll
        .text = "</tr>"
        .Replacement.text = "^&"
        .Execute Replace:=wdReplaceAll
    End With
    ' Format tags in DPF:
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Replacement.Style = _
            ActiveDocument.Styles(wdStyleDefaultParagraphFont)
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchWildcards = True
        .text = "\<[!\<\>]@\>"
        .Replacement.text = "^&"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
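
One of the improvements mentioned above is making the tags valid XML. The code uses style names directly as tag names, and names such as "Heading 1" contain spaces, which XML element names don't allow. A rough sketch of a clean-up function follows; the name StyleNameToTag is made up, and restricting names to ASCII letters, digits, "_", "-" and "." is a deliberately conservative assumption (XML itself allows more). Two different style names can map to the same tag, so check for collisions if that matters.

```vba
Function StyleNameToTag(ByVal styleName As String) As String
    ' Hypothetical helper: turns a Word style name into a string
    ' that is safe to use as an XML element name.
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(styleName)
        ch = Mid$(styleName, i, 1)
        Select Case AscW(ch)
            Case 65 To 90, 97 To 122, 48 To 57, 95, 45, 46
                ' A-Z, a-z, 0-9, underscore, hyphen, full stop
                result = result & ch
            Case Else
                result = result & "_"
        End Select
    Next i

    ' An element name must start with a letter or an underscore.
    If Len(result) = 0 Then
        result = "_"
    Else
        Select Case AscW(Left$(result, 1))
            Case 65 To 90, 97 To 122, 95
                ' Already a valid first character
            Case Else
                result = "_" & result
        End Select
    End If
    StyleNameToTag = result
End Function
```

With this in place, myStyleName = StyleNameToTag(myStyle.NameLocal) & ">" could replace the two assignments in WordToXML; StyleNameToTag("Heading 1") gives Heading_1.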
Thanks for all of your suggestions. I totally agree with your comments
about converting Word documents to XML. I had previously worked on a
project where we were trying to convert Word documents to XML using
BladeRunner products. We had Word templates to define styles and then
"map" various styled objects to our DTD elements.
However, in this project, the conversion is actually very simple. We're
just taking the entire document and putting everything into paragraphs,
lists, or tables (and figures). So, I was hoping that this simpler
approach would lend itself to the conversion to XML.
I think your sample code will be very helpful, and I will check out the
converter from Microsoft. Thanks very much!
Teresa
Klaus Linke wrote:
--