Removing hyphens

Tim

Sep 15, 2005, 5:18:56 PM
I have a document that was scanned, had OCR run on it, and was
converted to a .txt file. I am trying to do two things:
1. Remove all the hyphens at the end of the line. It appears that there
is a hyphen, a space, and a carriage return. Is there a way to do this
for the whole document?
2. Remove the carriage returns at the end of each line of a paragraph.
My macro to do this now goes like:
Selection.MoveDown Unit:=wdParagraph, Count:=1
Selection.TypeBackspace
This works one line at a time, but maybe there is a better way of doing
this.

This is Word2004 11.1 using Mac 10.4.2.

Elliott Roper

Sep 15, 2005, 5:54:59 PM
In article <1126819136.0...@g14g2000cwa.googlegroups.com>, Tim
<payto...@yahoo.com> wrote:

1. Creative use of find and replace?
find "- ^p" replace "" without the "s of course
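
For anyone who would rather run that as a macro over the whole document, here is a minimal sketch. The macro name RemoveLineEndHyphens is made up for illustration, and it assumes every unwanted line ending really is "hyphen, space, paragraph mark" as Tim describes. It will also join words that are legitimately hyphenated at a line end, so check the result.

```vba
Sub RemoveLineEndHyphens()
    ' Hypothetical macro name, not from the original post.
    ' Deletes every "hyphen, space, paragraph mark" sequence in the
    ' main body of the active document, joining the split word.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "- ^p"              ' ^p is Word's code for a paragraph mark
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop          ' one pass over the range, no prompt
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False     ' ^p is only valid with wildcards off
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Run it on a copy of the document first, before joining the remaining lines, since it relies on the paragraph marks still being in place.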

If OCR gives you two consecutive para marks at the real end of a
paragraph, try three passes through find and replace

A. find ^p^p replace /\ or something else not in the doc.
B. find ^p replace " " without the "s of course
C. find /\ replace with ^p
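
Passes A to C could be wrapped up roughly as follows. This is only a sketch: the names JoinLinesKeepParagraphs and ReplaceEverywhere are invented here, and it assumes the placeholder /\ appears nowhere in the document and that real paragraph ends are marked by exactly two paragraph marks.

```vba
Sub JoinLinesKeepParagraphs()
    ' Hypothetical macro name. The placeholder must not occur anywhere
    ' in the document, or pass C will insert unwanted paragraph marks.
    Const PLACEHOLDER As String = "/\"

    ReplaceEverywhere "^p^p", PLACEHOLDER   ' A: protect real paragraph ends
    ReplaceEverywhere "^p", " "             ' B: join the remaining lines
    ReplaceEverywhere PLACEHOLDER, "^p"     ' C: restore paragraph ends
End Sub

Private Sub ReplaceEverywhere(ByVal findText As String, _
                              ByVal replaceText As String)
    ' Hypothetical helper: one plain (non-wildcard) Replace All over
    ' the main body of the active document.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Because this works on the whole document rather than a selection, it carries the same risk to tables and lists as doing the three passes by hand.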

This might mangle formatting, but then OCR that preserves formatting is
mostly an accident.

You could of course put all this in a macro. I have one to do one
selection at a time, since the risk of mashing tables and lists is
pretty high.

You are welcome to this one:

Sub One_Paragraph()

Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^p"
.Replacement.Text = " "
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = " ^w"
.Replacement.Text = " "

You will note that the last three lines collapse multiple spaces to a
single space. That gets rid of the doubled-up spaces left wherever a
line ended in a space followed by a paragraph mark.

Select the text you want to make into a single para and run the macro
(a keyboard shortcut for it is useful)

--
To de-mung my e-mail address:- fsnospam$elliott$$
PGP Fingerprint: 1A96 3CF7 637F 896B C810 E199 7E5C A9E4 8E59 E248

Elliott Roper

Sep 15, 2005, 6:05:26 PM
In article <150920052254598042%nos...@yrl.co.uk>, Elliott Roper
<nos...@yrl.co.uk> wrote:

> In article <1126819136.0...@g14g2000cwa.googlegroups.com>, Tim
> <payto...@yahoo.com> wrote:
>
> > I have a document that was scanned and OCR ran on it and it has been
> > converted to a .txt file. I am trying to do two things:
> > 1. Remove all the hyphens at the end of the line. It appears that there
> > is a hyphen, space, and a carriage return. Is there a way to make this
> > do the whole document.
> > 2. remove the carriage returns at the end of each line of a paragraph.
> > My macro to do this now goes like:
> > Selection.MoveDown Unit:=wdParagraph, Count:=1
> > Selection.TypeBackspace
> > This works one line at a time, but maybe there is a better way of doing
> > this.

Oh bugger, for the second time today I have carelessly pasted stuff
into a post on this group. I do apologise.
There is more to this of course.

> Sub One_Paragraph()
>
> Selection.Find.ClearFormatting
> Selection.Find.Replacement.ClearFormatting
> With Selection.Find
> .Text = "^p"
> .Replacement.Text = " "
> .Forward = True
> .Wrap = wdFindStop
> .Format = False
> .MatchCase = False
> .MatchWholeWord = False
> .MatchWildcards = False
> .MatchSoundsLike = False
> .MatchAllWordForms = False
> End With
> Selection.Find.Execute Replace:=wdReplaceAll
> With Selection.Find
> .Text = " ^w"
> .Replacement.Text = " "

In full it is:

Sub One_Paragraph()

End Sub
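
The body of the macro did not survive in this copy of the post. Going by the fragment quoted above and the description of its last three lines, the whole thing plausibly looks something like the sketch below. The closing End With and second Execute are assumptions, as is wdFindStop for the Wrap setting (the fragment shows only "wdFind", which is not a defined Word constant).

```vba
Sub One_Paragraph()
    ' Reconstruction under stated assumptions, not the author's verified text.
    ' Works on the current selection: select the lines to merge first.

    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting

    ' Pass 1: turn every paragraph mark in the selection into a space.
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindStop          ' assumption: stay inside the selection
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Pass 2: collapse a space followed by more white space (^w)
    ' into a single space. The other Find settings carry over.
    With Selection.Find
        .Text = " ^w"
        .Replacement.Text = " "
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
```

Leave the final paragraph mark of the block out of the selection, otherwise the text will also be joined to the paragraph that follows it.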

sorry.
E
