On Fri, Oct 30, 2009 at 7:40 PM, Ellgar209 <GglGroups...@mailinator.com> wrote:
> I'm cleaning up text files and I've come across something I don't know
> how to fix because it involves periods.
Looks like the quick answer would be just "escape the dot with a backslash".
> I get this type of situation (without quotes):
> " .................."
> " ........ "
> "..... "
> " ......."
> etc., etc., when I'd like to convert all those to this format here:
> " ... "
> (1 space before, only 3 periods, and 2 spaces after)
And here goes the not-so-quick answer.
1) If those series of dots occur only after line breaks,
look for: [\r\n]+\s*\.+\s*
replace with: a line break, 1 space, 3 periods, 2 spaces
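For anyone who wants to try case 1 outside a text editor, here is a minimal sketch in Python. The function name and the sample text are made up for illustration; the pattern is the one given above, and the replacement is spelled out as a line break, 1 space, 3 periods and 2 spaces.

```python
import re

# Replacement as described above: line break, 1 space, 3 periods, 2 spaces.
REPLACEMENT = "\n ...  "

def normalize_leading_dots(text: str) -> str:
    """Collapse any run of dots that follows a line break into ' ...  '.

    Note: the pattern needs at least one line break before the dots, so a
    dot run at the very start of the text is left alone, and several
    consecutive line breaks before the dots collapse into one.
    """
    return re.sub(r"[\r\n]+\s*\.+\s*", REPLACEMENT, text)

# Hypothetical sample text, not taken from the thread.
sample = "first line\n   ..........   second line\r\n.....third line"
result = normalize_leading_dots(sample)
print(repr(result))
```

Be aware that this pattern also fires on a single period right after a line break, since \.+ means "one or more".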
2) If they happen in the middle of a line: I guess you don't want single periods in your text to turn into " ... ". So define a minimum number of consecutive periods to be replaced, say 5, with no upper limit.
Then you look for \s*\.{5,}\s*
and replace with " ... "
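A rough sketch of case 2 in Python, in case it helps to test the pattern on its own. The function name, the min_dots parameter and the sample sentence are assumptions for the example; the replacement uses 2 trailing spaces, as asked for in the original question.

```python
import re

def normalize_dot_runs(text: str, min_dots: int = 5) -> str:
    """Replace runs of at least `min_dots` periods (plus any whitespace
    around them) with ' ...  '. Shorter runs, including single periods,
    are not touched."""
    pattern = r"\s*\.{%d,}\s*" % min_dots
    return re.sub(pattern, " ...  ", text)

# Hypothetical sample: one run of 7 dots and two ordinary full stops.
print(repr(normalize_dot_runs("wait.......for it. Done.")))
# 'wait ...  for it. Done.'
```

Lowering min_dots catches shorter runs, but at 1 every full stop in the text would be rewritten, which is the reason for picking a threshold.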
--
And don't forget to back up before you try even a 100% good solution! :)
I do apologize, it was case #2 <g>. I should have mentioned that this
is in between text and not after line breaks. Though I got to thinking
that that might happen too, so I'll keep a note of the regex in case
it's needed down the road.
Also, all this is needed for cleaning up text snippets.
I made the most marvellous discovery yesterday. My text editor of
choice for years has been Metapad. I found out yesterday what the
"external viewers" option is in it. The external viewers are just 2
launchers possible from within Metapad. Once I found that out, I
assigned EmEditor, another text editor with macro capabilities, as
the "primary viewer" and a freeware spell checker as the second.
The text editor EmEditor has great macro command abilities which can
also use regex.
Firefox has an extension (plugin) that allows us to save text snippets
off of webpages.
So, with this regex command here (which works pretty wonderfully so
far, btw), I'll select text in Firefox and send to Metapad with the
extension by the easy mouse clicks of selecting text and right-
clicking on a special context entry. Metapad pops up with my selected
web text. I click on the primary viewer and the EmEditor pops up with
text dumped into it automagically <g>. A click of a toolbar button
pointing to the clean-up macro and the text gets cleaned up of all the
gunk that can be in the HTML-to-text "translation", and then I save and close
EmEditor. Metapad in the background comes back to the fore and a
simple refresh [F5] and the text is updated with the EmEditor cleaned
up text. So in a series of mouse clicks I now have almost perfectly
cleaned up text done in a couple of seconds rather than me doing it
all by hand, as I have for years.
But formatting varies on the web, as people know. Your regex S&R of:
Then you look for
\s*\.{5,}\s*
and replace with " ... "
works like a charm! It cleaned up all but one case, the case where
the format is like this, something I hadn't thought of.
-------------------------------------------------------------------------
This actual test text:
hello ... how ......... are ................. you ......
dkidkeij...fdksdkfjdlskjfle ..............
fdskfjldsakfjalseiaelfj............dasflkdslafjksalkfslakf ...........................................................
test test...dfjslkjf
gets changed to this with the above regex:
hello ... how ... are ... you ... dkidkeij...fdksdkfjdlskjfle ...
fdskfjldsakfjalseiaelfj ... dasflkdslafjksalkfslakf ... test
test...dfjslkjf
-------------------------------------------------------------------------
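A side note on the result above: the lines end up joined because \s also matches line breaks, so a dot run at the end of a line swallows the newline after it. Here is a small Python sketch on an abridged copy of the test text. The second pattern, which only consumes spaces and tabs, is my own assumption about what might be wanted and is not something from the thread.

```python
import re

# Abridged copy of the test text above (the long third line is left out).
test_text = (
    "hello ... how ......... are ................. you ......\n"
    "dkidkeij...fdksdkfjdlskjfle ..............\n"
    "test test...dfjslkjf\n"
)

# Pattern from the thread: \s matches \r and \n too, so the line break
# after a trailing dot run is consumed and the lines are joined.
joined = re.sub(r"\s*\.{5,}\s*", " ...  ", test_text)

# Variant (an assumption): only spaces and tabs are consumed around the
# dots, so every line break survives.
kept = re.sub(r"[ \t]*\.{5,}[ \t]*", " ...  ", test_text)

print(joined.count("\n"), kept.count("\n"))
```

With the variant, a dot run at the end of a line leaves the 2 trailing spaces in front of the line break, which may or may not matter for the snippets.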
The only one not fixed is when there is no spacing at all between the
text and periods. The number of periods seems to get fixed so far to
just three, but the spacing isn't corrected in between. So
"xxxx...xxxx" doesn't get changed to "xxxx ... xxxx".
How can this slight addition be accommodated for, anyone know?
Periods stump me because they're part of regex command in themselves.
<g>.
Okay, despite posting, I always keep trying. However, I've only solved
half of the problem, and I think my solution to that half is probably
going to turn out to be too narrow. Everything else I tried didn't
work and this did, so there you go <g>.
For those instances where I ended up with
xxxx...xxxx xxxx xxx xxxxxx...xx
doing this search does find those cases where there was no space in
between the letters and periods:
[a-z]\.\.\.[a-z]
However, I couldn't use this as the replacement to fix them:
[a-z]\s\.\.\.\s\s\s[a-z]
because despite being in regex mode, for some reason the replace
function takes these characters literally, so I ended up with a literal
string of [a-z]\s\.\.\.\s\s\s[a-z] rather than achieving the spacing/
periods fix.
Replacing with " ... ", instead of using a regex string, doesn't
work either because I end up losing the last letter of the word
before, and the first letter of the word after, the dot-dot-dot.
So, guys, begging your indulgence, how could one search for those
instances where there is no spacing between the words and the ellipsis
(the 3 dots)?
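One possible way to approach this, sketched in Python rather than in EmEditor's own replace syntax (which isn't confirmed anywhere in this thread): the letters get lost because [a-z] on each side is part of the match, so the replacement overwrites them. Capturing the letters in groups and putting them back, or using lookarounds that check for the letters without consuming them, avoids that. The variable names are made up for the example, and the replacement uses 1 space before and 2 after, as in the original request.

```python
import re

text = "xxxx...xxxx xxxx xxx xxxxxx...xx"

# The problem described above: the letters on each side belong to the
# match, so a plain replacement string overwrites them.
lossy = re.sub(r"[a-z]\.\.\.[a-z]", " ...  ", text)

# Capture groups: keep the two letters and put them back with \1 and \2.
with_groups = re.sub(r"([a-z])\.\.\.([a-z])", r"\1 ...  \2", text)

# Lookarounds: require a letter on each side without consuming it.
with_lookarounds = re.sub(r"(?<=[a-z])\.\.\.(?=[a-z])", " ...  ", text)

print(repr(lossy))
print(repr(with_groups))
print(repr(with_lookarounds))
```

The way a replacement refers back to a captured group differs between tools (\1 in some, $1 in others), so the editor's documentation is the place to check which form it expects.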