How do I do this in Linux (edit some text)

Adams-Blake Co.

Sep 14, 2002, 4:33:32 PM

We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
found the answer.

We often get pure text docs (MyText.txt) that were saved in Windows and
have a cr/lf at the end of each line. Ex:

This is first line [cr/lf]
This is second line [cr/lf]
[cr/lf]
This is first line of second paragarph [cr/lf]
This is second ine of seconc paragaph [cr/lf]

In the Windows world we can take this into Word and do a 3 step process:

Change all [cr/lf][cr/lf] (pairs) to some symbol like %%
Change remaining [cr/lf] to space
Change %% back to [cr/lf][cr/lf]

which gives you:

This is first line. This is second line

This is first line of second paragraph. This is second line of second
paragaph.
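The three Word steps above amount to three plain string replacements. Below is a minimal Python sketch. It assumes the file is read as text with its CR/LF pairs intact, and that the placeholder (`%%`, as in the steps above) never occurs in the real text.

```python
def unwrap_paragraphs(text: str, marker: str = "%%") -> str:
    """Join soft-wrapped lines while keeping blank-line paragraph breaks.

    Mirrors the three Word find/replace steps. `marker` must not already
    appear in `text`, or it would be turned into a paragraph break too.
    """
    # Step 1: protect paragraph breaks (end of line followed by an empty line).
    text = text.replace("\r\n\r\n", marker)
    # Step 2: every remaining CR/LF is a soft line break, so make it a space.
    text = text.replace("\r\n", " ")
    # Step 3: turn the placeholder back into a paragraph break.
    return text.replace(marker, "\r\n\r\n")


sample = ("This is first line\r\nThis is second line\r\n\r\n"
          "This is first line of second paragraph\r\n"
          "This is second line of second paragraph")
result = unwrap_paragraphs(sample)
```

Here `result` is `"This is first line This is second line\r\n\r\nThis is first line of second paragraph This is second line of second paragraph"`. A file that ends with a CR/LF will get a trailing space from step 2.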

I tried like hell to do this in StarOffice but there does not seem to be a
way to find/replace the paragraph symbol (which is "^p" in Word).

I tried in editors like emacs and Kate, but could not figure it out.

Maybe there is a utility in Linux that can do this for me? We're a book
publishing company so obviously this is pretty important to us. If you have
a SIMPLE solution or maybe a script, please let us know.

Thanks,

Al
Adams-Blake Company, Inc.
www.adams-blake.com

Bit Twister

Sep 14, 2002, 4:47:59 PM

man tr
The example is for carriage return; you can enter
man ascii to look up the code for the Control-P character.

tr -s '\15' '\12' < input_file > output_file
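For reference, `tr -s '\15' '\12'` translates every CR (octal 15) into LF (octal 12). Because of `-s`, it then squeezes each run of LFs into one. The rough Python model below illustrates that behaviour; it is not how tr is implemented. It shows a side effect worth knowing: the empty line between paragraphs is squeezed away too, and the wrapped lines are not joined.

```python
import re


def tr_squeeze_cr_to_lf(data: bytes) -> bytes:
    r"""Model of: tr -s '\15' '\12' < input_file > output_file"""
    # Translate: each CR (0x0d) becomes LF (0x0a).
    translated = data.replace(b"\r", b"\n")
    # -s: each run of the repeated character from the second set (LF) becomes one.
    return re.sub(rb"\n+", b"\n", translated)


dos_text = b"first line\r\nsecond line\r\n\r\nnext paragraph\r\n"
print(tr_squeeze_cr_to_lf(dos_text))
# prints b'first line\nsecond line\nnext paragraph\n'
```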

B. Joshua Rosen

Sep 14, 2002, 4:48:19 PM

Use Xemacs. In your .xemacs/init.el file add the following,

(add-hook 'find-file-hooks 'remove-or-convert-trailing-ctl-M)
(defun remove-or-convert-trailing-ctl-M ()
  "Propose to remove or convert trailing ^M from a file."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\^M" nil t)
        ;: a ^M is found
        (if (or (= (preceding-char) ?\^J)
                (= (following-char) ?\^J))
            ;: Must find a way to display the buffer before this question
            (if (y-or-n-p "Remove trailing ^M ? ")
                (progn (goto-char (point-min))
                       (perform-replace "\^M" "" nil nil nil)
                       (pop-mark))
              (message "No transformation."))
          (if (y-or-n-p "Convert ^M into ^J ? ")
              (progn (goto-char (point-min))
                     (perform-replace "\^M" "\^J" nil nil nil)
                     (pop-mark))
            (message "No transformation.")))
      ;:(message "No ^M in this file !")
      )))

When you open a text file, it will ask you if you want to convert it.
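The hook's decision logic, without the y-or-n-p prompts, can be sketched in Python as follows. This is only an illustration. Like the Lisp, it looks at the first ^M it finds to decide between two cases: DOS-style line ends (^M next to ^J), where every ^M is removed, or bare ^M line ends, where every ^M is converted to ^J.

```python
def remove_or_convert_cr(text: str) -> str:
    """Non-interactive sketch of remove-or-convert-trailing-ctl-M."""
    first_cr = text.find("\r")
    if first_cr == -1:
        return text  # no ^M in this file
    before = text[first_cr - 1:first_cr] if first_cr > 0 else ""
    after = text[first_cr + 1:first_cr + 2]
    if "\n" in (before, after):
        # ^M sits next to ^J: DOS line ends, so drop every ^M.
        return text.replace("\r", "")
    # ^M with no ^J beside it: treat every ^M as a line end (^J).
    return text.replace("\r", "\n")
```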

Xaonon

Sep 14, 2002, 4:57:47 PM

In <3d83...@monitor.lanset.com>, Adams-Blake
Co. <aremovet...@adamsremovethis-blake.com> wrote:

> In the Windows world we can take this into Word and do a 3 step process:
>
> Change all [cr/lf][cr/lf] (pairs) to some symbol like %%
> Change remaining [cr/lf] to space
> Change %% back to [cr/lf][cr/lf]

Try defining a macro like this in Emacs:

(fset 'function-name
[?\M-< ?\M-% ?\C-q ?\C-j ?\C-q ?\C-j return ?% ?% return ?!
?\M-< ?\M-% ?\C-q ?\C-j return ? return ?!
?\M-< ?\M-% ?% ?% return ?\C-q ?\C-j ?\C-q ?\C-j return ?!])

Do 'recode ibmpc..lat1' on the file before running this, and then afterwards
do 'recode lat1..ibmpc'. This should work, although I don't have any
Windows text files to try it out on.

--
Xaonon, EAC Chief of Mad Scientists and informal BAAWA, aa #1821, Kibo #: 1
Visit The Nexus Of All Coolness (a.k.a. my site) at http://xaonon.cjb.net/
"Why should I perspire to death on the subway, when I could be flying around
in Dick Cheney's invisible nuclear helicopter or whatever?" -- mnftiu.cc

mjt

Sep 14, 2002, 5:01:44 PM

Adams-Blake Co. wrote:

> We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
> of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
> found the answer.

... 'man tr' or check the 'dos tools': 'man mtools', specifically mcopy

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Michael J. Tobler: motorcyclist, surfer, # Black holes result
skydiver, and author: "Inside Linux", # when God divides the
"C++ HowTo", "C++ Unleashed" # universe by zero

Chris Gordon-Smith

Sep 14, 2002, 7:51:10 PM

Adams-Blake Co. wrote:

It seems that you've found a gap in Star Office's functionality.

However, here is a solution to your problem that really is simple. Use a
hexadecimal editor such as KHexEdit.

KHexEdit can display the text you are editing alongside the hexadecimal
ascii codes corresponding to the text characters. This means that you can
search for and replace non-printing characters such as [cr/lf][cr/lf]. The
editor enables you to easily see that the hexadecimal ascii codes
corresponding to [cr/lf][cr/lf] are '0d0a0d0a'. For example, you can do a
simple search and replace to convert hexadecimal '0d0a0d0a' to text '%%'.
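To see what that looks like without a GUI, the sketch below prints a rough two-pane view (offset, hex bytes, printable characters) and lists where the `0d0a0d0a` paragraph break occurs. The layout is an assumption for illustration, not KHexEdit's exact display.

```python
from __future__ import annotations


def hex_view(data: bytes, width: int = 16) -> list[str]:
    """Rough two-pane dump: offset, hex bytes, printable text ('.' otherwise)."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = chunk.hex(" ")  # bytes.hex(sep) needs Python 3.8+
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return rows


def find_all(data: bytes, pattern_hex: str) -> list[int]:
    """Offsets of every occurrence of a hex pattern such as '0d0a0d0a'."""
    pattern = bytes.fromhex(pattern_hex)
    offsets, start = [], 0
    while (pos := data.find(pattern, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets


sample = b"line one\r\nline two\r\n\r\nnext"
for row in hex_view(sample):
    print(row)
print(find_all(sample, "0d0a0d0a"))
```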

I have just performed the (equivalent of the) sequence of steps you describe
successfully using KHexEdit.

Since KHexEdit is part of KDE, it's easy to use. Anyone who is currently
doing the procedure you describe in MSWord should be able to do the same
thing in KHexEdit after a maximum of 10 minutes training.

I'm not sure whether KHexEdit comes with Mandrake 8.2, but I would expect
so; it should be part of the KDE installation. If you don't have it then
it's available at RPMFIND. I'm using KHexEdit version 0.8.5 with SuSE 7.3,
which is KDE 2 based. I would think this version would also be OK for
Mandrake 8.2 (which is also KDE 2 based). Later versions of KHexEdit look
as though they are compatible with KDE 3, and so might not run on your
installation.

Chris Gordon-Smith
London UK

mjt

Sep 14, 2002, 6:56:13 PM

Chris Gordon-Smith wrote:

> Since KHexEdit is part of KDE its easy to use. Anyone who is currently
> doing the procedure you describe in MSWord should be able to do the same
> thing in KHexEdit after a maximum of 10 minutes training.
>

... tr is WAY easier ... you can have a file converted before
KHexEdit is finished loading and all that GUI jazz :))

Chris Gordon-Smith

Sep 14, 2002, 8:44:26 PM

mjt wrote:

> Chris Gordon-Smith wrote:
>
>> Since KHexEdit is part of KDE its easy to use. Anyone who is currently
>> doing the procedure you describe in MSWord should be able to do the same
>> thing in KHexEdit after a maximum of 10 minutes training.
>>
>
> ... tr is WAY easier ... you can have a file converted before
> KHexEdit is finished loading and all that GUI jazz :))
>

It depends who is using it. The original poster says that he is new to
Linux and that he works for a publishing company. He also says that his
first thought was to reproduce what he already does in MS Word using Star
Office. The likelihood is that the people who will be converting the text
are publishing types with very little technical background, and little
interest in learning to use tr or Emacs.

Even if a script were set up using tr such that non-technical users could
just 'press a button' to convert text, they would get stuck as soon as any
'non-standard' text occurred which was beyond the scope of the script. Then
they would be forced to learn tr or use a tool like KHexEdit.

So far as loading time goes, I really don't think it's relevant. My
5-year-old PC loads KHexEdit in about 5 seconds.

I think the more general point here is that newcomers to GNU/Linux will
often want to do things in a way similar to the way in which they already
work. If they find that every time they want to do a simple task they have
to learn to use a new set of tools they will just get fed up and say that
GNU/Linux is unusable. The average office worker will not want to read man
pages.

Of course I don't know exactly what kind of people work at the original
poster's company. Perhaps they are in fact technically minded after all and
would be happy with tr or Emacs. At any rate, they now have a choice.

Chris Gordon-Smith
London UK

Robert Heller

Sep 14, 2002, 10:51:52 PM

"Adams-Blake Co." <aremovet...@adamsremovethis-blake.com>,
In a message on Sat, 14 Sep 2002 13:33:32 -0700, wrote :

"C> We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
"C> of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
"C> found the answer.
"C>
"C> We often get pure text docs (MyText.txt) that were saved in Windows and
"C> have a cr/lf at the end of each line. Ex:
"C>
"C> This is first line [cr/lf]
"C> This is second line [cr/lf]
"C> [cr/lf]
"C> This is first line of second paragarph [cr/lf]
"C> This is second ine of seconc paragaph [cr/lf]
"C>
"C> In the Windows world we can take this into Word and do a 3 step process:
"C>
"C> Change all [cr/lf][cr/lf] (pairs) to some symbol like %%
"C> Change remaining [cr/lf] to space
"C> Change %% back to [cr/lf][cr/lf]
"C>
"C> which give you:
"C>
"C> This is first line. This is second line
"C>
"C> This is first line of second paragraph. This is second line of second
"C> paragaph.

Here is a filter written in Tcl. Save this as a file (such as
/usr/local/bin/filterextraNLs.tcl) and make it executable
(chmod +x /usr/local/bin/filterextraNLs.tcl).

#!/usr/bin/tclsh

fconfigure stdout -translation crlf
fconfigure stdin -translation crlf

while {[gets stdin line] >= 0} {
    if {[string length "$line"] > 0} {
        puts -nonewline "$line "
    } else {
        puts ""
        puts ""
    }
}

Then you can do this:

mv foo.txt foo.txt.bak
/usr/local/bin/filterextraNLs.tcl <foo.txt.bak >foo.txt
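A line-for-line Python port of the Tcl filter can be handy for checking what it produces. This is a sketch for comparison, not the Tcl script itself. Input is split on CR/LF, as `-translation crlf` does. Each non-empty line is emitted followed by a space, and each empty line becomes two CR/LF pairs.

```python
def filter_extra_newlines(text: str) -> str:
    """Python port of filterextraNLs.tcl, working on a CR/LF text string."""
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final CR/LF ends the last line; it is not an empty line
    out = []
    for line in lines:
        if line:
            out.append(line + " ")   # puts -nonewline "$line "
        else:
            out.append("\r\n\r\n")   # puts "" twice, translated to CR/LF
    return "".join(out)
```

For `"a\r\nb\r\n\r\nc\r\n"` this returns `"a b \r\n\r\nc "`. The trailing space before each paragraph break and at the very end comes from `puts -nonewline "$line "`, and could be trimmed if it matters.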

"C>
"C> I tried like hell to do this in StarOffice but there does not seem to be a
"C> way to find/repace the paragraph symbol (which is "^p" in Word.)

StarOffice is not a text editor. Neither is MS-Word. Word processors
tend to suck as *plain text editors*. It is not what they are designed
for. I know, lots of people use MS-Word as a text editor. "If all you
have is a hammer, *everything* looks like a nail." An unfortunate
truism. It really makes sense to use a *variety* of tools. You end up
doing a better job, with less mess and other problems.

"C>
"C> I tried in editors like emacs and Kate, but could not figure it out.
"C>
"C> Maybe there is a utilty in Linux that can do this for me? We're a book
"C> publishing company so obviously this is pretty important to us. If you have
"C> a SIMPLE solution or maybe a script, please let us know.
"C>
"C> Thanks,
"C>
"C> Al
"C> Adams-Blake Company, Inc.
"C> www.adams-blake.com
"C>
"C>

Adams-Blake Co.

Sep 14, 2002, 10:57:32 PM

Chris Gordon-Smith wrote:

> mjt wrote:
>
>> Chris Gordon-Smith wrote:
>>
>
> It depends who is using it. The original poster says that he is new to
> Linux and that he works for a publishing company. He also says that his
> first thought was to reproduce what he already does in MS Word using Star
> Office. The likelihood is that the people who will be converting the text
> are publishing types with very little technical background, and little
> interest in learning to use tr or Emacs.
>

This is correct. While I have a lot of technical expertise, the people who
will be doing the text transformation don't. They are used to doing this in
Word and I want to give them a "visual" way to do it in Linux if possible.

I think KHexEdit would work for us. Unfortunately it does not come with
Mandrake 8.2 so we will try to find it in rpmfind or just wait a few weeks
until 9.0 comes out and hope that it is included with KDE 3.0.

I appreciate all of the script and regular expression solutions, but this
is a case where the solution is harder than the problem. In the time that
it takes to figure out how to use tr or sed or awk or whatever, we can boot
to Word (or use Crossover), do the transformation, and be done with it.

ANC

mjt

Sep 15, 2002, 12:20:41 AM

On Sat, 14 Sep 2002 13:33:32 -0700, "Adams-Blake Co." <aremovet...@adamsremovethis-blake.com> revealed:

> I tried like hell to do this in StarOffice but there does not seem to be a
> way to find/repace the paragraph symbol (which is "^p" in Word.)

okay, here's the deal - you're using staroffice, right? you can do
everything right there, with no external tools. i tested this right
here on one of my machines and used staroffice to do everything.

here's a test for you to try ... in staroffice:
file -> new -> text document, then type some lines, using
some hard carriage returns:
text text text text
text text text text
text text text text

now: file -> save -> "dos.txt" -> (file type dropdown):
"Text DOS" -> save button

open up a hex editor, we'll use Chris's suggestion: khexedit and
then open the file named 'dos.txt'. guess what? the file has
the CR-LF pairs, just like in dos.

back in staroffice, close the document window, then:
file -> open -> select 'dos.txt' -> open button. you can select
'Text DOS' as the type, but you don't have to, <All> is sufficient.

the file's text output does look "normal" (no extra lines). now,
to get it as a native *nix text file type:
save as -> 'unix.txt' -> (file type): 'Text' (not ansi or dos/etc),
then back to khexedit and file -> open -> 'unix.txt' and voila!
you have the correct LF (\x0a) as the line-terminating-char

as another exercise, i used 'kedit' .... i merely opened the
file and turned right around and did, file -> save and then
checked the resulting file and the CR's were removed.
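Checking the result by eye in a hex editor works, but the same check is easy to script: look for CR-LF pairs (`0d 0a`) versus a bare LF (`\x0a`). Below is a small sketch; the style names it returns are made up for the example.

```python
def line_ending_style(data: bytes) -> str:
    """Classify a file's line endings as 'dos', 'unix', 'mac', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")          # 0d 0a pairs (DOS)
    lone_cr = data.count(b"\r") - crlf  # 0d on its own (old Mac)
    lone_lf = data.count(b"\n") - crlf  # 0a on its own (Unix)
    found = [name for name, n in (("dos", crlf), ("mac", lone_cr), ("unix", lone_lf)) if n]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed"
```

Reading the file in binary mode (`open(path, "rb").read()`) matters here. Text mode would translate the line endings before they could be counted.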

Adams-Blake Co.

Sep 15, 2002, 12:31:36 AM

Robert Heller wrote:

I'll give this script a try. While I'm no expert on Tcl it looks to me that
this will strip out every cr/lf. I only want to strip out the cr/lf's
at the end of each line except when it is followed by a line with only a
cr/lf.

xxxxxxxxxxx cr/lf
yyyyyyyyyy cr/lf
cr/lf
zzzzzzzzzzz cr/lf

becomes

xxxxxxxxxxyyyyyyyyyyy cr/lf
cr/lf
zzzzzzzzz

Would it be possible to just create 3 scripts? The first substitutes cr/lf
cr/lf (pairs) with some character (like %). Would 'tr' do this? Then the
second script takes out all remaining cr/lfs. Finally the third script
would transform the special characters (%) back to a cr/lf cr/lf pair.

Basically this is what I do in Word.

Thanks,

Al

Adams-Blake Co.

Sep 15, 2002, 1:37:50 AM

mjt wrote:

I'm very, very appreciative of your effort to help, but it does not really
solve my problem. The problem is that I get files with hard returns at the
end of each line... and then a line with only a return as a space between
paragraphs. What I want to do is take out the returns on all the lines to
form a paragraph but leave a return on a line between paragraphs. In most
word processor packages they use a "paragraph" marker at the end of each
paragraph which (among other things) causes a hard return. Thus I need a
hard return at the end of the paragraph and a hard return on the next line
which has no text.

Al

Robert Heller

Sep 15, 2002, 8:09:28 AM

"Adams-Blake Co." <aremovet...@adamsremovethis-blake.com>,
In a message on Sat, 14 Sep 2002 21:31:36 -0700, wrote :

"C> Robert Heller wrote:
"C>
"C> I'll give this script a try. While I'm no expert on Tcl it looks to me that
"C> this will strip out every cr/lf. I only want to strip out the the cr/lf's
"C> at the end of each line except then it is followed by a line with only a
"C> cr/lf.

No, what it does is:

Strips every newline (appends lines together with a *space*).
When it comes to a zero length line (saw two newlines together), it
spits out two newlines.

Here is the script with lots of comments.

#!/usr/bin/tclsh
# The above line is UNIX Scripting 'scenery'

# 'newline' on output is cr/lf
fconfigure stdout -translation crlf
# 'newline' on input is cr/lf
fconfigure stdin -translation crlf

# While gets does not fail (gets will 'fail' on EOF)

while {[gets stdin line] >= 0} {

    # If the line is not empty (not between a pair of newlines (cr/lf/cr/lf))

    if {[string length "$line"] > 0} {

        # Output the line, with a space instead of a newline.
        puts -nonewline "$line "
        # note the space after $line inside the quotes
    } else {
        # otherwise, output cr/lf/cr/lf (end current line and add another)
        puts ""
        puts ""
    }
}

"C>
"C> xxxxxxxxxxx cr/lf
"C> yyyyyyyyyy cr/fl
"C> cr/lf
"C> zzzzzzzzzzz cr/lf
"C>
"C> becomes
"C>
"C> xxxxxxxxxxyyyyyyyyyyy cr/lf
"C> cr/lf
"C> zzzzzzzzz
"C>
"C> Would it be possible to just create 3 scripts. The first substitutes cr/lf
"C> cr/lf (pairs) to some character (like %). Would 'tr" do this? Then the
"C> second script takes out all remaining cr/lfs. Finally the third script
"C> would transform the special characters (%) to cr/lf crlf pair.

This is massive overkill. And really unnecessary. This is like defining
addition of two numbers by first dividing the first number into two,
then adding the resulting three numbers together, on the basis that
multi-addend column addition is somehow easier than just adding the two
original numbers. Eg: replacing

44
+27
----

with

22
22
+27
----

You are thinking of the problem as an exercise in how to deal
with a dumb 'text' editor, which is not capable of being programmed
to do 'smart' substitutions, rather than as a problem that can be
solved with a filter.

Yes, there are ways to do this with tr, sed, and awk, but it would take
additional steps and would not really be easier. Like I said before:
it is possible to pound in a screw with a hammer, but it really works
better if you use a screwdriver.
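The single-filter point can also be shown with one regular expression in Python. This swaps a regex in for the Tcl loop; it is a sketch, not the script above. After normalizing CR/LF to LF, it turns each line break into a space, unless the break is next to another line break (a paragraph boundary) or is the last character of the text.

```python
import re


def join_lines(text: str) -> str:
    """Join soft-wrapped lines; keep blank-line paragraph breaks (LF output)."""
    text = text.replace("\r\n", "\n")
    # A newline with no newline on either side, and not at the very end,
    # is a soft wrap inside a paragraph: replace it with a space.
    return re.sub(r"(?<!\n)\n(?!\n|\Z)", " ", text)
```

If DOS line ends are wanted again afterwards, `.replace("\n", "\r\n")` on the result restores them.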

"C>
"C> Basically this is what I do in Word.

Duane Clark

Sep 15, 2002, 2:22:05 PM

Adams-Blake Co. wrote:
> We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
> of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
> found the answer.
>
> We often get pure text docs (MyText.txt) that were saved in Windows and
> have a cr/lf at the end of each line. Ex:
>
> This is first line [cr/lf]
> This is second line [cr/lf]
> [cr/lf]
> This is first line of second paragarph [cr/lf]
> This is second ine of seconc paragaph [cr/lf]
>
> In the Windows world we can take this into Word and do a 3 step process:
>
> Change all [cr/lf][cr/lf] (pairs) to some symbol like %%
> Change remaining [cr/lf] to space
> Change %% back to [cr/lf][cr/lf]
>
> which give you:
>
> This is first line. This is second line
>
> This is first line of second paragraph. This is second line of second
> paragaph.

If you are accustomed to using the find and replace method, then
perhaps try using nedit.

http://www.nedit.org/

Nedit understands how to read and write dos format files, in addition to
being an excellent text editor. I tried your example, and it worked
reasonably well. See if this works for you. I typed it out in detail,
but after doing it once, it goes pretty quick. This is much easier with
a 3 button mouse, but if you don't have that, clicking both buttons
simultaneously should emulate the center button.

1) Open the file (nedit will automatically recognize that it is DOS
format), and open the "replace" dialog.
2) With your mouse, select the two cr/lf pairs that indicate a paragraph
mark. Move the mouse to the "find" box in the replace dialog, and click
the center button. This will paste the cr/lf pairs into the box.
3) Put the "%%" characters into the replace box, and click the "Replace
all in: Window" button.
4) Click in the Find window, hit the forward arrow once (to make sure
you are not at the beginning of the line) and then hit backspace once.
The box now contains a single cr/lf pair.
5) Put a space in the replace box and again do a "Replace all in: Window".
6) Again click in the Find window, hit the forward arrow once and then
backspace, then enter the "%%" characters.
7) Now to get the cr/lf pair into the replace box, go to the text
window, click on the beginning of the line and hit return. Now select
the return, as in 2) above, and paste it with the center mouse button
twice. Do the "Replace all in: Window" and you should have what you
want. Delete the return you put into the beginning of the file, if desired.
8) When you save the file, nedit will automatically save it in DOS format.

--
My real email is akamail.com@dclark (or something like that).

Eric Worrall

Sep 15, 2002, 4:28:18 PM

Then use a graphical solution

cat > convert.sh <<'eof'
#!/usr/bin/wish

# List subdirectories on the left and names containing ".txt" on the right
proc refreshlists {} {
    .fdf.dirf.dirs delete 0 end
    .fdf.filef.files delete 0 end
    foreach filename [split [exec ls -a] \n] {
        if [file isdirectory $filename] then {
            .fdf.dirf.dirs insert end $filename
        } elseif [expr [string last ".txt" $filename] >= 0] then {
            .fdf.filef.files insert end $filename
        }
    }
}

# Move into the directory that was clicked and refresh both lists
proc changedir {} {
    set cursel [.fdf.dirf.dirs curselection]
    cd [.fdf.dirf.dirs get $cursel]
    refreshlists
}

# Strip the carriage return at the end of each line of the selected file
proc doconversion {} {
    set cursel [.fdf.filef.files curselection]
    if [llength $cursel] then {
        set filename [.fdf.filef.files get $cursel]
        # Braces stop Tcl from touching the backslash; GNU sed reads \r as CR
        set cmds {s/\r$//}
        exec sed $cmds $filename > "$filename.tmp"
        exec mv -f "$filename.tmp" $filename
    }
}

wm title . "DOS to Linux"
frame .fdf -borderwidth 0
frame .fdf.filef -borderwidth 0
frame .fdf.dirf -borderwidth 0
label .fdf.dirf.dirslbl -text "Directories"
label .fdf.filef.fileslbl -text "Files"
listbox .fdf.dirf.dirs -height 10 -yscrollcommand {.fdf.dirf.dirsscroll set}
scrollbar .fdf.dirf.dirsscroll -orient v -command {.fdf.dirf.dirs yview}
listbox .fdf.filef.files -height 10 -yscrollcommand {.fdf.filef.filesscroll set}
scrollbar .fdf.filef.filesscroll -orient v -command {.fdf.filef.files yview}
button .conv -text "Convert" -command { doconversion }
button .refresh -text "Refresh" -command { refreshlists }
button .qt -text "Quit" -command { exit }

pack .fdf.dirf.dirslbl -side top -anchor nw
pack .fdf.dirf.dirs -side left -expand 1 -fill both
pack .fdf.dirf.dirsscroll -anchor w -side left -fill y
pack .fdf.filef.fileslbl -side top -anchor nw
pack .fdf.filef.files -side left -expand 1 -fill both
pack .fdf.filef.filesscroll -anchor w -side left -fill y
pack .fdf.dirf .fdf.filef -fill both -expand 1 -anchor nw -side left -padx 2
pack .fdf -fill both -anchor nw -expand 1
pack .qt -side left -padx 5 -pady 5 -anchor sw -expand 1
pack .refresh -side left -padx 5 -pady 5 -anchor s -expand 1
pack .conv -side left -padx 5 -pady 5 -anchor se -expand 1

bind .fdf.dirf.dirs <<ListboxSelect>> +changedir

refreshlists
eof

chmod 755 convert.sh
./convert.sh
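For comparison, here is the same per-file conversion as a Python sketch: strip a CR at the end of each line, write a temporary file, then rename it over the original, which is what `doconversion` does. Instead of one clicked file, it is applied to every `.txt` file in a directory. One deliberate difference: the Tcl code lists any name *containing* `.txt`, while this sketch only takes names ending in `.txt`. Like the GUI, it only fixes line endings and does not join lines.

```python
import os
import re
import shutil
import tempfile


def strip_trailing_cr(path: str) -> None:
    r"""Same effect as: sed 's/\r$//' file > file.tmp && mv -f file.tmp file"""
    with open(path, "rb") as f:
        data = f.read()
    # In MULTILINE mode, $ matches just before each LF and at the very end.
    fixed = re.sub(rb"\r$", b"", data, flags=re.MULTILINE)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(fixed)
    shutil.copymode(path, tmp_path)  # keep the original file's permissions
    os.replace(tmp_path, path)       # rename over the original, like mv -f


def convert_txt_files(directory: str) -> list:
    """Convert every *.txt file directly inside `directory`; return their names."""
    converted = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".txt") and os.path.isfile(path):
            strip_trailing_cr(path)
            converted.append(name)
    return converted
```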

Eric Worrall

--
You have just received an Etech Solution
For all your Linux requirements contact
ewor...@netcomuk.co.uk

Neal P. Murphy

Sep 15, 2002, 5:21:41 PM

Adams-Blake Co. passionately intoned:

> I'm very, very appreciative of your effort to help, but it does not really
> solve my problem. The problem is that I get files with hard returns at the
> end of each line... and then a line with only a return as a space between
> paragraphs. What I want to do is take out the returns on all the lines to
> form a paragraph but leave a return on a line between paragraphs. In most
> word processor packages they use a "paragraph" marker at the end of each
> paragraph which (among other things) cause a hard return. Thus I need a
> hard return at the end of the paragraph and a hard return on the next line
> which has no text.

I *thought* that's what you were trying to do.

Using vi(), you can use the J command to join lines into a single
line. But this is a manual process, and you likely would rather
have something that is a little more automated.

The following shell script should do what you want. Put it into a file
called, for example, remhard and then 'chmod +x remhard'. Note that
the control characters in the ed command (not in the comments) are not
<caret><letter>. Rather, they are <CNTRL/letter>. If you use vi to
create the file, you would use <CNTRL/V><CNTRL/letter> to enter these
control characters in the script. If you have trouble, I can email the
script to you. This script *does* remove the blank line between paragraphs,
which, IMHO, is the correct thing to do for typesetting. It is easy enough
to put a blank line between paragraphs if that is preferred. It would also
be fairly easy to modify this script to work on a bunch of files, and to
copy the files to a new location before changing them (so the originals
are not touched). You could even modify the script to insert SGML markup.

Fest3er

---------------
#! /bin/sh

# User must specify a file to edit. If not, print usage and quit
if [ $# -eq 0 ]; then
echo "Usage: $0 filename"
exit 1
fi

# Save the filename argument
FILE="$1"

# Now use ed() to modify the file
# There are 8 commands (each starts with %, meaning 'the whole buffer')
# 1. delete all <CR> in the file (using vi, the ^M is entered via ^V^M -
# <CTRL/V><CTRL/M>
# 2. append a space to each line
# 3. replace blanks lines with ^L (form feed, entered in vi using
# <CTRL/V><CTRL/L>)
# 4. join all lines into a single line
# 5. replace every " ^L" with a newline (splitting the paragraphs;
# this command uses two lines!)
# 6. remove all spaces at the end of every line
# 7. replace all multiple spaces with a single space
# 8. write the buffer back to the file and quit

ed "$FILE"<<END
%s/^M//g
%s/$/ /
%s/^ $/^L/
%j
%s/ ^L/\\
/g
%s/ *$//
%s/  */ /g
%wq
END
---------------
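To see what the ed commands do to the sample text, here is a step-by-step Python rendering of them. It is a sketch for illustration only; the real script edits the file in place with ed. It assumes the space-collapsing step is meant to match runs of one or more spaces.

```python
import re


def remhard(text: str) -> str:
    """Python rendering of the remhard ed commands, applied to a string."""
    text = text.replace("\r", "")                    # 1. delete every CR
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                                  # the ed buffer has no empty last line
    lines = [line + " " for line in lines]           # 2. append a space to each line
    lines = ["\f" if line == " " else line           # 3. blank lines become form feeds
             for line in lines]
    joined = "".join(lines)                          # 4. join all lines into one
    lines = joined.split(" \f")                      # 5. split paragraphs at " ^L"
    lines = [line.rstrip(" ") for line in lines]     # 6. remove trailing spaces
    lines = [re.sub(" +", " ", line) for line in lines]  # 7. collapse runs of spaces
    return "\n".join(lines) + "\n"                   # 8. write: one paragraph per line
```

For `"a\r\nb\r\n\r\nc\r\nd\r\n"` this gives `"a b\nc d\n"`: one paragraph per line, with no blank line between them.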

s. keeling

Sep 16, 2002, 2:08:59 AM

On Sat, 14 Sep 2002 13:33:32 -0700, Adams-Blake Co. <aremovet...@adamsremovethis-blake.com>:

> We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
> of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
> found the answer.
>
> We often get pure text docs (MyText.txt) that were saved in Windows and
> have a cr/lf at the end of each line. Ex:
>
> This is first line [cr/lf]
> This is second line [cr/lf]
> [cr/lf]
> This is first line of second paragarph [cr/lf]
> This is second ine of seconc paragaph [cr/lf]

tr is good.

For months, it seems, I've been fighting with something that ought to
be easy. It was, in the end.

I get entries from syslogd reporting portscans. Unfortunately, the
log entries are formatted in various different ways. A cut and paste
from an email that greps messages for "Packet log:" produces:

- lines that are broken into two lines, Unix \n at the end of each.

- lines that are not broken, but are missing critical spacing
between entries.

- lines that are correct.

while( <INFILE> ) {
    chomp( $line = $_ );

    # make up for the various format types of "Packet log"
    # syslog entries. some are correct, some are broken
    # into two discrete lines, some are missing critical
    # spaces.
    #
    if( length( $line ) > 60 && length( $line ) < 117 ) {

        # this is the first line of a two line pair. stick
        # a space onto the end of it.
        #
        $line .= q( );

        # get the second part of this two line pair.
        #
        chomp( $line2 = <INFILE> );

        # stick the two together into one.
        #
        $line .= $line2;

        # ensure critical spacing is correct: insert a space
        # before L= and S=, keeping the character in front.
        #
        $line =~ s/(\w)L=/$1 L=/;
        $line =~ s/(\w)S=/$1 S=/;

    } else {

        # this is one long line (correct) but it may be missing
        # critical spacing. even if this is done to thoroughly
        # correct log entries, perl's split() will do the right
        # thing.
        #
        $line =~ s/L=/ L=/;
        $line =~ s/S=/ S=/;

    }

etc.
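Here is the same pairing logic in Python, for experimenting outside Perl. It is a sketch only. The 60/117 length limits and the `L=`/`S=` fields come from the post and are specific to those syslog entries. The sketch applies the space-inserting substitution (keeping the preceding character) to both kinds of line.

```python
import re


def join_packet_log_lines(lines):
    """Join wrapped log entries: a line of 61..116 characters is treated as
    the first half of a pair and joined to the next line with a space."""
    out = []
    it = iter(lines)
    for line in it:
        line = line.rstrip("\n")
        if 60 < len(line) < 117:
            line += " " + next(it, "").rstrip("\n")
        # Insert a missing space before L= and S=, keeping the character in front.
        line = re.sub(r"(\w)L=", r"\1 L=", line, count=1)
        line = re.sub(r"(\w)S=", r"\1 S=", line, count=1)
        out.append(line)
    return out
```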

--
Any technology distinguishable from magic is insufficiently advanced.
(*) Give up Spammers; I use procmail. www.spots.ab.ca/~keeling
- - http://learn.to/quote (Ger.) http://quote.6x.to (Eng.)

Steve O'Neill

Sep 16, 2002, 3:05:10 PM

On Sat, 14 Sep 2002 22:37:50 -0700, Adams-Blake Co.

...(earlier stuff removed)

>
>I'm very, very appreciative of your effort to help, but it does not really
>solve my problem. The problem is that I get files with hard returns at the
>end of each line... and then a line with only a return as a space between
>paragraphs. What I want to do is take out the returns on all the lines to
>form a paragraph but leave a return on a line between paragraphs. In most
>word processor packages they use a "paragraph" marker at the end of each
>paragraph which (among other things) cause a hard return. Thus I need a
>hard return at the end of the paragraph and a hard return on the next line
>which has no text.
>
>Al
>

I recommend using "flip". This small program will convert line endings
between DOS and UNIX with ease, and, IIRC, will _not_ delete lines that have
only a cr/lf or newline on them. I've been using it for years and it's never
failed me. You can find it at:

ccrma-www.stanford.edu/~craig/utility/flip/
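The job such a converter has to get right is small. CR/LF pairs become LF (or the reverse) and nothing else changes, so lines holding only a line ending survive. Here is a Python sketch of both directions; it is not flip's source, just the same idea.

```python
import re


def to_unix(text: str) -> str:
    """DOS -> Unix: every CR/LF pair becomes a single LF."""
    return text.replace("\r\n", "\n")


def to_dos(text: str) -> str:
    """Unix -> DOS: every LF not already preceded by CR gets one."""
    return re.sub(r"(?<!\r)\n", "\r\n", text)


dos = "para one\r\n\r\npara two\r\n"
assert to_unix(dos) == "para one\n\npara two\n"  # the blank line is kept
assert to_dos(to_unix(dos)) == dos               # the round trip is lossless
```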

SJO

kd6ozk

Sep 16, 2002, 3:37:03 PM

mjt <mjtobler@removethis_consultant.com> wrote in message news:<h8Pg9.2175$Zb2.72...@newssvr12.news.prodigy.com>...

> Chris Gordon-Smith wrote:
>
> > Since KHexEdit is part of KDE its easy to use. Anyone who is currently
> > doing the procedure you describe in MSWord should be able to do the same
> > thing in KHexEdit after a maximum of 10 minutes training.
> >
>
> ... tr is WAY easier ... you can have a file converted before
> KHexEdit is finished loading and all that GUI jazz :))

Apart from all the other replies to this I think, as one who also
receives various files from clients, that I would want to examine the
file before running it through sed or tr. And, since the file is
already open, something like the emacs solution would be better.

Of course, if every file came in with the same formatting then I would
agree that tr or sed are the faster alternatives to any editor, GUI or
CLI.

spi...@freenet.co.uk

Sep 20, 2002, 7:13:01 AM

Adams-Blake Co. <aremovet...@adamsremovethis-blake.com> wrote:
> We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
> of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
> found the answer.

> We often get pure text docs (MyText.txt) that were saved in Windows and
> have a cr/lf at the end of each line. Ex:

> This is first line [cr/lf]
> This is second line [cr/lf]
> [cr/lf]
> This is first line of second paragarph [cr/lf]
> This is second ine of seconc paragaph [cr/lf]

My preferred editor is joe.
To read a file so that the cr/lf line endings don't show up, start joe with
joe -crlf <filename>
To remove the spurious ^M characters instead, start joe normally and do a
search/replace:

^KF
Find (^C to abort):`<ctrl>M
(I)gnore (R)eplace (B)ackwards Bloc(K) NNN (^C to abort):r
Replace with (^C to abort):<enter>
Replace (Y)es (N)o (R)est (B)ackup (^C to abort)? r

Alternatively, use dos2unix or one of the other filter scripts.
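The joe search/replace above amounts to deleting every carriage return, which can also be done non-interactively as a filter. This is a minimal sketch that assumes a POSIX sh and tr. The function name strip_cr is made up for illustration. Like the basic dos2unix conversion, it only fixes line endings. Empty lines stay empty, and no lines are joined into paragraphs.

```shell
#!/bin/sh
# strip_cr (hypothetical helper): read DOS text on stdin and write it with
# Unix line endings by deleting every carriage return (octal 015, ^M).
# Each cr/lf becomes a plain lf, so empty lines survive and nothing is joined.
strip_cr() {
    tr -d '\015'
}
```

Usage: strip_cr < MyText.txt > MyText.unix.txt. Note that this also deletes any stray ^M in the middle of a line, just as replacing every ^M in joe does.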

Martin Wickman

Sep 20, 2002, 12:59:21 PM

In article <trvema...@freenet.co.uk>, spi...@freenet.co.uk wrote:
> Adams-Blake Co. <aremovet...@adamsremovethis-blake.com> wrote:
>> We're new to Linux (Mandrake 8.2) and have Star Office as well as a bunch
>> of editors (emacs, Kate, gedit, etc.) I looked all over Google and have not
>> found the answer.
>
>> We often get pure text docs (MyText.txt) that were saved in Windows and
>> have a cr/lf at the end of each line. Ex:
>
>> This is first line [cr/lf]
>> This is second line [cr/lf]
>> [cr/lf]
>> This is first line of second paragarph [cr/lf]
>> This is second ine of seconc paragaph [cr/lf]
>
> My preferred editor is joe.
> To read a file so that cr/lf don't show up, start joe with
> joe -crlf <filename>
> To remove the spurious characters, start joe normally and do a
> search/replace.

Just a note: Emacs (and probably vi) will handle that
automatically. If it's a DOS file, Emacs will end all lines with
\r\n, including any inserted or changed lines! Ditto for Mac and Unix files.
Transparent most of the time.
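Before deciding how to treat a received file, you can get a rough idea of which line-ending convention it uses by counting carriage returns and line feeds. This is a heuristic sketch, not how Emacs actually detects end-of-line style. The helper name eol_style and its one-word answers are hypothetical.

```shell
#!/bin/sh
# eol_style (hypothetical helper): guess the line-ending convention of a
# file by counting carriage returns (CR, octal 015) and line feeds (LF, 012).
# A rough heuristic only; it is not the algorithm Emacs uses.
eol_style() {
    cr=$(tr -cd '\015' < "$1" | wc -c | tr -d ' ')   # number of CRs
    lf=$(tr -cd '\012' < "$1" | wc -c | tr -d ' ')   # number of LFs
    if [ "$cr" -eq 0 ]; then
        echo unix      # LF only (or no line breaks at all)
    elif [ "$lf" -eq 0 ]; then
        echo mac       # CR only: classic Mac OS
    elif [ "$cr" -eq "$lf" ]; then
        echo dos       # every LF has a matching CR
    else
        echo mixed     # some lines end in cr/lf, others do not
    fi
}
```

Usage: eol_style MyText.txt should print dos for a file saved on Windows, like the ones described at the top of the thread.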

Philipp Pagel

Sep 23, 2002, 11:36:14 AM

Adams-Blake Co. <aremovet...@adamsremovethis-blake.com> wrote:
> We often get pure text docs (MyText.txt) that were saved in Windows and
> have a cr/lf at the end of each line. Ex:

> Maybe there is a utilty in Linux that can do this for me? We're a book
> publishing company so obviously this is pretty important to us. If you have
> a SIMPLE solution or maybe a script, please let us know.

Hi!

This little Perl script will chop off the cr/lf, add a space to the end
of the line, and put the cr/lf back if the line was empty. The extra
space keeps the last word of one line and the first word of the next
from being run together:

-----------------snip-------------------
#!/usr/bin/env perl
while (<>) {
    $/ = "\r\n";
    chomp;
    print "$_ ";
    print $/ if $_ eq '';
}
-----------------snip-------------------

I guess that's what you wanted to do - right?
So just make the script executable:

> chmod u+x whateveryoucalledit

and run it on your files:

> ./whateveryoucalledit < infile.txt > outfile.txt

hope it helps

Philipp

--
Dr. Philipp Pagel Tel. +49-89-3187-3675
Institute for Bioinformatics / MIPS Fax. +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg
Germany

Philipp Pagel

Sep 23, 2002, 12:23:57 PM

It doesn't really matter but I meant to write:

-----------------snip-------------------
#!/usr/bin/env perl
$/ = "\r\n";
while (<>) {
    chomp;
    print "$_ ";
    print $/ if $_ eq '';
}
-----------------snip-------------------

I also read the original posting again and found that you wanted to keep
the empty lines rather than use them only as paragraph delimiters. In
that case things get slightly more complicated:

-----------------snip-------------------
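#!/usr/bin/env perl
$/ = "\r\n";                # read DOS (cr/lf) lines
$f = 0;                     # true while inside a paragraph
while (<>) {
    chomp;                  # strip the cr/lf
    if ($_ eq '') {         # empty line: paragraph break
        print $_, $/;       # end the joined paragraph line (or repeat an empty line)
        print $/ if $f;     # after a paragraph, add the empty separator line
        $f = 0;
    } else {
        print ' ' if $f;    # join with one space, but not at paragraph start
        print;
        $f = 1;
    }
}
print $/ if $f;             # end the last paragraph if no empty line followed it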
-----------------snip-------------------

This one also takes care of the unnecessary spaces that the other script
added at the end of each paragraph.
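For comparison, the same logic can be sketched in POSIX awk. It joins each paragraph's lines with single spaces, keeps the empty lines between paragraphs, and writes cr/lf endings back out. This is an alternative written for illustration, not Philipp's code. The function name unwrap_paragraphs is hypothetical, and the sketch assumes blank lines are truly empty (a line holding only spaces counts as text).

```shell
#!/bin/sh
# unwrap_paragraphs (hypothetical helper): join the lines of each paragraph
# on stdin with single spaces, keep empty lines as paragraph separators,
# and write DOS (cr/lf) line endings. Input may use cr/lf or plain lf.
unwrap_paragraphs() {
    awk '
        { sub(/\r$/, "") }                  # strip a trailing CR, if any
        $0 == "" {                          # empty line: paragraph break
            if (inpara) printf "\r\n"       # end the joined paragraph line
            printf "\r\n"                   # keep the empty separator line
            inpara = 0
            next
        }
        {
            if (inpara) printf " "          # join lines with a single space
            printf "%s", $0
            inpara = 1
        }
        END { if (inpara) printf "\r\n" }   # terminate the last paragraph
    '
}
```

Usage: unwrap_paragraphs < MyText.txt > MyText.out.txt. To get Unix line endings instead, change each "\r\n" in the awk printf calls to "\n".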

cu

James Brost

Sep 23, 2002, 4:49:07 PM

unix2dos-2.2 does this easily.

dos2unix MyText.txt
unix2dos MyText.txt

Philipp Pagel

Sep 24, 2002, 3:04:05 AM

James Brost <kb2...@hotmail.com> wrote:
> unix2dos-2.2 does this easily.

> dos2unix MyText.txt
> unix2dos MyText.txt

Not quite. He wants to eliminate the \r\n within paragraphs but keep
them at the end of each paragraph and on any empty lines. dos2unix only
changes \r\n to \n (and converts other non-ASCII characters appropriately).

cu