On 11.05.2013 07:26, jack...@gmail.com wrote:
> I have to convert a slew of publication lists to html and because this would
> be quite tedious, I decided to write a short script to do the work for me.
>
> The simple script below reads each line of the file. If the line has text,
> it just prints the line. If the line is blank, it prints the closing "</div>"
> and the opening "<div>" with an in-line style set for the time-being. When it
> encounters a 2nd consecutive blank line, it just prints the blank line using
> the flags.
For such tasks (reading all lines consecutively, matching patterns, inserting
or replacing text) it is advantageous to switch to a more appropriate tool
(like awk, perl, ...); it will make the task a lot easier. For example:
awk '
NF { bl=0; print; next }
!bl { bl++; print "</div>\n<div style=\"...\">"; next }
{ print "" }
' publications.txt
(This is untested code.) Mind that on WinDOS you may have quoting issues;
in that case put the awk program (everything between the single quotes) in
a file and call that file with awk -f awkfile .
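A quick way to try the program-file variant is sketched below. The file
name div.awk and the two sample entries are made up for illustration; the
style attribute is left as "..." exactly as above.

```shell
# div.awk is a hypothetical file name holding the awk program from above.
cat > div.awk <<'EOF'
NF  { bl = 0; print; next }                              # text line: print it, reset the flag
!bl { bl++; print "</div>\n<div style=\"...\">"; next }  # first blank line: close and reopen the div
    { print "" }                                         # further blank lines: pass them through
EOF

# Made-up sample input: two entries separated by two blank lines.
printf '%s\n' 'Entry one' '' '' 'Entry two' | awk -f div.awk
# prints:
#   Entry one
#   </div>
#   <div style="...">
#   (an empty line)
#   Entry two
```

The first blank line produces the closing and opening tags, the second one
is passed through as an empty line.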
Some more notes on your shell code...
>
>
> while IFS="
> " read -r line
> do
> if [ "$( echo "$line" |grep \"[0-z]\" )" = "" ]
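You can use a command pipeline with grep directly after 'if' and avoid
the test expression and the string comparison. To select the non-matching
lines use grep -v, and newer shells allow negation of the whole command
after 'if' with '!'. The range expression [0-z] is not safe: what it covers
depends on the locale's collation order, and in ASCII it also includes
punctuation such as ':', '@' and '_'. Better use character classes like
[[:alnum:]].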
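A minimal sketch of that idea, assuming a POSIX shell and grep. The helper
name is_blank and the sample strings are hypothetical, and [[:alnum:]] is
my guess at what [0-z] was meant to cover; use [^[:space:]] if lines that
contain only punctuation should count as text.

```shell
# is_blank is a hypothetical helper name, not part of the original script.
is_blank() {
    # grep -q prints nothing and only sets the exit status;
    # '!' inverts it, so the function succeeds if no alphanumeric is found.
    ! printf '%s\n' "$1" | grep -q '[[:alnum:]]'
}

# Made-up sample lines: one with text, one empty, one with spaces only.
for sample in 'Smith, J. (2012). Some title.' '' '   '
do
    if is_blank "$sample"
    then
        printf '%s\n' 'blank'
    else
        printf '%s\n' 'text'
    fi
done
# prints: text, blank, blank (one per line)
```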
> then
> if [ "$flag" = "false" ]
> then
> printf "%s\n" "</div>"
> printf "%s\n" '<div style="margin-bottom: 0.75em">'
> flag="true"
> else
> printf "%s\n" ""
> fi
> else
> printf "%s\n" "$line"
> flag="false"
> fi
> done < publications.doc
>
>
> However, since this is often a Microsoft Word document, I also encounter
> Microsoft's non-ascii characters such as their:
>
> beginning quote mark ( “ ) ending quote mark ( ” ) hyphen ( – )
Note that such characters *may* be multi-byte encodings (not sure
what Word uses here). I'd export from Word in a portable standard
format, or convert the output with iconv before further processing.
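The iconv step could look like the sketch below. It assumes the exported
text is in Windows-1252 (please verify that for your files), and that your
iconv knows the codeset name WINDOWS-1252; the accepted names differ between
implementations, iconv -l usually lists them. The sample file names are
placeholders.

```shell
# Assumption: input bytes are Windows-1252 (CP1252).
# Create a small sample containing the bytes 0x93, 0x94 and 0x96
# (octal 223, 224, 226): the two curly quotes and the dash.
printf '\223quoted\224 pages 10\22620\n' > sample-cp1252.txt

# Convert to UTF-8 so that later tools see well-defined multi-byte characters.
iconv -f WINDOWS-1252 -t UTF-8 sample-cp1252.txt > sample-utf8.txt
```

After the conversion each of the three characters is a three-byte UTF-8
sequence, which is what the later processing has to match.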
>
>
> Is there any way to test for these characters as I read each line? I just
> want to replace them with standard ascii quote marks and dashes.
How such characters are handled depends on the tools you use and on
whether they support multi-byte character encodings. For character
replacements I use the tr command, for string replacements I use
the sed command. If you want the substitution embedded in the above
awk program you'd use the awk function gsub(), for example as in
{ gsub(/–/,"-",$0); gsub(/[“”]/,"\"",$0) }
Use the GNU gawk version of awk to be on the safe side.
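If you'd rather not depend on multi-byte support at all, a byte-wise sed
variant is possible. The sketch below assumes the input is UTF-8 and that
the three characters are U+201C, U+201D and U+2013; the helper name
to_ascii and the sample line are hypothetical.

```shell
# Assumption: input is UTF-8. Build the characters from their UTF-8 bytes
# so that the script itself stays pure ASCII.
ldquo=$(printf '\342\200\234')   # U+201C left double quotation mark
rdquo=$(printf '\342\200\235')   # U+201D right double quotation mark
ndash=$(printf '\342\200\223')   # U+2013 en dash

# to_ascii is a hypothetical helper name; it filters stdin to stdout.
to_ascii() {
    # LC_ALL=C makes sed treat its input as plain bytes; each character is
    # replaced as a fixed byte string, so no bracket expression is needed.
    LC_ALL=C sed -e "s/$ldquo/\"/g" -e "s/$rdquo/\"/g" -e "s/$ndash/-/g"
}

# Made-up sample line.
printf '%s\n' "${ldquo}A title${rdquo}, pp. 10${ndash}20" | to_ascii
# prints: "A title", pp. 10-20
```

Matching each character as a literal string instead of inside [...] avoids
the case where a tool without multi-byte support would split the bracket
contents into single bytes.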
Janis
>
> Thanks.
>