I run cmd.exe on Windows Vista Business SP2.
I have discovered that the following works:
...\gawk\examples>awk "{ print """Hello, world""" }" test
Hello, world
...\gawk\examples>
Why is it necessary to use triple quotes to get an embedded quote?
Let me put my question into context. I have a text-processing job which
is probably beyond the reasonable capability of sed. I think it can be
handled by awk. I have forgotten both awk and perl and need to refresh
my knowledge. I am relearning awk with the book "sed & awk", ISBN 1565922255.
The first awk example therein is
$ echo 'this line of data is ignored' > test
$ awk '{ print "Hello, world" }' test
Hello, world
That will work happily in many UNIX shells. However, and unsurprisingly,
in my environment it results in:
...\gawk\examples>echo 'this line of data is ignored' > test
...\gawk\examples>awk '{ print "Hello, world" }' test
awk: '{
awk: ^ invalid char ''' in expression
...\gawk\examples>
[I am using a Windows port of gawk 3.1.6 for awk.]
I messed around with quoting after this failure:
...\gawk\examples>awk "{ print ""Hello, world"" }" test
awk: { print "Hello,
awk: ^ unterminated string
...\gawk\examples>
I am happy to have discovered
...\gawk\examples>awk "{ print """Hello, world""" }" test
Hello, world
...\gawk\examples>
I had found nothing useful when I googled for a solution.
Now that I know to search for triple quotes, I suspect my finding is not original.
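Two other ways of getting the inner quotes through to awk are sketched below. This is only a sketch under assumptions: the first form relies on backslash-escaped quotes, which a POSIX shell accepts and which a Windows gawk built on Microsoft's C runtime is assumed to accept as well; the second keeps the program in a file, here given the made-up name hello.awk, so that no command interpreter parses its quotes. Under cmd.exe the file would be created with an editor rather than with printf.

```shell
# Sample input, as in the book; awk only needs one line to trigger the action.
echo 'this line of data is ignored' > test

# Backslash-escaped inner quotes: a POSIX shell turns \" into a literal ",
# and (assumption) a Windows gawk using Microsoft's argument parsing does too.
awk "{ print \"Hello, world\" }" test

# Keep the program in a file (hello.awk is a made-up name) so that no
# command interpreter has to parse its quotes at all.
printf '%s\n' '{ print "Hello, world" }' > hello.awk
awk -f hello.awk test
```

Both invocations should print the same line as the book's example.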
Off topic stuff starts here:
Recently Transport for London (TfL) responded to a Freedom of
Information Act request for London Underground working timetables with a
set of .pdf files accessible from
<http://www.whatdotheyknow.com/request/wtts_for_london_underground_line#incoming-257325>
I find .pdf files inflexible and intend to convert them to text.
I have started on
<http://www.whatdotheyknow.com/request/104512/response/257325/attach/8/W%20C5.pdf>,
which is the simplest file. The text output
by PDF-rendering software is garbled. This is mainly due to the way in
which times are presented. Timings are precise to the quarter minute,
so characters for one quarter, one half, and three quarters are
needed. Rather than use the Unicode characters
U+00BC Vulgar Fraction One Quarter
U+00BD Vulgar Fraction One Half
U+00BE Vulgar Fraction Three Quarters
in the timetables, 3 small glyphs are used for each fraction.
[charmap is handy for looking at unusual characters.]
e.g. three quarters is represented as
3
-
4
Worse than that, when Foxit - the least bad software I found for
converting PDF files to text - converts the PDF, it shuffles the
text. For example, if my readers will allow me to represent those 3
characters by Q, H and T, then
06 25Q 06 30 06 35H 06 40T
might be converted to text as
1 3
06 25 06 30 06 351 06 404
4 2
and I can easily lose the fractions in sed, as I don't need that level
of precision, with a regular expression such as
\([0-2][0-9] *[0-5][0-9]\)\([1-4]\)
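As a minimal sketch of how that expression might be applied, assuming it is used in a global sed substitution that keeps the first group and drops the second (the variable names are only for illustration):

```shell
# Sample line as shuffled by the converter: fraction digits fused onto the minutes.
line='06 25 06 30 06 351 06 404'

# Keep group 1 (hour and minute), drop group 2 (the stray fraction digit).
cleaned=$(printf '%s\n' "$line" | sed 's/\([0-2][0-9] *[0-5][0-9]\)\([1-4]\)/\1/g')
printf '%s\n' "$cleaned"
```

On the sample line this leaves the four times without their trailing fraction digits.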
However, it might also be converted as
1 06 30 3
06 25 06 351 06 404
4 2
While I can easily get to
      06 30
06 25       06 35 06 40
with sed, going from there to
06 25 06 30 06 35 06 40
is beyond my knowledge of sed. I am happy to work out how to do it with
a more powerful tool and awk seems suitable. ;)
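One possible awk approach, sketched under loud assumptions: the converter's text output keeps column alignment, so the stray time sits directly above the gap it belongs in, and the lines arrive in pairs with the stray line first. The file names merge.awk and sample.txt and the function name overlay are made up for the example.

```shell
# merge.awk (made-up name): overlay a stray line onto the line below it.
cat > merge.awk <<'EOF'
# Overlay two lines column by column: wherever the lower line has a blank
# (or has already ended), take the character from the upper line instead.
function overlay(upper, lower,    i, n, out, c) {
    n = (length(upper) > length(lower)) ? length(upper) : length(lower)
    out = ""
    for (i = 1; i <= n; i++) {
        c = substr(lower, i, 1)
        if (c == "" || c == " ")
            c = substr(upper, i, 1)
        if (c == "")
            c = " "              # past the end of both lines' text here
        out = out c
    }
    return out
}
# Assumption: input arrives in pairs, stray line first, main line second.
NR % 2 == 1 { upper = $0; next }
{
    merged = overlay(upper, $0)
    gsub(/  +/, " ", merged)     # squeeze any runs of blanks left by padding
    print merged
}
EOF

# Column-aligned sample: the stray "06 30" sits above the gap in the main line.
printf '%s\n' '      06 30' '06 25       06 35 06 40' > sample.txt
merged=$(awk -f merge.awk sample.txt)
printf '%s\n' "$merged"
```

If the converter does not preserve alignment, the overlay has nothing to go on and some other rule, such as sorting the times, would be needed instead.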
--
Walter Briscoe