Given a text file where lengths are given by a floating point
number followed by the letters "inch", divide the floating
point number by 2.54 and put in the resulting number followed
by "cm".
Here is the Perl solution; it is a one-liner.
#!/usr/local/bin/perl -p
s|(\d+\.?(\d+)?)\s*inch| sprintf("%0.2f cm", $1 / 2.54); |eg;
--
Dov Grobgeld
The Weizmann Institute of Science, Israel
"Where the tree of wisdom carries oranges"
Win Treese, Cambridge Research Lab, Digital Equipment Corp.
tre...@crl.dec.com
Um.. this problem feels a bit contrived. [That is, chosen to fit a
language feature rather than chosen to solve an applicative problem.]
Given a text file where lengths are given by a floating point
number followed by the letters "inch", divide the floating point
number by 2.54 and put in the resulting number followed by
"cm".
See, if you really had some information you wanted to do this
with, you'd probably either want to do it by hand, or change the
routine which was generating the information. And, uh... you need to
_multiply_ a measurement in inches by 2.54 to get centimeters.
Also, measurements in inches are often expressed in forms other than
floating point.
Anyways, the best I've been able to come up with isn't what I'd call
elegant. I can build a heuristic which picks out all occurrences of
floating point numbers and all occurrences of the word 'inch', and I
can have conversion rules which do the above transformations (find
text representing division by 2.54, and replace 'inch' with 'cm')
where appropriate (uses backtracking where conversion fails). Not
only is this not elegant, it's not fast.
--
Raul Deluth Miller-Rockwell <rock...@socrates.umd.edu>
Still trying to figure out a good textual search/replace primitive for J.
:#!/bin/awk
:awk -Fi '{printf("%0.2f cm\n", $1/2.54)}'
Sorry, those aren't the same. All the awk program does is take the first
field and output the conversion. The perl one finds the inches anywhere in the
line, and converts it, outputting the entire record with just #.## inches
converted to cm. Under perl, this input record
I have 20 inch piece of wire to make a 2.3 inch spindle.
produces this output record:
I have 7.87 cm piece of wire to make a 0.91 cm spindle.
The awk program doesn't.
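A minimal Perl sketch of the whole-line behaviour Tom describes, with the conversion corrected to multiply by 2.54 as pointed out elsewhere in the thread (the subroutine name `inches_to_cm` is ours, not from any post). It keeps the one-liner's other limitations; for example "inches" still leaves a stray "es".

```perl
use strict;
use warnings;

# Convert every "<number> inch" in a line to centimetres, leaving the rest
# of the line untouched. Unlike the posted one-liner, this multiplies by 2.54.
sub inches_to_cm {
    my ($line) = @_;
    $line =~ s/(\d+(?:\.\d*)?)\s*inch/sprintf("%0.2f cm", $1 * 2.54)/eg;
    return $line;
}

print inches_to_cm("I have 20 inch piece of wire to make a 2.3 inch spindle."), "\n";
# prints: I have 50.80 cm piece of wire to make a 5.84 cm spindle.
```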
And what's the -i option to awk? I find it not in awk, nawk,
or gawk, and don't you mean `#!/bin/awk -f'?
--tom
------------------------------------
Roger Hui, Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
Phone: (416) 925 6096; Fax: (416) 488 7559
>converted in cm. Under perl, this input record
> I have 20 inch piece of wire to make a 2.3 inch spindle.
>produces this output record:
> I have 7.87 cm piece of wire to make a 0.91 cm spindle.
Then it's a pretty silly program, since 20 inches is 50.8cm...
Mike Percy | gri...@hubcap.clemson.edu | I don't know about
Sr. Systems Analyst | msp...@clemson.clemson.edu | your brain, but mine
Info. Sys. Development | msp...@clemson.BITNET | is really...bossy.
Clemson University | (803) 656-3780 | (Laurie Anderson)
Pedantic? Perhaps. But challenges should be stated fairly precisely.
Hm, make that $x*2.54, not $x/2.54 then.
--tom
Richard Harter:
Is the perl solution correct? That is, does it preserve the exact
white space sequence between the number and the word?
Well, I think there are more important issues to address here, but to
answer your question:
$ cat >test && chmod +x test
#!/usr/local/bin/perl -p
s|(\d+\.?(\d+)?)\s*inch| sprintf("%0.2f cm", $1 / 2.54); |eg;
$ echo '5 inches' | ./test
1.97 cmes
"There it was, gone completely..."
: I have my
: reservations about the "elegance" of these solutions, one liners or
: not. A quick (and probably not accurate) count of characters in the
: perl solution gives me 29 punctuation chars, 20 alphabetic characters,
: 6 digits, and 5 white space characters. The J solution is even worse.
Be glad that Perl doesn't require backslashes on parens--it would have been
worse. Actually, the Perl solution is overly complex. Note that (\d+)?
is exactly equivalent to \d*.
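Larry's equivalence is easy to check mechanically. A small sketch (the helper `agree` is hypothetical) that anchors both patterns and compares them on a few strings; the only real difference is that `(\d+)?` also creates a capture group (`$2` in the one-liner) which is left undefined when it matches nothing.

```perl
use strict;
use warnings;

# True when the two patterns Larry compares agree on $s.
# Both are anchored so that the whole string has to match.
sub agree {
    my ($s) = @_;
    my $with_group = ($s =~ /^(\d+)?$/) ? 1 : 0;
    my $with_star  = ($s =~ /^\d*$/)    ? 1 : 0;
    return $with_group == $with_star;
}

for my $s ('', '7', '42', '3.', 'x', '4a') {
    printf "%-4s %s\n", "'$s'", agree($s) ? "agree" : "DIFFER";
}
```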
: I expect it's all a matter of taste -- some people like punctuation
: rich code.
It's certainly possible to do it more verbosely in Perl, if you want...
: Is the perl solution correct? That is, does it preserve the exact
: white space sequence between the number and the word? Perl aficionados
: will tell us, I am sure.
Easy enough. This also handles plurals.
s/(\d+\.?\d*)(\s*)inch(es)?/sprintf("%0.2f$2cm", $1 * 2.54)/eg;
: One could also quibble about the use of a
: fixed format for the converted size. In addition one could quibble
: about not handling exponential notation, but that is easily taken care
: of.
One could recognize exponential notation with
s/(\d+\.?\d*([eE][+-]?\d+)?)(\s*)inch(es)?/sprintf("%0.2f$3cm", $1 * 2.54)/eg;
As for custom output, just call a subroutine on the replacement side:
s/(\d+\.?\d*([eE][+-]?\d+)?)(\s*)inch(es)?/&myformat($1,$3)/eg;
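Larry doesn't define `myformat`. One hypothetical version, sketched here with the number pattern written as `\d+\.?\d*` and the whitespace passed through as `$3`: it keeps the input's decimal places (at least two) and switches to `%e` output when the input used exponential notation.

```perl
use strict;
use warnings;

# Hypothetical formatter for the replacement side. Keeps the input's number
# of decimal places (at least 2), uses %e when the input was exponential,
# and puts the original whitespace back between the number and "cm".
sub myformat {
    my ($num, $white) = @_;                 # copy before any other regex runs
    my ($frac) = $num =~ /\.(\d+)/;
    my $places = (defined $frac && length($frac) > 2) ? length($frac) : 2;
    my $conv   = $num =~ /[eE]/ ? "%.${places}e" : "%.${places}f";
    return sprintf($conv, $num * 2.54) . $white . "cm";
}

sub convert {
    my ($text) = @_;
    # $1 = number, $2 = exponent, $3 = whitespace, $4 = plural "es"
    $text =~ s/(\d+\.?\d*([eE][+-]?\d+)?)(\s*)inch(es)?/myformat($1, $3)/eg;
    return $text;
}

print convert("a 3 inch bolt, a 3.12345\tinches rod"), "\n";
```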
: This seems like a reasonable problem to me; I wouldn't call it contrived,
: so much as specialized. If we look at it more generally, we can state
: it like this:
:
: Find all instances of three successive strings matching three specified
: patterns. For each instance replace the first by an operation on the string,
: leave the second alone, and replace the third by another string.
In general
s/($foo)($bar)$abc/&somefunc($1) . $2 . $xyz/eg;
but see below.
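Larry's general form can be packaged as a subroutine that takes the three patterns as `qr//` objects. This is a hypothetical sketch (`replace_triples` and its argument order are our invention); the patterns passed in use `(?:...)` so they add no capture groups of their own, which would otherwise shift `$2`, the problem described below.

```perl
use strict;
use warnings;

# Generic s/($foo)($bar)$abc/somefunc($1) . $2 . $xyz/eg: the first match
# is rewritten by a callback, the second is kept, the third becomes $xyz.
sub replace_triples {
    my ($text, $foo, $bar, $abc, $somefunc, $xyz) = @_;
    $text =~ s/($foo)($bar)$abc/$somefunc->($1) . $2 . $xyz/eg;
    return $text;
}

my $out = replace_triples(
    "a 2 inch gap and a 10  inch shelf",
    qr/\d+(?:\.\d*)?/,            # first: the number
    qr/\s*/,                      # second: whitespace, kept as-is
    qr/inch(?:es)?\b/,            # third: the unit, replaced
    sub { sprintf("%.2f", $_[0] * 2.54) },
    "cm",
);
print "$out\n";   # a 5.08 cm gap and a 25.40  cm shelf
```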
: The general language question, then, is can the language provide a
: good mechanism for finding a sequence of patterns, can it identify the
: patterns in question, and can it do the replacement of the patterns in
: situ.
The main problem with Perl in this regard is the lack of differentiation
between parentheses which produce substrings, and the parentheses which
are used for grouping. Note that we used $2 in one case, and $3 in another.
(Perl is not the only regexp implementation that has this problem, of course.)
As a workaround, we can use $+, which returns the last substring, but then
we'd have to write inche?s? instead of inch(es)?. Eventually I hope to
separate the two uses of paren in Perl.
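For the record, Perl 5 did later add non-capturing groups, `(?:...)`, which solve exactly this numbering problem. A sketch of the exponent-aware substitution written with them (the subroutine name `to_cm` is ours), so the whitespace is always `$2` however many optional pieces the pattern grows:

```perl
use strict;
use warnings;

# Only two groups capture: the number ($1) and the whitespace ($2).
# The exponent and the plural suffix use (?:...), so adding or removing
# them never renumbers the captures.
sub to_cm {
    my ($text) = @_;
    $text =~ s/(\d+\.?\d*(?:[eE][+-]?\d+)?)(\s*)inch(?:es)?\b/sprintf("%0.2f%scm", $1 * 2.54, $2)/eg;
    return $text;
}

print to_cm("5 inches, 2.5e1 inch"), "\n";
```

Passing the whitespace through `%s` rather than interpolating it into the format string also keeps the format string fixed.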
Larry Wall
lw...@netlabs.com
Roger Hui:
: > A solution in J:
: > ;@(<@(,&(' cm',10{a.))@(%&2.54&.".)@(_4&}.);._2)
Richard Harter:
: I don't know J but there seems to be an inch missing.
Well, you didn't want 'inch' in the result, did you?
Seriously, Roger Hui's code would solve an instance of the problem
proposed by Dov Grobgeld. The details of this were documented in
Roger's post. Specifically, it would convert text in the form:
1 inch
2 inch
3.5 inch
4 inch
1.006e_3 inch
0 inch
It would not, however, deal with a more free-form English text:
Page 67 inchoates chapter 5, on how the B-2 inches along. The B-2,
measuring 3 ft, 8 inches from nose to tail, is one of the least
successful examples of grey plywood ever seen by man. Most plywood
is cut in sections 72 by 72 inches across.
[Note to those people who don't read perl: No, the perl programs
wouldn't yield the "intuitively correct" result for this sample,
either. What's significant is that this is more-or-less within the
original specification -- though it's improbable that Dov Grobgeld had
this sort of construct in mind.]
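To see Raul's point concretely, here is a sketch that lists which numbers a pattern with a word boundary after "inch(es)" would convert in his sample. The boundary keeps "inchoates" out, but "the B-2 inches along" still matches, which is within the letter of the original specification and plainly wrong:

```perl
use strict;
use warnings;

my $text = "Page 67 inchoates chapter 5, on how the B-2 inches along. The B-2, "
         . "measuring 3 ft, 8 inches from nose to tail, is one of the least "
         . "successful examples of grey plywood ever seen by man. Most plywood "
         . "is cut in sections 72 by 72 inches across.";

# In list context, //g returns every $1 the pattern captures.
my @hits = $text =~ /(\d+(?:\.\d*)?)\s*inch(?:es)?\b/g;
print "converted numbers: @hits\n";   # converted numbers: 2 8 72
```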
Anyone who understands enough of perl to read the original perl
fragment would have a pretty good idea of what that program would do.
And, if it weren't for the fact that the code (and the specification)
did not implement a valid conversion from inch -> cm, you might be
tempted to think that Dov's code was designed to solve a real
problem, of that form. [Unfortunately, my experience proofreading
leads me to think of different sorts of examples.]
On the other hand, Dov has raised a rather interesting problem
(generalized search and replace involving computation). Still, an
attempt to solve such a general problem, without first establishing
what that general problem _is_, seems rather futile.
%{
#include <stdlib.h>   /* declares atof() */
%}
d [0123456789]
s [ ]
%%
{d}+\.?({d}+)?{s}*inch { printf("%0.2f cm", atof(yytext) * 2.54); }
. ECHO;
--
-- Peter da Silva, Ferranti International Controls Corporation
-- Sugar Land, TX 77487-5012; +1 713 274 5180
-- "Have you hugged your wolf today?"
> Richard Harter:
> Is the perl solution correct? That is, does it preserve the exact
> white space sequence between the number and the word?
> Well, I think there are more important issues to address here, but to
> answer your question:
> $ cat >test && chmod +x test
> #!/usr/local/bin/perl -p
> s|(\d+\.?(\d+)?)\s*inch| sprintf("%0.2f cm", $1 / 2.54); |eg;
> $ echo '5 inches' | ./test
> 1.97 cmes
Amusing. So much for clever one-liners. :-)
Perhaps if we restate the problem a little bit more generally as:
Given a text document in which units are given in inches. Locate
all instances of the pattern NUMBER-WHITESPACE-UNITS where number
may be integer, floating point, or in exponential format, and UNITS
are English length units, and convert to metric units while preserving
format and grammatical correctness.
Perhaps it isn't quite as susceptible to clever but faulty one-liners.
On the other hand it seems like a reasonable benchmark problem for
grading the effectiveness of a language for text processing.
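The restated problem suggests a table-driven approach. A minimal sketch, assuming an illustrative unit table (the entries, factors and the name `metricate` are ours, not from the thread): the English units live in a hash, the alternation is built from its keys, and the whitespace after the number is preserved. It does nothing about grammar.

```perl
use strict;
use warnings;

# Illustrative table: English unit => [factor to metric, metric unit].
my %units = (
    inch  => [2.54,     'cm'], inches => [2.54,     'cm'],
    foot  => [0.3048,   'm'],  feet   => [0.3048,   'm'],
    yard  => [0.9144,   'm'],  yards  => [0.9144,   'm'],
    mile  => [1.609344, 'km'], miles  => [1.609344, 'km'],
);

# Longest names first, so "inches" is tried before "inch".
my $unit_re = join '|', map { quotemeta } sort { length($b) <=> length($a) } keys %units;

sub metricate {
    my ($text) = @_;
    $text =~ s{(\d+\.?\d*(?:[eE][+-]?\d+)?)(\s*)($unit_re)\b}{
        my ($factor, $metric) = @{ $units{$3} };
        sprintf("%.2f", $1 * $factor) . $2 . $metric;
    }eg;
    return $text;
}

print metricate("a 3 foot board, 2 miles away, 6\ninches long"), "\n";
```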
--
Richard Harter: SMDS Inc. Net address: r...@smds.com Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742. Fax: 508-369-8272
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
One could still do this in Perl with
s/(\d+\.?\d*([eE][+-]?\d+)?)(\s*)($units)/&myformat($1,$3,$4)/eg;
if one defined $units to be "inches|inch|feet|furlongs" and so on. But
I'd probably do it another way, splitting up the paragraph as if the
numbers were the delimiters:
#!/usr/bin/perl
$/ = "";                        # paragraph mode.
while (<>) {
    @chunks = split(/(\d+\.?(\d*)([eE][+-]?\d+)?)(\s*)/);
    print shift @chunks;        # initial text chunk
  SPOT:
    while (@chunks) {
        ($num,$frac,$exp,$white,$_) = splice(@chunks, 0, 5);
      CASE:
        {
            # Note that the following are optimized into a switch internally.
            s/^inch(es)?\b/cm/  && ($num *= 2.54, last CASE);
            s/^in\./cm/         && ($num *= 2.54, last CASE);
            s/^feet\b/m/        && ($num *= .3048, last CASE);
            s/^ft\.?/m/         && ($num *= .3048, last CASE);
            s/^quarts?\b/cc/    && ($num *= 946.3529, last CASE); # :-)
            # and so on...
            print $num, $white, $_;
            next SPOT;
        }
        # a simpleminded format selector
        $precision = length($frac) || 2;
        if ($exp ne '') {
            $num = sprintf("%0.${precision}e", $num);
        }
        else {
            # fall back to exponential if fixed point looks badly scaled
            $fixed = sprintf("%0.${precision}f", $num);
            $num = ($fixed =~ /0000\.00|^0\.0/)
                 ? sprintf("%0.${precision}e", $num) : $fixed;
        }
        print $num, $white, $_;
    }
}
With an input of
A 3 inch B 3.12345 in. C 6.02e-23 feet D 4 quarts.
this spits out
A 7.62 cm B 7.93356 cm C 1.83e-23 m D 3785.41 cc.
Each recognized unit selects its preferred conversion unit, so you can
change that cc to ml or even, horrors, lit(er|re)s.
As to grammar and punctuation, arbitrary modifications can be done to
the next chunk of text depending on which particular kind of unit is
found. If it's desirable to modify the previous chunk of text, we can
treasure it up in a variable rather than printing it directly as we do
here. Note that we can also handle special cases like "20 by 30 feet"
or "2x4" fairly easily by recognizing "by" or "x" as a special kind of
unit.
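Larry doesn't show the "by" case. A hypothetical sketch (`convert_dimensions` is our name) that recognizes "N by M inches" and "NxM inches" as a single unit expression, so both numbers are scaled and the connective and spacing are kept as typed:

```perl
use strict;
use warnings;

my $num = qr/\d+\.?\d*/;

# "N by M inches" or "NxM inches": scale both numbers, keep the connective.
sub convert_dimensions {
    my ($text) = @_;
    $text =~ s{($num)(\s*(?:by|x)\s*)($num)(\s*)inch(?:es)?\b}{
        sprintf("%.2f", $1 * 2.54) . $2 . sprintf("%.2f", $3 * 2.54) . $4 . "cm"
    }eg;
    return $text;
}

print convert_dimensions("sections 72 by 72 inches across"), "\n";
```

A fuller version would run this before the single-number rule, so that the second number is not converted on its own first.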
It would also be pretty easy to use the "units" program as a back end
to determine the proper scaling factors, or even which kind of metric
unit is conformable to the word found after the number, if any. (You'd
probably want to include additional words if the next one was, say, "per".)
Larry Wall
lw...@netlabs.com
[re: the following line of Perl written by ??]
s|(\d+\.?(\d+)?)\s*inch| sprintf("%0.2f cm", $1 / 2.54); |eg;
Is the perl solution correct? That is, does it preserve the exact
white space sequence between the number and the word? Perl aficionados
will tell us, I am sure. One could also quibble about the use of a
fixed format for the converted size. In addition one could quibble
about not handling exponential notation, but that is easily taken care
of.
Here's a Haskell/Gofer version that I think does everything stated
above (i.e. preserves whitespace, handles exponentials), it even copes
with lines like :-
An artificial example just created to split the length 2.37
inches over two lines.
>import Data.Char (isDigit, isSpace)
>import Numeric (readFloat, showFloat)
>
>bar :: String -> String
>bar [] = []
>bar l@(c:cs) =
>  if isDigit c && (trailingWord == "inch" || trailingWord == "inches")
>    then cmsAsString ++ separator ++ "cm" ++ bar next
>    else c : bar cs
>  where
>    [(inches, afterNum)] = readFloat l :: [(Double, String)]
>    (separator, afterSeparator) = span isSpace afterNum
>    cmsAsString = showFloat (inches * 2.54) ""
>    -- the next word (empty at end of input, where head . words would fail)
>    (trailingWord, next) = span (not . isSpace) afterSeparator
>
>main :: IO ()
>main = interact bar
This is far from a perfect solution, as although it answers the above
question, it wouldn't cope with the free form examples given by Raul
Rockwell. However, as Raul has pointed out, the question doesn't
contain enough detail to create a proper solution in these cases. I
also agree with Raul's point that the input format is rather poor; it
would be better to try and change the way the "XXX inches" were
produced in the first place.
Anyone care to provide a formal definition of the problem e.g. some
VDM or Z so we can all agree on the problem we're trying to solve :-)
Stephen J. Bevan be...@cs.man.ac.uk
BTW this whole message is a literate Haskell/Gofer script.
>Is the perl solution correct?
As we've seen, no it isn't. But there are other problems. For
example, do we really want "about 7 inches" to turn into "about 17.78 cms"?
Do these various solutions allow the number and the word to be split over a
line break? Etc. Any program that solves the more general problems in a
reasonable way is going to be a lot more complicated than "one line". My
suspicion is that a realistic problem with a one-line solution must have
enough structure in it that many of the available tools/languages would
be suitable.
[... problem generalises to ...]
>Find all instances of three successive strings matching three specified
>patterns. For each instance replace the first by an operation on the string,
>leave the second alone, and replace the third by another string.
Isn't this exactly the sort of thing that a simple compiler is
good for? For "simple" operations, Lex [or a similar lexical scanner]
could do this easily; for messier cases, building a small compiler
with Lex and Yacc [or whatever] is surely the way to go. That way,
the "inch[es]" -> "cm[s]" (and presumably you also want "foot|feet|yard[s]"
-> "metres", "mile[s]" -> "km[s]", "acre[s]" -> "ha", etc), the white
space problem, and the generalised number format are all easily described
and transformed.
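The lexer idea can also be tried inside Perl itself: instead of one big substitution, scan the text token by token with `\G` and the `/gc` flags, trying rules in order (Lex itself prefers the longest match; this sketch simply takes the first rule that matches). The unit table and the name `scan` are illustrative assumptions:

```perl
use strict;
use warnings;

# Illustrative unit table (not from the thread).
my %to_metric = (
    inch => [2.54,   'cm'], inches => [2.54,   'cm'],
    foot => [0.3048, 'm'],  feet   => [0.3048, 'm'],
);

# Lex-style scanner: \G anchors each match where the previous one ended;
# /c keeps pos() in place when a rule fails, so the next rule can try.
sub scan {
    my ($in) = @_;
    my $out = '';
    pos($in) = 0;
    while (pos($in) < length($in)) {
        if ($in =~ /\G(\d+\.?\d*(?:[eE][+-]?\d+)?)(\s*)(inches|inch|feet|foot)\b/gc) {
            my ($factor, $unit) = @{ $to_metric{$3} };
            $out .= sprintf("%.2f", $1 * $factor) . $2 . $unit;
        }
        elsif ($in =~ /\G(\w+|\s+|.)/gcs) {
            $out .= $1;    # any other word, run of spaces, or character: echo it
        }
    }
    return $out;
}

print scan("measuring 3 feet, 8 inches from nose to tail"), "\n";
```

Each new rule (a "by" connective, a different unit family) becomes one more branch in the chain, which is roughly the structure a Lex specification would have.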
--
Andy Walker, Maths Dept., Nott'm Univ., UK.
a...@maths.nott.ac.uk
I don't disagree. I presumed the intent was to allow someone unfamiliar with
such units as firkins per fortnight to browse a document containing such
units and get a better idea of the quantities involved. Certainly, if
you want to have approximations, computers can do that too...
: Conversion from any measurement system to any other requires true semantic
: content, including intelligent roundoff, choice of units depending on the
: quantity and kind of thing being measured, the standard measures for the
: country, and so on. The 3785.41 cc in your example, Larry, would certainly
: be 3.78 litres in any metric country, since cc are never used (save possibly
: for car engines for historical reasons), but if the quantity were 785.41 cc
: then it would be 785 ml in Canada, and 78.5 cl or 7.85 dl in Europe due to
: national standards.
You don't need to tell me that. Why do you think I put a smiley on the
conversion from quarts to cc? I also said, immediately after the
portion you quoted:
: > Each recognized unit selects its preferred conversion unit, so you can
: > change that cc to ml or even, horrors, lit(er|re)s.
Please don't paint me stupid, Central North American that I am.
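A toy version of the quoted point about choosing the unit from the quantity, assuming (our choice, not any national standard) litres from 1000 cc upward and whole millilitres below. It rounds rather than truncates, so 3785.41 cc prints as 3.79 l, not the 3.78 in the quote:

```perl
use strict;
use warnings;

# Pick a display unit for a volume given in cubic centimetres.
# The threshold and precisions are illustrative choices only.
sub show_volume {
    my ($cc) = @_;
    return $cc >= 1000 ? sprintf("%.2f l", $cc / 1000)
                       : sprintf("%.0f ml", $cc);
}

print show_volume(3785.41), "\n";   # 3.79 l
print show_volume(785.41),  "\n";   # 785 ml
```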
: So hey, the whole thing is a bad example.
: Let's see Perl do something useful, I'm interested.
On one level, yes, it's a bad example. But on another level, it's a
reasonable test of how good a language is at making contextually
sensitive changes. If you want to see useful Perl examples, we can
certainly arrange to have comp.lang.misc flooded with them, but I don't
think that would help the advance of civilization. :-)
But I'm glad that you found the code-wars amusing. That is, after all,
their primary purpose, as far as I'm concerned. If some learning
happens at the same time, that's even better.
Larry Wall
lw...@netlabs.com
>#!/bin/awk
>awk -Fi '{printf("%0.2f cm\n", $1/2.54)}'
This fragment should obviously be disqualified as being slightly shorter
and a tad more readable than the posted perl equivalent.
--
Eric W. Sink, Spatial Analysis and Systems Team
USACERL, P.O. Box 9005, Champaign, IL 61826-9005
1-800-USA-CERL x449, e-s...@uiuc.edu
This whole subject is silly. Perl is clearly Turing equivalent, so it
can do anything any other language can do, and like all languages, sometimes
better, sometimes not as good. And like all languages, whether you find
it your cup of tea or not depends as much on you as on the language. I
personally like it, because once I got a good feel for it, I can quickly
write stuff that would be more difficult to write and slower to run in
a shell script, and much more difficult to write (although perhaps faster
in some cases) in C. In other words, it saves me time, and for me
that's very useful.
--
Brian L. Matthews b...@6sceng.UUCP
No, it should be disqualified as not being equivalent to the posted
perl equivalent. :-) The perl one-liner would change the units anywhere
in the line that had a number followed by "inch".
[I'm not sure if this discussion is evidence for the Whorf Hypothesis
or not. On the other hand, perhaps the Whorf Hypothesis is itself
evidence for the Whorf Hypothesis. Linguists do think differently from
other people. If you don't know what the Whorf Hypothesis is, don't
ask, unless you want to start thinking like a linguist... :-)]
[[Yes, I know it's really the Sapir/Whorf Hypothesis. Sheesh...]]
Larry Wall
lw...@netlabs.com