On Wednesday, May 18, 2016 at 10:43:50 AM UTC-4, Ed Morton wrote:
> function new_striptags(HTML)
> {
> #strip anything between < and next occurrence of >
> gsub(/<\/?[^>]*>/,"",HTML)
> return(HTML+0 == HTML ? HTML+0 : HTML)
> }
I hate to be a party pooper, but the (x+0 == x) test works only for strnum values, not for strings. Here's an example:
bash-4.2$ echo junk5.555555555 | gawk '{x = $1; print typeof(x); gsub(/junk/,"", x); print typeof(x); print x; print (x+0 == x)}'
strnum
string
5.555555555
0
Without the gsub, everything is good:
bash-4.2$ echo 5.555555555 | gawk '{x = $1; print typeof(x); print (x+0 == x)}'
strnum
1
When gsub succeeds, it changes the type from strnum to string. If the gsub doesn't match, it doesn't seem to have this effect:
bash-4.2$ echo 5.555555555 | gawk '{x = $1; print typeof(x); gsub(/junk/,"", x); print typeof(x); print x; print (x+0 == x)}'
strnum
strnum
5.555555555
1
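When a plain string (not a strnum) contains a numeric-looking value like "5.555555555", the (x+0 == x) test is performed as a string comparison. So x+0 is converted back to a string using CONVFMT (default "%.6g"), which gives "5.55556" and no longer matches the original text: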
bash-4.2$ echo junk5.555555555 | gawk '{x = $1; print typeof(x); gsub(/junk/,"", x); print typeof(x); print x; print (x+0 == x); print (x+0)""}'
strnum
string
5.555555555
0
5.55556
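To check that CONVFMT really is what drives this, here is a small sketch that runs the same comparison twice, once with the default CONVFMT and once with a wider one. The "%.10g" setting and the shell variable names are just my choices for illustration, and I'm not suggesting this as a fix, since it only works when the format happens to reproduce the original text exactly.

```shell
# x is a string constant, so (x+0 == x) is a string comparison in both runs.
# Default CONVFMT ("%.6g"): x+0 becomes "5.55556", which differs from x.
default_result=$(awk 'BEGIN {
    x = "5.555555555"
    eq = (x + 0 == x)      # string comparison after converting x+0
    s = (x + 0) ""         # the string that x+0 was converted to
    print eq, s
}')

# Wider CONVFMT (illustrative only): the conversion now reproduces x.
wider_result=$(awk 'BEGIN {
    CONVFMT = "%.10g"
    x = "5.555555555"
    eq = (x + 0 == x)
    s = (x + 0) ""
    print eq, s
}')

echo "$default_result"
echo "$wider_result"
```

The first line printed should show the comparison failing with the rounded string, and the second should show it passing once the conversion keeps all ten digits.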
This is a really subtle and annoying issue. Also note that the (x+0 == x) test ignores leading and trailing white space, so a strnum containing " 5" or "5 " would pass it. That may or may not be what you want.
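Here is a quick sketch of the white-space point. It splits on a comma so that the first field keeps its surrounding blanks, then runs the plain test and a stricter variant. The extra regex check is only my suggestion for the case where you want to reject padded values; it is not part of the original function.

```shell
# $1 is " 5 " (a strnum with leading and trailing blanks).
loose=$(printf ' 5 ,x\n' | awk -F, '{ print ($1 + 0 == $1) }')

# Same test, but also require no blank or tab at either end of the field.
strict=$(printf ' 5 ,x\n' | awk -F, '{ print ($1 + 0 == $1 && $1 !~ /^[ \t]|[ \t]$/) }')

echo "loose=$loose strict=$strict"
```

The plain test accepts the padded field, while the stricter one rejects it.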
Lastly, if you have a variable of uncertain type, you can force it to a strnum with split() or gawk's three-argument match(), because the array elements those functions create get the strnum attribute when they look numeric. So you could do something like this:
bash-4.2$ cat /tmp/test.awk
function isnumeric(x,    f) {
    match(x, /^(.*)$/, f)
    x = f[1]
    return (x+0 == x) # ignoring white-space issues
}
bash-4.2$ gawk -i /tmp/test.awk 'BEGIN {x = "5.555555555"; print (x+0 == x); print isnumeric(x)}'
0
1
This is really yucky stuff. You can also use split() to convert a string to a strnum if you know of an FS character that is guaranteed not to be in the string. Maybe that's faster than match. I haven't tested the relative performance, but I guess that the match() call is pretty slow.
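For what it's worth, the split() variant might look roughly like the sketch below. The function name isnumeric_split is made up, and using SUBSEP as the separator is my assumption about a character that won't appear in the data, so substitute whatever you know is safe. The check on split()'s return value is there so that an empty string, or a value that does contain the separator, is rejected instead of slipping through.

```shell
result=$(awk '
# Hypothetical helper: round-trip x through split() so the element is a strnum.
function isnumeric_split(x,    parts) {
    # Assumes SUBSEP never occurs in x; anything but exactly one piece fails.
    if (split(x, parts, SUBSEP) != 1)
        return 0
    return (parts[1] + 0 == parts[1])   # ignoring white-space issues
}
BEGIN {
    x = "5.555555555"
    a = (x + 0 == x)            # plain test on a string constant
    b = isnumeric_split(x)      # same value after the split() round trip
    c = isnumeric_split("junk") # non-numeric text
    print a, b, c
}')

echo "$result"
```

I'd expect the plain test to fail on the string constant, the split() version to accept it, and "junk" to be rejected either way.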
Regards,
Andy