On 05/14/15 11:26, Kenny McCormack wrote:
> In article <mj1cj3$6a4$1...@news.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>> To get the number of digits of a decimal number we can use the log function.
>> Ideally that would be log10(x), but since there's none available in awk we
>> need to calculate log(x)/log(10). This seems to work fine in gawk:
>>
>> $ awk 'BEGIN {print (log(999)/log(10))+1}'
>> 3.99957
>> $ awk 'BEGIN {print (log(1000)/log(10))+1}'
>> 4
>>
>> But to get whole integer numbers for the number of digits we also need to
>> apply the int() function:
>>
>> $ awk 'BEGIN {print int(log(999)/log(10))+1}'
>> 3
>> $ awk 'BEGIN {print int(log(1000)/log(10))+1}'
>> 3
>>
>> Doh! - And this result is really bad. It seems that implicit rounding issues
>> don't work well if those two functions, log() and int(), are combined.
>
> It seems to me that the key question here is "What are you looking for?"
> I would imagine that you're not really interested in workarounds - of which
> there are dozens. You're not interested in either being "usered" or "XYed".
(Don't know what "usered" means.)
>
> Just for the record, here's a workaround:
>
> $ gawk '{print length(sprintf("%d",$1))}'
Good to know.
> 1000
> 4
> 999
> 3
> 999.999999
> 3
> ^C
> $
>
> Anyway, I did a little investigation, and, although I can't really prove
> anything, I *think* the issue is that there's just enough round-off error
> in your calculations to cause the actual value to be just a smidge below 4.
That's also what I think. What makes me wonder is that *without* the
int() call around the log() division the result is correct, so I assumed
that correct rounding is at least possible. But maybe it's just a correct
display of an incorrect internal representation? (Which would at least
leave a bad taste. If an expression were "collapsed" into a variable I'd
understand that binary representation problems could lead to such an
effect, but within expressions there's the possibility to achieve better
rounding behaviour; that's what I saw 35 years ago when I was
disassembling a scientific calculator.) My suspicion (without any
evidence) was that either the int() function is internally missing some
"correct"-rounding call that other functions make, or that "collapsing"
a sub-expression into an inherently inaccurate stored [temporary] variable
destroys the accuracy information that's necessary to show the correct
result (as ksh does).
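To separate "display" from "internal value", here is a minimal sketch,
assuming a POSIX awk with IEEE 754 doubles. The function name show_log10
is made up for illustration. It prints the same quotient three ways:
through print's default number-to-string conversion, with 17 significant
digits, and after int().

```shell
# Hypothetical helper (name invented for this sketch): show log(x)/log(10)
# as awk displays it, at full precision, and truncated by int().
show_log10() {
    awk -v x="$1" 'BEGIN {
        q = log(x) / log(10)
        print q                # non-integral values are converted with OFMT (default "%.6g")
        printf "%.17g\n", q    # 17 significant digits, enough to tell doubles apart
        print int(q)           # int() truncates toward zero, it does not round
    }'
}

show_log10 1000
```

If the second line comes out a hair below 3 while the first line shows 3,
then int() is merely truncating the value it is given, and the apparent
"correct rounding" happens only in the number-to-string conversion.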
> That is, I don't think the error is in the int() function, but in the logs
> and the division. The vagaries of AWK and how it converts numbers for
> printing probably account for the result (without int()) being displayed as
> 4 even though it is actually just shy of 4. Again, this is all conjecture,
> but it is based on the next observation (results):
>
> # Here, we call the log10 function directly, and we get better results...
> # Note: This particular version of 'gawk' has 'call_any' compiled in.
> $ gawk '{print int(call_any("dd","log10",$1))+1}'
Assuming "call_any()" directly calls the (C-)library functions, I'd be
interested in what that result would look like if a function composition
similar to the awk sample program (based on the log() division) were
invoked. (I assume it's only possible with variable assignments of
intermediate results, thus bearing the danger of leading to the same
flawed result.) Note: I'd also use log10() of course, but I'd use the
division just to better understand where the issue stems from.
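Without call_any at hand, the nearest check in plain awk is probably to
compare the one-expression form with a version that stores every
intermediate result in a variable. This is only a sketch (the function
name cmp_forms is made up here), and it says nothing about what call_any
does internally.

```shell
# Hypothetical helper (name invented for this sketch): compute the digit
# count once as a single expression and once via stored intermediates.
cmp_forms() {
    awk -v x="$1" 'BEGIN {
        direct = int(log(x) / log(10)) + 1    # single expression
        a = log(x); b = log(10); q = a / b    # intermediates kept in variables
        stepwise = int(q) + 1
        print direct, stepwise
    }'
}

cmp_forms 999
cmp_forms 1000
```

If both columns agree for every input tried, storing the intermediates
is not what loses the accuracy, which would point at the log() values
and the division themselves.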
> 1000
> 4
> 999
> 3
> 1001
> 4
> 999.99
> 3
> ^C
> $
>
> This suggests that adding log10 to the core gawk language would be a good
> idea. In the meantime, of course, you might want to write an extension
> lib...
>
> Anyway, I don't know if any of this helps you or not, but what else can I say?
Thanks for the thorough reply. Actually, in my case I let the shell
evaluate the expression and pass the value as a variable to awk, since
(from the awk program's perspective) it's a constant. I also usually do
no heavy FP calculation with awk (just percentages and such), so this is
no pressing issue for me. I just wanted to bring the question to the
public's attention; because of the discrepancy (compared to ksh) there
*might* be a subtle bug.
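A minimal sketch of that approach, for illustration only: it uses the
string length of an integer literal as a portable stand-in rather than
any shell floating-point arithmetic, it assumes the value is a
non-negative integer without leading zeros, and the function name
digits_via_shell is made up.

```shell
# Hypothetical helper (name invented for this sketch).
# Assumption: $1 is a non-negative integer literal without leading zeros.
digits_via_shell() {
    n=$1
    width=${#n}    # character count, i.e. the number of decimal digits here
    # awk receives the precomputed digit count as an ordinary variable.
    awk -v w="$width" -v n="$n" 'BEGIN { printf "%s has %d digits\n", n, w }'
}

digits_via_shell 1000
```

Since the count is fixed before awk starts, no log() or int() call is
involved on the awk side at all.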
>
> P.S. I also wonder if using the new GMP/MPFR stuff might give more
> interesting results. I don't know, because I have to admit that I don't
> really understand it (the GMP/MPFR stuff).
Well, I've used it once where I operated on large numbers, but I'd not
consider MPFR to be the appropriate answer in case it's a rounding issue.
Janis