Proposal: Change Float.to_string defaults

Ed W

Jun 17, 2015, 6:30:02 AM
to elixir-l...@googlegroups.com
Hi, I just accidentally implemented the classic paper on printing
floating point decimals accurately before discovering there is already
an implementation in Erlang:
https://github.com/ewildgoose/elixir-float_pp

Oh well. However, I notice that it's not directly accessible in the
Elixir stdlib, and in fact String.Chars doesn't use
Float.to_string, instead using the more correct Erlang "fwrite_g/1" call.
I would like to propose that we behave more like Erlang and:

- A straight call to Float.to_string/1 will instead call:
":io_lib_format.fwrite_g(thing)"

I can prepare a patch if required, but just scribbling without testing
we get:

@spec to_string(float) :: String.t
def to_string(float) do
  # fwrite_g/1 returns a charlist, so convert it to a binary to match the spec
  IO.iodata_to_binary(:io_lib_format.fwrite_g(float))
end

There is perhaps also an argument for making Float.to_string(float, [])
do the same, but perhaps it's worth leaving the current behaviour as
is..?



The benefit of this change is much saner handling of serialisation and
float output. At present, without digging into the Erlang libraries
(which I didn't even know about and wasted a bunch of time
re-implementing...) we get output like this:

iex(1)> Float.to_string(1.2)
"1.19999999999999995559e+00"
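
To make the difference concrete, here is a minimal sketch comparing the two Erlang calls discussed above. It assumes :io_lib_format.fwrite_g/1 is exported by the OTP release in use; it is an internal module, so treat that as an assumption rather than a guarantee.

```elixir
# Erlang's default float formatting: scientific notation with 20 decimals.
default = :erlang.float_to_binary(1.2)

# The shortest-digits printer that String.Chars relies on.
# It returns a charlist, hence the conversion to a binary.
short = IO.iodata_to_binary(:io_lib_format.fwrite_g(1.2))

IO.puts(default)  # 1.19999999999999995559e+00
IO.puts(short)    # 1.2
```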

Thanks for considering this

Ed W

Eric Meadows-Jönsson

Jun 17, 2015, 6:37:11 AM
to elixir-l...@googlegroups.com
The Kernel.to_string and Float.to_string functions have different goals and use cases. Float.to_string is for serialization, a more accurate way of printing a floating point number than Kernel.to_string. If you want pretty printing you can use Kernel.to_string as you already found.

This behaves like erlang (although we chose different function names and locations for the functions); erlang:float_to_string has the same output as Float.to_string.



Ed W

Jun 17, 2015, 7:56:01 AM
to elixir-l...@googlegroups.com
On 17/06/2015 11:37, Eric Meadows-Jönsson wrote:
> The Kernel.to_string and Float.to_string functions have different
> goals and use cases. Float.to_string is for serialization, a more
> accurate way of printing a floating point number than
> Kernel.to_string. If you want pretty printing you can use
> Kernel.to_string as you already found.
>
> This behaves like erlang (although we chose different function names
> and locations for the functions); erlang:float_to_string has the same
> output as Float.to_string.

Hi, I hadn't twigged about the existence of Kernel.to_string - thanks.
Perhaps it would be nice if it were mentioned in the Float.to_string docs?

However, the point is that Float.to_string is horrible (and NOT more
accurate). I can (nearly) see the point when you want a specified
precision and understand the implications, but I feel quite strongly
that the "no option" version of Float.to_string should be an alias to
Kernel.to_string.

Updated proposal:
- set Float.to_string/1 to be an alias to Kernel.to_string


I'm sensing further justification is needed here. If you look at the
evolution of this on the Erlang side, you can see a frustrated email
asking to fix float output some years back. I can't find the subsequent
correspondence, but it seems a partial fix was implemented, based on
mochiweb code, so that we now have *two* float output functions:
a) proper/correct output through 'fwrite_g', but without control over
precision or format
b) nasty (sometimes incorrect) output, but with control over output
format and precision

In my opinion, what is desirable is for the existing Float.to_string to
be completely removed and new output functions to be produced using the
fwrite_g digits (but re-implementing control over format/precision). I
would think this is arguably best pushed into OTP stdlib. However, if
there were appetite, I have already published my re-implementation of
the original paper in Elixir, and all it requires is final output
formatting in order to be a complete replacement for the existing
Float.to_string/2. I don't plan to finish this unless there is interest
in pushing it to Elixir core, though.

Please, take the time to read the original paper on this stuff before
responding (There are a number of good followup papers to this one as well):
http://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf

The issue is somewhat subtle and non-obvious, and I think some might
understand it back to front:

Essentially the core issue is that IEEE arithmetic stores "intervals",
not numbers: each double stands for every real number that rounds to it.
Many naive to_string implementations will instead output one of the
boundaries or the centre of the range. Without much disagreement, a
better solution is to pick the shortest number representation (in the
output base) which falls into the given range. Another definition of
success is that we can round trip the conversion (with the caveat that
we stay within the limits of double precision IEEE).

So for example, given the input number 1.2
- I can round trip this to string and back to float using "the correct
algorithm", ie Kernel.to_string
- I get all kinds of trouble if I use Float.to_string
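
A small sketch of the "interval" idea, assuming :io_lib_format.fwrite_g/1 is available. The long literal is simply the mantissa from the Float.to_string output quoted earlier in the thread. Both strings read back as the same double, which is why the shorter one is enough.

```elixir
float = 1.2

# Shortest digits that still identify this double (what Kernel.to_string prints).
shortest = IO.iodata_to_binary(:io_lib_format.fwrite_g(float))

# A much longer decimal expansion of the same stored value.
long = "1.19999999999999995559"

# Both strings fall inside the range of decimals that parse to the same double.
IO.inspect(String.to_float(shortest) == float)  # true
IO.inspect(String.to_float(long) == float)      # true
```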


A further reason that the current Erlang output is garbage is that we
only have around 15 significant figures of accuracy with IEEE doubles,
but Erlang defaults to trying to output 20 decimal places. Even worse,
the output control only lets us specify decimal places, so attempting to
get the output correct would seem to require several passes of doing a
trial conversion, parsing the string, then re-requesting the conversion
with updated decimal places. The algorithm used by fwrite_g completely
avoids all this trouble and has provably optimal output (it just lacks
control over format and precision).

Thanks for listening

Ed W

Eric Meadows-Jönsson

Jun 17, 2015, 8:15:12 AM
to elixir-l...@googlegroups.com
Sorry for misunderstanding the first time around. I agree that Float.to_string should change to use a more correct printing algorithm, but we should keep the current options to preserve API compatibility.




José Valim

Jun 17, 2015, 8:22:37 AM
to elixir-l...@googlegroups.com
> However, the point is that Float.to_string is horrible (and NOT more accurate).  I can (nearly) see the point when you want a specified precision and understand the implications, but I feel quite strongly that the "no option" version of Float.to_string should be an alias to Kernel.to_string.

That's a very good point. If it is not more accurate plus it is inconsistent with Kernel.to_string/1 and harder to "read" then, indeed, I see little point in having it.

I am +1 for replacing Float.to_string/1 to use the same as Kernel.to_string and it seems Eric as well.

> Even worse the output control only lets us specify decimal places, so attempting to get the output correct would seem to require several passes of doing a trial conversion, parsing the string, then re-requesting the conversion with updated decimal places... The algorithm used by fwrite_g completely avoids all this trouble and has provably optimal output (it just lacks control over format and precision)

I may be misunderstanding this part but if you want some particular precision in the most compact format, you could use Float.to_string/2 with compact and decimals options, no?

iex> Float.to_string 7.1, decimals: 7, compact: true
"7.1"

PS: watch out for comments like "please read the paper before commenting". I am aware it can be harder to discuss matters when others are not as well informed as you, but setting such barriers hinders the discussion and will hide important feedback from developers who do not have the same background as you (and who are equally important). :)


Ed W

Jun 17, 2015, 8:56:19 AM
to elixir-l...@googlegroups.com
On 17/06/2015 13:22, José Valim wrote:
> > However, the point is that Float.to_string is horrible (and NOT more accurate).  I can (nearly) see the point when you want a specified precision and understand the implications, but I feel quite strongly that the "no option" version of Float.to_string should be an alias to Kernel.to_string.
>
> That's a very good point. If it is not more accurate plus it is inconsistent with Kernel.to_string/1 and harder to "read" then, indeed, I see little point in having it.
>
> I am +1 for replacing Float.to_string/1 to use the same as Kernel.to_string and it seems Eric as well.

Thanks Eric and José for taking the time to answer.  I can offer to extend my implementation here to be a complete replacement for Float.to_string, including the extra formatting options for Float.to_string/2.

The reason I hesitate is that this is duplicating functionality in OTP:
    https://github.com/erlang/otp/blob/maint/lib/stdlib/src/io_lib_format.erl#L369

I don't feel familiar enough with Erlang to compose an Erlang solution and push upstream, but I can effectively extend the Erlang solution in Elixir if someone might assist with getting it upstream?

Apart from some duplication of code, I see no problems with implementing an Elixir float pretty printer in stdlib.  It would also potentially allow us to offer arbitrary rounding functions and anything else anyone wants to help with serialisation...?

See below for more info on sizing this work:



> > Even worse the output control only lets us specify decimal places, so attempting to get the output correct would seem to require several passes of doing a trial conversion, parsing the string, then re-requesting the conversion with updated decimal places... The algorithm used by fwrite_g completely avoids all this trouble and has provably optimal output (it just lacks control over format and precision)
>
> I may be misunderstanding this part but if you want some particular precision in the most compact format, you could use Float.to_string/2 with compact and decimals options, no?
>
>     iex> Float.to_string 7.1, decimals: 7, compact: true
>     "7.1"


The issue is that we specify *decimals*. However, the IEEE format basically stores a number as around 15 significant digits plus a scale (not really, but something like that).  (Decimals are the numbers after the dot, significant figures are the digits starting from the first non zero on the left. They become relevant with very large or very small numbers)

So the referenced "pretty print" paper gives you an algorithm digits():
    digits(float) -> {decimal_position, array_of_digits}

So for example if you ask for digits(123000.0), or digits(0.000123) then you get back the array [1,2,3], plus the location of the decimal point.  The remaining question is how to take those digits and display them to the user.  The current Erlang implementation (and my Elixir re-implementation) simply adds extra zeros to whichever end and pops the decimal point in the correct spot, ie you get a fixed output with the "correct" number of decimals for complete representation of the IEEE float.

Next step: Offering scientific notation is fairly easy because you just put the decimal point in position 2 and then put "eN" on the end where N is basically the decimal place.  Simples!
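
The two formatting steps described above can be sketched roughly as follows. This is not the code from my repository or from OTP: the module name DigitsSketch and the {position, digits} convention (value = 0.d1d2d3... * 10^position) are assumptions made up for the illustration.

```elixir
defmodule DigitsSketch do
  # Hypothetical convention: {position, digits} means 0.d1d2d3... * 10^position,
  # so {6, [1, 2, 3]} is 123000.0 and {-3, [1, 2, 3]} is 0.000123.

  # Fixed notation, small numbers: leading zeros go after "0.".
  def fixed({position, digits}) when position <= 0 do
    "0." <> String.duplicate("0", -position) <> Enum.join(digits)
  end

  # Fixed notation, larger numbers: pad with trailing zeros, then split
  # the digits at the decimal point.
  def fixed({position, digits}) do
    padded = digits ++ List.duplicate(0, max(position - length(digits), 0))
    {int_part, frac_part} = Enum.split(padded, position)
    frac_part = if frac_part == [], do: [0], else: frac_part
    Enum.join(int_part) <> "." <> Enum.join(frac_part)
  end

  # Scientific notation: decimal point after the first digit, then "eN".
  def scientific({position, [first | rest]}) do
    rest = if rest == [], do: [0], else: rest
    "#{first}.#{Enum.join(rest)}e#{position - 1}"
  end
end

IO.puts(DigitsSketch.fixed({6, [1, 2, 3]}))        # 123000.0
IO.puts(DigitsSketch.fixed({-3, [1, 2, 3]}))       # 0.000123
IO.puts(DigitsSketch.scientific({6, [1, 2, 3]}))   # 1.23e5
IO.puts(DigitsSketch.scientific({-3, [1, 2, 3]}))  # 1.23e-4
```

Note that no rounding happens here; the digits are only placed around the decimal point.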

Final problem, the existing API offers an option to choose *decimal places* in our output.  This requires some implementation of rounding (as opposed to straight truncation).  eg consider the two examples above being displayed with "decimals: 2".  We get

    123000.00
         0.00

But if you ask for
    PP.to_string 1.5000001, decimals: 2
    PP.to_string 1.4999999, decimals: 2

You might expect a "1.50".

What about:
        PP.to_string 1.505, decimals: 2
        PP.to_string 1.515, decimals: 2
        PP.to_string 1.525, decimals: 2

There are several options here.  Personally I would prefer "round to even" in the case of a tie. It's something like a dither function on the output, so your error averages out over repeated calculations.  BUT, Elixir's Kernel.round (like Erlang's round/1) implements "round away from zero", so there is an argument for consistency... (See my other email on elixir-lang-talk on problems with Float.round).  The Wikipedia article here has some thoughts on rounding options:
    https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules
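
To illustrate the two tie-breaking rules, here is a rough sketch that rounds on integers, which is how a digit-based printer would see 1.505 (digits 1, 5, 0, 5) when asked for two decimals. The module RoundingSketch, its function name and the mode atoms are hypothetical and only exist for this example.

```elixir
defmodule RoundingSketch do
  # Round the non-negative ratio num/den to an integer. Exact ties are broken
  # either away from zero (:half_away) or to the nearest even integer (:half_even).
  def round_ratio(num, den, mode) when num >= 0 and den > 0 do
    quotient = div(num, den)
    remainder = rem(num, den)

    cond do
      2 * remainder > den -> quotient + 1
      2 * remainder < den -> quotient
      mode == :half_away -> quotient + 1
      mode == :half_even and rem(quotient, 2) == 1 -> quotient + 1
      true -> quotient
    end
  end
end

# 1.505, 1.515 and 1.525 to two decimals, seen as 1505/10, 1515/10, 1525/10.
for n <- [1505, 1515, 1525] do
  away = RoundingSketch.round_ratio(n, 10, :half_away)
  even = RoundingSketch.round_ratio(n, 10, :half_even)
  IO.puts("#{n}: away=#{away} even=#{even}")
end

# Kernel.round breaks ties away from zero.
IO.inspect(round(2.5))   # 3
IO.inspect(round(-2.5))  # -3
```

With :half_even the three inputs give 150, 152 and 152, while :half_away gives 151, 152 and 153.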



I'm struggling to communicate these points, so I'm not sure if the above will turn out to answer your question?  In case not, let me try a second example, please ignore this if you already get the above?!

- The issue is how to specify "decimals: n"

a) If my input number is 1234567890.1, then I only have about 15 sig figures to play with (due to the IEEE storage format), so using say "decimals: 7" will give some garbage output in the final decimal positions

    iex> Float.to_string(1234567890.1, [decimals: 7])
    "1234567890.0999999"

or worse:
    iex> Float.to_string(1234567890.1, [decimals: 17])
    "1234567890.09999990463256835"

b) However, If my input number is 0.12345678901 then we are ok with "decimals: 7" because we are well within our 15 sig digits.

So in general I will need to somehow know how many decimals the output number requires, in order to choose an appropriate value for decimals: and avoid printing garbage output


BUT: this is solved using the pretty printer algorithm BECAUSE it only outputs real digits that contribute to the solution, so if we ever ask for more decimals than exist in our input, they will have to be supplied by appending zeros (rather than garbage)
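
As a rough sketch of the zero-padding idea, assuming :io_lib_format.fwrite_g/1 is available: PadSketch and to_padded_string/2 are hypothetical names, and the helper deliberately ignores rounding and inputs that fwrite_g prints in exponent form.

```elixir
defmodule PadSketch do
  # Hypothetical helper: print the shortest round-trip digits, then pad the
  # fraction with zeros up to `decimals` places. A fraction that is already
  # longer than `decimals` is left untouched (no rounding is attempted).
  def to_padded_string(float, decimals) do
    shortest = IO.iodata_to_binary(:io_lib_format.fwrite_g(float))
    [int_part, frac_part] = String.split(shortest, ".")
    int_part <> "." <> String.pad_trailing(frac_part, decimals, "0")
  end
end

# Existing behaviour; the example above reports "1234567890.0999999" here.
IO.puts(:erlang.float_to_binary(1234567890.1, decimals: 7))

# Padding the shortest digits instead.
IO.puts(PadSketch.to_padded_string(1234567890.1, 7))  # 1234567890.1000000
```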




> PS: watch out for comments like "please read the paper before commenting". I am aware it can be harder to discuss matters when others are not as well informed as you, but setting such barriers hinders the discussion and will hide important feedback from developers who do not have the same background as you (and who are equally important). :)

Apologies, no offence intended.  The conversation seemed to be heading off course and I was repeatedly failing to correctly explain my point.  Reference to the source material seemed helpful to avoid an extended argument about the wrong subject

I realise I'm not communicating these points effectively.  It's not deliberate though. Doing my best!

Ed W

José Valim

Jun 17, 2015, 9:39:01 AM
to elixir-l...@googlegroups.com
> The reason I hesitate is that this is duplicating functionality in OTP:
>     https://github.com/erlang/otp/blob/maint/lib/stdlib/src/io_lib_format.erl#L369
 
Yeah, I am not sure they would accept it either because io_lib_format is meant to be used by io_lib:format/2 only and unless we expose it there, it wouldn't serve much purpose.

However, that poses the question: if their implementation is private, shouldn't we be moving to our own implementation anyway? It would be a pity to drive the implementations apart though...

> - The issue is how to specify "decimals: n"
>
> a) If my input number is 1234567890.1, then I only have about 15 sig figures to play with (due to the IEEE storage format), so using say "decimals: 7" will give some garbage output in the final decimal positions
>
>     iex> Float.to_string(1234567890.1, [decimals: 7])
>     "1234567890.0999999"
>
> or worse:
>     iex> Float.to_string(1234567890.1, [decimals: 17])
>     "1234567890.09999990463256835"
>
> b) However, If my input number is 0.12345678901 then we are ok with "decimals: 7" because we are well within our 15 sig digits.

Perfect explanation, thank you.

> BUT: this is solved using the pretty printer algorithm BECAUSE it only outputs real digits that contribute to the solution, so if we ever ask for more decimals than exist in our input, they will have to be supplied by appending zeros (rather than garbage)

Could we solve this by asking users to round it and then call Kernel.to_string/1? (or Float.to_string/1 in the future when it works as expected). If this is a viable solution, it would be preferable because we could still rely on the Erlang pretty printer and we would automatically follow the same rounding rules everywhere.
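
A minimal sketch of what "round, then print" could look like, using only Float.round/2 and Kernel.to_string/1. The second call shows one limitation to keep in mind: the result is not padded to the requested number of decimals.

```elixir
# Round numerically first, then let the shortest-digits printer format it.
rounded = Float.round(1.23456, 2)
IO.puts(to_string(rounded))                # 1.23

# No zero padding: two decimals were requested, but only one digit is printed.
IO.puts(to_string(Float.round(1.5, 2)))    # 1.5
```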
 
> Apologies, no offence intended.  The conversation seemed to be heading off course and I was repeatedly failing to correctly explain my point.  Reference to the source material seemed helpful to avoid an extended argument about the wrong subject

No problem! Referencing the source material definitely helps and you are doing a great job on answering everyone's questions and concerns.

Ed W

Jun 17, 2015, 11:47:16 AM
to elixir-l...@googlegroups.com
On 17/06/2015 14:38, José Valim wrote:
> > The reason I hesitate is that this is duplicating functionality in OTP:
> >     https://github.com/erlang/otp/blob/maint/lib/stdlib/src/io_lib_format.erl#L369
>
> Yeah, I am not sure they would accept it either because io_lib_format is meant to be used by io_lib:format/2 only and unless we expose it there, it wouldn't serve much purpose.
>
> However, that poses the question: if their implementation is private, shouldn't we be moving to our own implementation anyway? It would be a pity to drive the implementations apart though...

OK, I think that confirms we should drive for our own implementation?

Regarding the history of this, my limited googling on "erlang mochinum float bob ippollito" turns up 2008 and 2009 threads complaining about the core issue, then somewhere between those dates it presumably gets slipped into Erlang OTP core (which I missed completely).  However, it *looks* as though the change was just to put in a quick fix to cover the core complaint (default output format), but in my *opinion* it still doesn't resolve the case of wanting to do printing formatted to a certain number of decimal places

As such I think there is an argument to rebuild the whole float print functionality and then perhaps we will be left with something the Erlang guys can take back as a whole and upstream?



> > or worse:
> >     iex> Float.to_string(1234567890.1, [decimals: 17])
> >     "1234567890.09999990463256835"
> >
> > BUT: this is solved using the pretty printer algorithm BECAUSE it only outputs real digits that contribute to the solution, so if we ever ask for more decimals than exist in our input, they will have to be supplied by appending zeros (rather than garbage)
>
> Could we solve this by asking users to round it and then call Kernel.to_string/1? (or Float.to_string/1 in the future when it works as expected). If this is a viable solution, it would be preferable because we could still rely on the Erlang pretty printer and we would automatically follow the same rounding rules everywhere.

I don't think so.  The issue is:
- You have 15 *significant figures* of accuracy (ish)
- You want to round to N *decimal places*
- Relating significant figures to decimal places is basically the whole problem we are trying to solve!  (eg consider

However, another idea is that I looked quickly at Eric's neat looking decimal library:
    https://github.com/ericmj/decimal/

He seems to have some really nice arbitrary rounding functions in there... I wonder if we could borrow those and potentially offer some extremely flexible options for string output? 

Thinking aloud, but I personally think more people should be encouraged to use a dedicated decimal library for purposes they accidentally are getting away with using a float for... I wonder if there is an opportunity to moot at least part of the decimal library being a candidate for inclusion as a core datatype?  The motivation being that it's a datatype we should be pushing many users towards... Perhaps getting off track..?



Anyway, to recap, current status of my code is:
- Fixed decimal output seems to work ok
- Support for scientific output needs minor work (you would also get a "compact:true" option for scientific format at no extra cost!)
- Rounding needs to be implemented (and first specified).  I would like to hear Eric's view on rounding here, eg making it configurable and if so where to put the rounding library?  If we want a decision then sticking with the current fixed option of "round away from zero" seems consistent, although I will nail my colours to the mast and say I prefer "round to even".

Eric, could you comment please?

Ed W
