Dart Language and Library Newsletter (2017-10-13)


Florian Loitsch

Oct 13, 2017
to General Dart Discussion


Dart Language and Library Newsletter

Welcome to the Dart Language and Library Newsletter.

Follow Up

Evaluation Order

Last week we discussed the evaluation order of method calls, like o.foo(bar()). As some readers correctly pointed out, the section was partially misleading and sometimes based on wrong assumptions. Thanks for the feedback.

We claimed that there was an exception to the left-to-right evaluation in the spec. That's misleading: the spec is careful (and uses a lot of otherwise unnecessary machinery) to say that in e.foo(e1, ..., eN), e.foo is not an expression that is evaluated. As such, it is meaningless to talk about the "order of evaluation" of method calls when there are no expressions that are evaluated.

Even after our change - evaluating getters first when they are used as call-targets - it is not meaningful to talk about the evaluation order of expressions, since o.foo still isn't an expression.

We also claimed that the expected behavior was to evaluate from left to right. We missed the fact that some method-based languages only look up the method before doing the call. C# and Java both evaluate the arguments before reporting null errors. In contrast, JavaScript and C++ (at least for virtual functions in gcc) report null errors before evaluating the arguments to the call.

JavaScript actually changed its behavior in ECMAScript 5.0 (see es5): GetValue(ref) in step 2 is now evaluated before the arguments. The change was made for similar reasons: initially, ECMAScript did not have getters, but with getters it made sense to evaluate GetValue(ref) first, so that the evaluation went from left to right.

Fundamentally, the question boils down to: Is o.foo(bar()) the same as (o.foo)(bar())?

With the proposed change it will be for most programs. There are some issues with noSuchMethod when a function is invoked with the wrong number of arguments, but this corner-case is only relevant in broken programs.
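
To make the question concrete, here is a small sketch (with a made-up Logger class) in which the order becomes observable, because the call target is a getter with a side effect:

class Logger {
  Function get foo {
    print('foo getter evaluated');
    return (x) => print('called with $x');
  }
}

int bar() {
  print('bar evaluated');
  return 42;
}

void main() {
  var o = new Logger();
  (o.foo)(bar());  // "foo getter evaluated", then "bar evaluated", then "called with 42".
  o.foo(bar());    // With the proposed change, this prints in the same order as the line above.
}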

Did You Know?

double.toString

Dart has two number types: integers and floating-point numbers. For the latter, Dart uses the IEEE double-precision floating-point type. There are some interesting alternatives, like unums, but these are not directly supported by our CPUs and are not yet universally available.

In this section I will focus on the toString methods of double. There are four of them:

  • toString
  • toStringAsExponential
  • toStringAsFixed
  • toStringAsPrecision

The double.toString method is the most commonly used way of converting a double to a string. It uses a mixture of toStringAsExponential and toStringAsFixed to provide the most visually appealing output. For large numbers (more than 20 digits) it emits an exponential representation, as in 1.3e+21. The same is true for very small floating-point numbers, strictly smaller than the one represented by 0.000001. For example, 0.000000998 is printed as 9.98e-7. All other numbers are printed in the "normal" decimal representation: 5.0, 1.23, or 0.001.
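
A few examples of these rules (the comments show the output I would expect on the Dart VM):

void main() {
  print(1.3e21);       // 1.3e+21
  print(0.000000998);  // 9.98e-7
  print(5.0);          // 5.0
  print(1.23);         // 1.23
  print(0.001);        // 0.001
}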

The other methods just force a specific representation. I encourage readers to experiment with them, but I won't explain them in this section.
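
As a starting point for such experiments, here is a small sketch of the three forced representations:

void main() {
  var d = 1234.5678;
  print(d.toStringAsExponential(2));  // 1.23e+3
  print(d.toStringAsFixed(2));        // 1234.57
  print(d.toStringAsPrecision(2));    // 1.2e+3
}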

This is, however, only the easy part of printing a floating-point number. The much harder part is to compute the individual digits. Converting floating-point numbers to strings is surprisingly hard. Not only do floating-point numbers have multiple possible string representations, computing them requires some thought.

In general, the minimal requirement we want is that the output of toString can be parsed back with double.parse and yields the same floating-point number. This is called the internal identity requirement. For a long time, C libraries were known to not even satisfy this requirement. Lack of communication (no Internet) was probably a big reason why it took libraries so long to get this right.
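
The requirement is easy to check in Dart:

void main() {
  var d = 0.1 + 0.2;
  print(d);                                // 0.30000000000000004
  print(double.parse(d.toString()) == d);  // true: the internal identity holds.
}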

The first known publication of a correct algorithm was already in 1984: J. T. Coonen's thesis contains a correct (and very efficient) algorithm for these conversions. That thesis (or at least the algorithm in it) went largely unnoticed...

It took until 1990, when G. L. Steele Jr. and J. L. White published their printing algorithm (called "Dragon") at a big conference (PLDI), for C libraries and languages to finally start doing the "right" thing. (Note that a draft of the paper existed for a long time and was even mentioned in "Knuth, Volume II" in 1981.)

There are two reasons why converting floating-point numbers to strings is difficult:

  1. Loss of precision when doing operations.
  2. Multiple possible representations.

Loss of Precision

The term "floating-point" refers to the fact that the decimal point can "float". It's helpful to look at examples in decimal representation. Imagine a floating-point number type that allows for 5 decimal digits, and 2 exponent digits. This would allow for numbers in the range 00001e-99 to 99999e99 (and, of course, 0 itself). This covers a really big range, but the precision is often pretty poor. The best approximation of the number 123456789 would be 12346e04. Let's see what happens, if we add 1 to that number. The correct result would be 123460001. Since that number doesn't fit, we need round again, and end up with 12346e04 again. This means that the 1 was lost due to imprecision. Readers can try the same experiment in IEEE double floating point by writing 1e20 + 1. The 1 is simply lost.

For a long time algorithms used doubles themselves to compute the digits of another double. However, this can lead to imprecision, because every operation on the input introduces rounding errors.

The correct way to compute the digits of a double is to use a number type that has a higher precision than the input. There are algorithms that manage to compute the correct output with extended-precision floating-point numbers (aka "long doubles"). Other algorithms use 64-bit integers (which are more precise than the 53-bit precision of doubles).

While a slightly bigger data type is enough to compute the correct result, it doesn't always yield the optimal string representation. There are multiple representations for every floating-point number and some require more work than others.

Multiple Possible Representations

Floating-point numbers have a precise value, but, for a specific configuration, are spaced out, so that two consecutive numbers always have a gap between them. Getting back to our decimal floating point numbers, we can see that the two consecutive numbers 12345e42 and 12346e42 are at a distance of 1e42 from each other. For simplicity we think of floating-point numbers as representing intervals. Any real value x that lies closer to one floating-point number f than to the neighboring numbers is part of the interval represented by f. One of the qualities of floating-point numbers is that the gap between two consecutive numbers (and thus the interval) is dependent on the value of the number: the smaller the number, the smaller the gap.

For a string-conversion this means that any output that lies within the interval satisfies the internal identity requirement. Programming languages then have to decide which properties they value when a user asks for the conversion.

C libraries tend to provide the most precise representation, but let users decide how many of those digits they want.

Languages that were designed after 1990 generally tend to prefer representations that use the fewest significant digits. This includes Java, C#, ECMAScript, and Dart.

Let's look at an example: 0.3.

The most precise representation of this number is: 0.299999999999999988897769753748434595763683319091796875.

In C, users can now ask for a specific number of digits (6 by default):

#include <stdio.h>

int main() {
  printf("%f\n", 0.3);  // => 0.300000
  printf("%.54f\n", 0.3);  // => 0.299999999999999988897769753748434595763683319091796875
}

Modern C libraries make it possible to print floating-point numbers with any precision, but they don't tell users where they can stop. Modern (C11) libraries provide macros (like DBL_DECIMAL_DIG) to give users a way to emit at least enough digits to be able to read them in again, but in many cases (like 0.3) there are shorter valid representations, and there is no way in C to just emit the shortest valid representation.

In Dart, 0.3.toString() simply returns "0.3". That doesn't mean that Dart (like ECMAScript) will always return the shortest representation. For example 0.00001 is converted to "0.00001" and not "1e-5" which would be shorter. However, the number of significant digits is chosen to be minimal.

While the algorithm for finding these digits efficiently is non-trivial, the idea is straightforward: given that the floating-point number represents an interval, one just needs to look through every possible string representation in that interval and look for the one that is:

  1. the shortest possible, and
  2. the closest possible if there is more than one possible choice.

This sounds complicated, but is actually very simple for humans. Let's take a few examples:

double d = 0.3;
Lower boundary: 0.2999999999999999611421941381195210851728916168212890625
Exact value:    0.299999999999999988897769753748434595763683319091796875
Upper boundary: 0.3000000000000000166533453693773481063544750213623046875

double d = 0.123467;
Lower boundary: 0.123466999999999986481480362954243901185691356658935546875
Exact value:    0.1234669999999999934203742668614722788333892822265625
Upper boundary: 0.123467000000000000359268170768700656481087207794189453125

Given the lower and upper boundary for these numbers, humans have no trouble finding the shortest representation that lies inside the interval: once the digits of the two boundaries start to differ, we can discard all remaining digits. The reason the algorithms are complicated is due to corner cases (like when the shortest representation lies exactly on the boundary), and because of the desire to use data types that are small and efficient.

Language and Library Design

As part of the language and library team we frequently get requests like "can we not just simply add feature X" to the language or libraries. In this section I will explain why even simple requests often require much more thought and why they often take so much time.

One of the fundamental qualities of a language and its libraries is a consistent feel. This requires that library and language design is done holistically. Features that work in other languages might need a different design to fit into Dart, or should be provided through completely different means. I will provide some examples in the remainder of this section. I want to emphasize that these examples are chosen because they show how many features require us to think about other, often unrelated and more complex, features. They are not necessarily planned, and some might have already been rejected.

Let's start with the simple request that some classes should have a way to serialize themselves to JSON.

There are immediately several questions:

  1. Should this work for all classes or just some selected ones? Dart could provide a new value type that is serializable and let class types stay non-serializable.
  2. Should it be limited to JSON, or would the feature make it possible to serialize to other formats, like Google's protocol buffers?
  3. Does the caller know which class gets serialized (the static shape), or is the instance dynamic with many possible concrete types? Very often, the type of the object that needs to be encoded is fully known, and the call site could do the serialization without any help.
  4. Can a user provide a serialization scheme for classes they can't modify?
  5. Is the amount of work small?
  6. Can the same instance be encoded differently for different RPC calls?
  7. Is it cheap?

In some sense, Dart already supports one way of serializing to JSON: the toJson method. For instances that the JSON encoder doesn't recognize, it simply invokes toJson to make the instance encode itself in a way that is suitable for JSON.
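
A minimal sketch of this pattern (the Point class is made up):

import 'dart:convert';

class Point {
  final int x, y;
  Point(this.x, this.y);

  // Hand-written boilerplate: the JSON encoder invokes this method for
  // objects it doesn't recognize.
  Map<String, int> toJson() => {'x': x, 'y': y};
}

void main() {
  print(JSON.encode(new Point(1, 2)));  // {"x":1,"y":2}
}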

Looking at the questions above, we have:

  1. Yes. Works for all classes.
  2. No. Limited to JSON.
  3. No. The caller doesn't need to know. JSON.encode(randomObject) works as long as randomObject has a toJson method.
  4. Yes/No. The converter has a toEncodable parameter, but that's an additional hook and not toJson.
  5. No. There is too much boilerplate code.
  6. No. The same instance is always encoded the way its toJson method specifies.
  7. Yes. Adding a simple method doesn't cost much in output size or speed.

Dart features another way of serializing instances: mirrors. Using reflection, libraries can inspect any object. This makes it trivial to write a library that looks at each instance, gets its fields, and emits them (a rough sketch follows the responses below). The responses to the questions are now completely different:

  1. Yes. Works for all classes.
  2. Yes. Works for every possible serialization scheme.
  3. No. The caller can encode any dynamic object.
  4. Yes/No. Every class is serializable. If the serialization needs to be adapted, it's up to the library to provide additional hooks.
  5. Yes. The users of the library don't need to do any work. The work is for the authors of the serialization library.
  6. Yes/No. The same instance can only have different serializations if there are hooks in the library.
  7. No. The current reflection is too expensive to be usable in compiled Dart programs (dart2js or AoT).
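
Here is a rough sketch of such a mirrors-based library (VM only; the helper name is made up, and it ignores nested objects, getters, and inherited fields):

import 'dart:convert';
import 'dart:mirrors';

// Serializes the instance fields declared on the object's class.
String serializeWithMirrors(Object o) {
  var instanceMirror = reflect(o);
  var fields = <String, dynamic>{};
  instanceMirror.type.declarations.forEach((symbol, declaration) {
    if (declaration is VariableMirror && !declaration.isStatic) {
      fields[MirrorSystem.getName(symbol)] =
          instanceMirror.getField(symbol).reflectee;
    }
  });
  return JSON.encode(fields);
}

class Point {
  final int x, y;
  Point(this.x, this.y);
}

void main() {
  print(serializeWithMirrors(new Point(1, 2)));  // {"x":1,"y":2}
}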

We could imagine a better version of mirrors, let's call it dart:reflect, that is more suitable for compilation. If it is similar to the reflectable package, it would require annotations on a class to make it reflectable (and thus serializable). Depending on the concrete implementation this could solve the performance problem, but it might come with the limitation that external users wouldn't be able to provide serializations for classes they didn't write. A new dart:reflect library would, in any case, be a major undertaking.

An alternative would be to provide a new "class" type that automatically provides serialization methods. For example, we could add a value keyword that takes the place of class:

value Point {
  int x;
  int y;
}

Such a value class would not need a toJson and (at least in this example) would also automatically come with a constructor. We could even add hashCode and == implementations that use the fields.
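
For comparison, here is a rough (and entirely hypothetical) sketch of what such a value declaration could expand to, so that users wouldn't have to write this by hand:

class Point {
  final int x;
  final int y;

  Point(this.x, this.y);

  Map<String, int> toJson() => {'x': x, 'y': y};

  bool operator ==(other) => other is Point && other.x == x && other.y == y;
  int get hashCode => x.hashCode ^ y.hashCode;
}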

Once we have such a class, we need to think about value types. That is, should these instances have an identity or not? Should we specify these value classes in such a way that it would be possible to pass them on the stack?

If yes, would these classes have mutable or immutable fields? If they were mutable, then their semantics would differ from normal classes:

var p = new Point(1, 2);
var p2 = p;  // Value type -> copy.
p2.x = 3;
print(p); // 1, 2
print(p2); // 3, 2

Without mutable fields, the difference between value types and non-value types would only be visible when using identical.

Similarly, value types have lots of questions with respect to Object (are they Objects or not?), and how lists of them should be implemented in the VM.

When talking about value types we should also think about tuples. Tuples are (for this document) just a collection of values with different types. For example a Tuple<int, String> would be a class with two fields (let's just call them fst and snd) of types int and String. It would make sense to make tuples value types.

One could even imagine that value is just a layer on top of tuples. That is, that every value class is just a different view of a similarly typed tuple.

value Point {
  int x;
  int y;

  bool get isOrigin => x == 0 && y == 0;
}

value Vector {
  int x;
  int y;

  double get length => sqrt(x*x + y*y);
}

var p = new Point(3, 4);
print(p.isOrigin);  // false.
Tuple<int, int> t = p;  // It's the same thing.
Vector v = t;  // Still the same thing.
print(v.length);  // 5.0.

This approach would reduce the output code size, since many value types could be mapped to the same tuple. However, it would make our serialization go out the window: the serialization function would only be able to see that the received object was a tuple, but not which one.

String serialize(Object o) {
  if (o is Tuple) { /* now what? */ }
  ...
}

That doesn't mean that the value/tuple approach is doomed. We could provide static descriptions of classes that users could pass to the serialization function. Every value class (or possibly even every class) could have a static field shape that returns the shape of the value type:

var p = new Point(3, 4);
var shape = Point.shape;
serialize(p, shape);

String serialize(Object o, Shape shape) {
  if (o is int) print(o);
  if (o is String) ...;
 
  // A non-trivial object:
  print("{");
  for (var field in shape.fields) {
    print("${field.name}: ${serialize(field.value(o), field.shape)},");
  }
  print("}");
}

This would require that every call to serialize knew the static type of the object that should be serialized. It wouldn't work if the serialization needed to do dynamic is-checks. This sounds like a big restriction, but in practice, serialization often happens on concrete types, or the type of the object can be determined without is checks.

In some sense this static value type would also be very similar to extension methods. They would almost be extension classes. This means that we should look at extension methods too, and make sure that they would fit.

The exploration doesn't really stop here, but let's have a look at another, completely different alternative for serialization: macros.

If Dart had macros, then users could just write the value type themselves.

define-macro value ID { CLASS_BODY };

Ast value(Identifier id, ClassBody body) { ... }

This is, admittedly, just a very hypothetical macro system, but it would immediately remove the need for a value type that provides the toJson method. In fact, a good macro system makes many language features unnecessary, since they can just be implemented on top of macros.

The disadvantage of macros is that they are complicated to get right (even Scheme has two variants) and that they are almost too powerful. They would definitely be a major undertaking for Dart.

This section is getting long, so I won't cover other alternatives, but I hope I showed how language features can have dependencies that overlap. For each of the related features we would now explore other alternatives and see if they would conflict with a specific direction.

As mentioned in the beginning of this section, we value a consistent feel of the language. This means that these explorations are very important to make sure that things fit together. This is also the reason that features that work well in other languages might not be a great fit for Dart. We need to evaluate each feature and see how it behaves with the current and the future planned features of Dart.
