Vet this! Proposal for language change to implement destructuring and aggregating assignment


Sam Hughes

Sep 13, 2021, 11:08:20 AM

to golang-nuts

Go is very intent on not implicitly casting types. This makes the expression of a destructuring assignment difficult to imagine in Go. The following represents a proposal to express both destructuring assignment (x,y := ZtoXandY(Z)) and aggregating assignment (x := XfromYandZ(y, z)), across arbitrary but assignable constituent types, with a syntax that has no compatibility risk, and which feels idiomatic to Go.

The following is based on the Go2 Language change proposal guidelines, found on the golang github.

I am an intermediate Golang developer, with a few years under my belt, but in a mixed-languages environment.
I claim proficiency in C/++, Javascript/Typescript, Rust, and....*ahem*python*ahem*-What'd I say? oh, nothing, nothing. Passing familiarity with BASIC, PHP, ColdFusion, and Java.
This change will likely have little net impact on difficulty. It adds a new syntactic form, but allows convenience patterns that more experienced devs would have baked in, or which might be precluded by external constraints, without requiring any refactoring.
Similar concepts have been proposed, but for specific use-cases. [1], [2]. This proposal describes a syntax that accidentally serves both earlier use-cases.
The proposal specifically considers API implementation having external contracts allowing for type-conflation, but also modeling and graphics libraries.
The proposed change is as follows:
- The proposal guidelines suggest clarifying what specific changes would be needed, the impact this would have on the Go language spec, and then an informal description of the language change. Starting with an explanation of the utility of the proposal:
  - Given that any allocated type can be described by the pair of a raw pointer and length, and may then be interpreted validly or invalidly by any type which has the same value domain and size, the following should hold:
    - These statements are well-defined, correlate well to literal instructions, and should be legal:
      - zero_th, one_th, two_th := [3]int{10,20,30}
      - zero_th, one_th, two_th := struct{One,Two,Three int}{10,20,30}
      - converted := [3]int{10,20,30}.{struct{One,Two,Three int}}
      - converted := struct{One,Two,Three int}{10,20,30}.{[3]int}
      - partial, two_th := [3]int{10,20,30}.{[2]int, int}
      - partial, two_th := struct{One,Two,Three int}{10,20,30}.{struct{One,Two int}, int}
      - partial, _ := [3]int{10,20,30}.{[2]int, int}
      - _, two_th := struct{One,Two,Three int}{10,20,30}.{struct{One,Two int}, int}
    - Conversely, such a syntactic feature should render the following statements illegal, as they include declared variables, rendering the assignments as implicit type conversions.
      - var zero_th, one_th, two_th uint = [3]int{10,20,30}
      - var zero_th, one_th, two_th *int = [3]int{10,20,30}
    - If that is dealt with by providing an explicit cast with memory compatible types, the following should be legal:
      - var zero_th, one_th, two_th uint = [3]int{10,20,30}.{[3]uint}
      - var zero_th, one_th, two_th uint = [3]int{10,20,30}.{uint, uint, uint}
    - Because the class of statement represents a reinterpretation of that memory, constrained by type-memory compatibility, the following also should be legal:
      - var zero_th, one_th, two_th *int; *zero_th, *one_th, *two_th = [3]int{10,20,30}
    - A point I expect to be controversial would be the following related statement:
      - var r *struct{X,Y int}; *r, _ = struct{X,Y,Z int}{1,2,3}.{struct{X,Y int}, int}
    - More controversial still would be the following, for reasons I trust are obvious:
      - var r *struct{X,Y int}; *r, _ = struct{x,y,z int}{1,2,3}.{struct{X,Y int}, int}
  - An important class of statements that I believe would be immensely convenient, but which I expect to be controversial is in a situation as follows. Given the following definitions:
    - type Point struct{X, Y float64}
    - type Box struct{X,Y,W,H float64}
    - type Circle struct{X,Y,R float64}
    - func Move(p *Point)...
    - boxes := []*Box{&Box{1.0, 1.0, 1.0, 1.0}, &Box{0.0, 1.0, 1.0, 1.0}}
    - circles := []*Circle{&Circle{1.0,1.0, 1.0}, &Circle{0.0,1.0, 1.0}}
  - I propose the following syntax be legal:
    - var p *Point; for _, box := range boxes {; *p, _ = box.{Point, Point}; Move(p);}
    - var p *Point; for _, circle := range circles {; *p, _ = circle.{Point, float64}; Move(p);}
  - While clearly interfaces are an existing solution for a problem such as above, the following points apply:
    - Most ironically, it is an XY-class problem, standing in for API implementation where the structure is at some level ruled by external contracts, and where that contract may require a large number of different but strongly related types.
    - Without capability to render structured types into component structured types, one can structure types as structs with the component types embedded, but implementing that pattern for new logic in an existing environment can require significant refactoring. Providing capability to re-render the memory backing a type, in a safe, type-checked manner, offers a reduction in the amount of code needed to implement new features.
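The embedding pattern mentioned in the last point can be sketched as follows. This assumes you are free to redefine Box and Circle, which is exactly the refactoring cost the point describes; the body of Move is a hypothetical placeholder, since the post only gives its signature.

```go
package main

import "fmt"

type Point struct{ X, Y float64 }

// Box embeds Point, so &b.Point is an ordinary *Point
// that refers to memory inside the Box.
type Box struct {
	Point
	W, H float64
}

// Circle embeds Point in the same way.
type Circle struct {
	Point
	R float64
}

// Move has a hypothetical body: it shifts the point by (1, 1).
func Move(p *Point) {
	p.X++
	p.Y++
}

func main() {
	boxes := []*Box{{Point{1, 1}, 1, 1}, {Point{0, 1}, 1, 1}}
	circles := []*Circle{{Point{1, 1}, 1}, {Point{0, 1}, 1}}
	for _, b := range boxes {
		Move(&b.Point) // no copy: Move mutates the Box in place
	}
	for _, c := range circles {
		Move(&c.Point)
	}
	fmt.Println(*boxes[0], *circles[0])
}
```

This works without any new syntax, but only for types declared with the embedded field from the start.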
  - Above, only multiple assignment has been shown. Single assignment from multiple values should also be permitted, such as:
    - p := Point{1.0,0.5}; var b Box; b.{Point, Point} = p, Point{}
  - Examples of situations where appropriate handling is less clear:
    - Should it be possible to reinterpret slices as slices of other elements? I suggested handling arrays of primitives as being interpretable as primitives and as fields of a struct, but when the array element type is structured data, and even more so when the element of a slice is structured data, it's not clear that there is a natural way to express that while maintaining rough equivalence with the underlying instructions a language element is rendered as. For instance, syntax such as:
      - x, y, r := circles.{[]float64, []float64, []float64}
    - Such a statement would require significant compiler gymnastics to render. Supporting interpretation of only the options of either the entire type-qualified memory, e.g. the entire array, to the last byte, or else each unit individually, ensures that new compiler machinery or any special exceptions are not needed to interpret the symbols being processed.
    - Would casting to a result of a different memory width be permitted in some way? E.g., a pointer to an int is the same width as a pointer to a uint8, and the meaning is clear, provided the referenced value fits in a uint8, but a byte-for-byte reading of a struct with a field of type *int isn't thrown off by interpreting the pointer as a pointer to a castable type. A reinterpretation rendering an int as a uint8 is clearly inappropriate, and breaks alignment. A possible solution would be to add an additional construct:
      - x,y,z := circles[0].{float64 as float32, float64 as float32, float64 as float32}
    - It is not clear to me what level of enforcement sits behind the exported-field/unexported-field language feature, but it strikes me as potentially problematic if exportedness is not respected. I'm not clear on whether that would mean that any unexported fields are skipped, rendering the type that much "skinnier" for importability, or if fields assigned from using destructuring would be rendered as a zero-value, and be unchanged when assigned-to. Both options seem very problematic, and it seems more sane to not respect exportedness. This could actually be the most difficult single issue with this proposal.
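On the enforcement question, one data point from current Go: besides the compile-time check on selectors across package boundaries, the reflect package also refuses to write unexported fields, reporting them as not settable. The sketch below shows that behaviour; record and settableFields are hypothetical names made up for this illustration. Code using unsafe can still bypass both checks.

```go
package main

import (
	"fmt"
	"reflect"
)

// record is a hypothetical type with one exported and one unexported field.
type record struct {
	Public  int
	private int
}

// settableFields reports, per field name, whether reflection is allowed
// to write that field through a pointer to the struct.
func settableFields(ptr interface{}) map[string]bool {
	v := reflect.ValueOf(ptr).Elem()
	out := make(map[string]bool, v.NumField())
	for i := 0; i < v.NumField(); i++ {
		out[v.Type().Field(i).Name] = v.Field(i).CanSet()
	}
	return out
}

func main() {
	r := record{Public: 1, private: 2}
	fmt.Println(settableFields(&r))
}
```

A destructuring form that reads or writes unexported fields from outside their package would therefore be more permissive than reflection is today.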
- The impact this would have on the Go language spec is actually fairly minor, given that it uses no new symbols and introduces only a single new expression syntax. The new expression syntax is:
  - For an identifier or call expression x, and for N type expressions, the following statement can be used to reinterpret the memory representation of the value of x:
    - x.{T, ...N}
  - for x as a call expression:
    - if x returns a single primitive value, the type list may only include one type expression which is one of:
      - a primitive type to which the type of x is convertible.
      - a single-field struct whose field is either a simple value of a type assignable from x, or a length-1 array of a type assignable from x.
      - a length-1 array of a type to which x is convertible, or a struct that fits the above description.
    - if x returns more than 1 primitive value, the above applies between each item, matched ordinally, between the list of values returned by x and the list of types in the type list.
    - if x returns any number of complex values:
      - each value can be reinterpreted, but any value treated so must be entirely described, whether it is fully utilized in assignment or not.
      - no combination of values on the same side of an assignment operator can be made. A reinterpretation cannot span multiple elements of a returned collection, a reinterpretation cannot span multiple return values, and if a field in a single returned value is a compound type, it cannot be treated as if contiguous with the fields around it. It must either match a field entirely in the reinterpreting type, be selected as an independent value in the reinterpretation type list, or else that field can be itself subject to reinterpretation as a distinct set of 1 or more types in the type list.
      - Each field or element interpreted must be convertible to the interpreting type by casting.
    - if x is an identifier, the above describes the use of x as the subject of assignment, but x may also be used as the target of assignment. Where appropriate, the rules are inverted: for example, if a field in x is a complex type, it cannot be treated as contiguous with the rest of x, and that field must be uniquely subject to one or more values on the right-hand side of the assignment operator.
  - By way of informal description, Go's type system boils down to just a combination of some type data and a big old lump of memory that follows it. With simple, primitive values, you can easily address the entirety of that lump of memory. Compound data types can be quite a bit larger, leaving a lot of room for subtle differences and variations.
    
    If you have a large number of subtly different datatypes, generally, you end up reproducing a lot of logic in associated methods and in function calls that use that data.
    
    This proposal is to provide a way to mask over the parts that are the same, and hide the parts that are different, and then you can define logic specifically for just those sections of fields that each of those related types share.
    
    There are already tools like interfaces that allow you to reduce that repeated-logic load, but even with interfaces, you still frequently end up writing the same code repeatedly. Even if you shift that logic into associated types, you still have to write the glue code that allows your types each to satisfy those interfaces, and any getters/setters you need for behind the interface barrier.
    
    You can also do much of what I'm describing in this proposal today by using raw pointers from the unsafe package, but this is explicitly not covered by any safety guarantees, and limitations on expressions involving type identifiers can still render this a special sort of gymnastics.
    
    This proposal is a suggestion for a syntax to describe instances of compound data types in a way that would allow such trait-centric logic, without requiring as much glue-code as with interfaces, and while working with the type system, to avoid potentially writing code that won't be supported tomorrow.
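To make the unsafe route concrete, here is a minimal sketch of treating the leading fields of a Box as a Point with a raw pointer conversion. It uses the conversion pattern documented in the unsafe package (a *T1 to *T2 conversion where T2 is no larger than T1 and the layouts match), but nothing in the compiler checks that assumption. The helper name pointOf and the body of Move are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Point struct{ X, Y float64 }
type Box struct{ X, Y, W, H float64 }

// pointOf reinterprets the first two fields of a Box as a Point.
// Assumption: Box begins with the same layout as Point. If the field
// order of Box changes, this silently reads and writes the wrong memory.
func pointOf(b *Box) *Point {
	return (*Point)(unsafe.Pointer(b))
}

// Move has a hypothetical body: it shifts the point by (1, 1).
func Move(p *Point) {
	p.X++
	p.Y++
}

func main() {
	b := &Box{1, 1, 1, 1}
	Move(pointOf(b)) // mutates b.X and b.Y through the aliased *Point
	fmt.Println(*b)
}
```

The proposed `box.{Point, Point}` form would express the same aliasing, with the layout compatibility checked at compile time.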
- This change is entirely backwards compatible. This involves a symbol combination (".{") which is nowhere else valid, and is thus not at risk of presenting an ambiguous API.
- As for example code before and after the change, I gave a number of examples in the proposal section. In most cases, the change simply provides improved expressiveness, but it also introduces distinctly new capabilities, as in the case of taking a reference to a struct representing a subset of the fields from instances of one or more other types, and acting on that struct reference as a proxy for those instances. This does resemble interaction as with interfaces, but promises to be considerably more svelte in code-weight.
- Tooling impact of this change isn't something I've spent time considering, but I am confident that it would affect vet and gofmt.
  - Otherwise, I don't expect much change. It is a focused change, using existing symbols but in a combination not previously valid, and will not have impacts beyond the level of expression resolution. As such, go vet would need to be enhanced to validate a new kind of expression, and gofmt would need to be enhanced to properly adjust expressions bearing this new syntax.
  - This proposal was designed with symbol-resolution as an important factor, and was actually inspired by going over instruction representation of compiled code. As such, I'm fairly confident that the changes would be unobservable past IR resolution, and the runtime cost would be non-existent. I'm intentionally not arguing for performance benefits, but I'm much more inclined to expect improved performance from code using the syntax proposed, over interfaces or raw pointers.
- I prepared a rudimentary example using raw pointers, to demonstrate the concept in a few of my suggested cases. I noted the possible issue of resolving unexported fields, and I am taking no steps to resolve that. I'm also not taking steps to validate convertibility, and I am intending this purely for demonstration.
  
  Turns out that when you're hastily scribbling at 4AM, you make mistakes. I'll fix it later, but right now it's not passing. Also, inconveniently, I wrote it using generics, meaning you'll need to compile locally using the compiler param `-gcflags=-G=3` to run. As of 2021-09-14, expect to see the structs only partially copied.
  - https://play.golang.org/p/mDp04vPCRvt
- Impact on language spec was described as part of the "Proposed change" section. It is stylistically similar to type annotation, but is a distinct syntactic construct, and is not at risk for ambiguity with any other features.
- The orthogonality topic has been addressed, but in short this bears similarity in role to the following features:
  - Interfaces: where interfaces represent generic representation over method subset, this proposal represents generic representation over field-subset at specific offset. Where interfaces are designed around broad applicability of implementations, this feature proposal represents very specific implementation of behavior which would be nearly identical across "implementing" types, for very specific functionality on a range of subtly different types.
  - generics: where generics offer an API by which general behaviors, templating, and collection can be written to apply to many possible types, this represents an API by which a very specific type which may exist as a subset of the fields of a range of types can be implemented once for that specific type, and rather than being general across that range of types, this API offers mechanism to reduce the variety of types to a single, distinct, concrete type, which can be subject to clearly defined behaviors.
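For contrast with the interface comparison above, here is a sketch of the interface-based version of the Box/Circle example, showing the per-type glue code it needs. The interface name Positioned, its methods, and the dx/dy parameters on Move are hypothetical choices for this illustration and do not appear in the proposal.

```go
package main

import "fmt"

// Positioned is a hypothetical interface over "things with an X and a Y".
type Positioned interface {
	Position() (x, y float64)
	SetPosition(x, y float64)
}

type Box struct{ X, Y, W, H float64 }
type Circle struct{ X, Y, R float64 }

// Each type has to repeat the same getter and setter glue.
func (b *Box) Position() (float64, float64) { return b.X, b.Y }
func (b *Box) SetPosition(x, y float64)     { b.X, b.Y = x, y }

func (c *Circle) Position() (float64, float64) { return c.X, c.Y }
func (c *Circle) SetPosition(x, y float64)     { c.X, c.Y = x, y }

// Move is written once against the method set rather than the fields.
func Move(p Positioned, dx, dy float64) {
	x, y := p.Position()
	p.SetPosition(x+dx, y+dy)
}

func main() {
	shapes := []Positioned{&Box{1, 1, 1, 1}, &Circle{0, 1, 1}}
	for _, s := range shapes {
		Move(s, 1, 1)
		fmt.Println(s.Position())
	}
}
```

Under the proposal, Move would instead take a *Point and the two method pairs would not be needed.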
- The final three bullets are for performance goals, error handling, and generics. I have explicitly avoided arguing from prospective performance, though I expect some very positive implications.
- Error handling is not covered at all under this proposal, and is entirely outside of this scope.
- This proposal is comparable to generics, being motivated by cases of behavior shared across subtly different structured data types, but it is distinctly focused on clearly defined behavior which can apply to a range of well defined types, in stark contrast to both the generics feature and the well-established interfaces functionality.
