[Proposal] Fix Enum to return enumerables instead of lists

53 views
Skip to first unread message

Mihai Potra

unread,
Oct 16, 2023, 3:23:02 AM10/16/23
to elixir-lang-core
The Enum module currently in many of its defined functions (i.e. reject/2) relies on lists and forces Enumerable implementations to operate and returns lists.

One example of this is the Enum.reject/2 function that operates with Enumerable.reduce/3 but then it pipes everything into :lists.reverse/1. This forces all Enumerable implementations to return lists so that Enum.reject/2 always returns a list.

Ideally, Enum module would work with any Enumerable implementation and its function return the Enumerable data type that the implementation provides. Having Enum conform to operating with Enumerables this way would allow less friction in working with enumerables and better support for implementing Enumerable/Collectable concepts with structs and data types.

As an example, removing a key or key-value from a map would be a simple Enum.reject call instead of piping the map into the Enum.reject/2 and then into a list to map conversion.
I've stumbled upon this while writing a library that would operate with Source and Sinks as enumerables and collectables to allow for all sorts of predefined types and custom defined structs, and got blocked at abstracting away some of its core features.

The good news is that finding a solutions might not be that complicated and I'd be happy to make a PR for it (already tried replacing :lists.reverse/1 with a different function that respects the data type returned by Enumerable.reduce/3)

Challenges
The biggest challenge here is obviously backwards compatibility with code that relies on the Enum functions that return a list to keep returning a list, so here's where I'd want a discussion and guidance on the best course of action. As far as I can see, there's three options:

1. Have a new module written out of `Enum` (say Enum2) and deprecate Enum in the future - a better module name than just Enum2 is definitely a must (looking for suggestions). I suspect this would also open the door to simplifying the Stream module.

2. Extend the Enumerable protocol to include a `reverse/1` function, and a `new/1` function to replace the empty list constructor `[]` used by Enum. Need to consider existing user code that implements Enumerable as it would break without defining these two new functions.

3. Create a new protocol that Enum would use, with fallback to current behaviour. Best for supporting current user implementations of Enumerable and keeping current behaviour, but doesn't make sense on the long term.

Also as a last decision to be made is how to deal with Map. Ideally, Enum functions would also return a Map if the enumerable is a map, but many rely on the current behaviour of it returning lists because this has been the status quo. This makes me favour option 1 of having a new module to replace Enum in the future.

Looking forward to hearing your thoughts and suggestions/proposals for how to best take on this.

Kind regards,
Mihai

Kurtis Rainbolt-Greene

unread,
Oct 16, 2023, 3:36:42 AM10/16/23
to elixir-l...@googlegroups.com
I believe there's actually a name for this pattern and it's "endofunctor": https://www.quora.com/What-is-an-endofunctor

ramda (and my own unction) adheres to this philosophy and it works very well.

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/9d0f6096-43aa-441b-9060-1a4949ff8e67n%40googlegroups.com.


--
Kurtis Rainbolt-Greene,
Software Developer & Founder of Difference Engineers

José Valim

unread,
Oct 16, 2023, 3:37:01 AM10/16/23
to elixir-l...@googlegroups.com
Enum always returns lists by design, because it was designed to be a single abstraction that can:

1. enumerate elements while skipping
2. support both eager and lazy with the same protocol
3. support halting (such as take)
4. support zipping
5. does not cause dangling resources (so you can work with files)

Many resources and operations above do not work with the concept of returning the same data structure. For example, how do I return the same data-structure for Enum.map(1..10, &:rand.uniform/1)? Or for an io stream? Or for a repo stream?

Not even something as simple as: Enum.map(map, fn {_, v} -> v * 2 end)? So when exactly Enum.map would return a map data structure?

And, as you said, this would be majorly backwards incompatible.

What you want to achieve needs to be done by a new protocol and a new API. So I recommend exploring it within the context of your library. :)

Marten Wijnja

unread,
Oct 17, 2023, 7:09:11 AM10/17/23
to elixir-l...@googlegroups.com

To give some more context to 'exploring a new protocol + API in the context in a library':

As Kurtis Rainbolt-Greene also mentioned: (A datatype supporting this) structure-preserving transformation operation is known as an `(Endo)functor` (and in some other places as a `Mappable`).

`Enumerable`, on the other hand, implements the transformation operation known in other languages as `Foldable` (or, in more maths-heavy contexts, a `Catamorphism`). This transformation is inherently not structure-preserving.
And while that means the structure is lost, it also allows it to be implemented by a much wider range of commonly-used datatypes (such as maps, sets, file handles, etc.) that cannot implement a structure-preserving mapping.

It also means that `Enum`/`Stream` play more nicely with `for` expressions, which are built around the same abstraction.

Not everything always turns the result into a list; that is what `Collectable` is for (which is known in some other languages as `Semigroup`/`Monoid`). Many `Enum`functions and also `for` allow you to specify a `Collectable` directly. But for those who do not, there is no issue with first turning the result into an intermediate list and turning that into a `Collectable` later at the end of a pipeline  But it is also always fine to first create a list and only turn it into the desired `Collectable` later using `Enum.into`. This is always possible (because of what lists are, mathematically speaking,) and also usually reasonably efficient.

And for some (but not all) collections, structure is not really lost when converting to a list and back. (Such as maps (when viewed as key-value pairs) and sets).


Does that mean that `Functor`s are never used/never useful in Elixir? No. There are definite use-cases, though not really in Elixir's standard library.
Currently we see them used in two ways in the ecosystem:
1) Certain datastructures (ex: trees, graphs, tensors) really desire a way to transform their elements while preserving their structure. Elixir libraries implementing them often define a specialized `map` or `transform` function.
2) Libraries exist which define a generalized `Functor` protocol. The most well-known is probably Witchcraft. But for many projects a dependency like this might be overkill.


I hope this information can help you to make an informed decision on how to design your Sources/Sinks library. There is quite a bit of prior art out there which you can look at to get started :-)

~Marten/Qqwy

Reply all
Reply to author
Forward
0 new messages