[Proposal] String.chunk_by/2

jonar...@gmail.com

Dec 1, 2020, 12:37:20 PM
to elixir-lang-core

This is a proposal for String.chunk_by/2, a generalization of the existing (and oddly specific) String.chunk/2 function. It takes a string and a single-argument predicate function and returns a list of strings, starting a new chunk each time the predicate's result changes.

e.g.
String.chunk_by(" foo   bar ", &(&1 =~ ~r/\w/))
# => [" ", "foo", "   ", "bar", " "]
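For illustration, here is a minimal sketch of the proposed behavior. It is not the actual change to the String module: the module name ChunkBySketch is hypothetical, and it assumes the predicate is applied to each grapheme, which the proposal does not spell out.

```elixir
defmodule ChunkBySketch do
  # Hypothetical stand-in for the proposed String.chunk_by/2.
  # Assumption: the predicate is called once per grapheme.
  def chunk_by(string, fun) when is_binary(string) and is_function(fun, 1) do
    string
    |> String.graphemes()
    # Start a new chunk every time the predicate's result changes.
    |> Enum.chunk_by(fun)
    # Each chunk is a list of graphemes; join it back into a string.
    |> Enum.map(&Enum.join/1)
  end
end

ChunkBySketch.chunk_by(" foo   bar ", &(&1 =~ ~r/\w/))
# => [" ", "foo", "   ", "bar", " "]
```

Joining the returned list always gives back the original string, which is what makes the whitespace-preserving use cases possible.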

The above example makes problems such as "wrap text at <line limit> preserving whitespace" or "truncate string to x number of words while preserving whitespace" simpler.
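As a rough sketch of the "truncate to x words while preserving whitespace" case, the snippet below chunks the string the same way and keeps chunks until the word limit is reached. The module and function names (TruncateSketch, truncate_words/2) are made up for illustration, and a "word" is assumed to be any run of characters matching \w.

```elixir
defmodule TruncateSketch do
  # Hypothetical helper, not part of Elixir. Keeps the first max_words
  # words and every run of non-word characters that precedes the cut.
  def truncate_words(string, max_words) when is_binary(string) and max_words >= 0 do
    word_fun = &(&1 =~ ~r/\w/)

    string
    |> String.graphemes()
    |> Enum.chunk_by(word_fun)
    |> Enum.map(&Enum.join/1)
    |> Enum.reduce_while({[], 0}, fn chunk, {acc, count} ->
      cond do
        # Non-word chunks (whitespace, punctuation) are kept untouched.
        not word_fun.(chunk) -> {:cont, {[chunk | acc], count}}
        # Word chunks count against the limit.
        count < max_words -> {:cont, {[chunk | acc], count + 1}}
        # Limit reached: stop before adding the next word.
        true -> {:halt, {acc, count}}
      end
    end)
    |> elem(0)
    |> Enum.reverse()
    |> Enum.join()
  end
end

TruncateSketch.truncate_words(" foo   bar baz", 2)
# => " foo   bar "
```

Note that this version keeps the whitespace that sits between the last kept word and the first dropped one; trimming it would be a separate decision.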

The actual change would be pretty small: rename `chunk` to `chunk_by`, remove a line, and redefine `chunk` as a slightly specialized call to `chunk_by`.

shanes...@gmail.com

Dec 2, 2020, 1:52:12 PM
to elixir-lang-core
Are there use-cases that you see for this feature that don't fall under String.split/3 with a regex argument?

jonar...@gmail.com

Dec 2, 2020, 2:55:20 PM
to elixir-lang-core
Good question! The implementation of String.split/3 is such that matches of the splitting pattern are discarded. For instance:

iex(1)> String.split("hello world", ~r/\s/)
["hello", "world"]
iex(2)> String.split("hello\tworld", ~r/\s/)
["hello", "world"]
iex(3)> String.split("hello\nworld", ~r/\s/)
["hello", "world"]

The output doesn't include the whitespace the string was split on, and thus a problem such as "truncate string to x number of words/characters while preserving whitespace" cannot be easily solved this way.