[Proposal] strict binary parts split API

christ...@gmail.com

May 2, 2017, 12:50:31 AM

to elixir-lang-core

Hey all!

The binary split functions in Regex and String return a list, following the global strategy of the underlying :binary.split/3, and support a parts: n option that limits how many parts the split produces:

https://hexdocs.pm/elixir/Regex.html#split/3

https://hexdocs.pm/elixir/String.html#split/3

Additionally, Regex, String, and Path have default split patterns pre-programmed:

https://hexdocs.pm/elixir/Regex.html#split/1

https://hexdocs.pm/elixir/String.html#split/1

https://hexdocs.pm/elixir/Path.html#split/1

All of these return lists rather than tuples, even the parts variants. The parts variants return lists guaranteed to be at most as long as the part count requested, but not necessarily that long.
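For example, with the current String.split/3, a parts: 3 request on a block that contains only one null byte quietly yields two parts instead of three, with no error:

```elixir
# Well-formed block: two null-byte delimiters, so three fields.
String.split("name\0type\0payload", <<0>>, parts: 3)
# => ["name", "type", "payload"]

# Truncated block: only one delimiter, yet the call still "succeeds".
String.split("name\0payload", <<0>>, parts: 3)
# => ["name", "payload"]
```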

What I'm wishing for right now is a more assertive family of binary splitting functions, sitting between pure strict-length binary pattern matching and fast-and-loose list results: a function that guarantees exactly the requested number of parts is returned, or nothing at all.

My current use case is splitting apart a binary format that embeds null-byte-delimited headers, where the absence of the precise number of headers indicates a corrupted data block. Re-asserting that the returned list has the requested length feels like an unnecessary handshake with the split-parts API. I make this proposal with the dim recollection of having wanted this several times before, too.

If I can garner some support from core, I'd like to propose a &split!/3 API for Regex and String.

These functions would have a split!(binary, pattern, parts) signature that raises if the requested parts cannot be generated.

To help enforce the parts-length requirement, the parts could be returned as a tuple instead, similar to the two-tuple split functions mentioned at the end of this post.
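A minimal sketch of the proposed split!/3, assuming it lives in a hypothetical StrictSplit module (not part of Elixir), raises ArgumentError on a short result, and returns a tuple. It is simply the split/3, length check, and List.to_tuple/1 combination wrapped in one call:

```elixir
defmodule StrictSplit do
  @moduledoc """
  Hypothetical sketch of the proposed split!/3; not part of Elixir's stdlib.
  """

  @doc """
  Splits `binary` on `pattern` into exactly `parts` pieces and returns them
  as a tuple. Raises ArgumentError if fewer pieces can be produced.
  """
  def split!(binary, pattern, parts)
      when is_binary(binary) and is_integer(parts) and parts > 0 do
    # parts: n stops String.split/3 after n - 1 splits, so the list can
    # never be longer than `parts`; it can only be shorter.
    result = String.split(binary, pattern, parts: parts)

    if length(result) == parts do
      List.to_tuple(result)
    else
      raise ArgumentError,
            "expected #{parts} parts, got #{length(result)} from #{inspect(binary)}"
    end
  end
end
```

With it, the header case becomes a single match, e.g. `{name, type, payload} = StrictSplit.split!(block, <<0>>, 3)`, and a corrupted block raises instead of producing a short list. Because String.split/3 also accepts a regex as its pattern, the same sketch covers the Regex case.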

There are two reasons why I can conceive they might merit a place in the stdlib:

1. This is a common-enough desire that members of core can empathize with it.
2. This could easily be optimized beyond a naive implementation of split/3, a list-length check, and &List.to_tuple/1.

However, I'm not confident in either point, so I thought I'd float this here before investigating a PR.
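As one illustration of what an implementation other than the naive one from point 2 might look like, here is a sketch in a hypothetical StrictSplit.Incremental module. It peels off one field at a time with the non-global :binary.split/2 and gives up at the first missing delimiter, returning `{:ok, tuple}` or `:error`. It only handles plain binary patterns, still accumulates a short list before List.to_tuple/1, and has not been benchmarked, so whether it is actually faster is an open question:

```elixir
defmodule StrictSplit.Incremental do
  @moduledoc """
  Hypothetical sketch: split into exactly `parts` pieces using repeated
  non-global :binary.split/2 calls, failing as soon as a delimiter is missing.
  """

  @spec split(binary(), binary(), pos_integer()) :: {:ok, tuple()} | :error
  def split(binary, pattern, parts)
      when is_binary(binary) and is_binary(pattern) and pattern != "" and
             is_integer(parts) and parts > 0 do
    # Exactly parts - 1 delimiters must be found.
    walk(binary, pattern, parts - 1, [])
  end

  # No more delimiters needed: the rest of the binary is the final part.
  defp walk(rest, _pattern, 0, acc),
    do: {:ok, List.to_tuple(Enum.reverse([rest | acc]))}

  defp walk(rest, pattern, remaining, acc) do
    # Without the :global option, :binary.split/2 splits at the first match only.
    case :binary.split(rest, pattern) do
      [field, tail] -> walk(tail, pattern, remaining - 1, [field | acc])
      # Delimiter missing: too few parts, so stop immediately.
      [_no_match] -> :error
    end
  end
end
```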

An additional capability would be to support a split!(binary, parts) implementation for Regex and String that leverages the underlying &split/1 pattern default present in both modules.
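A possible shape for split!/2 on top of String.split/1's default (split on Unicode whitespace, dropping empty strings), sketched in a hypothetical StrictSplit.Whitespace module. It assumes "parts" here means exactly that many whitespace-separated fields. This differs from parts: n in split/3, where the last part keeps the unsplit remainder:

```elixir
defmodule StrictSplit.Whitespace do
  @moduledoc """
  Hypothetical sketch of split!/2 using String.split/1's default pattern.
  Assumes `parts` means exactly that many whitespace-separated fields.
  """

  def split!(binary, parts) when is_binary(binary) and is_integer(parts) and parts > 0 do
    # String.split/1 splits on whitespace runs and drops leading/trailing blanks.
    fields = String.split(binary)

    case length(fields) do
      ^parts -> List.to_tuple(fields)
      n -> raise ArgumentError, "expected #{parts} whitespace-separated fields, got #{n}"
    end
  end
end
```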

Also, if it proves to make sense, Path.split!/{2,3} could be part of such a feature. However, I doubt the underlying :filename.split/1 call would benefit from the optimizations proposed in point 2, should they exist, although I imagine those variants could be implemented by hand fairly efficiently.

(Key-value split functions have two-tuple results, but they don't apply to this discussion.)

https://hexdocs.pm/elixir/Dict.html#split/2

https://hexdocs.pm/elixir/HashDict.html#split/2

https://hexdocs.pm/elixir/Keyword.html#split/2

https://hexdocs.pm/elixir/Map.html#split/2

(Enumerable splitting has two-tuple results, but with much simpler intentions than common binary splitting strategies. However, an Enum.split(enumerable, item, parts: n) variant might find a place there too, alongside an Enum.split!/{2,3} implementation. But lacking a default split pattern, it would have to omit a default all-parts Enum.split(enumerable, item) to avoid clashing with the existing Enum.split(enumerable, count).)

https://hexdocs.pm/elixir/Enum.html#split/2
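To make the Enum.split(enumerable, item, parts: n) idea concrete, here is a list-only sketch in a hypothetical ListSplit module. It uses a positional `parts` argument rather than an options list. It splits on occurrences of `item` into at most `parts` sublists, mirroring String.split/3's parts: n semantics: the last sublist keeps the unsplit remainder, and fewer parts come back when `item` is scarce.

```elixir
defmodule ListSplit do
  @moduledoc """
  Hypothetical list-only sketch of an Enum.split(enumerable, item, parts: n)-style
  function: splits on `item` into at most `parts` sublists.
  """

  def split(list, item, parts) when is_list(list) and is_integer(parts) and parts > 0 do
    do_split(list, item, parts, [], [])
  end

  # Only one part left: everything remaining belongs to it.
  defp do_split(rest, _item, 1, current, acc),
    do: Enum.reverse(acc, [Enum.reverse(current, rest)])

  # Input exhausted before reaching the part limit.
  defp do_split([], _item, _parts, current, acc),
    do: Enum.reverse(acc, [Enum.reverse(current)])

  # Delimiter found (repeating `item` in the head requires equality):
  # close the current part and start a new one.
  defp do_split([item | rest], item, parts, current, acc),
    do: do_split(rest, item, parts - 1, [], [Enum.reverse(current) | acc])

  # Ordinary element: add it to the part being built.
  defp do_split([x | rest], item, parts, current, acc),
    do: do_split(rest, item, parts, [x | current], acc)
end
```

A strict Enum.split!/3 counterpart would add the same length check as the binary versions.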

christ...@gmail.com

May 2, 2017, 12:54:29 AM

to elixir-lang-core

To clarify, my issue is that I want something more assertive than the current String.split/3, that lets me skip the boilerplate of:

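```elixir
string = "abcd"
parts = 3

# Pass the variable, not a repeated literal, so the split and the check agree.
result = String.split(string, "e", parts: parts)

# parts: n can return a shorter list, so the length must be re-checked.
if length(result) == parts do
  {:ok, List.to_tuple(result)}
else
  {:error, :i_need_more_parts}
end
```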

I omitted this in my original post by accident.
