term_to_binary/2 with atom cache and/or pid_info/1

Loïc Hoguin

Mar 22, 2021, 7:08:25 AM
to erlang-q...@erlang.org
Hello,

Currently the Erlang Term Format has two variants:

* the full-featured format that includes different forms of atom caches

* the simpler term_to_binary/1 format that does not

This is not a satisfying state of affairs: sometimes we want to use
term_to_binary/1 for protocols or when exchanging data, but the lack
of an atom cache can result in us sending a lot of 'undefined' atoms
in string form.
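
To make the cost concrete, here is a minimal sketch that measures how
the encoded size grows when the same atom is repeated. The module name
atom_cost_sketch and the function demo/0 are made up for illustration;
the exact byte counts depend on the OTP release, so none are claimed
here.

```erlang
-module(atom_cost_sketch).
-export([demo/0]).

%% Hypothetical helper, not part of OTP. Returns the encoded size of a
%% single 'undefined' atom and of a list holding 1000 of them.
demo() ->
    One = term_to_binary(undefined),
    %% Without an atom cache every occurrence carries the atom text again,
    %% so the second size grows roughly linearly with the list length.
    Many = term_to_binary(lists:duplicate(1000, undefined)),
    {byte_size(One), byte_size(Many)}.
```

Calling atom_cost_sketch:demo() in a shell shows that the list costs
at least nine bytes of atom text per element.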

=> Should term_to_binary/1 allow setting up an atom cache?
Perhaps the cache could be maintained as a map to be encoded
separately by the user. This could also allow predefining
the most common atoms that could then never be sent (for
example #{undefined => 1, true => 2, false => 3}). Whatever
the interface, we should reuse as much of the distribution
header atom cache code as possible.

An alternative would be to build our own format loosely based on the
Erlang Term Format. But in that scenario we end up lacking at least
the pid_info/1 and ref_info/1 functions that would allow us to encode
a pid/reference without having to use either term_to_binary/1 or
{pid,ref}_to_list/1. On the other side the pid/reference can be
recomposed via a pid_from_info/1 or ref_from_info/1 type of function.

These functions can be useful to have regardless of the answer to the
first question above. For example pid_info/1 is used in Mnesia here:

https://github.com/erlang/otp/blob/master/lib/mnesia/src/mnesia_locker.erl#L1270

And also in RabbitMQ here, as well as pid_from_info/1:

https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/pid_recomposition.erl

I've also been writing similar code when experimenting with custom
distribution drivers.
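
For illustration, the general workaround looks roughly like the sketch
below. It is not the Mnesia or RabbitMQ code. It assumes OTP 23 or
later, where term_to_binary/1 encodes pids as NEW_PID_EXT, and the
module name and map keys are invented for the example.

```erlang
-module(pid_info_sketch).
-export([pid_info/1, pid_from_info/1]).

%% Assumption: OTP 23+, where a pid is encoded as NEW_PID_EXT (tag 88):
%% the encoded node atom followed by 32-bit Id, Serial and Creation.
pid_info(Pid) when is_pid(Pid) ->
    <<131, 88, Rest/binary>> = term_to_binary(Pid),
    %% The last 12 bytes are Id, Serial and Creation; the rest is the node.
    NodeLen = byte_size(Rest) - 12,
    <<NodeBin:NodeLen/binary, Id:32, Serial:32, Creation:32>> = Rest,
    Node = binary_to_term(<<131, NodeBin/binary>>),
    #{node => Node, id => Id, serial => Serial, creation => Creation}.

%% Rebuilds the pid by re-encoding the same fields and decoding them.
pid_from_info(#{node := Node, id := Id, serial := Serial,
                creation := Creation}) ->
    <<131, NodeBin/binary>> = term_to_binary(Node),
    binary_to_term(<<131, 88, NodeBin/binary,
                     Id:32, Serial:32, Creation:32>>).
```

Both directions go through the external term format, which is exactly
the detour that built-in functions would avoid.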

=> Should erlang:pid_info/1 and erlang:pid_from_info/1 be added?
This is the strongest case as there's code in the wild
already doing this.

=> Should erlang:ref_info/1 and erlang:ref_from_info/1 be added?
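
A similar sketch for references, under the same caveats: it assumes
OTP 23 or later, where term_to_binary/1 encodes references as
NEWER_REFERENCE_EXT, and the module name and map keys are invented for
the example.

```erlang
-module(ref_info_sketch).
-export([ref_info/1, ref_from_info/1]).

%% Assumption: OTP 23+, where a reference is encoded as
%% NEWER_REFERENCE_EXT (tag 90): 16-bit word count, encoded node atom,
%% 32-bit creation, then that many 32-bit id words.
ref_info(Ref) when is_reference(Ref) ->
    <<131, 90, Len:16, Rest/binary>> = term_to_binary(Ref),
    IdSize = Len * 4,
    NodeLen = byte_size(Rest) - 4 - IdSize,
    <<NodeBin:NodeLen/binary, Creation:32, IdBin:IdSize/binary>> = Rest,
    Node = binary_to_term(<<131, NodeBin/binary>>),
    #{node => Node, creation => Creation,
      id => [W || <<W:32>> <= IdBin]}.

%% Rebuilds the reference by re-encoding the same fields.
ref_from_info(#{node := Node, creation := Creation, id := Words}) ->
    <<131, NodeBin/binary>> = term_to_binary(Node),
    IdBin = << <<W:32>> || W <- Words >>,
    binary_to_term(<<131, 90, (length(Words)):16, NodeBin/binary,
                     Creation:32, IdBin/binary>>).
```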

It's possible that ports and funs may benefit as well, but I have
a hard time figuring out when we would want to use a port that
way, and for funs I believe we already have everything we need
as long as they're not anonymous funs.

Cheers,

--
Loïc Hoguin

Rickard Green

Mar 23, 2021, 10:46:43 AM
to Loïc Hoguin, erlang-q...@erlang.org
It is a bit unfortunate that the "creation" value of the node part is so well hidden since the full identifier of a node is its nodename together with its creation. It would have been nice if the node/1 BIF had returned '{Nodename, Creation}' instead of just 'Nodename', but that is too late to change now. Perhaps a nid/1 BIF?

Currently pids, ports and references are the datatypes that contain node identifiers which also are the types the node/0 BIF can handle.

I think it is reasonable to provide functionality for creating such data types from full information, so that alternative protocols won't have to go via the external term format.

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB

Rickard Green

Mar 23, 2021, 10:49:08 AM
to Rickard Green, erlang-q...@erlang.org
That should have been: "the types the node/1 BIF can handle"