Ignoring point 1 for now, and focusing on 2 ...
I've been thinking about the ways that DTrace D scripts use program
information; specifically, how D script actions use data that a user
application makes available to them via DTrace probes.
For example, it is common to correlate events across lots of calls
using a bit of program data. Concretely, a D script may observe a set
of system calls, but to make the observations more useful, it may
need to record information only for a specific file descriptor. This
is very easy to do with DTrace because function and system call
parameters are available to the D script, and those values can be
*copied* into D script variables for use in later actions. It is very
common to see a D variable being compared with a user-level function
parameter as part of a guard to a D script action. In this concrete
example, the actions only trigger if the file descriptor matches the
one being observed.
D scripts support the types available in C. They have no equivalent
for some Erlang datatypes, for example lists, ports, and PIDs.
Initial proposal: convert Erlang-unique datatypes to C-style '\0'
terminated strings (which are supported in D), using Erlang's
formatting functions.
Rationale: DTrace supports C strings, so comparison will work. The
Erlang values are available to D scripts in a consistent format, so
filtering, correlation, etc. all work fine. Importantly, D associative
arrays work with strings, so the same value held in different
variables will match.
Concerns: imposes runtime overhead, which is best avoided if DTrace
is to be 'cheap' to use.
Other benefits: It should be easy to debug.
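
To make the string-based proposal concrete, here is a minimal sketch in
C of what the VM side might do before firing a probe. The struct
demo_pid and the helper names are hypothetical stand-ins invented for
this illustration; they are not the Erlang VM's real term layout or API.
The only point is that two copies of the same value format to identical
'\0' terminated strings, so a string comparison (or an associative
array keyed on the string) matches.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an Erlang PID; NOT the VM's real layout. */
struct demo_pid {
    unsigned node;
    unsigned id;
    unsigned serial;
};

/* Format the PID as a NUL-terminated "<node.id.serial>" string.
 * Returns 0 on success, -1 if the buffer is too small. */
static int demo_pid_to_cstr(const struct demo_pid *p, char *buf, size_t len)
{
    int n = snprintf(buf, len, "<%u.%u.%u>", p->node, p->id, p->serial);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Two copies of the same PID, held in different variables, compare
 * equal once formatted. Returns 1 on a match, 0 otherwise. */
static int demo_same_pid_matches(void)
{
    struct demo_pid a = {0, 42, 0};
    struct demo_pid b = a;            /* same value, different storage */
    char sa[32], sb[32];

    if (demo_pid_to_cstr(&a, sa, sizeof sa) != 0) return 0;
    if (demo_pid_to_cstr(&b, sb, sizeof sb) != 0) return 0;
    return strcmp(sa, sb) == 0;
}
```

The formatting call on every probe firing is where the runtime overhead
mentioned under Concerns would come from.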
Alternative proposal: use references (pointers) for Erlang-unique
datatypes.
Benefits: likely to impose a minimal time and space overhead, which
is consistent with DTrace.
Concerns: Erlang garbage collection may 'move' pointers (i.e. move
the values pointed to), so that correlation etc. breaks. It may be
hard to debug as GC may be non-deterministic.
Mitigation action: find out if a pointer to a value *can* change,
because if it can't it may provide optimal performance.
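
The concern about moved values can be sketched in C as well. This is a
simulation under stated assumptions, not a description of how Erlang's
collector behaves: demo_term and demo_gc_move are hypothetical names,
and the "collection" is just a copy to fresh storage. It shows what a D
script would see *if* a value were relocated between two probe firings:
the contents are unchanged, but the address recorded earlier no longer
matches.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap value; not the Erlang VM's real representation. */
struct demo_term {
    unsigned id;
};

/* Simulate a moving collector: copy the value to new storage, then
 * release the old storage. Returns the new address, or NULL if the
 * allocation fails (in which case the old value is left untouched). */
static struct demo_term *demo_gc_move(struct demo_term *old)
{
    struct demo_term *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) return NULL;
    memcpy(fresh, old, sizeof *fresh);
    free(old);
    return fresh;
}

/* Returns 1 if the address recorded before the move still matches,
 * 0 if the value survived but its address changed, -1 on error. */
static int demo_address_survives_move(void)
{
    struct demo_term *t = malloc(sizeof *t);
    if (t == NULL) return -1;
    t->id = 42;

    /* What a D script would have saved at the first probe firing. */
    uintptr_t seen_by_probe = (uintptr_t)t;
    unsigned id_before = t->id;

    struct demo_term *moved = demo_gc_move(t);
    if (moved == NULL) { free(t); return -1; }

    int same_address = ((uintptr_t)moved == seen_by_probe);
    int same_value = (moved->id == id_before);
    free(moved);

    if (!same_value) return -1;
    return same_address ? 1 : 0;
}
```

Because the new storage is allocated while the old storage is still
live, the two addresses are necessarily different here, so correlation
by address fails even though the value is intact.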
Best of all worlds: where practical, provide both reference-based
interfaces and string-based interfaces, so that D scripts can 'have
it all'.
Does this make sense?
Garry