For a benchmark paper, I would specify part of this more formally, by
giving some typings that a solution should satisfy. As I mentioned in
my post of about a week ago, I propose using CIC's types to state
these typings, since that is a rich and well-known type system. (And
using CIC's types need not commit us to CIC's term language.) Another
advantage of using CIC is that one can at least type check
specifications in Coq.
trm : (list nat) -> Set.
trm_eval : (trm nil) -> (trm nil).
is_abs : forall(a:nat)(l l':list nat), trm (cons a l) -> trm l' -> bool.
returns_abs : forall(e1 e2:trm nil), (trm_eval e1) = e2 -> exists
a:nat, exists e2a:trm (cons a nil), is_abs a nil nil e2a e2 = true.
Here I am avoiding committing, in the specification of the problem, to
a particular datatype representing terms. Instead, the is_abs
function takes in a bound variable "a" and terms t1 and t2, and should
return true or false depending on whether or not t2 is equal to a
lambda abstraction of "a" over t1.
So, trm_eval's type expresses the first desired (internal) property
mentioned above, and return_abs expresses the second (external) one.
Of course, the fact that this is really a call-by-value interpreter is
not formally expressed. But it is to be expected that part of the
specification of the problem will remain informal.
Questions we might raise about this way of specifying the problem
include: should the problem specification allow datatype declarations,
or should those be avoided as I have done here? Also, if we deal with
general recursion, should we include a predicate (not found in CIC)
for divergence? If we had such, we could specify that the application
is strict in both arguments, for example, which could be an
interesting addition to the problem. Of course, there are also other,
perhaps more elegant ways to specify the internal property, which
would not require keeping a list of the free variables (perhaps just
an upper bound on them).
Looking forward to your feedback on this problem or the general idea
of a special issue on benchmark problems,
Aaron