Here is a trivial example.
Suppose we write a validator for this language.
expr -> expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| number
| ident
| '(' expr ')'
ident := [a-zA-Z_][a-zA-Z0-9_]*
number := [-+]?[0-9]+
If we validate a string against this language, then every string we
accept looks like "a + 3 / 4" and so forth.
We reject strings that don't conform.
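A validator for this grammar could be sketched as a small
recursive-descent recognizer in C. This is only an illustration under
some assumptions: the names expr_is_valid, parse_expr, parse_term and
MAX_DEPTH are made up here, blanks between tokens are allowed (as in
"a + 3 / 4"), an identifier is taken to be a letter or underscore
followed by letters, digits or underscores, and the left-recursive rule
is handled as a loop over "term op term ...", which accepts the same
strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_DEPTH = 64 };   /* arbitrary cap on nested parentheses */

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* Explicit ASCII ranges, so the result does not depend on the locale. */
static bool is_ident_start(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static const char *skip_blanks(const char *s)
{
  while (*s == ' ' || *s == '\t')
    s++;
  return s;
}

static const char *parse_expr(const char *s, int depth);

/* term := '(' expr ')' | ident | number
   Returns a pointer just past the term, or NULL if there is none. */
static const char *parse_term(const char *s, int depth)
{
  s = skip_blanks(s);
  if (*s == '(') {
    if (depth >= MAX_DEPTH)
      return NULL;
    s = parse_expr(s + 1, depth + 1);
    if (s == NULL)
      return NULL;
    s = skip_blanks(s);
    return *s == ')' ? s + 1 : NULL;
  }
  if (is_ident_start(*s)) {
    while (is_ident_start(*s) || is_digit(*s))
      s++;
    return s;
  }
  if ((*s == '+' || *s == '-') && is_digit(s[1]))
    s++;                       /* optional sign directly before digits */
  if (!is_digit(*s))
    return NULL;
  while (is_digit(*s))
    s++;
  return s;
}

/* expr := term (('+' | '-' | '*' | '/') term)* */
static const char *parse_expr(const char *s, int depth)
{
  s = parse_term(s, depth);
  while (s != NULL) {
    const char *t = skip_blanks(s);
    if (*t != '+' && *t != '-' && *t != '*' && *t != '/')
      return s;                /* no operator follows: expr ends here */
    s = parse_term(t + 1, depth);
  }
  return NULL;                 /* a term was missing or malformed */
}

/* True only if the whole string is one expression of the language. */
bool expr_is_valid(const char *str)
{
  assert(str != NULL);
  const char *end = parse_expr(str, 0);
  return end != NULL && *skip_blanks(end) == '\0';
}
```

With this sketch, "a + 3 / 4" and "(1 + 2) * x9" are accepted, while
"a; rm -rf /" and "$(reboot)" are rejected, because nothing outside the
grammar's alphabet can get through.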
Then we can safely do this (almost!):
int n = snprintf(big_buffer, sizeof big_buffer, "echo $(( %s ))", str);
/* check for truncation: bail out if n < 0 or n >= sizeof big_buffer */
FILE *pipe = popen(big_buffer, "r");
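Fleshed out a little, the same steps might look like the following
sketch. The function names build_command and eval_expr, the 1024-byte
buffer and the choice to return only the first line of output are
assumptions made for illustration; str is assumed to have passed the
validator already.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen and pclose */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the shell command for an already validated expression.
   Returns false if formatting failed or the command did not fit;
   a truncated command must never be handed to the shell. */
bool build_command(char *buf, size_t size, const char *str)
{
  assert(buf != NULL && str != NULL && size > 0);
  int n = snprintf(buf, size, "echo $(( %s ))", str);
  return n >= 0 && (size_t) n < size;
}

/* Run the validated expression through the shell and copy the first
   line of its output into result.  Returns false on any failure,
   including a non-zero exit status (e.g. division by zero). */
bool eval_expr(const char *str, char *result, size_t result_size)
{
  char big_buffer[1024];   /* size is an arbitrary choice for the sketch */

  assert(result != NULL && result_size > 0);
  if (!build_command(big_buffer, sizeof big_buffer, str))
    return false;

  FILE *pipe = popen(big_buffer, "r");
  if (pipe == NULL)
    return false;

  bool ok = fgets(result, (int) result_size, pipe) != NULL;
  int status = pclose(pipe);
  if (!ok || status != 0)
    return false;

  result[strcspn(result, "\n")] = '\0';   /* strip the trailing newline */
  return true;
}
```

The truncation check matters because a command cut off in the middle
is no longer the string that was validated.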
We have defined a safe arithmetic language that we can hand to the shell
for execution. It won't clobber anything in our host environment.
However, it provides unfettered read access to environment variables,
because the shell treats a bare name inside $(( )) as a variable
reference.
Suppose that the environment has a sensitive, integer-valued environment
variable SECRET_ENV_VAR. The untrusted user can supply the expression
"SECRET_ENV_VAR" and thereby learn the value of that variable.
So if we take this idea further and define a more useful language than
just a calculator, we have to guard against leaking secrets from the
environment.
One way would be namespacing. The variables in our language like ABC
or def would not translate into the same-named shell variables, but
into, say, sb_ABC and sb_def ("sb" == sandbox).
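As a rough sketch of that renaming step, the translator could copy the
validated text and put the prefix in front of every identifier. The
function name namespace_idents and the SB_PREFIX macro are invented for
this example, and the input is assumed to have been validated already,
so a letter or underscore always starts an identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SB_PREFIX "sb_"   /* "sb" == sandbox */

static bool is_ident_start(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static bool is_ident_char(char c)
{
  return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Copy src to dst, prefixing every identifier with SB_PREFIX.
   Returns false if dst is too small to hold the result. */
bool namespace_idents(const char *src, char *dst, size_t dst_size)
{
  const size_t plen = strlen(SB_PREFIX);
  size_t out = 0;

  assert(src != NULL && dst != NULL);
  while (*src != '\0') {
    if (is_ident_start(*src)) {
      if (out + plen >= dst_size)
        return false;
      memcpy(dst + out, SB_PREFIX, plen);
      out += plen;
      while (is_ident_char(*src)) {     /* copy the identifier itself */
        if (out + 1 >= dst_size)
          return false;
        dst[out++] = *src++;
      }
    } else {                            /* digits, operators, blanks */
      if (out + 1 >= dst_size)
        return false;
      dst[out++] = *src++;
    }
  }
  if (out >= dst_size)
    return false;
  dst[out] = '\0';
  return true;
}
```

Given "ABC + 3 / def" this produces "sb_ABC + 3 / sb_def", so the shell
only ever sees names from the sandbox namespace.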
We could still allow that language some environment manipulation,
through a controlled API. Only certain environment variables
would be loaded into sandboxed variables. For instance, if we consider
TERM to be safe, we could pre-load sb_TERM with the value of TERM.
Likewise, we would have a carefully controlled "export" feature, which
only allows certain variables to be exported.
If ABC is an export-allowed variable, then the statement
"export ABC=42" in the sandboxed scripting language would
translate to 'sb_ABC=42; export ABC="$sb_ABC"'. That is, it sets the
sandboxed variable, and then also exports the corresponding environment
variable, which really has to be called ABC.
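The export translation could be sketched like this. The allow-list
contents, the function names is_export_allowed and emit_export, and the
decision to reject a disallowed export outright are all assumptions for
the sake of the example; name and value are assumed to have been
validated by the front end. The expansion is double-quoted in the
emitted code as a precaution against word splitting in older shells.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical allow-list of variables the sandbox may export. */
static const char *const export_allowed[] = { "ABC", NULL };

static bool is_export_allowed(const char *name)
{
  for (size_t i = 0; export_allowed[i] != NULL; i++)
    if (strcmp(export_allowed[i], name) == 0)
      return true;
  return false;
}

/* Translate the sandboxed statement "export NAME=VALUE" into shell code.
   Returns false if NAME may not be exported or the output did not fit. */
bool emit_export(char *buf, size_t size, const char *name, const char *value)
{
  assert(buf != NULL && name != NULL && value != NULL && size > 0);
  if (!is_export_allowed(name))
    return false;

  /* Set the sandboxed variable, then export the real one from it. */
  int n = snprintf(buf, size, "sb_%s=%s; export %s=\"$sb_%s\"",
                   name, value, name, name);
  return n >= 0 && (size_t) n < size;
}
```

An attempt to export something like PATH fails at translation time, so
no shell code for it is ever generated.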
Our compiler would gather a list of all variables referenced by the
program, and then, for the subset of those variables that are
"environment-allowed", it would emit an initial code block like:
sb_FOO=$FOO ; sb_BAR=$BAR ; ...
# BAZ is not on the whitelist so doesn't appear above
to fetch the values of all referenced whitelisted variables from the
environment. Thus the language could access the env vars FOO and BAR,
but BAZ would appear uninitialized even if there is such an environment
variable.
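Emitting that initial block could look roughly like the sketch below.
The whitelist contents and the names is_env_allowed and emit_prologue
are hypothetical, and the list of referenced variables is assumed to
come from an earlier pass of the compiler as a NULL-terminated array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical whitelist of environment variables the sandbox may read. */
static const char *const env_allowed[] = { "FOO", "BAR", "TERM", NULL };

static bool is_env_allowed(const char *name)
{
  for (size_t i = 0; env_allowed[i] != NULL; i++)
    if (strcmp(env_allowed[i], name) == 0)
      return true;
  return false;
}

/* Write "sb_NAME=$NAME ; " into buf for every referenced variable that
   is on the whitelist.  Returns false if the output did not fit. */
bool emit_prologue(char *buf, size_t size, const char *const *referenced)
{
  size_t used = 0;

  assert(buf != NULL && referenced != NULL && size > 0);
  buf[0] = '\0';
  for (size_t i = 0; referenced[i] != NULL; i++) {
    if (!is_env_allowed(referenced[i]))
      continue;            /* e.g. BAZ: no assignment is emitted */
    int n = snprintf(buf + used, size - used, "sb_%s=$%s ; ",
                     referenced[i], referenced[i]);
    if (n < 0 || (size_t) n >= size - used)
      return false;
    used += (size_t) n;
  }
  return true;
}
```

For a program that references FOO, BAZ and BAR, this yields
"sb_FOO=$FOO ; sb_BAR=$BAR ; " and nothing for BAZ.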