Thank you Richard, and thank you Michael for your replies.
I am afraid I don't understand your answer, Richard. I probably didn't phrase my question very well. Please allow me to rephrase it and to point out why the solution may be of general interest.
Let me recall some classic terminology (e.g. from Introduction to Automata Theory, Languages, and Computation, by Hopcroft & Ullman).
I suppose ECMAScript is (essentially) a context-free language, i.e. we can define its entire syntax by means of production rules, or syntax rules, of the form
```
NonTerminalSymbol -> Some_term_made_of_terminal_and_nonterminal_symbols
```
For example,
```
IfStatement -> 'if' + '(' + Expression + ')' + Statement
             | 'if' + '(' + Expression + ')' + Statement + 'else' + Statement
```
When Esprima has parsed such an `IfStatement`, it abstracts away the keywords and punctuation and returns the actual content, like so:
```
["IfStatement", Expression, Statement]                // without `else` clause
["IfStatement", Expression, Statement_1, Statement_2] // with `else` clause
```
except that the result is not an array but a plain object:
```
{
  type: "IfStatement",
  test: Expression,
  consequent: Statement_1,
  alternate: Statement_2 | null
}
```
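To make the correspondence between the two notations concrete, here is a minimal sketch of a helper that maps the object form back to the tuple form. The helper `toTuple` is hypothetical (not part of Esprima), it assumes only the ESTree field names shown above (`type`, `test`, `consequent`, `alternate`), and the sample node is hand-written rather than produced by Esprima.

```javascript
// Hypothetical helper, not part of Esprima: maps an ESTree-style IfStatement
// object to the tuple notation used above.
function toTuple(node) {
  if (node.type !== "IfStatement") {
    throw new TypeError(`Expected an IfStatement, got ${node.type}`);
  }
  // `alternate` is null when there is no `else` clause.
  return node.alternate === null
    ? ["IfStatement", node.test, node.consequent]
    : ["IfStatement", node.test, node.consequent, node.alternate];
}

// Hand-written node for `if (x) y; else z;` (shape as in the ESTree spec).
const ifElse = {
  type: "IfStatement",
  test: { type: "Identifier", name: "x" },
  consequent: {
    type: "ExpressionStatement",
    expression: { type: "Identifier", name: "y" },
  },
  alternate: {
    type: "ExpressionStatement",
    expression: { type: "Identifier", name: "z" },
  },
};

console.log(toTuple(ifElse).length); // 4
console.log(toTuple({ ...ifElse, alternate: null }).length); // 3
```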
More generally, a **context-free grammar** is a quadruple G = (V, T, P, S), where
* `V` is the set of nonterminal symbols (variables); these are given as interfaces in the ESTree spec and as keys in `require('esprima').Syntax` ('AssignmentExpression', ..., 'YieldExpression').
* `T` is the set of terminal symbols; these are lexically analysed as tokens by Esprima.
* `P` is the set of production rules; these are essentially given by the ECMAScript standard.
* `S` is the start symbol, one of the nonterminals; in the case of Esprima, this is 'Program'.
The **language** of `G` is the set of all strings of terminal symbols that can be derived from `S` by applying the production rules until no nonterminal symbols are left.
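As a small worked example of these definitions, the sketch below writes a hypothetical toy grammar (a tiny if/else fragment, not real ECMAScript) as the quadruple G = (V, T, P, S), checks that it is well formed, and enumerates the strings of its language up to a given length by repeatedly rewriting the leftmost nonterminal. The names `G`, `isWellFormed` and `languageUpTo` are made up for illustration. The length bound only works because this toy grammar has no empty (ε) rules and no cycles of unit rules, so sentential forms never shrink and the search is finite.

```javascript
// A hypothetical toy grammar written as the quadruple G = (V, T, P, S).
const G = {
  V: new Set(["Statement", "Expression"]),        // nonterminals
  T: new Set(["if", "(", ")", "else", "e", "s"]), // terminals (tokens)
  P: {                                            // productions: lhs -> alternatives
    Statement: [
      ["if", "(", "Expression", ")", "Statement"],
      ["if", "(", "Expression", ")", "Statement", "else", "Statement"],
      ["s"],
    ],
    Expression: [["e"]],
  },
  S: "Statement",                                 // start symbol
};

// Sanity check of the definition: V and T are disjoint, S is in V, and every
// rule rewrites a nonterminal into symbols taken from V ∪ T.
function isWellFormed({ V, T, P, S }) {
  const disjoint = [...V].every((symbol) => !T.has(symbol));
  const rulesOk = Object.entries(P).every(([lhs, alternatives]) =>
    V.has(lhs) &&
    alternatives.every((rhs) => rhs.every((symbol) => V.has(symbol) || T.has(symbol))));
  return disjoint && V.has(S) && rulesOk;
}

// All strings of L(G) with at most maxLength tokens, found by repeatedly
// rewriting the leftmost nonterminal, starting from S. Pruning by length is
// only valid because this grammar has no empty (ε) rules.
function languageUpTo(grammar, maxLength) {
  const results = new Set();
  const queue = [[grammar.S]];
  while (queue.length > 0) {
    const form = queue.shift();
    const i = form.findIndex((symbol) => grammar.V.has(symbol));
    if (i === -1) {
      results.add(form.join(" ")); // only terminals left: a sentence of L(G)
      continue;
    }
    for (const rhs of grammar.P[form[i]]) {
      const next = [...form.slice(0, i), ...rhs, ...form.slice(i + 1)];
      if (next.length <= maxLength) queue.push(next);
    }
  }
  return results;
}

console.log(isWellFormed(G)); // true
console.log([...languageUpTo(G, 7)].join(" | ")); // s | if ( e ) s | if ( e ) s else s
```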
A **parser** for that language goes the other way: it takes a string and produces the tree that describes how that string is derived from `S`, or it reports an error if the string is not in the language.
Now, with Esprima, we do have such a parser for ECMAScript. And that is great.
But in this general setup, the parser always parses an `S` phrase: Esprima always tries to parse a `Program`.
What I would like is for it to parse any kind of phrase defined by a nonterminal symbol, say `IfStatement`.
I would like to be able to call the parser like so:
```
esprima.parse(code, {}, 'IfStatement')
```
so that it does not try to parse a `Program` phrase but an `IfStatement`, and issues an error when `code` is not a well-formed if statement.
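For a toy grammar, the requested behaviour can be sketched in a few lines, because the start symbol simply becomes a parameter. This is a minimal sketch under stated assumptions: it is a naive backtracking top-down parser driven directly by the production rules, not how Esprima works; the grammar, the pre-tokenized input and the name `parseAs` are hypothetical. It only terminates because the toy grammar has no left-recursive or empty rules, and it may take exponential time on larger inputs.

```javascript
// Hypothetical toy grammar: nonterminal -> list of alternatives (right-hand sides).
// Any symbol that is not a key of this object is a terminal (a token).
const grammar = {
  Program: [["Statement", "Program"], ["Statement"]],
  Statement: [["IfStatement"], ["ExpressionStatement"]],
  IfStatement: [
    ["if", "(", "Expression", ")", "Statement", "else", "Statement"],
    ["if", "(", "Expression", ")", "Statement"],
  ],
  ExpressionStatement: [["Expression", ";"]],
  Expression: [["x"], ["y"], ["z"]],
};

const isNonterminal = (symbol) =>
  Object.prototype.hasOwnProperty.call(grammar, symbol);

// Yields every [tree, nextPosition] obtained by parsing `symbol` at `pos`.
function* matchSymbol(symbol, tokens, pos) {
  if (!isNonterminal(symbol)) {
    if (tokens[pos] === symbol) yield [symbol, pos + 1];
    return;
  }
  for (const rhs of grammar[symbol]) {
    for (const [children, end] of matchSequence(rhs, tokens, pos)) {
      yield [{ type: symbol, children }, end];
    }
  }
}

// Yields every [trees, nextPosition] obtained by parsing the symbols of `rhs` in order.
function* matchSequence(rhs, tokens, pos) {
  if (rhs.length === 0) {
    yield [[], pos];
    return;
  }
  const [first, ...rest] = rhs;
  for (const [tree, next] of matchSymbol(first, tokens, pos)) {
    for (const [trees, end] of matchSequence(rest, tokens, next)) {
      yield [[tree, ...trees], end];
    }
  }
}

// Parses `tokens` as a phrase of the nonterminal `goal` (default: the start symbol).
function parseAs(tokens, goal = "Program") {
  if (!isNonterminal(goal)) throw new Error(`Unknown nonterminal: ${goal}`);
  for (const [tree, end] of matchSymbol(goal, tokens, 0)) {
    if (end === tokens.length) return tree; // the whole input must be consumed
  }
  throw new SyntaxError(`Input is not a well-formed ${goal}`);
}

const tokens = ["if", "(", "x", ")", "y", ";"];
console.log(parseAs(tokens, "IfStatement").type); // IfStatement
console.log(parseAs(tokens).type); // Program
```

Treating the start symbol as an ordinary argument is the whole point here: nothing in the algorithm depends on starting from `Program`.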
And Michael, I understand from your reply that you think this is very hard to solve.
But I am not so sure. I seem to remember from my studies that most parsing algorithms and strategies are in fact implemented that way, i.e. they contain implicit parsers for every nonterminal symbol. This is just not made explicit in the theory, because usually one only asks whether a string belongs to the language (i.e. is a program), not about its sub-phrases. I haven't studied the source code of Esprima, but if I am right, this additional functionality might not be hard to provide at all.
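The point about implicit parsers can be illustrated with a hand-written recursive-descent parser (as far as I know, this is also the style in which Esprima is written). Each nonterminal gets its own method, so parsing an arbitrary phrase only means calling the matching method and then checking that no input is left over. This is a minimal sketch: the toy grammar, the tokenizer and the names `Parser` and `parsePhrase` are hypothetical, and the node shapes merely mimic ESTree.

```javascript
const KEYWORDS = new Set(["if", "else"]);

// Splits on whitespace and keeps the punctuators ( ) ; as separate tokens.
function tokenize(code) {
  return code.split(/\s+|([();])/).filter(Boolean);
}

class Parser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }

  peek() {
    return this.tokens[this.pos];
  }

  expect(token) {
    if (this.peek() !== token) {
      throw new SyntaxError(`Expected '${token}' but found '${this.peek()}'`);
    }
    this.pos++;
  }

  // Program -> Statement*
  parseProgram() {
    const body = [];
    while (this.pos < this.tokens.length) body.push(this.parseStatement());
    return { type: "Program", body };
  }

  // Statement -> IfStatement | ExpressionStatement
  parseStatement() {
    return this.peek() === "if"
      ? this.parseIfStatement()
      : this.parseExpressionStatement();
  }

  // IfStatement -> 'if' '(' Expression ')' Statement ( 'else' Statement )?
  parseIfStatement() {
    this.expect("if");
    this.expect("(");
    const test = this.parseExpression();
    this.expect(")");
    const consequent = this.parseStatement();
    let alternate = null;
    if (this.peek() === "else") {
      this.pos++; // a dangling 'else' binds to the nearest 'if'
      alternate = this.parseStatement();
    }
    return { type: "IfStatement", test, consequent, alternate };
  }

  // ExpressionStatement -> Expression ';'
  parseExpressionStatement() {
    const expression = this.parseExpression();
    this.expect(";");
    return { type: "ExpressionStatement", expression };
  }

  // Expression -> Identifier (the toy grammar has no other expressions)
  parseExpression() {
    const token = this.peek();
    if (token === undefined || KEYWORDS.has(token) || !/^[A-Za-z_$][\w$]*$/.test(token)) {
      throw new SyntaxError(`Expected an identifier but found '${token}'`);
    }
    this.pos++;
    return { type: "Identifier", name: token };
  }
}

// Parses `code` as a phrase of the nonterminal `goal`, e.g. 'IfStatement'.
function parsePhrase(code, goal = "Program") {
  const parser = new Parser(tokenize(code));
  const method = parser[`parse${goal}`];
  if (typeof method !== "function") throw new Error(`No parser for ${goal}`);
  const node = method.call(parser);
  if (parser.pos !== parser.tokens.length) {
    throw new SyntaxError(`Unexpected '${parser.peek()}' after ${goal}`);
  }
  return node;
}

console.log(parsePhrase("if (x) y; else z;", "IfStatement").alternate.expression.name); // z
```

The only extra work compared with a parser that always starts at `Program` is the final check that the whole input was consumed.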
But why would this be interesting at all? Why could it be valuable to add this option to Esprima?
Traditionally in JavaScript, we deal with text input from users and often need to validate that input. For example, we need to make sure that a certain string is a valid email address.
But as JavaScript evolves and catches up with the possibilities of other high-level languages, JavaScript/ECMAScript code itself becomes data. ECMAScript is already a functional language in the sense that functions are "first-class" values. But ECMAScript is also a scripting language, and there is a huge potential in embracing that fact and allowing code to be data as well!
So far, we only have `eval(code)` to convert code into values. But "eval is evil": it is too dangerous to accept arbitrary input code. Up to now, there is no built-in way to analyze code so that only certain kinds of expressions are accepted. Well, there is `JSON.parse`, which is a big step in that direction.
But I would like to filter and safely convert code of any kind: `String`, `Literal`, `Function`, `Expression`, etc. If Esprima had an option to accept only a certain kind of phrase, it would immediately provide validation for such data and input. And that would be really cool. ;-)
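The kind of filtering described here can be sketched as a whitelist over an ESTree-shaped AST: walk the tree, convert only node types known to be harmless, and reject everything else. In practice the AST would come from a parser such as Esprima; here a hand-written node stands in so that the sketch runs without dependencies. The name `toSafeValue` and the chosen whitelist (literals, arrays, plain object literals, negative numbers) are assumptions for illustration only.

```javascript
// Hypothetical whitelist-based converter: turns an ESTree expression node into a
// plain value, or throws if the node contains anything outside the whitelist.
function toSafeValue(node) {
  switch (node.type) {
    case "Literal":
      // Regular-expression literals carry a `regex` field; reject them here.
      if (node.regex) throw new SyntaxError("RegExp literals are not allowed");
      return node.value;
    case "UnaryExpression":
      // Allow negative numbers such as -3, which ESTree represents as unary minus.
      if (node.operator === "-" && node.argument.type === "Literal" &&
          typeof node.argument.value === "number") {
        return -node.argument.value;
      }
      throw new SyntaxError("Only negative number literals are allowed");
    case "ArrayExpression":
      return node.elements.map((element) => {
        if (element === null) throw new SyntaxError("Array holes are not allowed");
        return toSafeValue(element); // spreads etc. fall through to `default`
      });
    case "ObjectExpression": {
      const result = {};
      for (const property of node.properties) {
        // Only plain `key: value` properties: no methods, accessors, computed keys,
        // shorthands or spreads.
        if (property.type !== "Property" || property.kind !== "init" ||
            property.computed || property.method || property.shorthand) {
          throw new SyntaxError("Only plain key: value properties are allowed");
        }
        const key = property.key.type === "Identifier"
          ? property.key.name
          : String(toSafeValue(property.key));
        // defineProperty creates an own property even for keys like __proto__.
        Object.defineProperty(result, key, {
          value: toSafeValue(property.value),
          enumerable: true,
          writable: true,
          configurable: true,
        });
      }
      return result;
    }
    default:
      throw new SyntaxError(`Node type ${node.type} is not allowed`);
  }
}

// Hand-written ESTree node for the input `[1, "two", { three: -3 }]`.
const sample = {
  type: "ArrayExpression",
  elements: [
    { type: "Literal", value: 1 },
    { type: "Literal", value: "two" },
    {
      type: "ObjectExpression",
      properties: [{
        type: "Property", kind: "init", computed: false, method: false, shorthand: false,
        key: { type: "Identifier", name: "three" },
        value: { type: "UnaryExpression", operator: "-", prefix: true,
                 argument: { type: "Literal", value: 3 } },
      }],
    },
  ],
};

console.log(JSON.stringify(toSafeValue(sample))); // [1,"two",{"three":-3}]
```

A call such as `alert()` would reach the `default` branch and be rejected, which is exactly the behaviour `eval` cannot offer.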
Cheers,
Thomas