As I discussed with José at ElixirConf EU after my talk on the parser, I would like to add an option to the tokenizer that makes it possible to parse Elixir code without creating new atoms.
The solution would be to pass a callback function as an option to the tokenizer (exposed through `Code.string_to_quoted/2`) that gets called whenever the tokenizer encounters an atom that does not exist yet. Instead of raising, the tokenizer would call the callback with the token and the tokenizer metadata. The callback returns the data structure that is put in the AST in place of the atom. For instance, as in my talk, an 'atom marker' such as `{:":", "atomname"}`.
The callback gets 4 arguments:
- atom name (string)
- file
- line
- column
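A callback with this shape could look like the minimal sketch below. The module name `AtomMarker` and the function name `mycallback/4` are placeholders for illustration only. The tokenizer option itself does not exist yet, so the function is simply called directly here.

```elixir
defmodule AtomMarker do
  # Hypothetical callback with the proposed 4-argument shape:
  # atom name (string), file, line, column.
  # It returns an 'atom marker' tuple instead of creating the atom.
  def mycallback(name, _file, _line, _column) when is_binary(name) do
    {:":", name}
  end
end

# Direct call, standing in for what the tokenizer would do:
AtomMarker.mycallback("atomname", "nofile", 1, 1)
# returns {:":", "atomname"}
```

The file, line and column arguments are ignored in this sketch; a real callback could use them for error reporting.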
The default behaviour of this function could be to raise an error, like the existing_atoms_only: true option does now. So two questions before I implement this:
1) What should we call this option? I was thinking of one of these:
- nonexisting_atom_callback:
- on_nonexisting_atom:
2) Should the callback option only apply when existing_atoms_only: true is passed in as well? Or would it work like this:
Code.string_to_quoted(string_w_new_atoms) → normal behaviour, creates atoms
Code.string_to_quoted(string_w_new_atoms, existing_atoms_only: true) → raises ("builtin" callback behaviour)
Code.string_to_quoted(string_w_new_atoms, nonexisting_atom_callback: &mycallback/4) → mycallback gets called
In the last case, existing_atoms_only: true is implicit.
However, that creates a confusing situation when existing_atoms_only: false is combined with nonexisting_atom_callback. So I prefer to be explicit:
Code.string_to_quoted(string_w_new_atoms, nonexisting_atom_callback: &mycallback/4) → raises RuntimeError
Code.string_to_quoted(string_w_new_atoms, existing_atoms_only: true, nonexisting_atom_callback: &mycallback/4) → mycallback gets called
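To make the explicit variant concrete, here is a rough sketch of how the two options could be resolved into one behaviour. The module `AtomPolicy`, the function `resolve/1` and its return values are all made up for this example, and `nonexisting_atom_callback` is just one of the candidate names from question 1; none of this is existing tokenizer code.

```elixir
defmodule AtomPolicy do
  # Hypothetical helper: decides what to do with a nonexisting atom,
  # based on the options passed to the parser.
  def resolve(opts) do
    existing_only? = Keyword.get(opts, :existing_atoms_only, false)
    callback = Keyword.get(opts, :nonexisting_atom_callback)

    cond do
      # No callback: today's behaviour, depending on existing_atoms_only.
      is_nil(callback) and existing_only? -> :raise
      is_nil(callback) -> :create
      # A callback must accept name, file, line and column.
      not is_function(callback, 4) ->
        raise ArgumentError, "nonexisting_atom_callback must be a 4-arity function"
      # Callback plus existing_atoms_only: true: the callback is used.
      existing_only? -> {:callback, callback}
      # Callback without existing_atoms_only: true: RuntimeError.
      true -> raise "nonexisting_atom_callback requires existing_atoms_only: true"
    end
  end
end

cb = fn name, _file, _line, _column -> {:":", name} end

AtomPolicy.resolve([])                           # :create
AtomPolicy.resolve(existing_atoms_only: true)    # :raise
AtomPolicy.resolve(existing_atoms_only: true, nonexisting_atom_callback: cb)
# {:callback, cb}
```

With this rule, passing the callback on its own is always an error, so existing_atoms_only: false can never be silently combined with a callback.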
Arjan