Proposal: atom creation callback in tokenizer


Arjan Scherpenisse

Apr 10, 2019, 7:50:33 AM
to elixir-lang-core
Hello all,

As I discussed with José at ElixirConf EU after my talk on the parser, I would like to add an option to the tokenizer that makes it possible to parse Elixir code without creating atoms.

The solution would be to pass a callback function as an option to the tokenizer (exposed through `Code.string_to_quoted/2`) that gets called when the tokenizer encounters a nonexistent atom. Instead of raising, the tokenizer would call the callback with the token and the tokenizer metadata. The callback returns the data structure that is put in the AST in place of the atom, for instance, as in my talk, an 'atom marker' {:":", "atomname"}.

The callback gets 4 arguments:
- atom name (string)
- file
- line
- column

The default behaviour of this function could be to raise an error, like the existing_atoms_only: true option does now. So two questions before I implement this:

1) What should we call this option? I was thinking of one of these:
- nonexisting_atom_callback: 
- on_nonexisting_atom:

2) Should the callback function option be only applicable when existing_atoms_only: true is passed in as well? Or would it be like this:

Code.string_to_quoted(string_w_new_atoms) → normal behaviour, creates atoms
Code.string_to_quoted(string_w_new_atoms, existing_atoms_only: true) → raises ("builtin" callback behaviour)
Code.string_to_quoted(string_w_new_atoms, nonexisting_atom_callback: &mycallback/4) → gets called

In the last case, existing_atoms_only: true is implicit.
However, that creates a confusing situation when existing_atoms_only: false is combined with nonexisting_atom_callback. So I prefer to be explicit:

Code.string_to_quoted(string_w_new_atoms, nonexisting_atom_callback: &mycallback/4) → raises RuntimeError 

Code.string_to_quoted(string_w_new_atoms, existing_atoms_only: true, nonexisting_atom_callback: &mycallback/4) → mycallback gets called
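For reference, the existing behaviour that the proposal builds on can be sketched as follows. It only uses `Code.string_to_quoted/2` and the `existing_atoms_only` option mentioned above. The module name `AtomParseSketch` and the atom name in the sample string are made up for illustration, and the sketch assumes that atom has not been created elsewhere in the running VM.

```elixir
defmodule AtomParseSketch do
  # Hypothetical helper: parse without ever creating new atoms.
  def parse_safely(source) do
    Code.string_to_quoted(source, existing_atoms_only: true)
  end

  # Default behaviour: unknown atoms are created while tokenizing.
  def parse_default(source) do
    Code.string_to_quoted(source)
  end
end

source = ":sketch_atom_that_should_not_exist_yet"

# The unknown atom is rejected with an error tuple
# (Code.string_to_quoted!/2 would raise instead).
{:error, _reason} = AtomParseSketch.parse_safely(source)

# Without the option the atom is created and returned as the AST.
{:ok, atom} = AtomParseSketch.parse_default(source)
true = is_atom(atom)
```

The order matters here: once the default parse has run, the atom exists and the "safe" parse of the same string succeeds as well.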


Arjan


Allen Madsen

Apr 10, 2019, 9:00:30 AM
to elixir-l...@googlegroups.com
For those of us who haven't seen your talk, what's the reason for wanting to add this?

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/ac2699c7-af26-4b0d-9734-c385aa59aef2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Arjan Scherpenisse

Apr 10, 2019, 1:48:31 PM
to elixir-lang-core
Yes, good one, sorry I didn't clarify.

One of the points of my talk was that I'm using the parser to parse user-written scripts that look like Elixir but are actually interpreted in a different way.
And currently there is no way to safely parse Elixir-ish code which contains unknown identifiers.

So in my talk I worked around that problem by escaping all such atoms in a predefined way, but after the talk José suggested using a callback function.

By the way, the slides are here:

video hopefully up soon.

Arjan





José Valim

Apr 11, 2019, 6:09:36 AM
to elixir-l...@googlegroups.com
Hi Arjan,

I have some notes after looking a bit further into the tokenizer/parser.

1. There are two places where we call the binary_to_atom functions: in the tokenizer and in the parser. For constructs like :"foo#{bar()}baz", we don't convert to an atom at tokenizer time but at runtime. So my suggestion is to call this new option :static_atoms_encoder. If it is not set, it will fall back to the existing_atoms_only behaviour. It is also important to document that static_atoms_encoder won't be invoked for operators, syntax keywords (fn, etc.), or interpolated atoms (which, as explained above, are runtime based).

2. The :static_atoms_encoder expects a fun with two arguments. The function will receive the atom name (as a binary) and a keyword list with the line and column. It should return {:ok, token :: term} | {:error, reason :: binary}.

3. Your current replacement token {:":", atom_as_binary} can be confusing. [":": "foo"] will have AST of [{:":", "foo"}] with the option disabled and [:foo] will have the same AST when enabled. My suggestion to avoid ambiguities is to make it a tuple with two elements but the first element should either be a PID or a REF.

4. We should also include a complete reference of where static atoms appear. Off the top of my head: aliases, remote calls, local calls, var names, atoms and keyword lists.
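The notes above can be sketched as follows, assuming the option is exposed through `Code.string_to_quoted/2` under the name `:static_atoms_encoder` and with the two-argument contract from point 2. The module `StaticAtomsSketch` and its `parse/1` function are hypothetical names used only for illustration.

```elixir
defmodule StaticAtomsSketch do
  # Hypothetical wrapper; only Code.string_to_quoted/2 and the
  # :static_atoms_encoder option come from the notes above.
  def parse(source) do
    # A fresh reference as the first tuple element cannot collide
    # with anything a user can write in source code (point 3).
    marker = make_ref()

    # Receives the atom name as a binary plus a keyword list with
    # position metadata, and returns {:ok, token} (point 2).
    encoder = fn name, _meta -> {:ok, {marker, name}} end

    {marker, Code.string_to_quoted(source, static_atoms_encoder: encoder)}
  end
end

{marker, {:ok, ast}} = StaticAtomsSketch.parse(":sketch_unknown_atom")

# No atom is created; the literal is represented by the marker tuple.
true = ast == {marker, "sketch_unknown_atom"}
```

An AST built this way is not meant to be compiled or evaluated as Elixir code; it is only useful to a custom interpreter that knows how to read the marker tuples.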
 

Arjan Scherpenisse

Apr 11, 2019, 8:46:02 AM
to elixir-lang-core
Thanks José, I think these are good suggestions. The 2-tuple notation is indeed confusing; I didn't put much thought into it. The good thing is that with the :static_atoms_encoder option this choice is up to the end user anyway. I'll be working on this; expect a PR sometime next week.
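
The ambiguity of the 2-tuple marker can be reproduced with a small sketch, assuming the `:static_atoms_encoder` option behaves as described in José's notes. The module `AmbiguitySketch` and its function names are hypothetical.

```elixir
defmodule AmbiguitySketch do
  # No encoder: a keyword list whose quoted key is ":".
  def plain do
    Code.string_to_quoted(~s([":": "foo"]))
  end

  # Encoder that uses the originally proposed {:":", name} marker.
  def encoded do
    encoder = fn name, _meta -> {:ok, {:":", name}} end
    Code.string_to_quoted("[:foo]", static_atoms_encoder: encoder)
  end
end

{:ok, plain_ast} = AmbiguitySketch.plain()
{:ok, encoded_ast} = AmbiguitySketch.encoded()

# Two different inputs, one AST: the marker cannot be told apart from user code.
true = plain_ast == encoded_ast
```

Using a PID or reference as the first tuple element avoids this, because neither can be written literally in source code.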

Arjan

Rich Morin

May 18, 2019, 3:40:16 PM
to elixir-l...@googlegroups.com
On occasion, I want to use multiple import statements for the same module, eg:

# import Common, only: [ ii: 2 ] # uncomment for debugging
import Common, only: [ str_list: 1 ]

By way of explanation, Common.ii/2 is a tracing routine which is mostly used
for debugging. So, I want it to be available when I need it, but not generate
compiler warnings when it's not being used. For details, see:

https://elixirforum.com/t/enabling-access-to-occasionally-used-eg-tracing-functions/22517

Unfortunately, when I tried the approach above, I found that only the last import
took effect; the first one was silently ignored (!). Could import be extended to
allow multiple imports of the same module?

-r

José Valim

May 18, 2019, 4:25:05 PM
to elixir-l...@googlegroups.com
Hi Rich!

To clarify, the current behaviour is documented.

It is also worth noting that passing an empty :only is the only mechanism for you to fully unimport a module:

    # It was imported, but we don't want it
    import SomeModule, only: []

Therefore, if we change it to what you propose, the above no longer works. There are also some advantages to not "merging" imports as proposed: merging means one would need to look at many different places to find what is actually imported.

In any case, regardless of whether you agree or disagree with the current rules, changing this behaviour would be a strong backwards incompatible change.
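
The current rules can be observed with a small sketch that inspects `__ENV__.functions` after each import. `String` stands in for the imported module, and `ImportSketch` with its function names is hypothetical.

```elixir
defmodule ImportSketch do
  import String, only: [upcase: 1]
  # Importing the same module again replaces the earlier import.
  import String, only: [downcase: 1]

  def lower(text), do: downcase(text)

  # Functions imported from String at this point in the module.
  def imported_after_second, do: Keyword.get(__ENV__.functions, String)

  # An empty :only list removes the import altogether.
  import String, only: []

  def imported_after_empty, do: Keyword.get(__ENV__.functions, String)
end

IO.inspect(ImportSketch.imported_after_second())
IO.inspect(ImportSketch.imported_after_empty())
```

Calling `upcase/1` unqualified anywhere in this module would fail to compile, since only the most recent import of `String` is in effect.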

José Valim
Skype: jv.ptec
Founder and Director of R&D



Rich Morin

May 18, 2019, 5:04:00 PM
to elixir-l...@googlegroups.com
Thanks for the clarifications. I certainly don't want to introduce a
"strong backwards incompatible change", but that may not be required.
The basic problem, IMHO, is that there's no convenient and trouble-free
way to do an optional import.

Given that import/2 already has the :except and :only keys, how about
adding a :maybe key, to be used as follows:

    import Common,
      maybe: [ ii: 2 ],
      only: [ str_list: 1 ]

The :maybe functions would be made available, but wouldn't generate a
warning message if they aren't used.

-r

Rich Morin

May 18, 2019, 5:12:31 PM
to elixir-l...@googlegroups.com
> On May 18, 2019, at 13:24, José Valim <jose....@plataformatec.com.br> wrote:
>
> There are also some advantages for not "merging" imports, as proposed, as merging means one would need to look up at many different places to find what is actually imported.

I think that ship has already sailed. For example, would you assert:

"There are also some advantages for not "merging" clauses for a function definition,
as merging means one would need to look up at many different places to find what
is actually defined."

However, your point is taken; having multiple sources for information can cause problems.

-r


Rich Morin

May 19, 2019, 1:23:43 PM
to elixir-l...@googlegroups.com
On ElixirForum, LostKobrakai pointed out that:

> You can also disable warnings on the callers side:
> https://hexdocs.pm/elixir/Kernel.SpecialForms.html#import/2-warnings

So, my proposed syntax:

    import Common,
      maybe: [ ii: 2 ],
      only: [ str_list: 1 ]

could be replaced by:

    import Common,
      only: [ ii: 2, str_list: 1 ],
      warn: false

However, this is less explicit and precise, because it doesn't specify
_which_ functions should be checked for usage.
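
A self-contained sketch of the `warn: false` variant is shown below. The bodies of `Common.ii/2` and `Common.str_list/1` are invented for illustration, since the thread only gives their names and arities; `Caller` is a hypothetical module.

```elixir
defmodule Common do
  # Assumed implementation: a tracing helper used mostly for debugging.
  def ii(value, label), do: IO.inspect(value, label: label)

  # Assumed implementation: convert every element to a string.
  def str_list(list), do: Enum.map(list, &to_string/1)
end

defmodule Caller do
  # ii/2 is imported but not used below; warn: false suppresses the
  # unused-import warning for the whole import, not just for ii/2.
  import Common, only: [ii: 2, str_list: 1], warn: false

  def run(list), do: str_list(list)
end

IO.inspect(Caller.run([1, :two, "three"]))
```

The trade-off is the one described above: the warning is silenced for every function in the import, so an unused `str_list/1` would also go unnoticed.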

-r

José Valim

May 19, 2019, 1:57:18 PM
to elixir-l...@googlegroups.com
Yup, I thought about warn or building something on top of it, but nothing came out of it (yet).



José Valim
Skype: jv.ptec
Founder and Director of R&D


Rich Morin

Feb 15, 2020, 7:59:00 PM
to elixir-lang-core
This issue is still annoying me; any chance it could be addressed?

> On May 18, 2019, at 14:03, Rich Morin <r.d....@gmail.com> wrote:
>
> ... The basic problem, IMHO, is that there's no convenient and trouble-free
> way to do an optional import.
>
> Given that import/2 already has the :except and :only keys, how about
> adding a :maybe key, to be used as follows:
>
>     import Common,
>       maybe: [ ii: 2 ],
>       only: [ str_list: 1 ]
>
> The :maybe functions would be made available, but wouldn't generate a
> warning message if they aren't used.



> On May 19, 2019, at 10:57, José Valim <jose....@plataformatec.com.br> wrote:
>
> Yup, I thought about warn or building something on top of it, but nothing came out of it (yet).

-r

Manfred Bergmann

Feb 16, 2020, 5:15:29 AM
to elixir-l...@googlegroups.com
Hi.

I don’t want to confuse things.
But what good would ‚maybe‘ do?
Either you import something or not.
If an imported function is not used, it should be removed from the import.
It seems to me ‚maybe‘ adds a whole lot of complexity that isn’t necessary.



Manfred
