Guidance on dealing with TokenInfo objects

83 views
Skip to first unread message

Karim Tera

unread,
Apr 6, 2022, 7:30:37 PM4/6/22
to Verible Developers
Hello All, 

I am trying to add a method that expands a macro to the preprocessor, however I am facing some issues.

So, let's call this new method: ExpandMacro, it is used whenever a macro is called.

Now, what ExpandMacro doing is:
  1. Search for the macro definition in 'preprocess_data_.macro_definitions'
  2. Feed the found 'PP_define_body' token to the lexer (VerilogLexer)
  3. Append the new lexed tokens to  'preprocess_data_.preprocessed_token_stream' token stream, instead of the 'PP_define_body'
The test case:

`define FIRST a
`define SECOND b

module m;
wire `FIRST;
wire `SECOND;
endmodule

Result:
Both macros are expanded to b.
If we remove 'wire `SECOND;' line, then `FIRST is expanded to a.

The problem:
Filling a TokenStreamView ( 'preprocess_data_.preprocessed_token_stream' in this case) with const::iterators of 2 different TokenSequence objects, the original text corresponds to the SV code, and the new TokenSequence created from the new lexed tokens in ExpandMacro

It seems that for different calls of ExpandMacro, it creates new lexed TokenInfo with the same reference, so it overrides the values that are pointed to by the previous iterators of the previous call.

I am not sure, if that's the correct problem cause, can you help me on this a bit?

Thanks,
Karim

Henner Zeller

unread,
Apr 6, 2022, 7:44:38 PM4/6/22
to Karim Tera, Verible Developers, David Fang
On Wed, 6 Apr 2022 at 16:30, Karim Tera <karim...@gmail.com> wrote:
Hello All, 

I am trying to add a method that expands a macro to the preprocessor, however I am facing some issues.

Thanks for working on the preprocessor, that is very welcome!
 

So, let's call this new method: ExpandMacro, it is used whenever a macro is called.

Now, what ExpandMacro doing is:
  1. Search for the macro definition in 'preprocess_data_.macro_definitions'
  2. Feed the found 'PP_define_body' token to the lexer (VerilogLexer)
  3. Append the new lexed tokens to  'preprocess_data_.preprocessed_token_stream' token stream, instead of the 'PP_define_body'
The test case:

`define FIRST a
`define SECOND b

module m;
wire `FIRST;
wire `SECOND;
endmodule

Result:
Both macros are expanded to b.
If we remove 'wire `SECOND;' line, then `FIRST is expanded to a.

The problem:
Filling a TokenStreamView ( 'preprocess_data_.preprocessed_token_stream' in this case) with const::iterators of 2 different TokenSequence objects, the original text corresponds to the SV code, and the new TokenSequence created from the new lexed tokens in ExpandMacro

I think this might be the problem: you have iterators pointing to two different backing stores. So the new backing store where the expanded token sequence exists should still be available; I suspect you run into some situation in which you actually read deleted memory (though haven't seen your code, so this is just a guess)

(Running with asan might help to detect if such situation happens
 bazel test --config=asan //...
)

Now I don't actually know why the token sequence points to iterators of tokens and does not contain TokenInfo directly; it sounds like an unnecessary indirection, but maybe David (CC'ed) can shed light on that.
 
It seems that for different calls of ExpandMacro, it creates new lexed TokenInfo with the same reference, so it overrides the values that are pointed to by the previous iterators of the previous call.

I am not sure, if that's the correct problem cause, can you help me on this a bit?

Thanks,
Karim

--
You received this message because you are subscribed to the Google Groups "Verible Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to verible-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/verible-dev/1ddf7147-2728-4585-b8ec-bee54b38ea79n%40googlegroups.com.

Karim Tera

unread,
Apr 6, 2022, 7:54:50 PM4/6/22
to Verible Developers

Thanks for the explanation.


I will try this one.
 
Now I don't actually know why the token sequence points to iterators of tokens and does not contain TokenInfo directly; it sounds like an unnecessary indirection, but maybe David (CC'ed) can shed light on that.
 

TokenSequence is just a vector of TokenInfo, but TokenStreamView is a vector of TokenSequence::const_iterator

Henner Zeller

unread,
Apr 6, 2022, 8:00:57 PM4/6/22
to Karim Tera, Verible Developers
Are you storing the TokenSequence you get from lexing the define-body somewhere, like alongside the define body, so that its memory is backed ?
 
 
It seems that for different calls of ExpandMacro, it creates new lexed TokenInfo with the same reference, so it overrides the values that are pointed to by the previous iterators of the previous call.

I am not sure, if that's the correct problem cause, can you help me on this a bit?

Thanks,
Karim

--
You received this message because you are subscribed to the Google Groups "Verible Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to verible-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/verible-dev/1ddf7147-2728-4585-b8ec-bee54b38ea79n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Verible Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to verible-dev...@googlegroups.com.

Karim Tera

unread,
Apr 6, 2022, 8:08:21 PM4/6/22
to Verible Developers
No, what does it mean to have its memory backed? Sorry 
I thought i might need to save it in the MacroDefiniton itself, it will be more efficient in case there are many macro calls.

Henner Zeller

unread,
Apr 6, 2022, 8:29:45 PM4/6/22
to Karim Tera, Verible Developers
On Wed, 6 Apr 2022 at 17:08, Karim Tera <karim...@gmail.com> wrote:
No, what does it mean to have its memory backed? Sorry 
I thought i might need to save it in the MacroDefiniton itself, it will be more efficient in case there are many macro calls.

Yes, storing it in the MacroDefinition sounds good. With 'memory backed' I just mean that the iterators you hand out need to point to something that is still there when someone is dereferencing them.
So keeping it in the MacroDefinition is the right approach. One thing to be aware of is that if you insert more MacroDefinitions, the map might re-alloc memory, which might copy the values around - so you need to make sure that the memory your iterators are pointing to is still the same. Typically, the values are std::move'ed so this might be fine but being aware what is happening in that case is good.

Karim Tera

unread,
Apr 6, 2022, 8:50:51 PM4/6/22
to Henner Zeller, Verible Developers
On Thu, 7 Apr 2022 at 2:29 AM Henner Zeller <hze...@google.com> wrote:
On Wed, 6 Apr 2022 at 17:08, Karim Tera <karim...@gmail.com> wrote:
No, what does it mean to have its memory backed? Sorry 
I thought i might need to save it in the MacroDefiniton itself, it will be more efficient in case there are many macro calls.

Yes, storing it in the MacroDefinition sounds good. With 'memory backed' I just mean that the iterators you hand out need to point to something that is still there when someone is dereferencing them.
So keeping it in the MacroDefinition is the right approach. One thing to be aware of is that if you insert more MacroDefinitions, the map might re-alloc memory, which might copy the values around - so you need to make sure that the memory your iterators are pointing to is still the same. Typically, the values are std::move'ed so this might be fine but being aware what is happening in that case is good.

Yeah, I understand that the dereferencing problem might happen in later stages, but not at this step at least. I am testing each step i do, and i was confused by this result.

But it is a better to try the MacroDefinition approach.

Thanks for the explanation!
Reply all
Reply to author
Forward
0 new messages