--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Using 64-bit integers in the token would result in tremendous memory overhead for the 99.999% case, just to support a scenario which really isn't intended to be supported at all.
The C++ runtime would be better equipped to handle this through the use of type traits to configure the token size at compile time. Some other languages like C# could do something similar in theory, but in practice it becomes quite burdensome even for relatively simple cases.
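A minimal sketch of what that type-traits approach could look like. None of these names (TokenTraits, BasicToken) come from the actual ANTLR C++ runtime; they only illustrate the technique of choosing the index width at compile time so the common 32-bit configuration pays nothing for the rare 64-bit one:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical traits class: the token index width is a compile-time
// parameter, so only builds that opt into huge inputs pay for 64-bit indexes.
template <bool LargeStreams>
struct TokenTraits {
  using index_type = std::conditional_t<LargeStreams, std::int64_t, std::int32_t>;
};

// A token parameterized on those traits; the three stream indexes shrink or
// grow together with the chosen index_type.
template <typename Traits = TokenTraits<false>>
struct BasicToken {
  typename Traits::index_type startIndex;
  typename Traits::index_type stopIndex;
  typename Traits::index_type tokenIndex;
  std::int32_t type;
  std::int32_t channel;
};

// The default 32-bit configuration stays compact; 64-bit indexes are opt-in.
static_assert(sizeof(BasicToken<TokenTraits<false>>) <
                  sizeof(BasicToken<TokenTraits<true>>),
              "64-bit indexes should cost extra space only when requested");
```

In Java or C#, where field types are fixed at declaration, the closest equivalent would be parallel token classes or boxed indirection, which is the burden mentioned above.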
Sam
As a quick clarification on “not supported”:
Due to unbounded lookahead as part of the adaptive prediction algorithm, ANTLR makes no guarantees regarding memory usage with unbuffered streams. For the purposes of conserving memory, the current BufferedCharStream is a valid implementation of UnbufferedCharStream. Any manipulation of the grammar and/or input to prevent UnbufferedCharStream from reading to the end of the file relies on implementation details of ANTLR which are subject to change between releases. There are only two ways to force ANTLR to conserve memory when parsing streaming input:
Due to the difficulty of implementing the second item correctly, I would only really expect users to go with the first approach. When the first approach is used, many side benefits are realized:
Sam
I believe the regular test cases I use have around 10,000,000 tokens. More to the point though is the fact that ANTLR 4 wasn’t designed to handle multi-gigabyte input streams, so 32-bit indexes should never be out of range.
The Java and C# runtime libraries would have 4-byte alignment for these fields in practice. Other runtimes may use different values.
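The alignment effect is easy to demonstrate in C++ (Java and C# object layout is up to the VM, but the principle is the same). These structs are illustrative only, and the exact sizes assume a typical LP64 ABI where 8-byte fields get 8-byte alignment:

```cpp
#include <cstdint>

// Two 64-bit indexes: 8 + 8 = 16 bytes.
struct Both64 {
  std::int64_t startIndex;
  std::int64_t stopIndex;
};

// Shrinking the second field to 32 bits saves nothing here: the struct is
// still padded out to its 8-byte alignment boundary, so 8 + 4 + 4(pad) = 16.
struct Mixed {
  std::int64_t startIndex;
  std::int32_t stopIndex;
};

static_assert(sizeof(Mixed) == sizeof(Both64),
              "the 4 bytes 'saved' were spent on tail padding");
```

So a smaller field type only pays off once enough fields shrink together to cross an alignment boundary, which is exactly the caveat raised below.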
Sam
From: 'Mike Lischke' via antlr-discussion [mailto:antlr-di...@googlegroups.com]
Sent: Monday, July 17, 2017 9:31 AM
To: antlr-di...@googlegroups.com
Subject: Re: [antlr-discussion] UnbufferedCharStream IllegalArgumentException: invalid interval help
What token count are we talking about here, Sam? 10000 or 100000 (which would be a lot for a single parse run)? That's not even 3MB with the current implementation. Not really tremendous, if you ask me. Or do you have something else in mind here? Also keep in mind that with smaller field sizes you probably won't save memory, since the compiler will likely align fields at boundaries bigger than the field size.