Well, what is being generated for C# is not especially useful for what I want to accomplish. Here are some lexer snippets, for instance, intentionally stripped down a bit for brevity:
public const int
    CLOSE_SQUARE_BRACKET=14, COMMA=15, DEFAULT=16, DOT=17, ENUM=18, EOS=19,
    PUBLIC=37, REPEATED=38, REQUIRED=39, RESERVED=40, SIGN=41, SYNTAX=42,
    OCT_LIT=63, DEC_LIT=64, INFINITY=65, NOT_A_NUMBER=66, FLOAT_DIG_DOT_DIG_OPT_EXP=67,
    FLOAT_DIG_EXP=68, FLOAT_DOT_DIG_OPT_EXP=69, IDENT=70, GROUP_NAME=71;
public static readonly string[] ruleNames = {
    "LET_DIG_UNDERSCORE", "OCT_DIG", "SIGNAGE", "UNDERSCORE", "X", "ZED",
    "OPEN_CURLY_BRACE", "OPEN_PAREN", "OPEN_SQUARE_BRACKET", "OPTION", "OPTIONAL",
    "UINT32", "UINT64", "BOOLEAN_FALSE", "BOOLEAN_TRUE", "HEX_LIT", "OCT_LIT",
    "FLOAT_DOT_DIG_OPT_EXP", "IDENT", "GROUP_NAME"
};
private static readonly string[] _LiteralNames = {
    "';'", "'='", "'extend'", "'extensions'", "'field'", "'group'", "'import'",
    "'double'", "'fixed32'", "'fixed64'", "'float'", "'int32'", "'int64'",
    "'uint64'", "'false'", "'true'", null, null, null, "'inf'", "'nan'"
};
private static readonly string[] _SymbolicNames = {
    "COMMA", "DEFAULT", "DOT", "ENUM", "EOS", "EQU", "EXTEND", "EXTENSIONS",
    "TO", "WEAK", "BOOL", "BYTES", "DOUBLE", "FIXED32", "FIXED64", "FLOAT",
    "FLOAT_DOT_DIG_OPT_EXP", "IDENT", "GROUP_NAME"
};
What's going on, understandably, is that the lexer rules themselves are codified. Fine, that's all well and good. However, I would consider the actual KEYWORDS to be, potentially, a subset of those rules. That view would yield language-level tokens in a more usable form, which I could then gather into an enumerable Keyword set.
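To illustrate what I mean (a hypothetical post-processing sketch of my own, not anything ANTLR generates): since keyword literals in the generated table are quoted identifiers while punctuation literals are not, one could filter the literal table down to identifier-shaped entries. The KeywordExtractor class and the abbreviated array below are my own assumptions for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class KeywordExtractor
{
    // Abbreviated copy of the generated _LiteralNames table shown above.
    static readonly string[] LiteralNames = {
        "';'", "'='", "'extend'", "'extensions'", "'group'", "'import'",
        "'double'", "'int32'", "'true'", null, "'inf'", "'nan'"
    };

    // A literal counts as a keyword when it is a quoted identifier;
    // punctuation entries such as "';'" and "'='" are skipped.
    static readonly Regex KeywordPattern = new Regex(@"^'([A-Za-z_][A-Za-z0-9_]*)'$");

    public static ISet<string> Keywords() =>
        LiteralNames
            .Where(n => n != null)
            .Select(n => KeywordPattern.Match(n))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)   // strip the surrounding quotes
            .ToHashSet();
}
```

The same filtering could presumably be done at runtime against the lexer's Vocabulary property (IVocabulary.GetLiteralName also returns the quoted form), rather than against the raw arrays.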
I just wondered whether something like this was already being done in the ANTLR lexer code generation, but it would seem it is not.