On 20/07/2020 01:34, Mike Copeland wrote:
>
> I am working on a C++ source file analyzer; I had one that worked for
> C sources. That program was written many years ago, and I'm attempting
> to update it for C++ code, as well as use C++ structures and features.
> The code for parsing C++ code is tedious, and I'm looking for a
> library or functional code that will (1) parse non-comment code elements
> and (2) return token strings.
> Is there something I can link to/use that will help me? I've done
> some Google searching and have seen references to Clang, Elsa, Metre and
> ANTLR - all of which seem much more than I need. I just want source
> code tokens and to know which source code line they're from.
> TIA
>
I think it is unlikely that you'll get far without using a big project.
Parsing C++ has got more and more difficult. Each standard brings new
syntax, there are context-dependent keywords such as "override" and
"final", and there is even a new operator ("<=>") in the latest version.
I would recommend you look again at existing parsers and see if you can
learn to use them.
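To give a flavour of what a home-made tokenizer is up against, here is a small sketch of perfectly legal modern C++. The names (Base, Widget, not_a_comment, Matrix, Date) are made up for the illustration. Most of it needs C++17. The last part only compiles in C++20 mode, so it is guarded with the __cpp_impl_three_way_comparison feature-test macro.

```cpp
#include <string_view>
#include <type_traits>
#include <vector>

// (1) Contextual keywords: 'final' and 'override' only act as keywords in a
//     few grammatical positions. Elsewhere they are ordinary identifiers.
struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 0; }
};

struct Widget final : Base {       // 'final' as a class-virt-specifier
    int value() const override {   // 'override' as a virt-specifier
        int final = 40;            // 'final' as a plain variable name
        int override = 2;          // 'override' as a plain variable name
        return final + override;   // returns 42
    }
};

// (2) Raw string literals: the '//' and '/*' below are string contents,
//     not comments, so a naive comment stripper would cut this line short.
constexpr std::string_view not_a_comment = R"(// still a string /* too */)";

// (3) Since C++11, '>>' can close two template argument lists at once.
//     The standard says the first non-nested '>>' here is treated as two
//     separate '>' tokens.
using Matrix = std::vector<std::vector<int>>;

// (4) C++20 adds the three-way comparison operator '<=>', a brand-new token.
//     Guarded so the rest of the file still compiles as C++17.
#if defined(__cpp_impl_three_way_comparison)
#include <compare>
struct Date {
    int year;
    int month;
    auto operator<=>(const Date&) const = default;
};
#endif
```

None of this is exotic, but each case needs special handling somewhere. A comment stripper has to understand raw strings, and the lexer has to know about "<=>". The parser has to split ">>" when it closes template arguments. Anything that classifies tokens must also look at the context to decide whether "final" or "override" is a keyword or a variable name. A real front end like clang already does all of that for you.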
It might take you a while to get the hang of clang as a parser, but
that's a job you do once - and then you can take advantage of all the
work they do, and you don't have to update or re-write things for each
new C++ version. clang is /designed/ to be usable as a library, and it
is used that way for syntax highlighting in IDEs, for making static
analysers, for JIT compilation, and for other tools. Its C interface,
libclang, can also give you a plain token stream with the line and
column of each token, which sounds like exactly what you asked for.
I don't know the other tools you mentioned, but I personally would
definitely concentrate on clang first. I'd start with the existing
clang static analyser and see where that could take me - it could be a
very good starting point for adding the new analysers that interest you.
(gcc might also be worth a look these days. gcc 10, the latest version,
has a new static analysis pass enabled with -fanalyzer. gcc also supports
plugins that can get access to parsed source information for checking,
and there are existing plugins for other kinds of static or style
checking. There is even libgccjit, a library for using gcc as a JIT
compiler. I don't think gcc is as far down this path as clang, but it
may still be of use.)