C structure analyzer ("Reflection" in C?)

phollin...@sbsintl.com

Jul 19, 1999
Hi,
I'm dealing with a 3rd party API (The Eurex exchange "Values" API)
that uses a large number of C structures. The exchange supplies C
header files that describe structures which define the format of the
input and output messages. These header files provide structures that
define the message format of blocks of bytes that are sent to, and
received from, the exchange.

Does anybody know of a tool that I could use that converts the C
structure definitions into some other source file that has each data
member name, byte offset, size etc so that I could make a program to
dynamically compose these "structures"?

For example, if it had been done entirely in Java, then I could
use "Reflection" to analyse the methods and properties and create a
dynamic interface.

The annoying thing is, I know that somewhere in VC++ it has all
of this and more because you can browse all of the data structures in
a program from within the debugger.

Does anyone know of a way to get VC++ to dump this information
into some sort of "C source" file that can be included elsewhere?

My ideal is to be able to allow the user to edit the structures
dynamically and then have my program use the information to package up
the resultant data so that it can be sent to the exchange...

Ideally, I give this hypothetical program this:

#include <eucb_fieldsizes.h> // Defines PASSWORD_LEN, LOGIN_LEN,
                             // REQUEST_ID_LEN etc.
struct Authorization {
    char password[PASSWORD_LEN];
    char login[LOGIN_LEN];
};

struct Message {
    char request_id[REQUEST_ID_LEN];
    Authorization authorization_data;
};

And it generates something like this:


struct datainfo {
    const char *name;
    const char *member;
    const char *member_type;
    int offset_in_bytes;
    int size_in_bytes;
};

// P is parent structure type
// F is the first member name
// M is member name for the offset
// T is the type of the member
#define OFFSET(P,F,M) \
    ((unsigned long)&((P*)0)->M - (unsigned long)&((P*)0)->F)

#define SIZEOF(P,M) ((unsigned long)sizeof(((P*)0)->M))

#define FIELD(P,M,T,F) #P,#M,#T,OFFSET(P,F,M), SIZEOF(P,M)

struct datainfo all_structures[] = {
    FIELD(Message,request_id,CHAR,request_id),
    FIELD(Message,authorization_data,Authorization,request_id),
    FIELD(Authorization,password,CHAR,password),
    FIELD(Authorization,login,CHAR,password)
};


which after the preprocessor is done would look like:


struct datainfo all_structures[] = {
"Message", "request_id", "CHAR", 0, 10,
"Message", "authorization_data", "Authorization", 10, 20,
"Authorization", "password", "CHAR", 0, 10,
"Authorization", "login", "CHAR", 10, 10
};

The use of the macros etc. would mean that implicit alignments made
by the compiler would also have the desired effect.

I could run the header files through this program, generate an
"all_structures.h" file that I could include, and then my program can
dynamically use the structures instead...

OK, I hope you get the idea...
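
A minimal sketch of the table idea above, using the standard offsetof macro from <stddef.h> in place of the hand-written OFFSET macro. The field lengths (10 bytes each), the space padding, and the helper names find_field and set_field are assumptions made for illustration; the real lengths would come from eucb_fieldsizes.h, and the real padding rule from the exchange's documentation.

```c
#include <stddef.h>
#include <string.h>

/* Assumed lengths; the real values come from eucb_fieldsizes.h. */
#define PASSWORD_LEN   10
#define LOGIN_LEN      10
#define REQUEST_ID_LEN 10

struct Authorization {
    char password[PASSWORD_LEN];
    char login[LOGIN_LEN];
};

struct Message {
    char request_id[REQUEST_ID_LEN];
    struct Authorization authorization_data;
};

struct datainfo {
    const char *name;         /* structure name */
    const char *member;       /* member name */
    const char *member_type;  /* member type, as text */
    size_t offset_in_bytes;
    size_t size_in_bytes;
};

/* S is the structure tag, M the member, T the member's type as text. */
#define FIELD(S, M, T) \
    { #S, #M, #T, offsetof(struct S, M), sizeof(((struct S *)0)->M) }

const struct datainfo all_structures[] = {
    FIELD(Message, request_id, CHAR),
    FIELD(Message, authorization_data, Authorization),
    FIELD(Authorization, password, CHAR),
    FIELD(Authorization, login, CHAR),
};

/* Look up a member by structure and member name; NULL if not found. */
const struct datainfo *find_field(const char *name, const char *member)
{
    size_t i, n = sizeof all_structures / sizeof all_structures[0];
    for (i = 0; i < n; i++) {
        if (strcmp(all_structures[i].name, name) == 0 &&
            strcmp(all_structures[i].member, member) == 0)
            return &all_structures[i];
    }
    return NULL;
}

/* Copy text into a member of a raw message buffer, padding with spaces
 * (the padding character is an assumption). base_offset is where the
 * enclosing structure starts within buf.
 * Returns 0 on success, -1 if the member is unknown. */
int set_field(unsigned char *buf, size_t base_offset,
              const char *name, const char *member, const char *text)
{
    const struct datainfo *f = find_field(name, member);
    size_t len;
    if (f == NULL)
        return -1;
    len = strlen(text);
    if (len > f->size_in_bytes)
        len = f->size_in_bytes;   /* truncate text that is too long */
    memset(buf + base_offset + f->offset_in_bytes, ' ', f->size_in_bytes);
    memcpy(buf + base_offset + f->offset_in_bytes, text, len);
    return 0;
}
```

With such a table, a caller could fill a message buffer by name, for example set_field(buf, 0, "Message", "request_id", "REQ1"), and reach nested members by passing the offset of authorization_data as base_offset.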

The C Exploration Tools (CXT:
http://providenet.softseek.com/Programming/C/Review_17840_index.shtml
) comes close, and I could probably muck around with perl to
take its output with various options and generate what I want - but
maybe somebody knows of a better, more direct solution...

Jan Gray

Jul 20, 1999
phollin...@sbsintl.com wrote in message <99-0...@comp.compilers>...

> Does anybody know of a tool that I could use that converts the C
>structure definitions into some other source file that has each data
>member name, byte offset, size etc so that I could make a program to
>dynamically compose these "structures"?

> The annoying thing is, I know that somewhere in VC++ it has all
>of this and more because you can browse all of the data structures in
>a program from within the debugger.
>
> Does anyone know of a way to get VC++ to dump this information
>into some sort of "C source" file that can be included elsewhere?

I don't know, but here's some VC++-specific background and ideas.

By default, a "debug" VC++ build compiles -Zi, which reposits the
compiland's type, symbol, address, etc. debug information in a .pdb
(program database) file, which is then used for many purposes,
including debugging. But (as far as I know) the pdb interfaces are
unpublished.

Instead, compile with the -Z7 flag, which emits old-fashioned MS C7.0
style CodeView debug records into your object file or executable.
Then open this COFF file, find the debug info, and read the symbol and
type records. There you'll find LF_STRUCTURE type records for your
structure definitions, including field names, type indices, and field
offsets. Then recover the types themselves by walking the type graph
using the type indices. But that's a lot of fiddly work. See
http://msdn.microsoft.com/library/specs/S664F.HTM for the CodeView
info spec.

Also check out the Microsoft Research Semantics-Based Tools Group's
AST toolkit.

"The goal of the AST ToolKit project is to provide product development
groups with a public API to a C++ program's parse tree, symbol table,
and types, as well as to provide us with an infrastructure for our own
research using product groups' code bases."

See http://research.microsoft.com/sbt/ for more information.

Jan Gray
[I suspect it'd be easier to write some perl programs that understand
just enough C to process the structure declarations in the source than
to grovel through the COFF symbols. -John]

Pierre R. Mai

Jul 21, 1999
"Jan Gray" <jsg...@acm.org> writes:

> [I suspect it'd be easier to write some perl programs that understand
> just enough C to process the structure declarations in the source than
> to grovel through the COFF symbols. -John]

I'd imagine that glue-generating tools like SWIG (which parses C/C++
to automatically generate glue-code for embedded scripting languages
like Perl, Tcl, Guile, etc.) could be "misused" for this job.
SWIG in particular seems well suited for something like this. See
http://www.swig.org/ for information on SWIG.

I'd try to avoid writing something to parse the C headers, since this
usually ends up being more work than at first assumed (depending on
the headers in question, typedefs have to be resolved, etc.)...

Regs, Pierre.
--
Pierre Mai <pm...@acm.org> PGP and GPG keys at your nearest Keyserver
[Someone else wrote in and noted that in the CPAN perl archive there's
a C source analysis module that looks like it does a pretty good job
of parsing C code. -John]

Keith Thompson

Jul 23, 1999
phollin...@sbsintl.com writes:
> Hi,
> I'm dealing with a 3rd party API (The Eurex exchange "Values" API)
> that uses a large number of C structures. The exchange supplies C
> header files that describe structures which define the format of the
> input and output messages. These header files provide structures that
> define the message format of blocks of bytes that are sent to, and
> received from, the exchange.
>
> Does anybody know of a tool that I could use that converts the C
> structure definitions into some other source file that has each data
> member name, byte offset, size etc so that I could make a program to
> dynamically compose these "structures"?

Once upon a time, I wrote such a tool. (No, sorry, it's not
available.) I used a modified version of a third-party freeware C
parser to parse the headers and generate a symbol table. From the
symbol table, I generated tables of type and member names. From the
tables, I generated C sources (with *lots* of ugly macros) that I
compiled and executed to generate more tables. And so on. (No, there
weren't an infinite number of passes, it just seemed that way.)

Extracting information from compiler generated debug info wasn't an
option, since the tool had to work on several different platforms.

As far as parsing is concerned, typedefs can be nasty. You might
consider grabbing an existing compiler and deleting what you don't
need, rather than grabbing a parser and adding what you do need.
Bitfields are also, ahem, interesting, since you can't take a
bitfield's address or use the offsetof() macro.

If you don't need the process to be 100% automated, you can probably
save a lot of effort by writing a quick and dirty tool to extract type
and member names, cleaning up the output manually, and transforming
that into C code that prints the size of each type, and the size and
offset of each member. Take a look at the way your implementation
defines the offsetof macro in <stddef.h>. If you understand how it
works, you can probably figure this stuff out.
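
A sketch of the kind of generated C code described above: it prints the size of a type, then the size and offset of each member. The structure struct sample and the names MY_OFFSETOF, MEMBER_SIZE, PRINT_MEMBER and print_sample_layout are hypothetical. MY_OFFSETOF imitates the null-pointer idiom that many <stddef.h> headers have traditionally used; it is shown only to illustrate the mechanism and is not strictly portable, so the printing code relies on the standard offsetof.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure standing in for one parsed from a header. */
struct sample {
    char   tag[3];
    int    count;
    double value;
};

/* How many implementations have traditionally defined offsetof.
 * For illustration only; use the standard macro in real code. */
#define MY_OFFSETOF(type, member) ((size_t)&((type *)0)->member)

/* Size of a member without needing an object of the type.
 * sizeof does not evaluate its operand, so nothing is dereferenced. */
#define MEMBER_SIZE(type, member) (sizeof(((type *)0)->member))

/* Print one member's name, size and offset. */
#define PRINT_MEMBER(out, type, member)                         \
    fprintf((out), "  %-6s size=%lu offset=%lu\n", #member,     \
            (unsigned long)MEMBER_SIZE(type, member),           \
            (unsigned long)offsetof(type, member))

/* Print the size of the type, then the size and offset of each member. */
void print_sample_layout(FILE *out)
{
    fprintf(out, "struct sample size=%lu\n",
            (unsigned long)sizeof(struct sample));
    PRINT_MEMBER(out, struct sample, tag);
    PRINT_MEMBER(out, struct sample, count);
    PRINT_MEMBER(out, struct sample, value);
}
```

Calling print_sample_layout(stdout) prints whatever layout the current compiler chose, padding included. Bit-field members cannot be passed to PRINT_MEMBER, since neither offsetof nor sizeof accepts them.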

--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>

Jerry Pendergraft

Jul 28, 1999
Keith Thompson wrote:
>
> > Does anybody know of a tool that I could use that converts the C
> > structure definitions into some other source file that has each data
> > member name, byte offset, size etc so that I could make a program to
> > dynamically compose these "structures"?
>
> Once upon a time, I wrote such a tool. (No, sorry, it's not

Me too, (not available either). But there are several things which
make this a very difficult job in addition to the parsing problems
mentioned by others.

The actual layout of a structure is largely up to the specific compiler,
for such things as alignment, which in turn is controlled by

#pragma pack n

All of this makes the output you would get VERRRY non portable anyway
even from one compile run to the next. I would suggest taking another
look at exactly what you need this information for and see if there is
not another way than spelunking the source.
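
To illustrate the packing point, here is a small sketch with two structures that have the same members but different packing. It assumes the #pragma pack(push, 1) / #pragma pack(pop) spelling, which MSVC, GCC and Clang accept to my knowledge; other compilers may spell it differently. The structure and function names are made up for the example.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after 'flag'. */
struct natural_rec {
    char flag;
    int  value;
};

/* Same members with packing set to 1 byte: no padding is inserted. */
#pragma pack(push, 1)
struct packed_rec {
    char flag;
    int  value;
};
#pragma pack(pop)

/* Offset of 'value' under each layout. */
size_t natural_value_offset(void)
{
    return offsetof(struct natural_rec, value);
}

size_t packed_value_offset(void)
{
    return offsetof(struct packed_rec, value);
}
```

In the packed structure 'value' sits at offset 1; in the default layout its offset depends on the compiler's alignment rules, which is why a table generated under one setting cannot be trusted under another.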
--
Jerry Pendergraft
jerry.pe...@endocardial.com
Endocardial Solutions voice: 651-523-6935
1350 Energy Lane, Suite 110 fax: 651-644-7897
St Paul, MN 55108-5254

Norbert Berzen

Jul 30, 1999
Jerry Pendergraft wrote:

>The actual layout of a structure is largely up to the specific compiler,
>for such things as alignment, which in turn is controlled by
>
>#pragma pack n
>
>All of this makes the output you would get VERRRY non portable anyway
>even from one compile run to the next.

This depends on the code generated by the tool. The code need not
have hardwired layout information in it. The tool could generate C/C++
code which gets linked together with the application. This code,
executed during the application's startup phase, could dynamically
extract layout information based solely on simple preprocessor macros
such as `offsetof()' and builtin operators such as `sizeof', plus some
slightly trickier ones (e.g. to extract layout information of virtual
bases). We developed such a tool as part of a persistent object store
for C/C++.
Basically our tool is a C++ compiler frontend with three different
backends:

1) A backend which mostly reflects its input. The input gets somewhat
augmented by special constructors and friend declarations.

2) A backend which generates C++ source to extract layout information
(This code gets linked together with the code generated by (1)).

3) A backend which synthesizes database type layout information.

The code generated by the tool should be portable across different
compilation environments since it does not make any assumptions on the
compiler's type layout. With the one exception that virtual base classes
are assumed to be represented as some kind of pointers. But AFAIK all
standard C++ compilers match this assumption.
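
A much smaller illustration of layout information that is never hardwired, using the preprocessor "X-macro" technique rather than the compiler front end described above. One field list generates both the structure and its layout table, so the compiler recomputes every offset and size on each build. The structure, its fields and all names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Single source of truth: X(type, name, array_length). Hypothetical fields. */
#define ORDER_FIELDS(X)        \
    X(char, symbol,   8)       \
    X(char, side,     1)       \
    X(char, quantity, 12)

/* Expansion 1: the structure itself. */
#define AS_MEMBER(type, name, len) type name[len];
struct order {
    ORDER_FIELDS(AS_MEMBER)
};

struct member_info {
    const char *name;
    size_t offset;
    size_t size;
};

/* Expansion 2: the layout table, computed by the compiler on every build. */
#define AS_INFO(type, name, len) \
    { #name, offsetof(struct order, name), sizeof(((struct order *)0)->name) },
const struct member_info order_layout[] = {
    ORDER_FIELDS(AS_INFO)
};

#define ORDER_FIELD_COUNT (sizeof order_layout / sizeof order_layout[0])

/* Return the table entry for a member name, or NULL if there is none. */
const struct member_info *order_member(const char *name)
{
    size_t i;
    for (i = 0; i < ORDER_FIELD_COUNT; i++)
        if (strcmp(order_layout[i].name, name) == 0)
            return &order_layout[i];
    return NULL;
}
```

This only works when you control the structure definition; for third-party headers such as the exchange's, a parser-based generator is still needed to produce the field list.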

Greetings
--
Norbert

Keith Thompson

Jul 30, 1999
Jerry Pendergraft <jerry.pe...@endocardial.com> writes:
[...]

> The actual layout of a structure is largely up to the specific compiler,
> for such things as alignment, which in turn is controlled by
>
> #pragma pack n
>
> All of this makes the output you would get VERRRY non portable anyway
> even from one compile run to the next.

Which is why, if you're going to write code that depends on the
generated layout tables, you should regenerate the tables every time
you rebuild the software. Just add it to the Makefile (or
equivalent).

> I would suggest taking another
> look at exactly what you need this information for and see if there is
> not another way than spelunking the source.

Agreed.

Dibyendu Majumdar

Jul 30, 1999
phollin...@sbsintl.com wrote:

> Does anybody know of a tool that I could use that converts the C
> structure definitions into some other source file that has each data
> member name, byte offset, size etc so that I could make a program to
> dynamically compose these "structures"?

Hello.

I am not sure if this will help ...

I used the parser in the UPS C Interpreter to generate the following:

union example size=136 offset=0 {
    struct person size=136 offset=0 {
        (array[10] of char) name size=10 offset=0
        (array[20] of char) surname size=20 offset=10
        (int) age size=4 offset=32
        (double) salary size=8 offset=36
        struct address size=87 offset=44 {
            (array[30] of char) city size=30 offset=0
            (array[30] of char) state size=30 offset=30
            (array[7] of char) postcode size=7 offset=60
            (array[20] of char) country size=20 offset=67
        }
        (pointer to struct person) next size=4 offset=132
    }
    struct address size=87 offset=0 {
        (array[30] of char) city size=30 offset=0
        (array[30] of char) state size=30 offset=30
        (array[7] of char) postcode size=7 offset=60
        (array[20] of char) country size=20 offset=67
    }
    (double) d size=8 offset=0
    (char) c size=1 offset=0
    (int) i size=4 offset=0
}
struct person size=136 offset=0 {
    (array[10] of char) name size=10 offset=0
    (array[20] of char) surname size=20 offset=10
    (int) age size=4 offset=32
    (double) salary size=8 offset=36
    struct address size=87 offset=44 {
        (array[30] of char) city size=30 offset=0
        (array[30] of char) state size=30 offset=30
        (array[7] of char) postcode size=7 offset=60
        (array[20] of char) country size=20 offset=67
    }
    (pointer to struct person) next size=4 offset=132
}
struct address size=87 offset=0 {
    (array[30] of char) city size=30 offset=0
    (array[30] of char) state size=30 offset=30
    (array[7] of char) postcode size=7 offset=60
    (array[20] of char) country size=20 offset=67
}

from the file example7.data containing:

struct address {
    char city[30];
    char state[30];
    char postcode[7];
    char country[20];
};

struct person {
    char name[10];
    char surname[20];
    int age;
    double salary;
    struct address addr;
    struct person *next;
};

union example {
    struct person p;
    struct address a;
    double d;
    char c;
    int i;
};

void dummy() {}

The code that does this looks like:

/* example7.c */
/* author: Dibyendu Majumdar */
/* created: 29 July 1999 */

/* This program illustrates how the UPS C Interpreter's parser can be
 * used as a "reflection" mechanism.
 * Run this as:
 *     example7 example7.data
 */

#include <stdio.h>
#include <stdlib.h>

#include "Cinterpreter.h"

/* Call funcptr once for every aggregate type defined at global scope. */
void
for_each_global_aggr_type(
    Cinterpreter_t *ci,
    void (*funcptr)(aggr_or_enum_def_t*,int,int))
{
    parse_res_t *pres = (parse_res_t *) ci->ci_parse_id;
    block_t *block = pres->pr_block;
    aggr_or_enum_def_t *ae = block->bl_aggr_or_enum_defs;

    for (; ae != NULL; ae = ae->ae_next) {
        funcptr(ae, 0, 0);
    }
}

/* Print one struct or union, recursing into nested struct members. */
void dumptype(aggr_or_enum_def_t *ae, int offset, int level)
{
    var_t *var;

    if ((ae->ae_type->ty_code != TY_STRUCT
         && ae->ae_type->ty_code != TY_UNION)
        || ae->ae_is_complete == AE_INCOMPLETE) {
        fprintf(stderr, "Aggregate type %s is not a struct or union\n"
                "or is incompletely defined\n",
                ae->ae_tag);
        return;
    }
    fprintf(stdout, "%*s%s %s size=%d offset=%d {\n", level, "",
            (ae->ae_type->ty_code == TY_STRUCT ? "struct" : "union"),
            ae->ae_tag, ae->ae_size, offset);
    for (var = ae->ae_aggr_members; var != NULL; var = var->va_next) {
        if (var->va_type->ty_code == TY_STRUCT) {
            dumptype(var->va_type->ty_aggr_or_enum,
                     var->va_addr, level+4);
        }
        else {
            char *str;
            str = ci_type_to_english(var->va_type, FALSE);
            fprintf(stdout, "    %*s(%s) %s size=%d offset=%d\n",
                    level, "", str, var->va_name,
                    var->va_type->ty_size, var->va_addr);
            free(str);
        }
    }
    fprintf(stdout, "%*s}\n", level, "");
}

int main(int argc, char *argv[])
{
    Cinterpreter_t ci;

    if ( argc != 2 ) {
        fprintf(stderr, "usage: example7 <file>\n");
        exit(1);
    }

    if ( ci_create_interpreter_from_file(&ci, argv[1], NULL, NULL) ) {
        for_each_global_aggr_type(&ci, dumptype);
        ci_destroy_interpreter(&ci);
    }

    return 0;
}

The UPS C Interpreter can be downloaded from
http://www.mazumdar.demon.co.uk/ups3.tar.gz .
This is my private version - it incorporates the ups-3.34-beta4 from the
official UPS website
http://www.concerto.demon.co.uk/UPS.

The example above can be found in the sub-directory
interpreter/tutorial/example7.c (in my version only).

Please note that the ups-3.33 C Interpreter will not work in
stand-alone mode - you need either the ups-3.34-beta4 or my version
which has some additional bug fixes.

I built the UPS C Interpreter on SuSE Linux 6.1 (Kernel 2.2.7). It
appears to me that you are probably looking for a Windows tool. I
think the UPS C Interpreter can be built on Windows using the Win32
GNU C compiler from Cygnus (which has a UNIX emulation layer).

I shall post an article on UPS C Interpreter.

Regards

Volker Denneberg

Jul 30, 1999
phollin...@sbsintl.com writes:
> > that uses a large number of C structures. The exchange supplies C
> > header files that describe structures which define the format of the
> > input and output messages. These header files provide structures that
> > define the message format of blocks of bytes that are sent to, and
> > received from, the exchange.
> >
> > Does anybody know of a tool that I could use that converts the C
> > structure definitions into some other source file that has each data
> > member name, byte offset, size etc so that I could make a program to
> > dynamically compose these "structures"?


What about the compiler itself ?

#include <stddef.h>
#include <iostream>
using namespace std;

// offset of a member, computed by the compiler (same idea as offsetof())
#define OFFSET(type, member) \
    ((unsigned long)((char *)&((type *)NULL)->member - (char *)NULL))

// print member name, size and offset:
#define DUMP(os, type, member) \
    os << #member << " : size=" << sizeof(((type *)NULL)->member) \
       << " : ofs=" << OFFSET(type, member) << "\n"

int main()
{
    struct S { int a, b; char c[10]; };  // or any other struct

    DUMP(cout, S, a);
    DUMP(cout, S, b);
    DUMP(cout, S, c);
    return 0;
}

or something like that. Should print your table

volker

Andy Newman

Aug 1, 1999
The information you want is typically in the debug information
generated by the compilers. I once wrote a program to extract the type
information out of the debug information in the object files generated
by two different compilers and compare them for equivalence. This was
for a shared memory system with two different processors and let us
compare the object code's idea of the structure layouts used in the
comms. code to verify the compilers were doing what we thought they
should be doing with the structure layout (to a degree of course, we
had to trust the debug info). The code read an object file and built a
structure representing the types defined in the debug info. It isn't
hard.

If you have a Unix system with the binutils objdump handy, try running
"objdump --debugging" on an object file with debug symbols. Here are
some snippets of its output from my FreeBSD system:

typedef long int time_t;
typedef unsigned int uintptr_t;
struct _physadr { /* size 4 id 2 */
int r[1]; /* bitsize 32, bitpos 0 */
};

typedef struct _physadr /* id 2 */ *physadr;
struct flock { /* size 24 id 5 */
long long int l_start; /* bitsize 64, bitpos 0 */
long long int l_len; /* bitsize 64, bitpos 64 */
int l_pid; /* bitsize 32, bitpos 128 */
short int l_type; /* bitsize 16, bitpos 160 */
short int l_whence; /* bitsize 16, bitpos 176 */
};

Chuck in some run-time linking and C can get a little more dynamic.
