Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

design/refactoring of message-based system in C++: ideal solution?

11 views
Skip to first unread message

SegFault

unread,
Nov 24, 2009, 12:27:34 AM11/24/09
to
Hello.

Situation:
I'm currently converting a piece of older software (original code was
written in C somewhere in 1990s, then was converted to C++ somewhere
in 2000..2005, then I started working with the code in this year).
Software uses message-based system I need to refactor/rewrite/convert/
modify (client-server communication over tcp/udp). I need to refactor
the system, because it is nearly unmaintainable, I need to modify
communication protocol and change data types for some variables, which
will break everything due to ambiguous type conversion.
See detailed system description at the end of letter.

The problem:
I can't think of a good(perfect) design for new message system - they
all seem to have flaw at one point or another. It isn't necessary to
preserve existing protocol, but I need something more flexible.
Currently system uses switch/case to process different messages, which
is ugly, and hard to maintain. Also, currently system doesn't have
structs for every message type. I'm currently trying to create struct/
class for every message type, with common base class that has virtual
method "Save to packet", then create std::map which will map message
types to message constructors (i.e. factory pattern), then maybe turn
"messages" into "command" pattern. However, it is still far from
perfect - I can't initialize map with constructors automatically, for
example, I'll have to create quite a lot of redundant structs with
lots of virtual methods, etc. Is there a better way to do this?

Thanks in advance.

Description of the system:
1) Approximately 2 megabytes of code.
2) Uses approximately 70 types of messages, message type is indicated
by "uchar", which is sent first.
3) Each message type has different number of parameters (typically
longs and ints, sometimes strings).
4) Each message type has fixed number of parameters, different types
frequently have different number of parameters.
5) Messages are being transferred using only 20 overloads where every
single overload doesn't seem to correspond to single type of message.
Overloads look like this (now you'll see the problem, I think):
---
int PushMsg(uchar msg, ushort, int);
int PushMsg(uchar msg, ushort, char *, int);
int PushMsg(uchar msg, ushort, ushort, char *, int);
int PushMsg(uchar msg, ushort, uchar, int);
int PushMsg(uchar msg, ushort, ushort, int);
int PushMsg(uchar msg, ushort, uchar, uchar, int);
int PushMsg(uchar msg, ushort, ushort, uchar, int);
int PushMsg(uchar msg, ushort, ushort, uchar, long, long, int);
int PushMsg(uchar,short,ushort,ushort,uchar,int);
int PushMsg(uchar,ushort,ushort,short,uchar,long,long,int);
---
6) There are no structs defined for every message type.
7) In the end message data is being stored into one huge message
struct (with few dozens of members) which is not a union and seems to
contain every possible parameter.
8) Received messages are processed using huge switch/case.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Nick Hounsome

unread,
Nov 24, 2009, 11:03:32 AM11/24/09
to
On 24 Nov, 05:27, SegFault <altctrlsys...@gmail.com> wrote:
> Hello.
>
> Situation:
> I'm currently converting a piece of older software (original code was
> written in C somewhere in 1990s, then was converted to C++ somewhere
> in 2000..2005, then I started working with the code in this year).
> Software uses message-based system I need to refactor/rewrite/convert/
> modify (client-server communication over tcp/udp). I need to refactor
> the system, because it is nearly unmaintainable,

I sympathise but IMHO this is very "courageous".
If it has any bugs it will be your fault.
If you do a wonderful conversion management will just want to know why
you cost them so much money to acheive something that already works.
Unless you are sure that you can do a good job quickly or you are
going to have to maintain it for some time it's just not in your best
interest to mess with it. This is painful experience speaking.

> I need to modify
> communication protocol and change data types for some variables, which
> will break everything due to ambiguous type conversion.
> See detailed system description at the end of letter.
>
> The problem:
> I can't think of a good(perfect) design for new message system - they
> all seem to have flaw at one point or another. It isn't necessary to
> preserve existing protocol, but I need something more flexible.
> Currently system uses switch/case to process different messages, which
> is ugly, and hard to maintain. Also, currently system doesn't have
> structs for every message type. I'm currently trying to create struct/
> class for every message type, with common base class that has virtual
> method "Save to packet", then create std::map which will map message
> types to message constructors (i.e. factory pattern), then maybe turn
> "messages" into "command" pattern. However, it is still far from
> perfect - I can't initialize map with constructors automatically, for
> example, I'll have to create quite a lot of redundant structs with
> lots of virtual methods, etc. Is there a better way to do this?
>

I've done it a million times.
Wrap each message in with a length and (optionally) a trailing CRC.
Create a map from message ID to parser function (obviously cannot be
ctor directly) passing the buffer and length.

You can avoid a master file that knows all messages and neeeds
updating for every new one if you populate the map using static
instances of a special class and resist the urge to have an enum:

class MapFiller {
MapFiller(int id, Message* (*parser)(const void* base,size_t
length)) { msgmap[id] = parser; }
}

// in Msg1.cpp

static MapFiller(Msg1::ID,Msg1::Parse);

If it is useful to enumerate all message types then just add a name
parameter and map that too.

Don't put Msg1.o in a library or it wont get linked.

> Thanks in advance.
>
> Description of the system:
> 1) Approximately 2 megabytes of code.

I did say that you should be cautious about messing with this!

> 2) Uses approximately 70 types of messages, message type is indicated
> by "uchar", which is sent first.

uint8_t would be the portable type. (Or it least it would if Microsoft
implemented all the correct headers)

> 3) Each message type has different number of parameters (typically
> longs and ints, sometimes strings).
> 4) Each message type has fixed number of parameters, different types
> frequently have different number of parameters.
> 5) Messages are being transferred using only 20 overloads where every
> single overload doesn't seem to correspond to single type of message.
> Overloads look like this (now you'll see the problem, I think):
> ---
> int PushMsg(uchar msg, ushort, int);
> int PushMsg(uchar msg, ushort, char *, int);
> int PushMsg(uchar msg, ushort, ushort, char *, int);
> int PushMsg(uchar msg, ushort, uchar, int);
> int PushMsg(uchar msg, ushort, ushort, int);
> int PushMsg(uchar msg, ushort, uchar, uchar, int);
> int PushMsg(uchar msg, ushort, ushort, uchar, int);
> int PushMsg(uchar msg, ushort, ushort, uchar, long, long, int);
> int PushMsg(uchar,short,ushort,ushort,uchar,int);
> int PushMsg(uchar,ushort,ushort,short,uchar,long,long,int);

If you want to leave this unchanged then you are going to need to map
a message ID to a builder function taking varargs which will then call
the ctor with the proper args.

> ---
> 6) There are no structs defined for every message type.

I agree that each needs a class

> 7) In the end message data is being stored into one huge message
> struct (with few dozens of members) which is not a union and seems to
> contain every possible parameter.

Use length and custom read/write

> 8) Received messages are processed using huge switch/case.

Use the map

Goran

unread,
Nov 24, 2009, 11:20:52 AM11/24/09
to
On Nov 24, 6:27 am, SegFault <altctrlsys...@gmail.com> wrote:
> The problem:
> I can't think of a good(perfect) design for new message system - they
> all seem to have flaw at one point or another. It isn't necessary to
> preserve existing protocol, but I need something more flexible.
> Currently system uses switch/case to process different messages, which
> is ugly, and hard to maintain. Also, currently system doesn't have
> structs for every message type. I'm currently trying to create struct/
> class for every message type, with common base class that has virtual
> method "Save to packet", then create std::map which will map message
> types to message constructors (i.e. factory pattern), then maybe turn
> "messages" into "command" pattern. However, it is still far from
> perfect - I can't initialize map with constructors automatically, for
> example, I'll have to create quite a lot of redundant structs with
> lots of virtual methods, etc. Is there a better way to do this?

I'd call your problem a serialization (a.k.a persistence) one ;-).
Certainly a good starting point is to define a type (class/struct) for
every actual message, and to have save and load methods in there.
Now... You have to read at least something before deciding the type of
the message, I'd say that is common for any "polymorphic"
serialization scheme (your first byte, perhaps more). That's OK from
where I stand, and that gives e.g.:

struct base_message_type
{
virtual void load(load_stream&) = 0;
virtual void store(store_stream&) = 0;
};

Mapping message_type<=>creation_facility using std::map is fine, too.
For creation_facility, simplest possible is to reach for a function
pointer. But you will probably need more, so you'll end up with some
sort of a meta-class (description of a message_type). That would be a
dreaded (one per class) singleton, e.g.

struct message_type_class
{
virtual base_message_type* create_object(char wire_message_type) =
0;
// Other things...
}

> Description of the system:
> 1) Approximately 2 megabytes of code.
> 2) Uses approximately 70 types of messages, message type is indicated
> by "uchar", which is sent first.
> 3) Each message type has different number of parameters (typically
> longs and ints, sometimes strings).
> 4) Each message type has fixed number of parameters, different types
> frequently have different number of parameters.
> 5) Messages are being transferred using only 20 overloads where every
> single overload doesn't seem to correspond to single type of message.
> Overloads look like this (now you'll see the problem, I think):
> ---
> int PushMsg(uchar msg, ushort, int);
> int PushMsg(uchar msg, ushort, char *, int);
> int PushMsg(uchar msg, ushort, ushort, char *, int);
> int PushMsg(uchar msg, ushort, uchar, int);
> int PushMsg(uchar msg, ushort, ushort, int);
> int PushMsg(uchar msg, ushort, uchar, uchar, int);
> int PushMsg(uchar msg, ushort, ushort, uchar, int);
> int PushMsg(uchar msg, ushort, ushort, uchar, long, long, int);
> int PushMsg(uchar,short,ushort,ushort,uchar,int);
> int PushMsg(uchar,ushort,ushort,short,uchar,long,long,int);

Yeah, that is horrible WRT maintenance. It is easy to imagine a system
where you can construct a message and use simple polymorphism to
serialize it to the stream, e.g.

message.store(stream);

> 6) There are no structs defined for every message type.
> 7) In the end message data is being stored into one huge message
> struct (with few dozens of members) which is not a union and seems to
> contain every possible parameter.
> 8) Received messages are processed using huge switch/case.

Yes, but it does not seem that turning messages into commands will be
interesting (you mentioned command pattern). This is, I think, because
the idea of the command pattern is to execute the command at a later
stage or multiple times, which doesn't seem to be your case. (But
might be, further down the line; it's easy to imagine that messages
are put into some sort of execution queue; at that point, simple
application of the command pattern is warranted).

Goran.

Vladyslav Lazarenko

unread,
Nov 24, 2009, 3:38:16 PM11/24/09
to
On Nov 24, 12:27 am, SegFault <altctrlsys...@gmail.com> wrote:
> Hello.
>
> Situation:
> I'm currently converting a piece of older software (original code was
> written in C somewhere in 1990s, then was converted to C++ somewhere
> in 2000..2005, then I started working with the code in this year).
> Software uses message-based system I need to refactor/rewrite/convert/
> modify (client-server communication over tcp/udp). I need to refactor
> the system, because it is nearly unmaintainable, I need to modify
> communication protocol and change data types for some variables, which
> will break everything due to ambiguous type conversion.

Hello SegFault,

Here are some key frameworks I would definitely use to make this
system easy to maintain, scalable and to not spend the entire life
refactoring it:

* Boost ASIO - Cross-platform C++ framework for low-level I/O and
networking (http://www.boost.org/doc/libs/1_41_0/doc/html/
boost_asio.html)

Asio makes network programming really easy and straight forward. It
also usually does not take a long time to start using it.

* Google Protobuf - efficient serialization and deserialization. Can
handle removed/added fields other side doesn't know about. (http://
code.google.com/p/protobuf/)

You may also take a look at other C++ libraries, like the entire
Boost, POCO, ZeroC ICE etc.

As for log switch cases, that is almost a necessary evil. But C++ is
powerful enough to propose alternative solution. Here is my idea. If
you know list of supported messages in compile-time, you can create a
typelist. For each type, assign a unique unsigned ID starting from 0.
Then generate a table of template functions accepting a functor
(visitor, in our case) specialized for each type in the list mentioned
above. This all can be done in compile-time. So in run-time, once you
get that message ID, you just call a function stored in your array
with the offset equals the ID you just read. You may look at
Boost.Variant, for examples, this is something that does a different
thing, but is very close in terms of implementation (like typelists,
template recursion etc).

Hope it helps.

Regards,
Vlad

SegFault

unread,
Nov 26, 2009, 1:08:03 AM11/26/09
to
Thanks for all replies, they were useful.
I think now I have better idea about how to fix this part of code.
I do not have any more questions about this at the moment.
Detailed replies are below.

Nick Hounsome wrote:
> I sympathise but IMHO this is very "courageous".
> If it has any bugs it will be your fault.
> If you do a wonderful conversion management will just want to know why
> you cost them so much money to acheive something that already works.

Thanks for your concern, but I don't have any strict time limit for
this.
However I can't work on this forever.
Also, it looks like I can't avoid this problem - there are two
remaining tasks
that directly involve modification of network protocol and cannot be
done without
modifications in this part of code.

> I've done it a million times.
> Wrap each message in with a length and (optionally) a trailing CRC.
> Create a map from message ID to parser function (obviously cannot be
> ctor directly) passing the buffer and length.

Thanks, this idea will be helpful. I originally was going to pull
message type from buffer,
pass it to construction function, which will read required number of
bytes, but it looks like
saving message length might make it easier to debug problems later.
Especially because I don't have to preserve the protocol.

> You can avoid a master file that knows all messages and neeeds
> updating for every new one if you populate the map using static
> instances of a special class and resist the urge to have an enum:
>
> class MapFiller {
> MapFiller(int id, Message* (*parser)(const void* base,size_t
> length)) { msgmap[id] = parser; }
> }

I will think about it.

> I did say that you should be cautious about messing with this!

I'm using git repository with several branches, with new branch for
each dangerous conversion,
so I can always revert code to earlier state. It doesn't return time I
spent writing code, of course.

> If you want to leave this unchanged then you are going to need to map
> a message ID to a builder function taking varargs which will then call
> the ctor with the proper args.

I was going to remove this part to avoid problems. In some areas I
have to replace ints with custom data type, which has conversion
operators. As you can imagine, with current overloads this will cause
problem.

Thanks for advice, it was useful.

Goran wrote:
> Yes, but it does not seem that turning messages into commands will be
> interesting (you mentioned command pattern). This is, I think, because
> the idea of the command pattern is to execute the command at a later
> stage or multiple times, which doesn't seem to be your case. (But
> might be, further down the line; it's easy to imagine that messages
> are put into some sort of execution queue; at that point, simple
> application of the command pattern is warranted).

Thanks for advice/info/thoughts.
The reason why I thought about command pattern because I couldn't
think of any other way
to get rid of switch/case completely. I.e. if I create MessageID<-
>creation function map,
then it will have to return base type, which is ancestor for all
message structs/classes, or a void*.
Which means, that to process data from constructed message, I will
have to know into which struct
I should cast void* or base type. So in this case I will have elegant
message construction routine
(that will construct appropriate message from data in the buffer),
but
message processing routine will still have to use switch/case, which
will still
need MessageID to know into which structure void* (or base type)
should be cast.
This looks a bit clumsy, but it will require less code changes - there
is already huge switch/case
that processes messages. On other hand, If I turn message data into
command pattern,
I can create base class for every message struct, with virtual method
to process message (called "execute()" or whatever). In this case If
message construction function returns base class for all message, I
don't have
to cast anything - I can just call "execute()", and it will work. This
is more elegant, it will be easier to
extend protocol (if I need it), but it will also require more code
changes, which means higher probability of introducing a bug

Vladyslav Lazarenko wrote:
> On Nov 24, 12:27 am, SegFault <altctrlsys...@gmail.com> wrote:
> Hello SegFault,
>
> Here are some key frameworks I would definitely use to make this
> system easy to maintain, scalable and to not spend the entire life
> refactoring it:

Thank you, I will certainly check those frameworks for future
projects,
but right now I am going to try refactoring first. This piece of
software is
large, complicated and extremely messy, and if I completely replace
network functionality
with another library, there will be higher chance of a major problem
than if I try to
slowly modify the system, checking whether it still works after every
modification
(it still won't be completely possible with current changes). I
already did similar step-by-step
functionality replacement for other large part of this code, and it
looks like this strategy
was efficient enough.Also there is a chance that code have some bugs
which will
magically manifest themselves when I add external library (I already
spent a lot of
time fixing heap corruptions and multithreaded problems like deadlocks/
races in this code).
Of course, If my refactoring attempts fails, I will try frameworks you
mentioned.

> As for log switch cases, that is almost a necessary evil. But C++ is
> powerful enough to propose alternative solution. Here is my idea.

> ....
>
> Hope it helps.
Thanks, I think it gave me some new ideas.

0 new messages