
string compare


yeti

Jan 23, 2008, 12:42:37 AM
Hi guys,

I am using custom string structures in a project.

typedef struct{
short int length;
char data[256];
}my_long_string;

and

typedef struct{
short int length;
char data[32];
}my_short_string;

I want to create string processing functions like strcmp, strcpy etc
for these types.

While I can create different functions for these two types, is there
any way to use same function to handle both types ??
With C++ this would have been easy ... I'd have to just overload a
function, but since function overloading is not supported in C is
there a way/technique which I can use to simulate similar behaviour ??

regards

Rohin

santosh

Jan 23, 2008, 1:48:22 AM
yeti wrote:

Yes. You can have the caller of the library routines specify which type
is being passed to your functions and act appropriately. Or you could
place an identifier within those types and process them after testing
the identifier to find out the type. Your routines could be defined as
taking void * arguments, which can point to any type of data.

But I would make the types dynamic to start with, so that multiple types
can be avoided. Then the array can grow as needed.
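
A minimal sketch of the "identifier within those types" idea, under some assumptions: a tag member is added as the first field of both structs, length counts the bytes in use, and the names MY_SHORT, MY_LONG and my_data are made up for illustration. The caller never says which type it is passing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag values; these names do not appear in the thread. */
enum my_string_kind { MY_SHORT = 1, MY_LONG = 2 };

typedef struct {
    short int kind;      /* always MY_LONG */
    short int length;    /* bytes in use, terminator not counted */
    char data[256];
} my_long_string;

typedef struct {
    short int kind;      /* always MY_SHORT */
    short int length;
    char data[32];
} my_short_string;

/* Fetch the data pointer and length of either type by testing the tag.
   A pointer to a struct, suitably converted, points to its first member,
   so reading the tag through a short int pointer is well defined. */
static const char *my_data(const void *s, short int *length)
{
    switch (*(const short int *)s) {
    case MY_SHORT: {
        const my_short_string *p = s;
        *length = p->length;
        return p->data;
    }
    case MY_LONG: {
        const my_long_string *p = s;
        *length = p->length;
        return p->data;
    }
    default:
        assert(!"unknown string kind");
        *length = 0;
        return "";
    }
}

/* strcmp-like ordering that works for any mix of the two types. */
int my_strcmp(const void *a, const void *b)
{
    short int a_len, b_len;
    const char *a_data = my_data(a, &a_len);
    const char *b_data = my_data(b, &b_len);
    size_t n = (size_t)(a_len < b_len ? a_len : b_len);
    int r = memcmp(a_data, b_data, n);

    if (r != 0)
        return r;
    return (a_len > b_len) - (a_len < b_len);
}
```

The cost is one extra member per string and the usual void * hazard: the compiler will not complain if something that is not one of these structs is passed in.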

yeti

Jan 23, 2008, 2:08:39 AM
On Jan 23, 11:48 am, santosh <santosh....@gmail.com> wrote:
> yeti wrote:
> > Hi guys,
>
> > I am using custom string structures in a project.
>
> > typedef struct{
> > short int length;
> > char data[256];
> > }my_long_string;
>
> > and
>
> > typedef struct{
> > short int length;
> > char data[32];
> > }my_short_string;
>
> > I want to create string processing functions like strcmp, strcpy etc
> > for these types.
>
> > While I can create different functions for these two types, is there
> > any way to use same function to handle both types ??
> > With C++ this would have been easy ... I'd have to just overload a
> > function, but since function overloading is not supported in C is
> > there a way/technique which I can use to simulate similar behaviour ??
>
> Yes. You can have the caller of the library routines specify which type
> is being passed to your functions and act appropriately.
Well, what if someone wants to compare a short string with a long
string?
What would the caller of the function specify?
Also, if the caller has to pass a flag to indicate the type, I'd be
better off using different functions,
i.e.
short_strcmp(my_short_string * s1, my_short_string * s2);
instead of
my_strcmp(void* s1, void * s2, short flag);

What I mean is that using a flag doesn't serve my purpose well. I want
the caller of the function to be oblivious of the fact that s/he is
comparing a long or a short string.

> Or you could
> place an identifier within those types and process them after testing
> the identifier to find out the type. Your routines could be defined as
> taking void * arguments, which can point to any type of data.

Could you please give a small code snippet to explain this?


>
> But I would make the types dynamic to start with, so that multiple types
> can be avoided. Then the array can grow as needed.
>
>

Making the types dynamic would introduce problems like memory leaks. I
don't think it would be safe.

jacob navia

Jan 23, 2008, 3:24:46 AM

You can download such a library from the lcc-win download
site. It uses the proposed extensions of lcc-win for the C language.

That string library features Strcmp, Strcat, etc.

Source code is included
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32

CBFalconer

Jan 23, 2008, 3:00:38 AM
yeti wrote:
>
> I am using custom string structures in a project.
>
> typedef struct {
> short int length;
> char data[256];
> } my_long_string;
>
> and
>
> typedef struct {
> short int length;
> char data[32];
> } my_short_string;
>
> I want to create string processing functions like strcmp,
> strcpy etc for these types.

I added indentation to your typedefs. Assuming your strings don't
need to hold the char '\0', you don't need to do anything special. Just
fill the data portion with normal zero-terminated C strings. You
know that a long string can hold anything up to length 255, while a
short is limited to 31 chars. Then your compare etc. routines just
extract pointers to the data field from both and pass those to the
standard routines.

You might be better off making the structs completely common by:

typedef struct my_string {
    size_t max_length, length;
    char *data;
} my_string;

which is a fixed size, and serves you by not needing to eternally
recompute a length. However you do have to malloc space for *data
to point to, and record that maximum in max_length.
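
A rough sketch of how that struct might be used, assuming zero-terminated contents and hypothetical helper names (my_string_init, my_string_set, my_string_cmp and my_string_free are made up here, not taken from any library). One function per operation then serves every capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct my_string {
    size_t max_length, length;
    char *data;
} my_string;

/* Allocate room for max_length chars plus the terminating '\0'.
   Returns 0 on success, -1 if malloc fails. */
int my_string_init(my_string *s, size_t max_length)
{
    assert(s != NULL);
    s->data = malloc(max_length + 1);
    if (s->data == NULL)
        return -1;
    s->max_length = max_length;
    s->length = 0;
    s->data[0] = '\0';
    return 0;
}

/* Copy a C string in; refuse (return -1) rather than overflow. */
int my_string_set(my_string *s, const char *src)
{
    size_t n = strlen(src);

    if (n > s->max_length)
        return -1;
    memcpy(s->data, src, n + 1);   /* includes the '\0' */
    s->length = n;
    return 0;
}

/* The data is an ordinary C string, so strcmp does the work. */
int my_string_cmp(const my_string *a, const my_string *b)
{
    return strcmp(a->data, b->data);
}

/* Every successful my_string_init needs one matching my_string_free. */
void my_string_free(my_string *s)
{
    free(s->data);
    s->data = NULL;
    s->max_length = s->length = 0;
}
```

The leak concern raised earlier in the thread comes down to pairing each init with a free, the same discipline as for any malloc'd buffer.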

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.


Malcolm McLean

Jan 23, 2008, 6:22:53 AM

"yeti" <rohin...@gmail.com> wrote in message
Sadly no.
You are creating an N-squared problem as you add more string types. You can
get round it by converting everything to an intermediate type, but then you
will lose the speed that was the motive for the new types in the first
place.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Mark Bluemel

Jan 23, 2008, 6:54:43 AM
yeti wrote:
> On Jan 23, 11:48 am, santosh <santosh....@gmail.com> wrote:
>> yeti wrote:
>>> Hi guys,
>>> I am using custom string structures in a project.
>>> typedef struct{
>>> short int length;
>>> char data[256];
>>> }my_long_string;
>>> and
>>> typedef struct{
>>> short int length;
>>> char data[32];
>>> }my_short_string;
...
>> ... I would make the types dynamic to start with, so that multiple types

>> can be avoided. Then the array can grow as needed.
>>
> Creating the types dynamic would put in problems like memory leaks. I
> don't think it would be safe.

I think Santosh was suggesting a structure approach like this :-

typedef struct {
short length;
char data[]; /* or "char data[1];" if c99 support isn't available */
} my_string;

so if you had a string of 32 characters to work with, you'd do something
like:-

my_string *string_pointer = malloc(sizeof(short) + 32);
/* check the return value */

string_pointer->length = 32;
memcpy(string_pointer->data,<some source>,32);

I'm not sure what risks you perceive in this approach.

James Kuyper

Jan 23, 2008, 8:08:14 AM
Mark Bluemel wrote:
...

> I think Santosh was suggesting a structure approach like this :-
>
> typedef struct {
> short length;
> char data[]; /* or "char data[1];" if c99 support isn't available */
> } my_string;
>
> so if you had a string of 32 characters to work with, you'd do something
> like:-
>
> my_string *string_pointer = malloc(sizeof(short) + 32);

That assumes there are no padding bytes between length and data. The
right way to calculate the allocation is

#include <stddef.h>
...
mystring *string_pointer = malloc(offsetof(my_string, data) + 32);
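
Put together as a complete function, the offsetof() calculation might look like the sketch below. It assumes C99 flexible array members; my_string_new is a hypothetical helper name, and the caller is expected to free() the result.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    short length;
    char data[];    /* C99 flexible array member */
} my_string;

/* Hypothetical helper: allocate a my_string holding a copy of n bytes.
   Returns NULL if malloc fails. */
my_string *my_string_new(const char *src, size_t n)
{
    my_string *s;

    assert(src != NULL && n <= SHRT_MAX);
    /* Bytes up to the start of data, plus the payload itself. */
    s = malloc(offsetof(my_string, data) + n);
    if (s == NULL)
        return NULL;
    s->length = (short)n;
    memcpy(s->data, src, n);
    return s;
}
```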

Mark Bluemel

Jan 23, 2008, 8:14:59 AM

I had a sneaking suspicion I'd missed something. Thank you for the
correction.

David Resnick

Jan 23, 2008, 8:48:58 AM

I'm not endorsing this, and it is useless if you actually want safety
(that is, if functions need to know the real type of the structure,
say, to know the extent of the data segment). I'm only saying it is
possible, and as far as I know it is legal too. If not, I'll be
corrected VERY shortly, no doubt... If you implement strcpy this way,
the CALLER has to guarantee that the target string's data segment is
at least as big as the source's, which, well, is sort of dodgy.

#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

typedef struct
{
    short int length;
    char data[256];
} my_long_string;

typedef struct
{
    short int length;
    char data[32];
} my_short_string;


int my_strcmp(const void* str1, const void* str2)
{
    /* Assumes data sits at the same offset in both struct types. */
    const char* str1_data = (const char*)str1 + offsetof(my_long_string, data);
    const char* str2_data = (const char*)str2 + offsetof(my_long_string, data);
    return strcmp(str1_data, str2_data);
}

int main(void)
{
    my_long_string str1;
    my_short_string str2;

    /* I don't think it is possible for this to fail, is it? */
    assert(offsetof(my_long_string, data) == offsetof(my_short_string, data));

    str1.length = 6; /* or 5, depending on semantics being used */
    strcpy(str1.data, "hello");

    str2.length = 7; /* or 6... */
    strcpy(str2.data, "hello2");

    printf("my_strcmp returns %d\n", my_strcmp(&str1, &str2));

    return 0;
}

Again, being possible doesn't make it a good idea or a good design.
You could use this (void*/casting) approach and give the structures
two common initial fields, one of them the max length as someone
else suggested, to add some safety.
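
A sketch of that last variant, under stated assumptions: the two common fields are wrapped in a hypothetical my_header struct placed first in each type, length counts bytes in use, and the copy reports failure instead of overflowing. It still relies on data being at the same offset in both types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared header; the field names are illustrative. */
typedef struct {
    short int capacity;  /* size of the data array */
    short int length;    /* bytes in use */
} my_header;

typedef struct { my_header hdr; char data[256]; } my_long_string;
typedef struct { my_header hdr; char data[32];  } my_short_string;

/* Copy source into target; returns 0 on success, -1 if it will not fit. */
int my_strcpy(void *target, const void *source)
{
    /* A pointer to either struct, converted, points to its first member. */
    my_header *t = target;
    const my_header *s = source;

    /* Layout assumption: data starts at the same offset in both types. */
    assert(offsetof(my_long_string, data) == offsetof(my_short_string, data));
    assert(s->length >= 0);

    if (s->length > t->capacity)
        return -1;
    memcpy((char *)target + offsetof(my_long_string, data),
           (const char *)source + offsetof(my_long_string, data),
           (size_t)s->length);
    t->length = s->length;
    return 0;
}
```

The caller no longer has to guarantee that the target is big enough, but it does have to set capacity correctly when each string is created.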

-David

David Resnick

Jan 23, 2008, 8:56:33 AM

N.B. I was assuming the data holds NUL-terminated strings. If not, you'd
need to extract the lengths in my_strcmp and use them in the
comparison. That can also be done.

-David

Flash Gordon

Jan 23, 2008, 1:24:57 PM
jacob navia wrote, On 23/01/08 08:24:

> yeti wrote:
>> Hi guys,
>>
>> I am using custom string structures in a project.
>>
>> typedef struct{
>> short int length;
>> char data[256];
>> }my_long_string;

<snip>

> You can download such a library from the lcc-win download
> site. It uses the proposed extensions of lcc-win for the C language.
>
> That string library features Strcmp, Strcat, etc.
>
> Source code is included

Note that as it relies on an extension which is, to the best of my
knowledge, unique to lcc-win you will only be able to use the library if
you are prepared to restrict yourself to lcc-win.
--
Flash Gordon

Army1987

Jan 23, 2008, 4:03:11 PM
yeti wrote:

> Hi guys,
>
> I am using custom string structures in a project.
>
> typedef struct{
> short int length;
> char data[256];
> }my_long_string;
>
> and
>
> typedef struct{
> short int length;
> char data[32];
> }my_short_string;
>
> I want to create string processing functions like strcmp, strcpy etc
> for these types.

If offsetof(my_long_string, data) equals offsetof(my_short_string, data)
you can do:

#include <stddef.h>
#include <string.h>

int my_strcmp(const void *a, const void *b)
{
    short int a_length = *(const short int *)a;
    short int b_length = *(const short int *)b;
    const char *a_data = (const char *)a + offsetof(my_long_string, data);
    const char *b_data = (const char *)b + offsetof(my_long_string, data);
    if (a_length < b_length)
        return -1;
    else if (a_length > b_length)
        return +1;
    else
        return memcmp(a_data, b_data, a_length);
}

void my_strcpy(void *target, const void *source)
{
    *(short int *)target = *(const short int *)source;
    /* Copy from source's data into target's data. */
    memcpy((char *)target + offsetof(my_long_string, data),
           (const char *)source + offsetof(my_long_string, data),
           *(const short int *)source);
}
Of course, it causes UB if you try to copy a string larger than the
destination array, but so does the "real" strcpy.
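
Note that my_strcmp above orders by length first, so "b" sorts before "aa". That is a consistent ordering, but not the one strcmp gives. If byte-wise ordering matters, a variant along these lines might do (my_strcmp_lex is a made-up name, and the same offsetof assumption applies):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { short int length; char data[256]; } my_long_string;
typedef struct { short int length; char data[32];  } my_short_string;

/* Hypothetical variant: compare bytes first, and let the lengths
   decide only when one string is a prefix of the other. */
int my_strcmp_lex(const void *a, const void *b)
{
    short int a_length = *(const short int *)a;
    short int b_length = *(const short int *)b;
    const char *a_data = (const char *)a + offsetof(my_long_string, data);
    const char *b_data = (const char *)b + offsetof(my_long_string, data);
    size_t n = (size_t)(a_length < b_length ? a_length : b_length);
    int r;

    /* Layout assumption: data starts at the same offset in both types. */
    assert(offsetof(my_long_string, data) == offsetof(my_short_string, data));

    r = memcmp(a_data, b_data, n);
    if (r != 0)
        return r;
    return (a_length > b_length) - (a_length < b_length);
}
```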

--
Army1987 (Replace "NOSPAM" with "email")

CBFalconer

Jan 23, 2008, 8:55:01 PM
James Kuyper wrote:
> Mark Bluemel wrote:
> ...
>> I think Santosh was suggesting a structure approach like this
>>
>> typedef struct {
>> short length;
>> char data[]; /* or "char data[1];" if no c99 support */

>> } my_string;
>>
>> so if you had a string of 32 characters to work with, you'd
>> do something like:-
>>
>> my_string *string_pointer = malloc(sizeof(short) + 32);
>
> That assumes there are no padding bytes between length and data.
> The right way to calculate the allocation is

Since 'data' is a char field there will be no such padding bytes.

James Kuyper

Jan 23, 2008, 11:32:17 PM
CBFalconer wrote:
> James Kuyper wrote:
>> Mark Bluemel wrote:
>> ...
>>> I think Santosh was suggesting a structure approach like this
>>>
>>> typedef struct {
>>> short length;
>>> char data[]; /* or "char data[1];" if no c99 support */
>>> } my_string;
>>>
>>> so if you had a string of 32 characters to work with, you'd
>>> do something like:-
>>>
>>> my_string *string_pointer = malloc(sizeof(short) + 32);
>> That assumes there are no padding bytes between length and data.
>> The right way to calculate the allocation is
>
> Since 'data' is a char field there will be no such padding bytes.

The standard imposes no such requirement, though you'll probably be
right on most implementations. It's best to get used to using the
offsetof() idiom consistently, rather than trying to decide in each
particular case whether or not you can rely upon fragile
implementation-specific assumptions about padding.

James Antill

Jan 28, 2008, 10:25:43 AM

This is required to be the same as sizeof(mystring), which is much more
readable IMNSHO. Or you could just use any of a number of pre-made
string APIs:

http://www.and.org/vstr/comparison

--
James Antill -- ja...@and.org
C String APIs use too much memory? ustr: length, ref count, size and
read-only/fixed. Ave. 44% overhead over strdup(), for 0-20B strings
http://www.and.org/ustr/

james...@verizon.net

Jan 30, 2008, 6:32:54 PM
James Antill wrote:
> On Wed, 23 Jan 2008 13:08:14 +0000, James Kuyper wrote:
>
> > Mark Bluemel wrote:
> > ...
> >> I think Santosh was suggesting a structure approach like this :-
> >>
> >> typedef struct {
> >> short length;
> >> char data[]; /* or "char data[1];" if c99 support isn't available
> >> */
> >> } my_string;
> >>
> >> so if you had a string of 32 characters to work with, you'd do
> >> something like:-
> >>
> >> my_string *string_pointer = malloc(sizeof(short) + 32);
> >
> > That assumes there are no padding bytes between length and data. The
> > right way to calculate the allocation is
> >
> > #include <stddef.h>
> > ...
> > mystring *string_pointer = malloc(offsetof(my_string, data) + 32);
>
> This is required to be the same as sizeof(mystring),

Citation, please?

To give you a conceptual base for thinking about this, consider an
implementation where sizeof(short)==2, shorts are required to be
aligned on even addresses, char arrays (regardless of length,
including as a special case flexible arrays) have no alignment
restrictions, and structures are required to be aligned on addresses
that are multiples of 16 bytes, with the result that all structure
types must have a size that is a multiple of 16. I don't know for sure
if any implementation has all of those features, but I know that there
are implementations which have each of those features, and the first
three features are quite commonplace. If no padding were used between
'length' and 'data', then offsetof(my_string, data) would be 2, but
sizeof(mystring) would be 16. Do you believe that the standard imposes
any requirements which would be violated by such an implementation?
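
For anyone who wants to see what their own implementation does, a small check along these lines prints both quantities. It is only a sketch: report_layout and the two accessor names are made up, and whatever numbers come out describe one compiler and target, not what the standard requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    short length;
    char data[];    /* C99 flexible array member */
} my_string;

/* The two candidate "header sizes" discussed in the thread. */
size_t header_bytes_offsetof(void) { return offsetof(my_string, data); }
size_t header_bytes_sizeof(void)   { return sizeof(my_string); }

/* Print both values for the implementation that compiled this file. */
void report_layout(void)
{
    size_t off = header_bytes_offsetof();
    size_t size = header_bytes_sizeof();

    /* data cannot start before the end of the length member. */
    assert(off >= sizeof(short));

    printf("offsetof(my_string, data) = %lu\n", (unsigned long)off);
    printf("sizeof(my_string)         = %lu\n", (unsigned long)size);
}
```

If the two numbers differ, malloc(sizeof(my_string) + n) allocates more than malloc(offsetof(my_string, data) + n) by exactly that difference.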

David Thompson

Feb 3, 2008, 9:21:22 PM

(s/my/&_/)

Or in the C99 FAM case only, terser but arguably _less_ clear:
sizeof(my_string) +n or sizeof *string_pointer +n. 6.7.2.1p16.

OTOH for the multiple specific types in the OP and elsethread, you can
be assured all of the data offsets are the same if you declare a union
(type) containing the individual types (even if you never use it) and
in practice you can be pretty sure they're the same even without this.
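
A sketch of the union idea, with one addition that is an assumption on the editor's side rather than part of the post: a compile-time check that the data offsets really do match, so nothing depends on how far the common-initial-sequence guarantee extends. The names my_any_string, data_offsets_match and my_length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { short int length; char data[256]; } my_long_string;
typedef struct { short int length; char data[32];  } my_short_string;

/* Declared so that both types appear together in one union;
   it never has to be used. */
union my_any_string {
    my_long_string l;
    my_short_string s;
};

/* Compile-time check that works in C89 too: the array size becomes -1,
   which is a constraint violation, if the offsets ever differ. */
typedef char data_offsets_match[
    offsetof(my_long_string, data) == offsetof(my_short_string, data) ? 1 : -1];

/* length is the first member of both types, so it can be read
   through a pointer to either. */
short int my_length(const void *s)
{
    assert(s != NULL);
    return *(const short int *)s;
}
```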

- formerly david.thompson1 || achar(64) || worldnet.att.net

james...@verizon.net

Feb 4, 2008, 9:21:36 AM
David Thompson wrote:
> On Wed, 23 Jan 2008 13:08:14 GMT, James Kuyper
> <james...@verizon.net> wrote:
>
> > Mark Bluemel wrote:
> > ...
> > > I think Santosh was suggesting a structure approach like this :-
> > >
> > > typedef struct {
> > > short length;
> > > char data[]; /* or "char data[1];" if c99 support isn't available */
> > > } my_string;
> > >
> > > so if you had a string of 32 characters to work with, you'd do something
> > > like:-
> > >
> > > my_string *string_pointer = malloc(sizeof(short) + 32);
> >
> > That assumes there are no padding bytes between length and data. The
> > right way to calculate the allocation is
> >
> > #include <stddef.h>
> > ...
> > mystring *string_pointer = malloc(offsetof(my_string, data) + 32);
>
> (s/my/&_/)

I was able to puzzle that out; but I suspect that most people who
aren't familiar with vi or sed are going to have a lot of trouble
figuring out that you're telling me I left out a '_' when I typed
"mystring".

> Or in the C99 FAM case only, terser but arguably _less_ clear:
> sizeof(my_string) +n or sizeof *string_pointer +n. 6.7.2.1p16.

Using sizeof(my_string) could result in overallocation, as I implied
in my earlier response to James Antill. Overallocation is not a
serious error, merely wasteful (unless the extra bytes cause the
allocation to fail), but using offsetof() avoids it. 6.7.2.1p17 uses
sizeof() rather than offsetof(), but it's a non-normative example,
and I believe that it should be corrected.

James Antill

Feb 7, 2008, 1:51:13 AM
On Wed, 30 Jan 2008 15:32:54 -0800, jameskuyper wrote:

> James Antill wrote:
>> On Wed, 23 Jan 2008 13:08:14 +0000, James Kuyper wrote:
>>
>> > Mark Bluemel wrote:
>> > ...
>> >> I think Santosh was suggesting a structure approach like this :-
>> >>
>> >> typedef struct {
>> >> short length;
>> >> char data[]; /* or "char data[1];" if c99 support isn't
>> >> available */
>> >> } my_string;
>> >>
>> >> so if you had a string of 32 characters to work with, you'd do
>> >> something like:-
>> >>
>> >> my_string *string_pointer = malloc(sizeof(short) + 32);
>> >
>> > That assumes there are no padding bytes between length and data. The
>> > right way to calculate the allocation is
>> >
>> > #include <stddef.h>
>> > ...
>> > mystring *string_pointer = malloc(offsetof(my_string, data) +
>> > 32);
>>
>> This is required to be the same as sizeof(mystring),
>
> Citation, please?

6.7.2.1

#16

As a special case, the last element of a structure with more than
one named member may have an incomplete array type; [...] the
size of the structure shall be equal to the offset of the last
element of an otherwise identical structure that replaces the
flexible array member with an array of unspecified length

#17 even gives an example basically identical to what I posted.

Harald van Dijk

Feb 7, 2008, 2:01:48 PM
On Thu, 07 Feb 2008 06:51:13 +0000, James Antill wrote:
> On Wed, 30 Jan 2008 15:32:54 -0800, jameskuyper wrote:
>> James Antill wrote:
>>> On Wed, 23 Jan 2008 13:08:14 +0000, James Kuyper wrote:
>>> > Mark Bluemel wrote:
>>> >> typedef struct {
>>> >> short length;
>>> >> char data[]; /* or "char data[1];" if c99 support isn't
>>> >> available */
>>> >> } my_string;
>>> >
>>> > #include <stddef.h>
>>> > ...
>>> > mystring *string_pointer = malloc(offsetof(my_string, data) +
>>> > 32);
>>>
>>> This is required to be the same as sizeof(mystring),
>>
>> Citation, please?
>
> 6.7.2.1
>
> #16
>
> As a special case, the last element of a structure with more than one
> named member may have an incomplete array type; [...] the size of the
> structure shall be equal to the offset of the last element of an
> otherwise identical structure that replaces the flexible array member
> with an array of unspecified length

This states that sizeof(my_string) must be equal to offsetof(struct {
short length;
char data[n];
}, data) for some fixed n, but that might be different from
offsetof(my_string, data).

> #17 even gives an example basically identical to what I posted.

#17 even explicitly mentions that the offset of a flexible array member
might differ from the offset of a non-flexible array member.
