Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

One of my first C functions - Please comment

1 view
Skip to first unread message

Mark Hobley

unread,
Mar 7, 2010, 11:08:02 PM3/7/10
to
I am writing a C function that will be utilized by a pathwalker. I am fairly
new to C programming. The purpose of the function is to look at a path variable
and determine the longest element within it. This is for a Linux based system
and the function supports paths with a colon prefixed with an backslash escape
character. (Presumably I need to support this on Linux, because directory
names can contain a colon).

The code currently contains a rather ugly "goto treatasdefault" within the
switch constuct, and I am wondering if this can be rearranged to be more
presentable.

If I am in one branch of a switch, is there any mechanism that will allow me to
conditionally jump to the default branch, or is goto the only way? I know
that I can fallthrough, by eliminating break, but this will just fallthrough
to the next switch branch. I really want to jump to default.

The code is not yet complete and has not yet been tested.

sconst.h:

#ifndef SCONST_H
#define SCONST_H

#define DIRSEP '/'
#define PATHSEP ':'
#define ESCSEQ '\'

#endif

lonp.c

/* ***************************************************************************

Filename: lonp.c Version: 0.0.1 Date: 96/03/2010

Details: This function will return the length of the longest value within
a path variable. This is typically used within path walking applications.

Written by Mark Hobley.

(c) Copyright 2010 Mark Hobley.

This file can be redistributed under the terms of version 2 of the
GNU General Public Licence as published by the Free Software Foundation.

It is prohibited to include any part of this software into any other
software product or other work, not covered by version 2 of the
GNU General Public Licence.

*************************************************************************** */

/* ************************** Standard Includes *************************** */

#include <stddef.h> /* Required for NULL */

/* **************************** Other Includes **************************** */

#include "lonp.h"

/* ******************************* Functions ****************************** */

int longestpath(char *pathdata) {
char *element = pathdata;
char c;
int eoe = 0; /* False */
int lastescseq = 0; /* False */
int len = 0, longest = 0;

while (!eoe) {
c = element
switch (c) {
case NULL:
eoe = 1; /* True */
if (len > longest) {
longest = len;
}
break;
case ESCSEQ:
if (lastescseq != 0) {
lastescseq = 0
goto treatasdetault;
}
case PATHSEP:
if (lastescseq != 0) {
goto treatasdefault;
}
if (len > longest) {
longest = len;
}
len = 0; /* Reset Length */
break;
default:
treatasdefault:
++len;
break;
}
++element;
}
return(longest);
}


--
Mark Hobley
Linux User: #370818 http://markhobley.yi.org/

Keith Thompson

unread,
Mar 8, 2010, 12:14:08 AM3/8/10
to
markh...@hotpop.donottypethisbit.com (Mark Hobley) writes:
[...]

> The code is not yet complete and has not yet been tested.

Apparently it hasn't even been compiled.

> sconst.h:
>
> #ifndef SCONST_H
> #define SCONST_H
>
> #define DIRSEP '/'
> #define PATHSEP ':'
> #define ESCSEQ '\'

This is a syntax error (or rather, any attempt to use this macro is a
syntax error). The character constant for a backslash is '\\'.

> #endif
>
> lonp.c
>
[snip]


> (c) Copyright 2010 Mark Hobley.
>
> This file can be redistributed under the terms of version 2 of the
> GNU General Public Licence as published by the Free Software Foundation.
>
> It is prohibited to include any part of this software into any other
> software product or other work, not covered by version 2 of the
> GNU General Public Licence.

<OT>
I'm not sure this prohibition is consistent with the GPL. This isn't
the place to go into details, but a good start would be to carefully
read the GPL itself.
</OT>

[...]
>
> #include "lonp.h"

Is it lonp.h or sconst.h?

> /* ******************************* Functions ****************************** */
>
> int longestpath(char *pathdata) {
> char *element = pathdata;
> char c;
> int eoe = 0; /* False */
> int lastescseq = 0; /* False */
> int len = 0, longest = 0;
>
> while (!eoe) {
> c = element

c is of type char; element is of type char*. Your compiler should
have complained about this -- and about the missing semicolon.

> switch (c) {
> case NULL:

NULL is a null pointer constant, not a character constant. The null
character is '\0'. (Your compiler might not have complained about
this if your <stddef.h> defines NULL as 0.)

> eoe = 1; /* True */
> if (len > longest) {
> longest = len;
> }
> break;
> case ESCSEQ:
> if (lastescseq != 0) {
> lastescseq = 0

Another missing semicolon.

> goto treatasdetault;
> }
> case PATHSEP:
> if (lastescseq != 0) {
> goto treatasdefault;
> }
> if (len > longest) {
> longest = len;
> }
> len = 0; /* Reset Length */
> break;
> default:
> treatasdefault:
> ++len;
> break;
> }
> ++element;
> }
> return(longest);
> }

If you want help with your program logic, please correct any
compilation errors before posting your code.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Mark Hobley

unread,
Mar 8, 2010, 4:08:07 AM3/8/10
to
Keith Thompson <ks...@mib.org> wrote:
> Apparently it hasn't even been compiled.

Yeah. Sorry Keith. I hadn't actually finished coding. I never thought about
compiling as is. It was a good idea though.

>> #define ESCSEQ '\'
>
> This is a syntax error (or rather, any attempt to use this macro is a
> syntax error). The character constant for a backslash is '\\'.

Ok. I fixed this.

#define ESCSEQ '\\'

> It is prohibited to include any part of this software into any other
> software product or other work, not covered by version 2 of the
> GNU General Public Licence.

>> I'm not sure this prohibition is consistent with the GPL. This isn't


>> the place to go into details, but a good start would be to carefully
>> read the GPL itself.

Ok. I have read GPL. I am not sure what you think is wrong here. I will ask
about that in a software licencing related group.

#include "sconst.h"
>> #include "lonp.h"

>
> Is it lonp.h or sconst.h?

Oops! It's both

>> /* ******************************* Functions ****************************** */
>>
>> int longestpath(char *pathdata) {
>> char *element = pathdata;
>> char c;
>> int eoe = 0; /* False */
>> int lastescseq = 0; /* False */
>> int len = 0, longest = 0;
>>
>> while (!eoe) {
>> c = element
>
> c is of type char; element is of type char*. Your compiler should
> have complained about this -- and about the missing semicolon.

Missing semicolon fixed. I was expecting a derefence here. I expected c to
take the value pointed to by *element. Right, I will do some more reading
to find out how to get the derefence to work.

>> case NULL:
>
> NULL is a null pointer constant, not a character constant.

Ok. That wasn't clear at all (from documentation). I have now added a line to
lonp.h:

#define NULLCHR '\0'

and changed NULL to NULLCHR

>> lastescseq = 0
>
> Another missing semicolon.

Ok. Fixed. Cheers.

> If you want help with your program logic, please correct any
> compilation errors before posting your code.

Right. I have still got to look at that derefence and I will do a preliminary
compile.

Thanks for your help so far.

Mark.

Mark Hobley

unread,
Mar 8, 2010, 4:08:07 AM3/8/10
to
Keith Thompson <ks...@mib.org> wrote:
> I'm not sure this prohibition is consistent with the GPL. This isn't
> the place to go into details, but a good start would be to carefully
> read the GPL itself.

Ok. I have now asked about this in comp.software.licensing

Regards,

Mark.

Malcolm McLean

unread,
Mar 8, 2010, 4:42:05 AM3/8/10
to
On 8 Mar, 04:08, markhob...@hotpop.donottypethisbit.com (Mark Hobley)
wrote:


It's knock-up code. The trick in making in neat is to use the standard
library functions to do some of the messiness for you. Here's an
untested version.

int longestpath(char *pathname, char sep)
{
char *ptr = pathname;
int answer = 0;
int dist;
char *sepptr;

while(ptr)
{
sepptr = strchr(ptr, sep);
/* this is a nasty bit, exclude escapes */
while(sepptr > ptr && *(sepptr-1) == ESCSEQ))
septpr = strrchr(sepptr+1, sep);

if(sepptr)
dist = sepptr - ptr -1;
else
dist = strlen(ptr);
if(dist > answer)
answer = dist;
ptr = sepptr;
if(ptr)
ptr = ptr + 1;
}
}

answer will give length, including anyescape characters.

A question for mediumbies: why do we say ptr = ptr +1 rather than ptr+
+?

Nick

unread,
Mar 8, 2010, 2:14:40 PM3/8/10
to
Malcolm McLean <malcolm...@btinternet.com> writes:

I haven't a clue. Any why not ptr += 1?

Of course, you could do:
[snip]


> if(sepptr)
{
> dist = sepptr - ptr -1;

sepptr += 1;


}
> else
> dist = strlen(ptr);
> if(dist > answer)
> answer = dist;
> ptr = sepptr;
> }

Yes, you could avoid adding a line by
dist = sepptr++ - ptr -1;
but that seems too confusing.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk

Richard

unread,
Mar 8, 2010, 2:48:55 PM3/8/10
to
Nick <3-no...@temporary-address.org.uk> writes:

Horrible. I have almost never seen that in years of C programming. The
idiom in C is to use ++ for pointer advancement.

> }
>> else
>> dist = strlen(ptr);
>> if(dist > answer)
>> answer = dist;
>> ptr = sepptr;
>> }
>
> Yes, you could avoid adding a line by
> dist = sepptr++ - ptr -1;
> but that seems too confusing.

Really? Certainly not the ++ part. If you dont understan that then you
should not be programming in C.

What I find confusing there is the lack of brackets and need to work out
which way it works in regard to the "-" pair. Or rather I KNOW at a
later date it will confuse others. I never learnt precedence rules and
always bracket to make it explicit.


--
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c

pete

unread,
Mar 8, 2010, 4:18:48 PM3/8/10
to
Malcolm McLean wrote:

> if(ptr)
> ptr = ptr + 1;
> }
> }
>
> answer will give length, including anyescape characters.
>
> A question for mediumbies: why do we say ptr = ptr +1 rather than ptr+
> +?

Because it's our first day programming
and we haven't learned the increment operator yet.

I prefer the prefix form, when the expression is not a subexpression:
++ptr;

--
pete

Eric Sosman

unread,
Mar 8, 2010, 4:37:53 PM3/8/10
to
On 3/8/2010 2:14 PM, Nick wrote:
> Malcolm McLean<malcolm...@btinternet.com> writes:
>
>> It's knock-up code. The trick in making in neat is to use the standard
>> library functions to do some of the messiness for you. Here's an
>> untested version.
>>
>> int longestpath(char *pathname, char sep)
>> {
>> char *ptr = pathname;
>> int answer = 0;
>> int dist;
>> char *sepptr;
>>
>> while(ptr)
>> {
>> sepptr = strchr(ptr, sep);
>> /* this is a nasty bit, exclude escapes */
>> while(sepptr> ptr&& *(sepptr-1) == ESCSEQ))
>> septpr = strrchr(sepptr+1, sep);

Is it "Spot the UB Week" again already? Relational
operators are not defined for null pointer operands (6.5.8p5).

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

unread,
Mar 8, 2010, 7:16:46 PM3/8/10
to
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 3/8/2010 2:14 PM, Nick wrote:
>> Malcolm McLean<malcolm...@btinternet.com> writes:
>>
>>> It's knock-up code. The trick in making in neat is to use the standard
>>> library functions to do some of the messiness for you. Here's an
>>> untested version.
>>>
>>> int longestpath(char *pathname, char sep)
>>> {
>>> char *ptr = pathname;
>>> int answer = 0;
>>> int dist;
>>> char *sepptr;
>>>
>>> while(ptr)
>>> {
>>> sepptr = strchr(ptr, sep);
>>> /* this is a nasty bit, exclude escapes */
>>> while(sepptr> ptr&& *(sepptr-1) == ESCSEQ))

typo: syntax error from too many )s
>>> septpr = strrchr(sepptr+1, sep);
typo: var should be 'sepptr'


>
> Is it "Spot the UB Week" again already? Relational
> operators are not defined for null pointer operands (6.5.8p5).

So is falling off the end of the end of non-void function. There's
also an off-by-one error in the logic and escapes can't be escaped.
(I don't know if that last part is in the design but I think it should
be.) I know this was just Malcolm showing the overall idea but I
think that that very idea deserves some discussion.

There seems to be some interaction between C programmers and character
twiddling code that promotes monolithic functions with tricky logic
and, as a result, buggy code. I don't draw this conclusion from the
postings in this thread but from my own sorry experience. I have,
rather late in the day, started to apply normal programming design
criteria to these sorts of string fiddling problems. With the right
helper functions the code is often cleaner.

Here is how I'd go about it these days. It is another example where
strchr is less than ideal in that it would help if strchr returned a
pointer to the terminating null when no matching character is found so
I'll will write a function to do that.

Checking for escapes can be done inefficiently -- I only look when I
find a separator and multiple consecutive \s will be rare so I won't
bother to keep track of the "escape state" the way I often used to in
code like this.

I think this is one of those relatively rare cases where there is a
natural fit to do ... while loop rather than a while, so that's the
way I've gone:

const char *find_chr_or_eos(const char *s, char chr)
{
while (*s && *s != chr) ++s;
return s;
}

int is_escaped(const char *cp, const char *start, char esc)
{
int escaped = 0;
if (cp > start)
while (--cp >= start && *cp == esc)
escaped = !escaped;
return escaped;
}

const char *find_unescaped_chr_or_eos(const char *s, char chr, char esc)
{
const char *cp = s;
while (*(cp = find_chr_or_eos(cp, chr)) && is_escaped(cp, s, esc))
++cp;
return cp;
}

size_t longestpath(const char *path, char sep)
{
size_t max_len = 0;
const char *end;
do {
end = find_unescaped_chr_or_eos(path, sep, '\\');
if (end - path > max_len)
max_len = end - path;
path = end + 1;
} while (*end);
return max_len;
}

I am sure this is old hat to most people here, but as a convert away
from the sort of loop the OP presented I though it worth posting. It
will server me right if there are half a dozen bug in this but I
claim that, written this way, they will at least be easier to find!

--
Ben.

Peter Nilsson

unread,
Mar 8, 2010, 7:39:18 PM3/8/10
to
Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

> Eric Sosman <esos...@ieee-dot-org.invalid> writes:
> >     Is it "Spot the UB Week" again already?  Relational
> > operators are not defined for null pointer operands (6.5.8p5).
>
> So is falling off the end of the end of non-void function.

That in itself is not UB, unless the calling function attempts
to use the return value. [6.9.1p12]

--
Peter

Ben Bacarisse

unread,
Mar 8, 2010, 8:28:33 PM3/8/10
to
Peter Nilsson <ai...@acay.com.au> writes:

Touche[1]! Not much chance of that for this sort of function though,
but a good catch none the less.

[1] I'll resist using UTF-8.

--
Ben.

Mark Hobley

unread,
Mar 15, 2010, 10:08:02 PM3/15/10
to
Mark Hobley <markh...@hotpop.donottypethisbit.com> wrote:

> while (!eoe) {
> c = *element;
> switch (c) {
> case NULLCHR:


> eoe = 1; /* True */
> if (len > longest) {
> longest = len;
> }
> break;

<snip>
> }
> ++element; <----
> } |

I have another query. If *element at the NUL character at the end of the
string, the final increment will take this pointer beyond the end of the
string. This does not matter to me, because the loop is ending, and
*element no longer has any relevance. However, is it actually legal to
increment beyond the end of the string, or do I need a trap here?
eg:

if (eoe == 0) {
++element;
}

The program runs fine without the trap AFAIK, but am I "breaking the rules"
here?

(In other words is the act of incrementing the pointer beyond the string
boundary against the rules, or is it only breaking the rules to attempt to
subsequently dereference the now invalid pointer?)

Mark.

Peter Nilsson

unread,
Mar 15, 2010, 10:39:15 PM3/15/10
to
markhob...@hotpop.donottypethisbit.com (Mark Hobley) wrote:
> (... is the act of incrementing the pointer beyond the string

> boundary against the rules, or is it only breaking the rules
> to attempt to subsequently dereference the now invalid pointer?)

You can reference one byte beyond an array, but not dereference
it.

--
Peter

Ben Bacarisse

unread,
Mar 15, 2010, 10:39:14 PM3/15/10
to
markh...@hotpop.donottypethisbit.com (Mark Hobley) writes:

> Mark Hobley <markh...@hotpop.donottypethisbit.com> wrote:
>
>> while (!eoe) {
>> c = *element;
>> switch (c) {
>> case NULLCHR:
>> eoe = 1; /* True */
>> if (len > longest) {
>> longest = len;
>> }
>> break;
> <snip>
>> }
>> ++element; <----
>> } |
>
> I have another query. If *element at the NUL character at the end of the
> string, the final increment will take this pointer beyond the end of the
> string. This does not matter to me, because the loop is ending, and
> *element no longer has any relevance. However, is it actually legal to
> increment beyond the end of the string, or do I need a trap here?

No, it's safe.

<snip>


> (In other words is the act of incrementing the pointer beyond the string
> boundary against the rules, or is it only breaking the rules to attempt to
> subsequently dereference the now invalid pointer?)

It might be safe to do even that, since the null character may not be
at the end of the object, but that is a special case. You are right
that it is only de-referencing one of these one-past-the-end pointers
that is prohibited. This applies to none array objects as well:

int i, *ip = &i;
if (++ip > &i) puts("safe and always true");

<snip>
--
Ben.

Nick Keighley

unread,
Mar 16, 2010, 4:39:30 AM3/16/10
to
On 16 Mar, 02:08, markhob...@hotpop.donottypethisbit.com (Mark Hobley)
wrote:

> ([...] is the act of incrementing the pointer beyond the string


> boundary against the rules, or is it only breaking the rules to attempt to
> subsequently dereference the now invalid pointer?)

its a special case allowed by the standard

int main (void)
{
int a [10];
int *pi;
int i;

pi = &a[9]; /* last entry ok */
i = *pi; /* ok */
pi = &a[10]; /* one past- ok */
i = *pi; /* BAD */

pi = &a[-1]; /* BAD */

return 0;
}

its invalid to even form the address of something outside the range
0..10

BruceS

unread,
Mar 16, 2010, 10:46:41 AM3/16/10
to
On Mar 16, 2:39 am, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:

C&V please? I didn't know this one, and I can't seem to find it, at
least in C99. My search skills are sometimes suboptimal :<

Eric Sosman

unread,
Mar 16, 2010, 11:46:06 AM3/16/10
to
On 3/16/2010 10:46 AM, BruceS wrote:
> On Mar 16, 2:39 am, Nick Keighley<nick_keighley_nos...@hotmail.com>
> wrote:
>> [...]
>> int a [10];
>> [...]

>> its invalid to even form the address of something outside the range
>> 0..10
>
> C&V please? I didn't know this one, and I can't seem to find it, at
> least in C99. My search skills are sometimes suboptimal :<

6.5.6p8 describes the situations where adding pointers and
integers (including negative integers) are valid. Situations
which would lead to a subscript less than [0] or greater than [10]
are not in the list, hence undefined by omission. (4.1p2 says
that this is just as undefined as if there were a specific ban.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Kenneth Brody

unread,
Mar 16, 2010, 12:14:38 PM3/16/10
to
On 3/16/2010 10:46 AM, BruceS wrote:
> On Mar 16, 2:39 am, Nick Keighley<nick_keighley_nos...@hotmail.com>
> wrote:
>> On 16 Mar, 02:08, markhob...@hotpop.donottypethisbit.com (Mark Hobley)
>> wrote:
>>
>>> ([...] is the act of incrementing the pointer beyond the string
>>> boundary against the rules, or is it only breaking the rules to attempt to
>>> subsequently dereference the now invalid pointer?)
>>
>> its a special case allowed by the standard
[...]

>
> C&V please? I didn't know this one, and I can't seem to find it, at
> least in C99. My search skills are sometimes suboptimal :<

6.5.6p8 and p9 talk about pointer arithmetic and the case of "one past the
last element of the array object".

There are several other sections that refer to the semantics of pointers
which point to "one past the last element", or "P+1" where P points to "the
last element".

There may be some other place that explicitly states that it's valid. (I
searched for "last element".)

6.5.6p8 also states "If the result points one past the last element of the
array object, it shall not be used as the operand of a unary * operator that
is evaluated".

--
Kenneth Brody

BruceS

unread,
Mar 16, 2010, 1:46:56 PM3/16/10
to

Thanks, Eric. I read that part of the Standard (after you gave the
ref), and thought I *still* wasn't getting it (had started more
questions on it) when I noticed that the stated range was 0..10. I
blame the time change.

0 new messages