Google Groups Home
Help | Sign in
Explanation, please!
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 64 - Collapse all   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
Douglas C. Schmidt  
View profile
 More options Aug 24 1988, 10:16 pm
Newsgroups: comp.lang.c
From: schm...@bonnie.ics.uci.edu (Douglas C. Schmidt)
Date: 25 Aug 88 02:16:14 GMT
Local: Wed, Aug 24 1988 10:16 pm
Subject: Explanation, please!
The following piece of wonderful obscurity comes from Stroustup's
C++ Programming Language book, page 100:

void send(int *to,int *from, int count) {
   int n = (count + 7) / 8;

   switch(count % 8) {
      case 0:  do { *to++ = *from++;
      case 7:       *to++ = *from++;
      case 6:       *to++ = *from++;
      case 5:       *to++ = *from++;
      case 4:       *to++ = *from++;
      case 3:       *to++ = *from++;
      case 2:       *to++ = *from++;
      case 1:       *to++ = *from++;
               } while (--n > 0);
   }

}

Now, much to my surprise, this is not only valid C++, it is also valid C!
Could some one please explain to me why this is so?  It seems like
the case 7-1 labels are actually nested inside the do {} while loop,
and thus not in the scope of the switch (should a break statement exit
both the switch and the loop, or just one?!?!).

Finally, Stroustrup asks the rhetorical question ``why would anyone
want to write something like this.''  Any guesses?!

thanks,

   Doug Schmidt
--
Douglas Schmidt
"If our behavior is strict, we do not need fun." -Zippy th' Pinhead
"If our behavior is struct, we do not need defun." -Anon


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Chris Torek  
View profile
 More options Aug 25 1988, 3:16 am
Newsgroups: comp.lang.c
From: ch...@mimsy.UUCP (Chris Torek)
Date: 25 Aug 88 07:16:15 GMT
Local: Thurs, Aug 25 1988 3:16 am
Subject: Re: Explanation, please!
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas

C. Schmidt) writes:
>The following piece of wonderful obscurity comes from Stroustup's
>C++ Programming Language book, page 100:
>   switch(count % 8) {
>      case 0:  do { *to++ = *from++;
>      case 7:       *to++ = *from++;
        ...
>      case 1:       *to++ = *from++;
>               } while (--n > 0);
>   }
>Now, much to my surprise, this is not only valid C++, it is also valid C!
>Could some one please explain to me why this is so?  It seems like
>the case 7-1 labels are actually nested inside the do {} while loop,
>and thus not in the scope of the switch (should a break statement exit
>both the switch and the loop, or just one?!?!).

`break' exits the innermost switch or loop, hence a `break' in cases
7 through 1 exits the do-while, not the switch.

>Finally, Stroustrup asks the rhetorical question ``why would anyone
>want to write something like this.''  Any guesses?!

This has been called `Duff's device' (after Tom Duff, who probably did
not invent it first), and it looks exactly like what an optimising
compiler generates when it does loop unrolling.  It works because case
labels are just that---labels.  As long as there is a switch in
scope, a case label is legal; the case applies to the closest switch.
(And if C had used a separate keyword for `break switch', the whole
thing could be consistent :-) .)

Why?  To quote a certain infamous C language hack :-) , `It looks exactly
like what an optimising compiler generates when it does loop unrolling.'

Incidentally, there are some compilers that choke on that form.  I
would have to look hard at the dpANS to decide whether it is Officially
Legal.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: ch...@mimsy.umd.edu    Path:   uunet!mimsy!chris


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Doug Gwyn  
View profile
 More options Aug 25 1988, 12:16 pm
Newsgroups: comp.lang.c
From: g...@smoke.ARPA (Doug Gwyn )
Date: 25 Aug 88 16:16:29 GMT
Local: Thurs, Aug 25 1988 12:16 pm
Subject: Re: Explanation, please!
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas C. Schmidt) writes:
-   switch(count % 8) {
-      case 0:  do { *to++ = *from++;
-      case 7:       *to++ = *from++;
-      case 6:       *to++ = *from++;
-      case 5:       *to++ = *from++;
-      case 4:       *to++ = *from++;
-      case 3:       *to++ = *from++;
-      case 2:       *to++ = *from++;
-      case 1:       *to++ = *from++;
-               } while (--n > 0);
-   }

-Now, much to my surprise, this is not only valid C++, it is also valid C!
-Could some one please explain to me why this is so?

Could you explain why it shouldn't be?  "case" labels are just a
special form of label.  You can stick a label on most statements.

-It seems like the case 7-1 labels are actually nested inside the
-do {} while loop, and thus not in the scope of the switch

?? "Thus"?  Why would the labels go out of scope?  They're
definitely within the body of the switch.

-(should a break statement exit both the switch and the loop, or just one?!?!)

A "break" applies to whatever the innermost container is.  A "break"
between two of the "*to++ = *from++;" statements would exit the
do-while loop.

-Finally, Stroustrup asks the rhetorical question ``why would anyone
-want to write something like this.''  Any guesses?!

It's about the fastest way to move arbitrarily-aligned data in
portable C with a guarantee as to what happens in the case that the
data overlaps.  memcpy() doesn't guarantee anything for overlaps.
memmove() does, but that's a recent X3J11 invention that probably
doesn't exist on your system yet.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel R. Levy  
View profile
 More options Aug 25 1988, 7:39 pm
Newsgroups: comp.lang.c
From: l...@ttrdc.UUCP (Daniel R. Levy)
Date: 25 Aug 88 23:39:20 GMT
Local: Thurs, Aug 25 1988 7:39 pm
Subject: Re: Explanation, please!
< void send(int *to,int *from, int count) {
<    int n = (count + 7) / 8;
<    
<    switch(count % 8) {
<       case 0:  do { *to++ = *from++;
<       case 7:       *to++ = *from++;
<       case 6:       *to++ = *from++;
<       case 5:       *to++ = *from++;
<       case 4:       *to++ = *from++;
<       case 3:       *to++ = *from++;
<       case 2:       *to++ = *from++;
<       case 1:       *to++ = *from++;
<                } while (--n > 0);
<    }
< }

Question:  what if count==0?
--
|------------Dan Levy------------|  THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY
| Bell Labs Area 61 (R.I.P., TTY)|  AND ARE NOT TO BE IMPUTED TO AT&T.
|        Skokie, Illinois        |
|-----Path:  att!ttbcad!levy-----|


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeff Siegal  
View profile
 More options Aug 26 1988, 8:53 am
Newsgroups: comp.lang.c
From: j...@fenchurch.MIT.EDU (Jeff Siegal)
Date: 26 Aug 88 12:53:49 GMT
Local: Fri, Aug 26 1988 8:53 am
Subject: Re: Explanation, please!
In article <2...@ttrdc.UUCP> l...@ttrdc.UUCP (Daniel R. Levy) writes:

>>[Duff's device loop example from C++ book]

>Question:  what if count==0?

The program breaks.  I prefer (and also use, in highly-speed-sensitive
code):

#define duff16(counter, block) \
  switch (counter & 0x0f) { \
   do \
   { \
     counter -= 16; \
     { block; } \
     case 15:   { block; } \
     case 14:   { block; } \
     case 13:   { block; } \
     case 12:   { block; } \
     case 11:   { block; } \
     case 10:   { block; } \
     case 9:    { block; } \
     case 8:    { block; } \
     case 7:    { block; } \
     case 6:    { block; } \
     case 5:    { block; } \
     case 4:    { block; } \
     case 3:    { block; } \
     case 2:    { block; } \
     case 1:    { block; } \
     case 0:    /* null statement */; \
   } while (counter >= 16); \
  }

  duff16(n, *to++ = *from++)

Jeff Siegal


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Eric S. Raymond  
View profile
 More options Aug 26 1988, 1:30 pm
Newsgroups: comp.lang.c, comp.arch
From: e...@snark.UUCP (Eric S. Raymond)
Date: 26 Aug 88 17:30:20 GMT
Local: Fri, Aug 26 1988 1:30 pm
Subject: Re: Explanation, please!
(Code below reproduced so that comp.arch people seeing this followup only
won't get terminally frustrated. This is *really neat*, gang...)

Yeah. That's the most hackish way of trying to write a portable optimized
copy routine I've ever seen. I gather the whole point of the shenanigans
is to get all the *from++ -> *to++ instructions in the generated code to be
adjacent.

This only makes if the author knows he's got a hardware instruction pipeline
or cache that's no less than 8 and no more than 9 byte-copy instruction widths
long, and stuff executing out of the pipeline is a lot faster than if the
copies are interleaved with control transfers. Dollars to doughnuts this code
was written on a RISC machine.

(Gawrsh. That sounded just like one of the big boys on comp.arch tawkin'. I
think I'll cross-post over there just to see if I get shot down in flames...)

--
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = e...@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Henry Spencer  
View profile
 More options Aug 26 1988, 3:01 pm
Newsgroups: comp.lang.c
From: he...@utzoo.uucp (Henry Spencer)
Date: Fri, 26 Aug 88 19:01:36 GMT
Local: Fri, Aug 26 1988 3:01 pm
Subject: Re: Explanation, please!
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas C. Schmidt) writes:

>The following piece of wonderful obscurity comes from Stroustup's
>C++ Programming Language book, page 100...

Tom Duff, what hast thou wrought? :-)

>...
>Now, much to my surprise, this is not only valid C++, it is also valid C!
>Could some one please explain to me why this is so?  It seems like
>the case 7-1 labels are actually nested inside the do {} while loop,
>and thus not in the scope of the switch (should a break statement exit
>both the switch and the loop, or just one?!?!).

You're thinking of switch as if it were Pascal's case; the correct model
is closer to Fortran's computed goto.  The labels must appear within the
case body, but they can be (essentially) *anywhere* within the body,
including within nested blocks.  The sensible programmer will not exploit
this freedom except in truly unusual situations, but it is available.

Break breaks out of the innermost construct that it could apply to, i.e.
"just one".

>Finally, Stroustrup asks the rhetorical question ``why would anyone
>want to write something like this.''  Any guesses?!

This construct is called "Duff's Device", after its discoverer.  One can
often make a loop run faster by "unrolling" it, duplicating its body
N times and running one-Nth as many iterations -- this reduces the loop-
control overhead by a factor of N.  The annoying part is that the number
of times one needs to do the body is usually not an even multiple of N,
so one has to do a partial iteration at beginning or end.  The usual
assembly-language trick is to compute a start address in the middle
of the loop and start by branching there, not to the beginning, thus
doing a partial iteration first.  Most high-level languages have no way
to express this.  Tom Duff (of Bell Labs) realized that it could be done
in C.  Ugh.
--
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry he...@zoo.toronto.edu

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William E. Davidsen Jr  
View profile
 More options Aug 26 1988, 4:22 pm
Newsgroups: comp.lang.c, comp.std.c
Followup-To: comp.lang.c
From: david...@steinmetz.ge.com (William E. Davidsen Jr)
Date: 26 Aug 88 20:22:24 GMT
Local: Fri, Aug 26 1988 4:22 pm
Subject: Re: Explanation, please!
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas C. Schmidt) writes:
| The following piece of wonderful obscurity comes from Stroustup's
| C++ Programming Language book, page 100:

[ a do loop within a switch, with cases all through the do loop ]

This does not seem to be clearly covered in the latest dpANS (section
3.6). The issue is if it is legal *and defined* to jump into a loop.
While a case label behaves just like any other label, it's not clear
what a jump into a loop implies, and I don't see that the existing
standard is any clearer on the topic than the discussion was three years
ago.

Consider:
        i = 15;
        if (j < 10) goto BadMove;

        /* more code here */

        do {
          int i = 7;
          int myvect[400];
        BadMove: /* entry into block */
          myvect[300] = i;
          if (j < 5) return i;
        } while (j-- > 302);

  There was a lot of discussion of this and I don't see a clarification
in the standard as it stands. I think there are several possibilities:
  1) it's not allowed
  2) it's allowed, but the initializations at the start of the
        block don't take place. What does this imply about the
        block local variables?
  3) it's allowed and the block local variables are allocated and
        initialized as you would expect.
  4) it's allowed but the results are implementation dependent.

  In all other cases entry into a block causes allocation of the block
local (auto) variables. Initialization is performed at the time of
allocation. By allowing entry at any point the expected behavior is
unspecified.

Suggestion:
  Forget (1), it's there for completeness. Pick any of 2-4 and add it to
the standard NOW, as an editorial change. We talked about this three
years ago, it can hardly be considered a recent issue. I would not
suggest (1) because it breaks a lot of existing code, but I would be
happy to see a (5) "you can't jump into a block with local variables."

  This is one of the rare times when I would like to be wrong, to be
told "oh, it's there in a footnote, you can, or you can't, and this is
what it does." If the action is implementation dependent then I
(personally) wouldn't use the feature, since I value portability
highly. Please tell me X3J11 didn't let this go unresolved for three
years.
--
        bill davidsen           (w...@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Doug Gwyn  
View profile
 More options Aug 26 1988, 5:18 pm
Newsgroups: comp.lang.c
From: g...@smoke.ARPA (Doug Gwyn )
Date: 26 Aug 88 21:18:14 GMT
Local: Fri, Aug 26 1988 5:18 pm
Subject: Re: Explanation, please!

In article <9...@eddie.MIT.EDU> j...@fenchurch.MIT.EDU (Jeff Siegal) writes:
>...  I prefer (and also use, in highly-speed-sensitive code): ...

It is worth noting that byte-at-a-time is not optimal on most architectures.
Duff's device is useful for arbitrarily-aligned data, but if you can
arrange for (most of) the source and destination buffers to be relatively
word-aligned, then along with processing the head and tail pieces via
something like Duff's device, the bulk of the data can be transferred in
bigger chunks, gaining considerable increase in speed.  One should really
use memcpy() in all cases where the source and destination do not overlap,
because all the work done to optimize block-move for a specific architecture
are supposed to be centralized there.

A couple of people have complained about the tone of my previous response
on this topic.  I meant it to be informative and minimal, i.e. requiring
some thought on the reader's part.  Since some have missed the point, let
me summarize the C style issue:

        Since the constructs are not forbidden by the C language
        rules, they are perforce permitted.  One should have an
        explicit rule to point at as being violated in order to
        think that the code is not valid.  "Intuition" is a lousy
        guide, particularly when it was formed by experience in
        other contexts (e.g. Pascal).

        Now, because of the demands made on one's understanding
        of the language, the cited code would be considered bad
        programming style IF what it did could be effectively
        accomplished in some other way.  As I pointed out in my
        previous posting on this, Duff's device does indeed have
        use in cases where other approaches fail, namely moving
        overlapping unaligned data.  Eventually we can all use
        memmove() for this, but not today.

Incidentally, Duff's device is used to manipulate text strings in the
"sam" text editor.  I recently changed it to use the device for both
directions, only in cases of overlap, using memcpy() for all other
cases.  (Originally the device was used for all transfers in one
direction and memcpy() for all in the other, which assumed more about
the behavior of memcpy() than is officially guaranteed.)  So this is
not just an academic issue..


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Douglas C. Schmidt  
View profile
 More options Aug 27 1988, 3:57 am
Newsgroups: comp.lang.c, comp.arch
From: schm...@bonnie.ics.uci.edu (Douglas C. Schmidt)
Date: 27 Aug 88 07:57:11 GMT
Local: Sat, Aug 27 1988 3:57 am
Subject: Re: Explanation, please!
Hi,

   Since I posted my original question there has been a great deal of
abstract discussion about the relative merits of the loop unrolling
scheme.  The topic has piqued my curiousity, so I when ahead and
implemented a short test program, included below, to test Duff's
device against the ``ordinary for loop w/index variable'' technique.
See for yourself....  

After some quick testing I found that gcc 1.26 -O on a Sun 3 and a
Sequent Balance was pretty heavily in favor of the regular (non-Duff)
loop.  Your mileage may vary.  I realize that there may be other
tests, and if anyone has a better version, I'd like to see it!

Doug Schmidt
----------------------------------------
#include <sys/time.h>
#include <sys/resource.h>

double Start_Timer();
double Return_Elapsed_Time();

#define MAX_NUM 100000
int array1[MAX_NUM ];
int array2[MAX_NUM ];
int *A = array1, *B = array2;

main(int argc, char *argv[]) {
   double Elapsed_Time;
   int    Count = argc > 1 ? atoi(argv[1]) : MAX_NUM;
   int i;

   for (i = 0;i < Count ;i++) {
      array1[i] = i + 1;
      array2[i] = i;
   }

   printf("Starting Duff's device timing...\n");
   Start_Timer();

   {
      int n = (Count + 7) / 8;

      switch(Count % 8) {
         case 0:  do { *A++ = *B++;
         case 7:       *A++ = *B++;
         case 6:       *A++ = *B++;
         case 5:       *A++ = *B++;
         case 4:       *A++ = *B++;
         case 3:       *A++ = *B++;
         case 2:       *A++ = *B++;
         case 1:       *A++ = *B++;
                  } while (--n > 0);
      }
   }

   Elapsed_Time = Return_Elapsed_Time(0.0 );
   printf("Elapsed time = %.3f\n",Elapsed_Time);

   for (i = 0;i < Count ;i++) {
      if (array1[i] != array2[i]) {
         printf("Yow, problems at location %d!\n",i);
         break;
      }
   }

   for (i = 0;i < Count ;i++) {
      array1[i] = i + 1;
      array2[i] = i;
   }

   printf("Starting ordinary copy timing...\n");
   Start_Timer();

   for (i = 0;i < Count ;i++) {
      array1[i] = array2[i];
   }

   Elapsed_Time = Return_Elapsed_Time(0.0 );
   printf("Elapsed time = %.3f\n",Elapsed_Time);

   for (i = 0;i < Count ;i++) {
      if (array1[i] != array2[i]) {
         printf("Yow, problems at location %d!\n",i);
         break;
      }
   }

}      

/* no such thing as "negative time"! */
#define  ERROR_VALUE -1.0  

static struct rusage Old_Time;
static struct rusage New_Time;
static int    Timer_Set = 0;

#ifdef __STDC__
double Start_Timer(void)
#else
double Start_Timer()
#endif
{
   Timer_Set = 1;
   getrusage(RUSAGE_SELF,&Old_Time);        /* set starting process time */
   return(Old_Time.ru_utime.tv_sec + (Old_Time.ru_utime.tv_usec / 1000000.0));

}

/* returns process time since Last_Time (if parameter is not DEFAULT_TIME, */
/* i.e., (double) 0.0 ),otherwise, if parameter == DEFAULT_TIME then       */
/* the time since the Old_Time was set is returned.  Returns ERROR_VALUE   */
/* if Start_Timer() is not called first */
#ifdef __STDC__
double Return_Elapsed_Time(double Last_Time)
#else
double Return_Elapsed_Time(Last_Time)
double Last_Time;
#endif
{
   if (!Timer_Set) {
      return(ERROR_VALUE);
   }  
   else {
    /* get process time */
      getrusage(RUSAGE_SELF,&New_Time);
      if (Last_Time == 0.0) {
         return((New_Time.ru_utime.tv_sec - Old_Time.ru_utime.tv_sec) +
               ((New_Time.ru_utime.tv_usec - Old_Time.ru_utime.tv_usec)
                / 1000000.0));
      }
      else {
         return((New_Time.ru_utime.tv_sec +
                (New_Time.ru_utime.tv_usec / 1000000.0)) - Last_Time);
      }
   }

}

--
schm...@bonnie.ics.uci.edu (ARPA)
"If our behavior is strict, we do not need fun." -Zippy th' Pinhead
"If our behavior is struct, we do not need defun." -Anon

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
T. William Wells   </