The following piece of wonderful obscurity comes from Stroustup's C++ Programming Language book, page 100:
void send(int *to,int *from, int count) { int n = (count + 7) / 8;
switch(count % 8) { case 0: do { *to++ = *from++; case 7: *to++ = *from++; case 6: *to++ = *from++; case 5: *to++ = *from++; case 4: *to++ = *from++; case 3: *to++ = *from++; case 2: *to++ = *from++; case 1: *to++ = *from++; } while (--n > 0); }
}
Now, much to my surprise, this is not only valid C++, it is also valid C! Could some one please explain to me why this is so? It seems like the case 7-1 labels are actually nested inside the do {} while loop, and thus not in the scope of the switch (should a break statement exit both the switch and the loop, or just one?!?!).
Finally, Stroustrup asks the rhetorical question ``why would anyone want to write something like this.'' Any guesses?!
thanks,
Doug Schmidt -- Douglas Schmidt "If our behavior is strict, we do not need fun." -Zippy th' Pinhead "If our behavior is struct, we do not need defun." -Anon
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas
C. Schmidt) writes: >The following piece of wonderful obscurity comes from Stroustup's >C++ Programming Language book, page 100: > switch(count % 8) { > case 0: do { *to++ = *from++; > case 7: *to++ = *from++; ... > case 1: *to++ = *from++; > } while (--n > 0); > } >Now, much to my surprise, this is not only valid C++, it is also valid C! >Could some one please explain to me why this is so? It seems like >the case 7-1 labels are actually nested inside the do {} while loop, >and thus not in the scope of the switch (should a break statement exit >both the switch and the loop, or just one?!?!).
`break' exits the innermost switch or loop, hence a `break' in cases 7 through 1 exits the do-while, not the switch.
>Finally, Stroustrup asks the rhetorical question ``why would anyone >want to write something like this.'' Any guesses?!
This has been called `Duff's device' (after Tom Duff, who probably did not invent it first), and it looks exactly like what an optimising compiler generates when it does loop unrolling. It works because case labels are just that---labels. As long as there is a switch in scope, a case label is legal; the case applies to the closest switch. (And if C had used a separate keyword for `break switch', the whole thing could be consistent :-) .)
Why? To quote a certain infamous C language hack :-) , `It looks exactly like what an optimising compiler generates when it does loop unrolling.'
Incidentally, there are some compilers that choke on that form. I would have to look hard at the dpANS to decide whether it is Officially Legal. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: ch...@mimsy.umd.edu Path: uunet!mimsy!chris
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas C. Schmidt) writes: - switch(count % 8) { - case 0: do { *to++ = *from++; - case 7: *to++ = *from++; - case 6: *to++ = *from++; - case 5: *to++ = *from++; - case 4: *to++ = *from++; - case 3: *to++ = *from++; - case 2: *to++ = *from++; - case 1: *to++ = *from++; - } while (--n > 0); - }
-Now, much to my surprise, this is not only valid C++, it is also valid C! -Could some one please explain to me why this is so?
Could you explain why it shouldn't be? "case" labels are just a special form of label. You can stick a label on most statements.
-It seems like the case 7-1 labels are actually nested inside the -do {} while loop, and thus not in the scope of the switch
?? "Thus"? Why would the labels go out of scope? They're definitely within the body of the switch.
-(should a break statement exit both the switch and the loop, or just one?!?!)
A "break" applies to whatever the innermost container is. A "break" between two of the "*to++ = *from++;" statements would exit the do-while loop.
-Finally, Stroustrup asks the rhetorical question ``why would anyone -want to write something like this.'' Any guesses?!
It's about the fastest way to move arbitrarily-aligned data in portable C with a guarantee as to what happens in the case that the data overlaps. memcpy() doesn't guarantee anything for overlaps. memmove() does, but that's a recent X3J11 invention that probably doesn't exist on your system yet.
< void send(int *to,int *from, int count) { < int n = (count + 7) / 8; < < switch(count % 8) { < case 0: do { *to++ = *from++; < case 7: *to++ = *from++; < case 6: *to++ = *from++; < case 5: *to++ = *from++; < case 4: *to++ = *from++; < case 3: *to++ = *from++; < case 2: *to++ = *from++; < case 1: *to++ = *from++; < } while (--n > 0); < } < }
Question: what if count==0? -- |------------Dan Levy------------| THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY | Bell Labs Area 61 (R.I.P., TTY)| AND ARE NOT TO BE IMPUTED TO AT&T. | Skokie, Illinois | |-----Path: att!ttbcad!levy-----|
In article <6...@paris.ics.uci.edu> Douglas C. Schmidt writes:
> void send(int *to,int *from, int count) { > int n = (count + 7) / 8;
> switch(count % 8) { > case 0: do { *to++ = *from++; > case 7: *to++ = *from++; > case 6: *to++ = *from++; > case 5: *to++ = *from++; > case 4: *to++ = *from++; > case 3: *to++ = *from++; > case 2: *to++ = *from++; > case 1: *to++ = *from++; > } while (--n > 0); > }
> }
> Finally, Stroustrup asks the rhetorical question ``why would anyone > want to write something like this.'' Any guesses?!
Yeah. That's the most hackish way of trying to write a portable optimized copy routine I've ever seen. I gather the whole point of the shenanigans is to get all the *from++ -> *to++ instructions in the generated code to be adjacent.
This only makes if the author knows he's got a hardware instruction pipeline or cache that's no less than 8 and no more than 9 byte-copy instruction widths long, and stuff executing out of the pipeline is a lot faster than if the copies are interleaved with control transfers. Dollars to doughnuts this code was written on a RISC machine.
(Gawrsh. That sounded just like one of the big boys on comp.arch tawkin'. I think I'll cross-post over there just to see if I get shot down in flames...)
-- Eric S. Raymond (the mad mastermind of TMN-Netnews) UUCP: ...!{uunet,att,rutgers}!snark!eric = e...@snark.UUCP Post: 22 S. Warren Avenue, Malvern, PA 19355 Phone: (215)-296-5718
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas C. Schmidt) writes:
>The following piece of wonderful obscurity comes from Stroustup's >C++ Programming Language book, page 100...
Tom Duff, what hast thou wrought? :-)
>... >Now, much to my surprise, this is not only valid C++, it is also valid C! >Could some one please explain to me why this is so? It seems like >the case 7-1 labels are actually nested inside the do {} while loop, >and thus not in the scope of the switch (should a break statement exit >both the switch and the loop, or just one?!?!).
You're thinking of switch as if it were Pascal's case; the correct model is closer to Fortran's computed goto. The labels must appear within the case body, but they can be (essentially) *anywhere* within the body, including within nested blocks. The sensible programmer will not exploit this freedom except in truly unusual situations, but it is available.
Break breaks out of the innermost construct that it could apply to, i.e. "just one".
>Finally, Stroustrup asks the rhetorical question ``why would anyone >want to write something like this.'' Any guesses?!
This construct is called "Duff's Device", after its discoverer. One can often make a loop run faster by "unrolling" it, duplicating its body N times and running one-Nth as many iterations -- this reduces the loop- control overhead by a factor of N. The annoying part is that the number of times one needs to do the body is usually not an even multiple of N, so one has to do a partial iteration at beginning or end. The usual assembly-language trick is to compute a start address in the middle of the loop and start by branching there, not to the beginning, thus doing a partial iteration first. Most high-level languages have no way to express this. Tom Duff (of Bell Labs) realized that it could be done in C. Ugh. -- Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology they just act that way. | uunet!attcan!utzoo!henry he...@zoo.toronto.edu
In article <6...@paris.ICS.UCI.EDU> schm...@bonnie.ics.uci.edu (Douglas C. Schmidt) writes: | The following piece of wonderful obscurity comes from Stroustup's | C++ Programming Language book, page 100:
[ a do loop within a switch, with cases all through the do loop ]
This does not seem to be clearly covered in the latest dpANS (section 3.6). The issue is if it is legal *and defined* to jump into a loop. While a case label behaves just like any other label, it's not clear what a jump into a loop implies, and I don't see that the existing standard is any clearer on the topic than the discussion was three years ago.
Consider: i = 15; if (j < 10) goto BadMove;
/* more code here */
do { int i = 7; int myvect[400]; BadMove: /* entry into block */ myvect[300] = i; if (j < 5) return i; } while (j-- > 302);
There was a lot of discussion of this and I don't see a clarification in the standard as it stands. I think there are several possibilities: 1) it's not allowed 2) it's allowed, but the initializations at the start of the block don't take place. What does this imply about the block local variables? 3) it's allowed and the block local variables are allocated and initialized as you would expect. 4) it's allowed but the results are implementation dependent.
In all other cases entry into a block causes allocation of the block local (auto) variables. Initialization is performed at the time of allocation. By allowing entry at any point the expected behavior is unspecified.
Suggestion: Forget (1), it's there for completeness. Pick any of 2-4 and add it to the standard NOW, as an editorial change. We talked about this three years ago, it can hardly be considered a recent issue. I would not suggest (1) because it breaks a lot of existing code, but I would be happy to see a (5) "you can't jump into a block with local variables."
This is one of the rare times when I would like to be wrong, to be told "oh, it's there in a footnote, you can, or you can't, and this is what it does." If the action is implementation dependent then I (personally) wouldn't use the feature, since I value portability highly. Please tell me X3J11 didn't let this go unresolved for three years. -- bill davidsen (w...@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
In article <9...@eddie.MIT.EDU> j...@fenchurch.MIT.EDU (Jeff Siegal) writes: >... I prefer (and also use, in highly-speed-sensitive code): ...
It is worth noting that byte-at-a-time is not optimal on most architectures. Duff's device is useful for arbitrarily-aligned data, but if you can arrange for (most of) the source and destination buffers to be relatively word-aligned, then along with processing the head and tail pieces via something like Duff's device, the bulk of the data can be transferred in bigger chunks, gaining considerable increase in speed. One should really use memcpy() in all cases where the source and destination do not overlap, because all the work done to optimize block-move for a specific architecture are supposed to be centralized there.
A couple of people have complained about the tone of my previous response on this topic. I meant it to be informative and minimal, i.e. requiring some thought on the reader's part. Since some have missed the point, let me summarize the C style issue:
Since the constructs are not forbidden by the C language rules, they are perforce permitted. One should have an explicit rule to point at as being violated in order to think that the code is not valid. "Intuition" is a lousy guide, particularly when it was formed by experience in other contexts (e.g. Pascal).
Now, because of the demands made on one's understanding of the language, the cited code would be considered bad programming style IF what it did could be effectively accomplished in some other way. As I pointed out in my previous posting on this, Duff's device does indeed have use in cases where other approaches fail, namely moving overlapping unaligned data. Eventually we can all use memmove() for this, but not today.
Incidentally, Duff's device is used to manipulate text strings in the "sam" text editor. I recently changed it to use the device for both directions, only in cases of overlap, using memcpy() for all other cases. (Originally the device was used for all transfers in one direction and memcpy() for all in the other, which assumed more about the behavior of memcpy() than is officially guaranteed.) So this is not just an academic issue..
Since I posted my original question there has been a great deal of abstract discussion about the relative merits of the loop unrolling scheme. The topic has piqued my curiousity, so I when ahead and implemented a short test program, included below, to test Duff's device against the ``ordinary for loop w/index variable'' technique. See for yourself....
After some quick testing I found that gcc 1.26 -O on a Sun 3 and a Sequent Balance was pretty heavily in favor of the regular (non-Duff) loop. Your mileage may vary. I realize that there may be other tests, and if anyone has a better version, I'd like to see it!
#ifdef __STDC__ double Start_Timer(void) #else double Start_Timer() #endif { Timer_Set = 1; getrusage(RUSAGE_SELF,&Old_Time); /* set starting process time */ return(Old_Time.ru_utime.tv_sec + (Old_Time.ru_utime.tv_usec / 1000000.0));
}
/* returns process time since Last_Time (if parameter is not DEFAULT_TIME, */ /* i.e., (double) 0.0 ),otherwise, if parameter == DEFAULT_TIME then */ /* the time since the Old_Time was set is returned. Returns ERROR_VALUE */ /* if Start_Timer() is not called first */ #ifdef __STDC__ double Return_Elapsed_Time(double Last_Time) #else double Return_Elapsed_Time(Last_Time) double Last_Time; #endif { if (!Timer_Set) { return(ERROR_VALUE); } else { /* get process time */ getrusage(RUSAGE_SELF,&New_Time); if (Last_Time == 0.0) { return((New_Time.ru_utime.tv_sec - Old_Time.ru_utime.tv_sec) + ((New_Time.ru_utime.tv_usec - Old_Time.ru_utime.tv_usec) / 1000000.0)); } else { return((New_Time.ru_utime.tv_sec + (New_Time.ru_utime.tv_usec / 1000000.0)) - Last_Time); } }
}
-- schm...@bonnie.ics.uci.edu (ARPA) "If our behavior is strict, we do not need fun." -Zippy th' Pinhead "If our behavior is struct, we do not need defun." -Anon