Everyone knows switch() switches on integer expressions.
Wouldn't It Be Nice if you could select from strings, though...
Well, perhaps one could write "str_switch()", which takes a string, and an array of strings, and returns the index in the array of a matching string. Of course, this would be horribly inefficient - presumably at least as expensive as the iterative strcmp() so often used as an alternative, and possibly as bad as the endless if-else-if-else-if... used in some programs.
But what if one were to build a table? In every application I've had where I wish to see which of a set of strings I have, I've had a known predefined set of strings. So, the behavior could be something like { static ss_table *sst;
if (!sst) { sst = str_stable(stringtable); if (!sst) return 0; } switch (str_switch(str, sst)) { case -1: /* failure */ break; case 0: /* 1st string in table */ ... } }
The idea would be that "str_stable" would build a somewhat more efficient table representation of what the strings are like. I could see it being done with hashes, or with something like what fgrep traditionally uses.
Any prior art? Ideas I've missed? I'm thinking about developing this, since it would be useful in a current application, and I'd like to do it right, so I don't repeat the work.
-s -- Copyright 1997 Peter Seebach - seebs at solon.com - C/Unix Wizard I am not actually dns-ad...@earthlink.net but junk mail is accepted there. The *other* C FAQ, the hacker FAQ, et al. http://www.solon.com/~seebs Unsolicited email (junk mail and ads) is unwelcome, and will be billed for.
My simplest suggestion --> Steal whatever VAX BASIC does. Cram it into the next iteration of the standard. All mortal C programmers will sing your praises.
Another solution: Have the COMPILER sort the strings AT COMPILE TIME and place them in an internal array. Then do a BSEARCH of the array for the input string to switch on. Not only portable, but also efficient. Not only would they sing your praises, they would recount your glories to their grandchildren and their great grandchildren. ;-)
/* what the programmer codes: */ char *Animal; /* ... */ switch (Animal) { case "BEAR": /* do stuff */ break; case "LION": /* do stuff */ break; case "TIGER": /* do stuff */ break; case "EAGLE": /* do stuff */ break; case "FROG": /* do stuff */ break; case "PYTHON": /* do stuff */ break; default: /* Warn about the 5 million other possibilities */
}
/* What the compiler does: */ /* Fast search, hidden behind the scenes, that knows we are using pointers */ int fastbsearch( char **SwitchStuff, char *SwitchItem );
/* Then we do the actual switch on return value for fastbsearch(). The compiler actually replaces the string after case with a number which is in the position of the list. */ /* ... */ switch (fastbsearch( SwitchStrings, Animal ) ); { case 0: /* "BEAR": */ /* do stuff */ break; case 4: /* "LION": */ /* do stuff */ break; case 6: /* "TIGER": */ /* do stuff */ break; case 2: /* "EAGLE": */ /* do stuff */ break; case 3: /* "FROG": */ /* do stuff */ break; case 5: /* "PYTHON":*/ /* do stuff */ break; default: /* Warn about the 5 million other possibilities */
}
/* Why should I have to type all this in? Less is more. */
Peter Seebach <se...@solon.com> wrote in article <5bdvof$...@solutions.solon.com>...
> Everyone knows switch() switches on integer expressions.
> Wouldn't It Be Nice if you could select from strings, though...
> Well, perhaps one could write "str_switch()", which takes a string, > and an array of strings, and returns the index in the array of a matching > string. Of course, this would be horribly inefficient - presumably at > least as expensive as the iterative strcmp() so often used as an > alternative, and possibly as bad as the endless if-else-if-else-if... > used in some programs.
> But what if one were to build a table? In every application I've had > where I wish to see which of a set of strings I have, I've had a known > predefined set of strings. So, the behavior could be something like > { > static ss_table *sst;
> if (!sst) { > sst = str_stable(stringtable); > if (!sst) > return 0; > } > switch (str_switch(str, sst)) { > case -1: /* failure */ > break; > case 0: /* 1st string in table */ > ... > } > }
> The idea would be that "str_stable" would build a somewhat more efficient > table representation of what the strings are like. I could see it being > done with hashes, or with something like what fgrep traditionally uses.
> Any prior art? Ideas I've missed? I'm thinking about developing this, > since it would be useful in a current application, and I'd like to do it > right, so I don't repeat the work.
> -s > -- > Copyright 1997 Peter Seebach - seebs at solon.com - C/Unix Wizard > I am not actually dns-ad...@earthlink.net but junk mail is accepted there. > The *other* C FAQ, the hacker FAQ, et al. http://www.solon.com/~seebs > Unsolicited email (junk mail and ads) is unwelcome, and will be billed for.
se...@solon.com (Peter Seebach) wrote: >Wouldn't It Be Nice if you could select from strings, though...
Whenever I've needed to switch on a string there's usually just a small portion (sizeof long or less) of the string that's significant. A technique that I've used is to turn those bytes into something that I can switch on. The OS that I use provides a MAKETYPE macro that, given the address of something, will cast that something into another specified type: #define MAKETYPE(v, type) (*((type *)&v))
Then I just switch on MAKETYPE(myString, long) to turn into a long and use #defines for the cases so I'll have meaningful labels there.
Note that this works for me. It is definitely not portable, primarily because byte-swapping will significantly alter the result.
John Currier john.curr...@checksol.com Software Development Engineer Check Solutions
: Everyone knows switch() switches on integer expressions. : : Wouldn't It Be Nice if you could select from strings, though... : : ( ... snip ... ) : : But what if one were to build a table? In every application I've had : where I wish to see which of a set of strings I have, I've had a known : predefined set of strings. So, the behavior could be something like : : ( ... snip ... ) : : The idea would be that "str_stable" would build a somewhat more efficient : table representation of what the strings are like. I could see it being : done with hashes, or with something like what fgrep traditionally uses. : : Any prior art? Ideas I've missed? I'm thinking about developing this, : since it would be useful in a current application, and I'd like to do it : right, so I don't repeat the work.
My solution may not be general enough - it is a compile time not run time solution. Anyway, I had to parse a config file of key/value pairs and wanted users to be able to write the text corresponding to the name of the key rather than the numeric #define constant.
Given over 100 constants, I wasn't interested in the strcmp noise. So I whipped out a perl critter that takes a data file and gens a header file with #defines and a source file with a hashed array (257 buckets) and a lookup function that returns the numeric equivalent of each tag and deals with the collisions.
typedef struct tag_def_dtd { int tag; /* Field tag */ char name[25]; /* Tag name */ } tag_def; extern tag_def tag_array[]; extern int tag_lookup(char *name);
-- George Headley Director of Software Engineering Phone: 608/288-3000 head...@binc.net Berbee Information Networks Fax: 608/288-3007 5944 Seminole Centre Court Madison WI 53711
In article <5bdvof$...@solutions.solon.com> se...@solon.com writes:
>Everyone knows switch() switches on integer expressions.
>Wouldn't It Be Nice if you could select from strings, though...
>Well, perhaps one could write "str_switch()", which takes a string, >and an array of strings, and returns the index in the array of a matching >string. Of course, this would be horribly inefficient - presumably at >least as expensive as the iterative strcmp() so often used as an >alternative, and possibly as bad as the endless if-else-if-else-if... >used in some programs.
>The idea would be that "str_stable" would build a somewhat more efficient >table representation of what the strings are like. I could see it being >done with hashes, or with something like what fgrep traditionally uses.
>Any prior art? Ideas I've missed? I'm thinking about developing this, >since it would be useful in a current application, and I'd like to do it >right, so I don't repeat the work.
When faced with this, I typically define a structure with two members, a string (char *) and a function pointer, then I have an array (usually sorted alphabetically) of such structures. The parameters the function takes usually depends upon the application, but it gives me a nice blend between speed and flexibility.
during initialization, I sort actions[] based on actions[].name, then use a binary search to find matches. It has the advantage that adding a new command (typically I use this for quick command-line parsers) requires only adding a E(whatever), line and a do_whatever() routine, and is efficient enough for all but the most demanding applications.
This is also the example I use when people say token quoting and token pasting are useless and should be avoided. In this case, writing it out can even introduce bugs, along the lines of:
{ "fmoo", do_foo },
--------------------------------------------------------------------------- Tim Hollebeek | Disclaimer :=> Everything above is a true statement, Electron Psychologist | for sufficiently false values of true. Princeton University | email: t...@wfn-shop.princeton.edu ----------------------| http://wfn-shop.princeton.edu/~tim (NEW! IMPROVED!)
Peter Seebach <se...@solon.com> wrote in article <5bdvof$...@solutions.solon.com>...
> Wouldn't It Be Nice if you could select from strings, though...
[snip-ola]
> Any prior art? Ideas I've missed? I'm thinking about developing this, > since it would be useful in a current application, and I'd like to do it > right, so I don't repeat the work.
You might check out my Associative Array library on my web site, which is what I think you're talking about. It allows creating an associative array context, which you may place string keys that translate into either other strings, or a general void * pointer. The string key may be either stored externally to the AA context, or duplicated internally. It uses a hash table algorithm which will dynamically increase the hash table size as the number of entries increase. I have attached the documentation file.
I intentionally made the names short, i.e., aaPut and aaGet to make it convenient to use. I have to say that this is one of those libraries that you just didn't know you needed until you have it available, cheap and easy (kind of like Perl associative arrays :-) )
I should note that my legal language (which I deleted here) allows this library to be used in commercial applications.
--------
1.0 Description
This document describes subroutines that may be used in a manner similiar to "Associative Arrays" as in Perl. They use a very efficient hash table algorithm for quick translation of C string Keys to strings or general pointers. Features:
* Use external or internal storage for key string. * Use external or internal storage for key value. In the case of internal storage, the value must be a string. For external storage, the value may be any general pointer. * Dynamically increases/decreases size of hash table depending on ratio of entries / hash table size. * Uses 'CRC' algorithm for good distributions, even on similiar keys.
Destroys Associative Array context, and all internally allocated keys and values. If a key or value was specified as external, nothing is done to that key or value. Parameters:
2.3 int aaPut(aaContext *aa, char *key, void *value, long flags)
aa Associative array key Key to store. value Value, string or general pointer. flags AA_DUPKEY Duplicate key using internal storage. AA_DUPVAL Duplicate string value using internal storage. AA_EXTKEY Store pointer to passed key, rather than duplicate key internally. AA_EXTVAL Store pointer to external value, rather than duplicate internally. May be general pointer in this case (such as to a structure). AA_NOREP Do not replace value if key already exists.
One of (AA_DUPKEY,AA_EXTKEY) and (AA_DUPVAL,AA_EXTVAL) must be given.
Store a key/value pair into an associative array. Overwrites previous value if key already exists, unless the AA_NOREP flag is given. Key is case significant, unless AA was created with AA_CASEINSIG flag. Returns -1 if error (No KEY or VAL flag specified, allocation error), 0 if key did not exist, and 1 if key already existed. May be used from within an aaScanBegin loop.
The intention of having no default for internal or external storage is to cut down the risk of errors, even though it makes it a bit more long-winded. A default can obviously be added easily.
2.4 void *aaGet(aaContext *aa, char *key)
aa Associative array key Key to search.
Retrieve value of key; returns NULL if key not found.
2.4 int aaDelete(aaContext *aa, char *key)
aa Associative array key Key to delete.
Delete key from associative array. Returns 0 if key existed, otherwise -1. May be used within an aaScanBegin loop.
2.5 int aaScanBegin(aaContext *aa, char *key)
aa Associative array context. key Optional starting key, or NULL. Key must exist.
Scan keys in hash table (that is, random) order. In practice, passing in a start key is not that useful, but it could be used to restart a scan for whatever reason. aaScanEnd() should be used when the scan is complete. Note that during an active scan, the hash table will not be reconfigured from adds or deletes in order to preserve a stable hash order. When the scan is completed by calling aaScanEnd, the hash table's size is reconfigured (if necessary).
2.6 char *aaScanNext(aaContext *aa, void **value)
aa Associative array context. value Optional value of key, or NULL.
After calling aaScanBegin, returns pointer to next key, or NULL. Note it is safe to use aaDelete and aaPut from within an aaScanBegin loop.
2.7 void aaScanEnd(aaContext *aa)
aa Associative array context.
Terminates a scan. If any aaDelete's or aaPut's where done during a scan, then the hash table is reconfigured at this time.
2.8 void aaDumpTable(aaContext *aa)
aa Associative array context.
Dumps out an associative array to stdout for debugging purposes.
3.0 Sample program
#include <stdio.h> #include "aa.h"
int main(void) { struct aaContext *aa; char *p; char key[20], value[20]; char key2[20]; int i;
aa = aaCreateContext(0);
/* Create some entries */
for (i = 0; i < 100; ++i) { sprintf(key, "KEY%d", i); sprintf(value, "VALUE%d", i); aaPut(aa, key, value, AA_DUPKEY|AA_DUPVAL); } aaDumpTable(aa);
/* Read back the entries */
for (i = 0; i < 100; ++i) { sprintf(key2, "KEY%d", i); p = aaGet(aa, key2); printf("aaGet (%s): value = '%s'\n", key2, p); }
/* Scan and delete 90% of the entries */
aaScanBegin(aa, NULL); i = 0; while ((p = aaScanNext(aa, NULL)) != NULL) { if (i++ % 10 == 0) /* Leave some left over */ continue; printf("Deleting: %s\n", p); aaDelete(aa, p); } aaScanEnd(aa);
/* Dump out the entries that are left */
printf("After delete:\n"); aaDumpTable(aa);
aaDestroyContext(aa); return;
}
-- ========================================================================== | Tim Behrendsen (t...@a-sis.com) | http://www.cerfnet.com/~timb | | "Judge all, and be prepared to be judged by all." | ==========================================================================
"Dann Corbit" <dcor...@solutionsiq.com> writes: >My simplest suggestion --> Steal whatever VAX BASIC does. Cram it into the >next iteration of the standard. All mortal C programmers will sing your >praises.
And all the mortal customers will scream with outrage at the US-only software and go and buy something else.
Write 100 times: hardwired strings in a program are a barrier to localisation.
>Another solution: Have the COMPILER sort the strings AT COMPILE TIME and >place them in an internal array. Then do a BSEARCH of the array for the >input string to switch on. Not only portable, but also efficient.
Actually, NOT efficient. If you have N strings of length S, bsearch() can take O(S.lgN) time, whereas if you use a suitably represented trie, it can be done in O(S) time.
>Not only would they sing your praises, they would recount your glories >to their grandchildren and their great grandchildren.
Strings that are strictly internal to the operation of a program can be anything you want, but strings that may be of interest to an end-user had better be in the end-user's language and script. And if you are going to match strings the user typed against strings in some kind of table, you had better worry about whether the match should be case sensitive or not and you had better worry about the ISO 10646 canonicalisation rules and you had better worry about whether to distinguish between "APL LETTER ALPHA" and "GREEK LETTER ALPHA" or not, &c &c &c.
It has always been true that hardwiring strings into a program was a silly thing to do unless you had no alternative. For the past 10 or 12 years it has been OBVIOUSLY silly. By now, when Windows NT uses Unicode inside, and UNIX systems come with support for multibyte characters, and the X Window System has support for this stuff, it is PAINFULLY obvious that hardwiring ASCII string literals into your program is a good way to produce a product with no market outside your own country.
"mbc", "wchar_t", locales, and so on are there in C for a *REASON*. -- My tertiary education cost a quarter of a million in lost income (assuming close-to-minimum wage); why make students pay even more? Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.
Peter Seebach <se...@solon.com> wrote: > But what if one were to build a table? In every application I've had > where I wish to see which of a set of strings I have, I've had a known > predefined set of strings.
This looks suspiciously similar to what GPERF does. You give it a table of strings, and it builds a perfect hash function for your table, spewing out the appropriate C code. I've not used it myself in real code, but it looks quite handy. -- [mdw], who avoids mentioning the obvious use of `bsearch'.
`How can you be so mean to someone so meaningless?' -- Selina Kyle
: >Wouldn't It Be Nice if you could select from strings, though...
: Whenever I've needed to switch on a string there's usually just : a small portion (sizeof long or less) of the string that's significant. : A technique that I've used is to turn those bytes into something that I : can switch on. The OS that I use provides a MAKETYPE macro that, given : the address of something, will cast that something into another specified : type: : #define MAKETYPE(v, type) (*((type *)&v))
: Then I just switch on MAKETYPE(myString, long) to turn into a long and : use #defines for the cases so I'll have meaningful labels there.
: Note that this works for me. It is definitely not portable, primarily : because byte-swapping will significantly alter the result.
Seems to me like what you guys are trying to re-invent (seebs used the term "prior art") are associative arrays, indexed on a string, with an integer data type as data. I.e., the *opposite* of an array of char pointers. To my knowledge, a good implementation for this must be in the source code for Perl, because that language lives and dies by the efficiency of its $value("some string") constructs. I thinks it's done by means of hash tables.
******************************************************************* Martin F. Sohnius msohn...@novell.co.uk Novell IS & T, Bracknell, England +44-1344-724031 ******************************************************************* * if (status = UNDER_NUCLEAR_ATTACK) * * launch_full_counterstrike(); * ******************************************************************* (C) 1997 M.F.Sohnius -- free distribution on USENET (Not a spokesperson -- just a cyclist!)
Peter Seebach <se...@solon.com> wrote: >First, let me say, this is *NOT* the FAQ.
>Rather...
>Everyone knows switch() switches on integer expressions.
>Wouldn't It Be Nice if you could select from strings, though...
>Well, perhaps one could write "str_switch()", which takes a string, >and an array of strings, and returns the index in the array of a matching >string. Of course, this would be horribly inefficient - presumably at >least as expensive as the iterative strcmp() so often used as an >alternative, and possibly as bad as the endless if-else-if-else-if... >used in some programs.
I think the most efficient technique to str_switch would be to write an optimized state machine to recognize each of the strings. That way, at worst you'll have the cost of the iterative strcmp() solution, and at best, it'll be (waving hands) much better.
If you want to code your state machine yourself, more power to you. I'd use lex.
Although there is something to be said for hand-tuning them when your set of input strings is small. I can write a state machine for the strings { "this", "that", "other" } without much difficulty at all, and it'll likely be smaller and faster than lex.
Me? I use perl for string-intensive stuff these days. Apologies. ;) -- Steve Watt KD6GGD PP-ASEL-IA Packet: KD6GGD @ N0ARY.#NOCAL.CA.USA.NA ICBM: 121W 56' 58.1" / 37N 20' 14.2" Internet: Home: {root,ste...@Watt.COM Free time? There's no such thing. It just comes in varying prices...
- that's not meant to be C, or: - you've never done any contract work for NORAD.
-- //.\\/\\/\\\\//////\//\/\/\/\\\/\\\/\\\\/\//\/\\\//\\\\/\\\//\/\\//\\\/\\// \ Jason Tyler "What's so unpleasant about being drunk?" \ \ c9326...@cs.newcastle.edu.au "You ask a glass of water." -- HGTTG \ ///\\/\\/\\\\\\/\\/\/\\/\\\/\//\\///\\\\\///\//\\\\\\\///\/\\\/\/\\\//\\//\
Ok, I'll try to say something usefel here as well ... Actually I have no time to think about this, just couldn't resist to pass on some partial thoughts. Fasten seatbelts, here we go:
(Really, this is Q about algorithms rather than C, but who cares these days anymore)
After trying it all I have come to the conclusion that sorting alphbetically is a good idea.
(I tried sort by stringlength as well, but it won't buy you anything. Hashing is off since there is a lot of more useful things you could have done while racing through your teststring.)
---------------------------------------------------------------------- The way I understand the problem, the set of valid strings won't change during runtime.
The set of strings to test might, on the other hand, be supplied by the user (so there is always one more.) ----------------------------------------------------------------------
What did I say? Oh yes, the strings are _already_sorted_ alphabetically. The proper way would then be to perform a bSearch for the first letter in your test string
Assume you found it, Ok? Then test the second letter, and it was (surprise) also valid.
But! In this example, as you go on, the third letter was not valid ... (Oh shit)
Since the strings are sorted, the value will either be too small, meaning you still have a chance, or too big, meaning that there is no such string ...
At this point it would be nice to have a bit of help, no? Simply testing the next string at index 'n' will most probably send you into the void space of undefined behaviour.
So we cheat and do some precalculations (on the valid strings)!
Lets have an array of int (or short, or char). One for each string in our valid library.
The value of this int is equal to the number of chars that are unchanged between "this string" and "the next string".
If that (next) value is less than our current index, then we have lost. Else there is still a chance ... (repeat from top :)
> >My simplest suggestion --> Steal whatever VAX BASIC does. Cram it into the > >next iteration of the standard. All mortal C programmers will sing your > >praises.
> And all the mortal customers will scream with outrage at the US-only > software and go and buy something else.
> Write 100 times: > hardwired strings in a program are a barrier to localisation.
> >Another solution: Have the COMPILER sort the strings AT COMPILE TIME and > >place them in an internal array. Then do a BSEARCH of the array for the > >input string to switch on. Not only portable, but also efficient.
> Actually, NOT efficient. If you have N strings of length S, bsearch() > can take O(S.lgN) time, whereas if you use a suitably represented trie, > it can be done in O(S) time.
Well, you make an excellent point. But to store strings in a resource file, you still have to relink, don't you? So we can build the table at link time instead of at run time. The bsearch() function will only take O(s*log(n)) time if the strings differ only in the last character. Again, at link time, the C system could recognize this and start the string compare at the last character. If there is difficult data, with randomly different locations for the string differences, I suspect that the trie structure search will take just as long because we have a difficult entropy problem.
> >Not only would they sing your praises, they would recount your glories > >to their grandchildren and their great grandchildren.
> Strings that are strictly internal to the operation of a program can be > anything you want, but strings that may be of interest to an end-user > had better be in the end-user's language and script. And if you are > going to match strings the user typed against strings in some kind of > table, you had better worry about whether the match should be case > sensitive or not and you had better worry about the ISO 10646 > canonicalisation rules and you had better worry about whether to > distinguish between "APL LETTER ALPHA" and "GREEK LETTER ALPHA" or not, > &c &c &c.
Since I am doing an exact match, even memcmp() will work. I can do any binary comparison at all (even consider the data as an array of ints if the lengths happen to be evenly divisible by sizeof int). My solution is not bound by strings being internal to the program. If the strings actually change during the running of the program so that the table itself is not constant, then the proposed solution would require a sort of the strings before the bsearch at program start time, and probably not be worth the bother unless the program runs for a long time and uses the switch heavily. I believe that you also may have neglected the effort to convert the strings into a trie structure (if the strings are not known at start time). In such a case, while the lookup will be fast, you still have a cost of building your data structure. The data structure I proposed has no extra memory overhead except for the n array positions.
> It has always been true that hardwiring strings into a program was > a silly thing to do unless you had no alternative. For the past 10 > or 12 years it has been OBVIOUSLY silly. By now, when Windows NT > uses Unicode inside, and UNIX systems come with support for multibyte > characters, and the X Window System has support for this stuff, it is > PAINFULLY obvious that hardwiring ASCII string literals into your > program is a good way to produce a product with no market outside > your own country.
I did not make that requirement. I posted a simple example. If we consider such things as resource files (outside the bounds of the C language -- but we can generalize to 'data files') then the method I suggested works unless the strings change during the execution of the program itself. You would (of course) need quite an intelligent system to recognize all of this, I'll admit.
In article <5biotf$...@solutions.solon.com> "Dann Corbit" <dcor...@solutionsiq.com> writes: >My simplest suggestion --> Steal whatever VAX BASIC does. Cram it into the >next iteration of the standard. All mortal C programmers will sing your >praises.
>Another solution: Have the COMPILER sort the strings AT COMPILE TIME and >place them in an internal array. Then do a BSEARCH of the array for the >input string to switch on. Not only portable, but also efficient. Not >only would they sing your praises, they would recount your glories to their >grandchildren and their great grandchildren. >;-)
>/* what the programmer codes: */ >char *Animal; >/* ... */ >switch (Animal) >{ > case "BEAR": > /* do stuff */ > break; > case "LION": > /* do stuff */
Well it's only portable if it gets into the standard. There is some prior art for this. The languages Eh and Zed, (the other descendants from B), had such a construct. It turns out not to be a great winner. It was easy to code, and "close enough" to what was needed that programmers tended not to expend the extra effort to do a good job. The result was programs that had very rigid user interfaces. Programs tended to be excessively case sensitive, and could not be easily changed to accept abbreviations that seemed natural to the users.
Several library routines to do different types of string look ups and map to a integer that is switched on is more useful. Sometimes exact matches and speed are important, in which case binary searches of perfact hash look ups are what is needed. Other times you really want to define a "pattern" (not necessarily a GREP type pattern), that defines what should match.
We routinely use some home grown routines for command option parsing that take a table of strings like optable[] { "KeyWord", "OtherOption", NULL } Where the lowercase letters are one the user is allowed to omit.
>>>>> "Sean" == Sean 'Captain Napalm' Conner <s...@gate.net> writes:
[switching on strings]
Sean> When faced with this, I typically define a structure with two Sean> members, a string (char *) and a function pointer, then I have Sean> an array (usually sorted alphabetically) of such structures. Sean> The parameters the function takes usually depends upon the Sean> application, but it gives me a nice blend between speed and Sean> flexibility.
Have you considered using a perfect hash function rather than searching the array?
GNU gperf (a tool for generating perfect hash functions from arbitrary strings) is part of the libg++ distribution (it can generate either C or C++ code). All the GNU compilers use this method for handling keywords.
>******************************************************************* >Martin F. Sohnius msohn...@novell.co.uk >Novell IS & T, Bracknell, England +44-1344-724031 >******************************************************************* >* if (status = UNDER_NUCLEAR_ATTACK) *
^ Hey, warmonger! That's supposed to be ==!
Assuming UNDER_NUCLEAR_ATTACK != 0, you've just launched a nuclear strike on quite possibly friendly neighbours!
If builders built buildings the programmers write programs, the first woodpecker to come along would destroy civilization - unless Martin F. Sohnius gets there first.
>* launch_full_counterstrike(); *
Temper, temper!
>******************************************************************* >(C) 1997 M.F.Sohnius -- free distribution on USENET > (Not a spokesperson -- just a cyclist!)
Sincerely,
Gene Wirchenko
C Pronunciation Guide: y=x++; "wye equals ex plus plus semicolon" x=x++; "ex equals ex doublecross semicolon"
Well, if *everyone* knew, it wouldn't be a FAQ. :-)
> Wouldn't It Be Nice if you could select from strings, though...
Sure would. But you can't. Case labels can't contain string literals, and that (in my mind, at least) is what constitutes a "string switch".
> But what if one were to build a table? In every application I've had > where I wish to see which of a set of strings I have, I've had a known > predefined set of strings.
There are plenty of solutions for turning strings into predetermined integer constants; hashing or bsearch-ing a table immediately come to mind, and then there's lex-ing.
> So, the behavior could be something like > { > static ss_table *sst;
> if (!sst) { > sst = str_stable(stringtable); > if (!sst) > return 0; > } > switch (str_switch(str, sst)) { > case -1: /* failure */ > break; > case 0: /* 1st string in table */ > ... > } > }
Nothing wrong with this. I'd probably use "default:" where you put "case -1:".
However, no solution (that I've ever thought of) puts real live string literals on case labels, and failing that I wouldn't bother trying to generalize a string switch.
-- Richard Krehbiel, Kastle Systems, Arlington, VA, USA r...@kastle.com (work) or ri...@mnsinc.com (personal)
> The bsearch() function will only take > O(s*log(n)) time if the strings differ only in the last character.
The worst case here is s*log(n), but in the best case, s just becomes 1. I think more often than not s will be small. The main disadvantage, imho, is when you have to add or remove elements from the b array. Here you either have to copy the rest of the array around or wait until before you read and do a sort. If you are dynamically changing the contents of a large btab, the overhead of the trie or hash table may become less significant.
Hmm, this probably doesnt relate to the switch issue, does it... -- Mark Weissman weiss...@gte.com
Martin Sohnius x24031 <msohn...@novell.co.uk> wrote in article
> Seems to me like what you guys are trying to re-invent (seebs used > the term "prior art") are associative arrays, indexed on a string, > with an integer data type as data. I.e., the *opposite* of an array of > char pointers. To my knowledge, a good implementation for this > must be in the source code for Perl, because that language lives and > dies by the efficiency of its $value("some string") constructs. I thinks > it's done by means of hash tables.
Just for the record, Perl 4 calculates a hash using
which is OK for most purposes, but beware if you have very similiar strings. Perl 5 is a little better, it uses
char *s = key; int i = strlen(key); int hash = 0; while (i--) hash = hash * 33 + *s++;
-- ========================================================================== | Tim Behrendsen (t...@a-sis.com) | http://www.cerfnet.com/~timb | | "Judge all, and be prepared to be judged by all." | ==========================================================================
: >******************************************************************* : >Martin F. Sohnius msohn...@novell.co.uk : >Novell IS & T, Bracknell, England +44-1344-724031 : >******************************************************************* : >* if (status = UNDER_NUCLEAR_ATTACK) * : ^ : Hey, warmonger! That's supposed to be ==!
: Assuming UNDER_NUCLEAR_ATTACK != 0, you've just launched a : nuclear strike on quite possibly friendly neighbours!
: If builders built buildings the programmers write programs, the : first woodpecker to come along would destroy civilization - unless : Martin F. Sohnius gets there first.
: >* launch_full_counterstrike(); *
: Temper, temper!
WhoA! I never thought I'd get this response from posting my little .sig to c.l.c.moderated (where's the moderator?)
[Easily amused. -mod]
Look, guys, don't EVER judge from a code snippet if you haven't seen the whole thing!
What I didn't tell you is this:
#ifdef PACIFIST
enum {UNDER_NUCLEAR_ATTACK, ALL_CLEAR};
#else /* Armageddon */
enum {ALL_CLEAR, UNDER_NUCLEAR_ATTACK}; #endif
And, yes, I didn't share the belief that SDI was a good idea! :-)
Martin (no need for .sig -- it's all in the above)
In article <5c0tb4$...@solutions.solon.com> "Tim Behrendsen"
<t...@a-sis.com> writes: >.... >which is OK for most purposes, but beware if you have very similiar >strings. Perl 5 is a little better, it uses
> char *s = key; > int i = strlen(key); > int hash = 0; > while (i--) > hash = hash * 33 + *s++;
To avoid implementation defined behavior when 'hash' overflows, shouldn't this be re-written as:
unsigned char *s = key; int i = strlen(key); unsigned hash = 0; while (i--) hash = hash * 33U + *s++;
Also, Aho, Sethi and Ulman in "Compilers: Principles, Techniques, and Tools" imply that it would be beneficial to replace 33 by a prime number close to 2^(CHAR_BIT * sizeof(hash)) to encourage the randomizing bit-truncation that occurs with unsigned overflows. They suggest 65599 for 16 bit ints. Can anyone suggest a prime that is less than but close to 2^32?
> which is OK for most purposes, but beware if you have very similiar > strings. Perl 5 is a little better, it uses
> char *s = key; > int i = strlen(key); > int hash = 0; > while (i--) > hash = hash * 33 + *s++;
I presume it also takes pains to ensure that the table length is not a multiple of 3 or 11. If not, the early characters don't count in the hash.
I also hope that the declaration: "int hash" is a miscopy. I really can't believe that you want signed arithmetic here. (Remember that overflow in signed arithmetic is not defined. Using signed arithmetic in such cases is an error waiting to happen.)
-- James Kanze +33 (0)1 39 55 85 62 email: ka...@gabi-soft.fr GABI Software, Sarl., 22 rue Jacques-Lemercier, 78000 Versailles, France -- Conseils en informatique industrielle --