Account Options

  1. Sign in
Google Groups Home
« Groups Home
Message from discussion Bug in qsort-function under Solaris 2.4 ?
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Richard A. O'Keefe  
View profile  
 More options Oct 17 1996, 3:00 am
Newsgroups: comp.unix.solaris, comp.unix.programmer, comp.lang.c, comp.std.c
From: o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Date: 1996/10/17
Subject: Re: Bug in qsort-function under Solaris 2.4 ?

bar...@tools.bbnplanet.com (Barry Margolin) writes:
>In article <53iel5$es...@goanna.cs.rmit.edu.au>,
>Richard A. O'Keefe <o...@goanna.cs.rmit.edu.au> wrote:
>>For strcmp() this is probably frivolous, but efficient implementations of
>>strspn(), strcspn(), or strpbrk() *do* rely on modifying a global table;
>>this turns an O(m*n) operation into an O(m+n) one and is well worth doing.
>While I understand why use of a table makes the implementation efficient,
>and further that the standard allows this table to be a global variable, I
>don't see why it's necessary for the table to be global to achieve the
>efficiency.  It could be allocated on the stack, or if it's too big for the
>stack it could be malloc'ed on entry and freed on exit from those
>functions.  It's clear that this is slightly less efficient than not having
>to allocate a new table each time, it's just a one-time expense for the
>call, and still allows the inner loop to be fast.

Why not?  Because for efficiency, the table must be *incrementally*
revised, not reinitialised from scratch each time.

Here's the trick, which I invented in about 1982 for use in my Emacs-like
editor "thief".  It is probably well known.

        unsigned char table[Alphabet_Size];     /* 7 or 8 bit alphabet */
        unsigned char cycle = 0;

        void set_up_table(unsigned char const *set, bool include_NUL) {
            if (set != 0) {             /* useful for strtok() too */
                if (cycle == Alphabet_Size - 1) {
                    memset(table, 0, sizeof table);
                    cycle = 1;
                } else {
                    cycle = cycle + 1;
                }
                while (*set) table[*set++] = cycle;
            }
            table[0] = include_NUL ? cycle : 0;
        }

        #define in_current_set(c) (table[c] == cycle)

        size_t strspn(char const *src, char const *set) {
            unsigned char const *s = (unsigned char const *)src;
            size_t i;

            assert(set != 0);
            set_up_table((unsigned char const *)set, false);
            for (i = 0; in_current_set(s[i]); i++) {}
            return i;
        }

        size_t strcspn(char const *src, char const *set) {
            unsigned char const *s = (unsigned char const *)src;
            size_t i;

            assert(set != 0);
            set_up_table((unsigned char const *)set, true);
            for (i = 0; !in_current_set(s[i]); i++) {}
            return i;
        }

with the other variations being obvious.

It is clear that the for() loops in str{,c}spn() are O(|src|).
It should also be clear that set_up_table() has one part that
takes O(|set|) time and another part that takes O(Alphabet_Size)
time but happens only once in every Alphabet_Size - 1 calls,
so the amortised cost is O(1) for that part.  So each call to
str{,c}spn() takes O(|src| + |set| + 1) time.

If a new table were allocated on each call, the cost of this method
would soar to O(|src| + |set| + Alphabet_Size), which would not be
so good.  It's not a case of "slightly" less efficient, it's a case
of DRAMATICALLY less efficient, with more time being spent setting
up the table than doing useful work.

The neat thing about the present scheme is that it scales to 16-bit
characters reasonably well if you make 'table' and 'cycle' unsigned
short.  But reallocating a 128k table every time, whether on the stack
or not, would be too costly because of the need to initialise it.

I do not know whether any existing commercial implementations of C
use this incrementally-revised table technique for strspn(), strcpspn(),
strpbrk(), or strtok().  Timing tests I've done on a couple of UNIX
systems suggested to me that they didn't, which is why I don't actually
_use_ any of those functions (not to mention the brain-dead interface
of strtok()).

--
Mixed Member Proportional---a *great* way to vote!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.