I dont understand this hash table calculation

Chad

unread,

May 3, 2006, 10:07:53 AM5/3/06

to

The question stems from the following paper:

http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/index.html

I'm drawing a blank on the following statement:

"However, if each element hashes to the same bucket, the hash table
will also degenerate to a linked list, and it will take O(n*n) time to
insert n elements. "

I thought it took constant time (ie O(1) ) to search a hash table and
linear time to search a linked list. How do they get n^2 for the
insertion time?

Chad

Arthur J. O'Dwyer

unread,

May 3, 2006, 10:21:39 AM5/3/06

to

If all your inputs hash to the same bucket, then you might as well have
a one-bucket hash table --- the other unused buckets don't need to be
there.
A one-bucket [chained] hash table is a linked list.
Insertion of one element into a linked list, in order, takes O(n).
Therefore insertion of n elements takes O(n*n).

Why do we insert in order? Well, we could just push the new element onto
the front of the list in constant time, but that would put the elements in
random order, so our lookups would have to scan to the end --- doubling
the cost of each and every lookup. Since insertions are less frequent than
lookups, it makes sense to choose the in-order solution... in this
degenerate case. In a nice evenly distributed hash table, insertion at
the head of the list should be perfectly all right.

HTH,
-Arthur

Googmeister

unread,

May 3, 2006, 10:43:05 AM5/3/06

to

A common convention for symbol tables is to disallow duplicate keys:
if the key already exists in the symbol table, you overwrite its value
with the new one. Under this scenario, it would take Theta(k) to insert
into an unsorted list of size k, and Theta(n^2) to do n insertions. If
all elements hash to the same bucket, the hash table degenerates
to an unsorted list.

CBFalconer

unread,

May 3, 2006, 11:05:14 AM5/3/06

to

Linear time is O(n). Doing this for each of n items results in
O(n*n). However there is no need to handle overflows in this crude
way.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

toby

unread,

May 3, 2006, 11:49:36 AM5/3/06

to

Arthur J. O'Dwyer wrote:
> On Wed, 3 May 2006, Chad wrote:
> >
> > The question stems from the following paper:
> >
> > http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/index.html
> >
> > I'm drawing a blank on the following statement:
> >
> > "However, if each element hashes to the same bucket, the hash table
> > will also degenerate to a linked list, and it will take O(n*n) time to
> > insert n elements. "
> >
> > I thought it took constant time (ie O(1) ) to search a hash table and
> > linear time to search a linked list. How do they get n^2 for the
> > insertion time?
>
> If all your inputs hash to the same bucket, then you might as well have
> a one-bucket hash table --- the other unused buckets don't need to be
> there.
> A one-bucket [chained] hash table is a linked list.
> Insertion of one element into a linked list, in order, takes O(n).
> Therefore insertion of n elements takes O(n*n).
>
> Why do we insert in order? Well, we could just push the new element onto
> the front of the list in constant time, but that would put the elements in
> random order, so our lookups would have to scan to the end

I don't see the logic here. A lookup would be O(n) regardless of the
ordering of the list elements?

Ben C

unread,

May 3, 2006, 12:13:33 PM5/3/06

to

On 2006-05-03, toby <to...@telegraphics.com.au> wrote:
>
> Arthur J. O'Dwyer wrote:
>> On Wed, 3 May 2006, Chad wrote:
>> >
>> > The question stems from the following paper:
>> >
>> > http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/index.html
>> >
>> > I'm drawing a blank on the following statement:
>> >
>> > "However, if each element hashes to the same bucket, the hash table
>> > will also degenerate to a linked list, and it will take O(n*n) time to
>> > insert n elements. "
>> >
>> > I thought it took constant time (ie O(1) ) to search a hash table and
>> > linear time to search a linked list. How do they get n^2 for the
>> > insertion time?
>>
>> If all your inputs hash to the same bucket, then you might as well have
>> a one-bucket hash table --- the other unused buckets don't need to be
>> there.
>> A one-bucket [chained] hash table is a linked list.
>> Insertion of one element into a linked list, in order, takes O(n).
>> Therefore insertion of n elements takes O(n*n).
>>
>> Why do we insert in order? Well, we could just push the new element onto
>> the front of the list in constant time, but that would put the elements in
>> random order, so our lookups would have to scan to the end
>
> I don't see the logic here. A lookup would be O(n) regardless of the
> ordering of the list elements?

If the bucket's sorted, you can do a binary search to lookup within the
bucket, which is better than 0(n).

Normally you wouldn't sort the buckets though, because you're already
using a hashtable and so the idea is each bucket shouldn't be that long.

Which is what Arthur said:

>> [...] In a nice evenly distributed hash table, insertion at the head

Arthur J. O'Dwyer

unread,

May 3, 2006, 12:38:24 PM5/3/06

to

On Wed, 3 May 2006, Ben C wrote:
> On 2006-05-03, toby <to...@telegraphics.com.au> wrote:
>> Arthur J. O'Dwyer wrote:
>>>
>>> Why do we insert in order? Well, we could just push the new element onto
>>> the front of the list in constant time, but that would put the elements in
>>> random order, so our lookups would have to scan to the end
>>
>> I don't see the logic here. A lookup would be O(n) regardless of the
>> ordering of the list elements?
>
> If the bucket's sorted, you can do a binary search to lookup within the
> bucket, which is better than 0(n).

Well, binary search on a linked list is tricky, I think :) (although
algorithms for doing it, sort of, have been given in this newsgroup in
the past).
I meant exactly what I said: "our lookups would have to scan to the
end --- doubling the cost of each and every lookup." Scanning the entire
list every time is exactly twice as expensive as scanning (on average)
half of a sorted list. Big-O had nothing to do with my statement.

N.B.: My reason for scanning the list was to speed up lookups.
Googmeister gives a better reason for insertion to scan the list:
to avoid duplicates. And if you're scanning the list anyway, you
might as well keep it sorted.

-Arthur

Message has been deleted

toby

unread,

May 3, 2006, 2:57:18 PM5/3/06

to

Arthur J. O'Dwyer wrote:
> On Wed, 3 May 2006, Ben C wrote:
> > On 2006-05-03, toby <to...@telegraphics.com.au> wrote:
> >> Arthur J. O'Dwyer wrote:
> >>>
> >>> Why do we insert in order? Well, we could just push the new element onto
> >>> the front of the list in constant time, but that would put the elements in
> >>> random order, so our lookups would have to scan to the end
> >>
> >> I don't see the logic here. A lookup would be O(n) regardless of the
> >> ordering of the list elements?
> >
> > If the bucket's sorted, you can do a binary search to lookup within the
> > bucket, which is better than 0(n).
>
> Well, binary search on a linked list is tricky, I think :) (although
> algorithms for doing it, sort of, have been given in this newsgroup in
> the past).
> I meant exactly what I said: "our lookups would have to scan to the
> end --- doubling the cost of each and every lookup." Scanning the entire
> list every time is exactly twice as expensive as scanning (on average)
> half of a sorted list.

Sorry, you're quite right, I didn't think it through.

Logan Shaw

unread,

May 3, 2006, 8:12:28 PM5/3/06

to

toby wrote:

> Ben C wrote:
>> On 2006-05-03, toby <to...@telegraphics.com.au> wrote:

>>> I don't see the logic here. A lookup would be O(n) regardless of the
>>> ordering of the list elements?

>> Normally you wouldn't sort the buckets though, because you're already

>> using a hashtable and so the idea is each bucket shouldn't be that long.
>

> Right. No argument there. I just can't see how sorting it changes
> anything.
>
> Arthur says that linear searching in a sorted list (for a match) is
> quicker than sorting a randomly ordered list. I cannot see how this is
> so. In either case, you will on average check half the list.

With an unsorted linked list, you have to check every element of the
list every time, so you average 100%. With a sorted linked list, you
will average 50%[1].

- Logan

[1] Getting back to the concept of security, you will only average
50% if it's an even distribution. Someone could feed you
numbers in the reverse order that you'd sort them in, and
that would have you going to the end of the list every time.
Luckily, that'd only be twice as bad, so probably not that
effective a denial of service attack...

Chad

unread,

May 3, 2006, 9:39:55 PM5/3/06

to

> Why do we insert in order? Well, we could just push the new element onto
> the front of the list in constant time, but that would put the elements in
> random order, so our lookups would have to scan to the end --- doubling
> the cost of each and every lookup. Since insertions are less frequent than
> lookups, it makes sense to choose the in-order solution... in this
> degenerate case. In a nice evenly distributed hash table, insertion at
> the head of the list should be perfectly all right.
>
> HTH,
> -Arthur

I think I'm missing the concept of ordering a list. Say I have a list
of words like:

Apples
Oranges
Cherries

If I push each onto the font (starting with Apple), I get

Cherries, Oranges, Apples

Or as I understand it, random order.

Now, I just insert at the end. This becomes
Apples, Oranges, Cherries.

But if I sorted the list, it becomes
Apples, Cherries, Oranges.

The only thing I see is that by inserting at the end, we would preserve
the order of the original list. What did I miss here?

Chad

Arthur J. O'Dwyer

unread,

May 3, 2006, 11:29:50 PM5/3/06

to

On Wed, 3 May 2006, Chad wrote:

[I wrote, and Chad snipped attribution of:]

>> Why do we insert in order? Well, we could just push the new element onto
>> the front of the list in constant time, but that would put the elements in
>> random order, so our lookups would have to scan to the end --- doubling
>> the cost of each and every lookup. Since insertions are less frequent than
>> lookups, it makes sense to choose the in-order solution... in this
>> degenerate case. In a nice evenly distributed hash table, insertion at
>> the head of the list should be perfectly all right.
>

> I think I'm missing the concept of ordering a list. Say I have a list
> of words like:
>
> Apples
> Oranges
> Cherries
>
> If I push each onto the font (starting with Apple), I get
>
> Cherries, Oranges, Apples
>
> Or as I understand it, random order.

Right.

> Now, I just insert at the end. This becomes
> Apples, Oranges, Cherries.
>
> But if I sorted the list, it becomes
> Apples, Cherries, Oranges.

Right.

> The only thing I see is that by inserting at the end, we would preserve
> the order of the original list. What did I miss here?

Nothing, yet. But notice the following:

Search for "Cherries" in the unsorted list:
is Cherries == Cherries? yes, return

Search for "Cherries" in the sorted list:
is Apples == Cherries? no, continue
is Cherries == Cherries? yes, return

Search for "Bananas" in the unsorted list:
is Cherries == Bananas? no, continue
is Apples == Bananas? no, continue
is Oranges == Bananas? no, continue
return

Search for "Bananas" in the sorted list:
is Apples >= Bananas? no, continue
is Cherries >= Bananas? yes, return

Half the search time, on average, if the item is not in the list ---
and no worse, if the item is in the list.

-Arthur

Chad

unread,

May 4, 2006, 12:47:18 AM5/4/06

to

?
>
> Nothing, yet. But notice the following:
>
>
> Search for "Cherries" in the unsorted list:
> is Cherries == Cherries? yes, return
>
> Search for "Cherries" in the sorted list:
> is Apples == Cherries? no, continue
> is Cherries == Cherries? yes, return
>
> Search for "Bananas" in the unsorted list:
> is Cherries == Bananas? no, continue
> is Apples == Bananas? no, continue
> is Oranges == Bananas? no, continue
> return
>
> Search for "Bananas" in the sorted list:
> is Apples >= Bananas? no, continue
> is Cherries >= Bananas? yes, return
>
> Half the search time, on average, if the item is not in the list ---
> and no worse, if the item is in the list.
>
> -Arthur

Maybe I need to sleep on this some more, but let's go back to the
original list.

Apples
Oranges
Cherries

If I would insert this into the list so that it is:
Apples, Oranges,Cherries

Would this list be considered sorted for your above example to work?

Chad

toby

unread,

May 4, 2006, 2:00:25 AM5/4/06

to

I failed to see Arthur's point initially. But the principle behind what
he's saying is, that you can stop looking in a sorted list once you
reach a value greater than your key. Assuming a uniform distribution of
values and keys, this will happen on average halfway through the list.
The optimisation is not available on an unsorted list.

>
> Chad