Am 01.07.2013 05:09, schrieb ralph:
>>> Note that the key in the collection is normally hashed so only an extra
>>> few bytes per entry is used.
>>
>> This is strange. For speeding up the read access it's logical to use hashes,
>> but still, if the full information of the key is not stored somewhere (to
>> check "special cases"), it would be subject to collisions (to different keys
>> pointing to the same element).
>>
>
> Since an error is thrown with any attempt to add an item with an
> existing key, and you can easily add multiple copies of the same
> object to a collection - as long as each entry has a different key -
> what are these special cases and possible collisions?
I think, the second part in Eduardos statement:
"... pointing to the same element"
is a bit misleading, because what he meant was
probably:
"if the full information of the key is not stored somewhere,
and the calculated Hash-Values of different Keys collide
(come out the same - and point to the same HashIndex-Pos),
then there's no additional criterium, to separate those
same Entries in the HashTable for a successful Lookup".
Or something along those lines...
Hash-Values are like some sort of CRC32 (but with shorter
BitLength') - and so the collision-probability for String-
Keys (producing the same Hash-Index) is relatively high,
and mechanisms need to be in place, to do safe fallbacks
in these cases. The common (and easiest) approach is, to
just store under the same HashIndex also the colliding
entries, should they happen - and to keep them "unique" and
distinguishable there, we cannot only store the Item under
the given HashValue, but the String-Keys need to be stored
in a "lossless format" as well - to be able to perform the
(then of course smaller) fallback-loop (under the given HashIndex).
Olaf