Here is another attempt to clarify.
First, in Smalltalk the hash is a special hidden instance variable, part of the header of ANY object, with the exception of immediate objects (which stand for themselves).
It is accessed through the basicHash and basicHash: primitives and can hold an unsigned 15-bit integer.
This hash is an integer that represents the object: its value characterizes the object itself, not its contents!
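Together with the class, the hash forms a quick non-equal-identity check: if the class or the hash differs, the objects are certainly not identical; if both are equal, nothing is decided yet, because due to hash collisions two different objects may have the same hash.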
This hash is set as part of Object new and ought to stay fixed for the lifetime of the object. It characterizes the object's identity.
So the == operation can be partially answered by a quick integer comparison.
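The quick check can be modelled as follows. This is a Python sketch, not Smalltalk code and not how any particular VM works. It assumes, purely for illustration, that hashes are handed out at allocation by a 15-bit counter that wraps around; ModelObject, basic_hash and certainly_not_identical are hypothetical names.

```python
import itertools

HASH_BITS = 15
HASH_MASK = (1 << HASH_BITS) - 1  # 0x7FFF, the largest 15-bit value


class ModelObject:
    """Toy model of an object header with a hash slot (hypothetical)."""

    # Assumption for illustration only: a global counter supplies the hashes.
    _next_hash = itertools.count()

    def __init__(self, contents=None):
        # Assigned once at allocation (the analogue of Object new), never changed.
        self._basic_hash = next(ModelObject._next_hash) & HASH_MASK
        self.contents = contents

    def basic_hash(self):
        # A plain getter for the precomputed value.
        return self._basic_hash


def certainly_not_identical(a, b):
    """Quick non-identity check: a different class or hash proves a is not b.
    A False result decides nothing, since distinct objects may share a hash."""
    return type(a) is not type(b) or a.basic_hash() != b.basic_hash()
```

Because the modelled counter wraps after 2**15 allocations, two live objects can end up with the same hash, which is exactly why an equal hash still requires a real identity test.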
Several other constructs build on this hash: e.g. the hash selects the bin (partition) in which an object is kept for some purpose, so that a search only has to look through that one bin.
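A minimal Python sketch of hash-selected bins, assuming a fixed number of bins and an object whose 15-bit hash is set at creation; Obj and ToyIdentitySet are hypothetical names, not a Smalltalk class library.

```python
HASH_MASK = (1 << 15) - 1


class Obj:
    """Hypothetical object with a fixed 15-bit identity hash."""
    _counter = 0

    def __init__(self):
        Obj._counter += 1
        self.basic_hash = Obj._counter & HASH_MASK  # set once, at creation


class ToyIdentitySet:
    """Identity set that keeps objects in bins chosen by their hash."""

    def __init__(self, bin_count=8):
        self.bins = [[] for _ in range(bin_count)]

    def _bin_for(self, obj):
        # The hash selects exactly one bin; only that bin is ever searched.
        return self.bins[obj.basic_hash % len(self.bins)]

    def add(self, obj):
        bucket = self._bin_for(obj)
        if not any(o is obj for o in bucket):
            bucket.append(obj)

    def includes(self, obj):
        # Identity test (`is`) restricted to the selected bin.
        return any(o is obj for o in self._bin_for(obj))
```

The search space shrinks from all stored objects to the members of one bin.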
Normally, changing an object's contents should not change its hash: the identity is kept fixed so that any navigation based on it stays stable.
If a hash does have to be changed (for some other purpose), any construct based on the initial hash usually has to reorganize itself; this is usually called a rehash.
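Why a rehash is needed can be shown with a small Python model: after an element's hash changes, a lookup goes to the wrong bin until the container redistributes its elements. Obj and BinnedIdentitySet are hypothetical; overwriting basic_hash stands in for something like basicHash:.

```python
class Obj:
    """Hypothetical object whose hash slot can be overwritten (like basicHash:)."""

    def __init__(self, h):
        self.basic_hash = h


class BinnedIdentitySet:
    def __init__(self, bin_count=8):
        self.bins = [[] for _ in range(bin_count)]

    def _bin_index(self, obj):
        return obj.basic_hash % len(self.bins)

    def add(self, obj):
        self.bins[self._bin_index(obj)].append(obj)

    def includes(self, obj):
        return any(o is obj for o in self.bins[self._bin_index(obj)])

    def rehash(self):
        # Redistribute every element according to its *current* hash.
        elements = [o for b in self.bins for o in b]
        self.bins = [[] for _ in self.bins]
        for o in elements:
            self.add(o)
```

Changing a stored object's hash from 1 to 2 makes includes answer false until rehash has been sent.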
The point is that once a hash value has been assigned at object creation, it stays fixed.
basicHash is therefore just a getter for a precomputed value, and as fast as any other instance variable access.
The hash code is, for example, independent of the object's location (it does not change when the object is moved by the GC).
In other languages such identity checks are realized as pointer comparisons (the pointer, seen as an integer, serves as the object identity), which works as long as objects are not moved in memory. Many procedural languages allocate objects on a heap where they are not moved later.
In Smalltalk such a pointer comparison would fail, because the location of an object is not fixed: the GC moves objects around in the image while it works. (The exception is an object that has been sent makeFixed, but makeFixed is another topic, important for passing parameters to and from the outside world in inbound and outbound calls.)
So this hash performs well: it is calculated only once, in Object class>>basicNew.
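The difference between a remembered address and a stored hash can be sketched with a toy compacting collector in Python. HeapObject and compact are hypothetical and vastly simplified; a real GC does far more, but the point is only that the address changes while the hash slot does not.

```python
class HeapObject:
    """Hypothetical object: an address the GC may change, plus a fixed hash."""

    def __init__(self, address, basic_hash):
        self.address = address        # changes whenever the GC moves the object
        self.basic_hash = basic_hash  # set once at allocation, never changes


def compact(heap):
    """Toy compacting GC: slide live objects down to consecutive addresses."""
    for new_address, obj in enumerate(heap):
        obj.address = new_address
```

An identity test that compared a remembered address would break after compact(heap); a comparison of basic_hash values still gives the same answer.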
Hans Martin's example is not some strange wizardry of the compiler; it is a general design for handling literals in any compiled method.
Since every constant in a method's code has to be available at run time, constants are held as literals (see CompiledMethod allLiterals).
To keep method code free of redundancy, any literal is kept only once per method (no duplicates).
So even if the same string, class, symbol, pool association, ... appears many times in the source code of a method, only one instance of it is kept among the literals.
Because the method keeps only one instance, there is also only one hash: every occurrence refers to the same object, and asking that object always answers the same value, its hash.
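The literal deduplication can be sketched in Python as a toy literal frame. compile_literals is a hypothetical helper, not the Smalltalk compiler; it keys literals by (type, value) so that, for example, 1 and True (equal in Python) are not merged.

```python
def compile_literals(source_constants):
    """Toy 'compiler': keep each distinct literal once in a literal frame and
    return, for every occurrence, the index of its shared frame entry."""
    frame = []      # the method's literals, without duplicates
    index_of = {}   # (type, value) -> index in frame
    refs = []       # one frame index per occurrence in the source
    for constant in source_constants:
        key = (type(constant), constant)
        if key not in index_of:
            index_of[key] = len(frame)
            frame.append(constant)
        refs.append(index_of[key])
    return frame, refs
```

For the occurrences "abab", "x", "abab" the frame holds two entries and both "abab" occurrences point to the same object, so they necessarily answer the same hash.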
John, a word on your example: copying an object produces another object, hence it has a new hash code, initialized when it is allocated, independent of its contents. Hence the extension provided by basicHash:range: is different, because the basicHash in it is already different (the copy is a different object from the original). It does not matter how long the strings are or what they contain: s1 and s2 are different objects!
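A Python model of the copy case, assuming the hash is assigned by a counter at allocation (an illustrative assumption, as above). ModelString is hypothetical; its __copy__ hook allocates a new object, which therefore receives a fresh hash even though the contents are equal.

```python
import copy
import itertools


class ModelString:
    """Hypothetical string object whose hash is assigned at allocation."""
    _hashes = itertools.count(1)

    def __init__(self, chars):
        self.chars = list(chars)
        self.basic_hash = next(ModelString._hashes) & 0x7FFF

    def __copy__(self):
        # A copy is a newly allocated object, so it gets a fresh hash,
        # independent of the (equal) contents.
        return ModelString(self.chars)
```

With s2 = copy.copy(s1), the contents compare equal but s1 and s2 are different objects with different hashes.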
The basicHash:range: construct is a specific measure to avoid (basic)hash collisions for certain hash distributions.
Now to the so-called immediate objects.
Immediate objects are always singletons: they cannot be allocated, modified or copied.
They simply exist; examples are $A, nil, true, false, 5 or 5.2e4. For some of these objects the class method allInstances fails (it would be useless anyway).
This opens up a wide field of optimizations, including their hash (see my comment about special hash values for them).
Here a hash value can simply be defined; it does not even need an initial computation.
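A hash that is simply defined by the value can be sketched in Python. immediate_hash is hypothetical, and the rules it uses (a small integer is its own hash, a character hashes to its code point) are assumptions for illustration, not what any particular Smalltalk answers.

```python
def immediate_hash(value):
    """Hypothetical hash for two kinds of immediate-like values: the hash is
    defined by the value itself, so nothing is stored or precomputed."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value        # assumption: a small integer is its own hash
    if isinstance(value, str) and len(value) == 1:
        return ord(value)   # assumption: a character hashes to its code point
    raise TypeError("value is not modelled as an immediate")
```

Since equal immediates are the same singleton, a value-derived hash is automatically consistent with identity.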
However, a string, even a string constant like 'abab', is not immediate: it is an ordinary object like any other (essentially an array of Characters).
This brings me to the read-only property of any object: it is the only difference between String new: x and a literal '.....' of length x.
You can send markReadOnly: false to 'abbaba' and then modify its contents (it is an array of characters whose write lock has been removed).
You can also inspect 'abbaba' basicHash, change the read-only state, modify the string and finally look at its basicHash again: it has to stay the same.
The read-only mark is independent of the object's identity, and the hash of 'ahjahjhs' follows the general rule for any object.
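The experiment described above can be mirrored in a small Python model. ModelString, mark_read_only and at_put are hypothetical stand-ins for the Smalltalk messages; the value 4711 is an arbitrary example hash.

```python
class ModelString:
    """Hypothetical mutable string with a read-only flag and a fixed hash."""

    def __init__(self, chars, basic_hash, read_only=True):
        self.chars = list(chars)
        self.basic_hash = basic_hash  # fixed at allocation
        self.read_only = read_only    # literals start out read-only

    def mark_read_only(self, flag):
        self.read_only = flag         # touches only the flag, never the hash

    def at_put(self, index, char):
        if self.read_only:
            raise PermissionError("object is read-only")
        self.chars[index] = char
```

Removing the write lock and modifying the contents leaves basic_hash untouched.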
Thus every use of IdentitySet and IdentityDictionary suffers from the same restriction: it relies on basicHash, an unsigned 15-bit integer.
No matter which objects are kept, as the number of objects approaches 2**15, their hashes tend to collide (independently of any content).
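Two quick calculations make the 15-bit limit tangible. The pigeonhole bound holds for any hash assignment; the expected-pairs figure assumes uniformly random, independent hashes, which is an idealization and not how basicHash is necessarily assigned.

```python
import math

HASH_VALUES = 1 << 15  # 32768 distinct basicHash values


def guaranteed_max_bin_load(n):
    """Pigeonhole: with n objects, some hash value is shared by at least
    this many of them, whatever the hash assignment."""
    return math.ceil(n / HASH_VALUES)


def expected_colliding_pairs(n):
    """Expected number of object pairs sharing a hash, *assuming* hashes
    are uniformly random and independent."""
    return n * (n - 1) / 2 / HASH_VALUES
```

Beyond 32768 objects collisions are unavoidable, and under the uniform assumption 32768 objects already give about 16383.5 colliding pairs on average.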
As I said at the start, the hash is a quick non-equal-identity check; the consequence is that you have to dig deeper whenever hashes are equal.
So, as Henry stated before, if hashes collide, performance will drop dramatically because of the unavoidable collision handling.
So for hash-based solutions (like Sets and Dictionaries) that perform poorly, it is important to study the distribution of the hash values.
You have to design the hash to avoid hash collisions.
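Hash design can be compared by counting how many distinct values a hash function produces for a given set of keys. Both functions below are illustrative choices, not the ones any Smalltalk dialect uses; length_hash is deliberately poor, and poly_hash is a simple polynomial hash reduced to 15 bits.

```python
def length_hash(s):
    # Poor choice: all strings of equal length collide.
    return len(s)


def poly_hash(s, bits=15):
    # Polynomial hash over the characters, kept within `bits` bits.
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & ((1 << bits) - 1)
    return h


def distinct_hashes(keys, hash_fn):
    """Number of distinct hash values: fewer values means more collisions."""
    return len({hash_fn(k) for k in keys})
```

For the 100 keys "key000" to "key099", length_hash puts everything into a single value, whereas poly_hash keeps all 100 apart.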
And you have to learn from the Object hash example: it is computed once and reused later, providing an identity mark as a hidden instance variable.
In certain environments a UUID serves exactly this purpose: it is computed once for an instance and then kept stable (although UUIDs are meant to be universally unique, not just unique within the current image).
In most cases it would be counterproductive to compute the hash more than once; compute it when the object is allocated.
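The compute-once pattern with a UUID as identity mark can be sketched in Python using the standard uuid module. Entity, identity_mark and hash15 are hypothetical names; deriving a 15-bit hash from the UUID is an illustrative choice.

```python
import uuid


class Entity:
    """Hypothetical object carrying an identity mark computed at creation."""

    def __init__(self):
        # Computed exactly once, at allocation; afterwards only read.
        self._mark = uuid.uuid4()

    @property
    def identity_mark(self):
        return self._mark

    def hash15(self):
        # Cheap 15-bit hash derived from the stored mark (no recomputation).
        return self._mark.int & 0x7FFF
```

Every later access reads the stored mark, so the cost of generating it is paid once per object.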
To conclude: to decide whether to use IdentitySets/IdentityDictionaries or ordinary Sets/LookupTables/Dictionaries, you have to define how an object is to be found: by its object identity or by comparing its contents (equality). Content comparison is in general expensive. Alternatively, objects have to be given some identity mark, which in Smalltalk has traditionally (since the 1980s, see the Blue Book) been approached with a hash code.
This in turn raises questions about the hash distributions involved.
Also think about framework design with respect to wanted or unwanted copies. Follow the design decisions made for immediate objects: avoid copies and thus avoid complex hash calculations. See ##Atoms for how copies are avoided by using a central registry.
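The central-registry idea can be sketched in Python. This Atom class is hypothetical and only follows the general idea described above; it is not the ##Atoms implementation.

```python
class Atom:
    """Hypothetical atom: at most one instance per name, via a registry."""
    _registry = {}

    def __init__(self, name):
        self.name = name

    @classmethod
    def named(cls, name):
        # Central registry: answer the existing instance instead of a copy.
        atom = cls._registry.get(name)
        if atom is None:
            atom = cls(name)
            cls._registry[name] = atom
        return atom
```

As long as clients obtain atoms only through Atom.named, equal names yield the identical object, so identity-based collections find it without any content comparison.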
And finally, if copies of objects cannot be avoided, do not use IdentitySets/Dictionaries.
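What goes wrong with copies can be shown in Python by contrasting an identity-keyed set with an ordinary equality-based set. Point and IdentitySet are hypothetical; the identity set uses Python's id() and keeps its members alive so their ids stay unique.

```python
import copy


class Point:
    """Hypothetical value-like class with content-based equality."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equality-based hash, derived from the contents, consistent with __eq__.
        return hash((self.x, self.y))


class IdentitySet:
    """Set keyed by object identity (id()), ignoring __eq__ and __hash__."""

    def __init__(self):
        self._members = {}

    def add(self, obj):
        self._members[id(obj)] = obj  # holding obj keeps its id unique

    def includes(self, obj):
        return id(obj) in self._members
```

A copy of a stored Point is not found by the identity set, but it is found by an ordinary set that compares contents.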
From this standpoint, IdentitySets/Dictionaries can ONLY be useful when object copies are impossible, either physically (as with immediate objects, which are automatically guaranteed singletons) or by design (like Atoms or certain persistent object frameworks).
This closes the loop back to the start of this conversation, which began with an IdentitySet and two different objects.