This technique of adapting types to the framework appears to me to be similar to what Boost.Serialization does. I wonder if it could be generalized to things beyond hashing, like serialization.
Regards,
&rzej
I'm not sure that hashing and serialization can be combined so simply in a common interface. For serialization, when extracting a sequence container, you usually need to first extract the size of the container. For hashing, it is probably not desirable to hash_append the size of the container.
On Thursday, April 3, 2014 3:42:37 PM UTC-4, rhalb...@gmail.com wrote:
On Thursday, April 3, 2014 4:06:02 PM UTC+2, Andrzej Krzemieński wrote:
This technique of adapting types to the framework appears to me to be similar to what Boost.Serialization does. I wonder if it could be generalized to things beyond hashing, like serialization.
Regards,
&rzej

Both hashing and serialization extract the bit sequence. A similar thing would be to allow inserting the bit sequence. This allows randomly generating any class X (e.g. to stress-test the very hash function you are interested in!). So perhaps one could define two lower-level functions bit_insert() and bit_extract() [or maybe names like bit_push_back()/bit_pop_front() to reflect the FIFO character of the bit sequence?] on top of which higher-level functionality like hash_append(), serialize() and random_generate() could be written.

Rein
--
Howard
On Thu, Apr 3, 2014 at 1:03 PM, Alex B <deva...@gmail.com> wrote:
I'm not sure that hashing and serialization can be combined so simply in a common interface. For serialization, when extracting a sequence container, you usually need to first extract the size of the container. For hashing, it is probably not desirable to hash_append the size of the container.

As has been demonstrated already, leaving out the size when hashing a container results in a bad hash in this model; [ [1], [2, 3] ] and [ [1, 2], [3] ] would hash the same.
It might be helpful if the paper explicitly discussed how it relates to N3333 and N3898, e.g. where any points of agreement or disagreement are, etc.
The general interface has no reliance on the result type. The types
being hashed operate on a template "hasher" type with no assumptions
about what the hasher ultimately produces. The example also shows that
a static_cast<size_t>() is used at least in one case, indicating that
a hasher could provide operator size_t(), operator std::array<char,
16>(), or whatever else it likes.
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
We're aiming at the pre-Rapperswil mailing. Any feedback, positive or negative, will be much appreciated.
#include <cstddef>
#include <functional>

class type_erased_hasher
{
public:
    template <class Hasher>
    explicit type_erased_hasher(Hasher& hasher)
        : h_(std::ref(hasher))
    {}

    void operator()(const void* key, std::size_t len)
    {
        h_(key, len);
    }

private:
    std::function<void(const void*, std::size_t)> h_;
};
On 12 Apr 2014 11:25, "Howard Hinnant" <howard....@gmail.com> wrote:
>
> What is your easy proof that if you prefix with the size of containers rather than postfix, that there can be no collisions? I don't doubt that you are correct that there will be no collisions. But I fail to see how prefix vs postfix makes a difference.
>
The advantage of prefix is that first we read the length; then we know how many values we have to read to get to the end of the object, so we can reconstruct the original object easily from the values we feed to hash_append. Similarly with recursive objects, we still always read the length first, so we know how far to read. With postfix we are never sure if we are reading another value in the object, or the length at the end of it. Of course it might turn out that even with postfix you can prove there is only one valid reading, but it is not so obvious to me.
Note also that no other hash-related paper I'm aware of that has been submitted to the standards committee even deals with this issue. Indeed, they seem to leave it up to the client to get right. This proposal strongly advocates that this detail, however it may be resolved, should be resolved by std::vector itself, not by clients who want to hash a vector. I've come to believe that asking your clients to do your hashing for you is akin to asking your clients to compute operator== for you, or swap for you.
On 13 Apr 2014 16:02, "Christopher Jefferson" <ch...@bubblescope.net> wrote:
>
>
> On 12 Apr 2014 11:25, "Howard Hinnant" <howard....@gmail.com> wrote:
> >
> > What is your easy proof that if you prefix with the size of containers rather than postfix, that there can be no collisions? I don't doubt that you are correct that there will be no collisions. But I fail to see how prefix vs postfix makes a difference.
> >
>
> The advantage of prefix is that first we read the length; then we know how many values we have to read to get to the end of the object, so we can reconstruct the original object easily from the values we feed to hash_append. Similarly with recursive objects, we still always read the length first, so we know how far to read. With postfix we are never sure if we are reading another value in the object, or the length at the end of it.
Postfix is exactly the same. Just parse right to left.
You bring up a good point, and I agree that all 3 should hash identically. However I do not think there is a problem in hash_appending the null char:
As it turns out, one cannot hash an unordered container. There is no way to satisfy rule 1:
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html#hash_append_rules
because two unordered containers can compare equal, and yet have their elements in different orders. hash_append cannot specify the proper order in which to hash_append the elements. Therefore hash_append for unordered containers simply will not exist, so if someone tries to hash an unordered container, they will get a compile-time error.
I’m now wondering if T[N] should be appended with size_t(N), and/or if we should specialize char[N] et al. I will give this more thought.
>
> Trying to ensure hash consistency between objects of different types that compare equal is almost certainly impractical. Consider integers of different sizes, integer vs double, static vs variable-length containers.
I’m hoping to lay the groundwork for:
unordered_set<string> s;
// …
s.find("some text"); // avoids constructing string("some text")
Updated:
http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html
* unordered containers are mentioned, but not handled. Examples of ways to handle them are included.
Howard