std::hash of basic_string

311 views
Skip to first unread message

sean.mid...@gmail.com

unread,
May 20, 2015, 2:09:45 PM5/20/15
to std-dis...@isocpp.org
I'm seeing that std::hash of a string with a custom allocator doesn't work (given the behavior of libstdc++ 4.8 and the reference at http://en.cppreference.com/w/cpp/string/basic_string/hash).

Which, as with the parsing stuff I raised last week, seems to be a defect. The allocator has zero reason I can think of to impact the hashing of a sequence of characters. I could see a custom char_traits causing problems, but a std::basic_string<char, char_traits<char>, my_allocator<char>> should work with std::hash in my opinion and hash identically to a std::string with the same contents. Mostly because there's no portable way to emulate std::hash<std::string> for another char-using string type without allocating a temporary.

Am I missing a good reason for this not to work, or is it just an oversight?

Nevin Liber

unread,
May 20, 2015, 2:24:20 PM5/20/15
to std-dis...@isocpp.org
On 20 May 2015 at 13:09, <sean.mid...@gmail.com> wrote:
I'm seeing that std::hash of a string with a custom allocator doesn't work (given the behavior of libstdc++ 4.8 and the reference at http://en.cppreference.com/w/cpp/string/basic_string/hash).

Which, as with the parsing stuff I raised last week, seems to be a defect.

I concur, but I don't know the history to be sure. 
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com(847) 691-1404

Howard Hinnant

unread,
May 20, 2015, 2:52:49 PM5/20/15
to std-dis...@isocpp.org
On May 20, 2015, at 2:23 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
>
> On 20 May 2015 at 13:09, <sean.mid...@gmail.com> wrote:
>> I'm seeing that std::hash of a string with a custom allocator doesn't work (given the behavior of libstdc++ 4.8 and the reference at http://en.cppreference.com/w/cpp/string/basic_string/hash).
>>
>> Which, as with the parsing stuff I raised last week, seems to be a defect.
>>
> I concur, but I don't know the history to be sure.

I concur as well. In fact, I’ll go so far as to say that std::hash<T> itself is broken because it locks people into an unknown and inflexible hashing algorithm. When std::hash<std::string> is attacked, it can be difficult to do anything about it.

The proposed fix for both of these problems is here:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html

(search for “basic_string” for the proposed wording to fix the allocator problem)

This proposal will be updated with (at least) the following changes for Kona:

1. The hash algorithm finalization step will be renamed from "operator result_type()” to something along the lines of “result_type finalize()”.

2. The signature of the hash algorithm “update” function will be changed from:

void operator()(void const* key, size_t len) noexcept;

to:

void operator()(unsigned char const* key, size_t len) noexcept;

3. The link to the implementation, which is in the current paper, but hard to find, will be made more prominent. For convenience, here is the link:

https://github.com/HowardHinnant/hash_append

Howard

Sean Middleditch

unread,
May 20, 2015, 6:21:57 PM5/20/15
to std-dis...@isocpp.org
Thanks Howard.

I've used N3980-like constructs in other projects and indeed found it
to be more pleasant to work with overall, and with fewer bugs. Good to
know that it'll solve the basic_string problem as well, if adopted.


-- on N3980 --

An issue I have with N3980 as written is that you always hash_append
the string's size (same with other containers), which means that a
hasher over a buffer of UTF-8 bytes produces different results than a
hash of a std::basic_string of UTF-8 characters; it would be ideal if
the user had a way to select whether they wanted that extra metadata
to be included or not. I often need a hash of the contents using a
known algorithm, not the metadata, especially when inter-operating
with other libraries or tools that only hash the contents.

Another case might be hashing a vector of string fragments against a
fully-assembled string, which I have encountered here and there
(especially when comparing paths, where you may not want to allocate a
string just to assemble a path when you already know the component
pieces).

The variants of N3980 I've written for myself just leaves out the
metadata, as the concerns over uniqueness are moot for my common uses
in game engines. I'm not sure what the most ergonomic extensibility
mechanism might be, but something that does the equivalent of:

template <class HashAlgorithm, class T, class Alloc>
void
hash_append(HashAlgorithm& h, std::vector<T, Alloc> const& v) noexcept
{
for (auto const& t : v)
hash_append(h, t);

if (h.uses_metadata())
hash_append(h, v.size());
}

// override siphash to avoid use of metadata like container sizes
struct MyHash : std::siphash
{
constexpr bool uses_metadata() const { return false; }
};

using hasher = uhash<MyHash>;
assert(hasher{}(vector<string>({"ab", "cd"})) == hasher{}(string{"abcd"}));

That also lets a user write a version of the hash algorithm that makes
its metadata choice at runtime, if they need it.

Apologies if this has already been brought up; my Google-fu on the ISO
groups is failing me and I'm not finding the previous discussions on
the paper, though I know I've seen them before.
> --
>
> ---
> You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Discussion" group.
> To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-discussion/gtfSBtckVEQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to std-discussio...@isocpp.org.
> To post to this group, send email to std-dis...@isocpp.org.
> Visit this group at http://groups.google.com/a/isocpp.org/group/std-discussion/.



--
Sean Middleditch
http://seanmiddleditch.com

Howard Hinnant

unread,
May 20, 2015, 8:21:30 PM5/20/15
to std-dis...@isocpp.org
Thanks for the thoughtful comments Sean. I will certainly keep these in mind as the proposal (hopefully) winds its way through committee.

Howard

1971 powerChina

unread,
May 25, 2015, 2:14:51 AM5/25/15
to std-dis...@isocpp.org, sean.mid...@gmail.com
Reply all
Reply to author
Forward
0 new messages