Hi Chandler,
I just skimmed over the proposal and it seems very interesting to me.
Unfortunately I did not have time to look further into it.
One point that I found surprising is that your hashing is in most cases
notably faster than the old LLVM hashing, except for key sizes which are
a modulo of 4. Here the existing hash function is a lot faster than your
new solution. If keys that are multiple of 4 bytes are common in LLVM,
it might be worth to check if this change does not cause performance
regressions.
> old_llvm_performance.txt
>
>
> -------------------------------------------------------------------------------
> --- Testing llvm (LLVM Hashing)
>
> [[[ Speed Tests ]]]
>
> Small key speed test - 4-byte keys - 16.77 cycles/hash 7.39 nanos/hash
> Small key speed test - 8-byte keys - 25.93 cycles/hash 11.43 nanos/hash
> Small key speed test - 12-byte keys - 30.39 cycles/hash 13.39 nanos/hash
> Small key speed test - 16-byte keys - 30.18 cycles/hash 13.30 nanos/hash
> Small key speed test - 20-byte keys - 39.33 cycles/hash 17.34 nanos/hash
> Small key speed test - 24-byte keys - 44.70 cycles/hash 19.70 nanos/hash
> Small key speed test - 28-byte keys - 49.17 cycles/hash 21.68 nanos/hash
> new_llvm_performance.txt
>
>
> -------------------------------------------------------------------------------
> --- Testing std64 (LLVM Standard-baed Hashing)
>
> [[[ Speed Tests ]]]
>
> Small key speed test - 4-byte keys - 40.22 cycles/hash 17.73 nanos/hash
> Small key speed test - 8-byte keys - 40.24 cycles/hash 17.73 nanos/hash
> Small key speed test - 12-byte keys - 44.69 cycles/hash 19.70 nanos/hash
> Small key speed test - 16-byte keys - 44.69 cycles/hash 19.70 nanos/hash
> Small key speed test - 24-byte keys - 50.95 cycles/hash 22.46 nanos/hash
> Small key speed test - 28-byte keys - 50.95 cycles/hash 22.46 nanos/hash
Cheers
Tobi
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> Howard, high-level feedback from you would be particularly appreciated as I would love to contribute this to libc++ when the time is right.
Does the enclosed implementation implement this part of N3333:
http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2012/n3333.html#per.process.seed
?
That to me seems like potentially the most controversial part. I scanned Hashing.h but did not immediately see that support (I could've easily missed it).
Howard
On Feb 28, 2012, at 6:34 AM, Chandler Carruth wrote:Does the enclosed implementation implement this part of N3333:
> Howard, high-level feedback from you would be particularly appreciated as I would love to contribute this to libc++ when the time is right.
http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2012/n3333.html#per.process.seed
?
That to me seems like potentially the most controversial part. I scanned Hashing.h but did not immediately see that support (I could've easily missed it).
OK, but this is a VERY big exception! Almost any non-string data
anyone wants to hash will be a multiple of 4 bytes in length.
> //===-- llvm/ADT/Hashing.h - Utilities for hashing --------------*- C++ -*-===//
Doesn't this belong in ../Support/ ? No-one's ever explained the
difference to me; I just assumed that generic data types go in ADT/
and everything else goes in Support/.
> //
> // The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open Source
> // License. See LICENSE.TXT for details.
> //
> //===----------------------------------------------------------------------===//
> //
> // This file implements the newly proposed standard C++ interfaces for hashing
> // arbitrary data and building hash functions for user-defined types. This
> // interface was originally proposed in N3333[1] and is currently under review
> // for inclusion in a future TR and/or standard.
> //
> // The primary interfaces provide are comprised of one type and three functions:
"provided"
> //
> // -- 'hash_code' class is an opaque type representing the hash code for some
> // data. It is the intended product of hashing, and can be used to implement
> // hash tables, checksumming, and other common uses of hashes. It is not an
> // integer type (although it can be converted to one) because it is risky
> // to assume much about the internals of a hash_code. In particular, each
> // execution of the program has a high probability of producing a different
> // hash_code for a given input. Thus their values are not stable to save or
> // persist, and should only be used during the execution for the
> // construction of hashing datastructures.
> //
> // -- 'hash_value' is a function designed to be overloaded for each
> // user-defined type which wishes to be used within a hashing context. It
> // should be overloaded within the user-defined type's namespace and found via
> // ADL. Overloads for primitive types are provided by this library.
> //
> // -- 'hash_combine' and 'hash_combine_range' are functions designed to aid
> // programmers in easily and intuitively combining a set of data into a single
> // hash_code for their object. They should only logically be used within the
> // implementation of a 'hash_value' routine or similar context.
> //
> // Note that 'hash_combine_range' contains very special logic for hashing
> // a contiguous array of integers or pointers. This logic is *extremely* fast,
> // on a modern Intel "Gainestown" Xeon (Nehalem uarch) @2.2 GHz, these were
> // benchmarked at over 8.5 GiB/s for large keys, and <20 cycles/
20 cycles per what? Don't keep us in suspense!
> //
> //===----------------------------------------------------------------------===//
>
> #ifndef LLVM_ADT_HASHING_H
> #define LLVM_ADT_HASHING_H
>
> #include "llvm/ADT/STLExtras.h"
> #include "llvm/Support/DataTypes.h"
> #include "llvm/Support/type_traits.h"
> #include <cassert>
> #include <cstring>
> #include <algorithm>
> #include <utility>
> #include <iterator>
>
> namespace llvm {
>
> /// \brief An opaque object representing a hash code.
> ///
> /// This object represents the result of hashing some entity. It is intended to
> /// be used to implement hashtables or other hashing-based data structures.
> /// While it wraps and exposes a numeric value, this value should not be
> /// trusted to be stable or predictable across processes or executions.
> ///
> /// In order to obtain the hash_code for an object 'x':
> /// \code
> /// using llvm::hash_value;
> /// llvm::hash_code code = hash_value(x);
> /// \endcode
> ///
> /// Also note that there are two numerical values which are reserved, and the
> /// implementation ensures will never be produced for real hash_codes. These
> /// can be used as sentinels within hashing data structures.
> class hash_code {
> size_t value;
>
> public:
> /// \brief Default construct a hash_code. Constructs a null code.
> hash_code() : value() {}
>
> /// \brief Form a hash code directly from a numerical value.
> hash_code(size_t value) : value(value) {}
>
> /// \brief Convert the hash code to its numerical value for use.
> /*explicit*/ operator size_t() const { return value; }
>
> /// \brief Get a hash_code object which corresponds to a null code.
> ///
> /// The null code must never be the result of any 'hash_value' calls and can
> /// be used to detect an unset hash_code.
> static hash_code get_null_code() { return 0; }
>
> /// \brief Get a hash_code object which corresponds to an invalid code.
> ///
> /// The invalid code must never be the result of any 'hash_value' calls. This
> /// can be used to flag invalid hash_codes or mark entries in a hash table.
> static hash_code get_invalid_code() { return -1; }
I'm sure there's a compiler out there just waiting to warn you that
you're implicitly converting -1 to an unsigned type. :-)
> friend bool operator==(const hash_code &lhs, const hash_code &rhs) {
> return lhs.value == rhs.value;
> }
> friend bool operator!=(const hash_code &lhs, const hash_code &rhs) {
> return lhs.value != rhs.value;
> }
>
> /// \brief Allow a hash_code to be directly run through hash_value.
> friend size_t hash_value(const hash_code &code) { return code.value; }
> };
>
>
> // All of the implementation details of actually computing the various hash
> // code values are held within this namespace. These routines are included in
> // the header file mainly to allow inlining and constant propagation.
> namespace hashing { namespace detail {
My only comment on the implementation is that, on CPUs that have
different instruction sequences for aligned and unaligned loads, you
want to be careful that the 8-byte memcpys turn into aligned loads
when hashing naturally aligned data. (But if we only care about
running on x86-64, this won't be a problem.)
Jay.
Please, no. We have a cortex-a9 native buildbot already in lab.llvm.org and
as manufacturers emit faster ARM chips we (ARM) will want to have LLVM run
native on them.
You've also got the OpenCL use case etc.
Please bear ARM in mind.
Cheers,
James
Right. (But ARMv7 uses the same ldr instruction for aligned and
unaligned loads (doesn't it?) so it's not an issue there.)
-----Original Message-----
From: Jay Foad [mailto:jay....@gmail.com]
Sent: 29 February 2012 12:25
To: James Molloy
Cc: Chandler Carruth; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
> On 29 February 2012 11:47, James Molloy <james....@arm.com> wrote:
>>> (But if we only care about
>>> running on x86-64, this won't be a problem.)
>>
>> Please, no. We have a cortex-a9 native buildbot already in lab.llvm.org and
>> as manufacturers emit faster ARM chips we (ARM) will want to have LLVM run
>> native on them.
>>
>> You've also got the OpenCL use case etc.
>>
>> Please bear ARM in mind.
>
> Right. (But ARMv7 uses the same ldr instruction for aligned and
> unaligned loads (doesn't it?) so it's not an issue there.)
>
Sorta. It's the same instruction, but on some implementations, unaligned loads will cause an exception, so the compiler may need to generate LDRB sequences instead. There's also the VLD* instructions to consider. Depending on what's being loaded, there's all sorts of potential tricks that can be played. We should try to be careful and give the compiler as many hints about alignment as we can.
-Jim
Have you ever tried:
namespace foo {
class std {};
}
using namespace foo;
#include <vector>
Well, I'm not sure that Chandler is guarding against this possibility,
but most library implementations of the standard use ::std::
everywhere to avoid this potential for ambiguity.