Doubly created strings on Hashtable insert from const char*

365 views
Skip to first unread message

Dave Tapuska

unread,
Sep 10, 2024, 5:20:05 PM9/10/24
to platform-architecture-dev
When we are creating strings from const char* it appears we end up creating the string twice.. I've shared the stack traces...

First to get the string hash, then to actually insert it. I'm actually wondering if we really should have these permission policy fields actually be atomic strings anyways because then we wouldn't have refs of the same string in both the isolated feature map and the non-isolated feature map.

But I thought I'd share the observation I'm seeing, as I wonder if we are tripping up elsewhere. 

dave.

#3 0x7f658f60e37d WTF::StringImpl::Create()
#4 0x7f658f628c11 WTF::String::String()
#5 0x7f659762d1e9 WTF::IdentityHashTranslator<>::GetHash<>()
#6 0x7f6597cef8ef WTF::HashTable<>::insert<>()
#7 0x7f6597cedc8c blink::GetDefaultFeatureNameMap()
#8 0x7f659726d296 blink::PermissionsPolicyParser::ParseHeader()
#9 0x7f6597265940 blink::SecurityContextInit::ApplyPermissionsPolicy()

#3 0x7f658f60e37d WTF::StringImpl::Create()
#4 0x7f658f628c11 WTF::String::String()
#5 0x7f6597cef9a1 WTF::HashTable<>::insert<>()
#6 0x7f6597cedc8c blink::GetDefaultFeatureNameMap()
#7 0x7f659726d296 blink::PermissionsPolicyParser::ParseHeader()
#8 0x7f6597265940 blink::SecurityContextInit::ApplyPermissionsPolicy()

Steinar H. Gunderson

unread,
Sep 10, 2024, 5:42:21 PM9/10/24
to Dave Tapuska, platform-architecture-dev
On Tue, Sep 10, 2024 at 05:19:53PM -0400, Dave Tapuska wrote:
> First to get the string hash, then to actually insert it. I'm actually
> wondering if we really should have these permission policy fields actually
> be atomic strings anyways because then we wouldn't have refs of the same
> string in both the isolated feature map and the non-isolated feature map.

I'm not a WTF expert, and I couldn't line up your stack traces exactly
(it's a bit tricky when they have all templates cleared, because they
hold some of the key to understanding this), but it seems to me that the
call is Set(), which calls InlineAdd(), which does:

template <typename T, typename U, typename V, typename W, typename X>
template <typename IncomingKeyType, typename IncomingMappedType>
typename HashMap<T, U, V, W, X>::AddResult HashMap<T, U, V, W, X>::InlineAdd(
IncomingKeyType&& key,
IncomingMappedType&& mapped) {
return impl_.template insert<HashMapTranslator<KeyTraits, ValueTraits>>(
std::forward<IncomingKeyType>(key),
std::forward<IncomingMappedType>(mapped));
}

Note that it uses the HashMapTranslator based on KeyTraits (String),
not IncomingKeyType (const char *).

I know we have a heterogenous variant of find() (confusingly called Find()),
where you can define your own translator and thus hash the value directly,
but we don't seem to have this for insert() or Set(). I guess you can work
around this by an explicit cast to String() before the call, but it's not
ideal.

/* Steinar */
--
Homepage: https://www.sesse.net/

Steinar H. Gunderson

unread,
Sep 10, 2024, 5:50:58 PM9/10/24
to Dave Tapuska, platform-architecture-dev
On Tue, Sep 10, 2024 at 11:42:18PM +0200, Steinar H. Gunderson wrote:
> Note that it uses the HashMapTranslator based on KeyTraits (String),
> not IncomingKeyType (const char *).

third_party/blink/renderer/platform/wtf/text/string_hash.h has
HashTraits<String>, so if you add a GetHash(const char *) overload
there, it _might_ work better. I haven't tested; I can try tomorrow
if nobody else knows this code better.

Dave Tapuska

unread,
Sep 10, 2024, 5:59:11 PM9/10/24
to Steinar H. Gunderson, platform-architecture-dev
Ya I suspected to add a hash traits based on a const char*. I'm not how often we insert via a const char* vs a String. Presumably it isn't much... I do think though the fix here is to just generate the strings as AtomicStrings since we have two tables we can actually avoid duplicate data. 

I was just wondering about other uses. (And it was end of day for me so I thought I'd share it before I left for anyone in apac to comment since there are some wtf experts there).

Dave

Steinar H. Gunderson

unread,
Sep 10, 2024, 6:04:25 PM9/10/24
to Dave Tapuska, platform-architecture-dev
On Tue, Sep 10, 2024 at 05:59:01PM -0400, Dave Tapuska wrote:
> Ya I suspected to add a hash traits based on a const char*.

I don't think you actually need new HashTraits for this, just an overload
added to HashTraits<String>::GetHash(). Like this:

template <>
struct HashTraits<String> : SimpleClassHashTraits<String> {
+ static unsigned GetHash(const char* key) {
+ return StringHasher::HashMemory(key, strlen(key));
+ }
static unsigned GetHash(const String& key) { return key.Impl()->GetHash(); }
static bool Equal(const String& a, const String& b) {
return EqualNonNull(a.Impl(), b.Impl());
}

which should probably be a good idea in general.

Steinar H. Gunderson

unread,
Sep 10, 2024, 6:07:02 PM9/10/24
to Dave Tapuska, platform-architecture-dev
On Wed, Sep 11, 2024 at 12:04:22AM +0200, Steinar H. Gunderson wrote:
> + static unsigned GetHash(const char* key) {
> + return StringHasher::HashMemory(key, strlen(key));
> + }

Eh, probably ComputeHashAndMaskTop8Bits(), not HashMemory(). Apologies
for the message flood.

Dave Tapuska

unread,
Sep 10, 2024, 6:28:56 PM9/10/24
to Steinar H. Gunderson, platform-architecture-dev

Jeremy Roman

unread,
Sep 11, 2024, 10:21:48 AM9/11/24
to Dave Tapuska, Steinar H. Gunderson, platform-architecture-dev
Making that constructor explicit is probably a no-go because it's how we construct strings from literals most of the time. I'd expect hundreds to thousands of places you'd need to turn "foo" into String("foo"), which might be bad for readability.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAHgVhZVJDUyawkZeXdVOLxvo7%2Boy6yV843WMiSZGjEjnBbMkSA%40mail.gmail.com.

danakj

unread,
Sep 11, 2024, 12:30:01 PM9/11/24
to Jeremy Roman, Dave Tapuska, Steinar H. Gunderson, platform-architecture-dev
On Wed, Sep 11, 2024 at 10:21 AM Jeremy Roman <jbr...@chromium.org> wrote:
Making that constructor explicit is probably a no-go because it's how we construct strings from literals most of the time. I'd expect hundreds to thousands of places you'd need to turn "foo" into String("foo"), which might be bad for readability.

Perhaps add a non-explicit ctor from char array? Compile-time literals are arrays, runtime cstrings are pointers. I don't think it would be ambiguous.
 

On Tue, Sep 10, 2024 at 6:28 PM Dave Tapuska <dtap...@chromium.org> wrote:

On Tue, Sep 10, 2024, 6:07 PM Steinar H. Gunderson <se...@chromium.org> wrote:
On Wed, Sep 11, 2024 at 12:04:22AM +0200, Steinar H. Gunderson wrote:
> +  static unsigned GetHash(const char* key) {
> +    return StringHasher::HashMemory(key, strlen(key));
> +  }

Eh, probably ComputeHashAndMaskTop8Bits(), not HashMemory(). Apologies
for the message flood.

/* Steinar */
--
Homepage: https://www.sesse.net/

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAHgVhZVJDUyawkZeXdVOLxvo7%2Boy6yV843WMiSZGjEjnBbMkSA%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.

Dave Tapuska

unread,
Sep 11, 2024, 1:01:49 PM9/11/24
to danakj, Jeremy Roman, Steinar H. Gunderson, platform-architecture-dev
I presume Dana you mean something like:

  template <wtf_size_t N>
  explicit String(const char (&literal)[N]) : String(literal, N - 1) {}

That does not get tripped up as a warning for any callsites (even with the revert of sesse@ patch that has a fix for the hash).

danakj

unread,
Sep 11, 2024, 1:05:02 PM9/11/24
to Dave Tapuska, Jeremy Roman, Steinar H. Gunderson, platform-architecture-dev
On Wed, Sep 11, 2024 at 1:01 PM Dave Tapuska <dtap...@chromium.org> wrote:
I presume Dana you mean something like:

  template <wtf_size_t N>
  explicit String(const char (&literal)[N]) : String(literal, N - 1) {}

That does not get tripped up as a warning for any callsites (even with the revert of sesse@ patch that has a fix for the hash).

Well, I was meaning to invert the explicit.
 
  template <wtf_size_t N>

  String(const char (&literal)[N]) : String(literal, N - 1) {}

  template <wtf_size_t N>
  explicit String(const char* s) : ...


Ian Kilpatrick

unread,
Sep 11, 2024, 1:37:11 PM9/11/24
to danakj, Dave Tapuska, Jeremy Roman, Steinar H. Gunderson, platform-architecture-dev
Somewhat on topic:

FWIW we "recently"[1] made the AtomicString(const char*) constructor explicit as we had the issue of:

```
AtomicString getSomething(bool thing) {
  if (thing) {
    return "bar";
  }
  return "foo";
}
```

Non-obviously invoking the AtomicString ctor which was relatively expensive (if invoked in a tight loop for example[2]).

It wasn't too much work (primarily done by tkent@ ) to adjust the codebase to the explicit ctor, and a few non-obvious invokes were fixed.

Dave Tapuska

unread,
Sep 11, 2024, 2:00:55 PM9/11/24
to danakj, Jeremy Roman, Steinar H. Gunderson, platform-architecture-dev
Ah ya, of course.. face palm.

With the corrected code of course there are a number of things that trip up. Like we are converting arrays with base::ranges::copy (happens in ICU a few spots, and a few other array initializers).. I also learned about exceptions that use a const char* as opposed to Stirng.

dave.

Reply all
Reply to author
Forward
0 new messages