| Commit-Queue | +1 |
Thank you for the review.
The build failures (android and so on) are caused by `//third_party/smhasher is not visible from //base`. But before fixing the failures, I would like to ask a few questions:
(1) How to solve murmurhash3 license.
(2) Whether fnv1-a & 40bit mask is still better than md5 32-bit or not?
(3) Re-using reverted md5_constexpr_internal.h is not allowed? Even if the code will be planed to be replaced with AllocToken?
constexpr uint32_t MD5Hash32Constexpr(std::string_view string) {Takashi Sakamotoit's a bit confusing that this function doesn't return an MD5 hash digest. if we're only using the first 32 bits of the digest, why are we using MD5?
have you considered MurmurHash3 (example: https://source.chromium.org/chromium/chromium/src/+/main:components/safe_browsing/content/renderer/phishing_classifier/murmurhash3_util.cc) or FNV-1a (example: https://source.chromium.org/chromium/chromium/src/+/main:third_party/perfetto/include/perfetto/tracing/internal/fnv1a.h)?
The latest patch introduces murmurhash3. But I have to ask whether it is ok to add the code into //base/hash or not. Because its license seems not to be chromium. As far as I understand, chromium code must have CHROMIUM LICENSE header. I have to pull-request the original repo and wait until DEPS roll?
Regarding FNV-1a, it is `constexpr`. It looks better than murmurhash3, because it can be computed at compile time. However I would like to use the hash key as a part of `Crash address` to investigate crash reports. This is almost the same way as ELUD does.
ELUD's crash reports are `(crash.Address & 0xFFFF000000000000) = 0xEBAF000000000000` and the least significant 8-bits are also used for `kRemainderZapValue` (`0xED`). c.f. `components/gwp_asan/common/extreme_lightweight_detector_util.h`
(Regarding EULD, we will use information provided by the 40-bits to investiagte crashes)
So only 40bits(64-16-8) are available. I have a question: `40bit hash generated from 64bit hash (hash & 0xFFFFFFFFFF or something) is still better than 32bit hash?`
c.f. This CL's `Zap` code is:
```
void PartitionRoot::Zap(internal::SlotStart slot_start,
SlotSpanMetadata* slot_span,
uint32_t type_id) {
void* object = reinterpret_cast<void*>(slot_start.value());
uint64_t zap_value = internal::kIntendedLeakQuarantineMarker |
(static_cast<uint64_t>(type_id) << 8);
```
Takashi Sakamotoplease add a unit test for this that at least validates the test vectors from RFC 1321 in appendix [A.5](https://datatracker.ietf.org/doc/html/rfc1321#appendix-A.5). the ones from [RFC 2202](https://datatracker.ietf.org/doc/html/rfc2202#section-2) would be great, too. and NIST has some test vectors [here](https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data).
Regarding the `advanced_memory_safety_checks_hash.h` is just copy of
Takashi Sakamotoplease add a unit test for this that at least validates the test vectors from RFC 1321 in appendix [A.5](https://datatracker.ietf.org/doc/html/rfc1321#appendix-A.5). the ones from [RFC 2202](https://datatracker.ietf.org/doc/html/rfc2202#section-2) would be great, too. and NIST has some test vectors [here](https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data).
This file is just copy of
`gen/arm64-generic/chroot/usr/include/libchrome/base/hash/md5_constexpr_internal.h`
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:gen/arm64-generic/chroot/usr/include/libchrome/base/hash/md5_constexpr_internal.h
If possible, I would like to use the same content (if ok, I would like to use the same filename).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// domain. The author hereby disclaims copyright to this source code.I copied the same license header as `//third_party/smhasher/src/src/MurmurHash3.{h,cpp}`.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
const FreeHint& hint,
const internal::BucketSizeDetails& size_details) {Stephen NuskoI wonder if we should join these? size_details is just a slot_size (which is based on the data in FreeHint if it has size) and a bucket_index. Maybe we merge these two objects?
Eventually we might want to pass this down to the `GetSchedulerLoopQuarantineBranch` and for the AdvancedMemorySafetyMacro so that it can use the type info as well (if it exists)
Takashi SakamotoWhat are your thoughts?
Stephen NuskoI updated the base CL and now the argument is `FreeHintType<FreeHintFlags(flags)> hint`.
The `FreeHintType` is as small as possible and if callers don't need `hint.type_id`, `FreeNoHooksImmediateInternal` will take `{}`, c.f.
```
template <FreeFlags flags>
PA_ALWAYS_INLINE void PartitionRoot::FreeNoHooksImmediate(
internal::SlotStart slot_start,
SlotSpanMetadata* slot_span) {
auto size_details = SlotSpanToBucketSizeDetails(slot_span);
FreeNoHooksImmediateInternal<flags>(slot_start, slot_span, {}, size_details);
}
```The generated code seems no additional code for the `{}`:
```
2058dc4: 0f b6 4c 0a 1e movzx ecx, byte ptr [rdx + rcx + 0x1e]
2058dc9: 83 e1 3f and ecx, 0x3f
2058dcc: c1 e1 05 shl ecx, 0x5
2058dcf: 48 29 cf sub rdi, rcx
2058dd2: 0f 18 0f prefetcht0 byte ptr [rdi] // PREFETCH(slot_span)
2058dd5: 48 89 bd 80 fe ff ff mov qword ptr [rbp - 0x180], rdi
2058ddc: 4c 8b 6f 10 mov r13, qword ptr [rdi + 0x10] // r13 == `slot_span->bucket` ?
2058de0: 41 8b 5d 18 mov ebx, dword ptr [r13 + 0x18] // rbx == `bucket->slot_size` ?// if (brp_enabled()) // rsi == this (PartitionRoot)
2058de4: 80 7e 10 01 cmp byte ptr [rsi + 0x10], 0x1
2058de8: 75 55 jne 0x2058e3f <void partition_alloc::PartitionRoot::FreeInline<(partition_alloc::internal::FreeFlags)0>(void*)+0x17f>
2058dea: 4c 89 f1 mov rcx, r14
2058ded: 48 c1 e1 34 shl rcx, 0x34
2058df1: 0f 84 5a 02 00 00 je 0x2059051 <void partition_alloc::PartitionRoot::FreeInline<(partition_alloc::internal::FreeFlags)0>(void*)+0x391>
2058df7: 4d 8d 3c 1e lea r15, [r14 + rbx]
2058dfb: 49 83 c7 fc add r15, -0x4
2058dff: 45 8b 27 mov r12d, dword ptr [r15]
2058e02: 41 81 e4 ff ff ff 7f and r12d, 0x7fffffff
``````
void PartitionRoot::FreeInlineInternal<kNoHooks>(void* object) {
...
// Almost all calls to FreeNoNooks() will end up writing to |*object|.
PA_PREFETCH_FOR_WRITE(object);
auto [slot_start, slot_span] = GetSlotStartAndSlotSpanFromAddress(object);
// We are going to read from |*slot_span| in all branches, but haven't
// done it yet.
PA_PREFETCH(slot_span);
//// Expand: FreeNoHooksImmediate<flags>(slot_start, slot_span);
auto size_details = SlotSpanToBucketSizeDetails(slot_span);
//// Expand: FreeNoHooksImmediateInternal<flags>(slot_start, slot_span, {}, size_details);
#if PA_BUILDFLAG(ENABLE_BACKUP_REF_PTR_SUPPORT)
if (brp_enabled()) [[likely]] {
auto* ref_count = InSlotMetadataPointerFromSlotStartAndSize(
// ... snip
}
```(no code between `PartitionRoot::FreeInlineInternal(void* object) PREFETCH(slot_span)` and `if (brp_enabled()) `
So currently we don't need to merge 2 classes into 1. In the future, we may revisit the issue (for refactoring or ...), I thin.
Acknowledged
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
dcheng@, I would like to ask about a hash for 40bit type-id.
The CL is...
(1) some of classes will be marked as LEAKED_SANITIZED_OBJECT() to avoid heap-use-after-free (the classes' instances will be leaked at free()/delete. Never re-used).
(2) When freeing the instances of LEAKED_SANITIZED_OBJECT-s, the memory regions will be zapped with bytes based on the classes' type.
(3) Because of `leak`, we expect that the number of LEAKED_SANITIZED_OBJECT-classes (the number of LEAKED_SANITIZED_OBJECT instances and the number of classes/structs marked as LEAKED_SANITIZED_OBJECT in chromium source code) is small.
Currently I'm thinking:
(1) use type-id generated from __PRETTY_FUNCTION__ and use 40bit of the type-id for zapping. i.e.
|<--16bit (reserved to identify LEAKED_SANITIZED_OBJECT)-->|
|<--40bit (type-id) -->|
|<--8bit (reserved) -->|
(totally 64bit)
I expect that the bytes will be shown in crash reports' crash addresses and help us to investigate what classes are accessed after free().
(2) directly using __PRETTY_FUNCTION__'s address may cause large chrome binary. So type-id should not be ((uintptr_t)__PRETTY_FUNCTION_ >> 8) & k40BitMask or something.
(3) in the future, we have a plan to replace the type-id with `AllocToken`.
Instead I would like to use `consteval uint32_t some_hash_function(const char*)` (or (const std::string_view))), i.e.
```
static constexpr uint32_t kPartitionAllocSanitizedObjectTypeId = [] { \
return some_hash_function(__PRETTY_FUNCTION__); \
}();
```
(using `uint64_t` and 40bit mask is better?)
Firstly I'm thinking of `md5_internal_constexpr.h`. It was reverted but some of chromium code (i.e. chormeos) is still using it, and its license was chromium. (I used 32-bit md5 hash...)
I also looked at //third_party code, but it seems that hash functions under //third_party are not `constexpr` (so not `consteval` either) and license are not chromium (this means I need to update the upstreams).
So my question is...
Best regards,
tasak
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Are we leaking that many different types of objects? I can't imagine that we'd use `LEAKED_SANITIZED_OBJECT()` very often.
I'm not sure we need a hash at all. Can we just roll a custom version of `absl::FastTypeId` (a templated class with a static member) and use the address of the static member as a marker here? We don't allow `absl::FastTypeId` in Chrome because `&base_internal::FastTypeTag<int>::kDummyVar == &base_internal::FastTypeTag<int>::kDummyVar` may not be true if the type IDs are generated in different components–but in this case, we just need a marker tag.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Are we leaking that many different types of objects? I can't imagine that we'd use LEAKED_SANITIZED_OBJECT() very often.
I agree with you.
We expect very small number of different types of objects.
I'm not sure we need a hash at all. Can we just roll a custom version of absl::FastTypeId
I see. The reason why I would like to use a hash is that `indexing`(like blinkgc) or `runtime values` are a little complex to see what types of objects are from the values. I think, __PRETTY_FUNCTION__ and hash are basically static. So we don't need heap dump (basically minidump doesn't contain) to identify object types.
I will take a look at `absl::FastTypeId`.
Thank you for the review.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
As an extra point: We won't be leaking to many objects for sure, but we also do hope to use this type of type zapping for non-leaked SANITIZED_OBJECT() objects as well. That should still be a relatively finite set, but not absolutely tiny, and we'll want to identify it server side as Takashi-san mentioned.
| Commit-Queue | +1 |
Looking at FastTypeId solution, this depends on `template` and `typename`. However, the type id for leaked sanitized object is used inside `static method`, i.e. operator free.
This means, we cannot use `this` (decltype(this) doesn't work). And also if using symbols, it will make it a little difficult to compare crash reports among different chrome versions. (possible, but a little difficult. Need to ask symbol server) Basically __PRETY_FUNCTION__ doesn't change between different chrome versions (except namespace changes and class/struct name changes... but in the case, I think the classes after the changes (=refactoring) are different from ones before the changes.)
So instead, I added fnv1a_consteval.h under partition_alloc and generate type ids by using it. (//nogncheck is not public for every one.)
Because of small number of leaked sanitized objects, I think any hash will do (but we should be careful about license and algorithm patents...) fnv1a algorithm looks public domain and very simple (simplicity makes everyone write almost the same code. So rarely license issue happens?)
I added fnv1a_consteval.h under partition_alloc instead. Does this look ok?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |