| Commit-Queue | +1 |
Friendly ping.
If there is any concern about the hash, I will add a new patch under:
https://source.chromium.org/chromium/chromium/src/+/main:third_party/rapidhash/patches/
to implement consteval rapidhash.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
Friendly ping.
If there is any concern about the hash, I will add a new patch under:
https://source.chromium.org/chromium/chromium/src/+/main:third_party/rapidhash/patches/
to implement consteval rapidhash.
FYI: constexpr rapidhash
https://chromium-review.git.corp.google.com/c/chromium/src/+/7914719
I don't claim to fully understand the limitations and restrictions here since I don't quite know how the pieces are going to be hooked up end to end; ultimately, I would still prefer an approach based on FastTypeId but I understand if that's not easy or possible here.
If we keep this approach, I think it would be good if:
- computead at compile time (not to affect non-leaked security objects)Nit: computed
- static between different chrome version (make it easy to see how the number of crash reports with each type id changes)I think we should give ourselves some room for changing how this works over time. Probably the easiest way to do that is just use a few bits as a generation counter?
I don't claim to fully understand the limitations and restrictions here since I don't quite know how the pieces are going to be hooked up end to end; ultimately, I would still prefer an approach based on FastTypeId but I understand if that's not easy or possible here.
If we keep this approach, I think it would be good if:
- we have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
- and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
we have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
In the future, we will replace the type-id with AllocToken provided by clang.
and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
Regarding the collisions, we will check whether lots of collisions occur or not. Currently there are very small number of (honestly speaking, no leaked security objects) leaked security objects, I expect we will rarelly see type-id collisions.
it sounds like there is a plan to use this on non-leaked objects for the zap pattern too
At that time, we will have already replaced type-id with AllocToken, I think.
- computead at compile time (not to affect non-leaked security objects)Takashi SakamotoNit: computed
Done
- static between different chrome version (make it easy to see how the number of crash reports with each type id changes)I think we should give ourselves some room for changing how this works over time. Probably the easiest way to do that is just use a few bits as a generation counter?
I would like to confirm... So you mean...
If we find type-ids with different generation counters, we will always find the ways to generate type-ids with the generation counters, and see whether the type are the same or not?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Takashi SakamotoI don't claim to fully understand the limitations and restrictions here since I don't quite know how the pieces are going to be hooked up end to end; ultimately, I would still prefer an approach based on FastTypeId but I understand if that's not easy or possible here.
If we keep this approach, I think it would be good if:
- we have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
- and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
we have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
In the future, we will replace the type-id with AllocToken provided by clang.
and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
Regarding the collisions, we will check whether lots of collisions occur or not. Currently there are very small number of (honestly speaking, no leaked security objects) leaked security objects, I expect we will rarelly see type-id collisions.
it sounds like there is a plan to use this on non-leaked objects for the zap pattern tooAt that time, we will have already replaced type-id with AllocToken, I think.
Just to add I think we might use this zap pattern on other macros sooner then AllocToken, I don't see a reason to wait unless I'm missing something @ta...@google.com?
However we're talking currently at talking 59 classes ([code search](https://source.chromium.org/search?q=%5B%5E%5C%2F%5D%2BADVANCED_MEMORY_SAFETY_CHECKS%5C(%20f:.*.h%20-f:base%2Fmemory%2Fadvanced_memory_safety_checks.h%20-f:test&ss=chromium))) and the worst case for a collision is we group two different crashes together for our teams monitoring only purposes. No runtime decision will be made on it, so we can just notice a collision and separate them based on call stack, and then fix the collision if needed.
- static between different chrome version (make it easy to see how the number of crash reports with each type id changes)Takashi SakamotoI think we should give ourselves some room for changing how this works over time. Probably the easiest way to do that is just use a few bits as a generation counter?
I would like to confirm... So you mean...
If we find type-ids with different generation counters, we will always find the ways to generate type-ids with the generation counters, and see whether the type are the same or not?
The suggestion as I read it is to reserve some amount of bits (say 2 so we can have generation 0 be the hash, generation 1 be FastTypeId, generation 2 be allocToken, etc, or whatever versions we change it to). Then when we upload to crash we can detect which version it is and compute and properly map the type ID.
I think given this is debugging data however, if we decide to do a hard break to change the format it will be okay without a generation counter. Since this is only for debugging purposes. The goal of having it stay the same is for our tracking across milestones so we can ask the question:
"Does this macro generate any crashes anymore?" This would enable us to ask ourselves if we remove it now that the UaF (or whatever) is fixed.
If we change the format it would breaks across chrome versions and we'd have to update our monitoring, but since it should break rarely and we are the consumers of it, I'm not to worried until we get to the world of AllocToken which will have full compiler symbolization support.
Takashi SakamotoI don't claim to fully understand the limitations and restrictions here since I don't quite know how the pieces are going to be hooked up end to end; ultimately, I would still prefer an approach based on FastTypeId but I understand if that's not easy or possible here.
If we keep this approach, I think it would be good if:
- we have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
- and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
Stephen Nuskowe have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
In the future, we will replace the type-id with AllocToken provided by clang.
and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
Regarding the collisions, we will check whether lots of collisions occur or not. Currently there are very small number of (honestly speaking, no leaked security objects) leaked security objects, I expect we will rarelly see type-id collisions.
it sounds like there is a plan to use this on non-leaked objects for the zap pattern tooAt that time, we will have already replaced type-id with AllocToken, I think.
Just to add I think we might use this zap pattern on other macros sooner then AllocToken, I don't see a reason to wait unless I'm missing something @ta...@google.com?
However we're talking currently at talking 59 classes ([code search](https://source.chromium.org/search?q=%5B%5E%5C%2F%5D%2BADVANCED_MEMORY_SAFETY_CHECKS%5C(%20f:.*.h%20-f:base%2Fmemory%2Fadvanced_memory_safety_checks.h%20-f:test&ss=chromium))) and the worst case for a collision is we group two different crashes together for our teams monitoring only purposes. No runtime decision will be made on it, so we can just notice a collision and separate them based on call stack, and then fix the collision if needed.
Thank you for following up...
I think, if any collisions are detected,
I updated https://chromium-review.git.corp.google.com/c/chromium/src/+/7931207/3 to check type-id conflicts if DCHECK_IS_ON() and PA_BUILDFLAG(DCHECKS_ARE_ON).
(I have to monitor binary-size changes)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
Takashi SakamotoI don't claim to fully understand the limitations and restrictions here since I don't quite know how the pieces are going to be hooked up end to end; ultimately, I would still prefer an approach based on FastTypeId but I understand if that's not easy or possible here.
If we keep this approach, I think it would be good if:
- we have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
- and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
Stephen Nuskowe have some plan for allowing potential changes to how this type id is calculated in the future (even if that means some temporary pain across the transition period)
In the future, we will replace the type-id with AllocToken provided by clang.
and clarifying if we're going to try to have a mapping of these type ids to names (and seeing how many collisions there are in practice, since it sounds like there is a plan to use this on non-leaked objects for the zap pattern too)
Regarding the collisions, we will check whether lots of collisions occur or not. Currently there are very small number of (honestly speaking, no leaked security objects) leaked security objects, I expect we will rarelly see type-id collisions.
it sounds like there is a plan to use this on non-leaked objects for the zap pattern tooAt that time, we will have already replaced type-id with AllocToken, I think.
Takashi SakamotoJust to add I think we might use this zap pattern on other macros sooner then AllocToken, I don't see a reason to wait unless I'm missing something @ta...@google.com?
However we're talking currently at talking 59 classes ([code search](https://source.chromium.org/search?q=%5B%5E%5C%2F%5D%2BADVANCED_MEMORY_SAFETY_CHECKS%5C(%20f:.*.h%20-f:base%2Fmemory%2Fadvanced_memory_safety_checks.h%20-f:test&ss=chromium))) and the worst case for a collision is we group two different crashes together for our teams monitoring only purposes. No runtime decision will be made on it, so we can just notice a collision and separate them based on call stack, and then fix the collision if needed.
Thank you for following up...
I think, if any collisions are detected,
- if the number of objects which use type_id is small, it is possible to see all objects with the same type id (because of conflicts).
- if the number of objects is too large, I think we will have already replaced the type-id calculation with any better solution. (AllocToken is supported by clang and I expect that it will be useful type-id replacement.)
If we're only using this for a limited use case, then I guess it's fine. I think collisions become more interesting if we're considering using it for zapping all objects though.
LGTM assuming this is a short-term solution limited to relatively few types and that breaking changes to the type ID are OK.
- static between different chrome version (make it easy to see how the number of crash reports with each type id changes)Takashi SakamotoI think we should give ourselves some room for changing how this works over time. Probably the easiest way to do that is just use a few bits as a generation counter?
Stephen NuskoI would like to confirm... So you mean...
If we find type-ids with different generation counters, we will always find the ways to generate type-ids with the generation counters, and see whether the type are the same or not?
The suggestion as I read it is to reserve some amount of bits (say 2 so we can have generation 0 be the hash, generation 1 be FastTypeId, generation 2 be allocToken, etc, or whatever versions we change it to). Then when we upload to crash we can detect which version it is and compute and properly map the type ID.
I think given this is debugging data however, if we decide to do a hard break to change the format it will be okay without a generation counter. Since this is only for debugging purposes. The goal of having it stay the same is for our tracking across milestones so we can ask the question:
"Does this macro generate any crashes anymore?" This would enable us to ask ourselves if we remove it now that the UaF (or whatever) is fixed.
If we change the format it would breaks across chrome versions and we'd have to update our monitoring, but since it should break rarely and we are the consumers of it, I'm not to worried until we get to the world of AllocToken which will have full compiler symbolization support.
As long as we're OK with potentially changing the format and it's not frozen in stone, I'm fine with that. If you don't think we need a generation counter and are OK with breaking changes, then I won't insist on it.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thank you for the review.
LGTM assuming this is a short-term solution limited to relatively few types and that breaking changes to the type ID are OK.
Yeah. I agree that the solution should be short-term one.
Daniel Chengdcheng@, I would like to ask about a hash for 40bit type-id.
The CL is...
(1) some of classes will be marked as LEAKED_SANITIZED_OBJECT() to avoid heap-use-after-free (the classes' instances will be leaked at free()/delete. Never re-used).
(2) When freeing the instances of LEAKED_SANITIZED_OBJECT-s, the memory regions will be zapped with bytes based on the classes' type.
(3) Because of `leak`, we expect that the number of LEAKED_SANITIZED_OBJECT-classes (the number of LEAKED_SANITIZED_OBJECT instances and the number of classes/structs marked as LEAKED_SANITIZED_OBJECT in chromium source code) is small.Currently I'm thinking:
(1) use type-id generated from __PRETTY_FUNCTION__ and use 40bit of the type-id for zapping. i.e.
|<--16bit (reserved to identify LEAKED_SANITIZED_OBJECT)-->|
|<--40bit (type-id) -->|
|<--8bit (reserved) -->|
(totally 64bit)
I expect that the bytes will be shown in crash reports' crash addresses and help us to investigate what classes are accessed after free().
(2) directly using __PRETTY_FUNCTION__'s address may cause large chrome binary. So type-id should not be ((uintptr_t)__PRETTY_FUNCTION_ >> 8) & k40BitMask or something.(3) in the future, we have a plan to replace the type-id with `AllocToken`.
Instead I would like to use `consteval uint32_t some_hash_function(const char*)` (or (const std::string_view))), i.e.
```
static constexpr uint32_t kPartitionAllocSanitizedObjectTypeId = [] { \
return some_hash_function(__PRETTY_FUNCTION__); \
}();
```
(using `uint64_t` and 40bit mask is better?)Firstly I'm thinking of `md5_internal_constexpr.h`. It was reverted but some of chromium code (i.e. chormeos) is still using it, and its license was chromium. (I used 32-bit md5 hash...)
I also looked at //third_party code, but it seems that hash functions under //third_party are not `constexpr` (so not `consteval` either) and license are not chromium (this means I need to update the upstreams).So my question is...
- what hash is good for this kind of type-id?
- or using hash function is bad. I should find another way to generate type-id?
Best regards,
tasak
Stephen NuskoAre we leaking that many different types of objects? I can't imagine that we'd use `LEAKED_SANITIZED_OBJECT()` very often.
I'm not sure we need a hash at all. Can we just roll a custom version of `absl::FastTypeId` (a templated class with a static member) and use the address of the static member as a marker here? We don't allow `absl::FastTypeId` in Chrome because `&base_internal::FastTypeTag<int>::kDummyVar == &base_internal::FastTypeTag<int>::kDummyVar` may not be true if the type IDs are generated in different components–but in this case, we just need a marker tag.
Takashi SakamotoAs an extra point: We won't be leaking to many objects for sure, but we also do hope to use this type of type zapping for non-leaked SANITIZED_OBJECT() objects as well. That should still be a relatively finite set, but not absolutely tiny, and we'll want to identify it server side as Takashi-san mentioned.
Takashi SakamotoLooking at FastTypeId solution, this depends on `template` and `typename`. However, the type id for leaked sanitized object is used inside `static method`, i.e. operator free.
This means, we cannot use `this` (decltype(this) doesn't work). And also if using symbols, it will make it a little difficult to compare crash reports among different chrome versions. (possible, but a little difficult. Need to ask symbol server) Basically __PRETY_FUNCTION__ doesn't change between different chrome versions (except namespace changes and class/struct name changes... but in the case, I think the classes after the changes (=refactoring) are different from ones before the changes.)So instead, I added fnv1a_consteval.h under partition_alloc and generate type ids by using it. (//nogncheck is not public for every one.)
Because of small number of leaked sanitized objects, I think any hash will do (but we should be careful about license and algorithm patents...) fnv1a algorithm looks public domain and very simple (simplicity makes everyone write almost the same code. So rarely license issue happens?)
Acknowledged
// domain. The author hereby disclaims copyright to this source code.Takashi SakamotoI copied the same license header as `//third_party/smhasher/src/src/MurmurHash3.{h,cpp}`.
Acknowledged
return internal::MD5CE::Hash32(string);Takashi Sakamotonit: omit `internal::`
Acknowledged
constexpr uint32_t MD5Hash32Constexpr(std::string_view string) {Takashi Sakamotoit's a bit confusing that this function doesn't return an MD5 hash digest. if we're only using the first 32 bits of the digest, why are we using MD5?
have you considered MurmurHash3 (example: https://source.chromium.org/chromium/chromium/src/+/main:components/safe_browsing/content/renderer/phishing_classifier/murmurhash3_util.cc) or FNV-1a (example: https://source.chromium.org/chromium/chromium/src/+/main:third_party/perfetto/include/perfetto/tracing/internal/fnv1a.h)?
Takashi SakamotoThe latest patch introduces murmurhash3. But I have to ask whether it is ok to add the code into //base/hash or not. Because its license seems not to be chromium. As far as I understand, chromium code must have CHROMIUM LICENSE header. I have to pull-request the original repo and wait until DEPS roll?
Regarding FNV-1a, it is `constexpr`. It looks better than murmurhash3, because it can be computed at compile time. However I would like to use the hash key as a part of `Crash address` to investigate crash reports. This is almost the same way as ELUD does.
ELUD's crash reports are `(crash.Address & 0xFFFF000000000000) = 0xEBAF000000000000` and the least significant 8-bits are also used for `kRemainderZapValue` (`0xED`). c.f. `components/gwp_asan/common/extreme_lightweight_detector_util.h`
(Regarding EULD, we will use information provided by the 40-bits to investiagte crashes)So only 40bits(64-16-8) are available. I have a question: `40bit hash generated from 64bit hash (hash & 0xFFFFFFFFFF or something) is still better than 32bit hash?`
c.f. This CL's `Zap` code is:
```
void PartitionRoot::Zap(internal::SlotStart slot_start,
SlotSpanMetadata* slot_span,
uint32_t type_id) {
void* object = reinterpret_cast<void*>(slot_start.value());uint64_t zap_value = internal::kIntendedLeakQuarantineMarker |
(static_cast<uint64_t>(type_id) << 8);
```
Acknowledged
constexpr uint32_t MD5Hash32Constexpr(std::string_view string) {Takashi Sakamotocould we use `base::span<const uint8_t>` rather than `std::string_view`? callers can use `base::byte_span_from_cstring()`, `base::as_bytes()`, or `base::as_byte_span()` depending on what they want to compute a hash digest over.
in either case, please #include either <string_view> or "base/containers/span.h" depending on which way you go.
Acknowledged
return ((a & 0xff) << 24) | (((a >> 8) & 0xff) << 16) |Takashi Sakamotowe have `ExtractByte` above. could we use that here rather than sort of reimplementing it? i would hope that the compiler is smart enough to optimize down to more-or-less the same thing. or do you find otherwise?
Acknowledged
const uint32_t m =Takashi Sakamotohttps://google.github.io/styleguide/cppguide.html#General_Naming_Rules says we generally shouldn't abbreviate. `padded_length` would be a good name for this.
Acknowledged
return (((n + 1 + 8) + 63) / 64) * 64;Takashi Sakamotothis could overflow. could we use [`CheckedNumeric`](https://chromium.googlesource.com/chromium/src/+/main/base/numerics/README.md#checkednumeric_in-checked_math_h) here?
Acknowledged
struct MD5CE {Takashi Sakamotothe style guide says ["Do not use a class simply to group static members"](https://google.github.io/styleguide/cppguide.html#Nonmember,_Static_Member,_and_Global_Functions). a namespace (with `inline constexpr` for the constants) would suffice.
Acknowledged
Takashi Sakamotoplease add a unit test for this that at least validates the test vectors from RFC 1321 in appendix [A.5](https://datatracker.ietf.org/doc/html/rfc1321#appendix-A.5). the ones from [RFC 2202](https://datatracker.ietf.org/doc/html/rfc2202#section-2) would be great, too. and NIST has some test vectors [here](https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data).
Takashi SakamotoRegarding the `advanced_memory_safety_checks_hash.h` is just copy of
Acknowledged
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Make leaked security objects provide type-id at `delete`.
The type-id should be:
- computed at compile time (not to affect non-leaked security objects)
- static between different chrome version (make it easy to see how the number of crash reports with each type id changes)
To generate type-id, add fnv1a_consteval.h under partition_root.h (not
//base/hash, because the hash is used only for the type-id). The key
will be replaced with AllocToken later. Once AllocToken is available, we
will remove the code immediately.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |