Hey Daniel!
I was able to reproduce the MultiplexRouter lifetime issue via test code and some synchonization games, so I figured I'd put up a CL and we'd be able to get it in for Christmas!
I suspect this could be causing some of the heap corruption we've been seeing in Mojo. This sort of thing would be a hotpath.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
base::OnceClosure error_handler_called_for_testing_;I am really not a fan of test-only callbacks. Is this something we'll need forever? Are there any possible alternatives?
&MultiplexRouter::OnPipeConnectionError, base::WrapRefCounted(this),I think this needs a comment to explain why it won't be a leak, since it looks like `this` owns `connector_`.
Is it important to add this ref preemptively? We can't just `scoped_refptr protect(this);` in `OnPipeConnectionError()`?
Based on the CL description, it sounds like it wouldn't work because the error handler callback is posted, and so the ref needs to exist while the callback is in flight, but more comments would be good.
// Without the extra reference fromIt looks like the extra reference causes the WeakPtrFactory to be invalidated on the wrong thread sometimes:
```
[273332:273343:FATAL:base/memory/weak_ptr.cc:27] DCHECK failed: sequence_checker_.CalledOnValidSequence(&bound_at) || HasOneRef(). WeakPtrs must be invalidated on the same sequenced thread as where they are bound.
Check failed at:
#0 0x60f235f15466 in ___interceptor_backtrace ??:0:0
#1 0x60f23c1ad548 in base::debug::CollectStackTrace(base::span<void const*, 18446744073709551615ul, void const**>) ./../../base/debug/stack_trace_posix.cc:1052:7
#2 0x60f23c17e81f in base::debug::StackTrace::StackTrace(unsigned long) ./../../base/debug/stack_trace.cc:279:20
#3 0x60f23be973c5 in logging::LogMessage::Flush() ./../../base/logging.cc:705:29
#4 0x60f23be96f30 in logging::LogMessage::~LogMessage() ./../../base/logging.cc:694:3
#5 0x60f23be63dc1 in ~DCheckLogMessage ./../../base/check.cc:147:3
#6 0x60f23be63dc1 in logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage() ./../../base/check.cc:143:32
#7 0x60f23be62c30 in operator() ./gen/third_party/libc++/src/include/__memory/unique_ptr.h:77:5
#8 0x60f23be62c30 in reset ./gen/third_party/libc++/src/include/__memory/unique_ptr.h:290:7
#9 0x60f23be62c30 in logging::NotReachedError::~NotReachedError() ./../../base/check.cc:306:16
#10 0x60f23beb6140 in base::internal::WeakReference::Flag::Invalidate() ./../../base/memory/weak_ptr.cc:27:3
#11 0x60f23beb70a6 in base::internal::WeakReferenceOwner::~WeakReferenceOwner() ./../../base/memory/weak_ptr.cc:87:12
#12 0x60f23e3c5d69 in mojo::Connector::~Connector() ./../../mojo/public/cpp/bindings/lib/connector.cc:177:1
#13 0x60f23e40257d in mojo::internal::MultiplexRouter::~MultiplexRouter() ./../../mojo/public/cpp/bindings/lib/multiplex_router.cc:468:1
#14 0x60f2369f7d0c in ~TestableMultiplexRouter ./../../mojo/public/cpp/bindings/tests/testable_multiplex_router.cc:56:1
#15 0x60f2369f7d0c in ~TestableMultiplexRouter ./../../mojo/public/cpp/bindings/tests/testable_multiplex_router.cc:52:53
#16 0x60f2369f7d0c in non-virtual thunk to mojo::test::TestableMultiplexRouter::~TestableMultiplexRouter() ./../../mojo/public/cpp/bindings/tests/testable_multiplex_router.cc:0:0
#17 0x60f2369f87e8 in DeleteInternal<mojo::AssociatedGroupController> ./../../base/memory/ref_counted.h:438:5
#18 0x60f2369f87e8 in Destruct ./../../base/memory/ref_counted.h:391:5
#19 0x60f2369f87e8 in Release ./../../base/memory/ref_counted.h:427:7
#20 0x60f2369f87e8 in Release ./../../base/memory/scoped_refptr.h:392:8
#21 0x60f2369f87e8 in ~scoped_refptr ./../../base/memory/scoped_refptr.h:280:7
#22 0x60f2369f87e8 in reset ./../../base/memory/scoped_refptr.h:311:18
#23 0x60f2369f87e8 in mojo::test::TestableMultiplexRouter::Scheduler::ResetRouter() ./../../mojo/public/cpp/bindings/tests/testable_multiplex_router.cc:83:11
#24 0x60f2369f9202 in Invoke<void (mojo::test::TestableMultiplexRouter::Scheduler::*)(), mojo::test::TestableMultiplexRouter::Scheduler *> ./../../base/functional/bind_internal.h:730:12
#25 0x60f2369f9202 in MakeItSo<void (mojo::test::TestableMultiplexRouter::Scheduler::*)(), std::__Cr::tuple<base::internal::UnretainedWrapper<mojo::test::TestableMultiplexRouter::Scheduler, base::unretained_traits::MayNotDangle, (partition_alloc::internal::RawPtrTraits)0> > > ./../../base/functional/bind_internal.h:922:12
#26 0x60f2369f9202 in RunImpl<void (mojo::test::TestableMultiplexRouter::Scheduler::*)(), std::__Cr::tuple<base::internal::UnretainedWrapper<mojo::test::TestableMultiplexRouter::Scheduler, base::unretained_traits::MayNotDangle, (partition_alloc::internal::RawPtrTraits)0> >, 0UL> ./../../base/functional/bind_internal.h:1059:14
#27 0x60f2369f9202 in base::internal::Invoker<base::internal::FunctorTraits<void (mojo::test::TestableMultiplexRouter::Scheduler::*&&)(), mojo::test::TestableMultiplexRouter::Scheduler*>, base::internal::BindState<true, true, false, void (mojo::test::TestableMultiplexRouter::Scheduler::*)(), base::internal::UnretainedWrapper<mojo::test::TestableMultiplexRouter::Scheduler, base::unretained_traits::MayNotDangle, (partition_alloc::internal::RawPtrTraits)0>>, void ()>::RunOnce(base::internal::BindStateBase*) ./../../base/functional/bind_internal.h:972:12
#28 0x60f235fe2666 in base::OnceCallback<void ()>::Run() && ./../../base/functional/callback.h:155:12
#29 0x60f23bf756b8 in base::TaskAnnotator::RunTaskImpl(base::PendingTask&) ./../../base/task/common/task_annotator.cc:229:34
#30 0x60f23c022493 in RunTask<(lambda at ../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:474:11)> ./../../base/task/common/task_annotator.h:113:5
#31 0x60f23c022493 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl(base::LazyNow*) ./../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:472:23
#32 0x60f23c0208ce in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() ./../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:346:40
#33 0x60f23c1c6f07 in base::MessagePumpEpoll::Run(base::MessagePump::Delegate*) ./../../base/message_loop/message_pump_epoll.cc:224:55
#34 0x60f23c024843 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run(bool, base::TimeDelta) ./../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:647:12
#35 0x60f23bf2c855 in base::RunLoop::Run(base::Location const&) ./../../base/run_loop.cc:134:14
#36 0x60f23c111da3 in base::Thread::Run(base::RunLoop*) ./../../base/threading/thread.cc:361:13
#37 0x60f23c11266a in base::Thread::ThreadMain() ./../../base/threading/thread.cc:436:3
#38 0x60f23c16676f in base::(anonymous namespace)::ThreadFunc(void*) ./../../base/threading/platform_thread_posix.cc:102:13
#39 0x60f235f6d587 in asan_thread_start(void*) _asan_rtl_:0
```
namespace mojo {Please fix this WARNING reported by ClangTidy: check: modernize-concat-nested-namespaces
nested namespaces can be concatenated...
check: modernize-concat-nested-namespaces
nested namespaces can be concatenated (https://clang.llvm.org/extra/clang-tidy/checks/modernize/concat-nested-namespaces.html)
(Note: You can add `Skip-Clang-Tidy-Checks: modernize-concat-nested-namespaces` footer to the CL description to skip the check)
(Lint observed on `linux-clang-tidy-rel`, but not on `android-clang-tidy-rel`)
auto router = base::WrapRefCounted(new TestableMultiplexRouter(
std::move(message_pipe), config, set_interface_id_namespace_bit, runner,
"TestableMultiplexRouter"));```suggestion
auto router = base::MakeRefCounted<TestableMultiplexRouter>(
std::move(message_pipe), config, set_interface_id_namespace_bit, runner,
"TestableMultiplexRouter");
```
destructor_wait_->TimedWait(base::Seconds(1));Why 1 second?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Connector connector_;the connector_ is owned by the MultiplexRouter, so it will be deleted at the same time. Is that not the issue here, that the connector_ and MultiplexRouter are deleted while the Connector is performing the error handling on its own sequence? Does the sequence check hit all the time if we make it prominent in the Connector dtor?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
drive-by: Can we do something like [this](https://source.chromium.org/chromium/chromium/src/+/main:mojo/public/cpp/bindings/shared_remote.h?q=symbol%3A%5Cbmojo%3A%3ASharedRemoteBase%3A%3ARemoteWrapper%3A%3ADeleteOnCorrectThread%5Cb%20case%3Ayes) to ensure the clean up happens in the right thread?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I addressed the feedback. I integrated the Deleter code, which I think was a really great suggestion from @Andrea. I think by guaranteeing the deletion happens on the same sequence as all the other calls, we guarantee that the Connector will be shutdown before the calls would come in or we guarantee the calls would happen before the MultiplexRouter is actually deleted. I augmented the being_destructed_ to be a mutable atomic bool and had that set in DeleteOnCorrectSequence and added checks before all the base::WrapRefCounted calls for PostTasks to ensure AddRef won't happen after the destruction has been scheduled.
I will look at the remote_unittest failures probably tonight, but it seems like the issue is hot so I thought I'd put out what I have now as an update.
I am really not a fan of test-only callbacks. Is this something we'll need forever? Are there any possible alternatives?
I updated to friend class the RaceConditionScheduler and I run the callback there right before I call into the method.
Connector connector_;the connector_ is owned by the MultiplexRouter, so it will be deleted at the same time. Is that not the issue here, that the connector_ and MultiplexRouter are deleted while the Connector is performing the error handling on its own sequence? Does the sequence check hit all the time if we make it prominent in the Connector dtor?
It's not deleted at *exactly* the same time.
"For both user-defined or implicitly-defined destructors, after executing the body of the destructor and destroying any automatic objects allocated within the body, the compiler calls the destructors for all non-static non-variant data members of the class, in reverse order of declaration, then it calls the destructors of all direct non-virtual base classes in reverse order of construction (which in turn call the destructors of their members and their base classes, etc), and then, if this object is of most derived class, it calls the destructors of all virtual bases.
Even when the destructor is called directly (e.g. obj.~Foo();), the return statement in ~Foo() does not return control to the caller immediately: it calls all those member and base destructors first."
https://www.en.cppreference.com/w/cpp/language/destructor.html
So ~MultiplexRouter is destroyed, and then the member variables are destroyed one at a time in reverse order, with the task_runner_ being destroyed after the connector_ for example.
I added a MultiplexRouter::DeleteOnCorrectSequence() that gets called by a Destruct method, and that will ensure the destruction is on the right thread and made being_destructed_ a mutable atomic bool set when that method is called. I followed up with checks on that where there were AddRefs to prevent the Use After Free where we have posted the DeleteOnCorrectSequence but then we call something else that AddRefs and posts additional tasks to be handled after that would be called.
&MultiplexRouter::OnPipeConnectionError, base::WrapRefCounted(this),I think this needs a comment to explain why it won't be a leak, since it looks like `this` owns `connector_`.
Is it important to add this ref preemptively? We can't just `scoped_refptr protect(this);` in `OnPipeConnectionError()`?
Based on the CL description, it sounds like it wouldn't work because the error handler callback is posted, and so the ref needs to exist while the callback is in flight, but more comments would be good.
Done
This is perfect! Truly the solution could not have been complete without that suggestion!
Please fix this WARNING reported by ClangTidy: check: modernize-concat-nested-namespaces
nested namespaces can be concatenated...
check: modernize-concat-nested-namespaces
nested namespaces can be concatenated (https://clang.llvm.org/extra/clang-tidy/checks/modernize/concat-nested-namespaces.html)
(Note: You can add `Skip-Clang-Tidy-Checks: modernize-concat-nested-namespaces` footer to the CL description to skip the check)
(Lint observed on `linux-clang-tidy-rel`, but not on `android-clang-tidy-rel`)
Done
auto router = base::WrapRefCounted(new TestableMultiplexRouter(
std::move(message_pipe), config, set_interface_id_namespace_bit, runner,
"TestableMultiplexRouter"));```suggestion
auto router = base::MakeRefCounted<TestableMultiplexRouter>(
std::move(message_pipe), config, set_interface_id_namespace_bit, runner,
"TestableMultiplexRouter");
```
Done
destructor_wait_->TimedWait(base::Seconds(1));Emily AndrewsWhy 1 second?
To ensure there's never a hang in the test. I added an EXPECT_TRUE so it would be an error to have the time elapse.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Fixed the remoting_unittests issue and should be good to go in if you guys are pleased with the change.
I removed the feature key, since ensuring destruction on sequence is safe enough and will not result in hangs.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
the RefCountedThreadSafe::DeleteInternal implementation.Can the CL description mention why the webnn changes are needed? (Though a number of WebNN tests appear to be broken)
// Deletes this object on the correct sequence to protect againstThis comment should explain what "the correct sequence is". That might be better suited as a class-level comment line 24 that explains the threading requirements for the class as a whole; then this can just reference that.
if (being_destructed_) {I'm wondering if this (and other similar checks) are potentially racy.
What prevents this from being `false` here, but then being set to `true` before the `PostTask()` call below?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Is this ready for a re-review?
Not yet, I think you are right about the race, but I'm thinking through when to get rid of the handler reference explicitly once the InterfaceEndpointClient and InterfaceEndpoint are deconstructed. I think I might have something later today.
The ASAN test failures are because I need to get the right RunLoop hook to run through the asynchronous deconstruction. That might take a little time too.
if (being_destructed_) {I'm wondering if this (and other similar checks) are potentially racy.
What prevents this from being `false` here, but then being set to `true` before the `PostTask()` call below?
I found a HasAtLeastOneRef API on RefCountedThreadSafe, and this will not be racy.
if (being_destructed_) {Emily AndrewsI'm wondering if this (and other similar checks) are potentially racy.
What prevents this from being `false` here, but then being set to `true` before the `PostTask()` call below?
I found a HasAtLeastOneRef API on RefCountedThreadSafe, and this will not be racy.
Actually, still a little racy
the RefCountedThreadSafe::DeleteInternal implementation.Can the CL description mention why the webnn changes are needed? (Though a number of WebNN tests appear to be broken)
I'm not surprised the WebNN tests need modifications. We've been struggling in WebNN to ensure that all the impl objects are destroyed during test shutdown because tasks need to be posted between threads.
void WebNNContextProviderImpl::SetDisconnectHandlerForTesting( // IN-TESTIs this comment necessary since the method is named "ForTesting"?
I think I have it now. What I ended up creating was a way for the multiplex router to try to increment itself before it executes code which may be racing during channel shutdown. If it's at a ref count of 0, we just return false and then no-op then early return. I went through and added this check on all the pieces of code and we have a feature flag to turn this on and off. I have put it enabled by default though.
the RefCountedThreadSafe::DeleteInternal implementation.Reilly GrantCan the CL description mention why the webnn changes are needed? (Though a number of WebNN tests appear to be broken)
I'm not surprised the WebNN tests need modifications. We've been struggling in WebNN to ensure that all the impl objects are destroyed during test shutdown because tasks need to be posted between threads.
I'm moving the WebNN test fixes to a separate CL to reduce the complexity of this one.
// Deletes this object on the correct sequence to protect againstThis comment should explain what "the correct sequence is". That might be better suited as a class-level comment line 24 that explains the threading requirements for the class as a whole; then this can just reference that.
Done
if (being_destructed_) {Emily AndrewsI'm wondering if this (and other similar checks) are potentially racy.
What prevents this from being `false` here, but then being set to `true` before the `PostTask()` call below?
Emily AndrewsI found a HasAtLeastOneRef API on RefCountedThreadSafe, and this will not be racy.
Actually, still a little racy
Done
void WebNNContextProviderImpl::SetDisconnectHandlerForTesting( // IN-TESTIs this comment necessary since the method is named "ForTesting"?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Some initial thoughts. I haven't looked at the tests yet.
delete this;Let's explain why `delete this` unconditionally is safe here, i.e. what precondition guarantees that this is safe.
base::WeakPtr<Connector> GetWeakPtrForTesting() {Fine to make this public (especially if that avoids the need for a `friend`)
void DeleteOnCorrectSequence() const override;Nit: group this with the other AGC methods
// This performs 3 atomic operations (TryAddRef + AddRef + Release) because
// scoped_refptr has no public constructor that adopts an existing ref without
// calling AddRef. The private AdoptRefTag constructor is restricted to
// AdoptRef(), which requires REQUIRE_ADOPTION_FOR_REFCOUNTED_TYPE and
// HasOneRef() — neither of which hold here.I think we can extend `scoped_refptr`'s API/helpers to support this so we don't need the extra refcount churn.
I don't have a good name for this though; to be honest, it is really bizarre to pass around a `T*` to which the refcount is zero, because if the refcount is zero, the object is at best partially valid...
auto prevent_destroy = TryMakeScopedRefPtr(this);Can we add a comment here? Presumably the destructor is potentially already on the stack here, but we can explain what scenario this could happen in.
auto prevent_destroy = TryMakeScopedRefPtr(this);Same comment here.
// out safely in that case.This is helpful, but maybe we should have a class-level comment that explains the threading preconditions/postconditions of this class.
The current comment looks something like:
```
// It is partially sequence-affine with several public methods that must be
// called on the sequence to which the MultiplexRouter is bound. See the
// constructor for details.
```
which doesn't really help readers (e.g. me :P) understand the requirements.
TEST_F(MultiplexRouterTest, CloseMessagePipeReleasesRouterSelfReference) {Probably missing something here but I'm a little fuzzy on where the self reference is coming from (as it seems like the test fixture itself also owns references, so I'm not quite sure what we're trying to exercise here)
TEST_F(MultiplexRouterTest, PassMessagePipe) {I think a high-level description of this test would be useful here.
// Now hook up a new router2 to prove we can continue using router1_Because I don't understand why this is important, or how this handles normally without reading things in a lot more details. I think this is supposed to be true because we passed the message pipe above by extracting it and are moving it to another route, hence reuse is OK, but I had to read the test body twice. And I'm still not quite sure why the association bits are significant yet.
std::string(reinterpret_cast<const char*>(response.payload())));The std::string probably isn't necessary here, though I suppose it doesn't hurt either.
TEST_F(MultiplexRouterLifetimeFixTest, PendingMessagesCleanedUpDuringDestruction) {Nit: wrap at 80 (run `git cl format`?)
base::BindOnce(Judicious use of BindLambdaForTesting could simplify some of these, e.g.:
```
...->PostTask(FROM_HERE, base::BindLambdaForTesting(
[r0 = std::move(router0), r1 = std::move(Router1), main_runner, &run_loop] {
}));
```
void ScheduleDeathAndWait(base::OnceClosure callback);And death of what? et cetera
void ScheduleOnWatcherHandleReady(base::OnceClosure callback,Can we add some basic comments here? For example, what watcher handle is this?
// We don't want to hang forever, this should signal eventually.It would be a test bug if this wasn't signalled though right? Why not just wait forever and let tests timeout in case of a bug?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +2 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +2 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Ok I think this should be resolved. I have a separate PR for the WebNN tests that needs to go in first, but after that this should be g2g.
Let's explain why `delete this` unconditionally is safe here, i.e. what precondition guarantees that this is safe.
I've refactored this class so it should handle any potential shutdown/handle error races properly.
Fine to make this public (especially if that avoids the need for a `friend`)
Done
Nit: group this with the other AGC methods
Done
// This performs 3 atomic operations (TryAddRef + AddRef + Release) because
// scoped_refptr has no public constructor that adopts an existing ref without
// calling AddRef. The private AdoptRefTag constructor is restricted to
// AdoptRef(), which requires REQUIRE_ADOPTION_FOR_REFCOUNTED_TYPE and
// HasOneRef() — neither of which hold here.I think we can extend `scoped_refptr`'s API/helpers to support this so we don't need the extra refcount churn.
I don't have a good name for this though; to be honest, it is really bizarre to pass around a `T*` to which the refcount is zero, because if the refcount is zero, the object is at best partially valid...
ok so I shifted gears, we post the last release on the sequence. This means that the Connector::HandleError call will be canceled when it's deleted.
Can we add a comment here? Presumably the destructor is potentially already on the stack here, but we can explain what scenario this could happen in.
We're posting with the WrapRefCounted AddRef helper so I think this is safe now.
auto prevent_destroy = TryMakeScopedRefPtr(this);Emily AndrewsSame comment here.
Done
This is helpful, but maybe we should have a class-level comment that explains the threading preconditions/postconditions of this class.
The current comment looks something like:
```
// It is partially sequence-affine with several public methods that must be
// called on the sequence to which the MultiplexRouter is bound. See the
// constructor for details.
```which doesn't really help readers (e.g. me :P) understand the requirements.
I think because we've fixed the Release sequencing maybe it's less important?
TEST_F(MultiplexRouterTest, CloseMessagePipeReleasesRouterSelfReference) {Probably missing something here but I'm a little fuzzy on where the self reference is coming from (as it seems like the test fixture itself also owns references, so I'm not quite sure what we're trying to exercise here)
I think this test is probably out of date. This is from when I was passing in the ref to the connector.
TEST_F(MultiplexRouterTest, PassMessagePipe) {I think a high-level description of this test would be useful here.
Done
// Now hook up a new router2 to prove we can continue using router1_Because I don't understand why this is important, or how this handles normally without reading things in a lot more details. I think this is supposed to be true because we passed the message pipe above by extracting it and are moving it to another route, hence reuse is OK, but I had to read the test body twice. And I'm still not quite sure why the association bits are significant yet.
This is used by the Remote::Unbind and then Remote::Bind API to move the mojo remote to different threads. I have documented it.
std::string(reinterpret_cast<const char*>(response.payload())));The std::string probably isn't necessary here, though I suppose it doesn't hurt either.
Removing this here and in the other tests in this file.
TEST_F(MultiplexRouterLifetimeFixTest, PendingMessagesCleanedUpDuringDestruction) {Nit: wrap at 80 (run `git cl format`?)
Done
base::BindOnce(Judicious use of BindLambdaForTesting could simplify some of these, e.g.:
```
...->PostTask(FROM_HERE, base::BindLambdaForTesting(
[r0 = std::move(router0), r1 = std::move(Router1), main_runner, &run_loop] {
}));
```
Done
void ScheduleDeathAndWait(base::OnceClosure callback);And death of what? et cetera
Done
void ScheduleOnWatcherHandleReady(base::OnceClosure callback,Can we add some basic comments here? For example, what watcher handle is this?
Done
// We don't want to hang forever, this should signal eventually.It would be a test bug if this wasn't signalled though right? Why not just wait forever and let tests timeout in case of a bug?
TimedWait returns false if it isn't signaled, which would cause the test to fail explicitly and not also cause the test pipeline to take an obscene amount of time before failing.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks for working through this; I know it's been a bit of a mess. A few (hopefully last) questions from my side
if (bound_task_runner_ && !bound_task_runner_->RunsTasksInCurrentSequence()) {Remind me why this might be null?
// release — may delete off-sequence, same as pre-fix behavior.We should be safe in this case, or can the sequence be running the error handler?
i.e. if PostTask() fails is that a promise to never run more tasks?
EXPECT_EQ(std::string_view("hello again world!"), GetPayloadAsStringView(response));Nit: git cl format this CL for wrapping
| Commit-Queue | +1 |
if (bound_task_runner_ && !bound_task_runner_->RunsTasksInCurrentSequence()) {Remind me why this might be null?
I don't think it should be, so I added a CHECK to guard against a programming error.
// release — may delete off-sequence, same as pre-fix behavior.We should be safe in this case, or can the sequence be running the error handler?
i.e. if PostTask() fails is that a promise to never run more tasks?
According to the PostTask documentation, if it's true it might execute and if it's false it's guaranteed not to execute. https://source.chromium.org/chromium/chromium/src/+/main:base/task/task_runner.h;l=63-65
EXPECT_EQ(std::string_view("hello again world!"), GetPayloadAsStringView(response));Nit: git cl format this CL for wrapping
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// release — may delete off-sequence, same as pre-fix behavior.Emily AndrewsWe should be safe in this case, or can the sequence be running the error handler?
i.e. if PostTask() fails is that a promise to never run more tasks?
According to the PostTask documentation, if it's true it might execute and if it's false it's guaranteed not to execute. https://source.chromium.org/chromium/chromium/src/+/main:base/task/task_runner.h;l=63-65
This really only happens when the process is shutting down, so I think I'm OK just not handling the `PostTask()` failed case and leaking.
(I've noticed things increasingly trying to handle the `PostTask()` return value, and I feel that is generally not necessary, and I'll be attempting to remove this from `PostTask()` at some point; removing this branch saves me a bit of work down the road :)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I think we need to handle PostTask failing in the Release because otherwise we can get ASAN bugs if the task runner the MultiplexRouter is running on is shutdown before all the refs are released.
// release — may delete off-sequence, same as pre-fix behavior.Emily AndrewsWe should be safe in this case, or can the sequence be running the error handler?
i.e. if PostTask() fails is that a promise to never run more tasks?
Daniel ChengAccording to the PostTask documentation, if it's true it might execute and if it's false it's guaranteed not to execute. https://source.chromium.org/chromium/chromium/src/+/main:base/task/task_runner.h;l=63-65
This really only happens when the process is shutting down, so I think I'm OK just not handling the `PostTask()` failed case and leaking.
(I've noticed things increasingly trying to handle the `PostTask()` return value, and I feel that is generally not necessary, and I'll be attempting to remove this from `PostTask()` at some point; removing this branch saves me a bit of work down the road :)
I'd rather not, because what can happen in test code is that the task runner gets shut down and then we get ASAN bugs galore.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I think we need to handle PostTask failing in the Release because otherwise we can get ASAN bugs if the task runner the MultiplexRouter is running on is shutdown before all the refs are released.
In that case, we should just leak it and not release.
// release — may delete off-sequence, same as pre-fix behavior.Emily AndrewsWe should be safe in this case, or can the sequence be running the error handler?
i.e. if PostTask() fails is that a promise to never run more tasks?
Daniel ChengAccording to the PostTask documentation, if it's true it might execute and if it's false it's guaranteed not to execute. https://source.chromium.org/chromium/chromium/src/+/main:base/task/task_runner.h;l=63-65
Emily AndrewsThis really only happens when the process is shutting down, so I think I'm OK just not handling the `PostTask()` failed case and leaking.
(I've noticed things increasingly trying to handle the `PostTask()` return value, and I feel that is generally not necessary, and I'll be attempting to remove this from `PostTask()` at some point; removing this branch saves me a bit of work down the road :)
I'd rather not, because what can happen in test code is that the task runner gets shut down and then we get ASAN bugs galore.
Daniel ChengI think we need to handle PostTask failing in the Release because otherwise we can get ASAN bugs if the task runner the MultiplexRouter is running on is shutdown before all the refs are released.
In that case, we should just leak it and not release.
If we leak it, then we will get ASAN test failures because the memory doesn't get cleaned up.
ASAN tests will fail if objects are leaked. It does not matter if it doesn't impact production code and it only happens during process shutdown in the real world, etc. I think we need to just release the object if the PostTask fails to avoid the ASAN test failures, and I have spent like 4 months fixing them. The alternative is we provide some sort of a hook to wait on for tests where an event is signaled indicating destruction, but I've received previous feedback that there's a preference for not doing that.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Daniel ChengI think we need to handle PostTask failing in the Release because otherwise we can get ASAN bugs if the task runner the MultiplexRouter is running on is shutdown before all the refs are released.
Emily AndrewsIn that case, we should just leak it and not release.
If we leak it, then we will get ASAN test failures because the memory doesn't get cleaned up.
Yes, but does that happen in practice?
If it really does, we can just mark the object as potentially leaked. We already have precedent for that, e.g. https://source.chromium.org/chromium/chromium/src/+/main:base/threading/post_task_and_reply_impl.cc;l=83;drc=099aa05d9202a1953d8da94e110c0c5463f6f90e if we really need to take that path.
I don't think we really want to handle the return value of PostTask() here ultimately. And leaking it is still safer than running a `delete this` that might race with still-running tasks, even if the task runner is no longer accepting new tasks due to process shutdown.
| Code-Review | +1 |
// so we mark the object as leaking in case the PostTask fails.```suggestion
// so we mark the object as leaking in case the PostTask fails,
// i.e. at browser shutdown.
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Alright should be GTG, had to resolve some merge conflicts.
// so we mark the object as leaking in case the PostTask fails.```suggestion
// so we mark the object as leaking in case the PostTask fails,
// i.e. at browser shutdown.
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |