Thanks for the fix.
Question: why not just move the setflush(false) to be after start event?
also there seems to be a disconnect between stream and source wrapper in terms of flush states, I think the disconnect needs to be addressed
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks for the fix.
Question: why not just move the setflush(false) to be after start event?
also there seems to be a disconnect between stream and source wrapper in terms of flush states, I think the disconnect needs to be addressed
Hey Daoyuan, the reason I didn't move the setflush(false) after the start event was because the logic checks involving the flushed state are always coupled with the started state. For example, if we moved the setflush(false) after the started event in the source, we'd introduce a window in time in ServicePostFlushSampleRequest() where we're still flushed but also in a started state which would cause unintentionally taking some corner case related to backwards seeking.
Concerning the flushed state, is there a reason you're seeing that the source needs to know the flushed state of the stream? As far as I can tell, the sources only responsibility is to flush the stream, but the stream flushed state is only relevant to complex logic in the stream itself that doesn't concern the source. It seems like the source having control over the stream flushed state adds an unnecessary dependency between the source and what should be stream specific logic.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsThanks for the fix.
Question: why not just move the setflush(false) to be after start event?
also there seems to be a disconnect between stream and source wrapper in terms of flush states, I think the disconnect needs to be addressed
Hey Daoyuan, the reason I didn't move the setflush(false) after the start event was because the logic checks involving the flushed state are always coupled with the started state. For example, if we moved the setflush(false) after the started event in the source, we'd introduce a window in time in ServicePostFlushSampleRequest() where we're still flushed but also in a started state which would cause unintentionally taking some corner case related to backwards seeking.
Concerning the flushed state, is there a reason you're seeing that the source needs to know the flushed state of the stream? As far as I can tell, the sources only responsibility is to flush the stream, but the stream flushed state is only relevant to complex logic in the stream itself that doesn't concern the source. It seems like the source having control over the stream flushed state adds an unnecessary dependency between the source and what should be stream specific logic.
can you expand some details on "would cause unintentionally taking some corner case related to backwards seeking."?
in terms of the flush state, thinking more about it, I am ok for this change.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsThanks for the fix.
Question: why not just move the setflush(false) to be after start event?
also there seems to be a disconnect between stream and source wrapper in terms of flush states, I think the disconnect needs to be addressed
Daoyuan LiHey Daoyuan, the reason I didn't move the setflush(false) after the start event was because the logic checks involving the flushed state are always coupled with the started state. For example, if we moved the setflush(false) after the started event in the source, we'd introduce a window in time in ServicePostFlushSampleRequest() where we're still flushed but also in a started state which would cause unintentionally taking some corner case related to backwards seeking.
Concerning the flushed state, is there a reason you're seeing that the source needs to know the flushed state of the stream? As far as I can tell, the sources only responsibility is to flush the stream, but the stream flushed state is only relevant to complex logic in the stream itself that doesn't concern the source. It seems like the source having control over the stream flushed state adds an unnecessary dependency between the source and what should be stream specific logic.
can you expand some details on "would cause unintentionally taking some corner case related to backwards seeking."?
in terms of the flush state, thinking more about it, I am ok for this change.
Sure, so in MediaFoundationStreamWrapper::ServicePostFlushSampleRequest() there's an if-check with the following condition -
if (flushed_ && state_ == State::kStarted &&
last_start_time_ != kInvalidTime) { ..logic related to consecutive backward seek and inserting MEStreamTick events (see code for details)...}
If setFlush(false) were just moved to be after the QueueStartedEvent() call in MediaFoundationSourceWrapper, you would have a scenario where state_ == started but we are still in a flushed_==true state before we make it to the setFlush(false) call. During this window of time on another thread, this backward seek cornercase condition would be satisfied despite us not actually being in the scenario this cornercase is intended for. This is because the stream lock is not held between these 2 stream methods being called, so another thread can obtain the lock and perform MediaFoundationStreamWrapper::ServicePostFlushSampleRequest(). With my fix, this scenario is avoided by locking the stream and modifying the flushed_ and state_ together.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsThanks for the fix.
Question: why not just move the setflush(false) to be after start event?
also there seems to be a disconnect between stream and source wrapper in terms of flush states, I think the disconnect needs to be addressed
Daoyuan LiHey Daoyuan, the reason I didn't move the setflush(false) after the start event was because the logic checks involving the flushed state are always coupled with the started state. For example, if we moved the setflush(false) after the started event in the source, we'd introduce a window in time in ServicePostFlushSampleRequest() where we're still flushed but also in a started state which would cause unintentionally taking some corner case related to backwards seeking.
Concerning the flushed state, is there a reason you're seeing that the source needs to know the flushed state of the stream? As far as I can tell, the sources only responsibility is to flush the stream, but the stream flushed state is only relevant to complex logic in the stream itself that doesn't concern the source. It seems like the source having control over the stream flushed state adds an unnecessary dependency between the source and what should be stream specific logic.
Evan Williamscan you expand some details on "would cause unintentionally taking some corner case related to backwards seeking."?
in terms of the flush state, thinking more about it, I am ok for this change.
Sure, so in MediaFoundationStreamWrapper::ServicePostFlushSampleRequest() there's an if-check with the following condition -
if (flushed_ && state_ == State::kStarted &&
last_start_time_ != kInvalidTime) { ..logic related to consecutive backward seek and inserting MEStreamTick events (see code for details)...}
If setFlush(false) were just moved to be after the QueueStartedEvent() call in MediaFoundationSourceWrapper, you would have a scenario where state_ == started but we are still in a flushed_==true state before we make it to the setFlush(false) call. During this window of time on another thread, this backward seek cornercase condition would be satisfied despite us not actually being in the scenario this cornercase is intended for. This is because the stream lock is not held between these 2 stream methods being called, so another thread can obtain the lock and perform MediaFoundationStreamWrapper::ServicePostFlushSampleRequest(). With my fix, this scenario is avoided by locking the stream and modifying the flushed_ and state_ together.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
PTAL when you get a chance! Also please add any additional reviewers from the google side.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thank you for the fix!
1. Can this change be covered by adding new tests here https://source.chromium.org/chromium/chromium/src/+/main:media/renderers/win/media_foundation_renderer_integration_test.cc or somewhere else? How can we ensure if this doesn't break the existing playback workflows?
2. Is this issue only happening on H264 codec or HEVC as well?
3. Is this issue only happening at the begging since `Start()` seems to be causing the race condition?
4. Is this `DRM_E_H264_SH_PPS_NOT_FOUND` error the only error that was caused by the issue?
stream->SetFlushed(false);Now you moved this to `QueueStartedEvent()` and `QueueSeekedEvent()` respectively. If `selected` is false, then now it doesn't change flushed to false anymore. Is this expected?
DVLOG_FUNC(2) << "flushed=false";nit: Rename to `flushed_` since `flushed` param is no longer available?
DVLOG_FUNC(2) << "flushed=false";nit: ditto
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thank you for the fix!
1. Can this change be covered by adding new tests here https://source.chromium.org/chromium/chromium/src/+/main:media/renderers/win/media_foundation_renderer_integration_test.cc or somewhere else? How can we ensure if this doesn't break the existing playback workflows?
2. Is this issue only happening on H264 codec or HEVC as well?
3. Is this issue only happening at the begging since `Start()` seems to be causing the race condition?
4. Is this `DRM_E_H264_SH_PPS_NOT_FOUND` error the only error that was caused by the issue?
Thanks for the feedback and good questions. Apologies for the delay, I was on-call last week.-
1. Currently familiarizing myself with the current test suite. The preexisting tests are very basic. If this coverage isn't sufficient to gain confidence existing playback workflows aren't broken, what has been sufficient for past changes in this area? Imo, I do think more general test scenarios need to be added to improve general playback workflow confidence for this area, but this seems like a larger scope of work that shouldn't block this small bug fix if it hasn't blocked previous fixes in this area. Aside from running the existing integration tests, I have manually tested this change using basic playback scenarios (playing/seeking/pause/play/rate change/etc.) with Netflix and haven't encountered any issues. I will look into adding some more tests that should improve the coverage for paths that would impact this change.
2. There's no reason this race condition couldn't happen with any media stream using this class regardless of codec type, but the impact of the dropped sample differs depending on how durable the codec is to dropped samples (particularly first key frames in this case). I'm only aware of a fatal error in the H264 nvidia HWDRM case.
3. This Start() method is also called in the seek path, so I wouldn't say the race condition happening mid-playback is out of the question. In my testing when I repro, it has always been at playback start at which point a seek can also occur to reach the point a user previously left off watching their content on Netflix.
4. DRM_E_SH_PPS_NOT_FOUND is the only fatal error I'm aware of. Something about the HWDRM H264 implementation with Nvidia seem to be less durable to a dropped first IDR frame and results in that fatal error. On non-nvidia hardware I've tested, the result is just that playback start is delayed and plays normally at the start of the next video segment.
Let me know if you have any further questions
stream->SetFlushed(false);Now you moved this to `QueueStartedEvent()` and `QueueSeekedEvent()` respectively. If `selected` is false, then now it doesn't change flushed to false anymore. Is this expected?
In practice, the flushed_ state should be irrelevant to unselected streams because unselected streams should not have any samples or process any samples. I don't like the name of the flushed_ variable because it doesn't accurately reflect the purpose of the variable, which is not to track the flushed state, but rather change how post-flush samples are processed before playback starts again (they are buffered and processed later after start is called). Will make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsThank you for the fix!
1. Can this change be covered by adding new tests here https://source.chromium.org/chromium/chromium/src/+/main:media/renderers/win/media_foundation_renderer_integration_test.cc or somewhere else? How can we ensure if this doesn't break the existing playback workflows?
2. Is this issue only happening on H264 codec or HEVC as well?
3. Is this issue only happening at the begging since `Start()` seems to be causing the race condition?
4. Is this `DRM_E_H264_SH_PPS_NOT_FOUND` error the only error that was caused by the issue?
Thanks for the feedback and good questions. Apologies for the delay, I was on-call last week.-
1. Currently familiarizing myself with the current test suite. The preexisting tests are very basic. If this coverage isn't sufficient to gain confidence existing playback workflows aren't broken, what has been sufficient for past changes in this area? Imo, I do think more general test scenarios need to be added to improve general playback workflow confidence for this area, but this seems like a larger scope of work that shouldn't block this small bug fix if it hasn't blocked previous fixes in this area. Aside from running the existing integration tests, I have manually tested this change using basic playback scenarios (playing/seeking/pause/play/rate change/etc.) with Netflix and haven't encountered any issues. I will look into adding some more tests that should improve the coverage for paths that would impact this change.
2. There's no reason this race condition couldn't happen with any media stream using this class regardless of codec type, but the impact of the dropped sample differs depending on how durable the codec is to dropped samples (particularly first key frames in this case). I'm only aware of a fatal error in the H264 nvidia HWDRM case.
3. This Start() method is also called in the seek path, so I wouldn't say the race condition happening mid-playback is out of the question. In my testing when I repro, it has always been at playback start at which point a seek can also occur to reach the point a user previously left off watching their content on Netflix.
4. DRM_E_SH_PPS_NOT_FOUND is the only fatal error I'm aware of. Something about the HWDRM H264 implementation with Nvidia seem to be less durable to a dropped first IDR frame and results in that fatal error. On non-nvidia hardware I've tested, the result is just that playback start is delayed and plays normally at the start of the next video segment.
Let me know if you have any further questions
1.
> If this coverage isn't sufficient to gain confidence existing playback workflows aren't broken, what has been sufficient for past changes in this area? Imo, I do think more general test scenarios need to be added to improve general playback workflow confidence for this area, but this seems like a larger scope of work that shouldn't block this small bug fix if it hasn't blocked previous fixes in this area.
This insufficient test coverage in this area has been causing many issues even with small fixes. We have some technical debts such as http://crbug.com/40261337 and http://crbug.com/391538329. We discussed few times back but never made a good progress.
I have manually tested this change using basic playback scenarios (playing/seeking/pause/play/rate change/etc.) with Netflix and haven't encountered any issues.
Thank you for those manual tests!
I will look into adding some more tests that should improve the coverage for paths that would impact this change.
It would be great if you can help improve the test coverage in this area in the future.
2.
Ack
3.
Ack
4.
Interesting and good to known. In general, NVIDIA's playback success rate is more stable than others.
stream->SetFlushed(false);Evan WilliamsNow you moved this to `QueueStartedEvent()` and `QueueSeekedEvent()` respectively. If `selected` is false, then now it doesn't change flushed to false anymore. Is this expected?
In practice, the flushed_ state should be irrelevant to unselected streams because unselected streams should not have any samples or process any samples. I don't like the name of the flushed_ variable because it doesn't accurately reflect the purpose of the variable, which is not to track the flushed state, but rather change how post-flush samples are processed before playback starts again (they are buffered and processed later after start is called). Will make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
Will make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
With the integration tests, they don't allow for testing more nuanced scenarios that this change impacts so I'm creating new unittests for MediaFoundationStreamWrapper. I'm hoping to have these up for review within the next 2 days.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan Williams@evanwi...@microsoft.com, what's your plan on this?
With the integration tests, they don't allow for testing more nuanced scenarios that this change impacts so I'm creating new unittests for MediaFoundationStreamWrapper. I'm hoping to have these up for review within the next 2 days.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Added seeking unittests to existing integration test suite. Also added MediaFoundationStreamWrapper unittests that exercise logic around buffering sample requests post-flush and pre-start.
stream->SetFlushed(false);Evan WilliamsNow you moved this to `QueueStartedEvent()` and `QueueSeekedEvent()` respectively. If `selected` is false, then now it doesn't change flushed to false anymore. Is this expected?
Sangbaek ParkIn practice, the flushed_ state should be irrelevant to unselected streams because unselected streams should not have any samples or process any samples. I don't like the name of the flushed_ variable because it doesn't accurately reflect the purpose of the variable, which is not to track the flushed state, but rather change how post-flush samples are processed before playback starts again (they are buffered and processed later after start is called). Will make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
Will make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
This sounds good to me, thanks!
Changed SetSelected(false) to clear buffering_post_flush_samples_ (previously named flushed_) and also no longer set buffering_post_flush_samples_ when an unselected stream is flushed. In practice, this should be irrelevant as mentioned above, but it does stay consistent with the previous behavior of always clearing buffering_post_flush_samples_ for both selected and unselected streams.
DVLOG_FUNC(2) << "flushed=false";nit: Rename to `flushed_` since `flushed` param is no longer available?
Renamed flushed_ variable to buffering_post_flush_samples_ as it's more descriptive of the actual purpose of the variable. Updated logging to indicate when that variable is set.
DVLOG_FUNC(2) << "flushed=false";nit: ditto
Renamed flushed_ variable to buffering_post_flush_samples_ as it's more descriptive of the actual purpose of the variable. Updated logging to indicate when that variable is set.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Please fix build errors on some trybots.
stream->SetFlushed(false);Evan WilliamsNow you moved this to `QueueStartedEvent()` and `QueueSeekedEvent()` respectively. If `selected` is false, then now it doesn't change flushed to false anymore. Is this expected?
Sangbaek ParkIn practice, the flushed_ state should be irrelevant to unselected streams because unselected streams should not have any samples or process any samples. I don't like the name of the flushed_ variable because it doesn't accurately reflect the purpose of the variable, which is not to track the flushed state, but rather change how post-flush samples are processed before playback starts again (they are buffered and processed later after start is called). Will make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
Evan WilliamsWill make some changes that will add clarity to this variables purpose and also make the behavior consistent with clearing the flushed_ state when a stream is unselected.
This sounds good to me, thanks!
Changed SetSelected(false) to clear buffering_post_flush_samples_ (previously named flushed_) and also no longer set buffering_post_flush_samples_ when an unselected stream is flushed. In practice, this should be irrelevant as mentioned above, but it does stay consistent with the previous behavior of always clearing buffering_post_flush_samples_ for both selected and unselected streams.
Thank you so much for adding tests! I feel more comfortable with tests but I have a question. Please check out my comment.
DVLOG_FUNC(2) << "flushed=false";Evan Williamsnit: Rename to `flushed_` since `flushed` param is no longer available?
Renamed flushed_ variable to buffering_post_flush_samples_ as it's more descriptive of the actual purpose of the variable. Updated logging to indicate when that variable is set.
Ack. Sgtm, thanks!
DVLOG_FUNC(2) << "flushed=false";Evan Williamsnit: ditto
Renamed flushed_ variable to buffering_post_flush_samples_ as it's more descriptive of the actual purpose of the variable. Updated logging to indicate when that variable is set.
Acknowledged
The tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Please fix build errors on some trybots.
looking into this
The tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Half of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsThe tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Half of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
I see, the removed method in `MediaFoundationSourceWrapper`, not in `MediaFoundationStreamWrapper`.
I think it would be great if we can have a test at `MediaFoundationSourceWrapper` to mimick the race condition. Or you can try it in the next CL if it is too complex.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsPlease fix build errors on some trybots.
looking into this
Fixed by adding MEDIA_EXPORT to MediaFoundationSourceWrapper to make its symbols available to the unittest in debug builds
Evan WilliamsThe tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Sangbaek ParkHalf of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
I see, the removed method in `MediaFoundationSourceWrapper`, not in `MediaFoundationStreamWrapper`.
I think it would be great if we can have a test at `MediaFoundationSourceWrapper` to mimick the race condition. Or you can try it in the next CL if it is too complex.
Ah, I see what you're saying. Creating that test with MediaFoundationSourceWrapper as the UUT would be quite complex given the async nature of the problem. I think putting this work in the next CL to craft this test along with other general MediaFoundationSourceWrapper tests makes sense to not further delay getting this fix out.
The way I was able to manually test was by creating a test build where MediaFoundationSourceWrapper::Start() was modified to sleep for 500ms during the window of time after MediaFoundationStreamWrapper::SetFlush(false) was called but before MediaFoundationStreamWrapper::QueueStartedEvent() was called. This expanded the window of time a sample could be asynchronously processed and dropped (because the stream hadn't started). Before my change, the H264 fatal error would occur reliably on my nvidia device and after my change it wouldn't occur.
If holding off on crafting an equivalent unittest is okay with you, lmk if there's an existing CL I should mention this scenario in to make sure it isn't forgotten.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsPlease fix build errors on some trybots.
Evan Williamslooking into this
Fixed by adding MEDIA_EXPORT to MediaFoundationSourceWrapper to make its symbols available to the unittest in debug builds
Acknowledged
Evan WilliamsThe tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Sangbaek ParkHalf of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
Evan WilliamsI see, the removed method in `MediaFoundationSourceWrapper`, not in `MediaFoundationStreamWrapper`.
I think it would be great if we can have a test at `MediaFoundationSourceWrapper` to mimick the race condition. Or you can try it in the next CL if it is too complex.
Ah, I see what you're saying. Creating that test with MediaFoundationSourceWrapper as the UUT would be quite complex given the async nature of the problem. I think putting this work in the next CL to craft this test along with other general MediaFoundationSourceWrapper tests makes sense to not further delay getting this fix out.
The way I was able to manually test was by creating a test build where MediaFoundationSourceWrapper::Start() was modified to sleep for 500ms during the window of time after MediaFoundationStreamWrapper::SetFlush(false) was called but before MediaFoundationStreamWrapper::QueueStartedEvent() was called. This expanded the window of time a sample could be asynchronously processed and dropped (because the stream hadn't started). Before my change, the H264 fatal error would occur reliably on my nvidia device and after my change it wouldn't occur.
If holding off on crafting an equivalent unittest is okay with you, lmk if there's an existing CL I should mention this scenario in to make sure it isn't forgotten.
For now your new tests are enough to cover your new code. Please file a new crbug.com to follow up adding MediaFoundationSourceWrapper test to cover the race condition scenario. And add a comment somewhere in the code.
```
TODO(crbug.com/xyz): describe what we need to do
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Evan WilliamsThe tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Sangbaek ParkHalf of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
Evan WilliamsI see, the removed method in `MediaFoundationSourceWrapper`, not in `MediaFoundationStreamWrapper`.
I think it would be great if we can have a test at `MediaFoundationSourceWrapper` to mimick the race condition. Or you can try it in the next CL if it is too complex.
Sangbaek ParkAh, I see what you're saying. Creating that test with MediaFoundationSourceWrapper as the UUT would be quite complex given the async nature of the problem. I think putting this work in the next CL to craft this test along with other general MediaFoundationSourceWrapper tests makes sense to not further delay getting this fix out.
The way I was able to manually test was by creating a test build where MediaFoundationSourceWrapper::Start() was modified to sleep for 500ms during the window of time after MediaFoundationStreamWrapper::SetFlush(false) was called but before MediaFoundationStreamWrapper::QueueStartedEvent() was called. This expanded the window of time a sample could be asynchronously processed and dropped (because the stream hadn't started). Before my change, the H264 fatal error would occur reliably on my nvidia device and after my change it wouldn't occur.
If holding off on crafting an equivalent unittest is okay with you, lmk if there's an existing CL I should mention this scenario in to make sure it isn't forgotten.
For now your new tests are enough to cover your new code. Please file a new crbug.com to follow up adding MediaFoundationSourceWrapper test to cover the race condition scenario. And add a comment somewhere in the code.
```
TODO(crbug.com/xyz): describe what we need to do
```
Created
Evan WilliamsThe tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Sangbaek ParkHalf of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
Evan WilliamsI see, the removed method in `MediaFoundationSourceWrapper`, not in `MediaFoundationStreamWrapper`.
I think it would be great if we can have a test at `MediaFoundationSourceWrapper` to mimick the race condition. Or you can try it in the next CL if it is too complex.
Sangbaek ParkAh, I see what you're saying. Creating that test with MediaFoundationSourceWrapper as the UUT would be quite complex given the async nature of the problem. I think putting this work in the next CL to craft this test along with other general MediaFoundationSourceWrapper tests makes sense to not further delay getting this fix out.
The way I was able to manually test was by creating a test build where MediaFoundationSourceWrapper::Start() was modified to sleep for 500ms during the window of time after MediaFoundationStreamWrapper::SetFlush(false) was called but before MediaFoundationStreamWrapper::QueueStartedEvent() was called. This expanded the window of time a sample could be asynchronously processed and dropped (because the stream hadn't started). Before my change, the H264 fatal error would occur reliably on my nvidia device and after my change it wouldn't occur.
If holding off on crafting an equivalent unittest is okay with you, lmk if there's an existing CL I should mention this scenario in to make sure it isn't forgotten.
For now your new tests are enough to cover your new code. Please file a new crbug.com to follow up adding MediaFoundationSourceWrapper test to cover the race condition scenario. And add a comment somewhere in the code.
```
TODO(crbug.com/xyz): describe what we need to do
```
Created crbug.com/460732308 and added a TODO comment as suggested
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
lgtm, thank you so much for the fix and adding tests!
Evan WilliamsThe tests here are testing what we expect which is great. Do we have a test that simulates the previous race condition? For example, without your changes that test will fail but with your changes now it should pass.
If that type of test already exists, then please add a comment to it. Thanks!
Sangbaek ParkHalf of the previous race condition scenario relied on the MediaFoundationStreamWrapper::SetFlushed(false) method which was removed. This limits the user's ability (me as a tester in this case) to manufacture how the race condition previously occurred, so I don't see how it's possible to do what you're suggesting. Please correct me if I'm missing something.
Evan WilliamsI see, the removed method in `MediaFoundationSourceWrapper`, not in `MediaFoundationStreamWrapper`.
I think it would be great if we can have a test at `MediaFoundationSourceWrapper` to mimick the race condition. Or you can try it in the next CL if it is too complex.
Sangbaek ParkAh, I see what you're saying. Creating that test with MediaFoundationSourceWrapper as the UUT would be quite complex given the async nature of the problem. I think putting this work in the next CL to craft this test along with other general MediaFoundationSourceWrapper tests makes sense to not further delay getting this fix out.
The way I was able to manually test was by creating a test build where MediaFoundationSourceWrapper::Start() was modified to sleep for 500ms during the window of time after MediaFoundationStreamWrapper::SetFlush(false) was called but before MediaFoundationStreamWrapper::QueueStartedEvent() was called. This expanded the window of time a sample could be asynchronously processed and dropped (because the stream hadn't started). Before my change, the H264 fatal error would occur reliably on my nvidia device and after my change it wouldn't occur.
If holding off on crafting an equivalent unittest is okay with you, lmk if there's an existing CL I should mention this scenario in to make sure it isn't forgotten.
Evan WilliamsFor now your new tests are enough to cover your new code. Please file a new crbug.com to follow up adding MediaFoundationSourceWrapper test to cover the race condition scenario. And add a comment somewhere in the code.
```
TODO(crbug.com/xyz): describe what we need to do
```
Created crbug.com/460732308 and added a TODO comment as suggested
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I'll take a look later today or tomorrow.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
buffering_post_flush_samples_ = true;Sorry for the ignorance and I haven't looked at this code for a while. But I am a bit confused about the relationship between MediaFoundationSourceWrapper::Start() and MediaFoundationSourceWrapper::FlushStreams(). The latter is called by the media pipeline during seek, and I guess after that the MediaEngine will call the former. But it doesn't make sense since we set `buffering_post_flush_samples_` to be true in Flush(), but set it to be false when we queue the Started/Seeked events. But we need it to be true to buffer samples. So I am really confused now. Do you mind help explaining the sequence of events once again for my understanding? Thanks!
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
buffering_post_flush_samples_ = true;Sorry for the ignorance and I haven't looked at this code for a while. But I am a bit confused about the relationship between MediaFoundationSourceWrapper::Start() and MediaFoundationSourceWrapper::FlushStreams(). The latter is called by the media pipeline during seek, and I guess after that the MediaEngine will call the former. But it doesn't make sense since we set `buffering_post_flush_samples_` to be true in Flush(), but set it to be false when we queue the Started/Seeked events. But we need it to be true to buffer samples. So I am really confused now. Do you mind help explaining the sequence of events once again for my understanding? Thanks!
After the seek/flush, the stream can still receive data it's prepared to send to the MF pipeline, but as part of the MF design it's up to the source to control when that stream data can actually be sent to the pipeline to satisfy MF pipeline sample requests. So the purpose of buffering_post_flush_samples_ is to flag when storing data the stream obtains is required because it cannot send that data until the source allows the flow of samples via MediaFoundationSourceWrapper::Start(). Once the source is started, the special handling to require storing stream data is no longer required and the stream can satisfy any MF pipeline sample requests as they come; this is why buffering_post_flush_samples_ is set false once the started/seek event is queued. Not satisfying this MF contract results in the bug this change is trying to address where the stream sends a sample prior to the start event being queued and it is dropped because the stream should not be sending sample requests to the MF pipeline at this time. Hope that makes sense, let me know if you have any further questions!
nit: `CHECK(selected_);` for readibility
As things currently work, selected_ is not guaranteed to be true for this call. In MediaFoundationSourceWrapper::Start(), the presentationDescriptor specifies which streams are selected and while MediaFoundationStreamWrapper is guaranteed to eventually have selected_ be true in this scenario, the selected_ state is not actually set on MediaFoundationStreamWrapper class until just after QueueStartedEvent/QueueSeekedEvent. I could potentially move around the logic to make this CHECK valid and I don't think it would cause issues, but I've generally been trying to keep things the way they were unless relevant to this specific issue. Lmk what you think.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
buffering_post_flush_samples_ = true;Evan WilliamsSorry for the ignorance and I haven't looked at this code for a while. But I am a bit confused about the relationship between MediaFoundationSourceWrapper::Start() and MediaFoundationSourceWrapper::FlushStreams(). The latter is called by the media pipeline during seek, and I guess after that the MediaEngine will call the former. But it doesn't make sense since we set `buffering_post_flush_samples_` to be true in Flush(), but set it to be false when we queue the Started/Seeked events. But we need it to be true to buffer samples. So I am really confused now. Do you mind help explaining the sequence of events once again for my understanding? Thanks!
After the seek/flush, the stream can still receive data it's prepared to send to the MF pipeline, but as part of the MF design it's up to the source to control when that stream data can actually be sent to the pipeline to satisfy MF pipeline sample requests. So the purpose of buffering_post_flush_samples_ is to flag when storing data the stream obtains is required because it cannot send that data until the source allows the flow of samples via MediaFoundationSourceWrapper::Start(). Once the source is started, the special handling to require storing stream data is no longer required and the stream can satisfy any MF pipeline sample requests as they come; this is why buffering_post_flush_samples_ is set false once the started/seek event is queued. Not satisfying this MF contract results in the bug this change is trying to address where the stream sends a sample prior to the start event being queued and it is dropped because the stream should not be sending sample requests to the MF pipeline at this time. Hope that makes sense, let me know if you have any further questions!
Thanks!
I understand why we set `buffering_post_flush_samples_` to false as part of the "Start" process. But I am still confused with the order of operations.
During a seek, MFR::Flush() will first be called, followed by MFR::StartPlayingFrom():
https://source.chromium.org/chromium/chromium/src/+/main:media/base/pipeline_impl.cc;drc=db569cd7847d479358bd391a18b2c28393803333;l=399
MFR::Flush() will call MediaFoundationStreamWrapper::Flush(), which will set `buffering_post_flush_samples_` to be true.
MFR::StartPlayingFrom() will call MediaEngine mf_media_engine_->SetCurrentTime() and mf_media_engine_->Start(), which I suppose will trigger MediaFoundationSourceWrapper::Start(), which will set `buffering_post_flush_samples_` to be false. (I might be wrong. Please correct me!)
If that's the case, when will we set `buffering_post_flush_samples_` to allow sending samples again? I think this is where I am lost. I am sure I am wrong or missing something, and would like to learn! Thanks!
Evan Williamsnit: `CHECK(selected_);` for readibility
As things currently work, selected_ is not guaranteed to be true for this call. In MediaFoundationSourceWrapper::Start(), the presentationDescriptor specifies which streams are selected and while MediaFoundationStreamWrapper is guaranteed to eventually have selected_ be true in this scenario, the selected_ state is not actually set on MediaFoundationStreamWrapper class until just after QueueStartedEvent/QueueSeekedEvent. I could potentially move around the logic to make this CHECK valid and I don't think it would cause issues, but I've generally been trying to keep things the way they were unless relevant to this specific issue. Lmk what you think.
Thanks for pointing this out. I think I got confused with the two different `selected` value.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
buffering_post_flush_samples_ = true;Evan WilliamsSorry for the ignorance and I haven't looked at this code for a while. But I am a bit confused about the relationship between MediaFoundationSourceWrapper::Start() and MediaFoundationSourceWrapper::FlushStreams(). The latter is called by the media pipeline during seek, and I guess after that the MediaEngine will call the former. But it doesn't make sense since we set `buffering_post_flush_samples_` to be true in Flush(), but set it to be false when we queue the Started/Seeked events. But we need it to be true to buffer samples. So I am really confused now. Do you mind help explaining the sequence of events once again for my understanding? Thanks!
Xiaohan WangAfter the seek/flush, the stream can still receive data it's prepared to send to the MF pipeline, but as part of the MF design it's up to the source to control when that stream data can actually be sent to the pipeline to satisfy MF pipeline sample requests. So the purpose of buffering_post_flush_samples_ is to flag when storing data the stream obtains is required because it cannot send that data until the source allows the flow of samples via MediaFoundationSourceWrapper::Start(). Once the source is started, the special handling to require storing stream data is no longer required and the stream can satisfy any MF pipeline sample requests as they come; this is why buffering_post_flush_samples_ is set false once the started/seek event is queued. Not satisfying this MF contract results in the bug this change is trying to address where the stream sends a sample prior to the start event being queued and it is dropped because the stream should not be sending sample requests to the MF pipeline at this time. Hope that makes sense, let me know if you have any further questions!
Thanks!
I understand why we set `buffering_post_flush_samples_` to false as part of the "Start" process. But I am still confused with the order of operations.
During a seek, MFR::Flush() will first be called, followed by MFR::StartPlayingFrom():
https://source.chromium.org/chromium/chromium/src/+/main:media/base/pipeline_impl.cc;drc=db569cd7847d479358bd391a18b2c28393803333;l=399MFR::Flush() will call MediaFoundationStreamWrapper::Flush(), which will set `buffering_post_flush_samples_` to be true.
MFR::StartPlayingFrom() will call MediaEngine mf_media_engine_->SetCurrentTime() and mf_media_engine_->Start(), which I suppose will trigger MediaFoundationSourceWrapper::Start(), which will set `buffering_post_flush_samples_` to be false. (I might be wrong. Please correct me!)
If that's the case, when will we set `buffering_post_flush_samples_` to allow sending samples again? I think this is where I am lost. I am sure I am wrong or missing something, and would like to learn! Thanks!
Your understanding of the sequence of events looks correct. After buffering_post_flush_samples_ is set to false from MediaFoundationSourceWrapper::Start(), we actually don't expect it to be set true again (atleast until the next seek). When this flag is set, it prevents allowing samples from being sent to the MF pipeline, so we don't want it set true anytime after start. While set, buffering_post_flush_samples_ will store the stream data it receives in a dedicated buffer called post_flush_buffers_. Once buffering_post_flush_samples_ is set false, MF pipeline sample requests will be satisfied by pulling data out of post_flush_buffers_ until its empty. At that point, post_flush_buffers_ will no longer be used and sample requests will be satisfied as data becomes available to the stream. Let me know if that adds clarity or if there's some more questions you have
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
I didn't review the tests. The fix LGTM!
buffering_post_flush_samples_ = true;Evan WilliamsSorry for the ignorance and I haven't looked at this code for a while. But I am a bit confused about the relationship between MediaFoundationSourceWrapper::Start() and MediaFoundationSourceWrapper::FlushStreams(). The latter is called by the media pipeline during seek, and I guess after that the MediaEngine will call the former. But it doesn't make sense since we set `buffering_post_flush_samples_` to be true in Flush(), but set it to be false when we queue the Started/Seeked events. But we need it to be true to buffer samples. So I am really confused now. Do you mind help explaining the sequence of events once again for my understanding? Thanks!
Xiaohan WangAfter the seek/flush, the stream can still receive data it's prepared to send to the MF pipeline, but as part of the MF design it's up to the source to control when that stream data can actually be sent to the pipeline to satisfy MF pipeline sample requests. So the purpose of buffering_post_flush_samples_ is to flag when storing data the stream obtains is required because it cannot send that data until the source allows the flow of samples via MediaFoundationSourceWrapper::Start(). Once the source is started, the special handling to require storing stream data is no longer required and the stream can satisfy any MF pipeline sample requests as they come; this is why buffering_post_flush_samples_ is set false once the started/seek event is queued. Not satisfying this MF contract results in the bug this change is trying to address where the stream sends a sample prior to the start event being queued and it is dropped because the stream should not be sending sample requests to the MF pipeline at this time. Hope that makes sense, let me know if you have any further questions!
Evan WilliamsThanks!
I understand why we set `buffering_post_flush_samples_` to false as part of the "Start" process. But I am still confused with the order of operations.
During a seek, MFR::Flush() will first be called, followed by MFR::StartPlayingFrom():
https://source.chromium.org/chromium/chromium/src/+/main:media/base/pipeline_impl.cc;drc=db569cd7847d479358bd391a18b2c28393803333;l=399MFR::Flush() will call MediaFoundationStreamWrapper::Flush(), which will set `buffering_post_flush_samples_` to be true.
MFR::StartPlayingFrom() will call MediaEngine mf_media_engine_->SetCurrentTime() and mf_media_engine_->Start(), which I suppose will trigger MediaFoundationSourceWrapper::Start(), which will set `buffering_post_flush_samples_` to be false. (I might be wrong. Please correct me!)
If that's the case, when will we set `buffering_post_flush_samples_` to allow sending samples again? I think this is where I am lost. I am sure I am wrong or missing something, and would like to learn! Thanks!
Your understanding of the sequence of events looks correct. After buffering_post_flush_samples_ is set to false from MediaFoundationSourceWrapper::Start(), we actually don't expect it to be set true again (atleast until the next seek). When this flag is set, it prevents allowing samples from being sent to the MF pipeline, so we don't want it set true anytime after start. While set, buffering_post_flush_samples_ will store the stream data it receives in a dedicated buffer called post_flush_buffers_. Once buffering_post_flush_samples_ is set false, MF pipeline sample requests will be satisfied by pulling data out of post_flush_buffers_ until its empty. At that point, post_flush_buffers_ will no longer be used and sample requests will be satisfied as data becomes available to the stream. Let me know if that adds clarity or if there's some more questions you have
Ah, I see. Thanks for the explanation! Now I remember how it works. Then the fix is clear.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
lgtm
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
media: Fix race condition resulting in dropped sample
fixing race condition where stream sample is processed before the
stream start event is queued resulting in lost sample.
This CL addresses the race condition by removing the brief window
where the stream flush state is false (allowing sample processing)
but the stream started event has not been sent. With this change,
the flush state and start event occur together while under stream
lock (sample processing requires the lock).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |