TestingProfile may destroy the profile directory while it's still used

29 views
Skip to first unread message

Alexandr Ilin

unread,
Jun 7, 2018, 1:33:00 PM6/7/18
to Chromium-dev, scheduler-dev, storage-dev, Egor Pasko, Victor Costan

TL;DR

TestingProfile may destroy the profile directory while some background task still works with files inside of the directory. This leads to obscure flaky disk I/O errors in sqlite code: https://crbug.com/839886, https://crbug.com/839371.


Many tests use TestingProfile with the TestBrowserThreadBundle, for example, all subclasses of ChromeRenderViewHostTestHarness. After a test finishes, TestingProfile is destroyed on the UI thread together with the temporary profile directory, before all background tasks are executed.


This creates a problem because the code dealing with databases that are put into the profile directory usually offloads all disk I/O operations to a background task sequence. Some of such code may be executed in unit tests, often unintentionally. If a disk I/O operation executes after the profile directory is deleted, it will fail. It wouldn't be such a problem if sqlite doesn't treat disk I/O errors as DCHECK errors. I observed a similar problem with LevelDB databases but LevelDB only produces a warning in this case.


I've prepared a CL that deflakes this error and many unit tests start failing on all platforms.


I see a couple of different solutions to this problem:

  • Leave this race condition as is but don't call DCHECK on disk I/O failures in sqlite. This work is tracked here: https://crbug.com/839186.

  • Destroy the testing profile directory only after all background tasks are executed. It can be done for ChromeRenderViewHostTestHarness but it's not clear how to achieve this across all tests because the TestingProfile and the TestBrowserThreadBundle are separate abstractions.

  • Do not execute the DB code in tests that don't expect that. It is hard to do given that tests can pull a lot of dependencies.

  • Inject main thread task runners instead of background task runners for the DB code for testing.

  • Replace on-disk DBs with in-memory DBs for testing.


Any ideas?



Gabriel Charette

unread,
Jun 14, 2018, 4:21:02 PM6/14/18
to Alexandr Ilin, Chromium-dev, Egor Pasko, Victor Costan, scheduler-dev, storage-dev
Invoke TestBrowserThreadBundle::RunUntilIdle() from the TearDown() phase to flush any remaining tasks?

--
You received this message because you are subscribed to the Google Groups "scheduler-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheduler-de...@chromium.org.
To post to this group, send email to schedu...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAH542OQ1bCq%2BeVimcDvL4nC22krHitGT%2BH%3DsA1bFfwEqMx3hwg%40mail.gmail.com.

Egor Pasko

unread,
Jun 15, 2018, 12:20:40 PM6/15/18
to alex...@chromium.org, chromium-dev, scheduler-dev, stora...@chromium.org, pwn...@chromium.org
On Thu, Jun 7, 2018 at 7:31 PM Alexandr Ilin <alex...@chromium.org> wrote:

TL;DR

TestingProfile may destroy the profile directory while some background task still works with files inside of the directory. This leads to obscure flaky disk I/O errors in sqlite code: https://crbug.com/839886, https://crbug.com/839371.


Many tests use TestingProfile with the TestBrowserThreadBundle, for example, all subclasses of ChromeRenderViewHostTestHarness. After a test finishes, TestingProfile is destroyed on the UI thread together with the temporary profile directory, before all background tasks are executed.


This creates a problem because the code dealing with databases that are put into the profile directory usually offloads all disk I/O operations to a background task sequence. Some of such code may be executed in unit tests, often unintentionally. If a disk I/O operation executes after the profile directory is deleted, it will fail. It wouldn't be such a problem if sqlite doesn't treat disk I/O errors as DCHECK errors. I observed a similar problem with LevelDB databases but LevelDB only produces a warning in this case.


I've prepared a CL that deflakes this error and many unit tests start failing on all platforms.


Oh, that's a really nice way to show about 300 tests flaky for a known reason!

PS: I thought the definition of 'deflake' implies making tests PASS, but this one makes flaky tests FAIL more reproducibly, is that correct?
 

I see a couple of different solutions to this problem:

  • Leave this race condition as is but don't call DCHECK on disk I/O failures in sqlite. This work is tracked here: https://crbug.com/839186.


I think this may cause other errors with these databases, just more subtly.
 
  • Destroy the testing profile directory only after all background tasks are executed. It can be done for ChromeRenderViewHostTestHarness but it's not clear how to achieve this across all tests because the TestingProfile and the TestBrowserThreadBundle are separate abstractions.


Isn't it already done for ChromeRenderViewHostTestHarness by calling thread_bundle_.reset() in RenderViewHostTestHarness::TearDown()? If it works this way, then probably the recommended solution is to use TestBrowserThreadBundle in more places? .. though my understanding of the latter is quite limited.

I see thread_bundle_.RunUntilIdle() in various other teardowns supposedly for the same reason. 

Gabriel Charette

unread,
Jun 19, 2018, 9:09:27 AM6/19/18
to Egor Pasko, alex...@chromium.org, chromium-dev, pwn...@chromium.org, scheduler-dev, stora...@chromium.org


Le ven. 15 juin 2018 12 h 19, Egor Pasko <pa...@chromium.org> a écrit :


On Thu, Jun 7, 2018 at 7:31 PM Alexandr Ilin <alex...@chromium.org> wrote:

TL;DR

TestingProfile may destroy the profile directory while some background task still works with files inside of the directory. This leads to obscure flaky disk I/O errors in sqlite code: https://crbug.com/839886, https://crbug.com/839371.


Many tests use TestingProfile with the TestBrowserThreadBundle, for example, all subclasses of ChromeRenderViewHostTestHarness. After a test finishes, TestingProfile is destroyed on the UI thread together with the temporary profile directory, before all background tasks are executed.


This creates a problem because the code dealing with databases that are put into the profile directory usually offloads all disk I/O operations to a background task sequence. Some of such code may be executed in unit tests, often unintentionally. If a disk I/O operation executes after the profile directory is deleted, it will fail. It wouldn't be such a problem if sqlite doesn't treat disk I/O errors as DCHECK errors. I observed a similar problem with LevelDB databases but LevelDB only produces a warning in this case.


I've prepared a CL that deflakes this error and many unit tests start failing on all platforms.


Oh, that's a really nice way to show about 300 tests flaky for a known reason!

PS: I thought the definition of 'deflake' implies making tests PASS, but this one makes flaky tests FAIL more reproducibly, is that correct?
 

I see a couple of different solutions to this problem:

  • Leave this race condition as is but don't call DCHECK on disk I/O failures in sqlite. This work is tracked here: https://crbug.com/839186.


I think this may cause other errors with these databases, just more subtly.
 
  • Destroy the testing profile directory only after all background tasks are executed. It can be done for ChromeRenderViewHostTestHarness but it's not clear how to achieve this across all tests because the TestingProfile and the TestBrowserThreadBundle are separate abstractions.


Isn't it already done for ChromeRenderViewHostTestHarness by calling thread_bundle_.reset() in RenderViewHostTestHarness::TearDown()? If it works this way, then probably the recommended solution is to use TestBrowserThreadBundle in more places? .. though my understanding of the latter is quite limited.

Yes, it's already done there but the problem here, I think, is that tasks bound Unretained to the TestingProfile are running after it was destroyed (i.e. need to RunUntilIdle() *before* destroying the TestingProfile, but must still overall reset the TestBrowserThreadBundle last because I think ~TestingProfile() assumes a functional task environment...).


I see thread_bundle_.RunUntilIdle() in various other teardowns supposedly for the same reason. 

  • Do not execute the DB code in tests that don't expect that. It is hard to do given that tests can pull a lot of dependencies.

  • Inject main thread task runners instead of background task runners for the DB code for testing.

  • Replace on-disk DBs with in-memory DBs for testing.


Any ideas?


--
You received this message because you are subscribed to the Google Groups "scheduler-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheduler-de...@chromium.org.
To post to this group, send email to schedu...@chromium.org.
Reply all
Reply to author
Forward
0 new messages