[llvm-dev] [lit] check-all hanging


David Greene via llvm-dev

Jan 2, 2019, 1:09:13 PM
to llvm...@lists.llvm.org
Hi,

From time to time, I see check-all hang during running of lit tests.
The hang always happens at the > 90% completion stage and I'm pretty
sure all tests have been run and check-all is just waiting for
lit/python to exit. I see a single python process running, taking
very little CPU time. An strace of that process shows this:

select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32168}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
futex(0x3bcc8c0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3bcc8c0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = 0
futex(0x3bcc8c0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x3bcc8c0, FUTEX_WAKE_PRIVATE, 1) = 1
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
futex(0x3bcc8c0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x3bcc8c0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3bcc8c0, FUTEX_WAKE_PRIVATE, 1) = 1
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)

It appears that python is waiting for some I/O (or some other event)
that never arrives.
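A note on reading the trace above: `select(0, NULL, NULL, NULL, {0, 50000})` watches no file descriptors at all, so each call is simply a 50 ms sleep. The run of 1, 2, 4, 8, 16 and 32 ms sleeps followed by 50 ms sleeps (plus an occasional shorter one such as 32168 µs, the tail end of a deadline) appears to match the polling loop CPython 2.7 uses for `threading.Condition.wait(timeout)`, where `time.sleep` is built on `select()`. If so, lit's main thread is waiting, with a timeout, for a result from a worker (and ultimately from some test subprocess) rather than doing I/O itself. This assumes lit was running under Python 2.7; the helper name `condition_wait_sleeps` is hypothetical. The sketch below reimplements that backoff schedule so it can be compared with the strace:

```python
def condition_wait_sleeps(timeout, initial=0.0005, cap=0.05):
    """Sleep lengths (seconds) of a CPython-2.7-style Condition.wait(timeout)
    poll loop, assuming the lock is never released before the deadline."""
    sleeps, elapsed, delay = [], 0.0, initial
    while timeout - elapsed > 1e-12:  # small tolerance for float rounding
        # Double the delay each round, but never sleep past the deadline
        # and never longer than the 50 ms cap.
        delay = min(delay * 2, timeout - elapsed, cap)
        sleeps.append(delay)
        elapsed += delay
    return sleeps

# Microseconds, as strace prints the select() timeouts:
print([round(s * 1e6) for s in condition_wait_sleeps(0.2)])
# prints [1000, 2000, 4000, 8000, 16000, 32000, 50000, 50000, 37000]
```

Repeated back-to-back calls of such a wait produce exactly the "50000 ... 32168, then 1000, 2000, ..." shape seen in the trace.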

Has anyone else seen this before? Any ideas of what is going on or how
to fix it?

-David
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chandler Carruth via llvm-dev

Jan 2, 2019, 4:41:51 PM
to David Greene, llvm...@lists.llvm.org
What you're seeing is just the fact that lit is waiting on subprocesses (select is waiting on the pipes, I suspect).

Anyway, to make progress you'll need to dig into what it is waiting on, and what *that* process is stuck doing.

I've not seen anything like this, but I basically never run `check-all` these days because LLDB and sanitizer tests are too flaky. =[ Sadly, I've not been able to interest anyone in fixing this either.
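One way to see what lit is waiting on (Linux only, via `/proc`) is to list every descendant of the stuck python process together with its command line; a stuck test binary usually stands out. The sketch below is a minimal, hypothetical helper (`descendants` is not part of lit). Where it is installed, `pstree -p <pid>` gives a similar view.

```python
import os
import sys

def _parent_pid(proc_root, pid):
    # /proc/<pid>/stat is "pid (comm) state ppid ..."; comm may itself contain
    # spaces or ')', so parse the fields after the LAST ')'.
    with open(os.path.join(proc_root, str(pid), "stat"), "rb") as f:
        stat = f.read().decode(errors="replace")
    return int(stat[stat.rindex(")") + 2:].split()[1])

def _cmdline(proc_root, pid):
    # Arguments in /proc/<pid>/cmdline are NUL-separated.
    with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
        return f.read().replace(b"\0", b" ").decode(errors="replace").strip()

def descendants(root_pid, proc_root="/proc"):
    """Return {pid: command line} for every live descendant of root_pid."""
    parent = {}
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            try:
                parent[int(entry)] = _parent_pid(proc_root, entry)
            except (OSError, ValueError):
                pass  # process exited while we were scanning
    found, frontier = {}, [root_pid]
    while frontier:
        pid = frontier.pop()
        for child, ppid in parent.items():
            if ppid == pid and child not in found:
                try:
                    found[child] = _cmdline(proc_root, child)
                except OSError:
                    found[child] = "<exited>"
                frontier.append(child)
    return found

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 lit_children.py <pid of the stuck lit/python process>
    for pid, cmd in sorted(descendants(int(sys.argv[1])).items()):
        print(pid, cmd)
```

The `proc_root` parameter exists only so the parsing can be exercised against a fake directory tree; once the stuck child is known, it can be inspected further (e.g. with strace or gdb).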

Joel E. Denny via llvm-dev

Jan 2, 2019, 5:05:45 PM
to Chandler Carruth, David Greene, llvm...@lists.llvm.org
Hi David, Chandler,

I see lldb tests hang often, and then I kill the dotest process.

I'd like to stop running check-all too, but I feel it's important when I modify FileCheck.  The flakiness that Chandler mentioned makes it time-consuming to verify test results.

Joel

Chandler Carruth via llvm-dev

Jan 2, 2019, 5:51:16 PM
to Joel E. Denny, David Greene, llvm...@lists.llvm.org
Might be worth reporting this on the lldb list?

Kuba Mracek via llvm-dev

Jan 2, 2019, 5:51:59 PM
to Joel E. Denny, Joel E. Denny via llvm-dev, Frédéric Riss, Chandler Carruth, David Greene
+Fred, +me

For LLDB tests: I believe this got much, much better recently. Are you still seeing flaky LLDB tests? Any details you can share?
For sanitizer tests: I'm very much interested in removing flakiness as well. Are there specific tests you see as flaky?

Kuba

Joel E. Denny via llvm-dev

Jan 3, 2019, 10:33:25 AM
to Kuba Mracek, Joel E. Denny via llvm-dev, David Greene
All,

Thanks for the replies.  Kuba: For LLDB, when were things expected to have improved?  It's possible things improved for me at some point, but this isn't something I've found time to track carefully, and I still see problems.

I ran check-all a couple of times last night at r350238, which I pulled yesterday.  Here are the results:

```
********************
Testing Time: 5043.24s
********************
Unexpected Passing Tests (2):
    lldb-Suite :: functionalities/asan/TestMemoryHistory.py
    lldb-Suite :: functionalities/asan/TestReportData.py

********************
Failing Tests (54):
    Clang :: CXX/modules-ts/basic/basic.link/p2/module.cpp
    Clang :: Modules/ExtDebugInfo.cpp
    Clang :: Modules/using-directive-redecl.cpp
    Clang :: Modules/using-directive.cpp
    Clang :: PCH/chain-late-anonymous-namespace.cpp
    Clang :: PCH/cxx-namespaces.cpp
    Clang :: PCH/namespaces.cpp
    LLDB :: ExecControl/StopHook/stop-hook-threads.test
    LeakSanitizer-AddressSanitizer-x86_64 :: TestCases/Linux/use_tls_dynamic.cc
    LeakSanitizer-Standalone-x86_64 :: TestCases/Linux/use_tls_dynamic.cc
    MemorySanitizer-X86_64 :: dtls_test.c
    MemorySanitizer-lld-X86_64 :: dtls_test.c
    lldb-Suite :: functionalities/register/register_command/TestRegisters.py
    lldb-Suite :: tools/lldb-server/TestGdbRemoteRegisterState.py
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestBorrowedReferences
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestDictionaryResolutionWithDot
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestExtractingUInt64ThroughStructuredData
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestGlobalNameResolutionNoDot
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestGlobalNameResolutionWithDot
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestInstanceNameResolutionNoDot
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestModuleNameResolutionNoDot
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestObjectAttributes
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestOwnedReferences
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonByteArray
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonBytes
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonCallableCheck
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonCallableInvoke
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonDictionaryManipulation
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonDictionaryToStructuredDictionary
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonDictionaryValueEquality
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonFile
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonInteger
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonIntegerToStr
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonIntegerToStructuredInteger
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonListManipulation
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonListToStructuredList
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonListValueEquality
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonString
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonStringToStr
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonStringToStructuredString
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonTupleInitializerList
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonTupleInitializerList2
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonTupleSize
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonTupleToStructuredList
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestPythonTupleValues
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestResetting
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonDataObjectsTest.TestTypeNameResolutionNoDot
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestAcquisitionSemantics
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestAutoRestoreChanged
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestAutoRestoreSemantics
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestDiscardSemantics
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestExceptionStateChecking
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestManualRestoreSemantics
    lldb-Unit :: ScriptInterpreter/Python/./ScriptInterpreterPythonTests/PythonExceptionStateTest.TestResetSemantics

  Expected Passes    : 57489
  Expected Failures  : 276
  Unsupported Tests  : 1883
  Unexpected Passes  : 2
  Unexpected Failures: 54

14 warning(s) in tests.
FAILED: CMakeFiles/check-all
```

I immediately ran it again and saw one new unexpected fail:

```
    lldb-Suite :: tools/lldb-mi/syntax/TestMiSyntax.py
```

and one new unresolved test:

```
    lldb-Suite :: tools/lldb-vscode/breakpoint/TestVSCode_setBreakpoints.py
```
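Comparing two runs' failure lists by eye is error-prone with 50+ entries. A small sketch that extracts each summary section and diffs them, assuming the lit summary format shown above (`<Kind> Tests (<count>):` headers followed by indented `Suite :: path` lines); other lit versions may word the headers differently. `parse_lit_summary` and `new_in_second` are hypothetical helpers, not part of lit.

```python
import re

# Assumed format: "Failing Tests (54):" then lines like "    Clang :: PCH/namespaces.cpp"
HEADER = re.compile(r"^\s*(.+?) Tests \((\d+)\):\s*$")
ENTRY = re.compile(r"^\s+(\S.* :: \S.*?)\s*$")

def parse_lit_summary(text):
    """Return {section name: set of test names}, e.g. {"Failing": {...}}."""
    sections, current = {}, None
    for line in text.splitlines():
        header = HEADER.match(line)
        entry = ENTRY.match(line)
        if header:
            current = sections.setdefault(header.group(1), set())
        elif entry and current is not None:
            current.add(entry.group(1))
        else:
            current = None  # blank line, "****" rule, or totals: section ended
    return sections

def new_in_second(first, second, section="Failing"):
    """Tests listed under `section` in the second run but not in the first."""
    return sorted(second.get(section, set()) - first.get(section, set()))
```

For example, `new_in_second(parse_lit_summary(log1), parse_lit_summary(log2), "Unresolved")` would report the TestVSCode_setBreakpoints.py entry above, where `log1`/`log2` are the captured outputs of the two runs.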

On the second run but not the first, it hung all night long waiting for TestVSCode_setBreakpoints.py to terminate.  I killed dotest.py to get the final results.

I currently clone <https://github.com/llvm-project/llvm-project-20170507>.  I configure with `BUILD_SHARED_LIBS=true` and `-DLLVM_ENABLE_PROJECTS='clang;openmp;libcxx;libcxxabi;lldb;compiler-rt;lld;polly'`, among other options.  I have to run check-all with LD_LIBRARY_PATH pointing at my build's lib directory, or there are many more LLDB failures.  I believe that's not true for most test suites.  I'm building and testing under Ubuntu 18.04.1.
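A minimal sketch of the LD_LIBRARY_PATH workaround described above, assuming the shared libraries end up in `<build>/lib` (as with `BUILD_SHARED_LIBS`) and a Ninja build; `check_all_env` is a hypothetical helper, not part of LLVM's tooling.

```python
import os

def check_all_env(build_dir, base_env=None):
    """Copy of the environment with <build_dir>/lib prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    libdir = os.path.join(os.path.abspath(build_dir), "lib")
    old = env.get("LD_LIBRARY_PATH")
    # Prepend so the freshly built libraries win over any installed copies.
    env["LD_LIBRARY_PATH"] = libdir + (os.pathsep + old if old else "")
    return env

# Hypothetical usage (needs `import subprocess`):
# subprocess.run(["ninja", "-C", "build", "check-all"],
#                env=check_all_env("build"), check=True)
```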

Hope that helps.  I'm happy to provide more details.  Just tell me where you'd like to start.

Thanks.

Joel

Kuba Mracek via llvm-dev

Jan 3, 2019, 10:53:47 AM
to Joel E. Denny, fr...@apple.com, k...@google.com, Joel E. Denny via llvm-dev, David Greene
+Fred, +Kostya


David Greene via llvm-dev

Jan 3, 2019, 4:01:16 PM
to Joel E. Denny, llvm...@lists.llvm.org
We're not running lldb tests, so something else is going on. I'll dig
into it.

-David

David Greene via llvm-dev

Jan 3, 2019, 4:21:23 PM
to Chandler Carruth, llvm...@lists.llvm.org, Alexey Samsonov
Chandler Carruth via llvm-dev <llvm...@lists.llvm.org> writes:

> What you're seeing is just the fact that lit is waiting on
> subprocesses (select is waiting on the pipes i suspect).

Right. Some digging revealed that it is waiting on
getline_nohang.cc.tmp, a tsan test.

I see that this test has been disabled for NetBSD, due to it sometimes
failing. I'm seeing the same on Linux.

How can we stabilize the sanitizer tests so that check-all can work
reliably? If some sanitizer tests are so flaky, I should think they
should be marked UNSUPPORTED. Who has the authority to make those
determinations?
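Independently of which tests get marked UNSUPPORTED, a per-test watchdog is one way to turn a hang like this into a reported failure instead of a stalled check-all. (Newer lit versions have their own per-test `--timeout` option, which I believe needs the `psutil` module; check `lit --help` for your version.) The sketch below only illustrates the idea; `run_with_timeout` is a hypothetical name. The key design choice is killing the whole process group, because a surviving grandchild that still holds the output pipe is exactly what leaves a parent waiting forever.

```python
import os
import signal
import subprocess

def run_with_timeout(argv, seconds):
    """Run argv and return (status, combined stdout/stderr), where status
    is 'PASS', 'FAIL', or 'TIMEOUT'. POSIX only."""
    # start_new_session=True gives the child its own process group, so a
    # timeout can kill it together with everything it spawned. Killing only
    # the direct child can leave a grandchild holding the output pipe open,
    # and the parent would then block waiting for EOF on that pipe.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the whole group exited just after the timeout fired
        out, _ = proc.communicate()  # includes output read before the timeout
        return "TIMEOUT", out
    return ("PASS" if proc.returncode == 0 else "FAIL"), out
```

For instance, `run_with_timeout(["sh", "-c", "sleep 30 & echo started"], 1)` returns `("TIMEOUT", b"started\n")` even though `sh` itself exits immediately: the backgrounded `sleep` keeps the pipe open until the group is killed.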

Kuba Mracek via llvm-dev

Jan 3, 2019, 5:54:26 PM
to David Greene, David Greene via llvm-dev, Dmitry Vyukov, Alexey Samsonov

> On Jan 3, 2019, at 1:21 PM, David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Chandler Carruth via llvm-dev <llvm...@lists.llvm.org> writes:
>
>> What you're seeing is just the fact that lit is waiting on
>> subprocesses (select is waiting on the pipes i suspect).
>
> Right. Some digging revealed that it is waiting on
> getline_nohang.cc.tmp, a tsan test.
>
> I see that this test has been disabled for NetBSD, due to it sometimes
> failing. I'm seeing the same on Linux.
>
> How can we stabilize the sanitizer tests so that check-all can work
> reliably? If some sanitizer tests are so flaky, I should think they
> should be marked UNSUPPORTED. Who has the authority to make those
> determinations?

Dmitry Vyukov does. CC'ing him.

Kuba

Dmitry Vyukov via llvm-dev

Jan 4, 2019, 2:19:11 AM
to Kuba Mracek, LLVM Dev, thread-sanitizer, David Greene, Alexey Samsonov
On Thu, Jan 3, 2019 at 11:54 PM Kuba Mracek <mra...@apple.com> wrote:
>
>
>
> > On Jan 3, 2019, at 1:21 PM, David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
> >
> > Chandler Carruth via llvm-dev <llvm...@lists.llvm.org> writes:
> >
> >> What you're seeing is just the fact that lit is waiting on
> >> subprocesses (select is waiting on the pipes i suspect).
> >
> > Right. Some digging revealed that it is waiting on
> > getline_nohang.cc.tmp, a tsan test.
> >
> > I see that this test has been disabled for NetBSD, due to it sometimes
> > failing. I'm seeing the same on Linux.
> >
> > How can we stabilize the sanitizer tests so that check-all can work
> > reliably? If some sanitizer tests are so flaky, I should think they
> > should be marked UNSUPPORTED. Who has the authority to make those
> > determinations?
>
> Dmitry Vyukov does. CC'ing him.


Are there any special repro instructions? I am running all tsan tests
periodically on linux and none of them flakes.

David Greene via llvm-dev

Jan 4, 2019, 11:55:06 AM
to Dmitry Vyukov, LLVM Dev, thread-sanitizer, Kuba Mracek, Alexey Samsonov
Dmitry Vyukov <dvy...@google.com> writes:

> Are there any special repro instructions? I am running all tsan tests
> periodically on linux and none of them flakes.

I don't think I'm doing anything especially interesting. I wonder if
lit parallelism has anything to do with it. I tend to run quite wide
(32 or more).

I'm on SLES 12.2, kernel 4.4.21-69-default, x86_64 in case it matters.
I see this test hang pretty frequently.

-David

Dmitry Vyukov via llvm-dev

Jan 4, 2019, 12:17:08 PM
to David Greene, LLVM Dev, thread-sanitizer, Kuba Mracek
On Fri, Jan 4, 2019 at 5:55 PM David Greene <d...@cray.com> wrote:
>
> Dmitry Vyukov <dvy...@google.com> writes:
>
> > Are there any special repro instructions? I am running all tsan tests
> > periodically on linux and none of them flakes.
>
> I don't think I'm doing anything especially interesting. I wonder if
> lit parallelism has anything to do with it. I tend to run quite wide
> (32 or more).
>
> I'm on SLES 12.2, kernel 4.4.21-69-default, x86_64 in case it matters.
> I see this test hang pretty frequently.

Hi David,

The test is specifically a regression test for a deadlock:

// Make sure TSan doesn't deadlock on a file stream lock at program shutdown.
// See https://github.com/google/sanitizers/issues/454

So I wonder if it's not completely fixed.
I am sure it does not reproduce on my machine:

$ clang++ getline_nohang.cc -fsanitize=thread -O1 -g
$ stress ./a.out
192 runs so far, 0 failures
...
17137 runs so far, 0 failures
17377 runs so far, 0 failures
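`stress` here appears to be a loop-and-count tool (its output format matches Go's `golang.org/x/tools/cmd/stress`, though that is an assumption). Since David suspects wide lit parallelism (32 or more jobs) matters, a rough Python analogue that keeps many copies running at once and counts hangs separately from failures may help reproduce it. `stress` and `_one_run` below are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def _one_run(argv, timeout):
    try:
        r = subprocess.run(argv, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "hang"  # subprocess.run kills the child before re-raising
    return "ok" if r.returncode == 0 else "fail"

def stress(argv, runs, parallel, timeout):
    """Run argv `runs` times with up to `parallel` copies alive at once and
    return counts of clean exits, non-zero exits, and timeouts."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(lambda _: _one_run(argv, timeout), range(runs)))
    return {k: results.count(k) for k in ("ok", "fail", "hang")}
```

Something like `stress(["./a.out"], runs=1000, parallel=32, timeout=60)` would approximate the load of a wide lit run on the `getline_nohang.cc` binary built above.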

Could you please attach to the hung process with gdb and get a
backtrace of all threads?
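A non-interactive way to collect that is `gdb -p <pid> -batch -ex "thread apply all bt"`, assuming gdb is installed and you are allowed to ptrace the process (on some distributions that requires the same user plus a permissive `kernel.yama.ptrace_scope`, or root). A small Python wrapper, with a hypothetical helper name:

```python
import subprocess
import sys

def gdb_all_thread_backtrace_argv(pid, gdb="gdb"):
    """argv for a batch-mode gdb session that attaches to `pid` and
    prints a backtrace of every thread."""
    return [gdb, "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 dump_bt.py <pid of the hung getline_nohang.cc.tmp>
    subprocess.run(gdb_all_thread_backtrace_argv(int(sys.argv[1])), check=False)
```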
