[SyneRBI/SIRF] occasional Bad file descriptor in cGadgetron (#641)

Kris Thielemans

Apr 27, 2020, 7:22:08 AM
to SyneRBI/SIRF, Subscribed

This job https://travis-ci.org/github/SyneRBI/SIRF-SuperBuild/jobs/679959452#L16334
from SyneRBI/SIRF-SuperBuild#377 (which is a DEVEL build) fails, while others are fine. The error is in the MR test

ERROR: test3.test_main
...
error: ??? "'write: Bad file descriptor' exception caught at line 545 of /Users/travis/build/SyneRBI/SIRF-SuperBuild/sources/SIRF/src/xGadgetron/cGadgetron/cgadgetron.cpp; the reconstruction engine output may provide more information"
-------------------- >> begin captured stdout << ---------------------
File: /Users/travis/build/SyneRBI/SIRF-SuperBuild/INSTALL/python/sirf/Gadgetron.py
Line: 1384
check_status found the following message sent from the engine:

I'll rerun the job, as I guess this won't happen again, but it is worrying nevertheless.

@evgueni-ovtchinnikov any ideas?



Evgueni Ovtchinnikov

Apr 27, 2020, 7:43:52 AM
to SyneRBI/SIRF, Subscribed

I have seen this error message in Travis logs many times, but the reported error has never happened locally, so it is impossible to investigate, I am afraid.

Johannes Mayer

Apr 29, 2020, 11:02:22 AM
to SyneRBI/SIRF, Subscribed

I get this one too from time to time

Kris Thielemans

Apr 29, 2020, 1:28:06 PM
to SyneRBI/SIRF, Subscribed

Hmmm, this is going to be tough then. Any ideas for writing some debugging checks and doing a special test run that repeats the test 1000 times, to see when it fails?
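One way such a repeated test run could be sketched is shown below. This is only an illustration, not SIRF code: `stress_run` and `Failure` are hypothetical names, and the callable passed in stands for whatever test body triggers the error. The helper collects every failing iteration instead of stopping at the first, so the failure rate can be estimated.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// One recorded failure: which iteration failed and what the exception said.
// (Hypothetical helper type, not part of SIRF.)
struct Failure {
    std::size_t iteration;
    std::string message;
};

// Call `test` `num_runs` times and record every iteration that throws.
// Exceptions are caught here so that one failure does not end the run.
std::vector<Failure> stress_run(const std::function<void()>& test,
                                std::size_t num_runs)
{
    if (!test)
        throw std::invalid_argument("stress_run: empty test function");
    std::vector<Failure> failures;
    for (std::size_t i = 0; i < num_runs; ++i) {
        try {
            test();
        }
        catch (const std::exception& e) {
            failures.push_back({i, e.what()});
        }
        catch (...) {
            failures.push_back({i, "unknown exception"});
        }
    }
    return failures;
}
```

Calling `stress_run(some_test, 1000)` and printing the recorded iterations and messages would show whether failures cluster (for example, only on the first runs) or are spread evenly, which may hint at the cause.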

Richard Brown

Jun 9, 2020, 4:25:12 AM
to SyneRBI/SIRF, Subscribed

@evgueni-ovtchinnikov I haven't looked through the source code, but if this pertains to file writing, could you put it in a for loop (similar to what you already do when trying to connect to the Gadgetron server)?

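#include <stdexcept>  // std::runtime_error

bool success = false;
const unsigned num_attempts = 5;
for (unsigned i = 0; i < num_attempts; ++i) {
    try {
        // placeholder for the call that occasionally fails
        success = do_the_thing_that_causes_the_error();
    }
    catch (...) {
        // ignore the exception and try again
    }
    if (success) break;
}
// give up only after all attempts have failed
if (!success)
    throw std::runtime_error("bad file descriptor");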

Evgueni Ovtchinnikov

Jun 9, 2020, 11:52:21 AM
to SyneRBI/SIRF, Subscribed

@johannesmayer: if you get this error when running your mrtest.cpp, then one possible culprit is your MRAcquisitionData::read, where you create ISMRMRD::Dataset and call its methods readHeader, getNumberOfAcquisitions and readAcquisition without Mutex locking/unlocking.

I have very little idea what Mutex does - something to do with multithreading - but I noticed Gadgetron was using it, so I just followed suit, see e.g. AcquisitionsFile::get_acquisition.

@rijobro: what you suggest looks like papering over the crack, I am afraid. I would try to investigate a bit more before resorting to your fallback.

Evgueni Ovtchinnikov

Jun 10, 2020, 7:34:35 AM
to SyneRBI/SIRF, Subscribed

added missing mutex locks/unlocks, HTH

Richard Brown

Jun 10, 2020, 7:46:22 AM
to SyneRBI/SIRF, Subscribed

> I have very little idea what Mutex does - something to do with multithreading - but I noticed Gadgetron was using it, so I just followed suit, see e.g. AcquisitionsFile::get_acquisition.

Mutex is used to stop multiple threads accessing the same files/variables simultaneously, leading to data races, etc.
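As a minimal illustration of that idea, using only the C++ standard library (this is not the SIRF or Gadgetron Mutex class, and `SharedCounter` and `count_from_threads` are hypothetical names), the sketch below guards a shared value with `std::mutex` so that only one thread touches it at a time:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared resource such as an open data file.
class SharedCounter {
public:
    void increment()
    {
        // The lock is held until `lock` goes out of scope, so two threads
        // can never run the increment at the same time.
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }
    long value() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;  // mutable so that const readers can lock too
    long value_ = 0;
};

// Start `num_threads` threads that each increment the same counter
// `increments_per_thread` times, then return the final value.
long count_from_threads(unsigned num_threads, unsigned increments_per_thread)
{
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread]() {
            for (unsigned i = 0; i < increments_per_thread; ++i)
                counter.increment();
        });
    }
    for (auto& worker : workers)
        worker.join();
    return counter.value();
}
```

With the lock in place, the result equals the number of threads times the increments per thread. Without it, the unsynchronised `++value_` would be a data race and updates could be lost.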

So it could well be that adding the missing mutex locks solves the problem. Thanks.

Richard Brown

Jul 1, 2020, 2:32:34 PM
to SyneRBI/SIRF, Subscribed

Bug still persisting (PR from today): https://travis-ci.org/github/SyneRBI/SIRF/jobs/703951360#L28836
