This job https://travis-ci.org/github/SyneRBI/SIRF-SuperBuild/jobs/679959452#L16334
from SyneRBI/SIRF-SuperBuild#377 (which is a DEVEL build) fails, while others are fine. The error is in the MR test:
ERROR: test3.test_main
...
error: ??? "'write: Bad file descriptor' exception caught at line 545 of /Users/travis/build/SyneRBI/SIRF-SuperBuild/sources/SIRF/src/xGadgetron/cGadgetron/cgadgetron.cpp; the reconstruction engine output may provide more information"
-------------------- >> begin captured stdout << ---------------------
File: /Users/travis/build/SyneRBI/SIRF-SuperBuild/INSTALL/python/sirf/Gadgetron.py
Line: 1384
check_status found the following message sent from the engine:
I'll rerun the job, as I guess this won't happen again, but it is worrying nevertheless.
@evgueni-ovtchinnikov any ideas?
I have seen this error message in Travis logs many times, but the reported error has never happened locally, so it is impossible to investigate, I am afraid.
I get this one too from time to time
hmmm. this is going to be tough then. Any ideas for writing some debugging checks and doing a special test-run with 1000 tests and see when it fails?
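A repeated test run like the one suggested above could be sketched as follows. This is only a minimal sketch: `run_stress` and `StressResult` are hypothetical names, not part of SIRF, and the callable passed in stands in for whatever single reconstruction round trip is being exercised. It reports the first iteration that throws, together with the exception message.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

// Result of a stress run: the first failing iteration (if any) and its message.
// Hypothetical helper type, not part of SIRF.
struct StressResult {
    std::optional<unsigned> first_failure;
    std::string message;
};

// Runs `test` up to `num_runs` times and stops at the first exception.
// `test` receives the iteration index; it is a stand-in for one test round trip.
StressResult run_stress(const std::function<void(unsigned)>& test, unsigned num_runs)
{
    assert(static_cast<bool>(test));  // an empty std::function cannot be called
    for (unsigned i = 0; i < num_runs; ++i) {
        try {
            test(i);
        }
        catch (const std::exception& e) {
            // Log where it failed so the surrounding engine output can be matched up.
            std::cerr << "run " << i << " failed: " << e.what() << '\n';
            return {i, e.what()};
        }
    }
    return {std::nullopt, ""};
}
```

Calling `run_stress(my_test, 1000)` and checking `first_failure` would show whether the failure clusters at a particular iteration or appears at random, which may help separate a resource leak from a race.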
@evgueni-ovtchinnikov I haven't looked through the source code, but if this pertains to file writing, could you put it in a for loop (similar to what you already do when trying to connect to the Gadgetron server)?
bool success = false;
const unsigned num_attempts = 5;
for (unsigned i = 0; i < num_attempts; ++i) {
    try {
        // placeholder for the call that triggers the error
        success = do_the_thing_that_causes_the_error();
    }
    catch (...) {
        // swallow the exception and try again
    }
    if (success) break;
}
if (!success)
    throw std::runtime_error("bad file descriptor");
@johannesmayer: if you get this error when running your mrtest.cpp, then one possible culprit is your MRAcquisitionData::read, where you create ISMRMRD::Dataset and call its methods readHeader, getNumberOfAcquisitions and readAcquisition without Mutex locking/unlocking.
I have very little idea what Mutex does - something to do with multithreading - but I noticed Gadgetron was using it, so I just followed suit, see e.g. AcquisitionsFile::get_acquisition.
@rijobro: what you suggest looks like papering over the crack, I am afraid. I would try to investigate a bit more before resorting to your fallback.
added missing mutex locks/unlocks, HTH
I have very little idea what Mutex does - something to do with multithreading - but I noticed Gadgetron was using it, so I just followed suit, see e.g. AcquisitionsFile::get_acquisition.
Mutex is used to stop multiple threads accessing the same files/variables simultaneously, leading to data races, etc.
So it could well be that adding the missing mutexes solves the problem. Thanks.
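To illustrate the point about mutexes, here is a minimal standalone sketch using only the C++ standard library. It is not SIRF or Gadgetron code: `SharedLog` and `fill_concurrently` are hypothetical names, and the vector stands in for any resource that is not thread-safe, such as an open dataset handle. Without the lock, concurrent `push_back` calls would be a data race.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared resource that is not thread-safe.
struct SharedLog {
    std::vector<int> entries;
    std::mutex mtx;

    void append(int value)
    {
        // lock_guard locks here and unlocks when it goes out of scope,
        // even if push_back throws.
        std::lock_guard<std::mutex> lock(mtx);
        entries.push_back(value);
    }
};

// Starts `num_threads` threads that each append `per_thread` values,
// and returns how many entries were stored in total.
std::size_t fill_concurrently(unsigned num_threads, unsigned per_thread)
{
    assert(num_threads > 0);
    SharedLog log;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&log, per_thread] {
            for (unsigned i = 0; i < per_thread; ++i)
                log.append(static_cast<int>(i));
        });
    }
    for (auto& w : workers)
        w.join();  // all threads have finished before entries is read
    return log.entries.size();
}
```

Using a scoped guard rather than explicit lock/unlock calls means the mutex is released on every exit path, including when an exception is thrown between the two.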
Bug still persisting (PR from today): https://travis-ci.org/github/SyneRBI/SIRF/jobs/703951360#L28836