Hi Kristian,
Thanks very much for the helpful reply, as always!
This issue happens intermittently. Whenever we recruit a batch of participants, most people can do the whole experiment just fine, but a few people report that the study stopped/froze. This typically happens at the start of a new component. There's no systematic pattern in the browsers that people are using when the study freezes. This makes me think that it's an issue with the server or the internet connection, rather than a problem with the study's scripts (but I could be wrong).
I would agree that it could just be the usual internet stutter, except there are a few things that concern me. One is that this tends to happen in 'batches', i.e. multiple people will report the same problem around the same time. Another issue is that often people will try to reload the study component multiple times before giving up. The fact that they can reload the component means that they still have internet access, yet the component still doesn't load.
We do load a large set of audio files at the start of some components, so maybe this is causing problems. But even so, I still don't know how to figure out exactly what the problem is, e.g. whether the server can't handle the requests, the worker's browser storage limits are exceeded, etc. Any suggestions for how to figure this out?
No, I don't see anything unusual in the JATOS log around that same time. JATOS didn't restart and there were no warnings. Here are the relevant lines for this worker when they were unable to load the component:
2018-02-28 16:07:29,318 [INFO] - publix_access - PUT /publix/31/166/resultData?srid=816
2018-02-28 16:07:29,318 [INFO] - c.p.Publix - .submitResultData: studyId 31, componentId 166, studyResultId 816
2018-02-28 16:07:29,397 [INFO] - publix_access - POST /publix/31/studySessionData?srid=816
2018-02-28 16:07:29,398 [INFO] - c.p.Publix - .setStudySessionData: studyId 31, studyResultId 816
2018-02-28 16:07:29,432 [INFO] - publix_access - GET /publix/31/nextComponent/start?srid=816
2018-02-28 16:07:29,432 [INFO] - c.p.Publix - .startNextComponent: studyId 31, studyResultId 816
2018-02-28 16:07:29,469 [INFO] - publix_access - GET /publix/31/167/start?srid=816
2018-02-28 16:07:29,469 [INFO] - c.p.Publix - .startComponent: studyId 31, componentId 167, studyResultId 816
2018-02-28 16:07:29,897 [INFO] - publix_access - GET /publix/31/167/initData?srid=816
2018-02-28 16:07:29,898 [INFO] - c.p.Publix - .getInitData: studyId 31, componentId 167, studyResultId 816
2018-02-28 16:07:29,910 [INFO] - publix_access - POST /publix/31/heartbeat?srid=816
2018-02-28 16:07:30,023 [INFO] - c.p.GeneralSingleBatchChannel - .open: studyId 31, studyResultId 816
It looks to me like everything was fine and this worker should have started component 167 (I don't know what the very last line is reporting - GeneralSingleBatchChannel - but I assume this isn't an error). So I'm not sure how to debug any further, because I can't find any errors...
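For what it's worth, this is roughly how I went through the log: a minimal Python sketch that pulls out every line for one study result and shows the time gap between consecutive entries. The function name `timeline` and the two regular expressions are my own assumptions, based only on the lines quoted above (they are not part of JATOS), so they may need adjusting for other log entries.

```python
import re
from datetime import datetime, timedelta

# Layout assumed from the quoted lines: "timestamp [LEVEL] - logger - message".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>\w+)\] - (?P<logger>\S+) - (?P<msg>.*)$"
)
# The study result shows up as "srid=816" or as "studyResultId 816".
SRID_RE = re.compile(r"(?:srid=|studyResultId )(\d+)")


def timeline(lines, srid):
    """Return (timestamp, gap_ms, level, logger, message) tuples for one study result."""
    events = []
    previous = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a log entry in the assumed layout
        found = SRID_RE.search(m["msg"])
        if not found or int(found.group(1)) != srid:
            continue  # entry belongs to a different study result
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        # Whole milliseconds since the previous entry for this study result.
        gap_ms = 0 if previous is None else (ts - previous) // timedelta(milliseconds=1)
        events.append((ts, gap_ms, m["level"], m["logger"], m["msg"]))
        previous = ts
    return events


# Last four lines from the excerpt above, used as sample input.
sample = """\
2018-02-28 16:07:29,469 [INFO] - publix_access - GET /publix/31/167/start?srid=816
2018-02-28 16:07:29,897 [INFO] - publix_access - GET /publix/31/167/initData?srid=816
2018-02-28 16:07:29,910 [INFO] - publix_access - POST /publix/31/heartbeat?srid=816
2018-02-28 16:07:30,023 [INFO] - c.p.GeneralSingleBatchChannel - .open: studyId 31, studyResultId 816
""".splitlines()

for ts, gap_ms, level, logger, msg in timeline(sample, 816):
    flag = "" if level == "INFO" else "  <-- not INFO"
    print(f"{ts:%H:%M:%S.%f} +{gap_ms:4d} ms  {logger}: {msg}{flag}")
```

Run this way, every entry for study result 816 is at INFO level and the entries are all well under a second apart, which is why I concluded that nothing went wrong on the server side.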
By the way, we have had other reports that a study stopped/froze during a component. However, in these cases I don't think the problem could be with the server or internet connection, because the study should simply run locally once all of the resources have been loaded at the start. So it must be a problem with the study code, right? Or is it the case that any jatos.js functions called during a component (e.g. jatos.appendResultData, heartbeat) could cause the study to stop unexpectedly if they fail?
Thanks very much for letting me know how to send a link to allow a worker to continue a study run - that all makes sense and is very useful!
Best wishes,
Becky