Hi!
I'm testing Chronos 3.0.1 on DCOS 1.8 and have run into a problem: REST API POST and DELETE requests do not respond.
For example, if I try to add a new job, the request hangs until the nginx timeout (60 seconds) expires, yet the new job shows up in the Chronos job list 1-2 minutes later. Deleting a job behaves the same way: the request times out and the job is actually removed several minutes later.
This started after several days of adding and removing test jobs. No configs were changed and no packages were installed since everything last worked correctly. All Chronos GET requests still work fine and respond very quickly, which is very weird.
Some details:
# journalctl -xeb -u dcos-adminrouter.service
Mar 30 14:08:22 localhost.localdomain nginx[3389]: 2017/03/30 14:08:22 [error] 23860#0: *266995 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <client-ip>, server: dcos.*, request: "DELETE /service/chronos/v1/scheduler/job/job_105551 HTTP/1.1", upstream: "http://<slave-ip>:30041/v1/scheduler/job/2017033013574578349802_105551", host: "<master-ip>"
Mar 30 14:08:22 localhost.localdomain nginx[23860]: localhost.localdomain nginx: <client-ip> - - [30/Mar/2017:14:08:22 -0500] "DELETE /service/chronos/v1/scheduler/job/job_105551 HTTP/1.1" 504 189 "-" "curl/7.29.0"
There is no load on the cluster and no heavy tasks are running right now. Rebooting the cluster does not solve the problem.
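To compare the methods outside of curl, something like the following could be used to time a GET against a DELETE. This is only a minimal sketch: `time_request` is a hypothetical helper, the base URL is passed on the command line, and it assumes the job list is served at `/v1/scheduler/jobs` behind the same `/service/chronos` prefix seen in the log above. Note that the DELETE really does remove the named job, so it should only be pointed at a throwaway test job.

```python
import sys
import time
import urllib.error
import urllib.request


def time_request(url, method="GET", timeout=90.0):
    """Send one request and return (status, elapsed_seconds).

    status is the HTTP status code, or None if no HTTP response
    arrived at all (client-side timeout, connection refused, ...).
    """
    req = urllib.request.Request(url, method=method)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as the 504 from nginx end up here.
        status = err.code
    except (urllib.error.URLError, OSError):
        status = None
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Assumed usage: script.py http://<master-ip>/service/chronos <test-job-name>
    base, job = sys.argv[1].rstrip("/"), sys.argv[2]
    checks = [
        ("GET", base + "/v1/scheduler/jobs"),
        ("DELETE", base + "/v1/scheduler/job/" + job),
    ]
    for method, url in checks:
        status, elapsed = time_request(url, method=method)
        print("%-6s %s -> status=%s in %.1fs" % (method, url, status, elapsed))
```

Running the same script against the upstream address from the log (`http://<slave-ip>:30041`) instead of the adminrouter URL would show whether the delay is in Chronos itself or in the nginx proxy in front of it.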
Please advise.