Raimonds,
Those 504 "electric Charlie" errors happen when a request takes too long and the edge proxy times it out. (There are various nginx/Apache proxies in front of the actual JIRA instance.) The timeout is 90s.
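Because a 504 here means "the proxy gave up waiting", not "the request was invalid", it is usually worth retrying after a pause. Below is a minimal sketch of that, assuming a hypothetical `send()` callable that returns `(status, body)`. The exponential delays and the 50% jitter are illustrative choices, not anything JIRA or the proxy requires.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical request callable returning (HTTP status, response body).
Send = Callable[[], Tuple[int, str]]


def call_with_retry(
    send: Send,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry when the edge proxy answers 504 Gateway Timeout.

    Waits base_delay * 2**attempt seconds plus up to 50% random jitter
    between attempts, so repeated timeouts add progressively less load.
    Any non-504 response is returned immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 504:
            return status, body
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return status, body  # still 504 after the last attempt
```

The `sleep` parameter is only there so the delays can be observed or stubbed out in tests.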
Why is JIRA responding so slowly? It may be that the JQL request itself is slow even though JIRA is otherwise healthy. That can "naturally" happen on some instances with complicated permission schemes, and it would also happen while the instance is being restarted. But I would be surprised if it happened very often with maxResults=500. If you are making a lot of requests in parallel, that extra load may be making each request take longer.
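To tell whether individual JQL pages are creeping toward the 90s limit, it helps to time each page request. Here is a rough sketch of a sequential paged fetch with maxResults=500, assuming a hypothetical `fetch_page(start_at, max_results)` that performs one search call (e.g. `GET /rest/api/2/search` with `jql`, `startAt`, `maxResults`) and returns the decoded JSON with `issues` and `total`. The 60s warning threshold is an arbitrary choice for illustration.

```python
import logging
import time
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical callable: performs one JIRA search request, e.g.
# GET /rest/api/2/search?jql=...&startAt=<start_at>&maxResults=<max_results>,
# and returns the decoded JSON body ("issues", "total", ...).
FetchPage = Callable[[int, int], Dict[str, Any]]

# Assumed warning threshold, chosen to sit well under the 90s proxy timeout.
SLOW_REQUEST_SECONDS = 60.0


def fetch_all_issues(
    fetch_page: FetchPage, page_size: int = 500
) -> Tuple[List[Dict[str, Any]], List[float]]:
    """Sequentially page through a JQL search, timing each request."""
    issues: List[Dict[str, Any]] = []
    durations: List[float] = []
    start_at = 0
    while True:
        started = time.monotonic()
        page = fetch_page(start_at, page_size)
        elapsed = time.monotonic() - started
        durations.append(elapsed)
        if elapsed > SLOW_REQUEST_SECONDS:
            logging.warning("page at startAt=%d took %.1fs", start_at, elapsed)
        batch = page.get("issues", [])
        issues.extend(batch)
        # The server may return fewer issues than requested (it can cap
        # maxResults), so advance by what actually came back.
        start_at += len(batch)
        if not batch or start_at >= page.get("total", 0):
            return issues, durations
```

If the recorded durations are consistently close to the threshold even when running sequentially, the query itself (or the instance) is the likely problem rather than your parallel load.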
The other possibility is that the JIRA instance is not healthy. Raising a support.atlassian.com request in those situations lets us collect information about the specific instance and maybe track the problem down further. Have the customer give the support staff their instance, the JQL being run, the user it runs as, and the time the errors occurred.
In either case, this is probably a good opportunity to "embrace failure". A 504 might be rare, but it is normal, and your client should expect it. If you are executing requests in parallel and get a failure, you should probably back off your parallelism and maybe introduce some delay between requests. In fact, a "ramp up" mechanism is probably a good idea: start with sequential requests and, if there are lots of results to fetch, gradually introduce parallelism, backing off on errors or slowness. You are introducing parallelism to increase throughput, so measure that throughput (issues fetched per minute) and decrease parallelism adaptively if it is actually going down.
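One way to implement that ramp-up/back-off loop is a small controller fed once per measurement window. This sketch assumes an additive-increase / multiplicative-decrease policy (add one request while throughput holds, halve on errors or a throughput drop); that policy, the 0.9 tolerance and the cap of 8 are all illustrative assumptions rather than recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveParallelism:
    """Ramp-up / back-off controller for the number of parallel requests.

    Policy (an assumption, one of several reasonable ones): additive
    increase by 1 while throughput holds, halve on errors or when
    throughput falls below `tolerance` times the previous good window.
    """

    max_parallel: int = 8
    tolerance: float = 0.9
    parallel: int = 1  # start with sequential requests
    baseline: Optional[float] = None  # issues/minute of last good window

    def record_window(self, issues_fetched: int, seconds: float, errors: int) -> int:
        """Feed one measurement window; return the parallelism for the next one."""
        throughput = 60.0 * issues_fetched / seconds if seconds > 0 else 0.0
        dropped = self.baseline is not None and throughput < self.tolerance * self.baseline
        if errors > 0 or dropped:
            # Back off, never below sequential, and re-measure at the new level.
            self.parallel = max(1, self.parallel // 2)
            self.baseline = None
        else:
            # Throughput held or improved: ramp up by one request.
            self.parallel = min(self.max_parallel, self.parallel + 1)
            self.baseline = throughput
        return self.parallel
```

In a fetch loop you would, each window, run `parallel` page requests concurrently (for example with `concurrent.futures.ThreadPoolExecutor`), count the issues fetched and the 504s, and pass those numbers to `record_window`. Resetting the baseline after a back-off keeps the naturally lower throughput at the reduced level from triggering a second, unwarranted cut.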
=Matt