Hi David,
To be honest I am not sure what might be happening here.
The only edge case I can think of is that your job finishes, but then hits the walltime while FWS is still communicating with the database to update the state to COMPLETED, which happens after the job itself completes. That update typically takes a few seconds at most, so the chances of your job completing and then hitting walltime during the FWS communication should be quite small.
One thing you could do to debug this is to see how long the database communication is actually taking. For example, pick a COMPLETED job and examine its Launch object in the database, particularly the timestamps that show when the job started RUNNING and when it was tagged as COMPLETED. Then compare that interval to the actual or expected runtime of your job (if you have that recorded somewhere). A big discrepancy could indicate that the database is taking way too long to update for your job, and that the job is hitting walltime in the middle of the update.
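Here is a rough sketch of that comparison in Python. It is not FWS code: the helper names are made up, and it assumes the Launch document carries a state history list whose entries have a "state" and a "created_on" timestamp (either a datetime or an ISO-8601 string). Please check those field names against one of your own Launch documents before relying on it.

```python
from datetime import datetime


def _parse(ts):
    """Accept either a datetime or an ISO-8601 string."""
    return ts if isinstance(ts, datetime) else datetime.fromisoformat(ts)


def running_to_completed_secs(state_history):
    """Seconds between the first RUNNING entry and the first COMPLETED entry.

    `state_history` is assumed to be a list of dicts with "state" and
    "created_on" keys (an assumption about the Launch document layout).
    Returns None if either state is missing.
    """
    first_seen = {}
    for entry in state_history:
        # Keep only the first timestamp recorded for each state.
        first_seen.setdefault(entry["state"], _parse(entry["created_on"]))
    if "RUNNING" not in first_seen or "COMPLETED" not in first_seen:
        return None
    return (first_seen["COMPLETED"] - first_seen["RUNNING"]).total_seconds()


def update_overhead_secs(state_history, job_runtime_secs):
    """Time not explained by the job itself, i.e. a rough upper bound on
    how long the state update took."""
    gap = running_to_completed_secs(state_history)
    return None if gap is None else gap - job_runtime_secs


# Made-up example data: the job ran for an hour, and the COMPLETED
# state was recorded 45 seconds after that.
example_history = [
    {"state": "RUNNING", "created_on": "2015-01-01T10:00:00"},
    {"state": "COMPLETED", "created_on": "2015-01-01T11:00:45"},
]
overhead = update_overhead_secs(example_history, job_runtime_secs=3600)
```

If the overhead for your COMPLETED jobs is regularly more than a few seconds, that would point toward slow database updates as the culprit.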
Do you happen to have very large workflows (e.g. 1000 FWs or more)? I could see this being a bigger problem as workflows get larger, although I think we have done a lot recently to speed up database updates for large workflows.
Note that there is no supported way to explicitly mark a FW as completed. If you are desperate and willing to take the risk, you could try manually calling the Launchpad.complete_launch() method, but I wouldn't really recommend this and suggest you try fixing the underlying problem instead.