If the firewall rule allows IAP to connect, the issue might be with the instance itself. One possibility is a firewall rule inside the instance (the guest OS firewall). Another is that the instance is unresponsive due to high resource utilization.
As a first step, you can check the serial port console logs for errors on the instance. Rebooting the instance seems to have fixed the issue in this third-party article (which we do not endorse; it is offered only as an example). You may also try running the gcloud commands in Cloud Shell with the "--verbosity=debug" flag to get a better idea of which step is failing. Another approach is to spin up a VM in the same VPC, SSH to it, and then SSH from it to the troublesome instance over internal IPs (if you use the gcloud compute ssh command, pass the --internal-ip flag from the new instance).
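The checks above could be sketched roughly as the following gcloud calls. This is only a minimal sketch: the project, zone, and instance names are placeholders, the `run` helper and its `DRY_RUN` switch are hypothetical conveniences, and `--tunnel-through-iap` assumes the connection is meant to go through IAP as discussed above.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own project, zone, and VM name.
PROJECT="${PROJECT:-my-project}"
ZONE="${ZONE:-us-central1-a}"
TARGET_VM="${TARGET_VM:-troublesome-vm}"

# Hypothetical helper: prints the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: read the serial port console output and look for errors.
check_serial_output() {
  run gcloud compute instances get-serial-port-output "$TARGET_VM" \
    --zone "$ZONE" --project "$PROJECT"
}

# Step 2: retry SSH with debug verbosity to see which step fails.
ssh_debug() {
  run gcloud compute ssh "$TARGET_VM" --zone "$ZONE" --project "$PROJECT" \
    --tunnel-through-iap --verbosity=debug
}

# Step 3: run this from the new test VM to reach the target over its internal IP.
ssh_internal() {
  run gcloud compute ssh "$TARGET_VM" --zone "$ZONE" --project "$PROJECT" \
    --internal-ip
}
```

By default each function only prints the command it would run, which makes it easy to review first; setting DRY_RUN=0 and calling, for example, check_serial_output executes it for real.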
This would test two things. The first is connectivity: if you are not able to connect to the fresh instance, the problem is connectivity related, and reviewing the firewall rules again would be beneficial. The second is the instance itself: if you are able to connect to the new instance but not from it to the troublesome instance, it is most probably an issue with the troublesome VM.
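The reasoning behind this two-hop test can be written down as a small decision helper. This is just an illustrative sketch: the function name `interpret_results` and the "ok"/"fail" inputs are hypothetical, and the last branch covers a case the reasoning above does not address.

```shell
# interpret_results <ssh_to_new_vm> <ssh_from_new_vm_to_target>
# Each argument is "ok" or "fail" (hypothetical convention for this sketch).
interpret_results() {
  local to_new_vm="$1" to_target="$2"
  if [ "$to_new_vm" != "ok" ]; then
    # Cannot even reach a fresh VM: connectivity problem.
    echo "connectivity: review the firewall rules again"
  elif [ "$to_target" != "ok" ]; then
    # Fresh VM is reachable, target is not: suspect the target itself.
    echo "instance: most probably an issue with the troublesome VM itself"
  else
    # Both hops work over internal IPs; not covered by the reasoning above.
    echo "inconclusive: both hops succeeded over internal IPs"
  fi
}
```

For example, `interpret_results ok fail` corresponds to the second case, where the fresh VM is reachable but the troublesome one is not.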
As this is more of a technical question, I suggest posting it on serverfault.com, where a larger community might be able to help. When posting there, include all the tests and log output mentioned above. Make sure you redact any personally identifiable information, including project IDs and instance IDs. I hope the above helps.
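For the redaction step, one possible approach is to filter the collected output through sed before posting it. This is a minimal sketch: `redact_log` is a hypothetical helper, and it assumes the project ID and instance name or ID contain only letters, digits, and hyphens, so they are safe to use directly inside a sed pattern.

```shell
# redact_log <project_id> <instance_name_or_id>
# Reads log text on stdin and writes a redacted copy to stdout.
redact_log() {
  local project_id="$1" instance="$2"
  sed -e "s/${project_id}/REDACTED_PROJECT/g" \
      -e "s/${instance}/REDACTED_INSTANCE/g"
}
```

It could be used as `redact_log my-project troublesome-vm < debug.log > debug.redacted.log`, where the file names are placeholders. It is still worth reading through the result by hand, since other identifying details (such as email addresses or IP addresses) are not covered by this sketch.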