As an update, I contacted Microsoft and am now in worse shape than before. A second server started exhibiting the same problems without anyone having done anything on it. I had a tech yesterday AM who escalated it. That tech got disconnected twice while working on it; just before or after the second disconnect is when the second server started having issues. I then had to wait 3 hours to get a different person on the phone, and they were definitely not second level. He was doing things like trying to disable IPv6 (which I requested he not do) and disabling the firewall. He finally told me to reinstall the update and let him know what happened. I have asked the manager of the 3rd tech to please escalate and have me contacted this morning, as (shockingly) reapplying the update didn't work.
I am hoping my last Exchange server holds out, as I am completely baffled by the second one dying. I have to assume that either it was something the 2nd tech did or there is some setting that gets replicated that is the source of the issue. Although, if that is the case, I have no idea why one server is still standing. Things I saw the 2nd tech do include:
While I wait for Microsoft to get back to me, I welcome any input. Assuming that my last server holds out and I see any responses(!).
Finally got this issue resolved after many, many hours on the phone with Microsoft. We had configured the RPC dynamic port range (as documented at https://support.microsoft.com/en-us/help/154596/how-to-configure-rpc-dynamic-port-allocation-to-work-with-firewalls) across the enterprise as part of an effort to better lock down connectivity between VLANs. I had questioned the wisdom of applying this to servers but had proceeded anyway. Since this had been done several weeks back and we are in the middle of several security initiatives, I did not put 2 and 2 together on this one (in other words, I had totally forgotten about it). Basically, the setting took effect when I rebooted the Exchange server to apply updates. And, let's just say that Exchange doesn't care for this particular setting. After removing these registry keys and rebooting, everything started working again. There was no problem with the Windows update or CU17, all of which got applied to the servers after the issue was resolved.
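For anyone who hits this later and wants to check their own servers: the settings in question are the values from that KB article under HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc\Internet. Roughly, this is how you can see whether they are present and back them out (a sketch, not the exact commands we ran; test it before pointing it at production):

    # Values documented in KB 154596: Ports (REG_MULTI_SZ), plus
    # PortsInternetAvailable and UseInternetPorts (REG_SZ, 'Y' when active).
    $rpcKey = 'HKLM:\SOFTWARE\Microsoft\Rpc\Internet'

    if (Test-Path $rpcKey) {
        # Show what the firewall lockdown put there
        Get-ItemProperty -Path $rpcKey |
            Format-List Ports, PortsInternetAvailable, UseInternetPorts

        # Removing the whole key is what fixed Exchange for us
        Remove-Item -Path $rpcKey -Recurse -Force
    }

A reboot is required afterward for RPC to go back to its default dynamic port range.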
As always, thanks to those that offered assistance.
How did you ever figure it out?
They had told me on Friday that they believed it was an OS issue and were submitting a request to the Windows team. They indicated they believed the core problem to be the Exchange AD topology service not starting with "Error 1061: The Service Cannot Accept Control Messages At This Time".
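I didn't capture exactly what support ran, but if you want to watch for the same symptom, something along these lines shows the state of the service (MSExchangeADTopology is its short name) and any recent service errors:

    # State of the Exchange AD Topology service that refused to start
    Get-Service -Name MSExchangeADTopology |
        Select-Object Status, StartType, DisplayName

    # Recent Service Control Manager errors, where start failures show up
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Service Control Manager'; Level = 2 } -MaxEvents 20 |
        Select-Object TimeCreated, Message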
Friday night, I got an email asking “do you have any registry settings at HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc\Internet?” As soon as I looked at that location, I realized what it was and what had happened.
I don’t know how they came to that conclusion, and they didn’t explain any further. It is my assumption that the error message or something else in the setup logs (which had been uploaded) was familiar to someone in Windows support.
While it was a self-inflicted wound and it is hard to be critical of support folks who had no idea that this change had been made, I have to say that it was a very painful experience. I wound up talking to 4 different people, suffered through extremely poor-quality audio, an agent getting disconnected twice mid-session due to "technical issues", and some truly bizarre troubleshooting steps. When you are sitting there at 11:00 PM with 2 servers down, and the tech just keeps hitting restart on a service that won't start and hasn't started the last 5 times he did it (without changing anything), it is pretty frustrating. Tack on things like being asked repeatedly, "did you uninstall the other update?", to which I kept exasperatedly responding, "Windows doesn't let you uninstall that update. Do you see how there is no uninstall button? If you are able to uninstall it some other way, that is fine." That particular guy left me at midnight with the instruction to re-install the update that started the whole problem and to just email him back if that didn't happen to solve the problem.
Tier 1 support is appalling. It always has been, ever since it moved out of Charlotte, San Antonio, and Australia. During COVID it's been even worse because of the poor communications (as you mention) from their home systems.
I generally start a call with a request for escalation. 😊
I empathize greatly.
I actually escalated twice. The first person to look at it, to his credit, pretty quickly determined it was beyond his abilities and escalated it himself. It was the second guy, who seemed to have some understanding of what he was doing, who got disconnected twice while working on it.
The second time he was disconnected was around 4:20 PM. I got an email from his co-worker at around 4:45 saying that he had technical issues and would get back with me. I initially responded for him to contact me in the morning, but a few minutes later we got alerts that a second server was down. I responded back immediately that we now had a second server down and I needed someone to contact me ASAP. I got no response for nearly an hour, and I finally decided I had no choice but to call back in and try to get a live person. I didn't know why the 2nd server went down, and I was deeply concerned that the last one would too.

I explained the situation and asked for the next available agent. I basically went back in the queue and had to wait another 2 hours for someone to call me. That person was obviously tier 1 and was the worst experience by far. The first time he called, he said he was going to have to call me back because "my voice was coming out of his computer speaker". It took another 15 minutes for him to call back and begin troubleshooting, none of which I was terribly impressed with. When I left him around midnight with the "reinstall the update and email me", I knew that I was never going to get anywhere with him. I did as he asked and when, shockingly, that didn't work, I emailed his supervisor asking to have it escalated and to contact me in the morning. At that point, I figured it didn't matter if the last server failed; there was nobody I was going to be talking to that night who was going to be able to help.
The person I got on Thursday morning worked with me through to the end. While there were "why is he doing that" moments (one I didn't mention is that 2 of the techs wanted to disable IPv6, which I know from lots of MBS posts is a no-no), he was pleasant enough to deal with and seemed to mostly follow a logical path.