With that said, I am writing this post to vent about how Exchange Online, in its current implementation, is not enterprise-worthy. Coupled with Outlook, it has many crippling flaws. And these flaws are not happenstance bugs, but massive design flaws akin to a Catch-22.
If you do not keep the cache well below that limit, you are going to have frequent crashes, followed by the slow, agonizing process of rebuilding. If your mailboxes surpass that 50 GB limit, the OST is likely to crash repeatedly.
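One way to keep an eye on this is to scan for OST files that are creeping toward the limit before they start crashing. The following is only a minimal Python sketch of that idea: the 50 GB figure comes from the discussion above, while the 80% warning threshold and the function names `classify_ost` and `scan_ost_files` are my own assumptions, not anything Outlook or Microsoft provides.

```python
from pathlib import Path

GIB = 1024 ** 3
OST_LIMIT_BYTES = 50 * GIB  # the 50 GB limit discussed above
WARN_FRACTION = 0.8         # assumption: start warning at 80% of the limit


def classify_ost(size_bytes, limit=OST_LIMIT_BYTES, warn_fraction=WARN_FRACTION):
    """Return 'over', 'warn' or 'ok' for a given OST size in bytes."""
    if size_bytes >= limit:
        return "over"
    if size_bytes >= limit * warn_fraction:
        return "warn"
    return "ok"


def scan_ost_files(root):
    """Return (path, size_bytes, status) for every .ost file under root."""
    results = []
    for path in Path(root).rglob("*.ost"):
        size = path.stat().st_size
        results.append((str(path), size, classify_ost(size)))
    return results
```

Pointing `scan_ost_files` at wherever the OST files live (a profile folder or a redirected share) gives a list you can filter for anything that is not "ok".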
In fact, because of the SaaS model of Office, I have noticed that during some of the monthly update cycles, the OST and/or the index become corrupt for heavy users. And that is with the Semi-Annual Channel, which is the most deferred update channel allowed. The only way to defer updates further would be to shut auto-updates off and manage them manually. But that in itself is like a full-time job.
Another problem with Exchange Online is that you cannot realistically use Outlook in online mode. The performance limitations due to latency and enforced quotas render online mode unusable. That is why, in a VDI and multi-user environment, I was relegated to using a one-month cache for Outlook, with that cache relocated to a DFS share. This is done using this registry key:
Another problem with Exchange Online is the fact that you do not really know where your mailboxes are hosted. Microsoft automatically determines where they think it is best hosted. Their idea of best, as you can imagine, is probably defined by their own self-interested goals. They might position you far away, for the sake of their own load balancing strategies, and you may incur problems as a result. And they think that they can just sort of sweep these complaints under the rug by throwing the blame on your specific configuration or mail usage patterns.
Still, it is always better to have your data hosted as close to you as possible, to limit performance problems. Also, not every datacenter is created equal. Yes, they do standardize their equipment and procedures, but I would rather have my mail hosted in Washington or San Francisco than in Iowa, because it is probably better managed there. Those centers could be better staffed, better run, and more likely to have newer/faster equipment. Who knows.
There is a fantastic script written by Joe Palarchio that will give you a generic report of the number of mailboxes and their locations, to give you an idea of how fragmented your services really are. You can find his post here: -365-script-to-determine-exchange-online-mailbox-location/
If you are having problems with a mailbox, find where the mailbox is located, and if it happens to be far away, then perform a move. On that move, the mailbox will be checked for errors and corruption. So that should kill two birds with one stone.
Anyway, once you kick off a generic New-MoveRequest us...@domain.tld command, you ought to check the target server that Microsoft picks for you, to make sure it is close enough for your taste. This is done by running the Get-MoveRequestStatistics cmdlet, like so:
The last workaround I want to mention is an old configuration where only headers are downloaded: you can set Outlook to header-only mode. This is useful if you want the entire mailbox cached for searchability. The trade-off is that day-to-day use is a bit annoying, because each message has to be opened in order to be marked as read; messages are not automatically marked as read when selected in the preview pane. To get around that hassle, I wrote a VSTO add-in in C# that downloads each new message as a full item upon click, while leaving the rest of the mailbox in header-only mode. So long as you do not intend to perform deep, specific searches, this should let you find any message by date/to/from and any other metadata (anything contained in the header). This turned 70 GB mailboxes into 4 GB OST files.
Limited search capabilities, limited availability depending on the mail app you use on mobile devices, and things like that. It really depends on what your clients need and where/how they expect to use Exchange Online. But those are two examples that I remember.
We did some maintenance on the cluster, and then attempted to fail over. After waiting about 3 minutes, I decided to check (a healthy cluster should fail over within 60-90 seconds). I saw three databases that refused to mount.
Not really a warm fuzzy feeling, so I decided to fail back over to SCC-Node1. Once that completed, I attempted to mount both of the databases that refused to mount (within the Cluster Admin utility, by bringing the IS instance online), and that failed too. When I looked in the Event Viewer Application log, I saw a slew of these error messages:
Before fully understanding why this occurred, you should understand how dbTime works and how the Information Store works. There is a good article here about the Information Store process, but to sum it up, there is a Log Buffer held in memory. To help reduce I/O on your disk subsystem, transactions are written to the Log Buffer first. The Log Buffer (1 MB in size, since transaction logs are that size) then commits to a log when one of two things happens:
As for the dbTime, when a new log is generated, a unique database time is generated based on the log generation time. The dbTime can be higher than the actual current log time, which is normal. The dbTime is used more as a counter than as an actual timestamp.
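To make the "counter, not timestamp" idea a bit more concrete, here is a toy Python sketch of a replay step that refuses to apply a log record when the database's counter does not match what the record expects. This is a deliberately simplified model and not how ESE is actually implemented: the `Database` and `LogRecord` classes and the `replay` function are hypothetical, and the pairing of -566 with "too old" and -567 with "too new" follows the JET error names as I understand them.

```python
from dataclasses import dataclass

# Error codes as I understand the JET naming (treat as an assumption):
JET_ERR_DBTIME_TOO_OLD = -566  # database counter is behind what the record expects
JET_ERR_DBTIME_TOO_NEW = -567  # database counter is ahead of what the record expects


@dataclass
class Database:
    dbtime: int  # a plain counter, not a wall-clock timestamp


@dataclass
class LogRecord:
    dbtime_before: int  # counter value the record expects to find
    dbtime_after: int   # counter value once the record has been applied


def replay(db, record):
    """Apply one log record; return 0 on success or a negative error code."""
    if db.dbtime < record.dbtime_before:
        return JET_ERR_DBTIME_TOO_OLD
    if db.dbtime > record.dbtime_before:
        return JET_ERR_DBTIME_TOO_NEW
    db.dbtime = record.dbtime_after
    return 0
```

In this model, a record that expects a counter of 105 replayed against a database sitting at 101 comes back with -566, which is the kind of mismatch described in this post.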
It looked like when SCC-Node1 failed over to SCC-Node2, the disk subsystem was overwhelmed, along with the network backbone for the SAN, causing a mismatch in the time. If you look up Error -566 using the Exchange Error Tool, you will see that this theory matches the actual error:
Once this is completed, you should have no errors. Check the status of your database using ESEUTIL /MH. The database should show a clean shutdown state with 0x0 logs required (meaning no logs need to be replayed into the database):
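If you want to script that check instead of eyeballing it, something like the following Python sketch could parse saved ESEUTIL /MH output and tell you whether the database looks clean. It assumes the header contains a "State:" line and a "Log Required:" line in the usual layout; the function name `is_clean_shutdown` and the sample text are hypothetical illustrations, not output captured from a real server.

```python
import re

STATE_RE = re.compile(r"^[ \t]*State:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
LOG_REQUIRED_RE = re.compile(r"^[ \t]*Log Required:[ \t]*(\d+)-(\d+)", re.MULTILINE)


def is_clean_shutdown(header_text):
    """Return True if the header shows Clean Shutdown and no logs required."""
    state = STATE_RE.search(header_text)
    logs = LOG_REQUIRED_RE.search(header_text)
    if state is None or logs is None:
        return False  # unexpected layout: treat as not verified
    low, high = int(logs.group(1)), int(logs.group(2))
    return state.group(1) == "Clean Shutdown" and low == 0 and high == 0


# Illustrative sample only (assumed layout, not real server output)
SAMPLE_CLEAN = """
            State: Clean Shutdown
     Log Required: 0-0 (0x0-0x0)
"""

print(is_clean_shutdown(SAMPLE_CLEAN))
```

Anything other than a clean state with a 0-0 log range returns False, which errs on the side of making you look at the header yourself.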
So the situation I wrote about above covers the case where the logs are not corrupt, but the dbTime is off. It is kind of weird that you are getting this when attempting ESEUTIL /R, but here are a few pointers (Exchange 2007+):
Run ESEUTIL /ML against the logs. This verifies the checksums to ensure nothing is corrupt. If everything is OK, then I would simply move the checkpoint to a different location and then run this:
The /A will allow you to essentially replay the database with the logs available, and assume a loss for the logs that are unable to replay. It is cleaner than ESEUTIL /P but you are still assuming a loss of data.
Hey, I ran the /r command and it went through successfully, but the database is still suspended and failed. Are the logs supposed to replay by themselves? Because they are not. Should I use the /a switch? This is part of a DAG, so should the other DAG members replicate the logs? I don't want to run an update, which would mean re-seeding the entire database over the WAN, and that takes forever.