I hinted at file growths in my previous posts about shrinking data and log files. Then I talked about growing log files in my post Database Log VLFs. However, there are still some unanswered questions about growing data files.
There are competing factors in deciding how large autogrowth should be. The only real consensus is that the defaults are horrible. You need to balance how long of a pause a growth will cause, how long of a pause your users will tolerate, how fast your database is growing, how much fragmentation smaller growths will cause, and how much fragmentation you will tolerate.
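Before you change anything, it helps to see what every file is currently set to. Here is a minimal sketch using sys.master_files; the division by 128 converts 8 KB pages into MB.

-- Current size and autogrowth settings for every file on the instance
SELECT d.name AS database_name,
       mf.name AS logical_file_name,
       mf.type_desc,
       mf.size / 128 AS size_mb,
       CASE WHEN mf.is_percent_growth = 1
            THEN CAST(mf.growth AS VARCHAR(10)) + ' percent'
            ELSE CAST(mf.growth / 128 AS VARCHAR(10)) + ' MB'
       END AS autogrowth
FROM sys.master_files mf
JOIN sys.databases d ON d.database_id = mf.database_id
ORDER BY d.name, mf.type_desc, mf.name;

Any file still growing by a percent, especially a log file, is a candidate for a fixed increment you picked on purpose.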
A partial answer is to look at how much data was used before, how much is being used now, and predict future usage from that trend. A better answer is to look at the peak usage over a week and compare it to past weeks, which accounts for things that cause spikes in file usage such as index rebuilds or data loads.
Your production servers should have some kind of monitoring software in place, and the monitoring software is already capturing this type of information to help make all the pretty charts and graphs you see through the GUI. You need this software to run so you can get alerted when things go wrong, and pulling this data will just make it more valuable.
For me, I use Idera Diagnostic Manager, so my queries are going to look at the data collected using this tool. Other monitoring tools are going to collect the same basic data in a database you can read.
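If you do not have a monitoring repository to read from, a lightweight alternative is to snapshot file usage on a schedule and trend it later. This is only a sketch; the table dbo.FileUsageHistory is a hypothetical name, and the idea is simply to capture sys.database_files along with FILEPROPERTY's SpaceUsed value from a job that runs in each database.

-- Hypothetical history table; populate it from a scheduled job in each database
CREATE TABLE dbo.FileUsageHistory (
    capture_time      DATETIME NOT NULL DEFAULT GETDATE(),
    database_name     SYSNAME  NOT NULL,
    logical_file_name SYSNAME  NOT NULL,
    size_mb           INT      NOT NULL,
    used_mb           INT      NOT NULL
);

INSERT INTO dbo.FileUsageHistory (database_name, logical_file_name, size_mb, used_mb)
SELECT DB_NAME(),
       name,
       size / 128,
       FILEPROPERTY(name, 'SpaceUsed') / 128
FROM sys.database_files
WHERE type_desc = 'ROWS';

Comparing the weekly peaks of used_mb against size_mb is the trend this section is talking about.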
Speaking of this not being perfect, it double-dips. If there are two databases growing on the same drive with 10 GB free, it says that each one has 10 GB of growth available. I was not able to rewrite this to take that into account while also accounting for databases with multiple files in a single filegroup spread across multiple drives.
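You can see where the double-dipping comes from by looking at free space per volume. This sketch uses sys.dm_os_volume_stats; every file sitting on the same volume reports the same available_bytes, so summing that value per database counts the same free space more than once.

-- Free space on the volume behind each data file; files on the same
-- volume all report the same available_bytes
SELECT DB_NAME(mf.database_id)       AS database_name,
       mf.name                       AS logical_file_name,
       vs.volume_mount_point,
       vs.available_bytes / 1048576  AS free_mb_on_volume
FROM sys.master_files mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs
WHERE mf.type_desc = 'ROWS'
ORDER BY vs.volume_mount_point, database_name;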
Recompiles can be a hidden bottleneck on your server, and they may not be obvious. I should know; it happened to me. Unfortunately for me, that was before I was a proponent of tracing or using XEvents to watch for recompiles. Fortunately for you, the experience changed me, and I enjoy sharing my experiences.
I got into it, reworded the query a little, created a perfect index for it, and got it down to 0.100 seconds to run. This was a large concession on my part because I HATE perfect indexes for a single query, and I feel strongly that indexes should focus on being reusable by many queries. However, even that only got the web page to load in 10 seconds.
What I saw was a response that really stuck with me. Management stepped back to recognize the difference someone made and made sure they knew they were appreciated. Then they took it a step further by bringing the family into it, leaving my wife feeling proud and appreciated at the same time.
This post is about recompiles, how they can drag you down, and how you can find out about them. I covered how they dragged me down, and in past posts I casually mentioned you should use server-side traces or XEvents to monitor them. Then I left it up to you to figure out how. Let's try a different approach today.
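For the XEvents route I kept mentioning, a minimal session looks something like the sketch below. The session name Recompiles and the target file name are placeholders, and you would adjust the actions to whatever helps you track the statements down.

-- Minimal XEvents session for statement-level recompiles
CREATE EVENT SESSION [Recompiles] ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
(
    ACTION (sqlserver.database_name, sqlserver.sql_text, sqlserver.client_app_name)
)
ADD TARGET package0.event_file
(
    SET filename = N'Recompiles.xel', max_file_size = 100, max_rollover_files = 5
)
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS);

ALTER EVENT SESSION [Recompiles] ON SERVER STATE = START;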
Your production servers should have some kind of monitoring software in place, and the monitoring software is capturing this type of information to help make all the pretty charts and graphs you see through the GUI. You need this software to run so you can get alerted when things go wrong, but pulling this data will just make it more valuable.
That being said, the true value of this information comes when you combine it with the type of server it is, the reasonable expectations of the users, the amount of time the server is waiting on I/O, and so much more. To start you out, look at the Wait Stats Baseline Comparison using Idera, too.
These numbers are going to have natural variance to them. Did a user run a couple of large queries during one interval but not another? Was the shared storage under different loads? Really, anything can come up. The longer the period, the more you smooth out these numbers.
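If you want raw numbers of your own to sanity-check what the monitoring tool shows, one option is to snapshot sys.dm_os_wait_stats on a schedule and diff consecutive snapshots, since that DMV is cumulative from the last restart. The table dbo.WaitStatsSnapshot below is a hypothetical name for illustration.

-- Hypothetical snapshot table; diff consecutive captures to get waits per interval
CREATE TABLE dbo.WaitStatsSnapshot (
    capture_time        DATETIME     NOT NULL DEFAULT GETDATE(),
    wait_type           NVARCHAR(60) NOT NULL,
    waiting_tasks_count BIGINT       NOT NULL,
    wait_time_ms        BIGINT       NOT NULL,
    signal_wait_time_ms BIGINT       NOT NULL
);

INSERT INTO dbo.WaitStatsSnapshot (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;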
Files in SQL Server need to grow as the database grows, and in very specific circumstances they need to be shrunk as well, with additional maintenance. There are quite a few differences between log files and data files, so I created a separate post for Shrinking Database Data Files.
The final reason is that you have too many VLFs. When a log file is created or grows, it splits up the new space into smaller chunks called Virtual Log Files that SQL Server will cycle through when writing logs. If the log is growing in small increments, it creates tons of these, slowing SQL Server down. The point here is that you resolve the issue by shrinking the log (deleting the smaller VLFs), then growing the log to the desired size in larger chunks (creating bigger VLFs). The details on finding what size your VLFs are, what size new VLFs will be, and just about everything else VLF related are in my post Database Log VLFs.
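As a rough sketch of that fix, assume a database named YourDB with a log file whose logical name is YourDB_log, a target size of 8 GB grown in 4 GB chunks, and a fixed autogrowth; take a log backup first if you are in the full recovery model, and swap in your own names and sizes.

USE [YourDB];

-- Shrink the log to clear out the small VLFs
DBCC SHRINKFILE (N'YourDB_log', 1);

-- Grow it back out in larger chunks so the new VLFs are bigger
ALTER DATABASE [YourDB] MODIFY FILE (NAME = N'YourDB_log', SIZE = 4096MB);
ALTER DATABASE [YourDB] MODIFY FILE (NAME = N'YourDB_log', SIZE = 8192MB);

-- Set a fixed autogrowth instead of a percentage while you are in there
ALTER DATABASE [YourDB] MODIFY FILE (NAME = N'YourDB_log', FILEGROWTH = 1024MB);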
Tim Ford (bt) noticed how we moved to advanced topics and asked us to remember where we came from in his Entry-Level Content Challenge. This is my second post to take on his challenge, and I hope others will take him up on this!
Each change made to the database is assigned a Log Sequence Number (LSN), and SQL Server keeps track of the oldest LSN it still needs for any purpose. This purpose can be many things, but it is typically the oldest of these: the start of the oldest active transaction, the last point captured by a log backup (in the full or bulk-logged recovery models), or the last point processed by features such as replication, database mirroring, or availability groups.
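If you want to see which of these is currently keeping your log from clearing, SQL Server will tell you directly:

-- What is currently preventing each database's log from being reused
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
ORDER BY name;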
If the last LSN in a VLF is older than what you need to keep, that VLF can be reused. This makes it very efficient because it becomes something like a set-based operation for SQL Server: the entire VLF is cleared and marked for reuse at once instead of line by line.
To help you picture how the VLFs work, someone at Microsoft drew you a picture on Transaction Log Physical Architecture. Then you can look at the results of DBCC LogInfo, and it will make a lot more sense when you see a VLF on each line there along with its status.
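To look at them yourself, either of these works; sys.dm_db_log_info only exists on SQL Server 2016 SP2 and later, so DBCC LogInfo is the fallback on older versions.

-- One row per VLF in the current database's log
DBCC LOGINFO;

-- The DMV equivalent on SQL Server 2016 SP2 and later
SELECT *
FROM sys.dm_db_log_info(DB_ID());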
In addition to this, each VLF needs to be processed when restoring or recovering a database. Recovery is part of restarting SQL services, so you should be hitting this issue at least once a month with your Windows Updates. Every VLF adds some overhead to this process, and a huge number of them adds enough overhead to noticeably lengthen it.
I have a script for that. It is set to filter only the logs that I want to look at, but you can comment out the WHERE clause on the final statement to see them all. Also, it would be extremely rare for me to look at individual VLFs, so this is only looking at the sums and averages for each file.
By default, every database is based off of model, which has a 1 MB log file growing in 10% increments. That means 1/10th of a MB for the first growth, which is just a tiny fragment. If you leave it this way on large databases, you could end up with a 1 TB log trying to grow 100 GB at a time, which your users will notice.
For every database EXCEPT TempDB, this means shrinking the logs (deleting VLFs to get to the size you want), then growing the logs again. Both manual and automatic growths will split the new physical space into VLFs, but how many are created depends on your version of SQL Server.
Graham Kent (bt) ran tests on the performance differences between two log files in his post Slow recovery times and slow performance due to large numbers of Virtual Log Files. Both were 5 GB, but one had 16 VLFs (320 MB each) while the other had 20,480 VLFs (0.25 MB each). It was good to see transaction performance as well as recovery performance covered.
If your database is absolutely critical to the point that squeezing an extra couple milliseconds of performance throughout the day would be noticed or if a server coming back online 15 seconds faster would save thousands of dollars, test it. See how it performs in load tests and service restarts using your test servers and your database.
I see TempDB using more memory than I feel it should, and I found a way to resolve it. Previously I dove into the technical details and questioned what I saw in my post TempDB Memory Leak?, which is great for proving and debugging an issue. Now I want to step back and talk about it from a practical perspective.
My recommendation is to take the peak size outside of a one-time process and make the total TempDB data files at least 150% of that size. If you have 4 TempDB data files and the peak size used is 4 GB, 150% of that is 6 GB, so make each file at least 1.5 GB. Then set autogrowth to something reasonable because one-time processes will happen.
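Carrying that math through, a sketch of the sizing might look like this. The logical file names tempdev, temp2, temp3, and temp4 are the defaults on newer versions and may differ on your instance, and the 256 MB autogrowth is just a placeholder for "something reasonable."

-- Presize 4 TempDB data files to 1.5 GB each (150% of a 4 GB peak split 4 ways)
ALTER DATABASE tempdb MODIFY FILE (NAME = N'tempdev', SIZE = 1536MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = N'temp2',   SIZE = 1536MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = N'temp3',   SIZE = 1536MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = N'temp4',   SIZE = 1536MB, FILEGROWTH = 256MB);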
All of those reads caused lots of waits on the server. In my 12-hour sample period I saw a total of 34,000. The units shown for this field are fine for comparison reasons, but they appear to be a total of the ms/s on the chart for values collected every 6 minutes, not the total waits. You have to query the tables behind Idera Diagnostic Manager to get the real values. In this case, the total waits were 138 minutes for shared and 49 for exclusive latches, for a total of 187 minutes of waiting for the 12 hours.
All bloggers started as beginners. We picked things up, learned them, and got to the point where we wanted to start sharing our knowledge. The problem is that most of us moved on to talking about advanced topics and left people to fend for themselves on the difficult entry-level learning curve. My blog, Simple SQL Server, was actually created with me saying I was going to fill that gap, right up until the time I figured out the best way to learn advanced topics was to write things down as I learned.