I'm currently getting into PerfView for performance analysis of my (C#) apps. But those apps typically make a lot of database calls. So I asked myself questions like:
- How much time is spent in repositories?
- How much time is spent waiting for SQL queries to return? (I don't know if this is even possible to discover with PerfView.)
But from my traces I get barely any useful results. In the "Any Stacks" view it tells me (when I use grouping on my repository) that 1.5 seconds are spent in my repository (the whole call takes about 45 seconds). And I know this is not really true, because the repositories call the database A LOT.
Is it just that the CPU metric is not captured while waiting for SQL queries to complete, because the CPU has nothing to do during that time, so my times only include data transformation and similar work in the repository?
What I missed was turning on the Thread Time option to get the times of blocked code (which is what's happening during database calls, I suppose). I have all the stacks now and just have to filter out the uninteresting things. But I don't seem to get anywhere.
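The gap between CPU time and wall clock time during a blocking call can be illustrated outside PerfView. The following is a minimal Python sketch, not anything PerfView itself does: `blocking_call` is a hypothetical stand-in for a database round trip, and `measure` is a helper invented for this illustration.

```python
import time


def blocking_call(seconds: float) -> None:
    """Hypothetical stand-in for a database round trip: the thread waits and uses almost no CPU."""
    time.sleep(seconds)


def measure(fn, *args):
    """Return (wall_seconds, cpu_seconds) spent inside fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return wall_s, cpu_s


wall_s, cpu_s = measure(blocking_call, 0.2)
print(f"wall clock: {wall_s:.3f} s, CPU: {cpu_s:.3f} s")
```

A CPU-sampling view only sees the second number, which is why a repository that mostly waits on the database looks cheap until blocked time is collected as well.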
What's especially interesting for me when using "Thread Time" is the BLOCKED_TIME. But with it the times seem off, I think. When you look at the screenshot, it tells me that CPU_TIME is 28,384, which is in milliseconds (AFAIK), but BLOCKED_TIME is 2,314,732, which can't be milliseconds. So the percentage for CPU_TIME is very low at 1.2%, but 28 out of 70 seconds is still a lot. So the inclusive percentage is comparing apples and oranges here. Can somebody explain?
What I missed (and Vance Morrison actually explained it in his video tutorial) is this: when doing a wall clock time analysis with PerfView, you get the accumulated time from all the threads that have been "waiting around" in what is called "BLOCKED_TIME". This means that for a 70-second trace, the finalizer thread alone adds 70 seconds to this "BLOCKED_TIME", because it was sitting there not doing anything (or almost nothing, in my case).
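The numbers from the question can be checked with a little arithmetic. This is a sketch under two assumptions: both figures are in milliseconds, and the inclusive percentage is computed against the sum of just these two figures. The variable and function names are made up for the illustration.

```python
# Figures quoted in the question; assumed to both be in milliseconds.
CPU_TIME_MS = 28_384
BLOCKED_TIME_MS = 2_314_732
TRACE_WALL_CLOCK_MS = 70_000  # the trace covered about 70 seconds


def inclusive_percent(part_ms: float, total_ms: float) -> float:
    """Share of the total accumulated thread time, in percent."""
    return 100.0 * part_ms / total_ms


# Assumption: the total is just CPU time plus blocked time, summed over all threads.
total_ms = CPU_TIME_MS + BLOCKED_TIME_MS
cpu_pct = inclusive_percent(CPU_TIME_MS, total_ms)

# A thread that idles for the whole trace contributes one trace length of blocked time,
# so this ratio says how many fully idle threads would explain the blocked total.
blocked_thread_equivalents = BLOCKED_TIME_MS / TRACE_WALL_CLOCK_MS

print(f"CPU share of all thread time: {cpu_pct:.1f}%")
print(f"Blocked time is worth about {blocked_thread_equivalents:.0f} threads idle for the whole trace")
```

Under these assumptions the 1.2% figure is consistent with milliseconds on both sides: the blocked total is simply summed over a few dozen mostly idle threads.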
So when doing wall clock time analysis, it is important to filter down to what you're interested in. For example, find the thread that took the most CPU time, include only that thread in your analysis, and go further down the stack to find pieces of your code that are expensive (and that might also lead to DB or service calls). As soon as you do the analysis from the point of view of a method, you really get the time that was spent in this method, and the accumulated "BLOCKED_TIME" of other threads is out of the picture.
What I found most useful was searching for methods in my own code that "seemed time consuming" and switching to the Callers view for that method, which shed some light on where it is called from. The Callees view then shows what is responsible for the time further down the stack (a DB call in a repository, or service calls for getting some data).
PerfView, released recently by Microsoft, can collect Event Tracing for Windows (ETW) data to trace the call flow of processes and identify how frequently functions are called. Until now, this tool has only been used internally within Microsoft by developers responsible for ensuring optimal performance of operating system components.
In addition to profiling process performance data (something tools like Perfmon, PAL and xperf can't easily do), PerfView also has the ability to analyze process memory heaps to help determine if memory is being used efficiently. It also has a Diff capability that allows you to determine any differences between traces to help spot any regressions. Finally, the tool has a Dump capability to generate a process memory dump.
Installing PerfView
Version 1.0 of the product includes a zip file with just one executable file, perfview.exe, making installation easy. You can copy the file to the various servers that you want to trace and then analyze the data there or on your local workstation. PerfView is supported on Windows Vista, Windows 7, Windows Server 2008 and Windows Server 2008 R2.
Collecting Profile Data
PerfView leverages Event Tracing for Windows, which has been built into the operating system since Windows 2000 Server. Only recently have tools such as XPerf and PerfView taken advantage of ETW data for troubleshooting performance problems.
Viewing the Results
Once you have collected data for the period in which the performance issue occurred, you can analyze the ETL file with PerfView. The ETL file will appear in the left-hand pane with the name you provided in the collection dialog or run command. Double-clicking the ETL file reveals about a dozen leaf nodes with names indicating their contents. For example, you will see TraceInfo, Processes, Events, CPU Stacks, etc., as seen in Figure 2. Double-clicking a node opens an appropriate viewer for its contents.
As you can see in the example, the function System.DateTime.get_Now() is executing 87% of the time. Therefore, to get the biggest bang for the buck, you would want to focus on either reducing the number of times this function is called or optimizing the code within it. While this is a trivial example, the tool can help you identify misbehaving applications and where they are wasting time.
PerfView is a user-friendly tool for collecting and analyzing ETW data to profile process performance issues. The tool can quickly reveal the operating system functions that are being executed on behalf of the process, giving insight into where performance problems may be lurking.
PerfView is a tool for quickly and easily collecting and viewing both time and memory performance data. PerfView uses the Event Tracing for Windows (ETW) feature of the operating system, which can collect information machine wide on a variety of useful events, as described in the advanced collection section. ETW is the same powerful technology the Windows performance group uses almost exclusively to track and understand the performance of Windows, and it is the basis for their Xperf tool. PerfView can be thought of as a simplified and user-friendly version of that tool. In addition, PerfView has the ability to collect .NET GC heap information for memory investigations (even for very large GC heaps). PerfView's ability to decode .NET symbolic information as well as the GC heap makes it ideal for managed code investigations.
PerfView was designed to be easy to deploy and use. To deploy PerfView, simply copy PerfView.exe to the computer you wish to use it on. No additional files or installation steps are needed. PerfView's features are 'self-discoverable'. The initial display is a 'quick start' guide that leads you through collecting and viewing your first set of profile data. There is also a built-in tutorial. Hovering the mouse over most GUI controls will give you short explanations, and hyperlinks send you to the most appropriate part of this user's guide. Finally, PerfView is 'right click enabled', which means that if you want to manipulate data in some way, right clicking allows you to discover what PerfView can do for you.
PerfView is a V4.6.2 .NET application. Thus you need to have a V4.6.2 .NET runtime installed on the machine on which you actually run PerfView. Windows 10 and Windows Server 2016 already have .NET V4.6.2. On other supported operating systems you can install .NET 4.6.2 from the standalone installer. PerfView is not supported on Win2K3 or WinXP. While PerfView itself needs a V4.6.2 runtime, it can collect data on processes that use V2.0 and V4.0 runtimes. On machines that don't have V4.6.2 or later of the .NET runtime installed, it is also possible to collect ETL data with another tool (e.g. XPERF or PerfMonitor) and then copy the data file to a machine with V4.6.2 and view it with PerfView.
Hopefully the documentation does a reasonably good job of answering your most common questions about PerfView and performance investigation in general. If you have a question, you should certainly start by searching the user's guide for information.
Inevitably, however, there will be questions that the docs don't answer, or features you would like to have that don't yet exist, or bugs you want to report. PerfView is an open source GitHub project and you should log questions, bugs or other feedback at
If you are just asking a question there is a Label called 'Question' that you can use to indicate that. If it is a bug, it REALLY helps if you supply enough information to reproduce the bug. Typically this includes the data file you are operating on. You can drag small files into the issue itself, however more likely you will need to put the data file in the cloud somewhere and refer to it in the issue. Finally if you are making a suggestion, the more specific you can be the better. Large features are much less likely to ever be implemented unless you yourself help with the implementation. Please keep that in mind.
Perhaps the best way to get started is to simply try out the tutorial example. On Windows 7 it is recommended that you dock your help as described in help tips. PerfView comes with two tutorial examples 'built in'. Also, we strongly suggest that any application you write have a performance plan as described in part 1 and part 2 of Measure Early and Often for Performance.
This view shows you where CPU time was spent. PerfView took a sample of where each processor is (including the full stack), every millisecond (see understanding perf data) and the stack viewer shows these samples. Because we told PerfView we were only interested in the Tutorial.exe process this view has been restricted (by 'IncPats') to only show you samples that were spent in that process.
It is always best to begin your investigation by looking at the summary information at the top of the view. This allows you to confirm that the bulk of your performance problem is indeed related to CPU usage before you go chasing down exactly where CPU is spent. This is what the summary statistics are for. We see that the process spent 84% of its wall clock time consuming CPU, which merits further investigation. Next we simply look at the 'When' column for the 'Main' method in the program. This column shows how CPU was used by that method (or any method it calls) over the collection time interval. Time is broken into 32 'TimeBuckets' (in this case we see from the summary statistics that each bucket was 197 msec long), and a number or letter represents what % of 1 CPU is used. 9s and As mean you are close to 100%, and we can see that over the lifetime of the Main method we are close to 100% utilization of 1 CPU most of the time. Areas outside the main program are probably not interesting to us (they deal with runtime startup and the times before and after process launch), so we probably want to 'zoom in' to that area.
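The idea behind the 'When' column can be sketched in a few lines of Python. This is an illustration of time bucketing only, not PerfView's actual implementation. It assumes one sample per millisecond, as described above, and reuses the 32 buckets of 197 msec from the example. The sample data and the function name are hypothetical, and the mapping from percentages to digits and letters is deliberately left out.

```python
SAMPLE_INTERVAL_MS = 1.0  # one sample per processor per millisecond, per the text
BUCKET_COUNT = 32         # number of 'TimeBuckets'
BUCKET_MS = 197.0         # bucket length taken from the example's summary statistics


def bucket_utilization(sample_times_ms, start_ms=0.0):
    """Return the % of one CPU used in each time bucket, given sample timestamps in ms."""
    counts = [0] * BUCKET_COUNT
    for t in sample_times_ms:
        index = int((t - start_ms) // BUCKET_MS)
        if 0 <= index < BUCKET_COUNT:
            counts[index] += 1
    # Each sample stands for SAMPLE_INTERVAL_MS of CPU time within its bucket.
    return [100.0 * c * SAMPLE_INTERVAL_MS / BUCKET_MS for c in counts]


# Hypothetical data: a method that is on the CPU at every sample between 1000 ms and 5000 ms.
samples = [float(t) for t in range(1000, 5000)]
util = bucket_utilization(samples)

for i, pct in enumerate(util):
    print(f"bucket {i:2d}: {pct:5.1f}% of one CPU")
```

Buckets before and after the busy interval come out at 0%, which corresponds to the regions one would exclude when zooming in.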