The following sections provide brief step-by-step guides on how to set up and run NVIDIA Nsight Compute to collect profile information. All directories are relative to the base directory of NVIDIA Nsight Compute, unless specified otherwise.
The UI executable is called ncu-ui. A shortcut with this name is located in the base directory of the NVIDIA Nsight Compute installation. The actual executable is located in the folder host\windows-desktop-win7-x64 on Windows or host/linux-desktop-glibc_2_11_3-x64 on Linux. By default, when installing from a Linux .run file, NVIDIA Nsight Compute is located in /usr/local/cuda-/nsight-compute-. When installing from a .deb or .rpm package, it is located in /opt/nvidia/nsight-compute/ to be consistent with Nsight Systems. On Windows, the default path is C:\Program Files\NVIDIA Corporation\Nsight Compute .
After starting NVIDIA Nsight Compute, the Welcome Page opens by default. The Start section allows the user to start a new activity, open an existing report, create a new project or load an existing project. The Continue section provides links to recently opened reports and projects. The Explore section provides information about what is new in the latest release, as well as links to additional training. See Environment for how to change the start-up action.
The ncu command line tool can act as a simple wrapper that forces the target application to load the libraries required for tools instrumentation. The parameter --mode=launch specifies that the target application should be launched and suspended before the first instrumented API call. That way, the application waits until you connect with the UI.
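As a minimal sketch of the launch step described above, the following Python helper assembles the command line that starts a target under ncu in launch mode. The helper name build_launch_command, the executable path ./my_cuda_app, and its arguments are hypothetical placeholders; the only tool option used is --mode=launch from the text above.

```python
import shlex


def build_launch_command(app, app_args=()):
    """Return the argv list that starts `app` suspended under ncu."""
    # --mode=launch starts the target and suspends it before the first
    # instrumented API call, so that the UI can attach afterwards.
    return ["ncu", "--mode=launch", app, *app_args]


# Hypothetical target application and arguments, for illustration only.
command = build_launch_command("./my_cuda_app", ["--iterations", "10"])

# Print a shell-ready form of the command.
print(shlex.join(command))
```

On a machine where ncu is on the PATH, the list can be passed to subprocess.run. The target then waits until the UI attaches to it.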
Select the target machine at the top of the dialog to connect and update the list of attachable applications. By default, localhost is pre-selected if the target matches your current local platform. Select the Attach tab and the target application of interest and press Attach. Once connected, the layout of NVIDIA Nsight Compute changes into stepping mode that allows you to control the execution of any calls into the instrumented API. When connected, the API Stream window indicates that the target application waits before the very first API call.
Use the API Stream window to step through the calls into the instrumented API. The dropdown at the top allows switching between different CPU threads of the application. Step In (F11), Step Over (F10), and Step Out (Shift + F11) are available from the Debug menu or the corresponding toolbar buttons. While stepping, function return values and function parameters are captured.
Use Resume (F5) and Pause to allow the program to run freely. The Freeze control defines the behavior of threads that are not currently in focus, that is, threads not selected in the thread dropdown. By default, the API Stream stops on any API call that returns an error code. This can be toggled with Break On API Error in the Debug menu.
To quickly isolate a kernel launch for profiling, use the Run to Next Kernel button in the toolbar of the API Stream window to jump to the next kernel launch. The execution will stop before the kernel launch is executed.
Once the execution of the target application is suspended at a kernel launch, additional actions become available in the UI. These actions are available from either the menu or the toolbar. Note that the actions are disabled if the API stream is not in a qualifying state (not at a kernel launch, or launching on an unsupported GPU). To profile, press Profile Kernel and wait until the result is shown in the Profiler Report. Profiling progress is reported in the status bar in the lower right corner.
Instead of manually selecting Profile, it is also possible to enable Auto Profile from the Profile menu. If enabled, each kernel matching the current kernel filter (if any) will be profiled using the current section configuration. This is especially useful if an application is to be profiled unattended, or the number of kernel launches to be profiled is very large. Sections can be enabled or disabled using the Metric Selection tool window.
Profile Series allows you to configure the collection of a set of profile results at once. Each result in the set is profiled with varying parameters. Series are useful for investigating the behavior of a kernel across a large set of parameters without the need to recompile and rerun the application many times.
For more details on these options, see Command Line Options. The options are grouped into tabs: The Filter tab exposes the options to specify which kernels should be profiled. Options include the kernel regex filter, the number of launches to skip, and the total number of launches to profile. The Sections tab allows you to select which sections should be collected for each kernel launch. Hover over a section to see its description as a tool-tip. To change the sections that are enabled by default, use the Metric Selection tool window. The Sampling tab allows you to configure sampling options for each kernel launch. The Other tab includes the option to collect NVTX information or custom metrics via the --metrics option.
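The tabs described above correspond to options of the ncu command line. The following Python sketch shows one way such a command line could be assembled. The helper name build_profile_command, the application path, and the example filter values are hypothetical. Apart from --metrics, which the text above names, the option names (--kernel-name with a regex: prefix, --launch-skip, --launch-count, --section, --nvtx) are given as I understand them from the Command Line Options reference and should be checked against the installed version.

```python
def build_profile_command(app, kernel_regex=None, launch_skip=0,
                          launch_count=None, sections=(), metrics=(),
                          nvtx=False):
    """Assemble an ncu argv list from Filter, Sections, and Other settings."""
    cmd = ["ncu"]

    # Filter tab: kernel regex, launches to skip, launches to profile.
    if kernel_regex:
        cmd += ["--kernel-name", f"regex:{kernel_regex}"]
    if launch_skip:
        cmd += ["--launch-skip", str(launch_skip)]
    if launch_count is not None:
        cmd += ["--launch-count", str(launch_count)]

    # Sections tab: one --section option per selected section.
    for section in sections:
        cmd += ["--section", section]

    # Other tab: NVTX information and custom metrics.
    if nvtx:
        cmd.append("--nvtx")
    if metrics:
        cmd += ["--metrics", ",".join(metrics)]

    cmd.append(app)
    return cmd


# Example values are placeholders for illustration.
example = build_profile_command(
    "./my_cuda_app",
    kernel_regex="^gemm",
    launch_skip=2,
    launch_count=5,
    sections=["SpeedOfLight"],
    metrics=["sm__cycles_elapsed.avg"],
)
print(" ".join(example))
```

Keeping the command as a list of arguments avoids shell quoting issues when the kernel regex contains special characters.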
Once the session is completed, the Nsight Systems report is opened in a new document. By default, the timeline view is shown. It provides detailed information on the activity of the CPU and GPUs and helps you understand the overall behavior and performance of the application. Once a CUDA kernel is identified as being on the critical path and not meeting performance expectations, right-click the kernel launch in the timeline and select Profile Kernel from the context menu. A new Connection Dialog opens, preconfigured to profile the selected kernel launch. Proceed with optimizing the selected kernel using the Non-Interactive Profile Activity.
You can switch between different Report Pages using the tab bar on the top-left of the report. You can also use the Ctrl + Shift + N and Ctrl + Shift + P shortcut keys or the corresponding toolbar buttons to navigate to the next and previous pages, respectively. A report can contain any number of results. The Current dropdown allows switching between the different results in a report.
On the Details page, use the Compare - Add Baseline button to make the current result the baseline against which all other results are compared, both from this report and from any other report opened in the same instance of NVIDIA Nsight Compute. When a baseline is set, every element on the Details page shows two values: the current value of the result in focus, and either the corresponding baseline value or the percentage change from it.
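To make the percentage of change concrete, here is a small worked sketch in Python. It assumes the conventional definition of relative change against the baseline; the source does not spell out the exact formula the tool uses. The function name percent_change and the sample values are illustrative only.

```python
def percent_change(current, baseline):
    """Relative change of `current` against `baseline`, in percent."""
    if baseline == 0:
        # A relative change against a zero baseline is undefined.
        raise ValueError("baseline must be non-zero")
    return 100.0 * (current - baseline) / baseline


# Hypothetical metric values: a baseline of 80.0 and a current result of 60.0.
change = percent_change(60.0, 80.0)
print(f"{change:+.2f}%")
```

A negative result means the current value is lower than the baseline. Whether that is an improvement depends on the metric: lower is better for a duration, but not for a throughput.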
On the Details page, many sections provide rules with valuable information on detected problems and optimization suggestions. Rules can be user-defined too. For more information, see the Customization Guide.
Use the Connection Dialog to launch and attach to applications on your local and remote platforms. Start by selecting the Target Platform for profiling. By default (and if supported) your local platform will be selected. Select the platform on which you would like to start the target application or connect to a running process.
When using a remote platform, you will be asked to select or create a Connection in the top drop down. To create a new connection, select + and enter your connection details. When using the local platform, localhost will be selected as the default and no further connection settings are required. You can still create or select a remote connection, if profiling will be on a remote system of the same platform.
Depending on your target platform, select either Launch or Remote Launch to launch an application for profiling on the target. Note that Remote Launch will only be available if supported on the target platform.
Application Executable: Specifies the root application to launch. Note that this may not be the final application that you wish to profile. It can be a script or launcher that creates other processes.
Select Attach to attach the profiler to an application already running on the target platform. This application must have been started using another NVIDIA Nsight Compute CLI instance. The list will show all application processes running on the target system which can be attached. Select the refresh button to re-create this list.
Finally, select the Activity to be run on the target for the launched or attached application. Note that not all activities are necessarily compatible with all targets and connection options. Currently, the following activities exist:
Remote devices that support SSH can also be configured as a target in the Connection Dialog. To configure a remote device, ensure an SSH-capable Target Platform is selected, then press the + button. The following configuration dialog will be presented.
When a remote connection is selected in the Connection Dialog, the Application Executable file browser will browse the remote file system using the configured SSH connection, allowing the user to select the target application on the remote device.
On Linux and Mac host platforms, NVIDIA Nsight Compute supports SSH remote profiling on target machines that are not directly addressable from the machine the UI is running on. This is done through the ProxyJump and ProxyCommand SSH options.
These options specify either the intermediate hosts to connect through (ProxyJump) or a command to run to obtain a socket connected to the SSH server on the target host (ProxyCommand). Both can be added to your SSH configuration file.
Note that for both options, NVIDIA Nsight Compute runs external commands and does not implement any mechanism to authenticate to the intermediate hosts using the credentials entered in the Connection Dialog. These credentials will only be used to authenticate to the final target in the chain of machines.
When using the ProxyJump option, NVIDIA Nsight Compute uses the OpenSSH client to establish the connection to the intermediate hosts. This means that in order to use ProxyJump or ProxyCommand, a version of OpenSSH supporting these options must be installed on the host machine.
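As an illustration of the two options, the following Python sketch renders the SSH configuration entries that could be added to an OpenSSH configuration file such as ~/.ssh/config. The host alias, host names, and user name are hypothetical placeholders, and the helper names are illustrative. ProxyJump, ProxyCommand, and the -W flag are standard OpenSSH features.

```python
def proxy_jump_stanza(alias, target_host, user, jump_host):
    """Render a Host entry that reaches the target through a jump host."""
    lines = [
        f"Host {alias}",
        f"    HostName {target_host}",
        f"    User {user}",
        f"    ProxyJump {jump_host}",
    ]
    return "\n".join(lines) + "\n"


def proxy_command_stanza(alias, target_host, user, jump_host):
    """Render a Host entry that obtains its connection from a command."""
    lines = [
        f"Host {alias}",
        f"    HostName {target_host}",
        f"    User {user}",
        # OpenSSH expands %h and %p to the target host name and port.
        f"    ProxyCommand ssh -W %h:%p {jump_host}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder names; replace them with real hosts before use.
print(proxy_jump_stanza("gpu-target", "target.example.com",
                        "profiler", "jump.example.com"))
print(proxy_command_stanza("gpu-target-cmd", "target.example.com",
                           "profiler", "jump.example.com"))
```

Because the credentials entered in the Connection Dialog are only used for the final target, authentication to the jump host has to work on its own, for example through SSH keys or an agent configured outside NVIDIA Nsight Compute.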