No, not really, only the code.
At a high level, it's just a few key components:
- Kernel - a binary blob that runs on the device and performs computations (for the CUDA backend, it's a PTX/CUDA kernel)
- Stream - defines the execution order of kernels on the device (maps to a CUDA stream on the CUDA backend)
- Event - used to define ordering between streams (one stream can wait until an event recorded on another is reached)
- DeviceMemory - a bag of bytes on the device
- CommandBuffer - a specification of multiple kernel launches that are submitted together to save host overhead
- StreamExecutor - bundles all of these things together under a single API
This is a thin abstraction layer on top of the underlying device- and platform-specific libraries (CUDA, ROCm).
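To make the relationships between these pieces a bit more concrete, here is a toy, CPU-only Python model. Everything in it is hypothetical: the Python classes, and methods such as `launch`, `record`, `wait_for`, `submit` and `synchronize`, only mirror the concepts listed above. The real StreamExecutor is a C++ library with its own API, and a real backend would call into CUDA or ROCm rather than running Python callables on host memory.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy, CPU-only model of the concepts above. All class and method names are
# hypothetical; the real StreamExecutor is a C++ library with a different API.

@dataclass
class DeviceMemory:
    """A bag of bytes; a real backend would keep this in device memory."""
    data: bytearray

@dataclass
class Kernel:
    """Stand-in for a compiled device kernel (e.g. PTX on CUDA)."""
    name: str
    fn: Callable[..., None]

class Event:
    """Marks a point in one stream that other streams can wait on."""
    def __init__(self) -> None:
        self.recorded = False

class CommandBuffer:
    """Records several kernel launches so they are submitted as one unit."""
    def __init__(self) -> None:
        self.launches: List[Tuple[Kernel, tuple]] = []
    def add_launch(self, kernel: Kernel, *args) -> None:
        self.launches.append((kernel, args))

class Stream:
    """In-order queue: operations on one stream run in the order enqueued."""
    def __init__(self) -> None:
        self.ops: deque = deque()
    def launch(self, kernel: Kernel, *args) -> None:
        self.ops.append(("run", lambda: kernel.fn(*args)))
    def submit(self, cb: CommandBuffer) -> None:
        launches = list(cb.launches)
        def run_all() -> None:  # one queue entry for the whole buffer
            for kernel, args in launches:
                kernel.fn(*args)
        self.ops.append(("run", run_all))
    def record(self, event: Event) -> None:
        self.ops.append(("run", lambda: setattr(event, "recorded", True)))
    def wait_for(self, event: Event) -> None:
        self.ops.append(("wait", event))

class StreamExecutor:
    """Bundles allocation, stream/event creation and execution in one API."""
    def allocate(self, size: int) -> DeviceMemory:
        return DeviceMemory(bytearray(size))
    def create_stream(self) -> Stream:
        return Stream()
    def create_event(self) -> Event:
        return Event()
    def synchronize(self, *streams: Stream) -> None:
        # Round-robin: each stream advances at most one op per round, mimicking
        # concurrent streams. A "wait" op blocks its stream until the event
        # has been recorded by some other stream.
        while any(s.ops for s in streams):
            progressed = False
            for s in streams:
                if not s.ops:
                    continue
                kind, payload = s.ops[0]
                if kind == "wait" and not payload.recorded:
                    continue
                s.ops.popleft()
                if kind == "run":
                    payload()
                progressed = True
            if not progressed:
                raise RuntimeError("deadlock: waiting on unrecorded events")

def fill(buf: DeviceMemory, value: int) -> None:
    buf.data[:] = bytes([value]) * len(buf.data)

def add_one(buf: DeviceMemory) -> None:
    for i in range(len(buf.data)):
        buf.data[i] += 1

def demo() -> List[int]:
    se = StreamExecutor()
    buf = se.allocate(4)
    producer, consumer = se.create_stream(), se.create_stream()
    ready = se.create_event()
    consumer.wait_for(ready)          # consumer must not touch buf before fill
    cb = CommandBuffer()
    cb.add_launch(Kernel("add_one", add_one), buf)
    cb.add_launch(Kernel("add_one", add_one), buf)
    consumer.submit(cb)               # two launches, one submission
    producer.launch(Kernel("fill", fill), buf, 1)
    producer.record(ready)
    se.synchronize(consumer, producer)
    return list(buf.data)             # [3, 3, 3, 3]
```

In `demo()`, the consumer stream waits on an event recorded by the producer, so the two `add_one` launches (submitted together as one command buffer) only run after `fill` has finished, leaving every byte equal to 3.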
Eugene