TL;DR version: Unified Memory does not work on the Tegra K1. On the Tegra X1, Unified Memory behaves the same as zero copy.
This is inferred from the nvprof/NVVP profiles: neither the unified nor the zero copy run shows any copy calls.
NOTE: Take this with a pinch of salt, because the "unified memory profiling" option for nvprof is unsupported on Tegra devices, so I cannot confirm with 100% certainty that no copy is happening. From the execution patterns, it looks more likely that there is none. (For comparison, the x86 profile shows D->H and H->D copies for the unified API when viewed in NVVP.)
I have attached 3 CUDA source files (standard, unified, zero copy) along with 6 nvprof files, 3 for the TX1 and 3 for x86. These can be opened in NVVP if you wish to dig further.
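For anyone reading without the attachments, here is a minimal sketch of how the three variants typically differ. This is not the attached code: the kernel `scale`, the `CHECK` macro, and the buffer size are made up for illustration. Only the allocation and copy calls change between the variants; the kernel is the same.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call and bail out of main().
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s: %s\n", #call,                    \
                    cudaGetErrorString(err_));                    \
            return 1;                                             \
        }                                                         \
    } while (0)

// Illustrative kernel: multiply every element by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Ask for mapped pinned memory support before the context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    {   // 1) Standard: separate host/device buffers and explicit copies.
        float *h = (float *)malloc(bytes);
        float *d = NULL;
        if (!h) return 1;
        for (int i = 0; i < n; ++i) h[i] = 1.0f;
        CHECK(cudaMalloc((void **)&d, bytes));
        CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
        scale<<<blocks, threads>>>(d, n, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
        printf("standard:  h[0] = %.1f\n", h[0]);
        CHECK(cudaFree(d));
        free(h);
    }

    {   // 2) Zero copy: pinned host memory mapped into the device address space.
        float *h = NULL, *d = NULL;
        CHECK(cudaHostAlloc((void **)&h, bytes, cudaHostAllocMapped));
        CHECK(cudaHostGetDevicePointer((void **)&d, h, 0));
        for (int i = 0; i < n; ++i) h[i] = 1.0f;
        scale<<<blocks, threads>>>(d, n, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());  // kernel must finish before the host reads
        printf("zero copy: h[0] = %.1f\n", h[0]);
        CHECK(cudaFreeHost(h));
    }

    {   // 3) Unified: one managed pointer used by both host and device.
        float *m = NULL;
        CHECK(cudaMallocManaged((void **)&m, bytes));
        for (int i = 0; i < n; ++i) m[i] = 1.0f;
        scale<<<blocks, threads>>>(m, n, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());  // required before touching m on the host
        printf("unified:   m[0] = %.1f\n", m[0]);
        CHECK(cudaFree(m));
    }
    return 0;
}
```

Only variant 1 contains explicit `cudaMemcpy` calls, which is why those are the copies that always appear in a profile. Whether variant 3 triggers implicit copies is up to the driver, and that is the part the profiles above are trying to observe.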
The compilation command I used was:
/usr/local/cuda/bin/nvcc file.cu -ccbin /usr/bin/cc -gencode arch=compute_XY,code=sm_XY -I/usr/local/cuda/include -o file
where file is the filename and XY is the compute capability of your GPU: 32 for the K1, 53 for the X1.
Lastly, I'm not sure why the unified memory file does not work on the Tegra K1. I tried to investigate and put more time into debugging it than I would have liked. If you wish to investigate it, let me know what comes up.
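One possible starting point, offered as a sketch rather than a diagnosis: print what the runtime reports about the device and check the error code returned by a small managed allocation. The property fields below are standard `cudaDeviceProp` members; the choice of device 0 and the 1024-element test allocation are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query device 0 (assumed to be the Tegra's integrated GPU).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("device:            %s (sm_%d%d)\n", prop.name, prop.major, prop.minor);
    printf("integrated:        %d\n", prop.integrated);        // shares memory with host
    printf("canMapHostMemory:  %d\n", prop.canMapHostMemory);  // zero copy available
    printf("unifiedAddressing: %d\n", prop.unifiedAddressing);
    printf("managedMemory:     %d\n", prop.managedMemory);     // cudaMallocManaged supported

    // Try a small managed allocation and report the raw result.
    float *m = NULL;
    err = cudaMallocManaged((void **)&m, 1024 * sizeof(float));
    printf("cudaMallocManaged: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(m);
    return 0;
}
```

If `managedMemory` reports 0 or the allocation returns an error, that would at least narrow down whether the failure is in allocation or later, at kernel launch or host access.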
-Shehzan