Dealing with performance: I'm going to assume you're passing the depth buffer to a vertex shader through a constant buffer. If so, note that constant buffers in DirectX (including on HoloLens) pack their members into 16-byte-aligned registers, with each scalar occupying a full 32-bit component. It doesn't matter whether your depth values are 16, 24, or 32 bits wide; each one still lands in a 32-bit slot. That means with 24-bit depth you're really consuming 32 bits per value, with 8 bits wasted (24 + 8 unused = 32). So for performance you may as well either stuff some other useful value into those 8 spare bits or just treat your depth buffer as full 32-bit values. This is also how the DirectXMath SIMD types behave on x86 platforms, including for heap-allocated variables (they expect 16-byte alignment).
DXT5 compresses each 4x4 block of pixels into 128 bits of output: 64 bits of alpha data (two 8-bit alpha endpoint values plus a 4x4 grid of 3-bit interpolation indices) followed by 64 bits of encoded color data. With 8-bit RGBA input (16 pixels at 32 bits each = 512 bits), that works out to a 4:1 compression ratio.