There are at least two options I'm aware of for macOS, possibly three.

The one I have direct experience with is the VideoToolbox API. It supports only H.264 at the time of this writing, but it works on both iOS and macOS. Like the other solutions discussed here, the result is a handle to a block of memory in VRAM. Ideally it should stay there, copied to RAM only the one time required to package it for network delivery to the remote H.264 app. Similarly, I'd be interested to know whether on decode, where the product is an OpenGL texture, the frame can stay on the card and be composited with OpenGL for display. H.264 is supported but I don't believe it is universally implemented across browsers, so until Apple stops snubbing VP8/VP9, or until some new standard overshadows all of them, VideoToolbox is probably not a great solution. It is also VERY poorly documented; there is a WWDC session where it is discussed carefully, but even so, it is a strange interface.

The second option is that Intel must have macOS APIs for its on-chip hardware encode/decode (the Quick Sync engine). This will only work on macOS, but if that's OK, it's probably the best solution, since it produces VP8/VP9 and will be similar to the solutions others produce as a model for Linux and Windows.

The third is pure conjecture on my part because I have no direct knowledge about it, but I would be shocked if AMD and Nvidia did not have APIs for the Mac to access their GPU facilities. Not all Macs have discrete GPUs, so you would want to create an abstract implementation that can manage either, taking advantage of the GPU when it is available. It is virtually guaranteed to be more performant.
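For what it's worth, the encode side of VideoToolbox looks roughly like the sketch below. This is hedged, not production code: the `H264Encoder` wrapper and its names are mine, and I'm assuming the input is an IOSurface-backed `CVPixelBuffer`, which is what lets the raw frame stay in VRAM while only the compressed output crosses back to RAM.

```swift
import VideoToolbox
import CoreMedia

// Illustrative wrapper, not from any shipping project.
final class H264Encoder {
    private var session: VTCompressionSession?

    init?(width: Int32, height: Int32) {
        var s: VTCompressionSession?
        let status = VTCompressionSessionCreate(
            allocator: kCFAllocatorDefault,
            width: width,
            height: height,
            codecType: kCMVideoCodecType_H264,
            encoderSpecification: nil,
            imageBufferAttributes: nil,
            compressedDataAllocator: nil,
            outputCallback: { _, _, status, _, sampleBuffer in
                // Compressed NALUs arrive here as a CMSampleBuffer; this is
                // the one hop back to RAM needed for network delivery.
                guard status == noErr, let sb = sampleBuffer else { return }
                _ = CMSampleBufferGetDataBuffer(sb) // packetize this for the wire
            },
            refcon: nil,
            compressionSessionOut: &s)
        guard status == noErr, let s = s else { return nil }
        // Advisory hint for realtime (streaming) behavior.
        VTSessionSetProperty(s, key: kVTCompressionPropertyKey_RealTime,
                             value: kCFBooleanTrue)
        self.session = s
    }

    func encode(_ frame: CVPixelBuffer, pts: CMTime) {
        guard let session = session else { return }
        // If `frame` is IOSurface-backed, the uncompressed pixels never
        // have to round-trip through system memory.
        VTCompressionSessionEncodeFrame(session, imageBuffer: frame,
                                        presentationTimeStamp: pts,
                                        duration: .invalid,
                                        frameProperties: nil,
                                        sourceFrameRefcon: nil,
                                        infoFlagsOut: nil)
    }
}
```

The strangeness I mentioned is visible even here: it's a C-style callback-and-refcon interface dressed up in Swift, and most of the interesting behavior hangs off string-keyed session properties.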
Personally, it's the "plumbing" I'm more interested in. Since encoding by definition produces very small products, and since the compressed output ultimately has to be in RAM anyway, I'm not very concerned with that end of things: the performance improvements and parallelism provided by hardware encoding will outweigh the hit for moving the compressed frames backwards on the memory bus from VRAM. On the decoding side, the performance/power improvement will be less of a win, and dragging uncompressed textures from VRAM to RAM could outweigh it entirely. Ideally, I'd like to be able to treat the decoded frame as an abstract image handle that allows me polymorphically to do any necessary compositing/blitting with OpenGL. In theory, the image ought to be able to stay in VRAM until it is released, i.e. it never HAS to be copied to RAM.
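On that decode question: on macOS, Core Video does appear to provide exactly this path. A decompression session hands back a `CVPixelBuffer` (IOSurface-backed), and a `CVOpenGLTextureCache` can wrap that surface as an OpenGL texture without a readback to system memory. A sketch, assuming you already have the `CGLContextObj`/`CGLPixelFormatObj` from your view; in real code you would create the cache once, not per frame:

```swift
import CoreVideo
import OpenGL

// Wrap a decoded, IOSurface-backed frame as an OpenGL texture, no CPU copy.
func makeTexture(from decodedFrame: CVPixelBuffer,
                 context: CGLContextObj,
                 pixelFormat: CGLPixelFormatObj) -> CVOpenGLTexture? {
    var cache: CVOpenGLTextureCache?
    guard CVOpenGLTextureCacheCreate(kCFAllocatorDefault, nil, context,
                                     pixelFormat, nil, &cache) == kCVReturnSuccess,
          let cache = cache else { return nil }
    var texture: CVOpenGLTexture?
    // Wraps the VRAM surface directly; the pixels stay on the card.
    let result = CVOpenGLTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, decodedFrame, nil, &texture)
    return result == kCVReturnSuccess ? texture : nil
}

// Usage: bind with CVOpenGLTextureGetTarget(tex)/CVOpenGLTextureGetName(tex),
// then composite as usual; the frame stays in VRAM until the texture
// and pixel buffer are released.
```

The `CVOpenGLTexture` itself is a decent candidate for the abstract image handle I described: it keeps the buffer alive in VRAM, and nothing forces a copy to RAM unless you explicitly read it back.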