What the tutorial demonstrates is the right idea: decode each video frame and use that data to fill an OpenGL texture. I have no experience with the DrawDib functions, and I do not know whether there are complications when a VR Juggler application object creates a device context the way the tutorial shows. There is more than one way to decode video, though.
What I would do is use ffmpeg as the decoder (I am assuming it can decode 3DS Max video). That said, ffmpeg has historically been quite a hassle to build on Windows, and that may still be true. Decoding a frame will probably take more code than what is shown in that tutorial, but it can happen independently of the native windowing system. That can be a pretty critical detail, particularly with the Win32 API, which is replete with functions that want window handles, device contexts, and so on.
Once you have the video decoding sorted out, the next challenge is going to be the frame rate. Basically, an OpenGL context is redrawn by the VR Juggler Draw Manager as frequently as possible. The video you want to play probably has a frame rate of 24 Hz, and it would look quite bad if a new frame were shown on every redraw at 60 Hz, 120 Hz, or whatever maximum rendering rate your application achieves. You would probably need to do the decoding in a separate thread and then swap the contents of the OpenGL texture only when the current frame has been decoded—and maybe after some sort of artificial delay ((1 / frame rate) - frame decoding time). That way, the application can render at its rate, but the texture "frame" changes at the rate of the source video.
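To make that concrete, here is a minimal sketch of the pacing and hand-off I have in mind. `frameDelay` and `FrameExchange` are hypothetical helpers, not part of VR Juggler, and the 24 Hz rate is just an assumption about your source video; the actual texture upload (e.g., via glTexSubImage2D) is left out:

```cpp
#include <algorithm>
#include <chrono>
#include <mutex>
#include <utility>
#include <vector>

// Sleep budget for the decode thread: (1 / frame rate) - frame decoding
// time, clamped so that we never try to sleep a negative amount when
// decoding runs longer than one frame period.
inline std::chrono::duration<double>
frameDelay(double frameRateHz, std::chrono::duration<double> decodeTime)
{
   const std::chrono::duration<double> period(1.0 / frameRateHz);
   return std::max(std::chrono::duration<double>::zero(),
                   period - decodeTime);
}

// Hand-off point between the decode thread and the draw loop. The decode
// thread publishes a finished frame; the draw loop re-uploads the OpenGL
// texture only when a new frame is actually available, so the texture
// "frame" changes at the video's rate even though rendering runs faster.
struct FrameExchange
{
   // Called from the decode thread once a frame is fully decoded.
   void publish(std::vector<unsigned char> pixels)
   {
      std::lock_guard<std::mutex> lock(mMutex);
      mPixels = std::move(pixels);
      mFresh  = true;
   }

   // Called from the draw loop every redraw; fills `out` and returns
   // true only when the texture needs to be updated.
   bool consume(std::vector<unsigned char>& out)
   {
      std::lock_guard<std::mutex> lock(mMutex);
      if (! mFresh)
      {
         return false;
      }
      out    = std::move(mPixels);
      mFresh = false;
      return true;
   }

private:
   std::mutex mMutex;
   std::vector<unsigned char> mPixels;
   bool mFresh = false;
};
```

The decode thread's loop would then be: decode a frame, sleep for `frameDelay(24.0, decodeTime)`, and call `publish()`; the application's draw code calls `consume()` each frame and touches the texture only when it returns true.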
-Patrick