This most probably means that your DICOM study does *not* embed an MP4/MPEG2 file, but is instead a CINE sequence of individual 2D instances.
In this case, you have to download the individual frames and encode an MP4/MPEG2 video from them. This can, for instance, be done using the "ffmpeg" or "avconv" command-line tools, or using the OpenCV module for Python, as explained on the following page:
The individual JPEG/PNG frames corresponding to the 2D instances can be easily retrieved using the REST API of Orthanc:
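
As a rough illustration of the two steps above (download the frames, then encode them), here is a minimal Python sketch. It is only a sketch under assumptions: Orthanc is taken to run on localhost:8042 without authentication, each instance of the series is taken to hold a single frame, the frames are ordered by their InstanceNumber tag, and the frame rate of 25 is arbitrary. The function names are made up for this example. It relies on the "/series/{id}", "/instances/{id}/simplified-tags" and "/instances/{id}/preview" routes of the REST API, and on "ffmpeg" being installed.

```python
import json
import subprocess
import urllib.request
from pathlib import Path

# Assumption: a local Orthanc with default port and no authentication.
ORTHANC = "http://localhost:8042"


def get(url):
    """Return the raw body of a GET request."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def sorted_instances(series_id, base=ORTHANC, fetch=get):
    """List the instance IDs of a series, ordered by InstanceNumber."""
    series = json.loads(fetch(f"{base}/series/{series_id}"))
    numbered = []
    for instance_id in series["Instances"]:
        tags = json.loads(fetch(f"{base}/instances/{instance_id}/simplified-tags"))
        # A missing or empty InstanceNumber is treated as 0 (assumption).
        numbered.append((int(tags.get("InstanceNumber") or 0), instance_id))
    return [instance_id for _, instance_id in sorted(numbered)]


def download_frames(series_id, folder, base=ORTHANC, fetch=get):
    """Save the PNG preview of each instance as frame_0000.png, frame_0001.png, ..."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, instance_id in enumerate(sorted_instances(series_id, base, fetch)):
        path = folder / f"frame_{index:04d}.png"
        path.write_bytes(fetch(f"{base}/instances/{instance_id}/preview"))
        paths.append(path)
    return paths


def ffmpeg_command(folder, output, fps=25):
    """Build the ffmpeg command line that encodes the numbered PNG files.

    Note: libx264 with yuv420p expects even image dimensions, so frames
    with an odd width or height may need to be scaled or padded first.
    """
    return ["ffmpeg", "-y", "-framerate", str(fps),
            "-i", str(Path(folder) / "frame_%04d.png"),
            "-c:v", "libx264", "-pix_fmt", "yuv420p", str(output)]


def encode(folder, output, fps=25):
    """Run ffmpeg on a folder of frames (requires ffmpeg on the PATH)."""
    subprocess.run(ffmpeg_command(folder, output, fps), check=True)
```

With the Orthanc identifier of the series at hand, calling download_frames(series_id, "frames") followed by encode("frames", "cine.mp4") should produce the video. The "fetch" argument only exists so that the HTTP layer can be swapped out, for instance to add credentials.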
Sébastien-