It's easy to get wrapped up in the details of all this, so let me try to explain it at a
higher level. I do assume you've read what's above, & I rely on you to understand &
perform certain of those tasks as I've already described them. This is not a
replacement for what's come before, just a clarification.
You should always start by trying to get VDH to download the item in question. It is by
far the easiest way to do things. But VDH makes no claims to being able to handle every
case so this approach should be something you go to only when VDH doesn't work.
There are 2 types of streams that this approach can handle. One is HTTP Live Streaming
(HLS). The other is Dynamic Adaptive Streaming over HTTP (DASH). In both cases, there
is a high-level file sent to your browser that describes the streams that can play in
some sort of video player in a web page you are surfing to. This high-level file is
called a manifest. You have to find the manifest as your first step in getting the
content you're having trouble downloading.
For HLS, the manifest is a file whose extension is .m3u8. For DASH, the manifest is a
file whose extension is .mpd. These are plain text files & you would do well to open
them in a text editor & look at them to get an idea of how the information inside is
structured. Remember, you're dealing with computers here. A computer is just a pile of
electronic circuitry that can't think. So a manifest is a rather simple-minded way of
describing stream data.
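If you've saved a manifest & aren't sure which kind it is, the contents usually give it away. Here's a minimal sketch in Python, assuming the usual conventions: an HLS playlist starts with the line #EXTM3U & a DASH manifest is XML whose top-level element is MPD. The function name manifest_kind is just something I made up for illustration.

```python
def manifest_kind(text: str) -> str:
    """Guess whether manifest text is HLS, DASH, or neither."""
    # Drop a byte-order mark and leading whitespace, if any.
    stripped = text.lstrip("\ufeff \t\r\n")
    # HLS playlists (.m3u8) begin with the #EXTM3U tag.
    if stripped.startswith("#EXTM3U"):
        return "HLS"
    # DASH manifests (.mpd) are XML with an <MPD ...> root element.
    if "<MPD" in stripped[:2000]:
        return "DASH"
    return "unknown"


if __name__ == "__main__":
    print(manifest_kind("#EXTM3U\n#EXT-X-VERSION:3\n"))
```

This only looks at the text. It doesn't download anything or check that the manifest is valid.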
I describe upthread here how to find your manifest. Open the Network Monitor & filter on
either .m3u8 or .mpd. You don't know ahead of time which one will show up so you have to
try them both. We've even encountered situations lately in which both types of manifest
are used on the same web page. That means that you might have to look for a .m3u8, then
a .mpd, then maybe even a .m3u8 again, & maybe even a .mpd again. You just have to make
some guesses & be persistent. Sadly, there are also web sites that use neither .m3u8
nor .mpd. Sorry, I have no solution for those.
Quite often, especially in the case of HLS streams, filtering on .m3u8 or .mpd will turn
up multiple manifests. Start working with the first one that is displayed. This is
usually the master manifest. It normally does not describe a single stream directly, but
rather it describes a collection of stream manifests. Often enough, its file name
contains the word "master." When you see multiple manifests, the most likely situation
is that the first one is the master manifest & the others are stream manifests. Those
stream manifests will have entries in the master manifest.
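To check whether a .m3u8 you saved is a master manifest, you can look for its #EXT-X-STREAM-INF entries. Each one is followed by the location of a stream manifest. A rough sketch in Python, with a made-up sample playlist (the names low/index.m3u8 & high/index.m3u8 are hypothetical, as is the function name list_variants):

```python
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
"""


def list_variants(playlist_text: str) -> list:
    """Return (attributes, uri) pairs, one per #EXT-X-STREAM-INF entry."""
    lines = [ln.strip() for ln in playlist_text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = line.split(":", 1)[1]
            # The stream manifest's location is the next line that is not a tag.
            for following in lines[i + 1:]:
                if not following.startswith("#"):
                    variants.append((attrs, following))
                    break
    return variants


if __name__ == "__main__":
    for attrs, uri in list_variants(SAMPLE_MASTER):
        print(uri, "<-", attrs)
```

If the list comes back empty, the file is most likely a stream manifest rather than the master.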
Once you've found a manifest, you want to do 2 things with it:
- Download it onto your system so you can open it in a text editor (like Notepad).
- Get its URL.
I describe how to do both of these tasks upthread. I've already said you should look
inside your manifest to get an idea of what's in there. But don't break your head trying
to understand it in full. You have a tool that interprets a manifest for you & reports
what is in there in a reasonably easy to read format. That tool is ffprobe. I say its
report is reasonably easy to read. Not totally easy, but if you look carefully at the
ffprobe output, it starts to make a certain kind of sense. I have never read any
documentation that explains what ffprobe reports. I just figured it out by looking at
it. No, I'm not some kind of wizard. Just look at the report & you'll see things that
are pretty easy to understand. There are also a few things that I don't completely
understand, but the important parts are pretty clear.
You need to understand that ffprobe can report what's in a .m3u8 manifest same as in
a .mpd manifest. It generates the same kind of report in both cases. All you have to do
is execute this command:
ffprobe "http://URL-of-the-master-manifest"
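If you'd rather drive ffprobe from a script than type the command, here's one way it might look in Python. This is only a sketch. It assumes ffprobe is on your PATH, & the function names are mine. The -hide_banner option just suppresses the version & build text at the top of the report. Note that ffprobe writes its report to stderr, not stdout.

```python
import subprocess


def ffprobe_command(manifest_url: str) -> list:
    """Build the argument list for probing a manifest URL."""
    # Passing a list means the URL needs no extra quoting.
    return ["ffprobe", "-hide_banner", manifest_url]


def run_ffprobe(manifest_url: str) -> str:
    """Run ffprobe and return its report text (taken from stderr)."""
    result = subprocess.run(
        ffprobe_command(manifest_url),
        capture_output=True,
        text=True,
    )
    return result.stderr


if __name__ == "__main__":
    print(ffprobe_command("http://URL-of-the-master-manifest"))
```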
In order to illustrate what ffprobe is telling you, I ran ffprobe against some files I
already have on my system. That's like this:
ffprobe "x:\directory\directory\directory\file"
Here are some samples of ffprobe output for various file types I happen to have handy.
This is not a complete list of every type of file there is. It's just a few types that
you are likely to encounter. Also, I am extracting only a small portion of what ffprobe
reports so you can focus on the bits that you will use to decide what to download.
There's a lot of other detail in ffprobe output. It's certainly interesting but most of
it is not pertinent to downloading content.
Here's what ffprobe tells you about a regular old .mp4:
Stream #0:0(eng): Video: h264 (High) (avc1
Stream #0:1(eng): Audio: aac (LC) (mp4a
This .mp4 file consists of 2 streams. Streams. That's the terminology ffmpeg & ffprobe
use for these things. They are the video track & the audio track of the file. This file
has a Stream #0:0 that is a video track & a Stream #0:1 that is an audio track.
Sometimes the audio track comes before the video track. This is not significant. It
only matters that there is a video track & an audio track.
Here's what ffprobe tells you about a .mkv file:
Stream #0:0(eng): Audio: opus
Stream #0:1: Video: av1
Almost the same information. This file happens to have the audio before the video,
whereas my sample .mp4 had them in the other order. That's an insignificant difference.
See past that. The major difference I want to point out is that the .mp4 shows avc1 in
the video stream & the .mkv shows av1.
Here's what ffprobe tells you about a .webm file:
Stream #0:0(eng): Audio: opus
Stream #0:1(eng): Video: vp9
Here's what ffprobe tells you about a .mp3 file:
Stream #0:0: Audio: mp3
The .mp3 format is audio-only. There is only one stream, one track, in a .mp3.
Here's what ffprobe tells you about an audio-only .mp4 file:
Stream #0:0(und): Audio: aac (LC) (mp4a
I got this from one of the audio files I downloaded from the Metropolitan Opera. This is
the same information that I show above for the regular .mp4 file that consists of both
video & audio. This one has only the audio but the information ffprobe displays for the
audio stream is the same in both cases.
Here's what ffprobe tells you about a captions or subtitles file:
Stream #0:0: Subtitle: webvtt
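All of those Stream lines follow the same pattern, so they're easy to pick apart with a script. Here's a small Python sketch that pulls out the stream number, the language tag if there is one, the type & the codec. The regular expression is my own guess at the pattern based on the samples above, so treat it as a starting point rather than a full parser.

```python
import re
from typing import NamedTuple, Optional


class StreamInfo(NamedTuple):
    spec: str                # stream number, e.g. "0:1"
    language: Optional[str]  # e.g. "eng", or None when not shown
    kind: str                # "Video", "Audio", "Subtitle", ...
    codec: str               # e.g. "h264", "aac", "opus"


STREAM_RE = re.compile(
    r"Stream #(\d+):(\d+)(?:\[[^\]]*\])?(?:\(([^)]*)\))?: (\w+): (\w+)"
)

SAMPLE_REPORT = """Stream #0:0(eng): Video: h264 (High) (avc1
Stream #0:1(eng): Audio: aac (LC) (mp4a"""


def parse_streams(report: str) -> list:
    """Return one StreamInfo per 'Stream #...' line in the report text."""
    streams = []
    for line in report.splitlines():
        match = STREAM_RE.search(line)
        if match:
            spec = f"{match.group(1)}:{match.group(2)}"
            streams.append(
                StreamInfo(spec, match.group(3), match.group(4), match.group(5))
            )
    return streams


if __name__ == "__main__":
    for stream in parse_streams(SAMPLE_REPORT):
        print(stream)
```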
You need to look at the ffprobe report in order to decide what file type to give the
output file in the ffmpeg command that will download the object. You have to provide an
output file name with the right extension depending on what the input is. There are 2
parts of the ffmpeg command that interact & I will explain that in a moment. One
part is the type of stream you will be taking as your input. Hang on, I'm getting to the
other part.
You need to look carefully at other details that ffprobe reports, details I'm not showing
here. But they are really obvious once you start looking at what ffprobe reports. On
the lines for video streams, you will see the video resolution: 640x480, 1280x720,
1920x1080, 3840x2160. Every manifest will have its own pattern of resolutions. Every
manifest is different. But the concept is the same. Other things you'll need to look
for are a bit less noticeable. These are numbers followed by kb/s. These are bit rates
that you can use to choose between 2 video streams that show the same resolution. Audio
streams also have bit rates & if the manifest has more than one audio stream, this can
help you decide which one to download. The higher the kb/s, the higher the quality of
the stream & the larger the file will be when you download it. One other detail to look
for in video streams is a number followed by fps. This is the frame rate, frames per
second, of the video, another factor that indicates the quality of the video.
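To show where those numbers sit on a line, here's a Python sketch that picks the resolution, the kb/s figure & the fps figure out of a video stream line. The 2 sample lines are made up for illustration, not copied from a real report, & the patterns are my assumption about how ffprobe lays the line out.

```python
import re

# Illustrative lines only; a real report will differ in its details.
SAMPLE_A = ("Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p, "
            "1280x720 [SAR 1:1 DAR 16:9], 2500 kb/s, 29.97 fps, 29.97 tbr")
SAMPLE_B = ("Stream #0:1: Video: h264 (High) (avc1 / 0x31637661), yuv420p, "
            "1280x720 [SAR 1:1 DAR 16:9], 4000 kb/s, 59.94 fps, 59.94 tbr")

RES_RE = re.compile(r"\b(\d{2,5})x(\d{2,5})\b")
KBPS_RE = re.compile(r"(\d+) kb/s")
FPS_RE = re.compile(r"(\d+(?:\.\d+)?) fps")


def stream_details(line: str) -> dict:
    """Extract width, height, kb/s and fps from one stream line (None if absent)."""
    res = RES_RE.search(line)
    kbps = KBPS_RE.search(line)
    fps = FPS_RE.search(line)
    return {
        "width": int(res.group(1)) if res else None,
        "height": int(res.group(2)) if res else None,
        "kbps": int(kbps.group(1)) if kbps else None,
        "fps": float(fps.group(1)) if fps else None,
    }


def quality_key(line: str) -> tuple:
    """Sort key: taller picture first, then higher bit rate."""
    details = stream_details(line)
    return (details["height"] or 0, details["kbps"] or 0)


if __name__ == "__main__":
    best = max([SAMPLE_A, SAMPLE_B], key=quality_key)
    print(stream_details(best))
```

Ranking by height & then by kb/s is just one reasonable choice. You might prefer a smaller file & pick the other way around.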
The stream number provides a name for you to use when you decide which stream to
download. I have examples upthread here of manifests that include several streams. So
it's not always true that you just get streams 0:0 & 0:1. You have to look at the
ffprobe output to determine which streams are video & which are audio.
The simplified form of the ffmpeg command for downloading something is this:
ffmpeg parameters -i "URL of the master manifest" -codec: copy -map p:q -map r:s "x:\output directory\output file name.correct extension"
The -map parameters select the streams you want out of the manifest. If you study a
report from ffprobe & decide that, let's say, video stream 0:6 is the one you want, and
audio stream 0:9 is the one you want, you would code this:
ffmpeg parameters -i "URL of the master manifest" -codec: copy -map 0:6 -map 0:9 "x:\output directory\output file name.correct extension"
This will put the video stream into the output file in front of the audio stream.
Equally, you could do this:
ffmpeg parameters -i "URL of the master manifest" -codec: copy -map 0:9 -map 0:6 "x:\output directory\output file name.correct extension"
The streams would end up in the other order. That is insignificant. All video players
can handle files configured either way. The important idea here is that ffmpeg can put
separate video & audio streams into a single output file. If you do want to keep the 2
streams in separate files, just execute 2 ffmpeg commands, one with only -map 0:6, the
other with only -map 0:9.
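Here's how the same command could be put together in a script. It's a Python sketch that only builds the argument list & doesn't run anything. The function name is mine. I write the copy option as -codec copy, without the colon, which is the form shown in ffmpeg's documentation. Any extra parameters you normally put before -i are left out here.

```python
def ffmpeg_command(manifest_url, stream_specs, output_path, copy=True):
    """Build an ffmpeg argument list that downloads the chosen streams."""
    cmd = ["ffmpeg", "-i", manifest_url]
    if copy:
        # Copy the streams as they are, with no re-encoding.
        cmd += ["-codec", "copy"]
    # One -map per stream; their order sets the order in the output file.
    for spec in stream_specs:
        cmd += ["-map", spec]
    cmd.append(output_path)
    return cmd


if __name__ == "__main__":
    print(ffmpeg_command("URL of the master manifest", ["0:6", "0:9"], "output.mp4"))
```

To keep video & audio in separate files, call it twice, once with ["0:6"] & once with ["0:9"], giving each call its own output file name.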
This use of the -map parameter simplifies something I talk about upthread here. I
pointed out one manifest that had partial URLs & the complicated way you had to
reconstruct the full URLs. You can forget all of that. With the -map parameter, ffmpeg
figures it all out for you. Poof. Magic. I hadn't figured this out when I wrote that
stuff upthread.
All the ffmpeg examples I show in this thread include the output parameter -codec: copy.
This tells ffmpeg to copy the input to the output unchanged. This suppresses one of the
interesting & powerful functions of ffmpeg: encoding. I think that's the term. Maybe
it's transcoding. Converting. Hey, I've never pretended to be an expert on ffmpeg. I
know only the little bits I explain here & no more. You've read my rants about the lousy
ffmpeg documentation. I would know more but for that. Anyway, -codec: copy does no
processing of the input. That's why you have to look at the ffprobe report to know what
kind of stream you're getting. Streams aren't always .mp4. All my examples in this
thread have had .mp4 output files but that's just a bit lucky. We all know there's
plenty of other file types out there. So if you use -codec: copy, you have to make sure
your output file name has the right extension.
But if you leave off -codec: copy, ffmpeg does additional processing to encode the input,
not just copy it. Even if your output file name matches the type you figured out from
the manifest, ffmpeg still encodes the file as it gets written. On the other hand,
let's say you figured out that the input is .mkv but you code an output file name
with .mp4. Then ffmpeg will convert the file during the download. Ffmpeg looks at the file
extension of your specified output file & generates output to match that file extension.
Be aware that this is a very CPU-intensive process. Expect ffmpeg to use about 100% of
your CPU & expect to hear your case fans blowing up a hurricane. This is not something
to be alarmed about, though. Case fans were invented to blow harder when the CPU works
harder. And ffmpeg using 100% of your CPU should not cause any undue problems of system
responsiveness. Operating systems are supposed to support multitasking. That means if
ffmpeg is using 100% of your CPU when you open some other application, the operating
system is supposed to cut ffmpeg's resource allocation so the other task can run. You
might see ffmpeg's CPU usage drop to 80% so your other task can use the 20% it needs. If
you want to run 3 or 4 other tasks, they should all coexist quite sociably. You should
not get the least bit disturbed because ffmpeg (or anything else for that matter) uses
100% CPU, assuming it isn't just having a problem & looping. So if you omit -codec:
copy, you are asking ffmpeg to do a lot of extra processing. I just want to make sure
you understand this possibility. Don't do it by accident & then be surprised.
Omitting -codec: copy usually has little effect on the speed of your download. Modern CPUs &
disk drives generally work faster than data arrives over an Internet connection, so your
system can process & write what is coming in off the line about as fast as it arrives,
even if you run some other tasks that take system resources away from ffmpeg. The
exception is heavy encoding work, like high-resolution video on a slower CPU. There the
encoding can fall behind the incoming data & the job takes as long as the encoding takes.
When it comes to -codec: copy it's like this. Omit it & you can code any name for your
ffmpeg output file; ffmpeg will convert the input to the type of the output file.
Include -codec: copy & you have to make sure the output file name has the right
extension. I haven't experimented a whole lot with doing conversions with ffmpeg but I
believe a mismatched output file extension when you code -codec: copy gives an error &
the download is not performed. Maybe somebody will encounter a situation that either
confirms or disproves what I'm saying. Do post here about that.
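As a rough guide to picking the extension when you use -codec: copy, here's a Python sketch that goes from the codec names in the ffprobe report to a suggested extension. The pairings follow the sample files shown earlier in this post. The .vtt extension for webvtt & the fallback to .mkv are my assumptions, so check the result against what ffmpeg actually accepts.

```python
# Codec groups taken from the sample ffprobe reports earlier in the post.
MP4_CODECS = {"h264", "aac"}
WEBM_CODECS = {"vp8", "vp9", "opus", "vorbis"}


def suggest_extension(codecs) -> str:
    """Suggest an output extension for a list of codec names (a guess, not a rule)."""
    names = {codec.lower() for codec in codecs}
    if names == {"mp3"}:
        return ".mp3"
    if names == {"webvtt"}:
        return ".vtt"   # assumed extension for WebVTT captions
    if names <= MP4_CODECS:
        return ".mp4"
    if names <= WEBM_CODECS:
        return ".webm"
    return ".mkv"       # assumed fallback; Matroska holds most codecs


if __name__ == "__main__":
    print(suggest_extension(["h264", "aac"]))
```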
So here's the steps:
1. Find the master manifest, get it on your system, get its URL.
2. Run ffprobe on the manifest using its URL.
3. Read the ffprobe report, figure out what's being offered, choose what stream or streams you want to download.
4. Decide whether to include or omit -codec: copy, name your output file accordingly.
5. Download with ffmpeg using the stream identifiers in -map parameters.