About NVIDIA CUDA support

Maro Huang

Sep 24, 2021, 4:35:09 AM
to Video DownloadHelper Q&A

Hello

There are already many NVIDIA graphics cards that support CUDA GPU acceleration.
If we use this feature, it will reduce the CPU load.

Is it possible to incorporate CUDA support into the new version?

Maro Huang

Sep 24, 2021, 4:38:16 AM
to Video DownloadHelper Q&A
ffmpeg_cuda.png

On Friday, September 24, 2021 at 4:35:09 PM UTC+8, Maro Huang wrote:

jcv...@gmail.com

Sep 25, 2021, 3:29:14 AM
to Video DownloadHelper Q&A
Hi,

No, sorry, it's not planned.

jerome

hilander

Sep 25, 2021, 9:33:17 PM
to jcv...@gmail.com, video-download...@googlegroups.com
CUDA support

.

Hello Maro Huang,

This is what I use:

Lenovo desktop computer, 64-bit

3.1 GHz processor

16 GB of RAM, with a 1 GB GPU (MSI GeForce GT 710)

SSD for the Windows operating system.
HDD for Linux operating systems.
DVD for installing Kali Linux to the HDD.

Use the BIOS boot menu to boot to the Linux operating systems on the HDD.

Use the default boot to boot to Windows.

These are the steps I took to gain CUDA support in Kali:

Boot to Windows and get the Linux operating system for free:
https://cdimage.kali.org/kali-images/kali-weekly/


Download and use this version:
https://cdimage.kali.org/kali-images/kali-weekly/kali-linux-2021-W38-live-amd64.iso

Use Windows to burn the Kali ISO to a DVD.

Use the BIOS boot menu to boot from the DVD and install Kali.

While running the computer off the DVD, use the FAILSAFE boot option to install Kali to the HDD.

Focus on getting Kali installed to the HDD; the install steps do not need to be done in sequential order. Your root partition should be greater than 100 GB to give you room to install everything you need for CUDA support, get updates, and install other programs. Use the advanced install options so that you get separate /tmp, /var, /root, and /home partitions.

After the Kali install is completed, boot to Windows.

Get a flash drive.

Go to https://linuxmint.com/edition.php?id=290 and download the ISO file.

Go to www.pendrivelinux.com and download the flash drive install program from: https://www.pendrivelinux.com/downloads/Universal-USB-Installer/Universal-USB-Installer-2.0.0.7.exe

While in Windows (XP, 7, 8, 10), run the Universal USB Installer.

Install Linux Mint 20.2 to the flash drive.

Use the BIOS boot menu to run Linux Mint 20.2 from the flash drive. While in Mint, run GParted, aka GNOME Partition Editor. You will need this program to adjust the size of the /root partition in Kali to above 100 GB. Increasing the size of the /var and /tmp partitions will also help; make them bigger than 8 GB.

Shut down the computer.

Remove the lid from the desktop computer and remove the GPU card from the motherboard. Connect the VGA cable to the VGA port on the motherboard.

Use the BIOS boot menu to boot Kali from the HDD.

Follow directions listed here: https://www.kali.org/docs/general-use/install-nvidia-drivers-on-kali-linux/

I followed the original directions from that web page on August 20, 2021, and it worked for me at that time. You will see that this page has been updated recently. Hopefully the changes will not mess up your efforts.

After all the directions from the https://www.kali.org/docs/general-use/install-nvidia-drivers-on-kali-linux/ website have been completed, shut down your computer and reinstall your GPU card.

Connect your VGA cable to the GPU card. Put the lid back on the computer.

Use the BIOS boot menu to boot Kali from the HDD.

Firefox is in Kali by default. Video DownloadHelper can be added as an add-on.

Good luck.

.


Maro Huang

Sep 26, 2021, 12:47:53 PM
to Video DownloadHelper Q&A
Hello

I found a simple way (although it is not perfect).

First, replace the original FFMPEG of Video DownloadHelper with a full build that supports CUDA (for example, ffmpeg-N-103833). The original FFMPEG is probably in "C:\Program Files\net.downloadhelper.coapp\converter\build\win\64". For safety, back up the original folder first.

Rename the FFMPEG executable (the full build you just copied in) to "ffmpeg-original.exe".

Create a batch file in the FFMPEG folder containing "%~dp0ffmpeg-original -hwaccel cuda -i %*". It can be named "ffmpeg-cuda.bat".
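
The batch line above forwards whatever arguments it receives to the renamed executable, with "-hwaccel cuda -i" placed in front of them. For illustration only, the same forwarding logic can be sketched in Python. The function name `build_forwarded_command` and the example folder are hypothetical; nothing here is part of VDH or FFMPEG, and the sketch only builds the command line rather than running it.

```python
import os


def build_forwarded_command(wrapper_dir, forwarded_args):
    """Mirror the batch line: <dir>/ffmpeg-original -hwaccel cuda -i <args...>.

    wrapper_dir plays the role of %~dp0 (the folder holding the wrapper),
    and forwarded_args plays the role of %* (everything the caller passed).
    """
    original = os.path.join(wrapper_dir, "ffmpeg-original")
    # As in the batch file, "-i" is supplied by the wrapper itself, so the
    # first forwarded argument is assumed to be the input path or URL.
    return [original, "-hwaccel", "cuda", "-i", *forwarded_args]


# Hypothetical example: the caller passes an input, copy switches, and an output.
example = build_forwarded_command(
    "C:/coapp/converter",  # assumed folder, not a real VDH path
    ["input.m3u8", "-c", "copy", "output.mp4"],
)
```

The resulting list could be handed to `subprocess.call` to launch the real executable; building it as a list avoids quoting problems with paths that contain spaces.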

螢幕擷取畫面 2021-09-27 002418.png

Use "Bat To Exe Converter" to convert "ffmpeg-cuda.bat" to "ffmpeg.exe". Be careful with the name: Video DownloadHelper does not appear to let you customize the FFMPEG file name, so the converted file has to be called "ffmpeg.exe".

螢幕擷取畫面 2021-09-27 002559.png

After all the steps are completed, your FFMPEG folder should look like this.

螢幕擷取畫面 2021-09-27 002521.png

Then just use Video DownloadHelper as usual without any changes. You may find that when downloading multiple streams at the same time (for example, more than 10), the CPU load is lower than before while the GPU load is higher.

螢幕擷取畫面 2021-09-27 002446.png

This helped me a lot, and it also gave my GPU something to do. (: D) Previously, when I downloaded more than one stream at the same time, the CPU load was very high, and sometimes FFMPEG crashed and the files were corrupted. With this method, I can download more streams at the same time and the computer can still do other tasks.

On Sunday, September 26, 2021 at 9:33:17 AM UTC+8, hla...@rocketmail.com wrote:

Wild Willy

Sep 26, 2021, 10:09:06 PM
to Video Download Helper Google Group
You've caught my attention with your comment about multiple VDH downloads causing crashes
& degraded system performance, with downloaded files being corrupted. Actually, I don't
think they're corrupted, they're just incomplete, which may be splitting hairs because
either way, you'll have to download them again. I have been thinking about discussing
something with Michel on this score. Michel has said on here a number of times that part
of what he downloads is video metadata that cannot be added to a downloaded file until
the download completes. He has said that he keeps this metadata in storage & allows the
operating system to deal with it. My belief is that "letting the operating system deal
with it" means the data is allowed to go to the page file. When a download completes, I
sometimes see a ridiculous amount of activity on my page file. This is after a large
download from a particular site on which I have found that I have to single-thread my
downloads. This is a pay site. When I log in there, I briefly see a web page that says
Cloudfront is protecting the site from distributed denial of service attacks. When I
first started using this site, I was trying to push through any number of concurrent VDH
downloads. I was seeing maybe the first one getting reasonable service, between 5
million & 9.5 million bytes per second download speed, from the site, & the others
getting, I kid you not, 100,000 bytes per second download speeds, similar to dial-up
modem speeds. It dawned on me that I was being seen as a DDoS threat. So I started
single-threading my VDH downloads & things improved. Their watchdog is being perhaps a
little too zealous at his task. So I see bad system performance during VDH download
finalizing when I'm doing only 1 download at a time.

I have been considering asking Michel to change the way VDH downloads. Instead of
keeping all this metadata in program storage, he should periodically checkpoint it to a
file throughout a download. Periodically could mean on a timer, every so many blocks of
data downloaded, every time the metadata reaches a certain size threshold, any number of
criteria. This would reduce the reliance on the page file, which is a resource used by
every application you're running at any given time. No, the answer is not to allocate a
larger page file. Windows expands the page file dynamically as needed. I have my page
file on a single-partition hard drive that is 6T. Not K, not M, not G, T. 6T. So my
page file allocation is as big as Windows could ever want it to be & I regularly see
thrashing out the wazoo when one of these large downloads completes. My system becomes
very sluggish, not totally unresponsive, but quite bad. Firefox becomes nearly unusable.
The CoApp usually terminates completely, despite the fact that because I'm
single-threading, there's plenty of downloads in the VDH queue. It takes so long, maybe
20 minutes or longer, that my logon on the web site times out & the queued VDH downloads,
once the system stops thrashing, all rapidly fail in quick succession because I'm no
longer logged on. Fortunately, the way I operate lets me put all the lost downloads back
into the VDH queue without too much hassle. But losing my system for 15 minutes to half
an hour is a hassle. I am patient enough to just let the system sort things out, and so
far it has always sorted itself out. But I wish VDH were architected differently.
Relying on Microsoft to build brilliant software is not a recipe for success. Frankly,
I'm not so interested in hearing that this isn't a problem on Linux, which is probably a
far more intelligent system. What I suggest would work just as well on Linux, so it's
not like I'd expect the Windows VDH & the Linux VDH to be forked. Make them both work
the same. Checkpointing the metadata to a file is a smarter approach no matter the
platform.
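
To make the checkpointing suggestion concrete, here is a minimal Python sketch of the size-threshold variant. It is not based on VDH's source and does not claim to show how VDH stores anything; the class name `CheckpointedBuffer`, the threshold, and the file name are all assumptions chosen for illustration.

```python
import os
import tempfile


class CheckpointedBuffer:
    """Hold bytes in memory and append them to a file whenever the
    pending amount reaches a threshold, so the in-memory part stays small."""

    def __init__(self, path, threshold_bytes=1024 * 1024):
        self.path = path
        self.threshold_bytes = threshold_bytes
        self._pending = bytearray()

    def add(self, chunk):
        """Queue a chunk and checkpoint if the threshold has been reached."""
        self._pending.extend(chunk)
        if len(self._pending) >= self.threshold_bytes:
            self.checkpoint()

    def checkpoint(self):
        """Append pending bytes to the file and drop them from memory."""
        if not self._pending:
            return
        with open(self.path, "ab") as handle:
            handle.write(self._pending)
        self._pending.clear()

    @property
    def pending_bytes(self):
        return len(self._pending)


def demo():
    """Feed five 4-byte chunks through a buffer with a 10-byte threshold."""
    path = os.path.join(tempfile.mkdtemp(), "metadata.ckpt")
    buffer = CheckpointedBuffer(path, threshold_bytes=10)
    for _ in range(5):
        buffer.add(b"abcd")
    buffer.checkpoint()  # final flush, as would happen when a download ends
    return os.path.getsize(path), buffer.pending_bytes
```

A timer or a block count could trigger `checkpoint()` just as well; the size threshold is only the simplest of the criteria listed above to show in a few lines.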

But the intriguing idea here is to replace the customized ffmpeg inside the CoApp with a
standard ffmpeg. I'm not at all interested in this NVidia CUDA whatever it is you're
talking about. I only barely understand what you're trying to do. But swapping the real
ffmpeg into the CoApp is something I could do in about 10 seconds since I regularly use
standard ffmpeg instead of VDH, so I've got the executable here already on my system. Do
you have any experience with simply replacing the captive ffmpeg inside VDH with the
standard one? I wouldn't be CUDA-izing this ffmpeg, like you describe. I'd just copy
the standard ffmpeg into the CoApp directory. If this stops the system thrashing I see a
few times a day, I'd do this in a heartbeat. I may do this anyway as an experiment. I'm
not sure what features of VDH I would lose. I'd like to hear that detail from Michel.
But if this avoids the thrashing I've been seeing, I'd be all for it.

Maro Huang

Sep 27, 2021, 12:37:12 AM
to Video DownloadHelper Q&A
I ran a simple performance test of replacing VDH's FFMPEG with the standard FFMPEG (without enabling CUDA), and there is a difference.

With the default VDH installation, VDH usually becomes unstable when 8 to 9 streams are downloaded at the same time, and all the stream downloads often end at the same moment, even though the streams themselves are still live.

I installed three versions of Firefox (Beta, Developer, Nightly), each downloading only five streams, and that ran stably. However, it wasted memory in net.downloadhelper.coapp-win-64.exe.

By the way, downloading a stream with a single thread is the most stable, which is very useful for live streams, but it takes longer when downloading videos.

After switching to the standard FFMPEG, I can download more than 15 videos stably with one Firefox without changing any detailed settings.

However, the memory load of the standard FFMPEG seems to be much higher than that of VDH's FFMPEG.

So for me, the CPU load problem that troubled me most has been solved, but it has been replaced by a problem of insufficient memory. (In fact, the GPU is taking over part of the CPU's work.)

Perhaps the best way is to recompile FFMPEG to make it lighter. But I tried to compile it before and failed, so I will not try again for the time being.

If the FFMPEG in the next version of VDH supports CUDA, then according to my current experimental results, CUDA can still be used for GPU acceleration even if no setting for it is exposed in the interface.

On Monday, September 27, 2021 at 10:09:06 AM UTC+8, Wild Willy wrote:

Wild Willy

Sep 27, 2021, 4:11:20 AM
to Video Download Helper Google Group
Wow. You really want to know how it all performs, don't you?

My horizons are a bit less distant. I simply swapped out the ffmpeg in the CoApp
directory & swapped in the real ffmpeg. Mine says it's version
2021-07-11-git-79ebdbb9b9-full_build-www.gyan.dev built by MSYS2 project. I just got it
a couple of months ago, on July 11 to be precise, which you can see from the version
number. I've tried a handful of tests.

- From my pay site I've downloaded a number of plain MP4s. This was with VDH restricted
to just one download at a time.
- I downloaded a random clip off YouTube.
- I downloaded the game highlights of a football game off nfl.com. This claimed to be
HLS streaming that gave me a short MP4. I downloaded it because it wouldn't play in the
web page on the site. Plus VLC has much better controls than any web site player for
rewinding & replaying segments, especially in slow motion.

Everything worked. But then, I didn't exactly exercise the boundary conditions that
we're discussing here. I still need to try a live stream, & I need to try bumping up
that VDH parameter to see how it goes with multiple concurrent downloads. I did not
experience the system freeze with any of these tests, but my system doesn't freeze with
every download. I haven't identified a pattern of events leading up to the cases in
which I observed one of these freezes. In other words, I can't cause one on demand, so I
can't run a test that I know will either prove or disprove that swapping ffmpeg has
solved the problem.

Interesting, what you say about the footprint of the versions of ffmpeg. When I run
ffmpeg, among the first things it logs is a list of the switches that were used when it
was built. Well, ffmpeg doesn't actually have a log. I capture the very verbose ffmpeg
output via the Windows standard 2> redirection on the command line. When you get certain
errors in VDH, you get something that looks a lot like the ffmpeg log. I've never
bothered comparing the two but I suspect the captive ffmpeg in the CoApp is one of those
"lighter" builds. I prefer the heavier build because (a) I don't really know what I'm
doing, & (b) I have no idea what features I may or may not want to use. The ffmpeg
documentation is so useless, I just think it's safer to use the heaviest ffmpeg I can
find. There are several weights of ffmpeg available where I got my copy, which is at
ffmpeg.org.
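
The `2>` redirection mentioned above sends a program's standard error stream to a file, which is where ffmpeg writes its banner and diagnostics. As a rough illustration, the same capture can be done from Python. The helper name `run_and_capture_stderr` is hypothetical, and the demo uses the Python interpreter itself as a stand-in child process so that it runs without ffmpeg installed.

```python
import os
import subprocess
import sys
import tempfile


def run_and_capture_stderr(command, log_path):
    """Run a command and write its stderr to log_path, like `cmd 2> log`."""
    with open(log_path, "wb") as log_file:
        completed = subprocess.run(command, stderr=log_file)
    return completed.returncode


def demo():
    """Capture stderr from a stand-in child process instead of ffmpeg."""
    log_path = os.path.join(tempfile.mkdtemp(), "stderr.log")
    child = [sys.executable, "-c", "import sys; sys.stderr.write('banner text')"]
    exit_code = run_and_capture_stderr(child, log_path)
    with open(log_path, "r") as handle:
        return exit_code, handle.read()


# With ffmpeg on the PATH, an assumed invocation would look like:
# run_and_capture_stderr(["ffmpeg", "-i", "input.mp4", "output.mkv"], "ffmpeg.log")
```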

Also, Michel has told me in a discussion on here (I think I have a reference to it among
the threads in my Table of Contents thread under the discussion of separate video &
audio) that VDH does not use ffmpeg to download files & streams. He said he has coded
invocations of other standard APIs to download content. I think he uses ffmpeg only for
post-download activities like aggregating separate video & audio, converting files from
one type to another, & other functions like that. In the tests I ran, I don't think the
MP4s behind the paywall even exercised ffmpeg. I think my other 2 tests did but I'm not
sure. I did see the usual things in the VDH menus, including download progress. If I go
a few days without seeing the system freeze, I'll say swapping in standard ffmpeg cures
that. I can't reach that conclusion so far. I'll also have to see if I have reason to
try multiple concurrent downloads. Those have always caused a freeze when one of them
ended.

My Table of Contents thread is here:

https://groups.google.com/g/video-downloadhelper-q-and-a/c/BzPLK2YyL-s

I'm pretty sure the problem lies with the stuff Michel has said he keeps in storage until
a download ends. I don't think that has anything to do with ffmpeg. Making such high
demands on the page file is not a very sociable thing for any application to do. Like I
said earlier, the page file is a resource shared by every running process, not just
applications, in the system. By not periodically checkpointing to a file then discarding
the data VDH keeps in program storage, VDH is being selfish about the page file. That
makes everything else in the system suffer. It even makes VDH & the CoApp suffer. This
is not a good architecture & it ought to be changed. I do hope Michel joins this
conversation. I want to know what features I don't have because I'm not using the
captive ffmpeg he supplies with the CoApp. I also want to know how relevant are any of
our speculations about the performance of VDH.

Maro Huang

Sep 27, 2021, 7:46:34 AM
to Video DownloadHelper Q&A
This discussion has become interesting.

Indeed, I don't know which functions FFMPEG is responsible for in VDH. However, given that net.downloadhelper.coapp-win-64.exe sometimes consumes a lot of CPU and memory, it is definitely doing more than just forwarding the stream source to FFMPEG.

In my observations, net.downloadhelper.coapp-win-64.exe sometimes uses several GB of memory, and that memory is not automatically released after the download is complete.

So I guess that all the streaming data is processed by net.downloadhelper.coapp-win-64.exe in order to provide various important functions, such as the progress bar.

When I tested the effect of enabling CUDA in FFMPEG, I made a simple batch file (converted to .exe, of course) with the following content:
"%~dp0ffmpeg-N-103833\ffmpeg -hwaccel cuda -i %1 -c copy -bsf:a aac_adtstoasc %2"

This way, I only need to enter the streaming source URL and the output file location. Most importantly, the load on the CPU and memory is unexpectedly small. If I wanted to download more than 50 streams at the same time, I might consider doing it this way.

So I have considered using PyQt to build a GUI that manages FFMPEG threads. However, some functions of VDH cannot be replaced by such a GUI. For example, on a page with multiple streaming sources, VDH can detect the streaming URL as the source is switched.

This leads me to a guess: if FFMPEG were used as the core of the download, with CUDA enabled, the performance might be very good. I think the hardest part would be the progress bar, because progress information may not be easy to obtain from FFMPEG's output.
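
One possible way around the progress problem, offered as a sketch rather than a tested recipe: ffmpeg has a `-progress` option that writes machine-readable `key=value` lines, with each report ending in a `progress=continue` or `progress=end` line. The parser below is a minimal illustration. The function name `parse_progress_blocks` and the sample text are made up for the example, and the exact set of keys may differ between ffmpeg versions.

```python
def parse_progress_blocks(text):
    """Split `ffmpeg -progress` style output into one dict per report."""
    blocks = []
    current = {}
    for line in text.splitlines():
        key, separator, value = line.strip().partition("=")
        if not separator:
            continue  # skip blank or malformed lines
        current[key] = value.strip()
        if key == "progress":
            # "progress" is the last key of each report, so close the block.
            blocks.append(current)
            current = {}
    return blocks


# Illustrative sample only; these values were not produced by a real run.
SAMPLE = """\
total_size=1048576
out_time=00:00:10.000000
speed=1.5x
progress=continue
total_size=2097152
out_time=00:00:20.000000
speed=1.4x
progress=end
"""

reports = parse_progress_blocks(SAMPLE)
```

For a live stream the total length is not known in advance, so a percentage may still be out of reach; the elapsed time and the size written so far are what such reports can reliably give.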

Later, I may make a simple GUI program that finds the streaming URL on the page, spawns FFMPEG threads, and provides basic file name management. That said, this is no longer a discussion of VDH but of how to download streaming videos more efficiently.
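
The "FFMPEG threads" part of such a program could look roughly like the sketch below. It reuses the switches from the batch line quoted earlier in this post and runs one process per stream through a thread pool. The function names, the default of four workers, and the `runner` parameter are assumptions made for illustration; the `runner` hook exists only so the sketch can be exercised without ffmpeg installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_download_command(url, output_path, ffmpeg="ffmpeg"):
    """Build the command with the same switches as the batch line above."""
    return [ffmpeg, "-hwaccel", "cuda", "-i", url,
            "-c", "copy", "-bsf:a", "aac_adtstoasc", output_path]


def download_all(jobs, max_workers=4, runner=subprocess.call):
    """Run one process per (url, output_path) pair.

    Returns the exit codes in the same order as jobs. Each worker thread
    just waits on its child process, so threads are sufficient here.
    """
    commands = [build_download_command(url, out) for url, out in jobs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


# Hypothetical usage, with placeholder URLs:
# download_all([("https://example.invalid/a.m3u8", "a.mp4"),
#               ("https://example.invalid/b.m3u8", "b.mp4")])
```

Capping `max_workers` also limits how many simultaneous connections hit one server, which matters given the throttling described earlier in the thread.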

VDH has many great features, which I like very much. It would be great if it could be more "lightweight".

On Monday, September 27, 2021 at 4:11:20 PM UTC+8, Wild Willy wrote:

Wild Willy

Sep 27, 2021, 1:15:11 PM
to Video Download Helper Google Group
Michel has just posted an interesting comment over here:

https://groups.google.com/g/video-downloadhelper-q-and-a/c/S4K1QgvT-K4

I don't know what services of ffmpeg the current VDH uses during download itself. I
don't know whether the progress bar is something VDH generates from using ffmpeg or
otherwise. Other discussions I've had on here with Michel lead me to believe he's
calculating the progress bar some other way. He's also acknowledged that the calculation
is not always correct. But certain things appear to be about to change. I don't know
what effect that will have on what we are observing. Specifically, I don't know what
effect it will have on VDH's load on the page file. That is what interests me.

As for when the CoApp relinquishes storage after a download completes, that is controlled
by a setting in the VDH Settings dialog. On the Behavior tab of the VDH Settings dialog,
look for a setting named CoApp idle exit timer. When that timer expires, the CoApp is
supposed to terminate. In my experience, it indeed does terminate after that timer
expires. Are you not observing that? I have mine set to 600,000. What is your setting?

Maro Huang

Sep 27, 2021, 2:20:24 PM
to Video DownloadHelper Q&A
This is really great news!

The new version that Michel is developing is exactly what I hoped for.
This way I don't have to write programs (:D)

As for the idle exit timer, I use the default value, which is 60000.

I think having FFMPEG handle the streaming directly could give better performance.
It would be even better if CUDA support were enabled at compile time, even if it can't be set through the interface.

Unfortunately, I am not familiar with FFMPEG, and I am not sure what functions it has.
Calling it directly is like calling shell commands when writing a Linux program.
Similarly, it is tricky to parse the returned output. (sigh)

I have searched for information on using a Python GUI to build a stream download program.
For example, it is necessary to simulate a browser playing the stream, obtain the URL of the XHR stream file, and so on.

Although I suffer from procrastination (not yet certified by a doctor), I should be able to make a simple sample.
A full feature set like VDH's may not be possible, though (: D)

So, are you willing to switch to a graphics card that supports CUDA GPU acceleration? (: D)

On Tuesday, September 28, 2021 at 1:15:11 AM UTC+8, Wild Willy wrote:

Wild Willy

Sep 27, 2021, 8:42:25 PM
to Video Download Helper Google Group
I only recently added a graphics card to my system. My motherboard has a video adapter
built in & is supposed to have 2 video outlets, one is DisplayPort, the other HDMI. I
had my TV connected to the HDMI port as monitor #2. During a thunderstorm, the HDMI port
got fried. Very weird since (a) I have all my computer equipment plugged in on the
battery backup side of a UPS (which also provides surge protection), (b) I have all my
audio/visual equipment on the other side of the room plugged into a second UPS, although
very little of that is on the battery backup side, but it is also a surge protector, (c)
no other parts of any of my computer or A/V equipment were affected, specifically the DP
video outlet still works fine. Not to mention the rest of the motherboard, the CPU, all
my 6 HDDs, my surround receiver, my TV, my cable box, other A/V components. So in order
to continue to use my TV as the second monitor, I needed to add a video card. It is
NVidia but I'm not using it for its other functions, which include audio. Maybe it
already supports CUDA. Is CUDA meant to be a reference to barracuda? In any case, I
haven't had this card long enough to feel I have amortized its cost yet. Yeah, I know,
video cards aren't very expensive. But I'm a cheapskate. Deal with it. It's a GeForce
GT 710. Does that have CUDA? I'm being lazy. Let me look it up. I've found this:

https://www.nvidia.com/en-us/geforce/gaming-laptops/geforce-710m/specifications/

Apparently, I've got CUDA. I'm a bit perplexed by the M on the end of the model name, but
Windows Computer Management reports the video adapter as I quoted it above. I don't have
the original packaging any more, which isn't like me. I do still have the original bill
of sale (not from Amazon) & it appears that the OEM of the card is MSI. I gather it's up
to the OEM whether the CUDA feature gets implemented. I suppose the way to find out if
I've actually got the CUDA feature enabled is to have ffmpeg inside VDH CUDA-ized & see
what happens. Speaking of which, how would I tell if some of the CPU load has been
offloaded to the GPU? Something in the Resource Monitor?

As for the CoApp supporting CUDA, is it simply a matter of building ffmpeg with a
particular parameter? If your video adapter doesn't do CUDA but ffmpeg is built with the
CUDA parameter, will everything still work? In that case, it seems advisable to go ahead
& have a CUDA-ized ffmpeg inside VDH.

But in the end, how much improvement would you get? The constraining factor here is the
speed of your Internet connection. More specifically, the speed at which the server is
willing to deliver your download to you. Since I've gotten my fancy-ass new ISP with the
supersonic connection, I have yet to encounter a server that uses more than a few percent
of my connection capacity. Every server seems to throttle downloads more or less
mercilessly, with YouTube being particularly bad. Even the most generous servers use
barely 20% of my Internet connection capacity. What percent improvement does offloading
some processing to the GPU give you? Your download isn't going to go any faster, is it?
I should think that having VDH not be such a pig about using the page file would have a
great deal larger impact. I hope changing VDH to actually use ffmpeg for the download
has an impact on that.

Wild Willy

Sep 27, 2021, 9:26:26 PM
to Video Download Helper Google Group
I went against my better instincts & decided to actually look at the reprehensibly bad
ffmpeg documentation, such as it is, that comes with the ffmpeg package. I specifically
looked up the -hwaccel switch. Curiously, the documentation does not list CUDA as an
option. However, there is a switch named -hwaccels which lists the available hardware
acceleration types available in the build of ffmpeg I have. Of course, the first one
listed by my build of ffmpeg is CUDA. Typical that the documentation is so faulty. In
addition, if I'm reading the documentation correctly, coding -hwaccel auto may be a more
reliable approach.
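
For anyone who wants to check this from a script, the listing printed by `-hwaccels` is easy to parse: a header line followed by one method name per line. The sketch below assumes that layout. The function names and the sample text are illustrative and were not copied from a real run.

```python
def parse_hwaccels(output):
    """Return the method names listed after the header line."""
    methods = []
    seen_header = False
    for line in output.splitlines():
        line = line.strip()
        if line.lower().startswith("hardware acceleration methods"):
            seen_header = True
            continue
        if seen_header and line:
            methods.append(line)
    return methods


def supports(method, output):
    """True if the given method name appears in the parsed listing."""
    return method in parse_hwaccels(output)


# Illustrative sample only; a real build may list different methods.
SAMPLE_OUTPUT = """\
Hardware acceleration methods:
cuda
dxva2
d3d11va
"""

available = parse_hwaccels(SAMPLE_OUTPUT)
```

A method appearing in the list only shows that the build was compiled with support for it. Whether the installed GPU and driver can actually use it is a separate question.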

Then there's this comment in the ffmpeg documentation:

> Note that most acceleration methods are intended for playback and will not be faster than
> software decoding on modern CPUs. Additionally, ffmpeg will usually need to copy the
> decoded frames from the GPU memory into the system memory, resulting in further
> performance loss. This option is thus mainly useful for testing.

So I question the usefulness of this switch. I'm not so sure it accomplishes anything
during a download. I believe the performance improvement you're seeing is simply that
using standard ffmpeg makes a lower demand on the page file when a download completes. I
ask again, how exactly do you measure the improvement offered by offloading some
processing to the GPU? What is the difference you see between using -hwaccel cuda &
omitting the switch, & how do you measure that? Have you tried -hwaccel auto? Have you
looked at what's going on during a download separate from what's going on during the
finalization of that download?

Wild Willy

Sep 27, 2021, 9:49:20 PM
to Video Download Helper Google Group
I'm coming to the conclusion that you are attributing performance improvements to the
wrong things. How do you know any processing is being offloaded to the GPU? How do you
measure what processing load is being carried by the GPU? Do you have a tool like the
standard Windows Resource Monitor that reports what is happening inside the GPU?

In addition, when I use ffmpeg directly to download streams, I do not use the -bsf
switch. If I'm reading this lousy ffmpeg documentation correctly, the particular switch
value you're using is applicable only in a very limited set of cases. I've seen this
switch mentioned before but I have found it unnecessary on the ffmpeg invocations I have
used to download streams. I emphasize. I use ffmpeg successfully to download streams &
I don't have that switch coded. Where did you get the idea to use the particular
invocation of ffmpeg you are showing above? Is this something you figured out on your
own or did you find it somewhere via online searches?

I would be curious to know what improvements you see by simply replacing the captive
ffmpeg supplied with the CoApp by the standard ffmpeg. I've done that & so far, I have
not noticed any difference in how VDH works. I can't say I've exercised it very hard yet
but I like simple over complicated. What happens when you do the simple thing instead of
the complicated thing? What happens when you simply replace one ffmpeg with the other?
I suspect that any impact of making this simple swap is seen when other functions of VDH
are invoked, functions that I seldom use & have not used yet since I made this swap.

Maro Huang

Sep 27, 2021, 10:35:04 PM
to Video DownloadHelper Q&A
I think one thing is very important: what I mean by "performance improvement" is not "unlimited downloads".
Indeed, Internet bandwidth is limited, and too many simultaneous downloads can even look like a DDoS attack to the service provider.

You said, "I'm coming to the conclusion that you are attributing performance improvements to the wrong things." I don't think that is fair. You should at least go through a "peer review" using the same process before passing judgment on my conclusion.

Going back to the definition of performance improvement: for me, it means my computer is "not blocked by the load of downloading streams".
So I hope that even if I download 20+ streams at the same time, the computer still lets me do other tasks.

You may say that my computer is simply not powerful enough, but I was really troubled by two things:
1. With VDH in a single browser, when downloading about 8 to 9 streams,
all stream downloads often end at the same time.
2. With VDH in several browsers, each downloading about 5 streams, the downloads are stable,
but the load on the CPU and memory is heavy: with about 15 streams in total, the CPU load exceeds 85%.

Although our discussion topic is VDH, what I care about more is "how a user can download streams more efficiently".
The conclusion is a little embarrassing, though: if you are willing to obtain the streaming URL (such as an m3u8) manually and manage the file names manually, using FFMPEG directly is the most efficient.

Earlier in this discussion, I posted a screenshot showing how to verify that FFMPEG supports CUDA and to confirm that CUDA is installed.

As for how to confirm that CPU load is transferred to the GPU, you can use the Windows Task Manager.
You will notice that, generally speaking, the GPU load is around 0% unless you are watching a video (including local video files).

When I first started testing FFMPEG, I did it without CUDA and saw the usual result: the CPU load increased significantly.
With CUDA, however, the GPU load rises to 20% or more (I was downloading only about 10 streams), and the CPU load stays below 5%.
So my preliminary estimate is that, as far as the CPU, GPU, and memory are concerned, I could process 50 streams at the same time,
but my Internet bandwidth would be saturated, and the web service provider might even refuse the connections.

That problem is not hard to deal with: I could set up a proxy on a cloud host. But that is outside the scope of our discussion.

All in all, because VDH currently has its own stream processing design, it also has its own performance costs.
For such a convenient download tool, I think that cost is acceptable.

But I have found that, assuming you have specific hardware that is not very expensive,
the CPU load can be reduced and the computer can handle more things at the same time.
I think this is worth researching.

On Tuesday, September 28, 2021 at 9:49:20 AM UTC+8, Wild Willy wrote:

Wild Willy

Sep 28, 2021, 12:35:16 AM
to Video Download Helper Google Group
Ah yes. I see now where you did the -hwaccels thing. I believe we are not using the
same Windows. I'm on 7 & neither the Task Manager nor the Resource Monitor shows GPU
usage. Are you on Windows 10? I like the improved Task Manager, if that's what you're
using.

OK. So you can monitor GPU usage. I'm curious to know if the GPU takes up some
processing load during download or during finalization. In my case, when I have multiple
VDH downloads going, the downloads proceed normally until one of them ends. It's during
the finalization that I see problems, and not every time. But if I see 2 downloads end
at about the same time, then I see problems every time with my system thrashing to death.
About the same time means the second download ends while the first one is still
finalizing. The first one finalizing causes much activity on the page file, and when the
second one ends, then they're both beating the page file to death. Since the VDH we have
today does not use ffmpeg to perform the downloads, I don't believe the CUDA parameter
has any effect during a download. That is, unless VDH invokes ffmpeg periodically under
the covers while a download is still in progress. During finalization of any HLS
download, I believe ffmpeg is invoked, and I would agree that CUDA could make a
difference then.

When Michel comes out with this new VDH that uses ffmpeg directly instead of his current
API invocations, then I would say CUDA might make a greater difference during the
download & not just during finalization. In my case, I see a problem with thrashing on
the page file when a single-threaded download completes & there is no finalization. You
don't get finalization when you are downloading non-HLS MP4s, or at least, that's my
understanding. The only thing that is happening in this case when the download ends is
the stuff VDH has been keeping in storage needs to be written to the downloaded file.
That's when it beats up the page file. The stuff "in storage" isn't in storage. It's on
the page file. And then when VDH discards the stuff after getting it from the page file
(which is all within the operating system, of course, because an application doesn't
explicitly request page file reads or writes), there's more page file activity as the
operating system removes the stuff from the page file. If VDH starts using ffmpeg
directly, I'm hoping this page file activity is greatly reduced if not eliminated. If
VDH is no longer keeping this stuff in storage, there won't be any need for the paging
activity. When I've done ffmpeg downloads, I haven't observed excessive page file
activity so I'm very optimistic.

When I talk about "finalization" I'm talking about what you can see in the VDH progress
status window, the blue dot menu. I have seen downloads go from the "downloading" state
to the "finalizing" state as reported in the VDH blue dot menu. I have also seen
downloads go from the "downloading" state to the "aggregating" state. If you watch the
blue dot menu carefully, you can see the word "finalizing" or "aggregating" in the blue
dot menu in the cases I'm talking about. But I have also seen downloads go from the
"downloading" state to simply completed, without going through either the "finalizing" or
"aggregating" states. It is this latter situation in which I have seen even
single-threaded downloads cause my system to come to its knees because VDH is causing
thrashing on the page file. At least, after 15 or 20 minutes of thrashing, my system has
always come back to life.

Maro Huang

Sep 28, 2021, 1:32:03 AM9/28/21
to Video DownloadHelper Q&A
I do use Windows 10, and as far as I remember, the GPU information became available about one to two years ago.
I didn't pay attention to it until I started developing AI/ML with Python, where the GPU is very important.
I bought this card for AI/ML development.

As for the problem you mentioned, yes, this is exactly what I meant when I said "the downloaded streams will end together".
Usually in this case the saved media file cannot be played, but I did not study the detailed reason why.

However, as you have seen from using FFMPEG yourself, even if you stop it with "Ctrl+C", the media file can still be played normally, which is different from what happens with VDH.
As Michel said, VDH does not use FFMPEG directly to download, so what happens when the program is closed (or crashes?) is naturally different from FFMPEG.

Since net.downloadhelper.coapp-win-64.exe sometimes takes up a large amount of memory,
the problem of multiple streaming downloads ending at the same time is probably related to the CoApp.
That does not necessarily mean the CoApp is defective. I have encountered programs developed with Node.js before
that became unstable and crashed easily when using a large amount of memory. So maybe it is a problem with the Node.js compiler?
I have no idea.

After our series of discussions, another question has emerged.
Given the design of VDH, can the problem I raised earlier be solved simply by replacing FFMPEG?
Because I'm not sure about the architecture of VDH, I can only say, without rigor:
"Although I can't prove it precisely, in practice I can indeed download more streams at the same time with a single browser than before,
and it is more stable and less likely to crash."

This may not have much to do with CUDA and may be mainly due to FFMPEG.
But what I have in view now goes beyond the current VDH, or even the next generation of VDH.

I have made a simple batch file, so I only need to obtain the streaming URL through VDH and set the output file name.
Then I can use the power of the GPU, so that my computer can handle its usual tasks and download a lot of streams at the same time.
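
The batch file itself is not shown in this thread, so the following is only a sketch of what such a wrapper could look like, written in Python. The exact options are an assumption: it uses `-hwaccel` as an input option (placed before `-i`), and the names `build_download_command` and `download` are invented for the example.

```python
import subprocess


def build_download_command(url, output, hwaccel=None, ffmpeg="ffmpeg"):
    """Build the argument list for downloading one stream with ffmpeg."""
    cmd = [ffmpeg, "-hide_banner"]
    if hwaccel:
        # -hwaccel is an input option, so it has to come before -i.
        cmd += ["-hwaccel", hwaccel]
    cmd += ["-i", url, output]
    return cmd


def download(url, output, hwaccel="cuda"):
    """Run ffmpeg for one stream and return its exit code."""
    return subprocess.run(build_download_command(url, output, hwaccel)).returncode
```

You would paste the m3u8 URL that VDH reports and choose a file name, for example `download(url, "episode.mp4")`. Without an explicit codec option, ffmpeg re-encodes with its default codecs for the output format, which is the case where a decoding accelerator has something to do.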

You know what? It's like unearthing hidden power, and it makes me happy.
On Tuesday, September 28, 2021 at 12:35:16 PM UTC+8, Wild Willy wrote:

Wild Willy

Sep 28, 2021, 11:21:28 PM9/28/21
to Video DownloadHelper Q&A
I still want to know how much GPU is being used during a download.  Do you see much GPU usage while data is being transmitted?  This is, of course, while doing a download with VDH.  There is a difference between the download & what happens after the last byte is received.  It may all look like VDH is "doing the download," but I am trying to make a distinction between the time during which the Internet connection is being used & the time after the Internet connection is no longer being used but VDH is still manipulating the downloaded file in some way.  I want to know when you see significant GPU usage.  Is it during the actual data transmission or after the data transmission stops?

I also still want to know about GPU usage when you supply the CUDA parameter value to ffmpeg compared to when you don't supply it, all while doing a download with VDH.  Your AI/ML project sounds interesting but it is off topic here.  I want to know about GPU load during VDH usage.  And I want to know whether the GPU gets used significantly during data transmission or after data transmission but while VDH is still processing the downloaded file.
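
One way to answer this with numbers rather than by watching Task Manager is to log GPU utilization over time and split the log at the moment the transfer stops. The sketch below assumes an NVIDIA card with the `nvidia-smi` tool installed; the moment of the phase change (`boundary_ts`) has to be noted by hand, for example when the VDH status changes to "finalizing". All function names are made up for this example.

```python
import subprocess
import time
from statistics import mean


def parse_gpu_utilization(output: str) -> list[int]:
    """Parse nvidia-smi CSV output: one utilization percentage per GPU."""
    # Non-numeric lines (such as "[N/A]") are skipped.
    return [int(s) for s in (line.strip() for line in output.splitlines()) if s.isdigit()]


def read_gpu_utilization() -> list[int]:
    """Query the current utilization of every NVIDIA GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_utilization(result.stdout)


def record(duration_s: float, interval_s: float = 1.0) -> list[tuple[float, int]]:
    """Sample GPU 0 every `interval_s` seconds as (timestamp, percent) pairs."""
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        samples.append((time.time(), read_gpu_utilization()[0]))
        time.sleep(interval_s)
    return samples


def mean_by_phase(samples, boundary_ts):
    """Average utilization before and after `boundary_ts` (None if a phase is empty)."""
    before = [pct for ts, pct in samples if ts < boundary_ts]
    after = [pct for ts, pct in samples if ts >= boundary_ts]
    return (mean(before) if before else None, mean(after) if after else None)
```

Running `record` once with the CUDA parameter and once without it, then comparing the two pairs returned by `mean_by_phase`, would cover both questions.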

I'm still trying to get at what benefit -hwaccel cuda gives.  Also, have you tried -hwaccel auto instead?  It seems to me that would be a better choice, if it makes any difference at all.

Have you tried simply swapping the standard ffmpeg into the CoApp directory without doing your bat-to-exe conversion?  What were your results of doing that?  That's what I've done & it seems to have no bad effect on VDH.  But then, maybe the ffmpeg in the CoApp directory is not even being used during any of the functions I've invoked.  This is still a question to which I have not had an answer.

Please directly address the questions I've asked.

Maro Huang

Sep 29, 2021, 1:12:21 AM9/29/21
to Video DownloadHelper Q&A
I understand what you want to know, but the interaction between VDH operation and GPU load needs to be tested in detail,
and I need to plan the experiment first, so I can't give you an answer for the time being.

A stream download goes through many stages, such as receiving binary data, creating local files, stream encoding, data buffering, and transcoding for storage.
Which of those stages benefit when GPU acceleration is enabled in VDH's FFMPEG is something that requires detailed testing.

As for other questions, I can give clear answers.

About "I'm still trying to get at what benefit -hwaccel cuda gives. Also, have you tried -hwaccel auto instead? It seems to me that would be a better choice, if it makes any difference at all."

The answer is no, I didn't try "-hwaccel auto".
I know that CUDA is the only hardware acceleration I have, and I don't trust FFMPEG's detection.

About "Have you tried simply swapping the standard ffmpeg into the CoApp directory without doing your bat-to-exe conversion? What were your results of doing that? That's what I've done & it seems to have no bad effect on VDH. But then, maybe the ffmpeg in the CoApp directory is not even being used during any of the functions I've invoked. This is still a question to which I have not had an answer."

The answer is yes. In my testing, even with CUDA not enabled,
a single browser could run twice as many streaming downloads at the same time and remain stable.

As I said earlier, this conclusion is not rigorous. It also does not depend on how much of VDH's work FFMPEG is responsible for.
Even if FFMPEG is responsible for only 1%, if that part is where VDH's built-in FFMPEG has a problem, then it makes sense to replace FFMPEG.

I am not able to investigate in detail whether the "multiple stream downloads terminated at the same time" problem we encountered was caused by the CoApp, FFMPEG, or both.

Because VDH is not an open source program, the cost of researching this topic is too high for me.
My personal approach is this: VDH gives me a convenient way to get the m3u8 URL.
Combined with the batch file I made myself, that already lets me make great progress in downloading streams.

So I will happily wait for the new version of VDH.

On Wednesday, September 29, 2021 at 11:21:28 AM UTC+8, Wild Willy wrote:

Maro Huang

Sep 29, 2021, 5:57:24 AM9/29/21
to Video DownloadHelper Q&A
I did a simple experiment to test three FFMPEG settings: CUDA hardware acceleration,
auto hardware acceleration, and no hardware acceleration.

Although it was a simple experiment, it was quite troublesome to do.
I set up 10 streaming downloads at the same time.
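
For anyone who wants to repeat this kind of test, here is a rough sketch of how the three settings could be scripted instead of started by hand. It is not the procedure used for the screenshots below. The mapping of settings to options and the names `SETTINGS`, `commands_for`, and `run_concurrently` are assumptions made for this example.

```python
import subprocess

# The three settings of the experiment, mapped to ffmpeg input options.
SETTINGS = {
    "none": [],
    "cuda": ["-hwaccel", "cuda"],
    "auto": ["-hwaccel", "auto"],
}


def commands_for(setting, jobs, ffmpeg="ffmpeg"):
    """Build one ffmpeg command per (url, output) pair for the chosen setting."""
    return [
        [ffmpeg, "-hide_banner", *SETTINGS[setting], "-i", url, output]
        for url, output in jobs
    ]


def run_concurrently(commands):
    """Start all downloads at once, wait for them, and return the exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]
```

With a list of 10 `(url, output)` pairs, `run_concurrently(commands_for("cuda", jobs))` starts all 10 downloads together, and the same list can be reused for `"auto"` and `"none"` while watching the CPU and GPU graphs.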

1.  No hardware acceleration
無硬體加速.png

2. CUDA hardware acceleration
啟用cuda.png

3. Auto hardware acceleration
啟用自動.png

My conclusion is as follows:

1. Enabling hardware acceleration slightly reduces the CPU load, and it may also make the CPU load more stable.
To my surprise, however, the load on the GPU didn't increase significantly.

2. The load is similar whether CUDA or Auto is selected for hardware acceleration,
so FFMPEG has probably chosen CUDA correctly in Auto mode.

Maybe downloading 10 streams at the same time is too easy for this graphics card :D

On Wednesday, September 29, 2021 at 1:12:21 PM UTC+8, Maro Huang wrote:

Wild Willy

Sep 29, 2021, 8:20:06 AM9/29/21
to Video Download Helper Google Group
There is a clear benefit from -hwaccel but I don't think the GPU is giving that benefit.
Your GPU usage level is pretty constant at 4%-5% with and without -hwaccel. That could
just be the usual load required by having the computer on & displaying command windows &
the Task Manager. But without -hwaccel, your CPU is running at 23% while with -hwaccel,
your CPU is running at only 6% or 8%. That's a significant drop. Something is helping
out but it doesn't look like it's the GPU. I didn't believe the GPU would be picking up
anything after I discovered the comment I quoted above from the ffmpeg documentation.
That comment leads me to believe that GPU hardware acceleration occurs only during video
playback. A download isn't a video playback. So the GPU wouldn't be accelerating
anything. Your results seem to confirm that. But it looks to me like something else is
stepping in. Some hardware is accelerating something but I don't know what. Do you?

The thing about ffmpeg is that it is a sort of Swiss Army knife for video processing. It
performs all kinds of functions. The ffmpeg package comes with something called ffplay.
This is, if I understand correctly, a video player that uses ffmpeg internally to play
back the video file. I have also seen mention somewhere that VLC uses ffmpeg. I have
used ffmpeg to modify an MP4 file on my system to delay the audio track so it synchs
better with the video track. I have used ffmpeg to consolidate several small MP4s into a
single larger MP4 so that I can play back 1 file instead of 4, 5, 6 files one after the
other. It is my understanding that the VDH function for merging an audio-only MP4 with a
video-only MP4 uses ffmpeg. And of course, I use ffmpeg to download the audio-only,
video-only, & subtitle streams from the Metropolitan Opera. But I know I'm only
scratching the surface with this short list. The various ffmpeg parameters are likely
applicable to certain of those functions & irrelevant to others. The problem with their
documentation is that it's hardly ever clear what parameters apply to what functions.
I'm beginning to believe -hwaccel means one thing when ffmpeg is downloading & something
else when it's playing back a video.
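
To make the list of jobs above concrete, here is a sketch of how three of them map onto ffmpeg command lines: merging an audio-only file with a video-only file, joining several MP4s, and delaying the audio track. These are common recipes using `-map`, the concat demuxer, and `-itsoffset`. They are not necessarily the exact commands I ran, and the function names are invented for the example.

```python
def merge_command(video, audio, output, ffmpeg="ffmpeg"):
    """Combine a video-only file and an audio-only file without re-encoding."""
    return [ffmpeg, "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", output]


def concat_list(paths):
    """Text of the list file read by ffmpeg's concat demuxer.

    Assumes the paths contain no single quotes.
    """
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_file, output, ffmpeg="ffmpeg"):
    """Join the files named in `list_file` into one file without re-encoding."""
    return [ffmpeg, "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", output]


def delay_audio_command(source, output, seconds, ffmpeg="ffmpeg"):
    """Shift the audio later by `seconds` relative to the video."""
    # -itsoffset applies to the input that follows it, so the same file is
    # opened twice: video comes from input 0, delayed audio from input 1.
    return [ffmpeg, "-i", source, "-itsoffset", str(seconds), "-i", source,
            "-map", "0:v", "-map", "1:a", "-c", "copy", output]
```

All three use `-c copy`, so nothing is decoded or encoded. A switch like -hwaccel would have nothing to accelerate in these cases.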

We assumed that -hwaccel would offload some CPU usage to the GPU. That appears to not be
the case. But your experiment shows that -hwaccel offloads some CPU usage to something.
Next time I download an opera, I'm going to add -hwaccel auto to the parameters for the
input file, like you showed early on in this thread. I have noticed that during an opera
download, ffmpeg causes the fans in my system to run at a steady, higher speed than is
their norm. I can hear them blowing throughout the download. Maybe adding the -hwaccel
switch will slow the fans down.

Maro Huang

Sep 29, 2021, 10:42:10 AM9/29/21
to Video DownloadHelper Q&A
Unfortunately, I do not agree with your argument.
To show why, I captured the performance information while the computer was "doing nothing".

We can see that the GPU is at zero load for more than half of the time.

螢幕截圖 2021-09-29 21.44.51.png

In a computer system, only the CPU and GPU have computing power.
If the CPU load decreases but the work does not go to the GPU, that would be magic.

One thing is very important, however: I have never accessed the GPU directly, only through CUDA.
So, looking at the previous screenshots,
my guess is that the graphics card driver detects that certain libraries are in use
and reserves part of the resources in advance. Whether CUDA is already involved at that point, I'm not sure.

The Java Runtime Environment behaves similarly: once it starts, it reserves some resources in advance.
So I personally think the GPU work required for downloading 10 streams may still fit within the resources already reserved.

Another point you mentioned, that the "GPU is only enabled during playback", is not correct.
In fact, the GPU is a processor specialized in floating-point calculation.
Video editing software (such as Premiere and After Effects), 3D rendering software (Maya), and so on
use the GPU heavily for visual compositing or light and shadow simulation.

Even the AI/ML work I am developing uses the GPU to simulate the architecture of the human brain and nerves, with no picture output at all.

I have captured the GPU settings screens of Photoshop and Illustrator for your reference.
Although both programs "render pictures", at least neither of them "plays video".

I think this shows that GPU acceleration can be used in more scenarios than you might expect.

螢幕擷取畫面 2021-09-29 215832.png

螢幕擷取畫面 2021-09-29 215945.png

As for the fan speed, it is controlled by a sensor on the motherboard.
In other words, it simply follows the temperature of your CPU or motherboard.
If your fan slows down, there is only one reason: the temperature has dropped. (But if it stops, it may be broken. Just kidding.)

Finally, I want to turn to the next generation of VDH.
Adding "-hwaccel auto" directly to the CoApp seems ideal,
but I'm not sure whether it would cause a problem
when no supported hardware acceleration driver is present at all.

For example, if the user has not installed DirectX, CUDA, etc.,
then the GPU can't be used, no matter how powerful the graphics card is.

In short, can "-hwaccel auto" cope with the case where hardware acceleration can't be used at all?
Or can it only pick a method automatically when some hardware acceleration is present?
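
I have not tested this, but one defensive approach would be for the calling program not to depend on the answer: check which methods the ffmpeg build lists (the output of `ffmpeg -hwaccels`) and pass `-hwaccel` only when a wanted method is there. The sketch below shows that selection step; `choose_hwaccel_args` and the preference order are assumptions for illustration.

```python
def choose_hwaccel_args(available, preferred=("cuda", "d3d11va", "dxva2")):
    """Return ffmpeg input options for the first preferred method that is listed.

    `available` is the list of names printed by `ffmpeg -hwaccels`.
    An empty list is returned when nothing matches, which means ffmpeg
    is started with plain software decoding.
    """
    for method in preferred:
        if method in available:
            return ["-hwaccel", method]
    return []
```

Note that a method being listed only means it was compiled into that ffmpeg build. Whether the driver and hardware are actually usable at run time is still a separate question that needs a real test.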

Please forgive my laziness (or busyness?), but I don't want to do these detailed tests.
Of course, I'm looking forward to the next generation of VDH,
and it would be even better if GPU hardware acceleration could be added.

But I already have a working solution,
so I think these tests can be left to paid R&D staff.

On Wednesday, September 29, 2021 at 8:20:06 PM UTC+8, Wild Willy wrote:

Wild Willy

Sep 29, 2021, 4:25:00 PM9/29/21
to Video Download Helper Google Group
Yes. I agree. Your first image does prove the GPU is picking up some processing. It
appears that a mere 5% usage of the GPU results in about a 15% drop in CPU. Is the
processor in your graphics card even more powerful than your CPU?

I asked above whether -hwaccel auto would still allow ffmpeg to work when the hardware in
fact had no acceleration facilities present. That could be an important thing to know.
I agree that the paid developers need to figure that one out.

When it comes to fans blowing, yes, I know they are controlled by sensors on the
motherboard & their speed is variable. But they are a quick estimator of CPU usage. The
fans speed up when there's more heat around & that happens when the CPU gets busier, then
they slow down when the CPU is doing less. I have a case fan in the front of my case, a
fan in the power supply which is located at the back of my case, a fan glued to the CPU
chip, & I think (not sure, I'll have to look next time I open the case) there's a fan in
the NVidia card I added. I would say at least the case fan & the CPU fan react when the
CPU gets busier. It can get pretty windy in there.

As for the GPU being involved only during playback, I meant that in the context of ffmpeg
execution, of course. In the general case, I would not have made any statement since I
know even less about that than I do about ffmpeg, which itself isn't much. I don't speak
or read Chinese & I can't load the text in your 2nd & 3rd images in your latest post into
Google Translate so I have no idea what they are telling me. Text I can copy/paste into
Google Translate but it doesn't take images. I tried that. When I highlighted the
image & did Ctrl+Insert, then went to Google Translate & did Shift+Insert, it copied the
URL of the image, not the text in the image nor even the image itself. Not very helpful.
Do please enlighten me. The first image has just enough text I can read that I can glean
the important bits of information from it, but the other 2 don't tell me anything.

Wild Willy

Sep 29, 2021, 4:39:39 PM9/29/21
to Video Download Helper Google Group
One other thing about hardware acceleration. The "weight" of the ffmpeg build is a
factor. When I did -hwaccels against the standard ffmpeg, I got this list: cuda, dxva2,
qsv, d3d11va, opencl, vulkan. When I did the same thing against the customized ffmpeg
supplied with the CoApp, I got only this list: dxva2, d3d11va. I don't know if the
acceleration methods in my copy of standard ffmpeg are all the methods ffmpeg can be
built to support. But clearly there's a consideration here as well.
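
Using the two lists above, the difference between the builds can be written out explicitly. This is just a small sketch; `compare_builds` is a name made up for the example, and the two lists are copied from my -hwaccels output quoted above.

```python
STANDARD = ["cuda", "dxva2", "qsv", "d3d11va", "opencl", "vulkan"]
COAPP = ["dxva2", "d3d11va"]


def compare_builds(a, b):
    """Report which acceleration methods are unique to each build or shared."""
    set_a, set_b = set(a), set(b)
    return {
        "only_first": sorted(set_a - set_b),
        "only_second": sorted(set_b - set_a),
        "common": sorted(set_a & set_b),
    }


print(compare_builds(STANDARD, COAPP))
```

So cuda, opencl, qsv, and vulkan are missing from the CoApp build, and asking that build for `-hwaccel cuda` cannot select CUDA no matter what card is installed.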

Maro Huang

Sep 29, 2021, 9:18:22 PM9/29/21
to Video DownloadHelper Q&A
I'm glad we have reached a consensus. I'm sorry that my software interface is in Chinese.
In those screenshots I marked the model of the graphics card to show that even 2D graphics software can be accelerated by the GPU.

But I understand what you mean: the scope you defined is limited to FFMPEG.
Let me give my own view. This is only an overview, and the actual situation depends on the software design.
From receiving a stream to writing a file, there are two things that the GPU can handle:

1. Converting the binary data delivered by the stream into data that can be manipulated in memory, which is called "decoding".
2. Converting that data into the specified compression format, that is, the file format
(which usually corresponds to a specific extension), which is called "encoding".

Of course, in theory, if the stream's compression format is the same as that of the stored file,
it may be possible to skip the decoding and encoding steps altogether, but I am not sure whether this is really possible.
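
For what it's worth, FFMPEG does offer this as "stream copy": with `-c copy` the packets are written to the output container without being decoded or encoded. The sketch below contrasts the two kinds of command. The codec names `libx264` and `aac` are only example defaults and depend on how a given ffmpeg was built, and the function names are invented for the illustration.

```python
def remux_command(url, output, ffmpeg="ffmpeg"):
    """Stream copy: packets are written as received, with no decode or encode."""
    return [ffmpeg, "-hide_banner", "-i", url, "-c", "copy", output]


def transcode_command(url, output, video_codec="libx264", audio_codec="aac",
                      hwaccel=None, ffmpeg="ffmpeg"):
    """Decode the input and re-encode it with the given codecs."""
    cmd = [ffmpeg, "-hide_banner"]
    if hwaccel:
        # Input option: it selects hardware-assisted decoding only.
        cmd += ["-hwaccel", hwaccel]
    cmd += ["-i", url, "-c:v", video_codec, "-c:a", audio_codec, output]
    return cmd
```

In the stream copy case there is no decoding step, so there is nothing for -hwaccel to speed up. That may be worth keeping in mind when reading CPU and GPU graphs.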

Although I have developed a streaming system in the past, that work was limited to integrating FFMPEG, VLC,
Media Server, Helix, etc., and did not involve handling the lower levels.

As for comparing the CPU and the GPU: strictly speaking, they are not the same kind of hardware.
The GPU is designed for specific functions, unlike the CPU, which is a general-purpose processor.

Here is an example. Suppose a super sports car and a bus race each other, and both drivers push the accelerator to the floor.
If the contest is about speed, the sports car will win, no doubt.

However, if the contest is about the number of people carried,
then even with people sitting on its roof and hood, the sports car is still far behind the bus.

Although both are "cars", they are designed for different purposes.
In the same way, the CPU and GPU are both processors, but they are used for different purposes.

I think there is no doubt about the GPU's performance in specific tasks.
In addition to the examples I mentioned earlier, there are also applications such as blockchain.

In these specific tasks, the GPU performs much better than the CPU,
you could say, because the GPU is designed for these tasks.

Just as a sports car is designed to be faster and a bus is designed to carry more people.

As for the hardware acceleration methods in CoApp's built-in FFMPEG being different from those in the standard version,
I think this is CoApp's way of staying "lightweight".

The standard FFMPEG can reach 60~100MB, while CoApp's FFMPEG plus its library files is much smaller in total!

As you can imagine, the larger the program file, the more space it takes up once loaded into memory,
and the more likely it is to affect overall performance.
(Of course, this leaves aside obvious programming problems that make a program's own performance very poor.)

So I am in favor of a "custom FFMPEG" like the one in CoApp,
and since its libraries are shipped as separate files, I believe CoApp has made its FFMPEG as light as possible.

It's just that CoApp is more than FFMPEG, so the system load is the sum of CoApp itself, the browser subtasks, and FFMPEG.

So it is not a fair comparison when I say that, with CUDA enabled, downloading 10 streams at the same time is almost no burden on the computer.

It is reasonable to estimate that even with 100 streams downloading at the same time, the computer would still run smoothly,
but the network would be saturated.
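
As a back-of-the-envelope check of the network side, the limit is simply the link capacity divided by the bitrate of one stream. The numbers below (a 100 Mbit/s line and 5 Mbit/s per stream) are hypothetical, not measurements from my connection, and the sketch assumes each stream is fetched at roughly its playback bitrate.

```python
def total_demand_mbps(streams, stream_mbps):
    """Bandwidth needed when every stream runs at its full bitrate."""
    return streams * stream_mbps


def max_streams(link_mbps, stream_mbps):
    """Largest number of streams that fit in the link at full bitrate."""
    return int(link_mbps // stream_mbps)


LINK_MBPS = 100    # hypothetical line speed
STREAM_MBPS = 5    # hypothetical bitrate of one stream

print(total_demand_mbps(100, STREAM_MBPS), "Mbit/s needed for 100 streams")
print(max_streams(LINK_MBPS, STREAM_MBPS), "streams fit on the line")
```

Under these assumed numbers, 100 streams would need five times the capacity of the line, so the network becomes the bottleneck long before the CPU or GPU does.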

But this is the price of a "convenient interface". Take the Linux that I use every day:
it can run a website server with only the hardware required by Windows XP, without starting Windows.

But the annoying thing about Linux is that you have to remember a lot of commands.

So for me, Linux and Windows each do their own job, as do sports cars and buses. Of course, the same is true for CPUs and GPUs.

On Thursday, September 30, 2021 at 4:39:39 AM UTC+8, Wild Willy wrote: