thanks for your answers!
i made some progress.
just fyi: I got an ubuntu vm running for compiling ffmpeg (3.4 from
http://ffmpeg.org/releases/) and i used the android x86 config files (config.mak) and just ran make.
sadly i was not able to build the default ffmpeg for android x86 at first. (note: i could not find a way to point to libva and vaapi outside of the ffmpeg folder, so i just placed both inside for the build.)
so finally it built, but i ran into link errors when trying to use the ffmpeg libs in my app, which i could not fix.
so i gave up on this and tried a few other things.
first i changed SoftRenderer to show me every frame, since dropped frames are what cause the video to stutter and stop (NuPlayerRenderer sets Int32("render") to 0 if the frame is too late).
obviously this did not help, as the frames are just too late to be rendered.
second thought was to update ffmpeg to a newer version - and i found this repo:
this offers a lot, but no vaapi / libva support.
so i cloned it, built it with the newest ndk and changed my demo app to use this repo at ffmpeg 4.3.1.
the first run kinda shocked me, as the frames took around 300ms each for 2160p. i did some research and, as mentioned, a lot of the methods used are deprecated by now.
but the main point seemed to be that the default value of thread_count is 1.
so i changed it to 4, which got my demo app a frame every ~90ms instead of 300ms+.
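
to put those numbers in perspective, here is a tiny sketch of the per-frame arithmetic. the function names are made up for illustration, and the 30 fps playback rate used in the note below is only an assumption, since i did not state the frame rate of my test clips.

```cpp
// Hypothetical helpers, only for illustrating the arithmetic above.

// Decode time per frame (ms) -> frames per second the decoder can sustain.
double sustained_fps(double decode_ms) {
    return 1000.0 / decode_ms;
}

// Time available per frame (ms) at a given playback frame rate.
double frame_budget_ms(double playback_fps) {
    return 1000.0 / playback_fps;
}

// True if decoding alone already fits into the per-frame budget.
bool keeps_up(double decode_ms, double playback_fps) {
    return decode_ms <= frame_budget_ms(playback_fps);
}
```

with these, 300ms per frame works out to roughly 3.3 fps and 90ms to roughly 11.1 fps. a 30 fps clip (assumed) leaves about 33ms per frame, so keeps_up(90.0, 30.0) is still false: 4 threads helped a lot, but not enough for real-time playback.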
back in the default android x86 SoftFFmpegVideo.cpp i checked the threads as well.
to do this i logged ALOGI("mCtx thread_count %i", mCtx->thread_count); right before the context is opened with avcodec_open2.
and the log showed that thread_count was always 1!
so i just set mCtx->thread_count = 4; (the core count of my cpu) one line before opening the ctx.
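
hardcoding 4 only fits my machine, so here is a rough sketch of how the value could be derived from the core count instead. the helper names and the cap of 16 are my own assumptions and are not part of SoftFFmpegVideo.cpp or ffmpeg; only thread_count and avcodec_open2 come from the code described above.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not part of SoftFFmpegVideo.cpp): clamp the number of
// hardware threads into the range [1, max_threads].
int pick_decoder_threads(unsigned hw_threads, int max_threads) {
    // std::thread::hardware_concurrency() may return 0 when the value is
    // unknown; fall back to a single thread in that case.
    if (hw_threads == 0) {
        return 1;
    }
    const int capped = std::min(static_cast<int>(hw_threads), max_threads);
    return std::max(capped, 1);
}

// Convenience wrapper that queries the running machine.
// The default cap of 16 is an arbitrary assumption.
int detected_decoder_threads(int max_threads = 16) {
    return pick_decoder_threads(std::thread::hardware_concurrency(), max_threads);
}

// Intended use, one line before the context is opened:
//   mCtx->thread_count = detected_decoder_threads();
//   avcodec_open2(...);
```

on my 4-core cpu this should give the same value as the hardcoded 4. as far as i know, libavcodec also treats thread_count = 0 as "pick automatically", which might be the simpler option, but i have not tested that here.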
first try: this change did not help at all; with hw acceleration it is still only using about 60% cpu.
so i tried to disable hw acceleration by returning 0 in ffmpeg_hwaccel_init and to use soft decoding instead.
the result was surprising to me: the change worked and my cpu is now at 300%, but my 2160p video is running smoothly!
some of my test videos still have peaks of up to 60ms per frame, and rendering is sometimes at 30ms, but most of the time one frame just takes <1ms to be decoded!
even the 1080p video dropped from 10ms per frame to <1ms...
i'm not sure what's behind all this, but it seems that using hwaccel slows things down compared to using all available cpu cores.
maybe there is a way to use all threads and hardware acceleration?