I've been working on mplayer support for crystalhd over the past few weeks
and I posted my first reasonably functional patch to the mplayer list:
http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-December/067117.html
I've encountered a few problems, some of which I've been able to work around and
some of which are unavoidably troublesome:
1) PIB information returned on 70015 is dubious in many cases.
There is no timestamp returned even when I set valid values on ProcInput.
AspectRatio seems to be wrong - always 0 or 255.
Interlaced frame reporting is very strange at times. I've got multiple samples
where it returns 0x70 flags (implying UNKNOWN_SRC) and for a couple of them,
it seems that the source is a field pair but instead the TOP_FIELD flag is set
on successive frames, which completely messes up my handling.
On the 70012, these values seem to be much more believable.
2) It seems to take a really long time to get any frames out of the decoder.
I usually have to call ProcInput over 20 times before any frames come back
and simply waiting before ProcOutput doesn't help. Is this just how it is?
3) The 70012 doesn't seem to work very well with my code. It's always grumpy
and slow and can't keep up. But I've seen the gstreamer plugin work well before,
so I'm not sure what's up. Maybe a firmware regression?
I'd love to get a little more insight into these issues; maybe I'm just
doing something wrong.
--phil
Welcome to the crystalhd club :)
>I've encountered a few problems, some of which I've been able to work
>around and some of which are unavoidably troublesome:
>
>1) PIB information returned on 70015 is dubious in many cases.
>
>There is no timestamp returned even when I set valid values on ProcInput.
>AspectRatio seems to be wrong - always 0 or 255.
>Interlaced frame reporting is very strange at times. I've got multiple samples
>where it returns 0x70 flags (implying UNKNOWN_SRC) and for a couple of them,
>it seems that the source is a field pair but instead the TOP_FIELD flag is set
>on successive frames, which completely messes up my handling.
>
>On the 70012, these values seem to be much more believable.
I've not seen this at all for 70015 regarding the input pts. It
passes what I give it and will re-order correctly if needed.
Interlace handling has always baffled me and that's why xbmc code
returned to just line-doubling it.
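
For readers unfamiliar with the term, here is a minimal sketch of
line-doubling. It is not the XBMC code; the function name and the buffer
layout (one field stored as contiguous lines of `stride` bytes) are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Line-double one field into a full-height frame: each field line is
 * written twice. `field` holds field_h lines of `stride` bytes and
 * `frame` must have room for 2 * field_h lines of the same stride. */
void line_double(const uint8_t *field, uint8_t *frame,
                 size_t stride, size_t field_h)
{
    assert(field && frame);
    for (size_t y = 0; y < field_h; y++) {
        const uint8_t *src = field + y * stride;
        memcpy(frame + (2 * y) * stride, src, stride);      /* even line */
        memcpy(frame + (2 * y + 1) * stride, src, stride);  /* odd line  */
    }
}
```

This sidesteps the field-pairing question entirely, at the cost of half the
vertical resolution.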
>2) It seems to take a really long time to get any frames out of the decoder.
>I usually have to call ProcInput over 20 times before any frames come back
>and simply waiting before ProcOutput doesn't help. Is this just how it is?
Humm, don't see that with 70015 but 70012 will buffer a lot of
demuxer frames before returning a decoded picture. 70015 is much,
much faster at returning decoded frames. One thing I did add in XBMC
is to throttle the returning picture count or I would start consuming
mem like crazy.
>3) The 70012 doesn't seem to work very well with my code. It's always grumpy
>and slow and can't keep up. But I've seen the gstreamer plugin work well
>before, so I'm not sure what's up. Maybe a firmware regression?
70012 can be cranky, don't feed it enough and it does not like it,
feed it too fast, and it does not like it.
>I'd love to get a little more insight into these issues; maybe I'm just
>doing something wrong.
Looks like you are trying to run single threaded without enabling
single threaded mode. I'd switch the crystalhd to single threaded
mode. That's the only way it will give you reasonable results when
running single threaded.
Also there are a few areas where the 70012 has to be treated
differently than the 70015. Check the current XBMC svn trunk.
We are currently supporting both versions of the crystalhd lib (old
and new) but now that Dharma is out the door, I will be stripping the
old flavor and re-factoring to support some features in the new
library.
P.S. BC_MSUBTYPE_AVC1 is correct naming as the actual hardware only
accepts bytestream and it's only recently that bit/byte stream
conversion was moved internal to libcrystalhd.
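
As a rough illustration of that kind of conversion (not the libcrystalhd or
XBMC code), the sketch below rewrites length-prefixed NAL units into a
start-code byte stream. It assumes a fixed 4-byte big-endian length prefix;
real streams declare the prefix size in the avcC extradata, and the SPS/PPS
stored there would also need to be emitted, which this sketch skips. The
function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert NAL units with a 4-byte big-endian length prefix into a byte
 * stream with 00 00 00 01 start codes. Returns the number of bytes written
 * to `out`, or 0 if the input is empty or malformed or `out` is too small.
 * Both prefixes are 4 bytes, so output size equals input size. */
size_t avcc_to_annexb(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    static const uint8_t start_code[4] = {0, 0, 0, 1};
    size_t pos = 0, written = 0;

    assert(in && out);
    while (pos < in_len) {
        if (in_len - pos < 4)
            return 0;                       /* truncated length field */
        size_t nal_len = ((size_t)in[pos] << 24) | ((size_t)in[pos + 1] << 16) |
                         ((size_t)in[pos + 2] << 8) | (size_t)in[pos + 3];
        pos += 4;
        if (nal_len > in_len - pos)
            return 0;                       /* NAL runs past end of input */
        if (out_cap - written < 4 || out_cap - written - 4 < nal_len)
            return 0;                       /* output buffer too small */
        memcpy(out + written, start_code, 4);
        memcpy(out + written + 4, in + pos, nal_len);
        written += 4 + nal_len;
        pos += nal_len;
    }
    return written;
}
```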
Scott
Thanks :-)
>
> I've not seen this at all for 70015 regarding the input pts. It passes what
> I give it and will re-order correctly if needed.
It's true - I don't know why this wasn't working for me before. I can
verify that I get my pts back. Yay!
> Interlace handling has always baffled me and that's why xbmc code returned
> to just line-doubling it.
I think it's possible to make it work; after some of the clean up I did based on
Reimar's comments, I was able to get correctly timed interlaced mpeg2 to play -
until I started hitting input-buffer-full and then everything went south.
>
> Humm, don't see that with 70015 but 70012 will buffer a lot of demuxer
> frames before returning a decoded picture. 70015 is much, much faster at
> returning decoded frames. One thing I did add in XBMC is to throttle the
> returning picture count or I would start consuming mem like crazy.
Yes, the 70012 takes much longer than the 70015 but I'm definitely seeing
numbers around 20.
> 70012 can be cranky, don't feed it enough and it does not like it, feed it
> too fast, and it does not like it.
Joy. I'll have to play with this one more.
>
> Looks like you are trying to run single threaded without enabling single
> threaded mode. I'd switch the crystalhd to single threaded mode. That's the
> only way it will give you reasonable results when running single threaded.
I'd like to know what single threaded mode does beyond forcing scaling for
content over 720p. You definitely can't do 1:1 bluray with a single threaded app
but you can also do better than 720p.
> Also there are a few areas where the 70012 has to be treated differently
> than the 70015. Check the current XBMC svn trunk.
>
> We are currently supporting both versions of the crystalhd lib (old and new)
> but now that Dharma is out the door, I will be stripping the old flavor and
> re-factoring to support some features in the new library.
Yes, I've read over the code many times :-) I've noted the 12 vs 15
differences you have but none of them would lead to the crappy performance
I see; I'm going to try and make it work once I have the 15 going 100%.
> P.S. BC_MSUBTYPE_AVC1 is correct naming as the actual hardware only accepts
> bytestream and it's only recently that bit/byte stream conversion was
> moved internal to libcrystalhd.
I think being strongly opinionated is a job requirement for ffmpeg
devs :-) I saw the internal conversion in the lib when I last looked
through it - as well as seeing yours in xbmc for older libs.
Thanks for the notes. I'm going to keep poking away. Do you know what's
going on with enabling NV12 on the 70015? I saw some comments in the
archive for this list but that was months ago.
--phil
humm, both gimli (Edgar Hucek) and I would like to see what you have
done there.
>> Looks like you are trying to run single threaded without enabling single
>> threaded mode. I'd switch the crystalhd to single threaded mode. That's the
>> only way it will give you reasonable results when running single threaded.
>
>I'd like to know what single threaded mode does beyond forcing scaling for
>content over 720p. You definitely can't do 1:1 bluray with a single
>threaded app but you can also do better than 720p.
That's a bug. Some historical context: single thread mode was added
to support Adobe Flash, which is single threaded. The scaling (70015
only) was something to help out there, as pushing around 720p is much
easier than 1080p :) If you poke about in libcrystalhd, you will see
where the scaling is forced; take that out.
>Yes, I've read over the code many times :-) I've noted the 12 vs 15
>differences you have but none of them would lead to the crappy performance
>I see; I'm going to try and make it work once I have the 15 going 100%.
Haha, it took many, many months to get that code working even with
the gstreamer source code. The one thing to remember is the crystalhd
is pipelined (more so on 70012) where ffmpeg is not. Once the pipe is
filled, crystalhd can outrun the CPU and deliver frames very soon
after demux input with 1 or 2 frame delay. It's the initial pipe
filling that can be troublesome.
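
To make the pipe-filling behaviour concrete, here is a toy model, not real
decoder code. The `depth` value stands for however many inputs the hardware
wants queued before it starts returning pictures; it and every name below
are invented for illustration.

```c
#include <assert.h>

/* Toy model of a pipelined decoder: a picture only comes out once more
 * than `depth` inputs are in flight. */
struct toy_pipe {
    int depth;      /* inputs that must be queued before output starts */
    int in_flight;  /* inputs submitted but not yet returned as pictures */
};

void toy_submit(struct toy_pipe *p)
{
    assert(p);
    p->in_flight++;
}

/* Returns 1 and consumes one in-flight input if a picture is ready,
 * 0 if the pipe is still filling. */
int toy_try_output(struct toy_pipe *p)
{
    assert(p);
    if (p->in_flight <= p->depth)
        return 0;
    p->in_flight--;
    return 1;
}

/* Count how many submit calls it takes before the first picture appears. */
int toy_inputs_until_first_picture(struct toy_pipe *p)
{
    int calls = 0;
    do {
        toy_submit(p);
        calls++;
    } while (!toy_try_output(p));
    return calls;
}
```

With a depth of 20 the first picture needs 21 submits, after which every
further submit yields a picture straight away, which is the pattern
described above.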
>Thanks for the notes. I'm going to keep poking away. Do you know what's
>going on with enabling NV12 on the 70015? I saw some comments in the
>archive for this list but that was months ago.
The 70015 is native YUYV, so outputting NV12 will incur a CPU conversion
to NV12 (ouch); if you can, do it at the GPU level with a shader. 70012
is NV12 (and something else which escapes me right now) native. So
for best possible performance, you really need to handle both output
formats. Also XBMC is pretty much the only app that does the
DtsProcOutputNoCopy. We do that to try and minimize memcpys.
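
To show what that CPU conversion involves, here is a minimal sketch of a
packed YUYV to NV12 conversion. It is not taken from libcrystalhd or XBMC;
the function name and stride parameters are assumptions, and averaging the
chroma of two adjacent rows is just one simple way to go from 4:2:2 to
4:2:0 (it would blend fields on interlaced content).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert packed YUYV (Y0 U Y1 V per pixel pair) to NV12 (a Y plane plus
 * one interleaved UV plane at half height). width and height must be even;
 * all strides are in bytes. */
void yuyv_to_nv12(const uint8_t *src, size_t src_stride,
                  uint8_t *dst_y, size_t y_stride,
                  uint8_t *dst_uv, size_t uv_stride,
                  size_t width, size_t height)
{
    assert(src && dst_y && dst_uv);
    assert(width % 2 == 0 && height % 2 == 0);

    for (size_t y = 0; y < height; y += 2) {
        const uint8_t *row0 = src + y * src_stride;
        const uint8_t *row1 = row0 + src_stride;
        uint8_t *y0 = dst_y + y * y_stride;
        uint8_t *y1 = y0 + y_stride;
        uint8_t *uv = dst_uv + (y / 2) * uv_stride;

        for (size_t x = 0; x < width; x += 2) {
            const uint8_t *p0 = row0 + 2 * x;   /* Y0 U Y1 V, even row */
            const uint8_t *p1 = row1 + 2 * x;   /* Y0 U Y1 V, odd row  */
            y0[x]     = p0[0];
            y0[x + 1] = p0[2];
            y1[x]     = p1[0];
            y1[x + 1] = p1[2];
            uv[x]     = (uint8_t)((p0[1] + p1[1] + 1) / 2);  /* U, rounded */
            uv[x + 1] = (uint8_t)((p0[3] + p1[3] + 1) / 2);  /* V, rounded */
        }
    }
}
```

Every byte of the frame is touched once, which is why doing this per frame
on the CPU is noticeable on small machines.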
For more example code, check out Edgar's
http://sourceforge.net/apps/trac/archvdr , MythTV, and I think
there's some work going on with VLC. I highly doubt that this would
ever get into ffmpeg proper. The devs there are just too picky about
perfection and crystalhd while very good, it just does not work like
they would want it to work.
Also Naren (from Broadcom) subscribes to this list and should pop in
here sometime soon.
Scott
--
To post to this group, send email to
crystalhd-...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/crystalhd-development?hl=en
I'll post an updated diff shortly.
>
> That's a bug. Some historical context. Single thread mode was added to
> support Adobe Flash which is single threaded. The scaling (70015 only) was
> something to help out there as pushing around 720p is much easier than 1080p
> :) If you poke about in libcrystalhd, you will see where the scaling force
> is done, take that out.
I've turned on single threaded mode (do I need both the param flag and the
init flag?) and turned off scaling in the library but no immediate difference; I
still get input-full and no-frames-ready at the same places and at the same times.
>
> Haha, it took many, many months to get that code working even with the
> gstreamer source code. The one thing to remember is the crystalhd is
> pipelined (more so on 70012) where ffmpeg is not. Once the pipe is filled,
> crystalhd can outrun the CPU and deliver frames very soon after demux input
> with 1 or 2 frame delay. It's the initial pipe filling that can be
> troublesome.
Yeah, although I think I've beaten mplayer into submission with respect to
filling the pipe - I had to jack up the number of 'unseen' frames that it allows
but it does work.
>
> the 70015 is native YUYV, outputting NV12 will incur a cpu conversion to
> nv12, ouch if you can do it at the GPU level with a shader. 70012 is NV12
> (and something else which escapes me right now) native. So for best possible
> performance, you really need to handle both output formats. Also XBMC is
> pretty much the only app that does the DtsProcOutputNoCopy. We do that to
> try and minimize memcpys.
I'm using NoCopy as well. In fact, I could never get the internal copy to
actually avoid turning the frames into garbage, so I had no choice. Given
that mplayer and ffmpeg have pretty rich and performant colourspace
conversions, I should probably stick to NoCopy and let them sort the mess
out. I must admit I don't understand why the hardware is native YUYV - I
assume the hardware only supports 4:2:0 encoded content so it's pointlessly
going to YUYV and wasting bandwidth.
> For more example code, check out Edgar's
> http://sourceforge.net/apps/trac/archvdr , MythTV, and I think there's some
> work going on with VLC. I highly doubt that this would ever get into ffmpeg
> proper. The devs there are just too picky about perfection and crystalhd
> while very good, it just does not work like they would want it to work.
I'll take a look at archvdr and myth - I've already seen the VLC patch and
it looks more naive than mine. The initial response from the ffmpeg guys
(admittedly on the mplayer list) has not been terminal so I think there's
hope.
> Also Naren (from Broadcom) subscribes to this list and should pop in here
> sometime soon.
He has. Will reply to him too.
--phil
Unless I am very mistaken, this code isn't in Jarod's git repository;
I see code in the lib that says NV12 is not supported.
> I am on the road and will look at the code in the next couple of days.
>
> Interlaced support on linux was always a problem because of the lack of a standard way to signal it.
> But some of the issues being seen here are strange.
Yeah - it's definitely weird. It would be good to know what flags I
should expect to see for the various cases:
* mpeg2/PAFF
* MBAFF
* progressive fieldpairs
> In windows what we do is to connect to the renderer as both P and I and then to mark each individual sample as a frame or as interleaved field as appropriate.
It's going to really vary with application. ffmpeg/mplayer expects whole
frames, so I have to stitch the fieldpair together myself. If the hardware
can do that automatically, it would be great to expose it.
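
For reference, the stitching amounts to a weave. Below is a minimal sketch,
assuming each field arrives in its own buffer of `field_h` lines with a
common `stride`; how the hardware actually lays fields out may differ, and
the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Weave two fields into one frame: top-field lines land on even frame
 * lines and bottom-field lines on odd frame lines. `frame` must have room
 * for 2 * field_h lines of `stride` bytes. */
void weave_fields(const uint8_t *top, const uint8_t *bottom, uint8_t *frame,
                  size_t stride, size_t field_h)
{
    assert(top && bottom && frame);
    for (size_t y = 0; y < field_h; y++) {
        memcpy(frame + (2 * y) * stride, top + y * stride, stride);
        memcpy(frame + (2 * y + 1) * stride, bottom + y * stride, stride);
    }
}
```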
> With single threaded mode of operation it is important for the application to check if there is enough room in the HW before sending it data.
>
> As Scott mentioned once the pipeline fills up the hw can run many times faster than realtime. For example for 720p it can run as much as 6xRT. So the demux can run way faster than normal sw decoders. And if the input fills up you can't block on the demux since otherwise the output won't run.
>
> The only time you should see input buffer full is when the output is being pulled slower than it is being generated or if the bitrate of the input clip is very low and so the TX to the hw runs many times faster than RT.
Yeah. I've tried my best to do this - I check for space before submitting
input, but because mplayer doesn't support that error condition, I have to
loop inside the decode call, waiting for space to free up or I'll lose the
input. Right now it loops and hopes for the best; perhaps I can optimise it
by extracting a frame and then retrying the input - that might reduce the
amount of time I spend waiting.
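
The "extract a frame, then retry the input" idea can be sketched against a
toy decoder model. Nothing here is a libcrystalhd call: the struct, the
function names and the retry limit are all invented, and the model simply
assumes that pulling one picture frees one input slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the hardware: a bounded input queue that only drains
 * when a picture is pulled from the output side. */
struct toy_dec {
    int in_queued;  /* packets sitting in the input buffer */
    int in_cap;     /* input buffer capacity, in packets */
    int pulled;     /* pictures pulled so far */
};

bool toy_input_has_room(const struct toy_dec *d)
{
    return d->in_queued < d->in_cap;
}

void toy_send_input(struct toy_dec *d)
{
    assert(toy_input_has_room(d));
    d->in_queued++;
}

bool toy_pull_picture(struct toy_dec *d)
{
    if (d->in_queued == 0)
        return false;           /* nothing to decode yet */
    d->in_queued--;
    d->pulled++;
    return true;
}

/* Submit one packet without dropping it: while the input side is full,
 * pull a picture to make room, then retry. Gives up after max_tries so a
 * stalled decoder cannot hang the caller. */
bool submit_or_drain(struct toy_dec *d, int max_tries)
{
    assert(d);
    for (int i = 0; i < max_tries; i++) {
        if (toy_input_has_room(d)) {
            toy_send_input(d);
            return true;
        }
        if (!toy_pull_picture(d))
            return false;       /* full, yet nothing can be drained */
    }
    return false;
}
```

In a real player the pulled pictures would have to be held somewhere until
the caller asks for them, which this sketch leaves out.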
And to go back to interlaced content. Right now I extract a field,
then hold it and return to mplayer saying that I don't have a full
frame yet - then it comes back with more input and I get the second
field. Would it be safe to try and extract both fields back to back
and then return the frame immediately? That would probably help avoid
input-full conditions.
My test case is some DVD video where it plays great for a few seconds and
then goes input-full and then lags and av sync goes out the window.
Thanks!
--phil
Yes, using existing colorspace conversion code in FFmpeg is preferred.
> > For more example code, check out Edgar's
> > http://sourceforge.net/apps/trac/archvdr , MythTV, and I think there's some
> > work going on with VLC. I highly doubt that this would ever get into ffmpeg
> > proper. The devs there are just too picky about perfection and crystalhd
> > while very good, it just does not work like they would want it to work.
>
> I'll take a look at archvdr and myth - I've already seen the VLC patch
> and it looks
AFAIK MythTV support is more similar to XBMC
> more naive than mine. The initial response from the ffmpeg guys (admittedly on
> the mplayer list) has not been terminal so I think there's hope.
I haven't caught up on my mail over the holidays, but I'll help to get
crystalhd support into FFmpeg.
Janne
I guess that 5% is in addition to normal operation without colorspace
conversion? I'm seeing here 30-40% cpu utilization alone from copying
the data from the internal DMA buffers on an Atom N270.
Janne
Hi Janne,
Much appreciated! On the FFmpeg side, the main challenge is getting it
to work with ffmpeg and ffplay; these lack the buffered_pts concept that
MPlayer has so they can't cope with the pipelining properly.
On the crystalhd side, I still have some edge cases around pipeline
maintenance and interlacing. I've found a way to distinguish paff from
mbaff (checking the next frame's picture_number and if it's the same
as the current one, then we're dealing with a PAFF field - if not, it's
MBAFF). I'll push relevant changes to my github tree, hopefully tonight.
https://github.com/philipl/ffmpeg-crystalhd
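
The picture_number check boils down to something like the sketch below. It
is a simplified illustration, not the code in the tree above: the enum and
function names are made up, and it assumes the content is already known to
be interlaced and that the next output's picture number can be peeked.

```c
#include <assert.h>
#include <stddef.h>

enum interlace_kind { KIND_PAFF_FIELD, KIND_MBAFF_FRAME };

/* If the next output carries the same picture number as the current one,
 * the two outputs are fields of one picture (PAFF); otherwise the current
 * output is a whole MBAFF frame. */
enum interlace_kind classify_interlaced(unsigned cur_picture_number,
                                        unsigned next_picture_number)
{
    return cur_picture_number == next_picture_number ? KIND_PAFF_FIELD
                                                     : KIND_MBAFF_FRAME;
}

/* Count the frames a run of outputs produces once PAFF fields are paired.
 * A trailing output with no successor is counted as one frame. */
size_t count_output_frames(const unsigned *picture_numbers, size_t n)
{
    size_t frames = 0, i = 0;

    assert(picture_numbers || n == 0);
    while (i < n) {
        if (i + 1 < n &&
            classify_interlaced(picture_numbers[i],
                                picture_numbers[i + 1]) == KIND_PAFF_FIELD)
            i += 2;     /* two fields make one frame */
        else
            i += 1;     /* whole frame, or last output in the run */
        frames++;
    }
    return frames;
}
```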
--phil
Is this something that could be faked in the pipeline using fps to interpolate pts?
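
Roughly what that would look like, as a sketch only: the function name, the
tick-based time base and the rational frame rate are assumptions, and it
only makes sense for pictures in output order at a constant frame rate.

```c
#include <assert.h>
#include <stdint.h>

/* Extrapolate a pts for a picture that came back without one, from the
 * last known pts and the nominal frame rate. pts values are in ticks of
 * time_base_hz (e.g. 90000); the frame rate is fps_num / fps_den
 * (e.g. 30000 / 1001). The result is rounded to the nearest tick. */
int64_t interpolate_pts(int64_t last_pts, int64_t frames_since,
                        int64_t time_base_hz, int64_t fps_num, int64_t fps_den)
{
    assert(time_base_hz > 0 && fps_num > 0 && fps_den > 0);
    assert(frames_since >= 0);
    /* One frame lasts time_base_hz * fps_den / fps_num ticks. */
    return last_pts +
           (frames_since * time_base_hz * fps_den + fps_num / 2) / fps_num;
}
```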
Naren Sankar (+1 408 218 6327)
Architect/PLM
Broadcom Corp.
-----Original Message-----
From: crystalhd-...@googlegroups.com [mailto:crystalhd-...@googlegroups.com] On Behalf Of Philip Langdale
Sent: Monday, January 03, 2011 9:08 AM
To: crystalhd-...@googlegroups.com
Subject: Re: [crystalhd-development] Issues encountered while writing mplayer support
Hi Janne,
https://github.com/philipl/ffmpeg-crystalhd
--phil
--
They have two mechanisms. One is proper pts from the container that are
passed through the decoder. I can support this and it kind-of works. The
other is to fall back to dts, and this assumes that the frame submitted for
decode is the one returned. This is obviously not true for the crystalhd.
Even the pts mechanism doesn't really work because of the same assumption -
the ff programs freak out and go through a painful sync process for a few
seconds before it settles down and works.
So, we really need to introduce pts buffering like Mplayer has.
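
A minimal sketch of what such pts buffering could look like, loosely
modelled on the idea rather than on MPlayer's actual code. It assumes
pictures leave the decoder in presentation order, so the smallest pending
pts always belongs to the next picture out; the names and the queue size
are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTS_QUEUE_MAX 64

/* Small sorted buffer of pts values for packets still inside the decoder. */
struct pts_queue {
    int64_t pts[PTS_QUEUE_MAX];  /* kept in ascending order */
    size_t count;
};

/* Record the pts of a packet handed to the decoder.
 * Returns 0 on success, -1 if the queue is full. */
int pts_queue_push(struct pts_queue *q, int64_t pts)
{
    assert(q);
    if (q->count == PTS_QUEUE_MAX)
        return -1;
    size_t i = q->count;
    while (i > 0 && q->pts[i - 1] > pts) {  /* shift larger entries up */
        q->pts[i] = q->pts[i - 1];
        i--;
    }
    q->pts[i] = pts;
    q->count++;
    return 0;
}

/* Take the smallest pending pts for a picture leaving the decoder.
 * Returns 0 on success, -1 if nothing is pending. */
int pts_queue_pop(struct pts_queue *q, int64_t *pts_out)
{
    assert(q && pts_out);
    if (q->count == 0)
        return -1;
    *pts_out = q->pts[0];
    for (size_t i = 1; i < q->count; i++)
        q->pts[i - 1] = q->pts[i];
    q->count--;
    return 0;
}
```

Pushing on every input and popping on every output decouples "the packet I
just submitted" from "the picture I just got back", which is the assumption
that breaks with a pipelined decoder.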
--phil