Improved MPlayer: At the performance level of Omapfbplay


Gregoire Gentil

Nov 19, 2008, 6:01:54 PM
to beagl...@googlegroups.com, Måns Rullgård, Siarhei Siamashka, Koen Kooi
Hello,

Here are four files that bring the performance of mplayer up to the level
of omapfbplay on the beagleboard platform. It's a first pass; the code
still needs work on bugs and on the known issues. Feedback is welcome,

Grégoire


/*

Copyright (C) 2008 Gregoire Gentil <greg...@gentil.com>
This file adds an optimized vo output to mplayer for the OMAP platform.
This is a first pass and an attempt to help improve
media playback on the OMAP platform. The usual disclaimer applies:
this code is provided without any warranty.
Many bugs and issues still exist. Feedback is welcome.

This output uses the yuv420_to_yuv422 conversion from Mans Rullgard, and
is heavily inspired by the work of Siarhei Siamashka.
I would like to thank both of them here; without them this code
would certainly not exist.

Two options of the output are available:
fb_overlay_only (disabled by default): only the overlay is drawn. X11
stuff is ignored.
dbl_buffer (disabled by default): add double buffering. Some tearsync
flags are probably missing in the code.

Syntax is the following:
mplayer -ao alsa -vo omapfb test.avi
mplayer -nosound -vo omapfb:fb_overlay_only:dbl_buffer test.avi

You need to have two planes on your system. On beagleboard, it means
something like: video=omapfb:vram:2M,vram:4M

Known issues:
1) A green line or some vertical lines (if mplayer decides to draw bands
instead of frames) may appear.
It's an interpolation bug in the color conversion that needs to be fixed.

2) The color conversion only accepts widths and heights that are
multiples of 16 pixels.

3) Scaling down is disabled, as the scaling-down kernel patch for the
OMAP3 platform doesn't seem to work yet.

4) As the video is written to the upper plane, overlapping of window is
not working.
Currently, I disable the video in such case. Any suggestion of roadmap
would be appreciated.
DirectFB could be used though I have not understood if it's a
replacement of fbdev or an abstraction layer.

* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/fb.h>

#include "config.h"
#include "video_out.h"
#include "video_out_internal.h"
#include "fastmemcpy.h"
#include "sub.h"
#include "mp_msg.h"

#include "omapfb.h"

#include "libswscale/swscale.h"
#include "libmpcodecs/vf_scale.h"
#include "libavcodec/avcodec.h"

#include "aspect.h"

#include "subopt-helper.h"

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <X11/Xatom.h>
//#define XA_CARDINAL ((Atom) 6)
#include "wskeys.h"

static vo_info_t info = {
"omapfb video driver",
"omapfb",
"",
""
};

LIBVO_EXTERN(omapfb)

static int fb_overlay_only = 0; // if set, we need only the framebuffer overlay and no x11 code
static int dbl_buffer = 0;
static int fullscreen_flag = 0;
static int plane_ready = 0;

extern void yuv420_to_yuv422(uint8_t *yuv, uint8_t *y, uint8_t *u,
uint8_t *v, int w, int h, int yw, int cw, int dw);
static struct fb_var_screeninfo sinfo_p0;
static struct fb_var_screeninfo sinfo;
static struct omapfb_mem_info minfo;
static struct omapfb_plane_info pinfo;
static struct {
unsigned x;
unsigned y;
uint8_t *buf;
} fb_pages[2];
static int dev_fd = -1;
static int fb_page_flip = 0;
static int page = 0;
static void omapfb_update(int x, int y, int out_w, int out_h, int show);

extern void mplayer_put_key( int code );
#include "osdep/keycodes.h"

static Display *display = NULL; // pointer to X Display structure.
static int screen_num; // number of screen to place the window on.
static Window win = 0;
static Window parent = 0; // pointer to the newly created window.

/* This is used to intercept window closing requests. */
static Atom wm_delete_window;

/**
* Function to get the offset to be used when in windowed mode
* or when using -wid option
*/
static void x11_get_window_abs_position(Display *display, Window window,
int *wx, int *wy, int *ww,
int *wh)
{
Window root, parent;
Window *child;
unsigned int n_children;
XWindowAttributes attribs;

/* Get window attributes */
XGetWindowAttributes(display, window, &attribs);

/* Get relative position of given window */
*wx = attribs.x;
*wy = attribs.y;
if (ww)
*ww = attribs.width;
if (wh)
*wh = attribs.height;

/* Query window tree information */
XQueryTree(display, window, &root, &parent, &child, &n_children);
if (parent)
{
int x, y;
/* If we have a parent we must go there and discover his
position*/
x11_get_window_abs_position(display, parent, &x, &y, NULL, NULL);
*wx += x;
*wy += y;
}

/* If we had children, free it */
if(n_children)
XFree(child);
}


/**
* Function that controls fullscreen state for x11 window
* action = 1 (set fullscreen)
* action = 0 (set windowed mode)
*/
static void x11_set_fullscreen_state(Display *display, Window window,
int action)
{
XEvent xev;

/* init X event structure for _NET_WM_FULLSCREEN client msg */
xev.xclient.type = ClientMessage;
xev.xclient.serial = 0;
xev.xclient.send_event = True;
xev.xclient.message_type = XInternAtom(display, "_NET_WM_STATE",
False);
xev.xclient.window = window;
xev.xclient.format = 32;
xev.xclient.data.l[0] = action;
xev.xclient.data.l[1] = XInternAtom(display,
"_NET_WM_STATE_FULLSCREEN", False);
xev.xclient.data.l[2] = 0;
xev.xclient.data.l[3] = 0;
xev.xclient.data.l[4] = 0;

/* finally send that damn thing */
if (!XSendEvent(display, DefaultRootWindow(display), False,
SubstructureRedirectMask | SubstructureNotifyMask, &xev)) {
mp_msg(MSGT_VO, MSGL_ERR, "[omapfb] failure in x11_set_fullscreen_state\n");
exit(1);
}
XSync(display, False);
}


XClassHint classhint = {"mediaplayer-ui", "mediaplayer-ui"};


/**
* Initialize x11 window (it is used to allocate some screen area for
framebuffer overlay)
*/
static void x11_init()
{
display = XOpenDisplay(getenv("DISPLAY"));
if (display == NULL) {
mp_msg(MSGT_VO, MSGL_ERR, "[omapfb] failure in x11_init, can't open display\n");
exit(1);
}

screen_num = DefaultScreen(display);

if (WinID > 0)
{
Window root;
Window *child;
unsigned int n_children;

win = WinID;

/* Query window tree information */
XQueryTree(display, win, &root, &parent, &child, &n_children);
if (n_children)
XFree(child);

XUnmapWindow(display, win);
if (parent)
XSelectInput(display, parent, StructureNotifyMask);
XSelectInput(display, win, VisibilityChangeMask);
XMapWindow(display, win);

wm_delete_window = XInternAtom(display, "WM_DELETE_WINDOW",
False);
XSetWMProtocols(display, win, &wm_delete_window, 1);
} else {
win = XCreateSimpleWindow(display, RootWindow(display,
screen_num),
sinfo_p0.xres / 2 - sinfo.xres / 2,
sinfo_p0.yres / 2 - sinfo.yres / 2, sinfo.xres, sinfo.yres, 0,
WhitePixel(display, screen_num),
BlackPixel(display, screen_num));

XSetClassHint(display, win, &classhint);

XStoreName(display, win, "MPlayer");
XMapWindow(display, win);

/* Set WM_DELETE_WINDOW atom in WM_PROTOCOLS property (to get
window_delete requests). */
wm_delete_window = XInternAtom(display, "WM_DELETE_WINDOW",
False);
XSetWMProtocols(display, win, &wm_delete_window, 1);
XSelectInput(display, win, StructureNotifyMask |
VisibilityChangeMask | KeyPressMask);
}
}


void print_properties(Window win2)
{
Atom *p;
int num, j;
char *aname;
Atom type;
int format;
unsigned long nitems, bytes_after;
unsigned char *ret = NULL;

p = XListProperties(display, win2, &num);
printf("found %d properties for window %d\n", num, (int)win2);
for (j = 0; j < num; j++) {
aname = XGetAtomName(display, p[j]);
if (aname) {
if(Success == XGetWindowProperty(display, win2, XInternAtom(display,
aname, False),
0L, ~0L, False, XA_STRING,
&type, &format, &nitems,
&bytes_after, &ret))
{
/* printf("format = %d, nitems = %d, bytes_after = %d\n", format,
nitems, bytes_after);*/
printf("%s = %s\n", aname, ret);
XFree(ret);
}
XFree(aname);
} else printf("NULL\n");
}
XFree(p);
}


static int x11_check_events()
{
if (!display) {
mp_msg(MSGT_VO, MSGL_ERR, "[omapfb] 'x11_check_events' called out of sequence\n");
exit(1);
}

int ret = 0;
XEvent Event;
while (XPending(display)) {
XNextEvent(display, &Event);
if (Event.type == UnmapNotify)
omapfb_update(0, 0, 0, 0, 0);
else if (Event.type == ConfigureNotify)
omapfb_update(0, 0, 0, 0, 1);
else if (Event.type == VisibilityNotify) {
if (Event.xvisibility.state == VisibilityUnobscured)
omapfb_update(0, 0, 0, 0, 1);
else
omapfb_update(0, 0, 0, 0, 0);
} else if (Event.type == KeyPress) {
int key;
KeySym keySym = XKeycodeToKeysym(display,
Event.xkey.keycode, 0);
key = ((keySym & 0xff00) != 0 ? ((keySym & 0x00ff) + 256) :
(keySym));
ret |= VO_EVENT_KEYPRESS;
vo_x11_putkey(key);
} else if (Event.type == ClientMessage) {
if ((Atom)Event.xclient.data.l[0] == wm_delete_window) {
mplayer_put_key(KEY_ESC);
}
}
}
return ret;
}


static void x11_uninit()
{
if (display) {
XCloseDisplay(display);
display = NULL;
}
}


/**
* Initialize framebuffer
*/
static int preinit(const char *arg)
{

opt_t subopts[] = {
{"fb_overlay_only", OPT_ARG_BOOL, &fb_overlay_only, NULL},
{"dbl_buffer", OPT_ARG_BOOL, &dbl_buffer, NULL},
{NULL}
};

if (subopt_parse(arg, subopts) != 0) {
mp_msg(MSGT_VO, MSGL_FATAL, "[omapfb] unknown suboptions: %s\n",
arg);
return -1;
}

dev_fd = open("/dev/fb0", O_RDWR);

if (dev_fd == -1) {
mp_msg(MSGT_VO, MSGL_FATAL, "[omapfb] Error /dev/fb0\n");
return -1;
}

ioctl(dev_fd, FBIOGET_VSCREENINFO, &sinfo_p0);
close(dev_fd);

dev_fd = open("/dev/fb1", O_RDWR);

if (dev_fd == -1) {
mp_msg(MSGT_VO, MSGL_FATAL, "[omapfb] Error /dev/fb1\n");
return -1;
}

ioctl(dev_fd, FBIOGET_VSCREENINFO, &sinfo);
ioctl(dev_fd, OMAPFB_QUERY_PLANE, &pinfo);
ioctl(dev_fd, OMAPFB_QUERY_MEM, &minfo);

if (!fb_overlay_only)
x11_init();

return 0;
}


static void omapfb_update(int x, int y, int out_w, int out_h, int show)
{
if (!fb_overlay_only)
x11_get_window_abs_position(display, win, &x, &y, &out_w,
&out_h);

if ((x < 0) || (y < 0)

// If you develop the right scaling-down patch in kernel, uncomment the line below and comment the next one
// || (out_w < sinfo.xres / 4) || (out_h < sinfo.yres / 4)
|| (out_w < sinfo.xres) || (out_h < sinfo.yres)

// If you don't have the right scaling-up patch in kernel, comment the line below and uncomment the next one
/* Kernel patch to enable scaling up on the omap3
--- a/drivers/video/omap/dispc.c 2008-11-01 20:08:04.000000000 -0700
+++ b/drivers/video/omap/dispc.c 2008-11-01 20:09:02.000000000 -0700
@@ -523,9 +523,6 @@
if ((unsigned)plane > OMAPFB_PLANE_NUM)
return -ENODEV;

- if (out_width != orig_width || out_height != orig_height)
- return -EINVAL;
-
enable_lcd_clocks(1);
if (orig_width < out_width) {
/*
*/
|| (out_w > sinfo.xres * 8) || (out_h > sinfo.yres * 8)
// || (out_w > sinfo.xres) || (out_h > sinfo.yres)

|| (x + out_w > sinfo_p0.xres) || (y + out_h > sinfo_p0.yres)) {
pinfo.enabled = 0;
pinfo.pos_x = 0;
pinfo.pos_y = 0;
ioctl(dev_fd, OMAPFB_SETUP_PLANE, &pinfo);
return;
}

pinfo.enabled = show;
pinfo.pos_x = x;
pinfo.pos_y = y;
pinfo.out_width = out_w;
pinfo.out_height = out_h;
ioctl(dev_fd, OMAPFB_SETUP_PLANE, &pinfo);
}


static int config(uint32_t width, uint32_t height, uint32_t d_width,
uint32_t d_height, uint32_t flags, char *title,
uint32_t format)
{
fullscreen_flag = flags & VOFLAG_FULLSCREEN;

uint8_t *fbmem;
int i;

fbmem = mmap(NULL, minfo.size, PROT_READ|PROT_WRITE, MAP_SHARED,
dev_fd, 0);
if (fbmem == MAP_FAILED) {
mp_msg(MSGT_VO, MSGL_FATAL, "[omapfb] Error mmap\n");
return -1;
}

for (i = 0; i < minfo.size / 4; i++)
((uint32_t*)fbmem)[i] = 0x80008000;

sinfo.xres = FFMIN(sinfo_p0.xres, width) & ~15;
sinfo.yres = FFMIN(sinfo_p0.yres, height) & ~15;
sinfo.xoffset = 0;
sinfo.yoffset = 0;
sinfo.nonstd = OMAPFB_COLOR_YUY422;

fb_pages[0].x = 0;
fb_pages[0].y = 0;
fb_pages[0].buf = fbmem;

if (dbl_buffer && minfo.size >= sinfo.xres * sinfo.yres * 2 * 2) { // room for two pages at 2 bytes per pixel
sinfo.xres_virtual = sinfo.xres;
sinfo.yres_virtual = sinfo.yres * 2;
fb_pages[1].x = 0;
fb_pages[1].y = sinfo.yres;
fb_pages[1].buf = fbmem + sinfo.xres * sinfo.yres * 2;
fb_page_flip = 1;
} else {
sinfo.xres_virtual = sinfo.xres;
sinfo.yres_virtual = sinfo.yres;
fb_page_flip = 0;
}

ioctl(dev_fd, FBIOPUT_VSCREENINFO, &sinfo);

if (WinID <= 0) {
if (fullscreen_flag) {
if (!fb_overlay_only)
x11_set_fullscreen_state(display, win, 1);
omapfb_update(0, 0, sinfo_p0.xres, sinfo_p0.yres, 1);
} else {
if (!fb_overlay_only)
x11_set_fullscreen_state(display, win, 0);
omapfb_update(sinfo_p0.xres / 2 - sinfo.xres / 2,
sinfo_p0.yres / 2 - sinfo.yres / 2, sinfo.xres, sinfo.yres, 1);
}
}

plane_ready = 1;
return 0;
}


static void draw_alpha(int x0, int y0, int w, int h, unsigned char *src,
unsigned char *srca, int stride)
{
vo_draw_alpha_yuy2(w, h, src, srca, stride, fb_pages[page].buf +
sinfo.xres * y0 * 2 + x0 * 2, sinfo.xres * 2); // destination stride is in bytes
}


static void draw_osd(void)
{
vo_draw_text(sinfo.xres, sinfo.yres, draw_alpha);
}


static int draw_frame(uint8_t *src[])
{
return 1;
}


static int draw_slice(uint8_t *src[], int stride[], int w, int h, int x,
int y)
{
if (x!=0)
return 0;

if (!plane_ready)
return 0;

ioctl(dev_fd, OMAPFB_SYNC_GFX);

yuv420_to_yuv422(fb_pages[page].buf + 2 * sinfo.xres * y, src[0],
src[1], src[2], w & ~15, h, stride[0], stride[1], 2 *
sinfo.xres_virtual);

return 0;
}


static void flip_page(void)
{
if (fb_page_flip) {
sinfo.xoffset = fb_pages[page].x;
sinfo.yoffset = fb_pages[page].y;
ioctl(dev_fd, FBIOPAN_DISPLAY, &sinfo);
page ^= fb_page_flip;
}
}


static int query_format(uint32_t format)
{
// For simplicity pretend that we can only do YV12, support for
// other formats can be added quite easily if/when needed
if (format != IMGFMT_YV12)
return 0;

return VFCAP_CSP_SUPPORTED | VFCAP_CSP_SUPPORTED_BY_HW | VFCAP_OSD |
VFCAP_SWSCALE | VFCAP_ACCEPT_STRIDE;
}


/**
* Uninitialize framebuffer
*/
static void uninit()
{
pinfo.enabled = 0;
ioctl(dev_fd, OMAPFB_SETUP_PLANE, &pinfo);
close(dev_fd);

if (!fb_overlay_only) x11_uninit();
}


static int control(uint32_t request, void *data, ...)
{
switch (request) {
case VOCTRL_QUERY_FORMAT:
return query_format(*((uint32_t*)data));
case VOCTRL_FULLSCREEN: {
if (WinID > 0) return VO_FALSE;
if (fullscreen_flag) {
if (!fb_overlay_only)
x11_set_fullscreen_state(display, win, 0);
fullscreen_flag = 0;
omapfb_update(sinfo_p0.xres / 2 - sinfo.xres / 2,
sinfo_p0.yres / 2 - sinfo.yres / 2, sinfo.xres, sinfo.yres, 1);
} else {
if (!fb_overlay_only)
x11_set_fullscreen_state(display, win, 1);
fullscreen_flag = 1;
omapfb_update(0, 0, sinfo_p0.xres, sinfo_p0.yres, 1);
}
return VO_TRUE;
}
}
return VO_NOTIMPL;
}


static void check_events(void)
{
if (!fb_overlay_only)
x11_check_events();
}
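
The placement test in omapfb_update() can be pulled out into a pure function, which makes the limits easier to read and to test off-target. The following is only a sketch under stated assumptions: the names plane_limits, plane_fits and max_upscale are hypothetical and do not appear in the posted driver, and the conditions simply mirror the ones in the post (no scaling down, scaling up to a fixed factor, window fully inside the primary screen).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the values the posted code reads from
 * sinfo (overlay source size) and sinfo_p0 (primary screen size). */
struct plane_limits {
    int src_w, src_h;       /* overlay source size, i.e. sinfo.xres / sinfo.yres */
    int screen_w, screen_h; /* primary plane size, i.e. sinfo_p0.xres / sinfo_p0.yres */
    int max_upscale;        /* 8 in the post; 1 without the scaling-up kernel patch */
};

/* Returns 1 if the overlay may be shown at (x, y) with output size
 * out_w x out_h, and 0 if it has to be disabled. */
int plane_fits(const struct plane_limits *l, int x, int y, int out_w, int out_h)
{
    assert(l != NULL);

    if (x < 0 || y < 0)
        return 0; /* window starts off-screen */
    if (out_w < l->src_w || out_h < l->src_h)
        return 0; /* scaling down is disabled */
    if (out_w > l->src_w * l->max_upscale || out_h > l->src_h * l->max_upscale)
        return 0; /* beyond the scaling-up limit */
    if (x + out_w > l->screen_w || y + out_h > l->screen_h)
        return 0; /* window extends past the primary screen */
    return 1;
}
```

With a 320x240 source on a 1280x720 screen and max_upscale set to 8, a fullscreen request fits, while a 160x120 window is rejected because it would need scaling down.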


--- a/libvo/video_out.c 2008-11-07 11:59:48.000000000 -0800
+++ b/libvo/video_out.c 2008-11-07 12:01:52.000000000 -0800
@@ -86,6 +86,7 @@
extern vo_functions_t video_out_bl;
extern vo_functions_t video_out_fbdev;
extern vo_functions_t video_out_fbdev2;
+extern vo_functions_t video_out_omapfb;
extern vo_functions_t video_out_svga;
extern vo_functions_t video_out_png;
extern vo_functions_t video_out_ggi;
@@ -172,6 +173,7 @@
#ifdef CONFIG_FBDEV
&video_out_fbdev,
&video_out_fbdev2,
+ &video_out_omapfb,
#endif
#ifdef CONFIG_SVGALIB
&video_out_svga,
--- a/configure 2008-11-07 12:00:32.000000000 -0800
+++ b/configure 2008-11-07 12:13:31.000000000 -0800
@@ -4558,7 +4558,7 @@
fi
if test "$_fbdev" = yes ; then
_def_fbdev='#define CONFIG_FBDEV 1'
- _vosrc="$_vosrc vo_fbdev.c vo_fbdev2.c"
+ _vosrc="$_vosrc vo_fbdev.c vo_fbdev2.c vo_omapfb.c yuv.S"
_vomodules="fbdev $_vomodules"
else
_def_fbdev='#undef CONFIG_FBDEV'



/*
Copyright (C) 2008 Mans Rullgard

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use, copy,
modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
*/

.fpu neon
.text

@ yuv420_to_yuv422(uint8_t *yuv, uint8_t *y, uint8_t *u, uint8_t *v,
@ int w, int h, int yw, int cw, int dw)

#define yuv r0
#define y r1
#define u r2
#define v r3
#define w r4
#define h r5
#define yw r6
#define cw r7
#define dw r8

#define tyuv r9
#define ty r10
#define tu r11
#define tv r12
#define i lr

.global yuv420_to_yuv422
.func yuv420_to_yuv422
yuv420_to_yuv422:
push {r4-r11,lr}
add r4, sp, #36
ldm r4, {r4-r8}
dmb
1:
mov tu, u
mov tv, v
vld1.64 {d2}, [u,:64], cw @ u0
vld1.64 {d3}, [v,:64], cw @ v0
mov tyuv, yuv
mov ty, y
vzip.8 d2, d3 @ u0v0
mov i, #16
2:
pld [y, #64]
vld1.64 {d0, d1}, [y,:128], yw @ y0
pld [u, #64]
subs i, i, #4
vld1.64 {d6}, [u,:64], cw @ u2
pld [y, #64]
vld1.64 {d4, d5}, [y,:128], yw @ y1
pld [v, #64]
vld1.64 {d7}, [v,:64], cw @ v2
pld [y, #64]
vld1.64 {d16,d17}, [y,:128], yw @ y2
vzip.8 d6, d7 @ u2v2
pld [u, #64]
vld1.64 {d22}, [u,:64], cw @ u4
pld [v, #64]
vld1.64 {d23}, [v,:64], cw @ v4
pld [y, #64]
vld1.64 {d20,d21}, [y,:128], yw @ y3
vmov q9, q3 @ u2v2
vzip.8 d22, d23 @ u4v4
vrhadd.u8 q3, q1, q3 @ u1v1
vzip.8 q0, q1 @ y0u0y0v0
vmov q12, q11 @ u4v4
vzip.8 q2, q3 @ y1u1y1v1
vrhadd.u8 q11, q9, q11 @ u3v3
vst1.64 {d0-d3}, [yuv,:128], dw @ y0u0y0v0
vzip.8 q8, q9 @ y2u2y2v2
vst1.64 {d4-d7}, [yuv,:128], dw @ y1u1y1v1
vzip.8 q10, q11 @ y3u3y3v3
vst1.64 {d16-d19}, [yuv,:128], dw @ y2u2y2v2
vmov q1, q12
vst1.64 {d20-d23}, [yuv,:128], dw @ y3u3y3v3
bgt 2b

subs w, w, #16
add yuv, tyuv, #32
add y, ty, #16
add u, tu, #8
add v, tv, #8
bgt 1b

ldr w, [sp, #36]
subs h, h, #16
add yuv, yuv, dw, lsl #4
sub yuv, yuv, w, lsl #1
add y, y, yw, lsl #4
sub y, y, w
add u, u, cw, lsl #3
sub u, u, w, asr #1
add v, v, cw, lsl #3
sub v, v, w, asr #1
bgt 1b

pop {r4-r11,pc}
.endfunc
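
For readers who do not read NEON assembly, here is a scalar C sketch of the same kind of conversion: planar YUV 4:2:0 in, packed YUYV (4:2:2) out. It is not a translation of the routine above. The name yuv420_to_yuyv_ref is hypothetical, the argument order is borrowed from the yuv420_to_yuv422() prototype, and it makes one deliberate simplification: each chroma row is repeated for two output rows, whereas the NEON code appears to average neighbouring chroma rows (vrhadd) for the odd output rows.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch: planar YUV 4:2:0 to packed YUYV.
 * yw, cw and dw are the luma, chroma and destination strides in bytes.
 * w and h must be even; the NEON routine additionally needs multiples of 16. */
void yuv420_to_yuyv_ref(uint8_t *yuv, const uint8_t *y, const uint8_t *u,
                        const uint8_t *v, int w, int h,
                        int yw, int cw, int dw)
{
    assert(w % 2 == 0 && h % 2 == 0);

    for (int row = 0; row < h; row++) {
        const uint8_t *yrow = y + row * yw;
        const uint8_t *urow = u + (row / 2) * cw; /* one chroma row per two luma rows */
        const uint8_t *vrow = v + (row / 2) * cw;
        uint8_t *out = yuv + row * dw;

        for (int col = 0; col < w; col += 2) {
            /* Two pixels share one U and one V sample: Y0 U Y1 V. */
            out[2 * col + 0] = yrow[col];
            out[2 * col + 1] = urow[col / 2];
            out[2 * col + 2] = yrow[col + 1];
            out[2 * col + 3] = vrow[col / 2];
        }
    }
}
```

A reference like this is mainly useful for checking the output of an optimized version on small test images; it is not meant to be fast.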


--- /OE/openembedded/packages/mplayer/mplayer_svn.bb 2008-11-16 02:15:45.000000000 -0800
+++ /OE/openembedded/packages/mplayer/mplayer_svn.bb 2008-11-16 14:32:19.000000000 -0800
@@ -2,7 +2,7 @@
SECTION = "multimedia"
PRIORITY = "optional"
HOMEPAGE = "http://www.mplayerhq.hu/"
-DEPENDS = "virtual/libsdl ffmpeg xsp zlib libpng jpeg freetype fontconfig alsa-lib lzo ncurses libxv virtual/libx11 \
+DEPENDS = "virtual/libsdl ffmpeg xsp zlib libpng jpeg liba52 freetype fontconfig alsa-lib lzo ncurses lame libxv virtual/libx11 linux-omap2 \
${@base_conditional('ENTERPRISE_DISTRO', '1', '', 'libmad liba52 lame', d)}"

RDEPENDS = "mplayer-common"
@@ -24,6 +24,9 @@
file://mru-neon-vector-fmul.diff;patch=1 \
file://configh \
file://configmak \
+ file://omapfb.patch;patch=1 \
+ file://vo_omapfb.c \
+ file://yuv.S \
"

# This is required for the collie machine only as all stacks in that
@@ -176,8 +179,8 @@
EXTRA_OECONF_append_arm = " --disable-decoder=vorbis_decoder \
--disable-encoder=vorbis_encoder"

-EXTRA_OECONF_append_armv6 = " --enable-armv6 "
-EXTRA_OECONF_append_armv7a = "--enable-armv6 "
+EXTRA_OECONF_append_armv6 = " --enable-armv6"
+EXTRA_OECONF_append_armv7a = " --enable-armv6"


#build with support for the iwmmxt instruction and pxa270fb overlay support (pxa270 and up)
@@ -201,6 +204,10 @@
sed -i 's|/usr/\S*include[\w/]*||g' ${S}/configure
sed -i 's|/usr/\S*lib[\w/]*||g' ${S}/configure

+ cp ${WORKDIR}/yuv.S ${S}/libvo
+ cp ${WORKDIR}/vo_omapfb.c ${S}/libvo
+ cp ${STAGING_DIR}/beagleboard-angstrom-linux-gnueabi/kernel/arch/arm/plat-omap/include/mach/omapfb.h ${S}/libvo/omapfb.h
+
./configure ${EXTRA_OECONF}

cat ${WORKDIR}/configh >> ${S}/config.h


mplayer_svn.bb.ph
yuv.S
vo_omapfb.c
omapfb.patch
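
The memory arithmetic behind the dbl_buffer option and the 16-pixel restriction can be sketched as follows. This is an illustration under assumptions, not code from the attachments: page_layout and plan_pages are hypothetical names, and the only facts taken from the post are that width and height are rounded down to multiples of 16 and that the overlay uses a packed YUV 4:2:2 format at 2 bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of how the overlay memory would be split. */
struct page_layout {
    unsigned xres, yres; /* dimensions rounded down to multiples of 16 */
    size_t page_bytes;   /* bytes needed for one page */
    int pages;           /* 2 if double buffering fits, otherwise 1 */
};

struct page_layout plan_pages(unsigned width, unsigned height,
                              size_t vram_bytes, int want_double)
{
    struct page_layout p;

    assert(width >= 16 && height >= 16); /* smaller sizes would round to zero */

    p.xres = width & ~15u;
    p.yres = height & ~15u;
    p.page_bytes = (size_t)p.xres * p.yres * 2; /* YUV 4:2:2: 2 bytes per pixel */
    /* Double buffering needs room for two complete pages. */
    p.pages = (want_double && vram_bytes >= 2 * p.page_bytes) ? 2 : 1;
    return p;
}
```

For an 854x480 clip this gives an 848x480 overlay of 814080 bytes per page, so two pages fit in a 2 MB plane but not in a 1 MB one.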

Sean D'Epagnier

Nov 19, 2008, 7:29:36 PM
to beagl...@googlegroups.com
Hi,


> 4) As the video is written to the upper plane, overlapping of window is
> not working.
> Currently, I disable the video in such case. Any suggestion of roadmap
> would be appreciated.

Can you write the video in a slower way instead for this case?

> DirectFB could be used though I have not understood if it's a
> replacement of fbdev or an abstraction layer.

That would work if you write an omapfb driver for DirectFB. DirectFB
typically uses the regular framebuffer device, but for each framebuffer
driver it can use the MMIO space to perform hardware-accelerated operations.

Sean

Måns Rullgård

Nov 19, 2008, 7:46:01 PM
to beagl...@googlegroups.com
"Sean D'Epagnier" <geckos...@gmail.com> writes:

> Hi,
>
>> 4) As the video is written to the upper plane, overlapping of
>> window is not working. Currently, I disable the video in such
>> case. Any suggestion of roadmap would be appreciated.
>
> Can you write the video in a slower way instead for this case?

The hardware supports colour keying the overlay. The omapfb kernel
driver might even expose it. There's code there for it, but I haven't
tested it.

--
Måns Rullgård
ma...@mansr.com

Gregoire Gentil

Nov 20, 2008, 4:24:12 AM
to beagl...@googlegroups.com, Måns Rullgård, Siarhei Siamashka, Koen Kooi
So as suggested by Mans, the easiest path could be to use the color key.
I see the get/set color key in the omap driver which is encouraging. The
problem is that the X11 events generated (Expose, Visibility...) are the
opposite of what we want. They are basically designed to give areas of
what becomes visible, not the opposite.

I'm sure that such problem has already been solved 10 times. Any
pointer? Any known example?

Another idea: would it work to switch the planes? fb0 for the video and
fb1 for the desktop. Then just set the color key, and draw it in the
mplayer X11 window. Would it make sense? How to tell X to use fb1?

Grégoire

Måns Rullgård

Nov 20, 2008, 4:30:15 AM
to beagl...@googlegroups.com
Gregoire Gentil <greg...@gentil.com> writes:

>> 4) As the video is written to the upper plane, overlapping of window is
>> not working.
>> Currently, I disable the video in such case. Any suggestion of roadmap
>> would be appreciated.
> So as suggested by Mans, the easiest path could be to use the color key.
> I see the get/set color key in the omap driver which is encouraging. The
> problem is that the X11 events generated (Expose, Visibility...) are the
> opposite of what we want. They are basically designed to give areas of
> what becomes visible, not the opposite.

The overlay is only active where the graphics plane has the key
colour. Simply paint the entire video window with the colour key.

--
Måns Rullgård
ma...@mansr.com
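
The colour-key behaviour described above can be modelled in a few lines of software. This is only a model for illustration: on the OMAP the display controller does the selection in hardware, the name colorkey_compose is hypothetical, and the 16-bit pixel type is an assumption (the real video plane holds YUV data, so the video array here stands for the video pixels after conversion).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of colour keying: wherever the graphics plane holds the
 * key colour, the video plane shows through; everywhere else the graphics
 * pixel wins. A window overlapping the video area paints non-key pixels
 * and therefore hides the video without any extra bookkeeping. */
void colorkey_compose(uint16_t *out, const uint16_t *gfx,
                      const uint16_t *video, size_t n, uint16_t key)
{
    assert(out != NULL && gfx != NULL && video != NULL);

    for (size_t i = 0; i < n; i++)
        out[i] = (gfx[i] == key) ? video[i] : gfx[i];
}
```

This is why painting the whole video window with the key colour is enough: the X server already clips that painting to the visible part of the window.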

Koen Kooi

Nov 20, 2008, 4:37:44 AM
to greg...@gentil.com, Beagle Board, Måns Rullgård, Siarhei Siamashka, z...@iki.fi

On 20 Nov 2008, at 10:24, Gregoire Gentil wrote:

> So as suggested by Mans, the easiest path could be to use the color key.
> I see the get/set color key in the omap driver which is encouraging. The
> problem is that the X11 events generated (Expose, Visibility...) are the
> opposite of what we want. They are basically designed to give areas of
> what becomes visible, not the opposite.
>
> I'm sure that such problem has already been solved 10 times. Any
> pointer? Any known example?
>
> Another idea: would it work to switch the planes? fb0 for the video and
> fb1 for the desktop. Then just set the color key, and draw it in the
> mplayer X11 window. Would it make sense? How to tell X to use fb1?

Kalle's XV driver already does that for you: http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=summary

and as installable binary:

http://www.angstrom-distribution.org/repo/?pkgname=xf86-video-omapfb

The only downside of that driver is that it's using a C based colour
conversion implementation:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD

This driver needs some fixes to omapfb (e.g. Tuomas' downscaling
patch) to work properly, and it has a small glitch when used with
DSS2. Any volunteers for adding the NEON colour conversion to this
driver?

regards,

Koen


Kalle Vahlman

Nov 20, 2008, 4:51:05 AM
to Koen Kooi, greg...@gentil.com, Beagle Board, Måns Rullgård, Siarhei Siamashka
2008/11/20 Koen Kooi <ko...@beagleboard.org>:

>
> On 20 Nov 2008, at 10:24, Gregoire Gentil wrote:
>
>> So as suggested by Mans, the easiest path could be to use the color key.
>>
>> I see the get/set color key in the omap driver which is encouraging. The
>> problem is that the X11 events generated (Expose, Visibility...) are the
>> opposite of what we want. They are basically designed to give areas of
>> what becomes visible, not the opposite.
>>
>> I'm sure that such problem has already been solved 10 times. Any
>> pointer? Any known example?
>>
>> Another idea: would it work to switch the planes? fb0 for the video and
>> fb1 for the desktop. Then just set the color key, and draw it in the
>> mplayer X11 window. Would it make sense? How to tell X to use fb1?
>
> Kalle's XV driver already does that for you:
> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=summary

I'll gladly integrate patches to implement the support for the
XV_COLORKEY attribute ;)

It's even defined already in the code and there's stubs for the set
and get calls:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/omapfb-xv.c;hb=HEAD#l62

So all it needs is to implement the ioctl call I suppose... But I
haven't really investigated that further.

> and as installable binary:
>
> http://www.angstrom-distribution.org/repo/?pkgname=xf86-video-omapfb
>
> The only downside of that driver is that it's using a C based colour
> conversion implementation:
>
> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD
>
> This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch) to
> work properly, and it has a small glitch when used with DSS2. Any volunteers
> for adding the NEON colour conversion to this driver?

Yes, please, that'd be awesome. :)

The C version is just a placeholder to verify correctness, there is
some effort done to get an optimized version in, but that's only for
ARMv6. NEON-enabled platforms would most certainly benefit from such
conversion.

--
Kalle Vahlman, z...@iki.fi
Powered by http://movial.fi
Interesting stuff at http://sandbox.movial.com
See also http://syslog.movial.fi

Gregoire Gentil

Nov 20, 2008, 11:53:45 AM
to Koen Kooi, Beagle Board, Måns Rullgård, Siarhei Siamashka, z...@iki.fi
On Thu, 2008-11-20 at 10:37 +0100, Koen Kooi wrote:
> On 20 Nov 2008, at 10:24, Gregoire Gentil wrote:
>
> > So as suggested by Mans, the easiest path could be to use the color key.
> > I see the get/set color key in the omap driver which is encouraging. The
> > problem is that the X11 events generated (Expose, Visibility...) are the
> > opposite of what we want. They are basically designed to give areas of
> > what becomes visible, not the opposite.
> >
> > I'm sure that such problem has already been solved 10 times. Any
> > pointer? Any known example?
> >
> > Another idea: would it work to switch the planes? fb0 for the video and
> > fb1 for the desktop. Then just set the color key, and draw it in the
> > mplayer X11 window. Would it make sense? How to tell X to use fb1?
>
> Kalle's XV driver already does that for you: http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=summary
>
> and as installable binary:
>
> http://www.angstrom-distribution.org/repo/?pkgname=xf86-video-omapfb
>
> The only downside of that driver is that it's using a C based colour
> conversion implementation:
>
> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD
>
> This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch)

Unfortunately, Tuomas' patch doesn't work on the Beagleboard, as its
author (Tuomas) reported himself. And I can unfortunately confirm it :-(

Grégoire

Gregoire Gentil

Nov 20, 2008, 12:02:09 PM
to Siarhei Siamashka, z...@iki.fi, Koen Kooi, Beagle Board, Måns Rullgård
On Thu, 2008-11-20 at 16:55 +0200, Siarhei Siamashka wrote:
> On Thu, Nov 20, 2008 at 11:51 AM, Kalle Vahlman <kalle....@gmail.com> wrote:
> > 2008/11/20 Koen Kooi <ko...@beagleboard.org>:
> [...]

> >> The only downside of that driver is that it's using a C based colour
> >> conversion implementation:
> >>
> >> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD
> >>
> >> This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch) to
> >> work properly, and it has a small glitch when used with DSS2. Any volunteers
> >> for adding the NEON colour conversion to this driver?
> >
> > Yes, please, that'd be awesome. :)
> >
> > The C version is just a placeholder to verify correctness, there is
> > some effort done to get an optimized version in, but that's only for
> > ARMv6. NEON-enabled platforms would most certainly benefit from such
> > conversion.
>
> Just out of curiosity, is it an effort to port existing ARMv6
> optimized color conversion code from Xomap or somebody is going after
> a completely new implementation?
>
> Back to the subject. Having all the color conversion optimizations
> implemented in Xv and using it from MPlayer with direct rendering
> enabled (-dr option) theoretically might provide performance close to
> that of direct framebuffer access. But of course there might be a lot
> of issues to solve too.
I was aware of the XV branch. But Siarhei has a point, and it's partly
the reason why I did some work on this fbdev front. The XV branch is a
great effort, but it will take a long time before it becomes stable and
mature, while the framebuffer has already been working for a while. And you
will never get the same memory consumption: X takes at least 10 to 20 MB
more than Kdrive, which makes a difference on a 128MB system. Not to
mention some missing features like rotation. On the other hand, it's
true that the main advantage of XV is that you can play multiple videos
at the same time.

Grégoire


Måns Rullgård

Nov 20, 2008, 12:07:06 PM
to greg...@gentil.com, Siarhei Siamashka, z...@iki.fi, Koen Kooi, Beagle Board, Måns Rullgård

XV will only give you as many simultaneous videos as you have overlays,
which in the case of the OMAP3 is two.

--
Måns Rullgård
ma...@mansr.com

Koen Kooi

Nov 20, 2008, 12:31:26 PM
to greg...@gentil.com, Siarhei Siamashka, z...@iki.fi, Beagle Board, Måns Rullgård

I didn't see much difference (1 or 2 MiB) between Xorg and kdrive on
the beagle. It all depends on how you build X and libx11 :)

regards,

Koen


Kalle Vahlman

Nov 20, 2008, 1:31:53 PM11/20/08
to Koen Kooi, greg...@gentil.com, Siarhei Siamashka, Beagle Board, Måns Rullgård
2008/11/20 Koen Kooi <ko...@beagleboard.org>:

>
> On 20 Nov 2008, at 18:02, Gregoire Gentil wrote:
>
>> On Thu, 2008-11-20 at 16:55 +0200, Siarhei Siamashka wrote:
>>>
>>> On Thu, Nov 20, 2008 at 11:51 AM, Kalle Vahlman <kalle....@gmail.com>
>>> wrote:
>>>>
>>>> 2008/11/20 Koen Kooi <ko...@beagleboard.org>:
>>>
>>> [...]
>>>>>
>>>>> The only downside of that driver is that it's using a C based colour
>>>>> conversion implementation:
>>>>>
>>>>>
>>>>> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD
>>>>>
>>>>> This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch)
>>>>> to
>>>>> work properly, and it has a small glitch when used with DSS2. Any
>>>>> volunteers
>>>>> for adding the NEON colour conversion to this driver?
>>>>
>>>> Yes, please, that'd be awesome. :)
>>>>
>>>> The C version is just a placeholder to verify correctness, there is
>>>> some effort done to get an optimized version in, but that's only for
>>>> ARMv6. NEON-enabled platforms would most certainly benefit from such
>>>> conversion.
>>>
>>> Just out of curiosity, is it an effort to port existing ARMv6
>>> optimized color conversion code from Xomap or somebody is going after
>>> a completely new implementation?

It's something new, but I'm not sure what the status is and whether
it'll materialize any time soon... I gave the conversion routine in XOmap
(to the quirky YUV format of blizzard, this driver started out on
N800...) a test, but couldn't get it to work. The whole thing sounded
so silly I tried simply converting planar formats to one of the
supported packed formats and it didn't bog down the performance
completely, even when written in C. And that's what happens on beagle
too. I don't own one, nor do I have too much free time at the office
so the non-N800 side hasn't been that actively developed by me...

I got 512x288@24fps running smoothly on N800 and didn't really miss
the extra performance of 12bit planar format or optimized color
conversion so I left it at that for the time being... :)
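
To illustrate the kind of planar-to-packed conversion described above, here is a minimal C sketch of turning a planar YUV 4:2:0 frame into packed YUY2. It is not the code from xf86-video-omapfb or Xomap; the function name and the tightly packed plane layout (no stride padding) are assumptions made for the example, and each chroma row is simply reused for two output rows without interpolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert planar YUV 4:2:0 (separate Y, U and V planes,
 * no row padding) to packed YUY2, whose byte order is Y0 U Y1 V.
 * Each chroma sample covers a 2x2 block of luma samples, so one chroma row
 * is reused for two consecutive output rows. */
static void i420_to_yuy2(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                         uint8_t *dst, int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);

    for (int row = 0; row < height; row++) {
        const uint8_t *yrow = y + (size_t)row * width;
        const uint8_t *urow = u + (size_t)(row / 2) * (width / 2);
        const uint8_t *vrow = v + (size_t)(row / 2) * (width / 2);
        uint8_t *out = dst + (size_t)row * width * 2;

        for (int col = 0; col < width; col += 2) {
            out[0] = yrow[col];      /* Y0 */
            out[1] = urow[col / 2];  /* U shared by both pixels */
            out[2] = yrow[col + 1];  /* Y1 */
            out[3] = vrow[col / 2];  /* V shared by both pixels */
            out += 4;
        }
    }
}
```

The output buffer needs width * height * 2 bytes. A plain loop like this is the sort of "correctness first" version discussed in the thread; an ARMv6 or NEON version would process many pixels per iteration.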

>>> Back to the subject. Having all the color conversion optimizations
>>> implemented in Xv and using it from MPlayer with direct rendering
>>> enabled (-dr option) theoretically might provide performance close to
>>> that of direct framebuffer access. But of course there might be a lot
>>> of issues to solve too.
>>
>> I was aware of the XV branch. But Siarhei has a point and it's partly
>> the reason why I made some work on this fbdev front. The XV branch is a
>> great effort but before getting something stable and mature will take a
>> long time, while the framebuffer is already working for a while. And you
>> will never get the same memory consumption: X takes at least 10 to 20
>> more MB than Kdrive, which makes a difference on a 128MB system. Without
>> mentioning some missing features like rotation. On the other side, it's
>> true that the main advantage of XV is that you will get multiple videos
>> at the same time,
>
> I didn't see much difference (1 or 2 MiB) between Xorg and kdrive on the
> beagle. It all depends on how you build X and libx11 :)

Yeah, the "X.Org is bloated, use kdrive" argument has been moot for
some time now. This was recently discussed on the X.Org mailing list
and this comment should be convincing enough:

http://lists.freedesktop.org/archives/xorg/2008-October/039377.html

And for the record, xf86-video-omapfb is not a branch of anything,
it's a whole new driver for the OMAP framebuffer kernel driver. The
idea is to support some tricks that the basic fbdev driver doesn't.

The advantage of XV is that you don't have to optimize your *client*
software for a specific board (which naturally yields the optimal
solution), instead you optimize the driver. Thus instead of one
program working nicely, you have N programs working nicely for the
same effort. Don't get me wrong, using the framebuffer directly is
fine and dandy for a number of use cases, but if X is going to be
running, XV is the only decent way to interact with it really.

Gregoire Gentil

Nov 20, 2008, 3:11:13 PM11/20/08
to z...@iki.fi, Koen Kooi, Siarhei Siamashka, Beagle Board, Måns Rullgård

> The advantage of XV is that you don't have to optimize your *client*
> software for a specific board (which naturally yields the optimal
> solution), instead you optimize the driver. Thus instead of one
> program working nicely, you have N programs working nicely for the
> same effort. Don't get me wrong, using the framebuffer directly is
> fine and dandy for a number of use cases, but if X is going to be
> running, XV is the only decent way to interact with it really.
I'm not fully convinced on the performance issue but I do agree that XV
is much more polyvalent and powerful than fbdev.

In the meantime, please find attached a version that adds a color key and
hence fixes the overlap problem. Thanks to Mans for the hint!

There remains the problem of the border conversion interpolation, which is
common to both fbdev and xv,

Grégoire

vo_omapfb.c

Kalle Vahlman

Nov 21, 2008, 12:57:12 AM11/21/08
to greg...@gentil.com, Koen Kooi, Siarhei Siamashka, Beagle Board, Måns Rullgård
2008/11/20 Gregoire Gentil <greg...@gentil.com>:

>
>> The advantage of XV is that you don't have to optimize your *client*
>> software for a specific board (which naturally yields the optimal
>> solution), instead you optimize the driver. Thus instead of one
>> program working nicely, you have N programs working nicely for the
>> same effort. Don't get me wrong, using the framebuffer directly is
>> fine and dandy for a number of use cases, but if X is going to be
>> running, XV is the only decent way to interact with it really.
> I'm not fully convinced on the performance issue but I do agree that XV
> is much more polyvalent and powerful than fbdev.

Judging from the code you attached, and assuming I'm not totally
wrong, the only part where XV "needs" to be inferior is the data
transfer between the client and the server. And when XSHM is used,
that overhead is bound to be dwarfed by the decoding and color
conversion to a point of not mattering any more.

Koen Kooi

Nov 21, 2008, 5:54:46 AM11/21/08
to Siarhei Siamashka, z...@iki.fi, greg...@gentil.com, Beagle Board, "Måns Rullgård"

On 21 Nov 2008, at 11:37, Siarhei Siamashka wrote:

> On Friday 21 November 2008, Kalle Vahlman wrote:
>> 2008/11/20 Gregoire Gentil <greg...@gentil.com>:
>>>> The advantage of XV is that you don't have to optimize your *client*
>>>> software for a specific board (which naturally yields the optimal
>>>> solution), instead you optimize the driver. Thus instead of one
>>>> program working nicely, you have N programs working nicely for the
>>>> same effort. Don't get me wrong, using the framebuffer directly is
>>>> fine and dandy for a number of use cases, but if X is going to be
>>>> running, XV is the only decent way to interact with it really.
>>>
>>> I'm not fully convinced on the performance issue but I do agree that XV
>>> is much more polyvalent and powerful than fbdev.
>>
>> Judging from the code you attached, and assuming I'm not totally
>> wrong, the only part where XV "needs" to be inferior is the data
>> transfer between the client and the server. And when XSHM is used,
>> that overhead is bound to be dwarfed by the decoding and color
>> conversion to a point of not mattering any more.
>

> Still XV is harder to use in a video player to get good performance. The
> problem is mostly related to OSD and subtitles.
>
> With direct access to framebuffer, video decoding is very simple. The client
> just does color format conversion and then can easily draw subtitles over the
> image in the framebuffer.
>
> With XV everything gets more complex if we want to avoid any redundant memcpy
> operations to copy data around. The client needs to provide a ready frame
> with all the subtitles and OSD data drawn over it in a planar format to XV.
> But subtitles can be applied only to the frame which is already retired from
> video decoding pipeline and is not used as a reference frame for decoding next
> frames anymore. So if everything is implemented right, the frame is available
> with some delay which needs to be compensated and taken into account. As I
> mentioned before, this stuff is implemented in MPlayer using "direct
> rendering" method (-dr option), also see [1]. The problem is that the last
> time I checked it (admittedly long ago), direct rendering was not working well
> in MPlayer (including not making use of direct rendering for some
> codec/configuration combinations and rendering bugs with subtitles).
> Theoretically, everything should be fixable given enough efforts. But in
> practice it may be definitely more complex than just going with direct
> framebuffer rendering hacks :)

Can't the osd+subs be drawn into the second overlay?

regards,

Koen


Måns Rullgård

Nov 21, 2008, 6:11:14 AM11/21/08
to Siarhei Siamashka, z...@iki.fi, greg...@gentil.com, Koen Kooi, Beagle Board, Måns Rullgård

Siarhei Siamashka wrote:
> On Friday 21 November 2008, Kalle Vahlman wrote:
>> 2008/11/20 Gregoire Gentil <greg...@gentil.com>:
>> >> The advantage of XV is that you don't have to optimize your *client*
>> >> software for a specific board (which naturally yields the optimal
>> >> solution), instead you optimize the driver. Thus instead of one
>> >> program working nicely, you have N programs working nicely for the
>> >> same effort. Don't get me wrong, using the framebuffer directly is
>> >> fine and dandy for a number of use cases, but if X is going to be
>> >> running, XV is the only decent way to interact with it really.
>> >
>> > I'm not fully convinced on the performance issue but I do agree that XV
>> > is much more polyvalent and powerful than fbdev.
>>
>> Judging from the code you attached, and assuming I'm not totally
>> wrong, the only part where XV "needs" to be inferior is the data
>> transfer between the client and the server. And when XSHM is used,
>> that overhead is bound to be dwarfed by the decoding and color
>> conversion to a point of not mattering any more.
>
> Still XV is harder to use in a video player to get good performance. The
> problem is mostly related to OSD and subtitles.

If the hardware supports the native output format of the decoder, and
there is enough video memory for all delayed frames (3 frames for MPEG2,
16 for H.264), XV imposes an additional copy of each frame from the SHM
segment into the actual video memory. If there is insufficient video
memory or if pixel format conversion is required, there is no reason for
XV to be less efficient than the application accessing the framebuffer
directly.
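
As a rough illustration of the "enough video memory for all delayed frames" condition, the following C sketch computes the size of a pool of planar YUV 4:2:0 frames. The helper name and the 720x576 resolution used in the note below are assumptions chosen for the example, not figures from this thread.

```c
#include <assert.h>

/* Hypothetical helper: bytes needed to keep `frames` decoded frames in
 * planar YUV 4:2:0 (12 bits per pixel, i.e. width * height * 3 / 2 bytes
 * per frame, ignoring any stride padding). */
static unsigned long long yuv420_pool_bytes(unsigned width, unsigned height,
                                            unsigned frames)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return (unsigned long long)width * height * 3 / 2 * frames;
}
```

Under these assumptions, 3 frames of 720x576 take 1,866,240 bytes, while 16 frames take 9,953,280 bytes (about 9.5 MiB), so whether the pool fits depends heavily on the codec and on how much vram is reserved for the planes.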

> With direct access to framebuffer, video decoding is very simple. The client
> just does color format conversion and then can easily draw subtitles over the
> image in the framebuffer.

Unless alpha blending of subtitles with the video frame is required, one
can simply draw the subtitle text directly in the X window used for video
and enable colour keying for the overlay.
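
The effect of colour keying can be modelled in software as below. This is only a sketch of what the display hardware does per pixel, assuming RGB565 pixels and a hypothetical function name; it is not the omapfb interface itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of colour keying: wherever the graphics plane (the X window)
 * holds the key colour, the video overlay pixel shows through; any other
 * graphics pixel, such as subtitle text, stays visible on top. */
static void compose_colour_key(const uint16_t *graphics, const uint16_t *video,
                               uint16_t *out, size_t pixels, uint16_t key)
{
    assert(graphics != NULL && video != NULL && out != NULL);

    for (size_t i = 0; i < pixels; i++)
        out[i] = (graphics[i] == key) ? video[i] : graphics[i];
}
```

Because the test is a plain equality, the result is all-or-nothing per pixel, which is why this approach cannot give alpha-blended subtitle edges.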

> With XV everything gets more complex if we want to avoid any redundant memcpy
> operations to copy data around. The client needs to provide a ready frame
> with all the subtitles and OSD data drawn over it in a planar format to XV.
> But subtitles can be applied only to the frame which is already retired from
> video decoding pipeline and is not used as a reference frame for decoding next
> frames anymore. So if everything is implemented right, the frame is available
> with some delay which needs to be compensated and taken into account.

Any post-decode rendering into the video frames requires either an extra
copy or a delay. When the hardware supports the codec-native pixel format
and sufficient video memory is present, decoding directly to video memory
and rendering subtitles after a delay is the most efficient. XV does not
allow this, unfortunately.

If pixel format conversion is required, the player can simply render the
subtitles into the video frames after a safe delay before passing them
to XV, which will do the format conversion while writing the frame to
video memory. No extra copy needed. The delayed subtitle rendering
is trivial to implement.
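
One possible shape for such delayed rendering is a small fixed-delay queue, sketched here in C. All names are hypothetical, frame ids stand in for real frame buffers, and the sketch assumes a fixed delay in decode order; a real player would derive the delay from the codec's reference-frame and reordering rules.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DELAY 16

/* Holds back decoded frames until `delay` newer frames have been decoded,
 * at which point the oldest one is assumed safe to draw subtitles on. */
struct delay_queue {
    int slots[MAX_DELAY]; /* non-negative frame ids */
    size_t delay;         /* number of frames to hold back */
    size_t count;         /* frames currently held */
    size_t head;          /* index of the oldest held frame */
};

static void delay_queue_init(struct delay_queue *q, size_t delay)
{
    assert(delay <= MAX_DELAY);
    q->delay = delay;
    q->count = 0;
    q->head = 0;
}

/* Push a freshly decoded frame. Returns the id of the frame that may now be
 * modified and displayed, or -1 if the queue is still filling up. */
static int delay_queue_push(struct delay_queue *q, int frame_id)
{
    int ready = -1;

    if (q->delay == 0)
        return frame_id; /* no delay requested */

    if (q->count == q->delay) {
        ready = q->slots[q->head];
        q->head = (q->head + 1) % q->delay;
        q->count--;
    }
    q->slots[(q->head + q->count) % q->delay] = frame_id;
    q->count++;
    return ready;
}
```

With a delay of 2, pushing frames 0, 1, 2, 3 returns -1, -1, 0, 1: each frame comes out two pushes after it went in, and only then would the subtitles be drawn into it before handing it to XV.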

> As I mentioned before, this stuff is implemented in MPlayer using "direct
> rendering" method (-dr option), also see [1]. The problem is that the last
> time I checked it (admittedly long ago), direct rendering was not working well
> in MPlayer (including not making use of direct rendering for some
> codec/configuration combinations and rendering bugs with subtitles).
> Theoretically, everything should be fixable given enough efforts. But in
> practice it may be definitely more complex than just going with direct
> framebuffer rendering hacks :)

Am I misreading you, or is the above paragraph saying that XV is suboptimal
because mplayer has bugs when *not* using it?

--
Måns Rullgård
ma...@mansr.com

Siarhei Siamashka

Nov 21, 2008, 5:37:07 AM11/21/08
to z...@iki.fi, greg...@gentil.com, Koen Kooi, Beagle Board, Måns Rullgård
On Friday 21 November 2008, Kalle Vahlman wrote:
> 2008/11/20 Gregoire Gentil <greg...@gentil.com>:
> >> The advantage of XV is that you don't have to optimize your *client*
> >> software for a specific board (which naturally yields the optimal
> >> solution), instead you optimize the driver. Thus instead of one
> >> program working nicely, you have N programs working nicely for the
> >> same effort. Don't get me wrong, using the framebuffer directly is
> >> fine and dandy for a number of use cases, but if X is going to be
> >> running, XV is the only decent way to interact with it really.
> >
> > I'm not fully convinced on the performance issue but I do agree that XV
> > is much more polyvalent and powerful than fbdev.
>
> Judging from the code you attached, and assuming I'm not totally
> wrong, the only part where XV "needs" to be inferior is the data
> transfer between the client and the server. And when XSHM is used,
> that overhead is bound to be dwarfed by the decoding and color
> conversion to a point of not mattering any more.

Still XV is harder to use in a video player to get good performance. The
problem is mostly related to OSD and subtitles.

With direct access to framebuffer, video decoding is very simple. The client
just does color format conversion and then can easily draw subtitles over the
image in the framebuffer.

With XV everything gets more complex if we want to avoid any redundant memcpy
operations to copy data around. The client needs to provide a ready frame
with all the subtitles and OSD data drawn over it in a planar format to XV.
But subtitles can be applied only to the frame which is already retired from
video decoding pipeline and is not used as a reference frame for decoding next
frames anymore. So if everything is implemented right, the frame is available
with some delay which needs to be compensated and taken into account. As I
mentioned before, this stuff is implemented in MPlayer using "direct
rendering" method (-dr option), also see [1]. The problem is that the last
time I checked it (admittedly long ago), direct rendering was not working well
in MPlayer (including not making use of direct rendering for some
codec/configuration combinations and rendering bugs with subtitles).
Theoretically, everything should be fixable given enough efforts. But in
practice it may be definitely more complex than just going with direct
framebuffer rendering hacks :)

1. http://www.mplayerhq.hu/DOCS/tech/dr-methods.txt

--
Best regards,
Siarhei Siamashka

Siarhei Siamashka

Nov 20, 2008, 9:55:18 AM11/20/08
to z...@iki.fi, Koen Kooi, greg...@gentil.com, Beagle Board, Måns Rullgård
On Thu, Nov 20, 2008 at 11:51 AM, Kalle Vahlman <kalle....@gmail.com> wrote:
> 2008/11/20 Koen Kooi <ko...@beagleboard.org>:
[...]
>> The only downside of that driver is that it's using a C based colour
>> conversion implementation:
>>
>> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD
>>
>> This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch) to
>> work properly, and it has a small glitch when used with DSS2. Any volunteers
>> for adding the NEON colour conversion to this driver?
>
> Yes, please, that'd be awesome. :)
>
> The C version is just a placeholder to verify correctness, there is
> some effort done to get an optimized version in, but that's only for
> ARMv6. NEON-enabled platforms would most certainly benefit from such
> conversion.

Just out of curiosity, is it an effort to port existing ARMv6
optimized color conversion code from Xomap or somebody is going after
a completely new implementation?

Back to the subject. Having all the color conversion optimizations

Siarhei Siamashka

Nov 29, 2008, 5:21:39 PM11/29/08
to z...@iki.fi, Koen Kooi, greg...@gentil.com, Beagle Board, Måns Rullgård

This quirky YUV format is more tightly packed than YUY2 (12 bits per pixel vs.
16 bits per pixel). On Nokia N800/N810 devices, each video frame needs to be
pushed to the external LCD controller. The link to the external LCD controller
(RFBI) is relatively slow and is one of the weak spots of these devices. It is
barely able to manage tear-free 800x480 screen updates for RGB565 and YUY2
color formats when running at top clock frequency. For any screen update,
RFBI bandwidth is a scarce resource, especially when using tearing
synchronization (as the driver occasionally needs to wait for the right moment
to push a frame to the LCD controller, keeping RFBI idle and reducing the
overall efficiency of using its limited bandwidth).
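
To put numbers on the 12-bit vs. 16-bit difference, here is a small C sketch of the per-frame and per-second data volume. The helper names and the 24 fps rate in the note are assumptions for illustration; real RFBI throughput also depends on clocking and tearing synchronization, which this arithmetic ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes in one frame at the given bits per pixel. */
static uint32_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return width * height * bits_per_pixel / 8;
}

/* Hypothetical helper: bytes per second that must cross the link when
 * every frame is pushed in full at the given frame rate. */
static uint64_t link_bytes_per_second(uint32_t width, uint32_t height,
                                      uint32_t bits_per_pixel, uint32_t fps)
{
    return (uint64_t)frame_bytes(width, height, bits_per_pixel) * fps;
}
```

For a full 800x480 update this gives 768,000 bytes per frame at 16 bits per pixel and 576,000 bytes at 12 bits per pixel, i.e. 25% less data. At an assumed 24 fps that is 18,432,000 vs. 13,824,000 bytes per second.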

But even without considering RFBI transfers, the color format conversion to
quirky YUV format is faster than conversion to YUY2 just because it needs
to write less data. That's why we went through the trouble of fixing support
for this quirky color format in Nokia 770 omapfb driver:
http://www.mail-archive.com/maemo-de...@maemo.org/msg09979.html
https://garage.maemo.org/tracker/index.php?func=detail&aid=881&group_id=164&atid=683

and it provided something like ~1-2% of *overall* video playback performance
improvement in MPlayer which is quite a good result (of course, the
performance improvement for only color conversion part is much more
impressive).

> and it didn't bog down the performance
> completely, even when written in C. And that's what happens on beagle
> too. I don't own one, nor do I have too much free time at the office
> so the non-N800 side hasn't been that actively developed by me...
>
> I got 512x288@24fps running smoothly on N800 and didn't really miss
> the extra performance of 12bit planar format or optimized color
> conversion so I left it at that for the time being... :)

The problem with the C implementation is its heavy CPU usage. Even if it is
able to display static images in a synthetic test with a decent framerate, a
video player is a bit different. Every CPU cycle spent in color format
conversion code is stolen from the video decoder. Low overhead in XV is
important for any practical use of it in video players. An old discussion
about the color conversion overhead with some benchmark numbers for N800 can
be found here:
http://www.mail-archive.com/maemo-de...@maemo.org/msg09869.html
Nowadays the performance of ARMv6 optimized color format conversion is a lot
more modest because the hit-under-miss feature is disabled in order to
work around ARM1136 r0pX erratum 364296 (one can grep for this workaround in
N800 kernel sources). As a result software prefetch no longer works and PLD
instructions are now practically useless. Nevertheless ARMv6 assembly still
outperforms C code quite significantly.

The same is of course true for beagleboard (so the above part is not
completely off-topic). XV really needs to get NEON optimized color format
conversion code integrated. Otherwise it would remain completely
noncompetitive when compared to the media players which are using
direct framebuffer access with all the necessary optimizations.

Siarhei Siamashka

Nov 29, 2008, 6:00:58 PM11/29/08
to Måns Rullgård, z...@iki.fi, greg...@gentil.com, Koen Kooi, Beagle Board

The overall style of your reply seems a bit strange:

me: Implementing fast video output (on beagleboard) using XV is somewhat
harder than just using direct framebuffer access, because you need to
implement delayed subtitle rendering (handled by "direct rendering" in
MPlayer) in order to avoid extra data copies.
you: The delayed subtitle rendering is trivial to implement (plus some
additional useful details about delayed subtitle rendering).

Are you trying to question something? You have also taken an easy way with
omapfbplay instead of investing effort in tweaking one of the full-fledged
media players to get it to work well ;)

Thanks anyway for the additional details. Gregoire and Kalle may find all this
information useful and they are the ones who are *actually* working
on "Improved MPlayer: At the performance level of Omapfbplay" and XV for
beagleboard.

BTW, the comment about colour keying is quite interesting (though MPlayer is
normally using alpha blending for subtitles). It might be worth implementing
(as it is *really* trivial). Probably you should submit your idea to
mplayer-dev-eng mailing list.

> > As I mentioned before, this stuff is implemented in MPlayer using "direct
> > rendering" method (-dr option), also see [1]. The problem is that the
> > last time I checked it (admittedly long ago), direct rendering was not
> > working well in MPlayer (including not making use of direct rendering for
> > some codec/configuration combinations and rendering bugs with subtitles).
> > Theoretically, everything should be fixable given enough efforts. But in
> > practice it may be definitely more complex than just going with direct
> > framebuffer rendering hacks :)
>
> Am I misreading you, or is the above paragraph saying that XV is suboptimal
> because mplayer has bugs when *not* using it?

Yes, you are definitely misreading me.

Måns Rullgård

Nov 29, 2008, 8:05:05 PM11/29/08
to beagl...@googlegroups.com, Måns Rullgård, z...@iki.fi, greg...@gentil.com, Koen Kooi
Siarhei Siamashka <siarhei....@gmail.com> writes:

Yes, I am questioning your claim of XV being unsuitable for the Beagle
board because using it optimally would be difficult. XV *is*
inefficient on hardware supporting planar YUV, but not when a
conversion is necessary.

> You have also taken an easy way with omapfbplay instead of investing
> efforts in tweaking one of the full-fledged media players to get it
> work well ;)

I wrote omapfbplay purely for demo purposes. Furthermore, it achieves
better performance than is possible with mplayer due to the aggressive
buffering of decoded frames.

> Thanks anyway for the additional details. Gregoire and Kalle may
> find all this information useful and they are the ones who are
> *actually* working on "Improved MPlayer: At the performance level of
> Omapfbplay" and XV for beagleboard.

Why do you emphasise the word "actually"? Are you implying that I
ought to be doing more? Let me remind you that everything I do for
FFmpeg or the Beagle board is done in my spare time. I have no
obligations towards you or anybody else.

>> > As I mentioned before, this stuff is implemented in MPlayer using "direct
>> > rendering" method (-dr option), also see [1]. The problem is that the
>> > last time I checked it (admittedly long ago), direct rendering was not
>> > working well in MPlayer (including not making use of direct rendering for
>> > some codec/configuration combinations and rendering bugs with subtitles).
>> > Theoretically, everything should be fixable given enough efforts. But in
>> > practice it may be definitely more complex than just going with direct
>> > framebuffer rendering hacks :)
>>
>> Am I misreading you, or is the above paragraph saying that XV is suboptimal
>> because mplayer has bugs when *not* using it?
>
> Yes, you are definitely misreading me.

What *are* you saying?

--
Måns Rullgård
ma...@mansr.com

Kalle Vahlman

Nov 30, 2008, 4:10:35 AM11/30/08
to Siarhei Siamashka, Koen Kooi, greg...@gentil.com, Beagle Board, Måns Rullgård
2008/11/30 Siarhei Siamashka <siarhei....@gmail.com>:

> On Thursday 20 November 2008, Kalle Vahlman wrote:
>> It's something new, but I'm not sure what the status is and whether
>> it'll realize any time soon... I gave the conversion routine in XOmap
>> (to the quirky YUV format of blizzard, this driver started out on
>> N800...) a test, but couldn't get it to work. The whole thing sounded
>> so silly I tried simply converting planar formats to one of the
>> supported packed formats
>
> This quirky YUV format is more tightly packed than YUY2 (12-bit per pixel vs.
> 16-bit per pixel).
[snip justification]

Yes, I do know and agree that the 12bit format is a definitive win :)

The thing I deemed as silly was all the comments about serious
problems using it and tricks to avoid that in Xomap and the
endianness-issue which makes the conversion code tricky to implement.
As said, I tried putting the conversion routine from Xomap there, but
couldn't get it to show meaningful picture. So I thought that instead
of spending time fixing that and subjecting the driver to all the
problems mentioned in Xomap, I'd just convert to the other supported
formats as a short-term solution.

>> and it didn't bog down the performance
>> completely, even when written in C. And that's what happens on beagle
>> too. I don't own one, nor do I have too much free time at the office
>> so the non-N800 side hasn't been that actively developed by me...
>>
>> I got 512x288@24fps running smoothly on N800 and didn't really miss
>> the extra performance of 12bit planar format or optimized color
>> conversion so I left it at that for the time being... :)
>
> The problem of C implementation is in heavy cpu usage. Even if it is able to
> display static images in a synthetic test with a decent framerate, video
> player is a bit different.

Indeed, and that's why I used gst-launch as a testing tool and
Elephants Dream as the source material. It actually worked out really
well since the first 60 seconds of the clip contain a slow pan and a
high velocity action bit (i.e. showing errors in smoothness and decoding
speed pretty clearly).

Now, I must admit that my "smooth" might not be the "smooth" of a True
HiFist ;) but it should be ok for mundane people like me.

> The same is of course true for beagleboard (so the above part is not a
> complete offtopic). XV really needs to get NEON optimized color format
> conversion code integrated.

This I agree with, and at no point did I mean to indicate that the C
implementation would be anything but a correctness test. IIRC it
wasn't fast enough for the Big Buck Bunny clip (480p) on beagle.

> Otherwise it would remain completely
> noncompetitive when compared to the media players which are using
> direct framebuffer access with all the necessary optimizations.

This I don't agree with, since the competition (for me) isn't only
about getting the absolute best framerate. It's also about integration
and code reusability. It all depends on what your goals are, of
course.

Siarhei Siamashka

Nov 30, 2008, 4:46:35 PM11/30/08
to z...@iki.fi, Koen Kooi, greg...@gentil.com, Beagle Board, Måns Rullgård
On Sunday 30 November 2008, Kalle Vahlman wrote:
> 2008/11/30 Siarhei Siamashka <siarhei....@gmail.com>:
> > On Thursday 20 November 2008, Kalle Vahlman wrote:
> >> It's something new, but I'm not sure what the status is and whether
> >> it'll realize any time soon... I gave the conversion routine in XOmap
> >> (to the quirky YUV format of blizzard, this driver started out on
> >> N800...) a test, but couldn't get it to work. The whole thing sounded
> >> so silly I tried simply converting planar formats to one of the
> >> supported packed formats
> >
> > This quirky YUV format is more tightly packed than YUY2 (12-bit per pixel
> > vs. 16-bit per pixel).
>
> [snip justification]
>
> Yes, I do know and agree that the 12bit format is a definitive win :)
>
> The thing I deemed as silly was all the comments about serious
> problems using it and tricks to avoid that in Xomap and the
> endianness-issue which makes the conversion code tricky to implement.
> As said, I tried putting the conversion routine from Xomap there, but
> couldn't get it to show meaningful picture. So I thought that instead
> of spending time fixing that and subjecting the driver to all the
> problems mentioned in Xomap, I'd just convert to the other supported
> formats as a short-term solution.

Yes, it's a bit tricky to get this quirky YUV format working right. The N800
has the OMAP display controller and an external LCD controller chained.
Scaling and YUV->RGB conversion for video can be done on either of them. But
this quirky format is supported on the external LCD controller only, which
introduces some difficulties. Everything is fine while the video overlay is
unobscured. But in order to display anything over it (a battery status
notification for example), the video overlay gets migrated to the OMAP display
controller and starts using YUY2 format. You can have a close look at the
video overlay migration code from Xomap.

In any case, I can understand that you don't see this task as high priority.

[...]

> > The same is of course true for beagleboard (so the above part is not a
> > complete offtopic). XV really needs to get NEON optimized color format
> > conversion code integrated.
>
> This I agree with and in no point I meant to indicate that the C
> implementation would be anything but a correctness test. IIRC it
> wasn't fast enough for the Big Buck Bunny clip (480p) on beagle.
>
> > Otherwise it would remain completely
> > noncompetitive when compared to the media players which are using
> > direct framebuffer access with all the necessary optimizations.
>
> This I don't agree with, since the competition (for me) isn't only
> about getting the absolute best framerate. It's also about integration
> and code reusability. It all depends on what your goals are, of
> course.

I don't see a disagreement here. Everybody understands that the color format
conversion optimization needs to be added to XV eventually. The only question
is about the priority of this task.

I was just getting the impression that you are somewhat underestimating the
importance of having this optimization. Surely it is quite reasonable to
get the code working correctly first and applying optimizations to it a
bit later. But this color format conversion code could be the last straw
preventing smooth playback of some heavy video. And the players using direct
framebuffer access (MPlayer patch discussed in this thread) will have a
clear advantage over anything else using nonoptimized XV. That's what I mean
by it being noncompetitive.

You can and probably should aim for both integration/reusability/whatever and
low cpu usage (and that's not difficult actually, as the NEON optimized color
format conversion code already exists).

Siarhei Siamashka

Nov 30, 2008, 6:37:48 PM11/30/08
to beagl...@googlegroups.com, Måns Rullgård, z...@iki.fi, greg...@gentil.com, Koen Kooi
On Sunday 30 November 2008, Måns Rullgård wrote:
> Siarhei Siamashka <siarhei....@gmail.com> writes:
> > On Friday 21 November 2008, Måns Rullgård wrote:
> >> Siarhei Siamashka wrote:
[...]

> > The overall style of your reply seems a bit strange:
> >
> > me: Implementing fast video output (on beagleboard) using XV is somewhat
> > harder than just using direct framebuffer access, because you need to
> > implement delayed subtitle rendering (handled by "direct rendering" in
> > MPlayer) in order to avoid extra data copies.
> > you: The delayed subtitle rendering is trivial to implement (plus some
> > additional useful details about delayed subtitle rendering).
> >
> > Are you trying to question something?
>
> Yes, I am questioning your claim of XV being unsuitable for the Beagle
> board because using it optimally would be difficult.

Which claim? Please provide a relevant quote (the one where I supposedly say
something about XV being "unsuitable", "impossible" or whatever) or stop
trolling.

I only mentioned that using XV efficiently (in the beagleboard port of MPlayer,
as implied by this topic) is more complex than just using the framebuffer, but
certainly possible. And I'm well aware of what needs to be done in order to
achieve this. That's all.

> XV *is* inefficient on hardware supporting planar YUV, but not when a
> conversion is necessary.

This is certainly not news to me. Just for your information, I was the one who
first mentioned this fact about XV performance in this thread:
http://groups.google.com/group/beagleboard/msg/571cae7993c95f6a?hl=en

> > You have also taken an easy way with omapfbplay instead of investing
> > efforts in tweaking one of the full-fledged media players to get it
> > work well ;)
>
> I wrote omapfbplay purely for demo purposes.

That is what I call "taking an easy way". Integrating such stuff into a real
media player (MPlayer) is a more complex practical task, as one needs to work
with an arguably messy codebase, solve the technical issue itself and get
through the real challenge of having to please the maintainers.

> Furthermore, it achieves
> better performance than is possible with mplayer due to the aggressive
> buffering of decoded frames.

Strictly speaking, this is not quite correct. First, it does not provide
better performance on average. Whenever video decoding in MPlayer is late
(can't keep synchronized with audio), it tries to catch up by inserting no
delay between frames and fully utilizing the cpu. If video is too late and
exceeds a certain limit, framedropping comes into action. So what you get with
aggressive buffering is a more consistent framerate and better a/v sync, but
not exactly performance.
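
The catch-up behaviour described above can be sketched roughly as follows.
This is only an illustration of the policy, not MPlayer's actual code. The
names schedule_frame and frame_action and the drop_threshold parameter are
hypothetical, and timestamps are assumed to be in seconds.

```c
/* Hypothetical sketch of a catch-up / framedrop policy. */
typedef enum {
    FRAME_SHOW_AFTER_SLEEP, /* video is ahead of audio: wait, then show */
    FRAME_SHOW_NOW,         /* video is late: show with no delay to catch up */
    FRAME_DROP              /* video is too late: skip this frame */
} frame_action;

frame_action schedule_frame(double video_pts, double audio_pts,
                            double drop_threshold, double *sleep_time)
{
    /* Positive lateness means the video frame is behind the audio clock. */
    double lateness = audio_pts - video_pts;

    *sleep_time = 0.0;
    if (lateness > drop_threshold)
        return FRAME_DROP;
    if (lateness >= 0.0)
        return FRAME_SHOW_NOW;

    /* Frame is early: sleep until its presentation time. */
    *sleep_time = -lateness;
    return FRAME_SHOW_AFTER_SLEEP;
}
```

With a policy like this the cpu is already fully used whenever decoding falls
behind, so a deeper queue of decoded frames mostly smooths out the display
timing instead of raising the average decoding speed.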

Second, regarding this being impossible with MPlayer: see
http://mplayerxp.sourceforge.net/ (more specifically, you can look
into their FAQ, "Howto to improve quality of playback with MPlayerXP"
section). However, because the MPlayer core developers are generally as
"friendly" as the FFmpeg ones, none of these improvements got into the
official MPlayer tree.

I also considered using this MPlayer fork and building a package based on
it for maemo long ago, but did not find it worth the effort in the end
(but maybe it was a good idea after all).

> > Thanks anyway for the additional details. Gregoire and Kalle may
> > find all this information useful and they are the ones who are
> > *actually* working on "Improved MPlayer: At the performance level of
> > Omapfbplay" and XV for beagleboard.
>
> Why do you emphasise the word "actually"?

You see, Gregoire started this thread here. Probably he considered us both
as experts in the area of video and multimedia for OMAP devices and added
us to CC with the hope that we may add some useful comments.

You are apparently not happy for some reason and try really hard
to "misunderstand" me, though I don't see any contradiction regarding
"delayed subtitles rendering". You know, the core FFmpeg developers
are not the bearers of the sacred knowledge or something. We, mere
mortals, can understand MPlayer and FFmpeg code pretty well too. That
was sarcasm by the way.

Nevertheless, replying to me just to argue or flame is pointless,
because I'm not working on the beagleboard MPlayer or XV myself.

> Are you implying that I ought to be doing more?

Not really.

> Let me remind you that everything I do for FFmpeg or the Beagle board is
> done in my spare time. I have no obligations towards you or anybody else.

So what? Whatever I post or contribute using my private gmail address is also
done in my spare time.

And it's not quite relevant to this discussion, but I dare to remind you that
you actually *do* have some obligations now as the ARM port maintainer of
FFmpeg. That's the responsibility you voluntarily took upon yourself not so
long ago...

> >> > As I mentioned before, this stuff is implemented in MPlayer using
> >> > "direct rendering" method (-dr option), also see [1]. The problem is
> >> > that the last time I checked it (admittedly long ago), direct
> >> > rendering was not working well in MPlayer (including not making use of
> >> > direct rendering for some codec/configuration combinations and
> >> > rendering bugs with subtitles). Theoretically, everything should be
> >> > fixable given enough efforts. But in practice it may be definitely
> >> > more complex than just going with direct framebuffer rendering hacks
> >> > :)
> >>
> >> Am I misreading you, or is the above paragraph saying that XV is
> >> suboptimal because mplayer has bugs when *not* using it?
> >
> > Yes, you are definitely misreading me.
>
> What *are* you saying?

Just try to read the following, paying a bit more attention:

1. MPlayer implements a "direct rendering" method, which is specifically used
to avoid excessive data copies and so improve performance. It is expected to
provide delayed subtitle rendering.
2. The implementation of "direct rendering" in MPlayer is not very good, and
it is not even enabled by default.
3. More specifically, the first problem is that sometimes "direct rendering"
is internally disabled and naturally does not provide the expected speedup
(when used with H264, for example).
4. The other problem is that even when "direct rendering" works, it may
sometimes corrupt subtitles or OSD. I just checked and don't see it anymore in
the latest MPlayer (it was a lot worse 1 or 2 years ago). Anyway, this problem
is still mentioned in the MPlayer man page.
5. It is definitely possible to fix "direct rendering" in MPlayer, but people
seem to prefer direct framebuffer access hacks, as they are much easier
to implement.

Also don't forget the subject of this topic (that's why mplayer bugs are
relevant here). Anything else you want to know?

PS. If you really want to flame, it is better to continue this on IRC :)
