[A20] sunxi framebuffer overlay help needed


Ivan Kozic

Mar 24, 2014, 10:00:39 AM
to linux...@googlegroups.com
Hi all,

So far I have debugged and fixed the CSI issues in the 3.4 kernel so that it works more or less to the sun7i spec (the driver is only sun4i compatible by default - for more advanced features, you'll need some changes in the code). For more info, you can visit:

https://groups.google.com/forum/#!searchin/linux-sunxi/A20$20csi/linux-sunxi/vU5-3Pc3iOs/aVpmpfb1FkAJ

This is all for A20 or sun7i (as I have Olinuxino A20).

Right now I'd need some help regarding overlay framebuffer - my initial idea was to have a full screen video, while having a small functional GUI (more like a widget) on the overlay channel to use for controls (this would be done using Qt).
However, this seems to be much harder than on i.MX6 for instance (I have previous experience with i.MX6), mainly because Freescale is using 2 separate framebuffers for one screen. So to sum up:
1. fb0 is BG (video for instance),
2. fb1 is FG (overlay, ideal for GUI).

When I look at the HW layout (especially page 414 of the A20 user manual), I see that the Allwinner's intent was to make something similar, as DEBE does the mixing of the overlay/background. However, the driver does not seem to have such an option (or I am not familiar with this).

In the end, I can make video show up in Qt - this is not a huge problem (I have taken libv4l2 made for Qt and I get the output). The problem is that it's painfully slow (barely 3 fps), as it doesn't use the HW mixer - it only copies the data from the V4L2 buffers into userland and into a QImage object, which is quite slow.

So the way it would have worked on i.MX6 is that video is simply driven into fb0 via small library using DMA, while Qt would be configured to use fb1 only - I'm after something like this on A20.
Maybe I'm missing something obvious here, but still I couldn't find a good solution up to now.

Of course, if I find something out, I'll post back.

All help greatly appreciated!

Luc Verhaegen

Mar 24, 2014, 10:17:42 AM
to linux...@googlegroups.com
You only need DEFE for your CSI captured information, as that is in some
non RGB colour format.

You can happily attach RGB32 to any layer directly, and achieve your
goal that way.

Luc Verhaegen.


Ivan Kozic

Mar 25, 2014, 4:55:40 AM
to linux...@googlegroups.com
Hi Luc and thanks for replying,

Not sure I follow - I went deeper into the Qt structure yesterday. Basically, Qt uses plain Linux fb access (it opens /dev/fb0 directly), while my current no-GUI application (only used to display video from CSI) uses a more "advanced" way - it opens /dev/disp first and requests a layer from it, then opens /dev/fb just to execute the FBIOGET_LAYER_HDL_0 ioctl and closes it again. Afterwards, the program just runs an endless loop in which buffers from V4L2 exchange addresses with buffers from the display.

To my understanding (I'm a bit fresh with all this), Qt should actually also open /dev/disp and request a GUI layer (think it's called YUV layer in the user manual for A20) for it, while my underlying V4L2 library should do the same, but only requesting video layer instead of a GUI layer. This way, underlying lib would do the video and provide controls, while overlay would be in a different layer providing GUI which is linked with the controls. Is this true?

If so, there is no easy way to do it, as I would have to implement a different display driver for Qt which would use layers instead of stupidly opening /dev/fb0 (this is quite some work), and also update my underlying library to actually use the display, again with layering. Just saying - compared to the Freescale kernel, this is far from a walk in the park. As I said before, Freescale provides a separate /dev/fb for every layer of the screen, which is much easier to work with.

But as I said, I might be completely wrong - what did you have in mind?

Luc Verhaegen

Mar 25, 2014, 10:00:07 AM
to linux...@googlegroups.com
You should use the hw differently, i am not sure whether disp allows
that though.

Just wait until i finally deliver on my KMS driver, i still am too
lethargic atm to make proper progress on it, although i have added some
good lcd code in the last week.

Luc Verhaegen.

Ivan Kozic

Mar 25, 2014, 5:55:22 PM
to linux...@googlegroups.com
I think that maybe it can be done - I have already started modifying the display driver for Qt and had some limited success. It is supposed to be using layers now, but I'm generally having issues with the usage of the disp driver. In other words it doesn't work yet, but I get a proper layer handle, so far so good. The big problem with all this Allwinner stuff is that the ioctls are almost not documented at all and I usually need to go deep into the driver structure to figure out how I should use what.

Forums are also full of unsolved disp issues - I don't think I've seen a single post on how to use GUI layer.

Also display driver is very buggy, so it's not really an easy task.

I'm also not sure if the display driver can be opened multiple times - my whole idea is based on the fact that it can...

What is this about your KMS driver? Not sure I know what the abbreviation stands for.

Ivan Kozic

Mar 29, 2014, 4:58:55 PM
to linux...@googlegroups.com
Just posting to say that it can be done with the disp driver. It is a bit difficult to figure out how to use the ioctls properly, but once you know, it is not hard to actually do it.
However, I'm still facing issues :) Right now I have video overlaid with a Qt window, and even alpha blending is working properly (the pipes need to be set up correctly for the alpha blender to work). The Qt overlay looks very bad, though: it seems the framebuffer is either read or written incorrectly, because when you move the mouse you get something like trails, groups of ghost pixels. Quite ugly actually, and I think it's due to memory allocation in the kernel (same situation with kmalloc). I managed to get rid of it once, but with all the commenting out and still no working version control, I forgot what did it :( There is a bit of a mess with the reserved memory in the disp driver - I will check it on Monday and post back if I find the issue.

Anyway, while it's possible to use it properly, as you can see everything is quite buggy, and really unconventional because of these ioctls. I also took a look at your KMS driver presentation - it looks quite cool and it seems that you've put quite some effort into it. In fact, the main goal for the disp driver should have been a much better structure: it seems that, as with CSI, it was simply copied from sun4i and patched a bit, which made the whole thing bloated and very error-prone. There are so many stupid bugs in this driver, and I'd almost bet that most of them came from copy-pasting the old code...

Ivan Kozic

Mar 31, 2014, 11:33:29 AM
to linux...@googlegroups.com
Hi Luc,

Found out why disp driver has choppy overlay - for me overlay comes through DMA from memory. Funny thing - disp_malloc is fetching cached memory, so choppiness or "trailing" is due to caching framebuffer protected memory.
Very silly - I found this out by changing caching method of ARM from WRITEALLOC to WRITETHROUGH, also waited 10 minutes for system to boot up :)

The matter is solved by adding the following line to disp_mmap() function:

vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

Solved. Just wondering how people were using this before...

Siarhei Siamashka

Mar 31, 2014, 11:53:52 AM
to linux...@googlegroups.com, jimm...@gmail.com
On Mon, 31 Mar 2014 08:33:29 -0700 (PDT)
Ivan Kozic <jimm...@gmail.com> wrote:

> Hi Luc,
>
> Found out why disp driver has choppy overlay - for me overlay comes through
> DMA from memory. Funny thing - disp_malloc is fetching cached memory, so
> choppiness or "trailing" is due to caching framebuffer protected memory.
> Very silly - I found this out by changing caching method of ARM from
> WRITEALLOC to WRITETHROUGH, also waited 10 minutes for system to boot up :)
>
> The matter is solved by adding the following line to disp_mmap() function:
>
> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>
> Solved. Just wondering how people were using this before...

As far as I know, nobody is using these bug ridden memory allocators
that Allwinner has implemented in disp and g2d drivers. Except for
maybe Allwinner itself in their Android code.

--
Best regards,
Siarhei Siamashka

Ivan Kozic

Apr 2, 2014, 6:00:15 AM
to linux...@googlegroups.com, jimm...@gmail.com
Well, as far as I can see, the community is actively working on fixing Sunxi kernel, although it seems that interest in 3.4 kernel is somehow descending.
Anyway, I thought that someone would use overlay from original disp driver, which is why I posted fixes for it - same goes for CSI.

As I said before, even though everything is dirty and buggy, abstraction level is much easier to grasp in contrast with Freescale i.MX series kernels, at least for beginner/intermediate in Linux programming...
And honestly, it seems that A20 is suffering from much less HW bugs than i.MX6, at least as far as I can see.

Siarhei Siamashka

Apr 9, 2014, 7:49:31 PM
to linux...@googlegroups.com, jimm...@gmail.com
On Wed, 2 Apr 2014 03:00:15 -0700 (PDT)
Ivan Kozic <jimm...@gmail.com> wrote:

> Well, as far as I can see, the community is actively working on fixing
> Sunxi kernel, although it seems that interest in 3.4 kernel is somehow
> descending.

The sunxi-3.4 kernel is still used for reverse engineering the
undocumented graphics/multimedia hardware accelerators and
prototyping the drivers for them. But I don't think that sunxi-3.4
will last more than 6-12 months even for this purpose. IMHO the
end of life is very near.

> Anyway, I thought that someone would use overlay from original disp driver,

The disp layers are used by a number of sunxi related projects:
https://github.com/linux-sunxi/libvdpau-sunxi
https://github.com/ssvb/xf86-video-fbturbo
http://linux-sunxi.org/XBMC
http://linux-sunxi.org/VLC

What I said is that disp and g2d memory allocators are not really
used by anyone.

The drivers from Allwinner lack any concept of security and are working
with physical addresses. They accept these physical addresses from the
userspace without any validation checks. Because every driver has full
access to any location in physical memory (except for cedar, which can
only address lowest 256MB of RAM), it is really irrelevant who has
allocated/reserved any particular physically contiguous memory buffer.
That's why xf86-video-fbturbo just uses the offscreen part of the
framebuffer at the moment, and can use G2D to do some operations
within it. And that's why libvdpau-sunxi uses the memory area
allocated/reserved by the cedar driver for doing all video decoding
in it and then just passes physical addresses to the disp driver to
use them as overlays. No unnecessary memory copies are done (with or
without DMA they would be just a waste of time and memory bandwidth).

Anyway, the allocator integrated with the sunxi disp driver is so
broken, that I have no non-swear words to express what I think about it.

Assuming that you really need to allocate physically contiguous
memory buffers for your purposes in sunxi-3.4, it would be probably
better to forget about the half-baked buggy allocators in disp and g2d.
And just implement a physically contiguous memory allocation in a
separate small driver. So that no functionality is copy/pasted around
with all the bugs and ugliness. Perhaps you could have a look at this
'sunxi_mem' thing together with Juan and try to make something usable
out of it?
https://www.mail-archive.com/linux...@googlegroups.com/msg03592.html

> which is why I posted fixes for it - same goes for CSI.

Maybe I have missed something. Where can we find the fixes?

> As I said before, even though everything is dirty and buggy, abstraction
> level is much easier to grasp in contrast with Freescale i.MX series
> kernels, at least for beginner/intermediate in Linux programming...

The extreme simplicity comes from totally ignoring security. This has
both good and bad sides.

> And honestly, it seems that A20 is suffering from much less HW bugs than
> i.MX6, at least as far as I can see.

Well, the kernel drivers got cleaned up a little bit and some bugs got
ironed out. But it is still a mess. Especially in the parts of code,
which are not really used by anyone. As for the HW bugs, I don't think
that we have any public errata documents available for A20. But some
HW related oddities exist.

Ivan Kozic

Apr 11, 2014, 7:49:35 AM
to linux...@googlegroups.com, jimm...@gmail.com
Regarding memory - I'm not much of a Linux developer, more of a HW engineer, so Linux concepts are still new to me. What you say about memory security certainly makes sense - I've already seen that something gets messed up in the disp driver when I modprobe the Mali drivers, and I'm pretty sure it's memory related since everything is shared, as you said. Basically I can't use layers anymore once I load the Mali driver, which is just insane, and I think UMP and DRM have something to do with it.

Regarding CSI and DISP: as I said - not much of a Linux developer so I cannot actively be involved with the Sunxi stuff. I meant more in this passive way - I've made a thread regarding CSI in which I described my fixes and findings and also this thread where I did post how I solved my issue with overlay. If you look at them, I think you will also have only swearwords, as my fixes are pretty dirty - maybe in a year or two I will be able to contribute to the actual kernel development (obviously some other SoC, as A20 will be dead by then), but I feel that now I would only introduce more problems with my dirty contributions :)

It's the same with simplicity - as I'm new to all this, Allwinner's simple drivers are much more suited for me to clean them up compared to Freescale, where everything is so abstract that I needed too much time for some rather simple mods.

Regarding HW - I meant that I'm rather pleasantly surprised with the A20, because i.MX6 was full of HW errata and chip revisions. A major weak point of i.MX6 is also its IPU - it has limitations which render the image pipeline almost useless (for instance, the IPU maxes out at 100 Mpix/s => no Full-HD @ 60 fps possible, only 30 fps). The i.MX6 kernel is full of workarounds for these issues and it's really bloated in some places because of this (IPU split mode for instance - a SW workaround for the 100 Mpix/s limit - causes terrible tearing). In contrast, once something is working on Allwinner, it just works, which I really respect.
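
The arithmetic behind the Full-HD remark can be checked in a few lines of C. This is only a sketch: `pixel_rate` and `fits_budget` are hypothetical helper names, and the 100 Mpix/s budget is the figure quoted in this post, not a value taken from a datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels per second needed to stream width x height frames at fps. */
uint64_t pixel_rate(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Returns 1 if the stream fits into the given budget (pixels per second). */
int fits_budget(uint32_t width, uint32_t height, uint32_t fps, uint64_t budget)
{
    assert(budget > 0);  /* a zero budget would make the check meaningless */
    return pixel_rate(width, height, fps) <= budget;
}
```

With a budget of 100000000, 1920x1080 at 60 fps needs 124,416,000 pixels/s and does not fit, while 30 fps needs 62,208,000 pixels/s and does.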

Ivan Kozic

Apr 15, 2014, 11:58:41 AM
to linux...@googlegroups.com, jimm...@gmail.com
Ok, I need to ask you for some help, as it seems that you've done a lot on Mali. I don't need any code help or similar, just a few sentences will do if you have time :)
Currently I have OpenGL accelerated Qt, by using normal linux-sunxi with r3p0 Mali drivers - I've noticed that you did your own kernel with newer drivers, but I was already using this one and it's modded a lot, so I opted to continue using it.

The real problem is transparency of this OpenGL Mali layer, which I can only achieve by playing with Alpha channel of the /dev/fb0 (via display ioctls), which is really not useful in Qt (but it works). Ideally, global Alpha for Mali layer should be disabled with Mali rendering in 32 bits, so that I could control transparency from Qt, not from the display driver. However, when I disable global alpha for Mali layer and turn on transparency in Qt, everything is just black. If global alpha is set to 0x80, no matter what I do with the fb0 framebuffer, I can see the video underneath and the picture is blackish. Video is going over Pipe 1 of DEBE0, while GUI is using Pipe 0 of DEBE0.

Admittedly, I am not sure whether this blackness comes from Qt or from a bad disp/Mali config. I'm also not really sure how Mali gets access to DEBE and where this is handled, though I think in SW it's done over /dev/fb0. I also suspect that something is odd with this "default" layer, but I've searched the kernel code and could not find anything that really pokes me in the eye.

So basically, just a place to look for would do... Or any info really...
Thanks in advance
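
One plausible reading of the "blackish" picture with global alpha at 0x80 is ordinary alpha blending of a black framebuffer over the video. The sketch below uses the textbook source-over formula for a single 8-bit channel; `blend_channel` is a hypothetical name, and whether DEBE rounds or scales exactly this way is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Source-over blend of one 8-bit colour channel.
 * alpha = 0 shows only the background, alpha = 255 only the foreground.
 * Textbook formula with rounding; the real DEBE arithmetic may differ.
 */
uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t alpha)
{
    uint32_t mixed = (uint32_t)fg * alpha + (uint32_t)bg * (255u - alpha);
    assert(mixed <= 255u * 255u);  /* weights sum to 255, so this always holds */
    return (uint8_t)((mixed + 127u) / 255u);
}
```

With a black overlay pixel (fg = 0) and alpha 0x80, a video value of 200 comes out as 100, roughly half brightness, which would look like the video seen through a dark tint.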

Ivan Kozic

Apr 16, 2014, 10:52:32 AM
to linux...@googlegroups.com, jimm...@gmail.com
Ok, I figured most of it out from the Mali drivers. Still no solution so far however.

Mali is using regular framebuffer by taking fb_start and fb_size vars from the kernel. However, no matter what I do I still cannot make this fb layer transparent - I'm probably missing out something obvious here due to me being still fresh in all this.

What I've tried (no Mali, just trying to make fb layer transparent):

- Init video layer and run video on it (pipe 1, alpha at 0xff, scaler layer),
- Move the video layer to the top,
- Get the FB layer handle to play with (open /dev/fb0 and use FBIOGET_LAYER_HDL_0 to get the layer handle),
- Use DISP_CMD_LAYER_GET_PARA to get the layer parameters,
- Just in case - set fb layer pipe to 0, mode to interleaved, format to ARGB8888, seq to ARGB, alpha_en and alpha_val to 0,
- Use DISP_CMD_LAYER_SET_PARA to set the new layer parameters (DO NOT TOUCH .fb.addr[x]),
- Write 0x00 or 0x80 to the file handle of /dev/fb0 - HxWx4 times,
- Close /dev/fb0,
- Move the fb layer to the top.

Video is still underneath and I cannot see it, unless I move the Video layer up above FB layer. No alpha blending whatsoever.
My framebuffer_fb0_num is set to 4 currently, dunno if that makes any issues.

If someone knows why the fb layer is behaving this way, please share. As I said - to me it seems that I'm missing something quite dumb here.
If I make a new layer and reserve a chunk of memory for it (through disp_malloc), it works as expected, but this fb layer is very weird. Also I have disabled FBCON in kernel, thinking it might be that, but it isn't...

Ivan Kozic

Apr 16, 2014, 11:29:59 AM
to linux...@googlegroups.com, jimm...@gmail.com
Got it! Full alpha blending works now:

1. Video running in SCALER layer under (720p60 scaled to LCD size),
2. Mali test running over (modified so that background opacity is set to 0.1, while triangle is at 0.5).

Works like a charm. I forgot the most important thing (well, in my case):

layer_para_fb.mode = DISP_LAYER_WORK_MODE_NORMAL;

For some reason it was working in SCALER mode (probably because workmode for fb0 is set to scaler in my fex file), which is not possible I guess (although both DEFEs can be routed to a single DEBE - maybe it's a disp limitation, or I haven't configured something right). My intention was not to have SCALER for GUI however, only for video...
This is the second time my own .fex file has played tricks on me. Next time I'll really triple check.
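
For reference, the .fex settings involved might look like the fragment below. The two keys and their values are the ones named later in this thread; the `[disp_init]` section name is an assumption about where they live in a typical sunxi .fex file, so check it against your own board file.

```ini
; Assumed section name, verify against your own .fex file.
[disp_init]
; 10 = ARGB8888 according to this thread, needed for per-pixel alpha
fb0_format = 10
; keep fb0 off the scaler so the layer runs in normal work mode
fb0_scaler_mode_enable = 0
```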

ryangh...@gmail.com

Jun 23, 2014, 2:43:32 AM
to linux...@googlegroups.com, jimm...@gmail.com
Hello Sir.
I have a similar problem, but I'm using the A10 allwinner.
Can you explain how to get a transparent layer above the video (V4L2) layer?

Kind Regards,
Ryang.

Ivan Kozic

Jun 25, 2014, 11:50:27 AM
to linux...@googlegroups.com, jimm...@gmail.com, ryangh...@gmail.com
Hi,

The transparency layer is in fact the fb layer (it's already opened and initialized once you start the system). In contrast, the V4L2 layer is opened and initialized in your application.

In order for everything to work there are a few things you'd need to do (this is for fb0 - for fb1 everything is the same, just with /dev/fb1 and FBIOGET_LAYER_HDL_1):
1. You need to set the fex file not to use scaler - fb0_scaler_mode_enable = 0. Also set fb0_format = 10 (this is ARGB8888 format which you need for alpha blending to work).
2. The fb layer is using PIPE 0 by default, so for alpha blender to work properly, you need to set the V4L2 layer to use PIPE 1.
3. You need to set the global alpha enable for V4L2 layer to 1 and set the global alpha value to 0xff (I did all this with DISP_CMD_LAYER_SET_PARA when opening the layer).
4. Move the V4L2 layer to the top (DISP_CMD_LAYER_TOP).
5. Get the file descriptor for framebuffer (fb_fd = open("/dev/fb0", O_RDWR);) and fill the complete layer with zeros (malloc a WxHx4 memory chunk, memset it to 0x00 and write the chunk to file descriptor).
6. Get the fb layer handle (FBIOGET_LAYER_HDL_0) and move the fb layer to the top (DISP_CMD_LAYER_TOP). Now fb layer is on top with V4L2 layer just under it.
7. Turn off global alpha for fb layer (DISP_CMD_LAYER_ALPHA_OFF).
8. Set the global alpha value for fb layer to 0x00 (DISP_CMD_LAYER_SET_ALPHA_VALUE).

This is basically all you need to do. If testing with Mali, you could run mali_drv/test/test and a triangle should appear just over the video. You can also edit test.c and put opacity to 0.5, making the triangle 50% transparent.
For testing without Mali, you can just allocate a memory chunk (fill it with let's say 0x80 - this is light gray, 50% transparent) and write it to fb layer file descriptor - you will get a light gray overlay over the video.
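
Step 5 and the no-Mali test above both come down to writing W x H x 4 bytes of a single pixel value to the framebuffer file descriptor. A minimal sketch, assuming a 32 bpp ARGB8888 layer; `fill_layer` is a hypothetical helper name, not part of any sunxi API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write width x height copies of one 32-bit pixel to fd.
 * Returns 0 on success, -1 on allocation or write failure.
 */
int fill_layer(int fd, size_t width, size_t height, uint32_t pixel)
{
    assert(fd >= 0);
    size_t count = width * height;
    if (count == 0)
        return 0;

    uint32_t *buf = malloc(count * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        buf[i] = pixel;

    /* write() may be partial, so loop until everything is out. */
    const uint8_t *p = (const uint8_t *)buf;
    size_t left = count * sizeof *buf;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            free(buf);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    free(buf);
    return 0;
}
```

Calling `fill_layer(fb_fd, width, height, 0x00000000)` clears the layer to fully transparent, and `0x80808080` gives the half-transparent gray described above. The width and height would normally come from the standard FBIOGET_VSCREENINFO ioctl (xres, yres).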

bfalz...@gmail.com

Jul 7, 2014, 9:03:04 PM
to linux...@googlegroups.com, jimm...@gmail.com, ryangh...@gmail.com
Hi Ivan,

I'm working with an A20 and trying to get fb1 overlaid on fb0 by following your steps, but I can't get alpha blending to work. Instead, fb0 shows on screen0 and fb1 shows on screen1.
Here are the steps I am following to configure fb1:

1. Framebuffer mode is set to screen1 and set mode to DISP_LAYER_WORK_MODE_NORMAL (I've also tried other fb modes like FB_MODE_DUAL_DIFF_SCREEN_SAME_CONTENTS)
2. Request the framebuffer for screen1 using DISP_CMD_FB_REQUEST
3. Open fb1 for reading/writing fbfd = open("/dev/fb1",O_RDWR);
4. Get the layer id for fb1 ioctl(fbfd, FBIOGET_LAYER_HDL_1, &layer_id);
5. Map the framebuffer device to memory fbp = (char*)mmap(0, finfo.smem_len, PROT_READ | PROT_WRITE, MAP_SHARED, fbfd, 0);
6. For screen1 I turn off global alpha and set global alpha value to 0 using DISP_CMD_LAYER_ALPHA_OFF and DISP_CMD_LAYER_SET_ALPHA_VALUE
7. For screen1 I set the layer to pipe1 using DISP_CMD_LAYER_SET_PIPE
8. For screen 1 I set the layer to top using DISP_CMD_LAYER_TOP
9. I then get the layer parameters to make sure everything looks ok using DISP_CMD_LAYER_GET_PARA
10. Then draw a blue rectangle and a red rectangle on fb1 with alpha set to 127 (half transparency)
11. draw a green rectangle on fb0 (alpha 255) whose edges would overlap the edges of the other rectangles drawn in step 10
12. wait for 10 seconds
13. release the framebuffer fb1 using DISP_CMD_FB_RELEASE before ending program. (if I don't do this I noticed the layer handle keeps incrementing and will eventually crash if the program is run again and saw this documented somewhere as a bug)
Result: If I view screen1 I see the blue and red rectangles but not the green rectangle drawn on fb0. If I view screen0 I can see the green rectangle drawn on fb0 but not the rectangles on fb1

Any suggestions on what I am missing? Is it possible that scaler mode is turned on for fb1? I'm not sure how to check or disable scaler mode.
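
The rectangle drawing in steps 10 and 11 can be sketched as follows for a mapped 32 bpp ARGB8888 framebuffer. `pack_argb` and `draw_rect` are hypothetical helpers; on a real device the stride would be `finfo.line_length` from FBIOGET_FSCREENINFO, and the sketch assumes the layer stores each pixel as a native-endian 0xAARRGGBB word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack one ARGB8888 pixel as a 32-bit word: 0xAARRGGBB. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/*
 * Fill a w x h rectangle at (x, y) in a 32 bpp framebuffer.
 * fb is the start of the mapping, line_length the stride in bytes.
 * No clipping is done; the caller must stay inside the buffer.
 */
void draw_rect(uint8_t *fb, size_t line_length,
               size_t x, size_t y, size_t w, size_t h, uint32_t pixel)
{
    assert(fb != NULL);
    for (size_t row = y; row < y + h; row++) {
        uint8_t *p = fb + row * line_length + x * sizeof pixel;
        for (size_t col = 0; col < w; col++, p += sizeof pixel)
            memcpy(p, &pixel, sizeof pixel);  /* avoids alignment assumptions */
    }
}
```

A half-transparent blue rectangle would then be `draw_rect(fbp, finfo.line_length, x, y, w, h, pack_argb(127, 0, 0, 255))`, with `fbp` cast to `uint8_t *`.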

Best regards,
Bart

Ryang HaeDong

Jul 9, 2014, 11:02:02 PM
to Bart Falzarano, linux...@googlegroups.com, Ivan Kozic
Hi, All

My Qt application is very slow on first startup.
My board is an Allwinner A10 running from NAND flash.
When I first try to start my application (with -qws) the screen remains blank for about 2~3s.
I then briefly get a mouse pointer before my application starts and hides the mouse.
If I quit my application and start it again, it starts instantly (again with mouse pointer).

Any ideas why the first start would take so long?

I also tried with a simple ‘Hello World’ test and it does the same.

I’m using Qt 4.7.1.

Any suggestion?

Thanks a lot!


Julian Calaby

Jul 9, 2014, 11:24:40 PM
to linux...@googlegroups.com, Bart Falzarano, Ivan Kozic
Hi Ryang,
Linux does a lot of behind-the-scenes caching, particularly with data
on block devices (like NAND flash).

If I recall correctly, Qt is fairly large. Could it be that the first
run has to load it from NAND and then it's cached for subsequent runs?

Thanks,

--
Julian Calaby

Email: julian...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/