BUG: no video on teres-i and kernel 6.5 and later

Diego Roversi

Jan 7, 2024, 11:38:57 AM
to linux...@googlegroups.com
Hi,

I've noticed that on the Allwinner A64-based Teres-I laptop, the display no longer works with kernel 6.5 and later. I get this error:

[ 37.727927] [CRTC:49:crtc-0] vblank wait timed out
[ 37.728103] WARNING: CPU: 2 PID: 588 at drivers/gpu/drm/drm_atomic_helper.c:1679 drm_atomic_helper_wait_for_vblanks.part.0+0x24c/0x278 [drm_kms_helper]
[ 37.728161] Modules linked in: qrtr aes_ce_blk aes_ce_cipher polyval_ce sun50i_codec_analog snd_soc_simple_amplifier polyval_generic snd_soc_simple_card snd_soc_simple_card_utils sun4i_i2s ghash_ce sunxi_cedrus(C) sun8i_codec sun8i_adda_pr_regmap uvcvideo snd_soc_core gf128mul videobuf2_dma_contig axp20x_battery axp20x_ac_power sha2_ce axp20x_adc sha256_arm64 videobuf2_vmalloc r8723bs(C) v4l2_mem2mem snd_compress sha1_ce uvc industrialio videobuf2_memops snd_pcm_dmaengine axp20x_pek videobuf2_v4l2 libarc4 snd_pcm videodev sun8i_thermal binfmt_misc cfg80211 snd_timer snd sunxi_wdt rfkill des_generic libdes videobuf2_common mc nvmem_sunxi_sid soundcore sun8i_ce crypto_engine sun6i_dma leds_gpio cpufreq_dt evdev dm_mod fuse loop efi_pstore dax configfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 ax88796b hid_generic usbhid hid asix usbnet selftests phylink mii libphy pinctrl_axp209 axp20x_regulator governor_simpleondemand axp20x_rsb axp20x_i2c lima analogix_anx6345 sun4i_drm analogix_dp
[ 37.730542] drm_display_helper sun8i_mixer gpu_sched crct10dif_ce sun4i_tcon crct10dif_common axp20x fixed drm_shmem_helper sun8i_tcon_top drm_dma_helper ohci_platform ehci_platform ohci_hcd i2c_mv64xxx drm_kms_helper ehci_hcd phy_sun4i_usb drm pwm_sun4i usbcore usb_common sunxi_mmc gpio_keys pwm_bl
[ 37.730654] CPU: 2 PID: 588 Comm: Xorg Tainted: G WC 6.7.0-rc8+ #4
[ 37.730667] Hardware name: Olimex A64 Teres-I (DT)
[ 37.730674] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 37.730685] pc : drm_atomic_helper_wait_for_vblanks.part.0+0x24c/0x278 [drm_kms_helper]
[ 37.730719] lr : drm_atomic_helper_wait_for_vblanks.part.0+0x24c/0x278 [drm_kms_helper]
[ 37.730748] sp : ffff800082b2b8d0
[ 37.730754] x29: ffff800082b2b8d0 x28: 000000000000009d x27: 0000000000000000
[ 37.730770] x26: 0000000000000001 x25: 0000000000000038 x24: 0000000000000000
[ 37.730786] x23: ffff000010bbb800 x22: 0000000000000001 x21: ffff000010bbd080
[ 37.730802] x20: ffff00000ee74600 x19: 0000000000000000 x18: ffffffffffffffff
[ 37.730819] x17: 0000000000001b18 x16: 0000000000002218 x15: ffff800082b2b4e0
[ 37.730835] x14: 0000000000000000 x13: 74756f2064656d69 x12: 742074696177206b
[ 37.730851] x11: 00000000ffffefff x10: ffff800081a7f168 x9 : ffff80008012fd88
[ 37.730867] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
[ 37.730883] x5 : ffff00007db79e08 x4 : 0000000000000000 x3 : 0000000000000027
[ 37.730899] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00000e475140
[ 37.730915] Call trace:
[ 37.730921] drm_atomic_helper_wait_for_vblanks.part.0+0x24c/0x278 [drm_kms_helper]
[ 37.730952] drm_atomic_helper_commit_tail_rpm+0x8c/0xb0 [drm_kms_helper]
[ 37.730980] commit_tail+0xac/0x1a0 [drm_kms_helper]
[ 37.731008] drm_atomic_helper_commit+0x16c/0x188 [drm_kms_helper]
[ 37.731035] drm_atomic_commit+0xb0/0xf0 [drm]
[ 37.731108] drm_atomic_connector_commit_dpms+0xe8/0x118 [drm]
[ 37.731163] drm_mode_obj_set_property_ioctl+0x1c0/0x420 [drm]
[ 37.731219] drm_connector_property_set_ioctl+0x48/0x78 [drm]
[ 37.731274] drm_ioctl_kernel+0xd8/0x190 [drm]
[ 37.731330] drm_ioctl+0x270/0x518 [drm]
[ 37.731386] __arm64_sys_ioctl+0xb4/0x100
[ 37.731405] invoke_syscall+0x78/0x108
[ 37.731421] el0_svc_common.constprop.0+0x48/0xf0
[ 37.731434] do_el0_svc+0x24/0x38
[ 37.731446] el0_svc+0x3c/0x108
[ 37.731458] el0t_64_sync_handler+0x100/0x130
[ 37.731468] el0t_64_sync+0x190/0x198
[ 37.731479] ---[ end trace 0000000000000000 ]---
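When comparing kernels, it can help to check a saved dmesg capture for this specific warning automatically. The following is only a minimal sketch: the function name and the sample text are made up for illustration, and the pattern is derived from the single log line shown above, so other kernels may format the message differently.

```python
import re

# Pattern derived from the log line above, e.g.
# "[ 37.727927] [CRTC:49:crtc-0] vblank wait timed out".
VBLANK_TIMEOUT = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"\[CRTC:(?P<id>\d+):(?P<name>[^\]]+)\] vblank wait timed out"
)


def find_vblank_timeouts(log_text):
    """Return (timestamp, crtc_id, crtc_name) for each vblank timeout line."""
    hits = []
    for line in log_text.splitlines():
        match = VBLANK_TIMEOUT.search(line)
        if match:
            hits.append((float(match["time"]), int(match["id"]), match["name"]))
    return hits


# Two lines copied from the report above, used as sample input.
sample = (
    "[ 37.727927] [CRTC:49:crtc-0] vblank wait timed out\n"
    "[ 37.730667] Hardware name: Olimex A64 Teres-I (DT)\n"
)
print(find_vblank_timeouts(sample))
```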

Kernels up to 6.3 work fine, and kernel 6.4 doesn't even boot; it just panics. Does anybody know whether this is a known problem and whether there is a fix? I'm trying to git-bisect to the exact version, but some kernel versions don't even boot, so I'm pressing on and hoping to find it. But it takes time.

Thanks in advance,
Diego.




--
Diego Roversi <die...@tiscali.it>

Diego Roversi

Feb 5, 2024, 3:52:40 AM
to die...@tiscali.it, linux...@googlegroups.com
Hello,

I've used git bisect, and I found that the problem was introduced by this commit:

commit 1b722407a13b7f8658d2e26917791f32805980a2 (HEAD)
Merge: f8824e151fbf 5ff2977b1976
Author: Linus Torvalds <torv...@linux-foundation.org>
Date: Thu Jun 29 11:00:17 2023 -0700

Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm


But this is a lot of stuff, so I'm not sure what is the real issue.

Do you have any idea on how to find out?

Thanks,

Frank Oltmanns

Feb 5, 2024, 5:51:27 AM
to die...@tiscali.it, linux...@googlegroups.com
Hi Diego,
While I don't know how to continue the bisect for this merge, the
content of the pull request that led to this merge commit is here:
https://lore.kernel.org/all/CAPM=9twGy8jVci0iPwdFUpePVPf...@mail.gmail.com/

One can see that there are three commits that touch sunxi-ng:

Roman Beranek (2):
drm: sun4i: rename sun4i_dotclock to sun4i_tcon_dclk
drm: sun4i: calculate proper DCLK rate for DSI

XuDong Liu (1):
drm: sun4i_tcon: use devm_clk_get_enabled in `sun4i_tcon_init_clocks`

As I wrote in IRC, I highly suspect
4795c78768bcbd58d4ffab650674d314dc6dd77 "drm: sun4i: calculate proper
DCLK rate for DSI" to be the culprit here. To confirm, you could try
reverting it or the other two.

Thank you for digging into this,
Frank

>
>
> --
> Diego Roversi <die...@tiscali.it>

Diego Roversi

Feb 9, 2024, 10:17:19 AM
to fr...@oltmanns.dev, linux...@googlegroups.com
On Mon, 05 Feb 2024 11:17:34 +0100
Frank Oltmanns <fr...@oltmanns.dev> wrote:

> Hi Diego,
>
> On 2024-02-05 at 09:52:33 +0100, Diego Roversi <die...@tiscali.it> wrote:
> > Hello,
> >
> > I've used git bisect, and I found that the problem was introduced by this commit:
> >
> > commit 1b722407a13b7f8658d2e26917791f32805980a2 (HEAD)
> > Merge: f8824e151fbf 5ff2977b1976
> > Author: Linus Torvalds <torv...@linux-foundation.org>
> > Date: Thu Jun 29 11:00:17 2023 -0700
> >
> > Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm
> >
> >
> > But this is a lot of stuff, so I'm not sure what is the real issue.
> >
>
> While I don't know how to continue the bisect for this merge, the
> content of the pull request that led to this merge commit is here:
> https://lore.kernel.org/all/CAPM=9twGy8jVci0iPwdFUpePVPf...@mail.gmail.com/
>
> One can see that there are three commits that touch sunxi-ng:
>
> Roman Beranek (2):
> drm: sun4i: rename sun4i_dotclock to sun4i_tcon_dclk
> drm: sun4i: calculate proper DCLK rate for DSI
>
> XuDong Liu (1):
> drm: sun4i_tcon: use devm_clk_get_enabled in `sun4i_tcon_init_clocks`
>
> As I wrote in IRC, I highly suspect
> 4795c78768bcbd58d4ffab650674d314dc6dd77 "drm: sun4i: calculate proper
> DCLK rate for DSI" to be the culprit here. To confirm, you could try
> reverting it or the other two.
>

Thanks for the advice. I found that the bug was introduced in the pre-6.5 kernel, after rc1, by this patch:

https://lore.kernel.org/all/202305050521...@crly.cz/

For newer kernels, the workaround you suggested, disabling CLK_SET_RATE_PARENT, works fine.

Thanks again,

Frank Oltmanns

Feb 11, 2024, 8:19:58 AM
to Diego Roversi, linux...@googlegroups.com
Hi Diego,
What you are saying means that in 6.5 the Teres-I doesn't work with
pll-mipi, but with 6.6 + disabling CLK_SET_RATE_PARENT it does. That is
quite confusing for me. I should try to get one of those on my desk for
testing.

Thank you for your analysis!

Best regards,
Frank

>
> Thanks again,
> Diego.

Frank Oltmanns

Feb 12, 2024, 8:58:30 AM
to Diego Roversi, linux...@googlegroups.com

Hi Diego,

On 2024-02-11 at 14:19:49 +0100, Frank Oltmanns <fr...@oltmanns.dev> wrote:
> Hi Diego,
>
Ok, as I wrote in IRC this is my current hypothesis:
Probably in pre-6.5 the SoC was able to closely fulfill the panel's
request for a specific rate (because tcon was using pll-video0-2x as
its parent). 6.5 only allowed pll-mipi as the parent and pll-mipi was
probably not able to give the rate as closely as requested by the panel.

In 6.6 I added support for pll-mipi to set a closer rate by
a) overshooting and
b) setting the parent rate.

It seems that allowing it to overshoot fixes the issue on the Teres-I, but
the fact that it can set the parent rate breaks it.
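The two mechanisms, overshooting and changing the parent rate, can be illustrated with a toy rate picker. This is only a sketch of the general idea and not the sunxi-ng implementation: the function name, the parent rates, the divider range and the requested rate are all hypothetical values chosen to keep the arithmetic easy to follow.

```python
def best_rate(requested_hz, parent_rates_hz, dividers, allow_overshoot):
    """Pick the achievable rate closest to the request (toy model)."""
    # Every rate reachable by dividing any permitted parent rate.
    candidates = [parent // div for parent in parent_rates_hz for div in dividers]
    if not allow_overshoot:
        # Without overshoot, only rates at or below the request are acceptable;
        # fall back to the lowest rate if none qualifies.
        at_or_below = [rate for rate in candidates if rate <= requested_hz]
        candidates = at_or_below or [min(candidates)]
    return min(candidates, key=lambda rate: abs(rate - requested_hz))


REQUEST = 72_000_000            # hypothetical panel pixel clock
FIXED_PARENT = [297_000_000]    # hypothetical: parent rate may not change
FLEXIBLE_PARENT = [297_000_000, 288_000_000]  # hypothetical: parent may be re-tuned
DIVIDERS = range(1, 9)          # hypothetical divider range

print(best_rate(REQUEST, FIXED_PARENT, DIVIDERS, allow_overshoot=False))
print(best_rate(REQUEST, FIXED_PARENT, DIVIDERS, allow_overshoot=True))
print(best_rate(REQUEST, FLEXIBLE_PARENT, DIVIDERS, allow_overshoot=False))
```

In this toy model, overshooting gets closer to the request with a fixed parent, and re-tuning the parent hits it exactly. The model does not capture the catch discussed in this thread: a parent rate that looks ideal on paper may be one the hardware cannot actually produce.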

Could you please send the content of
/sys/kernel/debug/clk/clk_summary
for
- 6.4 or earlier,
- 6.5 (assuming you can log in via ssh/serial console)
- 6.6 (same)
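For comparing several dumps, a small helper along these lines could pull one clock's rate out of a saved clk_summary file. It is a sketch under assumptions: the helper name is made up, the sample text is illustrative rather than copied from a real board, and it assumes the rate is the fifth whitespace-separated column, which may differ between kernel versions.

```python
def clock_rate_hz(clk_summary_text, clock_name):
    """Return the rate in Hz of clock_name, or None if it is not listed."""
    for line in clk_summary_text.splitlines():
        fields = line.split()
        # Assumed column order: clock, enable_cnt, prepare_cnt, protect_cnt, rate, ...
        if len(fields) >= 5 and fields[0] == clock_name:
            return int(fields[4])
    return None


# Illustrative excerpt, NOT real output from a Teres-I.
sample = """\
   clock                 enable_cnt  prepare_cnt  protect_cnt        rate   accuracy phase  cycle
 osc24M                           5            5            0    24000000      50000     0  50000
    pll-video0                    1            1            1   294000000          0     0  50000
       pll-mipi                   1            1            0   764400000          0     0  50000
"""

print(clock_rate_hz(sample, "pll-mipi"))
```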

I noticed that the anx6345 is only used in the pinebook and Teres-I
which are both based on the A64.

So, maybe there is something in that driver that fixes the broken
calculation of the tcon-pixel-clock that was fixed in 6.5 as well.

Thank you and best regards,
Frank

Diego Roversi

Feb 13, 2024, 12:59:10 AM
to fr...@oltmanns.dev, linux...@googlegroups.com
On Mon, 12 Feb 2024 14:58:22 +0100
Frank Oltmanns <fr...@oltmanns.dev> wrote:

> Could you please send the content of
> /sys/kernel/debug/clk/clk_summary
> for
> - 6.4 or earlier,
> - 6.5 (assuming you can log in via ssh/serial console)
> - 6.6 (same)

I haven't tested 6.6, only 6.7.2 with and without CLK_SET_RATE_PARENT. I have some other problems with 6.6 kernels at the moment, so I can't check them.

Right now I can give you clk_summary from these kernels:

6.1.0-17-arm64 - standard debian kernel
6.5.13 - compiled from mainline repo
6.7.2 - compiled from mainline repo
6.7.2-dirty - compiled from mainline repo without CLK_SET_RATE_PARENT

I also noticed that the serial port has some issues in 6.5.13, which may be a clock problem too. I need to run more tests, because sometimes the problem seems to persist if I reboot into the 6.1.0 kernel, whereas there is no problem if I power off before booting that same kernel.


Regards,
report.tgz

Frank Oltmanns

Feb 17, 2024, 3:17:50 AM
to Diego Roversi, linux...@googlegroups.com
Hi Diego,

On 2024-02-13 at 06:59:05 +0100, Diego Roversi <die...@tiscali.it> wrote:
> On Mon, 12 Feb 2024 14:58:22 +0100
> Frank Oltmanns <fr...@oltmanns.dev> wrote:
>
>> Could you please send the content of
>> /sys/kernel/debug/clk/clk_summary
>> for
>> - 6.4 or earlier,
>> - 6.5 (assuming you can log in via ssh/serial console)
>> - 6.6 (same)
>
> I haven't tested 6.6, only 6.7.2 with and without CLK_SET_RATE_PARENT. I have some other problems with 6.6 kernels at the moment, so I can't check them.
>
> Right now I can give you clk_summary from these kernels:
>
> 6.1.0-17-arm64 - standard debian kernel
> 6.5.13 - compiled from mainline repo
> 6.7.2 - compiled from mainline repo
> 6.7.2-dirty - compiled from mainline repo without CLK_SET_RATE_PARENT

Thank you so much for the data. It shows that PLL-MIPI is set to the
following rates:
- 6.5.13: 9.702 GHz
- 6.7.2: 7.7184 GHz
- 6.7.2-dirty: 764.4 MHz

So the kernel tries to set a rate that PLL-MIPI does not support. Adding
an upper (and while we're at it also lower) limit should be the real fix
here. That the removal of the CLK_SET_RATE_PARENT flag fixes the issue
for you is just a lucky coincidence.
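The idea of bounding the rate can be sketched as a simple range check. The three rates are the ones listed above; the minimum and maximum are illustrative assumptions, not values taken from this thread or from the patch series, and the helper names are hypothetical.

```python
# ASSUMED limits for illustration only; the real bounds belong in the clock driver.
PLL_MIPI_MIN_HZ = 500_000_000
PLL_MIPI_MAX_HZ = 1_400_000_000

# PLL-MIPI rates reported above for each tested kernel.
observed_hz = {
    "6.5.13": 9_702_000_000,
    "6.7.2": 7_718_400_000,
    "6.7.2-dirty": 764_400_000,
}


def in_range(rate_hz, low=PLL_MIPI_MIN_HZ, high=PLL_MIPI_MAX_HZ):
    """True if the rate lies within the assumed supported window."""
    return low <= rate_hz <= high


def clamp_rate(rate_hz, low=PLL_MIPI_MIN_HZ, high=PLL_MIPI_MAX_HZ):
    """Limit a requested rate to the assumed supported window."""
    return max(low, min(rate_hz, high))


for kernel, rate in observed_hz.items():
    status = "ok" if in_range(rate) else "out of range"
    print(f"{kernel}: {rate / 1e6:.1f} MHz ({status})")
```

Under these assumed bounds, only the rate from the kernel without CLK_SET_RATE_PARENT would fall inside the window, which is consistent with that kernel being the one that works.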

Fortunately, I already have a patch series [1] in the works, that also
covers this part (thanks to Jernej's persistence and foresight).

The relevant patches are PATCH 3 and 4, but they do not apply cleanly
without patch 1 and 2 (but those have already been reviewed by Jernej).
There are some open remarks about the patches 3 and 4, but those are
more of an architectural nature and shouldn't affect the functionality.

Therefore, could you please try to apply patches 1 through 4 to an
otherwise clean 6.7 kernel to see if the TERES-I works then. (Actually,
you could apply the whole series, because PATCH 5 is pinephone-specific
and PATCH 6 touches the GPU and is unrelated to your specific issue.)

I'd very much appreciate if you could report back here with your results
including the content of /sys/kernel/debug/clk/clk_summary.

[1]: https://lore.kernel.org/all/20240205-pinephone-pll-...@oltmanns.dev/

Cheers,
Frank

>
> I also noticed that the serial port has some issues in 6.5.13, which may be a
> clock problem too. I need to run more tests, because sometimes the problem seems
> to persist if I reboot into the 6.1.0 kernel, whereas there is no problem if I
> power off before booting that same kernel.

Unfortunately, I have no explanation for that.

>
>
> Regards,
> Diego.

Diego Roversi

Feb 18, 2024, 5:08:18 AM
to Frank Oltmanns, linux...@googlegroups.com
Hi Frank,

On Sat, 17 Feb 2024 09:17:36 +0100
Frank Oltmanns <fr...@oltmanns.dev> wrote:

>
> Fortunately, I already have a patch series [1] in the works, that also
> covers this part (thanks to Jernej's persistence and foresight).
>
> The relevant patches are PATCH 3 and 4, but they do not apply cleanly
> without patch 1 and 2 (but those have already been reviewed by Jernej).
> There are some open remarks about the patches 3 and 4, but those are
> more of an architectural nature and shouldn't affect the functionality.
>
> Therefore, could you please try to apply patches 1 through 4 to an
> otherwise clean 6.7 kernel to see if the TERES-I works then. (Actually,
> you could apply the whole series, because PATCH 5 is pinephone-specific
> and PATCH 6 touches the GPU and is unrelated to your specific issue.)

I've applied patches 1-4 on top of a clean mainline 6.7.5 kernel, and indeed they fix the issue.

>
> I'd very much appreciate if you could report back here with your results
> including the content of /sys/kernel/debug/clk/clk_summary.
>

I've attached the output with the new patched kernel.

Thanks,
report_6.7.5-dirty.txt