[PATCH] drm/vkms: add missing drm_crtc_vblank_put to the get/put pair on flush

Melissa Wen

Jul 22, 2020, 7:04:22 AM
to Rodrigo Siqueira, Haneen Mohammed, Daniel Vetter, David Airlie, Rodrigo Siqueira, dri-...@lists.freedesktop.org, linux-...@vger.kernel.org, kerne...@googlegroups.com, twoe...@gmail.com
This patch adds a missing drm_crtc_vblank_put op to the pair
drm_crtc_vblank_get/put (inc/decrement counter to guarantee vblanks).

It clears the execution of the following kms_cursor_crc subtests:
1. pipe-A-cursor-[size, alpha-opaque, NxN-(on-screen, off-screen, sliding,
random, fast-moving)] - successful when running individually.
2. pipe-A-cursor-dpms passes again
3. pipe-A-cursor-suspend also passes

The issue was initially tracked in the sequential execution of the IGT
kms_cursor_crc subtests: when running the test sequence or one of its
subtests twice, the odd-numbered runs complete and the even-numbered ones
get stuck in an endless wait. In the IGT code, calling a wait_for_vblank
before the start of CRC capture prevented the busy-wait. But the problem
persisted in the pipe-A-cursor-dpms and -suspend subtests.

Checking the history, the pipe-A-cursor-dpms subtest was successful when,
in vkms_atomic_commit_tail, instead of using the flip_done op, it used
wait_for_vblanks. Another way to prevent blocking was wait_one_vblank when
enabling crtc. However, in both cases, pipe-A-cursor-suspend persisted
blocking in the 2nd start of CRC capture, which may indicate that
something got stuck in the step of CRC setup. Indeed, wait_one_vblank in
the crc setup was able to sync things and free all kms_cursor_crc
subtests.

Tracing and comparing a clean run with a blocked one:
- in a clean one, vkms_crtc_atomic_flush enables vblanks;
- when blocked, the vblanks only start in the next op,
vkms_crtc_atomic_enable. Moreover, a series of vkms_vblank_simulate calls
flows out until vblanks are disabled.
Also watching the steps of vkms_crtc_atomic_flush, when the very first
drm_crtc_vblank_get returned an error, the subtest crashed. On the other
hand, when vblank_get succeeded, the subtest completed. Finally, checking
the flush steps: it increases the counter to hold a vblank reference (get),
but there isn't an op to decrease it and release vblanks (put).

Cc: Daniel Vetter <dan...@ffwll.ch>
Cc: Rodrigo Siqueira <rodrigosi...@gmail.com>
Cc: Haneen Mohammed <hamoha...@gmail.com>
Signed-off-by: Melissa Wen <melis...@gmail.com>
---
drivers/gpu/drm/vkms/vkms_crtc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index ac85e17428f8..a99d6b4a92dd 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -246,6 +246,7 @@ static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,

spin_unlock(&crtc->dev->event_lock);

+ drm_crtc_vblank_put(crtc);
crtc->state->event = NULL;
}

--
2.27.0

dan...@ffwll.ch

Jul 22, 2020, 8:05:07 AM
to Rodrigo Siqueira, Haneen Mohammed, Daniel Vetter, David Airlie, Rodrigo Siqueira, dri-...@lists.freedesktop.org, linux-...@vger.kernel.org, kerne...@googlegroups.com, twoe...@gmail.com
Uh so I reviewed this a bit more carefully now, and I don't think this is
the correct bugfix. From the kerneldoc of drm_crtc_arm_vblank_event():

* Caller must hold a vblank reference for the event @e acquired by a
* drm_crtc_vblank_get(), which will be dropped when the next vblank arrives.

So when we call drm_crtc_arm_vblank_event then the vblank_put gets called
for us. And that's the only case where we successfully acquired a vblank
interrupt reference since on failure of drm_crtc_vblank_get (0 indicates
success for that function, failure negative error number) we directly send
out the event.

So something else fishy is going on, and now I'm totally confused why this
even happens.

We also have a pile of WARN_ON checks in drm_crtc_vblank_put to make sure
we don't underflow the refcount, so it's also not that I think (except if
this patch creates more WARNING backtraces).
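The reference handoff described above can be sketched as a small userspace model in C. It is not kernel code: `struct vblank_model`, the `blocked` flag and every `model_*` function are hypothetical names for illustration, and the real helpers take locks and handle more cases. The sketch only shows that a successful get is paired with the put done when the armed event fires, and that the failure path takes no reference at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified userspace model; not the kernel's data structures. */
struct vblank_model {
    int refcount;      /* vblank references currently held */
    bool blocked;      /* assumption: stands for "vblank cannot be enabled now" */
    bool event_armed;  /* an event is waiting for the next vblank */
    bool event_sent;   /* the event was delivered to userspace */
};

/* Models drm_crtc_vblank_get(): 0 on success, negative errno on failure. */
int model_vblank_get(struct vblank_model *v)
{
    if (v->blocked)
        return -EINVAL;  /* no reference is taken on failure */
    v->refcount++;
    return 0;
}

/* Models the event handling in atomic_flush as described in the thread. */
void model_atomic_flush(struct vblank_model *v)
{
    if (model_vblank_get(v) != 0)
        v->event_sent = true;   /* failure: send the event directly */
    else
        v->event_armed = true;  /* success: the armed event owns the reference */
}

/* Models the next vblank: deliver the armed event and drop its reference. */
void model_handle_vblank(struct vblank_model *v)
{
    if (!v->event_armed)
        return;
    v->event_armed = false;
    v->event_sent = true;
    assert(v->refcount > 0);  /* the model treats an underflow as a bug */
    v->refcount--;
}
```

In this model there is no path through the flush that leaves a reference behind, which is the point being made about the extra put.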

But clearly it changes behaviour somehow ... can you try to figure out
what changes? Maybe print out the vblank->refcount at various points in
the driver, and maybe also trace when exactly the fake vkms vblank hrtimer
is enabled/disabled ...

I'm totally confused about what's going on here now.
-Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Melissa Wen

Jul 22, 2020, 10:06:13 AM
to Daniel Vetter, Rodrigo Siqueira, Haneen Mohammed, David Airlie, Rodrigo Siqueira, dri-...@lists.freedesktop.org, linux-...@vger.kernel.org, kerne...@googlegroups.com, twoe...@gmail.com
:(

I can check these, but I also have other suspicions. When I place the
drm_crtc_vblank_put outside the if (at the end of flush), it not only solves
the issue of blocking on kms_cursor_crc, but the WARN_ON on kms_flip also
doesn't appear anymore (a total cleanup). Just after:

vkms_output->composer_state = to_vkms_crtc_state(crtc->state);

looks like there is something stuck around here.

Besides, there is a lock at atomic_begin:

/* This lock is held across the atomic commit to block vblank timer
* from scheduling vkms_composer_worker until the composer is updated
*/
spin_lock_irq(&vkms_output->lock);

that seems to be released on atomic_flush and makes me suspect something
is missing in the composer update.

I'll check all these things and come back with news (hope) :)

Thanks,

Melissa

Daniel Vetter

Jul 22, 2020, 11:17:18 AM
to Melissa Wen, Rodrigo Siqueira, Haneen Mohammed, David Airlie, Rodrigo Siqueira, dri-devel, Linux Kernel Mailing List, kerne...@googlegroups.com, Trevor Woerner
Hm do you have the full WARNING for this? Maybe this gives me an idea
what's going wrong.

> Besides, there is a lock at atomic_begin:
>
> /* This lock is held across the atomic commit to block vblank timer
> * from scheduling vkms_composer_worker until the composer is updated
> */
> spin_lock_irq(&vkms_output->lock);
>
> that seems to be released on atomic_flush and make me suspect something
> missing on the composer update.

atomic_begin/atomic_flush are symmetric functions and always called
around all the plane updates. So having the spin_lock in _begin and
the spin_unlock in _flush should be symmetric and correct.

If you want to make sure, recompile with CONFIG_PROVE_LOCKING, which
should immediately give you a huge splat in dmesg if there's anything
unbalanced with locking.

> I'll check all these things and come back with news (hope) :)

Have fun chasing stuff :-)

Cheers, Daniel

Sidong Yang

Jul 24, 2020, 11:12:52 PM
to Daniel Vetter, Melissa Wen, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hi Daniel, Melissa.
I found something about this problem.
I traced vblank->refcount, which turns out to be important here.
In the normal case, the first test run calls commit_tail() and enables vblank
in atomic_flush(): drm_vblank_get() enables vblank when the refcount was zero.

At the end of the first test run, the crtc is disabled for test cleanup, and
atomic_disable calls drm_crtc_vblank_off(). In that function the vblank
refcount is increased to prevent a subsequent drm_vblank_get() from
re-enabling the vblank interrupt, so the refcount is one, not zero, going into
the next test run.

In the next test run, drm_vblank_get() is called but does not enable vblank
because the refcount is already one. drm_crtc_vblank_on() is called next, but
it does not enable vblank either; it only increases the refcount.

I think this is why the problem happens. I don't know how to fix it correctly.
Should we force vblank to be enabled after enabling the crtc?

Thanks
-Sidong

Daniel Vetter

Jul 25, 2020, 11:58:09 AM
to Sidong Yang, Melissa Wen, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hm, between drm_crtc_vblank_off and drm_crtc_vblank_on
drm_crtc_vblank_get should fail (and leave the refcount unchanged).
It's convoluted logic, but the check for vblank->enabled should catch
that and return -EINVAL for this case. Does that not happen?

It would indeed explain the bug (I think, I've been wrong way too many
times with this).
-Daniel

Melissa Wen

Jul 25, 2020, 1:45:24 PM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hi Daniel and Sidong,

I don't know if it will be confusing, but I will try to explain in a
little more detail (and newbie way) what I saw in this behavior of the
refcount (similar to what Sidong evaluated).

1. Starting with the loading of vkms is:
In vkms_init:
After drm_vblank_init (refcount=0), it calls:
vkms_modeset_init
--> vkms_output_init
----> drm_mode_config_reset
-------> vkms_atomic_crtc_reset
(even more inside)--> drm_crtc_vblank_reset that bumps the refcount to
prevent vblank_get to enable vblank (refcount=1)

2. So, when we start a subtest, vblank is still disabled and in
commit_tail, commit_planes triggers an atomic_begin/flush->vblank_get that
returns -EINVAL because !vblank->enabled (refcount ends at 1) and the event
is sent directly (send_vblank); however, the test fails before atomic_enable
decrements refcount to 0 and resets the timestamp.
** This warning also appears in this very first running:
WARNING: CPU: 0 PID: 708 at drivers/gpu/drm/vkms/vkms_crtc.c:91 vkms_get_vblank_timestamp+0x41/0x50 [vkms]

In the end, this sequence modeset_disable -> atomic_begin ->
atomic_flush, with refcount going from 0 to 1 and then drm_vblank_enable,
prepares everything to go well in the next subtest (because
atomic_disable is not called).

3. It could be nice, but in the next subtest (which succeeds), as refcount
and vblank_enabled are ok, after doing its job, it calls
atomic_disable->vblank_off, and here refcount ends at 1 with vblank
disabled (the problem returns).
So, we have a kind of good turn and bad turn.

I tried different things, but the only relatively stable result was
putting the sequence modeset_disable + modeset_enables + commit_planes in
the commit_tail. That didn't convince me and then I keep trying things.
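The good-turn/bad-turn sequence in steps 1 to 3 can be replayed with a small userspace model in C. This is a sketch under assumptions, not the DRM core: all `model_*` names and `struct vblank_model` are hypothetical, locking and the delayed-disable timer are left out, and the refcount rules follow what is described in this thread (reset and vblank_off bump the refcount once, vblank_on drops that bump, vblank_get only enables on a 0 -> 1 transition).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified userspace model of the vblank refcount; not kernel code. */
struct vblank_model {
    int refcount;
    bool enabled;     /* vblank (the hrtimer in vkms) is running */
    bool inmodeset;   /* refcount was bumped by reset/off */
    int offdelay_ms;  /* stands in for drm_vblank_offdelay */
};

/* Shared by reset and off: stop vblank and bump the refcount once. */
static void model_block(struct vblank_model *v)
{
    v->enabled = false;
    if (!v->inmodeset) {
        v->refcount++;
        v->inmodeset = true;
    }
}

void model_vblank_reset(struct vblank_model *v) { model_block(v); }
void model_vblank_off(struct vblank_model *v)   { model_block(v); }

void model_vblank_on(struct vblank_model *v)
{
    if (v->inmodeset) {
        v->refcount--;
        v->inmodeset = false;
    }
    /* Same condition as the snippet quoted later in the thread. */
    if (v->refcount != 0 || v->offdelay_ms == 0)
        v->enabled = true;
}

int model_vblank_get(struct vblank_model *v)
{
    if (++v->refcount == 1) {
        v->enabled = true;  /* 0 -> 1: vblank has to be enabled */
        return 0;
    }
    if (!v->enabled) {
        v->refcount--;      /* failure leaves the refcount unchanged */
        return -EINVAL;
    }
    return 0;
}

void model_vblank_put(struct vblank_model *v)
{
    assert(v->refcount > 0);
    v->refcount--;          /* delayed disable is not modeled here */
}
```

Replaying load, first flush, atomic_enable, next flush, and atomic_disable against this model gives -EINVAL, then success, then -EINVAL again, which matches the alternating behavior reported above.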

Daniel Vetter

Jul 25, 2020, 2:12:31 PM
to Melissa Wen, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hm yeah I guess that's something we should paper over a bit, but maybe
the bugfix will take care of that.

> In the end, this sequence modeset_disable -> atomic_begin ->
> atomic_flush: refcount going from 0 to 1 and than drm_vblank_enable
> prepares to everything going well in the next subtest (because
> atomic_disable is not called).
>
> 3. It could be nice, but in the next subtest (with success), as refcount +
> vblank_enabled ok, after doind its job, it calls
> atomic_disable->vblank_off and here refcount ends 1 and vblank disabled
> (the problem returns).
> So, we have a kind of good turn and bad turn.
>
> I tried different things, but the only relatively stable result was
> putting the sequence modeset_disable + modeset_enables + commit_planes in
> the commit_tail. That didn't convince me and then I keep trying things.

This actually sounds like a good idea, I had the same one. Doing it
this way should also resolve the WARNING you've pointed out I think?

But I'm still wondering why after step 3 we don't get -EINVAL from
vblank_get() - after vblank_off() vblank->enabled should be false
again, getting us back to the same state as after 1. Is that not
happening?
-Daniel

Melissa Wen

Jul 25, 2020, 2:49:27 PM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
No, this very first warning continues (only once) :(
From here (drm_crtc_vblank_on):
if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
drm_WARN_ON(dev, drm_vblank_enable(dev, pipe));

> But I'm still wondering why after step 3 we don't get -EINVAL from
> vblank_get() - after vblank_off() vblank->enabled should be false
> again, getting us back to the same state as after 1. Is that not
> happening?

Yes (sorry if it was confusing), we get -EINVAL after step 3:

In step 3, at the end of the 2nd running, we have:
atomic_disable
--> vblank_off [!vblank->inmodeset + refcount going 0->1 + inmodeset=1]
and then in next vblank_get: -EINVAL (!vblank->enabled + refcount ends 1)
as in the first step.

Melissa

Melissa Wen

Jul 25, 2020, 3:19:58 PM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
> No, this very first warning continues (only once) :(
> From here (drm_crtc_vblank_on):
> if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
> drm_WARN_ON(dev, drm_vblank_enable(dev, pipe));

Sorry, not sure when this warning is triggered.

Melissa Wen

Jul 25, 2020, 3:29:47 PM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
On Sat, Jul 25, 2020 at 4:19 PM Melissa Wen <melis...@gmail.com> wrote:
>
> > No, this very first warning continues (only once) :(
> > From here (drm_crtc_vblank_on):
> > if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
> > drm_WARN_ON(dev, drm_vblank_enable(dev, pipe));
>
> Sorry, not sure when this warning is triggered.

Again, I just had to look at the trace:
[ 52.299388] drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
[ 52.299389] drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
[ 52.299389] drm_crtc_vblank_on.cold+0x37/0x103 [drm]
[ 52.299390] drm_atomic_helper_commit_modeset_enable

Daniel Vetter

Jul 26, 2020, 6:26:21 AM
to Melissa Wen, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
On Sat, Jul 25, 2020 at 9:29 PM Melissa Wen <melis...@gmail.com> wrote:
>
> On Sat, Jul 25, 2020 at 4:19 PM Melissa Wen <melis...@gmail.com> wrote:
> >
> > > No, this very first warning continues (only once) :(
> > > From here (drm_crtc_vblank_on):
> > > if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
> > > drm_WARN_ON(dev, drm_vblank_enable(dev, pipe));
> >
> > Sorry, not sure when this warning is triggered.
>
> Again, I just had to look at the trace:
> [ 52.299388] drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
> [ 52.299389] drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
> [ 52.299389] drm_crtc_vblank_on.cold+0x37/0x103 [drm]
> [ 52.299390] drm_atomic_helper_commit_modeset_enable

Yeah I think vkms can't generate a reasonable timestamp when the
hrtimer is off. I thought the warning comes from a different
callchain, but seems to be a general problem.

I guess in the vkms timestamp function we should check whether the
timer is running, and if it's not running, then we just grab the
current time and done.
-Daniel

Sidong Yang

Jul 28, 2020, 12:16:44 PM
to Daniel Vetter, Melissa Wen, Melissa Wen, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
On Sun, Jul 26, 2020 at 12:26:08PM +0200, Daniel Vetter wrote:
> On Sat, Jul 25, 2020 at 9:29 PM Melissa Wen <melis...@gmail.com> wrote:
> >
> > On Sat, Jul 25, 2020 at 4:19 PM Melissa Wen <melis...@gmail.com> wrote:
> > >
> > > > No, this very first warning continues (only once) :(
> > > > From here (drm_crtc_vblank_on):
> > > > if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
> > > > drm_WARN_ON(dev, drm_vblank_enable(dev, pipe));
> > >
> > > Sorry, not sure when this warning is triggered.
> >
> > Again, I just had to look at the trace:
> > [ 52.299388] drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
> > [ 52.299389] drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
> > [ 52.299389] drm_crtc_vblank_on.cold+0x37/0x103 [drm]
> > [ 52.299390] drm_atomic_helper_commit_modeset_enable
>
> Yeah I think vkms can't generate a reasonable timestamp when the
> hrtimer is off. I thought the warning comes from a different
> callchain, but seems to be a general problem.
>
> I guess in the vkms timestamp function we should check whether the
> timer is running, and if it's not running, then we just grab the
> current time and done.

I ran some tests on the scenario where commit_tail calls, in sequence, disable
- enable - commit.
In the first test run there was a warning, and I found that it was raised from
vkms_get_vblank_timestamp(), in the code that compares vblank_hrtimer's expire
time with vblank_time. In the first run, vblank_time and the hrtimer's expire
time were both zero because no vblank had happened yet. The warning did not
appear from the second run on, since the vblank time had been set by the first
run.

I don't know if this is a good way to solve the problem. Is there no such
problem in other drm modules?

-Sidong

dan...@ffwll.ch

Jul 28, 2020, 5:55:18 PM
to Daniel Vetter, Melissa Wen, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Generally real hw drivers always have working clocks, not like the fake
ones we have here :-) The idea behind the timestamp callback is that when
vblank interrupts aren't enabled, the timestamp will help us keep track of
how many vblanks have happened.

So I think (but might be wrong) the correct fix for this issue would be to
check whether vblanks are enabled, and if not, simply pass back the
current system time. That's a lie, but much better than whatever value was
set the last time the hrtimer fired, e.g. a similar problem can happen
later on when the vblank interrupt was off for a very long time.
-Daniel
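A minimal sketch of that fallback, written as a userspace C model rather than the vkms callback itself. `struct timer_model` and `model_vblank_timestamp` are hypothetical names, the current time is passed in as a parameter to keep the example deterministic, and the "next expiry minus one period" rule for the running-timer case is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the fake vblank timer; not the vkms structures. */
struct timer_model {
    bool vblank_enabled;     /* is the simulated vblank timer running? */
    int64_t next_expiry_ns;  /* when the timer is due to fire next */
    int64_t period_ns;       /* frame period */
};

/*
 * Returns the timestamp to report for the last vblank.
 * now_ns stands in for reading the system clock.
 */
int64_t model_vblank_timestamp(const struct timer_model *t, int64_t now_ns)
{
    if (!t->vblank_enabled)
        return now_ns;  /* timer off: its expiry value is stale */

    assert(t->period_ns > 0);
    /* Assumption: the last vblank happened one period before the next expiry. */
    return t->next_expiry_ns - t->period_ns;
}
```

With the timer off, the model never looks at the stale expiry value, so a timer that has not fired yet (expiry still zero) no longer leaks into the reported timestamp.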

Melissa Wen

Jul 29, 2020, 3:09:17 PM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hi Daniel,

My uncertainty about this idea was related to one subtest, cursor-suspend.
Although the reordering solves most of the blocking in kms_cursor_crc, the
suspend subtest fails because when vkms suspends, it disables vblank,
and when it resumes, vblank is not enabled in time. In this subtest, there is
a pipe-crc-start, and adding an igt_wait_for_vblank solves it... but again,
I know it is not the real fix.

Would it make sense to develop a specific suspend/resume feature in vkms?
I mean, something to enable vblank on resume. I am trying to figure out how
to develop it, but still without success.

Melissa

Daniel Vetter

Jul 29, 2020, 5:48:14 PM
to Melissa Wen, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hm since it's all software I expected that the hrtimer will simply
continue to run as if nothing happened. For real hw we'd need to use
drm_mode_config_helper_suspend/resume, but for vkms I don't think
that's required. Is the vblank hrtimer not working after resume? Or is
it simply reporting a garbage timestamp and that's why the testcase
fails?

Not sure how to wire it up for fake drivers like vkms, but maybe doing
the suspend/resume like for real drivers helps. I think ideally we'd
try to attach a platform driver to our platform device we create (but
not sure how to do that).
-Daniel

Melissa Wen

Jul 30, 2020, 6:09:35 AM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, twoe...@gmail.com, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
The testcase fails with the same timeout while waiting for the first CRC
(with the change in the commit_tail sequence already applied):

(kms_cursor_crc:732) igt_aux-DEBUG: Test requirement passed: (power_dir = open("/sys/power", O_RDONLY)) >= 0
(kms_cursor_crc:732) igt_aux-DEBUG: Test requirement passed: get_supported_suspend_states(power_dir) & (1 << state)
(kms_cursor_crc:732) igt_aux-DEBUG: Test requirement passed: test == SUSPEND_TEST_NONE || faccessat(power_dir, "pm_test", R_OK | W_OK, 0) == 0
(kms_cursor_crc:732) igt_aux-DEBUG: Test requirement passed: !(state == SUSPEND_STATE_DISK && !intel_get_total_swap_mb())
(kms_cursor_crc:732) igt_aux-DEBUG: Test requirement passed: ret == 0
(kms_cursor_crc:732) igt_core-INFO: [cmd] rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jul 30 09:23:59 2020
(kms_cursor_crc:732) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(kms_cursor_crc:732) igt_core-INFO: Timed out: Opening crc fd, and poll for first CRC.

What I could check was that, on suspend, vblanks are disabled (calling
vkms_disable_vblank), and on resume, the testcase fails, and only after
the failure are vblanks enabled (vkms_enable_vblank) and the hrtimer
initialized and started (hrtimer_init/start).

If I "force" enabling vblanks via the testcase (adding an
igt_wait_for_vblank before igt_pipe_crc_start), things work fine.
This is why I thought about anticipating the restart of activities by
placing a vblank "wakeup" in a resume function. Or perhaps preventing vblank
from being disabled on suspend, since that last attempt to place a
vblank_put at the end of the flush made this test case (suspend) work,
because it prevented the vblank from being disabled.

This failure in the suspend subtest is so close to the previous ones that I
was unsure whether my attempted fix of reordering commit_tail would be enough.
But maybe they are different situations that deserve different treatment.
Do you think restarting vblank on resume can make sense for vkms?

Melissa

dan...@ffwll.ch

Jul 31, 2020, 5:08:39 AM
to Daniel Vetter, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, twoe...@gmail.com, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hm, what is disabling the vblank there? Can you grab a full backtrace for
that? I have no idea why that's even happening ...

> If I "force" enabling vblanks via testcase (adding a
> igt_wait_for_vblank before igt_pipe_crc_start), things work fine.
> This is why I thought about anticipating the restarting of activities by
> placing a vblank "wakeup" in a resume function. Or perhaps prevent vblank
> from being disabled when suspended, since that last attempt to place a
> vblank_put at the end of the flush made this test case (suspend) work,
> because it prevented the vblank from being disabled.
>
> This failure in suspend subtest is so closer to the previous ones, that I
> was unsure if my attempt in fix by reordering commit_tail would be enough.
> But maybe they are different situations that deserve different treats.
> Do you think restarting a vblank in resume can make sense for vkms?

tbh I'm just really confused what's going on :-/
-Daniel

Sidong Yang

Jul 31, 2020, 12:13:26 PM
to Melissa Wen, Daniel Vetter, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, twoe...@gmail.com, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hi Daniel, Melissa.
I reproduced the situation Melissa described and thought about what is going on.
What makes pipe-A-cursor-suspend different from size-change is that it issues a
suspend command just before disabling the cursor in the test. Usually vblank is
enabled by drm_vblank_get() and handled by the vblank interrupt
(vkms_vblank_simulate in vkms). When drm_crtc_handle_vblank() runs,
drm_vblank_put() is called and it arms a disable timer. By default that timer
disables vblank 5 seconds later (drm_vblank_offdelay). That is enough time to
run the crc command in the simple size-change test. But in the suspend case, if
suspend and wakeup take more than 5 seconds, vblank gets disabled and the crc
command fails while polling the crc file. The test passes if I make my
environment wake up earlier.
In the same way, if code in the igt test delays the crc command, the same
problem occurs even in the simple size-change test.

-Sidong
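The timing argument above can be illustrated with a small userspace model in C. It is a sketch under assumptions: `struct vblank_model` and the `model_*` functions are hypothetical, time is an explicit millisecond counter instead of a kernel timer, and only a positive or zero off-delay is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the delayed vblank disable; not kernel code. */
struct vblank_model {
    int refcount;
    bool enabled;           /* vblank is running, so CRCs can be produced */
    bool disable_pending;   /* the disable timer is armed */
    int64_t disable_at_ms;  /* when the armed timer fires */
    int64_t offdelay_ms;    /* stands in for drm_vblank_offdelay; 0 = never */
};

/* Dropping the last reference arms the disable timer. */
void model_vblank_put(struct vblank_model *v, int64_t now_ms)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0 && v->offdelay_ms > 0) {
        v->disable_pending = true;
        v->disable_at_ms = now_ms + v->offdelay_ms;
    }
}

/* Advance the model clock and run the disable timer if it has expired. */
void model_advance(struct vblank_model *v, int64_t now_ms)
{
    if (v->disable_pending && now_ms >= v->disable_at_ms) {
        v->disable_pending = false;
        if (v->refcount == 0)
            v->enabled = false;
    }
}

/* In vkms the CRC is only computed while the simulated vblank is running. */
bool model_crc_available(const struct vblank_model *v)
{
    return v->enabled;
}
```

A CRC read one second after the last put still succeeds in this model, while one issued after a six-second suspend/resume cycle does not, unless the off-delay is zero.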

Melissa Wen

Jul 31, 2020, 12:47:16 PM
to Sidong Yang, Daniel Vetter, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, twoe...@gmail.com, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Oh, nice! I tested what you said by extending the offdelay, and the test goes well.
But what would be the right way to fix it?

Melissa

Leandro Ribeiro

Jul 31, 2020, 2:09:12 PM
to Melissa Wen, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Hello everybody!

I'm currently working on a writeback connector screenshooter for Weston.
In order to test it, I'm using VKMS with Rodrigo's writeback connector
patch: https://lkml.org/lkml/2020/5/11/449

Here is the link with the MR in Weston with more details of how I've
tested it:
https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/458

The reason why I'm writing this is that in the first writeback connector
screenshot VKMS gets stuck. And I believe (from what I've tried to
debug) that what happens is that the writeback job gets stuck in the
queue waiting for a vsync signal. Then from the second screenshot on
everything works fine. So I believe this is related to this issue somehow.

Melissa's idea to add `drm_crtc_vblank_put(crtc)` made it work, although
VKMS started to print this warn message:

WARNING: CPU: 0 PID: 168 at drivers/gpu/drm/vkms/vkms_crtc.c:21
vkms_vblank_simulate+0x101/0x110

From what I've read from this thread it seems like this is not the
right fix, but I've decided to share this info with you anyway, as it
may help. I'm also trying to understand what is happening.

Thanks,
Leandro Ribeiro


Daniel Vetter

Jul 31, 2020, 2:33:38 PM
to Melissa Wen, Sidong Yang, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Trevor Woerner, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
Oh, that makes tons of sense and explains (I think at least) all the
bugs seen around vblank. Nice debugging!

btw you can also set offdelay=0, then it should never disable.

> Oh, nice! I tested what you say extending the offdelay, and the test goes well.
> But what would be the right way to fix?

Well vkms crc code needs a working vblank, so when we enable crc we
need to call drm_crtc_vblank_get, and when we disable crc, we need to
call drm_crtc_vblank_put. That should ensure that vblanks are always
happening, and crcs are captured no matter how much time (e.g.
suspend/resume) passes meanwhile. On real hw usually the crc is always
generated as long as the crtc is on, and doesn't need the vblank
interrupt enabled. But vkms is all fake, so it works a bit differently.

Cheers, Daniel
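The pairing suggested here (take a vblank reference when CRC capture is enabled, drop it when capture is disabled) can be sketched as a userspace model in C. `struct crtc_model` and the `model_*` functions are hypothetical names, and the delayed disable is collapsed into an immediate one to keep the example short; it is an illustration of the idea, not the vkms implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a crtc with CRC capture; not the vkms structures. */
struct crtc_model {
    int vblank_refcount;
    bool vblank_enabled;
    bool crc_enabled;
};

void model_vblank_get(struct crtc_model *c)
{
    if (c->vblank_refcount++ == 0)
        c->vblank_enabled = true;
}

void model_vblank_put(struct crtc_model *c)
{
    assert(c->vblank_refcount > 0);
    /* Assumption: disable at once instead of after the off-delay. */
    if (--c->vblank_refcount == 0)
        c->vblank_enabled = false;
}

/* CRC capture holds one vblank reference for as long as it is enabled. */
void model_set_crc_enabled(struct crtc_model *c, bool enabled)
{
    bool was_enabled = c->crc_enabled;

    if (enabled && !was_enabled)
        model_vblank_get(c);
    c->crc_enabled = enabled;
    if (!enabled && was_enabled)
        model_vblank_put(c);
}
```

Because the reference is taken only on the off-to-on transition and dropped only on the on-to-off transition, repeated enable calls cannot leak references, and other users (such as page flips) can come and go without stopping vblank while capture is active.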

Sidong Yang

Aug 1, 2020, 12:06:16 PM
to Daniel Vetter, Melissa Wen, Haneen Mohammed, Rodrigo Siqueira, David Airlie, Rodrigo Siqueira, Trevor Woerner, Linux Kernel Mailing List, dri-devel, kerne...@googlegroups.com
I think it's good. I tested some code that inserts drm_crtc_vblank_get/put in
vkms_set_crc_source, and the igt test passes with it. With this modified
code, when user code opens the crc, vkms_set_crc_source is called to enable crc.
Conversely, when closing the crc, set_crc_source is called with a src_name that
disables it, and composer_enabled goes false.

-Sidong