[PATCH] aarch64: mask interrupt and disable timer in the handler

161 views
Skip to first unread message

Waldemar Kozaczuk

unread,
Mar 19, 2021, 5:50:07 PM3/19/21
to osv...@googlegroups.com, Waldemar Kozaczuk
The AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10)
states following:

"The interrupts generated by the timer behave in a level-sensitive manner.
This means that, once the timer firing condition is reached,
the timer will continue to signal an interrupt until one of the following situations occurs:
- IMASK is set to one, which masks the interrupt.
- ENABLE is cleared to 0, which disables the timer.
- TVAL or CVAL is written, so that firing condition is no longer met."

This indicates that the timer interrupt handler needs to mask
or disable the timer, otherwise the interrupt will be re-delivered
and in some scenarios might cause underlying VMM (like QEMU in TCG mode)
stop delivering subsequent interrupts. In any case per the description
from the guide above, it is correct to mask the interrupt and/or disable
the timer in the timer interrupt handler. This is also what Linux does -
https://github.com/torvalds/linux/blob/edd7ab76847442e299af64a761febd180d71f98d/drivers/clocksource/arm_arch_timer.c#L638-L652.

Besides the above, the patch also sligthly optimizes the method set()
to make it set new timer only if calculated number of ticks (tval)
is greater than 0.

Fixes #1127

Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
---
arch/aarch64/arm-clock.cc | 39 +++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/aarch64/arm-clock.cc b/arch/aarch64/arm-clock.cc
index 7d11e454..d32f58b1 100644
--- a/arch/aarch64/arm-clock.cc
+++ b/arch/aarch64/arm-clock.cc
@@ -98,6 +98,10 @@ u64 arm_clock::processor_to_nano(u64 ticks)
return cntvct;
}

+#define TIMER_CTL_ISTATUS_BIT 4
+#define TIMER_CTL_IMASK_BIT 2
+#define TIMER_CTL_ENABLE_BIT 1
+
class arm_clock_events : public clock_event_driver {
public:
arm_clock_events();
@@ -120,7 +124,25 @@ arm_clock_events::arm_clock_events()
res = 16 + 11; /* default PPI 11 */
}
_irq.reset(new ppi_interrupt(gic::irq_type::IRQ_TYPE_EDGE, res,
- [this] { this->_callback->fired(); }));
+ [this] {
+ /* From AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10):
+ * The interrupts generated by the timer behave in a level-sensitive manner.
+ * This means that, once the timer firing condition is reached,
+ * the timer will continue to signal an interrupt until one of the following situations occurs:
+ - IMASK is set to one, which masks the interrupt.
+ - ENABLE is cleared to 0, which disables the timer.
+ - TVAL or CVAL is written, so that firing condition is no longer met.
+ When writing an interrupt handler for the timers, it is important
+ that software clears the interrupt before deactivating the interrupt in the GIC.
+ Otherwise the GIC will re-signal the same interrupt again. */
+ u32 ctl = this->read_ctl();
+ if (ctl & TIMER_CTL_ISTATUS_BIT) { // check if timer condition met
+ ctl |= TIMER_CTL_IMASK_BIT; // mask timer interrupt
+ ctl &= ~TIMER_CTL_ENABLE_BIT; // disable timer
+ this->write_ctl(ctl);
+ this->_callback->fired();
+ }
+ }));
}

arm_clock_events::~arm_clock_events()
@@ -171,11 +193,16 @@ void arm_clock_events::set(std::chrono::nanoseconds nanos)
class arm_clock *c = static_cast<arm_clock *>(clock::get());
tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC;

- u32 ctl = this->read_ctl();
- ctl |= 1; /* set enable */
- ctl &= ~2; /* unmask timer interrupt */
- this->write_tval(tval);
- this->write_ctl(ctl);
+ if (tval) {
+ u32 ctl = this->read_ctl();
+ ctl |= TIMER_CTL_ENABLE_BIT; // set enable
+ ctl &= ~TIMER_CTL_IMASK_BIT; // unmask timer interrupt
+ this->write_tval(tval);
+ this->write_ctl(ctl);
+ } else {
+ //No point to set timer if tval is 0
+ _callback->fired();
+ }
}
}

--
2.30.2

Pekka Enberg

unread,
Mar 20, 2021, 2:20:47 AM3/20/21
to Waldemar Kozaczuk, OSv Development
On Fri, Mar 19, 2021 at 11:50 PM Waldemar Kozaczuk <jwkoz...@gmail.com> wrote:
The AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10)
states following:

"The interrupts generated by the timer behave in a level-sensitive manner.
 This means that, once the timer firing condition is reached,
 the timer will continue to signal an interrupt until one of the following situations occurs:
 - IMASK is set to one, which masks the interrupt.
 - ENABLE is cleared to 0, which disables the timer.
 - TVAL or CVAL is written, so that firing condition is no longer met."

This indicates that the timer interrupt handler needs to mask
or disable the timer, otherwise the interrupt will be re-delivered
and in some scenarios might cause underlying VMM (like QEMU in TCG mode)
stop delivering subsequent interrupts. In any case per the description
from the guide above, it is correct to mask the interrupt and/or disable
the timer in the timer interrupt handler. This is also what Linux does -
https://github.com/torvalds/linux/blob/edd7ab76847442e299af64a761febd180d71f98d/drivers/clocksource/arm_arch_timer.c#L638-L652.

Reviewed-by: Pekka Enberg <pen...@scylladb.com>
 

Besides the above, the patch also sligthly optimizes the method set()
to make it set new timer only if calculated number of ticks (tval)
is greater than 0.

Why is this optimization useful/needed?

- Pekka 

Waldek Kozaczuk

unread,
Mar 20, 2021, 10:05:27 AM3/20/21
to OSv Development
It saves us some unnecessary interruptions - not a big win. 

- Pekka 

Nadav Har'El

unread,
Mar 21, 2021, 4:34:50 AM3/21/21
to Waldemar Kozaczuk, Osv Dev
On Fri, Mar 19, 2021 at 11:50 PM Waldemar Kozaczuk <jwkoz...@gmail.com> wrote:
The AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10)
states following:

"The interrupts generated by the timer behave in a level-sensitive manner.
 This means that, once the timer firing condition is reached,
 the timer will continue to signal an interrupt until one of the following situations occurs:
 - IMASK is set to one, which masks the interrupt.
 - ENABLE is cleared to 0, which disables the timer.
 - TVAL or CVAL is written, so that firing condition is no longer met."

This indicates that the timer interrupt handler needs to mask
or disable the timer, otherwise the interrupt will be re-delivered
and in some scenarios might cause underlying VMM (like QEMU in TCG mode)
stop delivering subsequent interrupts.

While I understand why the interrupt needs to be cleared, this description seems to indicate that
this is a bug in the VMM, not in OSv, no? The above description says that if the timer fired and
OSv did not clear it, it should re-fire immediately, and if it doesn't it sounds like a (virtual) processor bug.

That being said, it indeed sounds better to disable the timer, not because of this bug but because if
we don't, we'll get spurious timer events, no? So this fix is good. Perhaps it works around the QEMU
bug because nobody thought of testing QEMU in the unexpected case where the timer is not disabled
after receiving it?

I didn't understand what the code below does, so I have some questions below - but it's likely it
reflects more on the quality of my understanding than that of the code :-)
We didn't have this ctl & TIMER_CTL_ISTATUS_BIT before. Why do we need to add it now?
Could we get a timer interrupt without this bit?

+            ctl |= TIMER_CTL_IMASK_BIT;    // mask timer interrupt
+            ctl &= ~TIMER_CTL_ENABLE_BIT;  // disable timer
+            this->write_ctl(ctl);

I'm afraid I don't understand this - when we handle one timer interrupt, we mask the timer interrupt and disable the timer,
First, why do we need to do both? (what's the difference between disabling and masking)?

But thinking of the performance of this, what happens in the likely case that there is a *next* timer interrupt?
I see below in set() that we re-enable the timer, and un-mask it?
Maybe instead of disabling and immediately re-enabling the timer, we could avoid doing both and save a few nanoseconds?

Shouldn't we need to do something like:

1. If the timer list has a next timer, reconfigure the timer to that next time
2. Otherwise, disable the timer.
3. However, the "otherwise" in 2 also includes the case that the next timer happens right now or in the past. In that case, we need to manually run the callbacks, because after we disable the timer we won't get more interrupts for the same timer.

I'm thinking that maybe missing 3 is the bug we had with QEMU - if our code did not have a special case for a timer 0 nanoseconds in the future (or even in the past) than it would have relied on the processor serving the same interrupt again, which QEMU probably didn't (because of a bug).
You now added this special case (the !tval case), so I wonder if that change alone would have fixed the bug.

It's been ages since I looked at the details here, so maybe I'm just being confused.

 
+            this->_callback->fired();
+        }
+    }));
 }

 arm_clock_events::~arm_clock_events()
@@ -171,11 +193,16 @@ void arm_clock_events::set(std::chrono::nanoseconds nanos)
         class arm_clock *c = static_cast<arm_clock *>(clock::get());
         tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC;

-        u32 ctl = this->read_ctl();
-        ctl |= 1;  /* set enable */
-        ctl &= ~2; /* unmask timer interrupt */
-        this->write_tval(tval);
-        this->write_ctl(ctl);
+        if (tval) {
+            u32 ctl = this->read_ctl();
+            ctl |= TIMER_CTL_ENABLE_BIT; // set enable
+            ctl &= ~TIMER_CTL_IMASK_BIT; // unmask timer interrupt
+            this->write_tval(tval);
+            this->write_ctl(ctl);
+        } else {
+            //No point to set timer if tval is 0
+            _callback->fired();

As I noted above, I wonder if this change alone (the !tval case where you run fired() immediately) would have been enough to fix this QEMU bug.
Which doesn't mean, of course, that the rest of the changes aren't good as well. But I'm just trying to understand.

+        }
     }
 }

--
2.30.2

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/20210319214955.93417-1-jwkozaczuk%40gmail.com.

Waldek Kozaczuk

unread,
Mar 22, 2021, 12:05:54 PM3/22/21
to OSv Development
On Sunday, March 21, 2021 at 4:34:50 AM UTC-4 Nadav Har'El wrote:
On Fri, Mar 19, 2021 at 11:50 PM Waldemar Kozaczuk <jwkoz...@gmail.com> wrote:
The AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10)
states following:

"The interrupts generated by the timer behave in a level-sensitive manner.
 This means that, once the timer firing condition is reached,
 the timer will continue to signal an interrupt until one of the following situations occurs:
 - IMASK is set to one, which masks the interrupt.
 - ENABLE is cleared to 0, which disables the timer.
 - TVAL or CVAL is written, so that firing condition is no longer met."

This indicates that the timer interrupt handler needs to mask
or disable the timer, otherwise the interrupt will be re-delivered
and in some scenarios might cause underlying VMM (like QEMU in TCG mode)
stop delivering subsequent interrupts.

While I understand why the interrupt needs to be cleared, this description seems to indicate that
this is a bug in the VMM, not in OSv, no? The above description says that if the timer fired and
OSv did not clear it, it should re-fire immediately, and if it doesn't it sounds like a (virtual) processor bug.

It very well could be a QEMU bug. With the original code,  OSv would not get any interrupts for tiny tval values, not just zero.

 Having said that, it is sad we do not understand what exactly is going on on the QEMU side in TCG mode and why masking the interrupt fixes the problem.

That being said, it indeed sounds better to disable the timer, not because of this bug but because if
we don't, we'll get spurious timer events, no? So this fix is good. Perhaps it works around the QEMU
bug because nobody thought of testing QEMU in the unexpected case where the timer is not disabled
after receiving it?
Very likely because Linux does mask timer interrupt (set TIMER_CTL_IMASK_BIT) so the interrupt does not get re-delivered.
In theory, we could get a timer interrupt without this bit set. In any case, I am following what Linux does (see the github link above).

+            ctl |= TIMER_CTL_IMASK_BIT;    // mask timer interrupt
+            ctl &= ~TIMER_CTL_ENABLE_BIT;  // disable timer
+            this->write_ctl(ctl);

I'm afraid I don't understand this - when we handle one timer interrupt, we mask the timer interrupt and disable the timer,
First, why do we need to do both? (what's the difference between disabling and masking)?
Linux only masks the timer interrupt and this was enough to fix the problem with QEMU/TCG. I added disabling of timer to make armclock disable the timer output signal which I found somewhere saves cpu energy. But you are right it is not necessary. Maybe we should remove it. On other hand, it makes this code symmetrical to what we do in set.

But thinking of the performance of this, what happens in the likely case that there is a *next* timer interrupt?
I see below in set() that we re-enable the timer, and un-mask it?
Maybe instead of disabling and immediately re-enabling the timer, we could avoid doing both and save a few nanoseconds?
I wonder if would save any as regardless of how many bits of CTL you enable/disable, it just requires one msr instruction (write to CTL register).

Shouldn't we need to do something like:

1. If the timer list has a next timer, reconfigure the timer to that next time
2. Otherwise, disable the timer.
3. However, the "otherwise" in 2 also includes the case that the next timer happens right now or in the past. In that case, we need to manually run the callbacks, because after we disable the timer we won't get more interrupts for the same timer.

You are right. My original patch proposal had this:

    class arm_clock *c = static_cast<arm_clock *>(clock::get());
         tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC;
 
-        u32 ctl = this->read_ctl();
-        ctl |= 1;  /* set enable */
-        ctl &= ~2; /* unmask timer interrupt */
-        this->write_tval(tval);
-        this->write_ctl(ctl);
+        if (tval) {
+           u32 ctl = this->read_ctl();
+           ctl |= 1;  /* set enable */
+           ctl &= ~2; /* unmask timer interrupt */
+           this->write_tval(tval);
+           this->write_ctl(ctl);
+           if (this->read_ctl() & 4) { // is timer condition met?
+               ctl |= 2; /* mask */
+               this->write_ctl(ctl);
+              _callback->fired();
+           }
+        } else {
+           _callback->fired();
+        }
     } 

And this was enough (based on my empirical tests) to deal with the QEMU bug (which we think there is one but cannot prove 100%). But since then I saw Linux code and discovered, that it masks the interrupt in the handler and that adding code to mask the interrupt in OSv arm clock interrupt handler would alone work around QEMU bug. So either of the two would help us.

But if we do what I proposed originally in the set (see above), it would require reading CTL registers twice - one before writing tval and one after to check if the timer condition is already met to disable it. Which would be more expensive, right?


I'm thinking that maybe missing 3 is the bug we had with QEMU - if our code did not have a special case for a timer 0 nanoseconds in the future (or even in the past) than it would have relied on the processor serving the same interrupt again, which QEMU probably didn't (because of a bug).
You now added this special case (the !tval case), so I wonder if that change alone would have fixed the bug.
!tval alone was not enough. So the !tval in this patch is only meant to save us from setting the timer (not big saving).

It's been ages since I looked at the details here, so maybe I'm just being confused.

 
+            this->_callback->fired();
+        }
+    }));
 }

 arm_clock_events::~arm_clock_events()
@@ -171,11 +193,16 @@ void arm_clock_events::set(std::chrono::nanoseconds nanos)
         class arm_clock *c = static_cast<arm_clock *>(clock::get());
         tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC;

-        u32 ctl = this->read_ctl();
-        ctl |= 1;  /* set enable */
-        ctl &= ~2; /* unmask timer interrupt */
-        this->write_tval(tval);
-        this->write_ctl(ctl);
+        if (tval) {
+            u32 ctl = this->read_ctl();
+            ctl |= TIMER_CTL_ENABLE_BIT; // set enable
+            ctl &= ~TIMER_CTL_IMASK_BIT; // unmask timer interrupt
+            this->write_tval(tval);
+            this->write_ctl(ctl);
+        } else {
+            //No point to set timer if tval is 0
+            _callback->fired();

As I noted above, I wonder if this change alone (the !tval case where you run fired() immediately) would have been enough to fix this QEMU bug.
Which doesn't mean, of course, that the rest of the changes aren't good as well. But I'm just trying to understand.
No, it was not enough. 

Waldemar Kozaczuk

unread,
Mar 29, 2021, 11:00:00 PM3/29/21
to osv...@googlegroups.com, Waldemar Kozaczuk
Please note this patch is code-wise identical to the V1 version,
but it adds some extra comments to better clarify motivation
for specific changes.

The AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10)
states following:

"The interrupts generated by the timer behave in a level-sensitive manner.
This means that, once the timer firing condition is reached,
the timer will continue to signal an interrupt until one of the following situations occurs:
- IMASK is set to one, which masks the interrupt.
- ENABLE is cleared to 0, which disables the timer.
- TVAL or CVAL is written, so that firing condition is no longer met."

This indicates that the timer interrupt handler needs to mask the timer interrupt
or disable the timer, otherwise the interrupt will be re-delivered
and in some scenarios might cause underlying VMM (like QEMU in TCG mode)
to stop delivering subsequent interrupts. In any case per the description
from the guide above, it is correct to mask the interrupt and/or disable
the timer in the timer interrupt handler. This is also what Linux does -
https://github.com/torvalds/linux/blob/edd7ab76847442e299af64a761febd180d71f98d/drivers/clocksource/arm_arch_timer.c#L638-L652.

So this patch also works around a likely bug in QEMU where in TCG mode
QEMU stops delivering timer interrupts when the timer was not disabled
or interrupt was not masked AND the TVAL value is 0 or tiny. Related
experiments seemed to indicate that alternatively we could have changed
the set() method to re-read CTL right after setting TVAL to see if the
timer condition was met and cancel it if so to work around the same
QEMU bug. But the former solution seems to be more correct and follows
what Linux does.

Besides the above, the patch also sligthly optimizes the method set()
to make it set new timer only if calculated number of ticks (tval)
is greater than 0.

Fixes #1127

Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
---
arch/aarch64/arm-clock.cc | 49 ++++++++++++++++++++++++++++++++++-----
1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/aarch64/arm-clock.cc b/arch/aarch64/arm-clock.cc
index c1f4b277..20c35528 100644
--- a/arch/aarch64/arm-clock.cc
+++ b/arch/aarch64/arm-clock.cc
@@ -84,6 +84,10 @@ u64 arm_clock::processor_to_nano(u64 ticks)
return cntvct;
}

+#define TIMER_CTL_ISTATUS_BIT 4
+#define TIMER_CTL_IMASK_BIT 2
+#define TIMER_CTL_ENABLE_BIT 1
+
class arm_clock_events : public clock_event_driver {
public:
arm_clock_events();
@@ -106,7 +110,32 @@ arm_clock_events::arm_clock_events()
res = 16 + 11; /* default PPI 11 */
}
_irq.reset(new ppi_interrupt(gic::irq_type::IRQ_TYPE_EDGE, res,
- [this] { this->_callback->fired(); }));
+ [this] {
+ /* From AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10):
+ * The interrupts generated by the timer behave in a level-sensitive manner.
+ * This means that, once the timer firing condition is reached,
+ * the timer will continue to signal an interrupt until one of
+ * the following situations occurs:
+ - IMASK is set to one, which masks the interrupt.
+ - ENABLE is cleared to 0, which disables the timer.
+ - TVAL or CVAL is written, so that firing condition is no longer met. */
+ // Please note, we are following what Linux does by reading the CTL register
+ // and checking if the timer condition (CVAL <= system counter) is actually
+ // met. On top of masking the timer interrupt, which Linux does, we also
+ // disable the timer which most likely is not necessary but makes this
+ // logic symmetrical to the one when setting the timer in the set()
+ // method.
+ // This is just a theory, but we think this also works around a likely bug
+ // in QEMU TCG mode (full emulation), where it does not correctly assert timer
+ // interrupts if value of TVAL is 0 or very tiny.
+ u32 ctl = this->read_ctl();
+ if (ctl & TIMER_CTL_ISTATUS_BIT) { // check if timer condition met
+ ctl |= TIMER_CTL_IMASK_BIT; // mask timer interrupt
+ ctl &= ~TIMER_CTL_ENABLE_BIT; // disable timer
+ this->write_ctl(ctl);
+ this->_callback->fired();
+ }
+ }));
}

arm_clock_events::~arm_clock_events()
@@ -157,11 +186,19 @@ void arm_clock_events::set(std::chrono::nanoseconds nanos)
class arm_clock *c = static_cast<arm_clock *>(clock::get());
tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC;

- u32 ctl = this->read_ctl();
- ctl |= 1; /* set enable */
- ctl &= ~2; /* unmask timer interrupt */
- this->write_tval(tval);
- this->write_ctl(ctl);
+ if (tval) {
+ //This 'if (tval)' optimization is not likely to save us much
+ //as the tiny 'nanos' values are rare but is so cheap
+ //to do it, so why not?
+ u32 ctl = this->read_ctl();
+ ctl |= TIMER_CTL_ENABLE_BIT; // set enable
+ ctl &= ~TIMER_CTL_IMASK_BIT; // unmask timer interrupt
+ this->write_tval(tval);
+ this->write_ctl(ctl);
+ } else {
+ //No point to set timer if tval is 0
+ _callback->fired();
+ }
}
}

--
2.30.2

Commit Bot

unread,
Mar 30, 2021, 12:38:11 PM3/30/21
to osv...@googlegroups.com, Waldemar Kozaczuk
From: Waldemar Kozaczuk <jwkoz...@gmail.com>
Committer: Waldemar Kozaczuk <jwkoz...@gmail.com>
Branch: master

aarch64: mask interrupt and disable timer in the handler
diff --git a/arch/aarch64/arm-clock.cc b/arch/aarch64/arm-clock.cc
Reply all
Reply to author
Forward
0 new messages