HLT instruction

s_dub...@nospicedham.yahoo.com

Jun 8, 2013, 10:02:35 AM

My 486 instruction set reference describes the HLT instruction saying:

The HLT instruction stops instruction execution and places the
processor in a HALT state. An enabled interrupt, NMI, or a reset will
resume execution. If an interrupt (including NMI) is used to resume
execution after a HLT instruction, the saved CS:IP (or CS:EIP) value
points to the instruction following the HLT instruction.

The HLT instruction is a privileged instruction.

My question is: has anyone had use for this instruction or seen it
used, and for what purpose?

ISTM that it is kind of a software wait state to help match up slow,
low level, I/O with fast instruction execution, but I'm not sure.
Comments?

Steve

hopcode

Jun 8, 2013, 11:02:15 AM

Some years ago I read about an app on Windows, running threads at ring 3,
in which an HLT was used to idle the CPU. The result was a CPU running
10%-15% cooler. The app's name was something like cpuidle.

--
.:mrk[hopcode]
.:x64lab:.
board http://board.x64lab.net

hopcode

Jun 8, 2013, 11:17:20 AM

On 08.06.2013 17:02, hopcode wrote:
> some years ago i read it in app on windows running threads at ring 3
> in which an HLT was there in order to idle the CPU. results were CPU
> 10%-15% cooler. the app the name was something like cpuidle
>

Ehm, correction: the core was in a VxD (ring 0):
http://win32assembly.programminghorizon.com/files/cpice112.zip

Cheers,

Robert Wessel

Jun 8, 2013, 1:42:40 PM

On Sat, 8 Jun 2013 07:02:35 -0700 (PDT),
"s_dub...@nospicedham.yahoo.com"

It's usually used in OS code, and executed when the OS has no work to
dispatch. For example, it's common in the middle of the OS's idle
loop. These days the main thing this does is drop the CPU into a
power saving state. As someone else mentioned, it's useful for
virtual machines as well, where it allows the hypervisor to stop
wasting resources emulating the guest's idle loop.

Even before the serious power management these days, halted CPUs often
consumed less power.

On multi-threaded CPUs it stops wasting resources on idle hardware
threads.

In the (pre-cache) past the reduction in memory accesses by the idle
CPU was useful, although that usually was not much of an x86 issue.

Halt states have been implemented on many, perhaps most,
architectures. S/360 (and its descendants) used a bit set in the PSW
to enter the halt state, and used (the inverse of) that bit to drive
the CPU usage meter, which controlled billing for rented systems
(literally, your IBM guy would come out every month and read the
meter).

It has nothing really to do with I/O.

JJ

Jun 8, 2013, 1:42:09 PM

On Sat, 8 Jun 2013 07:02:35 -0700 (PDT), s_dub...@nospicedham.yahoo.com

wrote:

The current usage is for power management. I've seen it used by the Windows 9x
programs "Rain" and its successor "Waterfall" to halt the CPU in order to cool
it down, since Windows 9x doesn't support a CPU idle state. There are also DOS
CPU cooler programs that execute HLT after every timer IRQ. I first found one
in the VM additions of the original Connectix Virtual PC, then found another
in the FreeDOS package.

However, HLT has existed since the 8086, when power management was not yet a
concern. HLT was originally designed just to halt the CPU, and only that.
The intended purpose (of stopping the CPU) never seems to have been documented
officially, or at least not that I know of. It was probably used to aid
"debugging" the CPU hardware when it was designed, and was then made an
official instruction, similar to some undocumented instructions of that
time.

Rod Pemberton

Jun 8, 2013, 5:26:36 PM

<s_dub...@nospicedham.yahoo.com> wrote in message
news:8d35fe77-52cf-49d7...@9g2000yqq.googlegroups.com...

> My 486 instruction set reference describes the HLT instruction
> saying:
>
> The HLT instruction stops instruction execution and places the
> processor in a HALT state, An enabled interrupt, NMI, or a
> reset will resume execution. If an interrupt (including NMI) is
> used to resume execution after a HLT instruction, the saved
> CS:IP (or CS:EIP) value points to the instruction following
> the HLT instruction.
>
> The HLT instruction is a privileged instruction.
>
> My question is: has anyone had use for this instruction or seen
> it used, and for what purpose?
>

Yes.

The most common usage is probably to just prevent the processor
from free running...

If your code has good interrupt control, it could be used as a
temporary wait too, e.g., wait until designated interrupt, perhaps
a timer interrupt.

The most common place I've seen it used is at the end of some code
that can't recover, e.g., boot code at 7C00h which can't transfer
control to an OS, or simple demonstration code that requires a
reboot when done, like demo PM setup code. It's used the same way
"jmp $" is used:

again:
hlt
jmp again

Of course, "jmp $" will free-run, and this won't. This will run
intermittently. The jmp instruction will execute if an interrupt
or NMI occurs.

I happen to use hlt in my OS as a way to force the user to
manually power down the computer, when no other software reset
methods worked. I do this:

; other system reset methods have failed
; force user to power down

; "disable NMI routine" goes here (not shown)
cli
hlt

With both NMIs and maskable interrupts disabled, the hlt should
never continue execution at the following instruction, i.e., it
can't be interrupted or un-halted. This acts like an
uninterruptible "jmp $" without having the processor execute the
instruction repeatedly. Of course, if paranoid, you could add the
jump just in case some processor out there is "broken".

You could also use it to trap NMIs (hardware) only:

again:
cli
hlt
; handle NMI
jmp again

see section "How do I disable NMI on a PC?"
http://aodfaq.wikispaces.com/boot

halt
http://codewiki.wikispaces.com/x86_stop

HTH,

Rod Pemberton

wolfgang kern

Jun 8, 2013, 7:03:53 PM

Steve wrote:

> My 486 instruction set reference describes the HLT instruction saying:
>
> The HLT instruction stops instruction execution and places the
> processor in a HALT state, An enabled interrupt, NMI, or a reset will
> resume execution. If an interrupt (including NMI) is used to resume
> execution after a HLT instruction, the saved CS:IP (or CS:EIP) value
> points to the instruction following the HLT instruction.
>
> The HLT instruction is a privileged instruction.
>
> My question is: has anyone had use for this instruction or seen it
> used, and for what purpose?

I use two consecutive HLTs to measure the actual CPU clock frequency,
and a third occurrence of HLT is found in my main idle loop.

main:
; ...check for pending jobs here
sti
hlt ;if nothing to do: wait for IRQ (at least timer wakes it up)
jmp main
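
How the two-HLT measurement works isn't spelled out in this post. One plausible reading, offered purely as an assumption: the first HLT wakes on a timer tick, the second sleeps for exactly one tick period, and a cycle counter (for example RDTSC on a Pentium or later) is read before and after the second one. Assuming the BIOS default timer rate (1193182 Hz input clock divided by 65536, about 18.2 ticks per second) and no other interrupt waking the CPU in between, the arithmetic would be roughly this. The function name is hypothetical.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u  /* standard PC timer input clock */
#define PIT_DIVISOR  65536u    /* assumed BIOS default divisor (~18.2 Hz tick) */

/*
 * Assumed ring-0 measurement sequence (not taken from the post):
 *   hlt        ; wake on a tick edge
 *   rdtsc      ; read start count
 *   hlt        ; sleep for one full tick period
 *   rdtsc      ; read end count
 * 'cycles' is the difference between the two counter reads.
 */
static uint64_t cpu_hz_from_tick_cycles(uint64_t cycles)
{
    /* cycles per tick * ticks per second = cycles per second */
    return cycles * PIT_INPUT_HZ / PIT_DIVISOR;
}
```

For example, a difference of 12,800,000 cycles across one tick works out to roughly 233 MHz.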

> ISTM that it is kind of a software wait state to help match up slow,
> low level, I/O with fast instruction execution, but I'm not sure.
> Comments?

The HALT-state may save some power, mobiles may even fall asleep on HLT.
__
wolfgang

Tim Roberts

Jun 9, 2013, 12:27:18 AM

"s_dub...@nospicedham.yahoo.com" <s_dub...@nospicedham.yahoo.com>
wrote:

>
>My question is: has anyone had use for this instruction or seen it
>used, and for what purpose?

Here is the Windows idle loop on a 486:

idleLoop: HLT
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Robert Wessel

Jun 9, 2013, 12:47:58 AM

On Sat, 08 Jun 2013 21:27:18 -0700, Tim Roberts
<ti...@nospicedham.probo.com> wrote:

>"s_dub...@nospicedham.yahoo.com" <s_dub...@nospicedham.yahoo.com>
>wrote:
>>
>>My question is: has anyone had use for this instruction or seen it
>>used, and for what purpose?
>
>Here is the Windows idle loop on a 486:
>
> idleLoop: HLT

At the very least, that would need to be followed by a "jmp idleLoop",
as execution will continue with the instruction after the HLT after an
interrupt. There could, of course, be considerably more code in the
idle loop.

Steve

Jun 9, 2013, 9:19:13 AM

Hi,

I used HLT in a program I wrote for the HP 200LX battery powered
palmtop computer, an 80186. It was used in a delay routine. It saved
a significant amount of power, as opposed to a tight polling loop looking
at the clock tick value.
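
A rough C model of that kind of delay routine, for comparison with a tight polling loop. Everything here is hypothetical: `ticks` stands in for the clock tick count, and `wait_for_interrupt()` stands in for `sti` / `hlt`, which cannot run from an ordinary user-mode C program, so it simply simulates one timer interrupt per wakeup.

```c
#include <stdint.h>

/* Simulated clock tick count; on real hardware the timer ISR increments it. */
static volatile uint32_t ticks;

/* Number of times the delay loop body actually ran. */
static unsigned long wakeups;

/*
 * Stand-in for "sti / hlt". In real ring-0 code the CPU would sleep here
 * until the next interrupt. This simulation assumes every wakeup is caused
 * by exactly one timer tick.
 */
static void wait_for_interrupt(void)
{
    ticks++;
    wakeups++;
}

/* Delay for n ticks, halting between checks instead of spinning. */
static void delay_ticks(uint32_t n)
{
    uint32_t start = ticks;
    while ((uint32_t)(ticks - start) < n)   /* wrap-safe elapsed count */
        wait_for_interrupt();
}
```

With the halt in place the loop body runs once per interrupt, so an 18-tick delay (about one second at the PC's default 18.2 Hz tick) costs 18 wakeups rather than thousands of polls of the tick value.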

Regards,

Steve N.

s_dub...@nospicedham.yahoo.com

Jun 9, 2013, 12:44:36 PM

On Jun 8, 9:02 am, "s_dubrov...@nospicedham.yahoo.com"

Thanks for all the comments, informative to me.

Coincidentally, this morning I found an app note, AP-949, on the PAUSE
instruction for spin locks:

http://software.intel.com/en-us/articles/ap949-using-spin-loops-on-intel-pentiumr-4-processor-and-intel-xeonr-processor/

which also mentions the use of HLT for a similar purpose, and the
rationale for it.
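
A spin-wait loop of the kind that app note is about might look like this in C. This is only a sketch under assumptions: the `demo_spinlock` type and the `spin_*` / `cpu_relax` names are made up for illustration, and `_mm_pause()` is the compiler intrinsic that emits PAUSE on x86 (other targets get a no-op here).

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* no-op fallback on non-x86 targets */
#endif

/* Hypothetical lock type; initialize with { ATOMIC_FLAG_INIT }. */
typedef struct { atomic_flag held; } demo_spinlock;

/* Try once; returns true if the lock was free and is now ours. */
static bool spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired, hinting the CPU on each failed try. */
static void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l))
        cpu_relax();
}

static void spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Unlike HLT, PAUSE is not privileged and does not stop the processor; it only marks the loop as a spin-wait, which is why it suits short waits on a lock rather than idling until the next interrupt.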

---

I meant to mention that one place I saw this instruction used was in
CP/M-86 for the IBM PC, to halt the processor if the serial numbers
didn't match in two modules, the CCP and BDOS of the OS code. AIR,
the halt was handled on the original IBM PC as a CPU halt that
required pressing the red reset button to restart the system. On
subsequent systems HLT was wired so that execution would restart after
the next interrupt occurred. -AIR.

Steve

wolfgang kern

Jun 9, 2013, 1:39:52 PM

Steve wrote:
...

|Thanks for all the comments, informative to me.

|Coincidently, this morning I found an app note AP-949 on the PAUSE
|instruction for spin locks:

http://software.intel.com/en-us/articles/ap949-using-spin-loops-on-intel-pentiumr-4-processor-and-intel-xeonr-processor/

PAUSE (this new 'F3 90') is quite different to wait ...

|which also mentions the use HLT for the similar purpose, and the
|rational for it.

HLT was also a vital instruction on the Z-80; I actually connected
an LED to its HALT pin just to show that the system was ready and
waiting for something to occur ...

__
wolfgang
remember now good old times when we were able to calculate speed easy :)

hopcode

Jun 11, 2013, 1:35:57 AM

On 09.06.2013 18:44, s_dub...@nospicedham.yahoo.com wrote:
> Coincidently, this morning I found an app note AP-949 on the PAUSE
> instruction for spin locks:
>
> http://software.intel.com/en-us/articles/ap949-using-spin-loops-on-intel-pentiumr-4-processor-and-intel-xeonr-processor/
>
> which also mentions the use HLT for the similar purpose, and the
> rational for it.

That's the conventional way to soften a thread's aggressiveness
on locked resources. It translates to a NOP on older processors.
PAUSE is essentially a hint to the processor before handling a locked
resource.
I saw speed improvements of 15% and more by using it.
I tested it on a 32-bit CPU and a quad-core, but I cannot confirm it
for other CPUs.
Back to HLT: I don't know much about privilege level 0 on x86,
but if I interpret this paper on INT 01 correctly

http://www.rcollins.org/secrets/opcodes/ICEBP.html

I suppose the HALT state may be activated using the HLT instruction too.
It also seems Intel implemented the HLT instruction for debugging
purposes.

s_dub...@yahoo.com

Jun 15, 2013, 8:02:56 PM

On Jun 9, 12:39 pm, "wolfgang kern" <nowh...@never.at> wrote:
> Steve wrote:
>
> ...
> |Thanks for all the comments, informative to me.
>
> |Coincidently, this morning I found an app note AP-949 on the PAUSE
> |instruction for spin locks:
>

> http://software.intel.com/en-us/articles/ap949-using-spin-loops-on-in...

>
> PAUSE (this new 'F3 90') is quite different to wait ...
>
> |which also mentions the use HLT for the similar purpose, and the
> |rational for it.
>
> HLT were also a vital instruction on Z-80, I actually connected
> a LED to this HALT-pin just to show that the system is ready and
> is waiting for somthing to occure ...
>
> __
> wolfgang
> remember now good old times when we were able to calculate speed easy :)

Oh, I remember the z-80 being seen as a huge improvement to the 8080,
with its block move instruction LDIR and the second set of registers.
About the difference in scale between the 32 bit pentium and the 32
bit multi-cores.

As to HLT on the pentium, here's a snippet of some sandbox code which
improved the polling loop count from approximately 0198h to 0002h.

;; I started with code by Chris Giese, fn; kbd:. Its purpose is to wait
;; until 8042 keyboard controller is ready to accept a command or data
;; byte.
;; This is polling code, not an ISR. in al, 64h returns the status byte
;; (bit 5 Aux Dev OBF, bit 3 C/D Flg, bit 1 inbufr, bit 0 outbufr).
;; Without hlt, the routine loops around about 0198h times on this old
;; pentium mmx, 233mhz, machine, until both bit 0 and bit 1 become clear,
;; indicating that the 8042 is ready to accept. With one hlt instruction,
;; the loop count drops to 2, with two hlt instructions, the loop count
;; drops to one. It seems like the hlt halts until a timer tick interrupt.
;;
;; This is sandbox code, not fit for application, just for exploring the
;; protocols for communicating with the 8042/keyboard. Currently the
;; Output Buffer (in al,60h) isn't debouncing properly, a couple of
;; command repeats are required to get the expected output returned
;; (8042 self-test, for example).

[SECTION .code]

clr_OBF: ;; read and discard
in al, 60h

clr_8042_busy: ;; get count, initially 0,

sti ;; entry state is cli, enable else hlt hangs.
hlt ;; -= test reducing loop count =- (~198h->2h)
hlt ;; (2->1)
cli

inc word [statCnt] ;; update it for looping count.

in al, 64h ;; get status, becomes
mov [stat8042], al ;; last recorded 'pass' value.

test al, 01h ;; bit 0 - Output Buffer Full @ 60h
jnz clr_OBF

test al, 02h ;; bit 1 - Input Buffer Full @ 64h, cmd write
;; or @ 60h, data write.
jnz clr_8042_busy ;; not clear yet, loop

;; --- test that Status Register's Command/Data Flag is Zero before issuing command ---

;- test al, 08h ;; bit 3 - C/D flag
;- jnz clr_8042_busy

;; --- test aux dev OBF ---

test al, 0010_0000b ;; bit 5
jnz clr_OBF

clc

mov [VidRow], byte 1
mov [VidCol], byte 40

Disp_Stat_byt:
mov al, [stat8042]
call Byte2_Ascii ;; al has byte, rets AH,AL ascii chrs
mov [msgStat8042], ah ;; of byte.
mov [msgStat8042 + 1], al

mov al, [statCnt+1] ;; hi byte of word
call Byte2_Ascii
mov [msgStatCnt], ah
mov [msgStatCnt+1], al

mov al, [statCnt] ;; lo byte of word
call Byte2_Ascii
mov [msgStatCnt+2], ah
mov [msgStatCnt+3], al

mov si, msgStat8042
call DisplayMessage ;; poke to vid mem at VidRow,VidCol

mov word [statCnt], 0 ;; reset for next call

RET

[SECTION .data]

stat8042: db 0
statCnt: dw 0

msgStat8042: db '  h :last stat '    ;; 2 placeholder chars, filled in above
msgStatCnt: db '    h :loop cnt',0   ;; 4 placeholder chars, filled in above

-----
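
The status-byte tests in the loop above can be restated as a small decision function. This is only a C model of the same checks, in the same order as the assembly; the `KBC_*` names and `kbc_next_action()` are hypothetical, and the C/D flag test is left out because it is commented out in the original.

```c
#include <stdint.h>

/* Status register bits as read from port 64h (per the comments above). */
enum {
    KBC_STAT_OBF     = 0x01,  /* bit 0: output buffer full, byte waiting at 60h */
    KBC_STAT_IBF     = 0x02,  /* bit 1: input buffer full, controller still busy */
    KBC_STAT_CMD     = 0x08,  /* bit 3: command/data flag (not tested here) */
    KBC_STAT_AUX_OBF = 0x20   /* bit 5: aux device output buffer full */
};

typedef enum {
    KBC_READY,         /* safe to write a command or data byte */
    KBC_DRAIN_OUTPUT,  /* read and discard port 60h, then poll again */
    KBC_BUSY           /* poll again */
} kbc_action;

/* Mirrors the test/jnz sequence of the polling loop. */
static kbc_action kbc_next_action(uint8_t status)
{
    if (status & KBC_STAT_OBF)     return KBC_DRAIN_OUTPUT;  /* jnz clr_OBF */
    if (status & KBC_STAT_IBF)     return KBC_BUSY;          /* jnz clr_8042_busy */
    if (status & KBC_STAT_AUX_OBF) return KBC_DRAIN_OUTPUT;  /* jnz clr_OBF */
    return KBC_READY;
}
```

Keeping the decision separate from the port I/O makes it easy to check by hand which status values send the loop around again.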

Steve

wolfgang kern

Jun 18, 2013, 1:22:23 PM

Steve wrote:

...

> remember now good old times when we were able to calculate speed easy :)

|Oh, I remember the z-80 being seen as a huge improvement to the 8080,
|with its block move instruction LDIR and the second set of registers.
|About the difference in scale between the 32 bit pentium and the 32
|bit multi-cores.

Yeah, Z80/NSC800 were the ones which needed only one supply voltage
while 8085 still needed three ...
Unfortunately Zilog couldn't convince IBM, so we had to live with Intel
for several decades. AMD with x86-64 reduced this misfortune a bit ...

The Z-280 and later were Harvard oriented (code and data on separate
buses), and we see this also when we look at modern x86 caching methods (I+D).

|As to HLT on the pentium, here's a snippet of some sandbox code which
|improved the polling loop count from approximately 0198h to 0002h.

Yes, HLT was used for easy, lazy delays (until the next IRQ timer tick),
but these delay loops often resulted in missed hardware events, because
misguided CS theory said to connect the event handlers directly to the
IRQ handlers. Today we know better, so HLT may only be found in job-aware
idle loops or in speed-measurement tools.

[code...]
__
wolfgang

Robert Wessel

Jun 18, 2013, 5:25:33 PM

On Tue, 18 Jun 2013 19:22:23 +0200, "wolfgang kern" <now...@never.at>
wrote:

>
>Steve wrote:
>
>
>...
>> remember now good old times when we were able to calculate speed easy :)
>
>|Oh, I remember the z-80 being seen as a huge improvement to the 8080,
>|with its block move instruction LDIR and the second set of registers.
>|About the difference in scale between the 32 bit pentium and the 32
>|bit multi-cores.
>
>Yeah, Z80/NSC800 were the ones which needed only one supply-voltage
>while 8085 still needed three ...
>Unfortunately Zilog couln't convince IBM, so we had to live with Intel
>for several decades. AMD with 86_64 reduced this misfortune a bit ...

Zilog had nothing to convince IBM with. The Z-80 was purely 8-bit,
and the Z-8000 was a disaster and so bug ridden at the time that it
was unusable (several years later it did gain some traction in the
*nix world). And while the ISA would have been modestly preferable to
the 8086 (assuming it had actually been working), the segmentation
scheme made the x86's look good.

>Z_280+ were Haward oriented (code and data on apart busses), and
>we see this also when we look at modern x86 caching methods (I+D).

The Z-280 was at least a decade too late. Even its predecessor, the
Z-800, didn't ship until 1985 (and then only sort-of), which would
have been about seven years too late for IBM to select for the PC.

The Z-280 was also not Harvard in any sense. It did have a single
small cache, which you could configure as memory, or allow to cache
instructions and/or data, and there were (via the MMU) provisions for
distinguishing instruction accesses from data accesses on the *single*
memory bus, but you could do that on the 8086 as well (via the S0/S1/S2
signals).

The real shame was that IBM didn't pick the 68K (which would have been a
fairly clean 32-bit ISA). It was available, although only in 16-bit
bus form (the 68008 was too late as well, and also very slow). IBM
did use the 68K in some other systems.

wolfgang kern

Jun 19, 2013, 4:57:20 AM

Robert Wessel wrote:
...

>>Yeah, Z80/NSC800 were the ones which needed only one supply-voltage
>>while 8085 still needed three ...
>>Unfortunately Zilog couln't convince IBM, so we had to live with Intel
>>for several decades. AMD with 86_64 reduced this misfortune a bit ...

> Zilog had nothing to convince IBM with. The Z-80 was purely 8-bit,
> and the Z-8000 was a disaster and so bug ridden at the time that it
> was unusable (several years later it did gain some traction in the
> *nix world). And while the ISA would have been modestly preferable to
> the 8086 (assuming it had actually been working), the segmentation
> scheme made the x86's look good.

The reason I once (1979) chose Zilog over Intel was the more
convenient hardware design, e.g. fewer required external chips.
I still have one working Z-80 'PC' with 1MB static RAM (built 1983).

>>Z_280+ were Haward oriented (code and data on apart busses), and
>>we see this also when we look at modern x86 caching methods (I+D).

> The Z-280 was at least a decade too late. Even its predecessor, the
> Z-800, didn't ship until 1985 (and then only sort-of), which would
> have been about seven years too late for IBM to select for the PC.

Yes, too late for a change then ... what would today's PC world look like
if IBM had pumped money into Zilog instead of Intel?

> The Z-280 was also not Harvard in any sense. It did have a single
> small cache, which you could configure as memory, or allow it to cache
> instructions and/or data, and there were (via the MMU) provisions for
> distinguishing instruction accesses from data access on the *single*
> memory bus, you could do that on the 8086 as well (via the S0/S1/S2
> signals)..

I only produced a handful of Z-280 machines (1987), and yeah, you're
right, the code/data split was the MSB on a single bus, which I
used to drive separate buses.

> The real shame was that IBM didn't pick the 68K (and would have been a
> fairly clean 32 bit ISA), which was available, although only in 16-bit
> bus form (the 68008 was too late as well, and also very slow). IBM
> did use the 68K in some other systems.

Motorola MCUs weren't bad either; I programmed a lot for the HC11.
IIRC PowerPC was an IBM-Motorola design.
__
wolfgang

Robert Wessel

Jun 19, 2013, 5:26:44 AM

On Wed, 19 Jun 2013 10:57:20 +0200, "wolfgang kern" <now...@never.at>
wrote:

>

>Robert Wessel wrote:
>...
>>>Yeah, Z80/NSC800 were the ones which needed only one supply-voltage
>>>while 8085 still needed three ...
>>>Unfortunately Zilog couln't convince IBM, so we had to live with Intel
>>>for several decades. AMD with 86_64 reduced this misfortune a bit ...
>
>> Zilog had nothing to convince IBM with. The Z-80 was purely 8-bit,
>> and the Z-8000 was a disaster and so bug ridden at the time that it
>> was unusable (several years later it did gain some traction in the
>> *nix world). And while the ISA would have been modestly preferable to
>> the 8086 (assuming it had actually been working), the segmentation
>> scheme made the x86's look good.
>
>Why I once (1979) decided for Zilog instead of Intel were the more
>convenient hardware design, like less required external chips.
>I still have one working Z-80 'PC' with 1MB static RAM (built 1983).

It's certainly true that building an 8080 into a system was a right
PITA, although the need for external clock generators, multiple supply
voltages, and complex buffering were more artifacts of the very
limited technology available for the 8080. But most of that went away
with the 8085 (which was obviously a response of sorts to the Z-80),
and was never an issue with the 8086/88.

>>>Z_280+ were Haward oriented (code and data on apart busses), and
>>>we see this also when we look at modern x86 caching methods (I+D).
>
>> The Z-280 was at least a decade too late. Even its predecessor, the
>> Z-800, didn't ship until 1985 (and then only sort-of), which would
>> have been about seven years too late for IBM to select for the PC.
>
>Yes too late for a change then ... how would todays PC-world look like
>if IBM would have pumped money into Zilog instead of Intel ?

I suspect they'd look very much like they do now, but with a different
ISA. And if x86 has taught us anything, it's that barring horrible
flaws, the ISA doesn't matter much over the long haul.