Hello all,
I am interested in revisiting the return ABI of _Float16 on i386.
Currently it is returned in xmm0, meaning SSE is required for the type.
This is rather inconvenient given that _Float16 is otherwise quite well
supported. Compilers must choose between hacking together a custom ABI
that works on the baseline target and passing the burden on to users to
gate every use of the type.
Is there any interest in adjusting the specification such that _Float16
is returned in a GPR rather than SSE?
This was brought up before in the thread at [1], where the concern was
efficient 16-bit moves between GPRs or memory and XMM registers. That
doesn't seem relevant, however: there is no reason to keep a _Float16
in XMM unless F16C is available, which implies SSE2 and SSE4.1 and
therefore PINSRW and PEXTRW to/from memory (unless I am missing
something?).
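To make the proposed convention concrete, here is a small C model of
what caller and callee would agree on under the patch below. It is only
a sketch under stated assumptions: a uint32_t stands in for EAX, a
uint16_t holds the IEEE binary16 bit pattern of a _Float16 (0x3C00 is
1.0, 0xC000 is -2.0), and the helper names are hypothetical, not taken
from any compiler or from the psABI text.

```c
#include <stdint.h>

/* Hypothetical model: uint32_t plays the role of EAX, uint16_t holds the
 * binary16 bit pattern of a _Float16. Not real compiler code. */

/* Callee side, _Float16: the value goes in AX (bits 0..15). The upper 16
 * bits are undefined, modelled here as leftover garbage from EAX. */
uint32_t ret_f16(uint16_t bits, uint32_t leftover_eax)
{
    return (leftover_eax & 0xFFFF0000u) | bits;
}

/* Caller side, _Float16: read AX and ignore the undefined upper half,
 * the same way a caller treats a `short` return value. */
uint16_t read_f16(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* Callee side, _Complex _Float16: real part in bits 0..15, imaginary
 * part in bits 16..31 of EAX. */
uint32_t ret_cf16(uint16_t re_bits, uint16_t im_bits)
{
    return (uint32_t)re_bits | ((uint32_t)im_bits << 16);
}

/* Caller side, _Complex _Float16: split EAX back into its two halves. */
void read_cf16(uint32_t eax, uint16_t *re_bits, uint16_t *im_bits)
{
    *re_bits = (uint16_t)(eax & 0xFFFFu);
    *im_bits = (uint16_t)(eax >> 16);
}
```

For example, returning 1.0 + -2.0i as a _Complex _Float16 would leave
0xC0003C00 in EAX. Nothing in either direction touches an XMM register.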
A sample patch to the psABI is below. Needless to say, such a change
raises compatibility concerns, but given that workarounds already exist
(e.g. in LLVM), it seems worth considering whether something should be
codified to make this simpler for everyone.
Best regards,
Trevor
[1]: https://inbox.sourceware.org/gcc-patches/20210701210537.5...@gmail.com/
(some CCs added from the linked discussion)
--- patch follows ---
From 1af72db89f9a10b93569fa0b9f64f65f2dd73334 Mon Sep 17 00:00:00 2001
From: Trevor Gross <tmg...@umich.edu>
Date: Fri, 23 Jan 2026 21:11:43 +0000
Subject: [PATCH] Return _Float16 and _Complex _Float16 in GPRs
Currently the ABI specifies that _Float16 is to be passed on the stack
and returned in xmm0, meaning SSE is required to support the type.
Adjust _Float16 to be returned in ax and _Complex _Float16 in eax,
dropping the SSE requirement.
This has the benefit of making _Float16 ABI-compatible with `short`.
---
low-level-sys-info.tex | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 0015c8c..a2d8d6d 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -384,8 +384,7 @@ of some 64bit return types & No \\
\ESI & callee-saved register & yes \\
\EDI & callee-saved register & yes \\
\reg{xmm0} & scratch register; also used to pass the first \code{__m128}
- parameter and return \code{__m128}, \code{_Float16},
- \code{_Complex _Float16} & No \\
+ parameter and return \code{__m128} & No \\
\reg{ymm0} & scratch register; also used to pass the first \code{__m256}
parameter and return \code{__m256} & No \\
\reg{zmm0} & scratch register; also used to pass the first \code{__m512}
@@ -472,7 +471,11 @@ and \texttt{unions}) are always returned in memory.
& \texttt{\textit{any-type} *} & \EAX \\
& \texttt{\textit{any-type} (*)()} & \\
\hline
- & \texttt{_Float16} & \reg{xmm0} \\
+ & \texttt{_Float16} & \reg{ax} \\
+ & & The upper 16 bits of \EAX are undefined.
+ The caller must not \\
+ & & rely on these being set in a predefined
+ way by the called function. \\
\cline{2-3}
& \texttt{float} & \reg{st0} \\
\cline{2-3}
@@ -484,7 +487,7 @@ and \texttt{unions}) are always returned in memory.
\cline{2-3}
& \texttt{__float128} & memory \\
\hline
- & \texttt{_Complex _Float16} & \reg{xmm0} \\
+ & \texttt{_Complex _Float16} & \reg{eax} \\
& & The real part is returned in bits 0..15. The imaginary part is
returned \\
& & in bits 16..31.\\
--
2.50.1 (Apple Git-155)