[Haskell-cafe] ghc on xeon phi?


Johannes Waldmann

Feb 20, 2017, 7:28:56 AM
to haskel...@haskell.org
Dear Cafe -

would a program compiled by ghc, or ghc itself,
run as-is on Intel Xeon Phi (KNL)?

I found this reference
http://stackoverflow.com/questions/22253311/running-haskell-on-xeon-phi
but it seems to be about the pre-KNL version.

Thanks - J.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Erik de Castro Lopo

Feb 21, 2017, 3:36:51 AM
to haskel...@haskell.org
Johannes Waldmann wrote:

> would a program compiled by ghc, or ghc itself,
> run as-is on Intel Xeon Phi (KNL)?
>
> I found this reference
> http://stackoverflow.com/questions/22253311/running-haskell-on-xeon-phi
> but it seems to be about the pre-KNL version.

Xeon Phi is effectively a different architecture, in the same way
that the ARM architecture is different from the x86_64 architecture.

Currently GHC does not support the Xeon Phi architecture directly.
It may however (as the SO response suggests) be possible to generate
C code from GHC and compile that C code with a C compiler that
can generate Xeon Phi binaries. However, I am sure that route would
be a significant yak shaving exercise.

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik de Castro Lopo

Feb 21, 2017, 3:40:11 AM
to haskel...@haskell.org
Erik de Castro Lopo wrote:

> Xeon Phi is effectively a different architecture, in the same way
> that the ARM architecture is different from the x86_64 architecture.

More info on the Xeon Phi here:

https://en.wikipedia.org/wiki/Xeon_Phi

suggests that maybe it isn't a different architecture.

Brandon Allbery

Feb 21, 2017, 3:57:51 AM
to haskell-cafe

On Tue, Feb 21, 2017 at 3:35 AM, Erik de Castro Lopo <mle...@mega-nerd.com> wrote:
> It may however (as the SO response suggests) be possible to generate
> C code from GHC and compile that C code with a C compiler that
> can generate Xeon Phi binaries.

ghc hasn't generated C code for a while, aside from unregisterised builds.

It's not truly a different architecture, but a reorganization of the standard architecture. Unfortunately, ghc doesn't currently make good use of the key Xeon Phi components even in the standard architecture; packages that want to make use of them generally use -fllvm because LLVM is better at using them, even given that LLVM isn't very good at understanding what ghc feeds it. This suggests that -fllvm might be useful in taking advantage of Xeon Phi architecture with ghc.
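
To make the -fllvm suggestion concrete, here is a minimal sketch of the kind of tight numeric loop one might try it on. The module and the function name `sumSquares` are made up for illustration, and the build line assumes an LLVM toolchain supported by your GHC version is installed. Whether LLVM actually vectorizes a loop like this depends on the GHC and LLVM versions, so treat it as something to experiment with, not as a result.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Build (assumption: a matching LLVM toolchain is installed):
--   ghc -O2 -fllvm SumSquares.hs
-- The GHC User's Guide also lists x86 flags such as -mavx2 and -mavx512f;
-- a binary built with them may not run on CPUs lacking those extensions.
module Main (main) where

-- | Sum of i^2 for i in [1, n], written as a strict accumulator loop
-- so that GHC can keep the counter and the accumulator unboxed.
sumSquares :: Int -> Double
sumSquares n = go 1 0
  where
    go :: Int -> Double -> Double
    go !i !acc
      | i > n     = acc
      | otherwise = go (i + 1) (acc + x * x)
      where
        x = fromIntegral i

main :: IO ()
main = print (sumSquares 1000000)
```

Comparing the timings with and without -fllvm on the target machine is probably the only reliable way to tell whether the flag helps for a given program.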

--
brandon s allbery kf8nh                               sine nomine associates
allb...@gmail.com                                  ball...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

Levent Erkok

Feb 21, 2017, 1:02:35 PM
to Johannes Waldmann, Haskell Cafe
Johannes:


KNL supports instruction sets "up to" AVX-512, which makes it capable of executing binaries compiled for, say, regular Xeon machines. The only notable exception is the TSX instructions (the instructions for transactional memory), which I don't believe are generated by the GHC compile pipeline anyhow.

So, in theory, you can take an arbitrary binary compiled for a modern x86 machine (say any of the Core line), and run it unmodified on the KNL. Of course, the issue is going to be the software stack: binaries don't exist in isolation; you also need dynamically loaded libraries. So, you might have issues with, say, GMP or other libs if they are not yet ported to KNL. (Static linking might be a huge headache.)

In practice, however, this would be rather wasteful: The whole point of the Xeon-phi is the availability of large-vector sized floating-point support and many-many cores. If you're running a binary that makes no use of those instructions and is single-threaded, then you will not gain anything. In fact, the single-threaded performance might suffer compared to a regular Xeon machine. Of course, this all depends on what you want to do.
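
To illustrate the many-cores point, here is a minimal sketch of a program that spreads work over however many capabilities the RTS was started with. It uses only `base`; the names `rangeSum`, `chunks` and `parSum` are invented for this example, and the workload (summing a range of integers) is just a stand-in for real work.

```haskell
-- Build: ghc -O2 -threaded ParSum.hs
-- Run:   ./ParSum +RTS -N    (-N lets the RTS use the cores it detects)
module Main (main) where

import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- | Sum the integers in the inclusive range [lo, hi] with a strict fold.
rangeSum :: Int -> Int -> Int
rangeSum lo hi = foldl' (+) 0 [lo .. hi]

-- | Split [1, n] into k contiguous chunks that cover the range exactly once.
chunks :: Int -> Int -> [(Int, Int)]
chunks k n =
  [ (i * n `div` k + 1, (i + 1) * n `div` k) | i <- [0 .. k - 1] ]

-- | Sum [1, n] using one lightweight thread per chunk.
parSum :: Int -> Int -> IO Int
parSum k n = do
  vars <- mapM spawn (chunks k n)
  sum <$> mapM takeMVar vars
  where
    spawn (lo, hi) = do
      var <- newEmptyMVar
      -- evaluate forces the chunk sum inside the worker thread,
      -- so the main thread only collects finished results.
      _ <- forkIO (evaluate (rangeSum lo hi) >>= putMVar var)
      return var

main :: IO ()
main = do
  caps <- getNumCapabilities
  total <- parSum caps 10000000
  putStrLn ("capabilities: " ++ show caps ++ ", sum: " ++ show total)
```

Without -threaded, or without +RTS -N, the same binary runs on a single capability, which is the wasteful case described above.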

Projects like DPH, however, can take great advantage of the Xeon-phi architecture, by parallelizing number-crunching algorithms and distributing them over many cores (https://wiki.haskell.org/GHC/Data_Parallel_Haskell). However, I'm not familiar enough with the current status of DPH and related projects to say whether they aim to target AVX-512 and the many cores afforded by the Xeon-phi. I'd love to hear if anyone has more recent info on that.

-Levent.


Johannes Waldmann

Feb 21, 2017, 1:30:19 PM
to Levent Erkok, Haskell Cafe
Hi Levent,

> The whole point of the Xeon-phi is the availability of
> large-vector sized floating-point support and many-many cores.

Sure, that's what I'm contemplating - use the many options
of writing parallel and concurrent Haskell programs.

So, GHC's RTS should "just work"?

I was hoping someone already had actually seen this
on their machine.

Dominic Steinitz

Feb 22, 2017, 10:38:50 AM
to Haskell cafe, johannes...@htwk-leipzig.de
> Hi Levent,
>
>> The whole point of the Xeon-phi is the availability of
>> large-vector sized floating-point support and many-many cores.
>
> Sure, that's what I'm contemplating - use the many options
> of writing parallel and concurrent Haskell programs.
>
> So, GHC's RTS should "just work"?
>
> I was hoping someone already had actually seen this
> on their machine.
>
> - J.

I recall we thought about using a Xeon phi for our application but abandoned the attempt for reasons I cannot recall. I’ll see if I can find out.

But maybe you would be better off using something like accelerate-llvm? I think you wrote that your application was numerical.

BTW I am currently adding a small improvement to ghc floating point code generation, but I am beginning to think it’s going to be difficult to get ghc to generate really good numerical / floating point code, so I am experimenting with accelerate-llvm.

Dominic

Levent Erkok

Feb 23, 2017, 6:51:19 PM
to Johannes Waldmann, Haskell Cafe
Johannes:

I'm happy to report that I was able to do this experiment, and it indeed worked just fine. I compiled a toy program (along the lines of "hello world") using GHC-8.0.1, took the generated binary to a KNL machine, and ran it without any issues. I then repeated the same with a much bigger interactive Haskell program, and while I didn't test all aspects of it, I was able to start it on the KNL machine as well. (This latter program has quite a few dependencies on various Haskell libraries.) So, at least from those two experiments, I think there's a lot of hope that you can just copy over a GHC-generated binary and expect it to run unmodified.
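
For anyone wanting to repeat this, a toy program along these lines would do. This is a hypothetical sketch, not the exact program used in the experiment; the helper name `report` is made up. It prints the platform the binary was built for and the processor count seen at run time, so the output makes it obvious which machine it ran on.

```haskell
module Main (main) where

import Data.Version (showVersion)
import GHC.Conc (getNumProcessors)
import System.Info (arch, compilerName, compilerVersion, os)

-- | Format a one-line description of the platform and processor count.
report :: String -> String -> Int -> String
report a o n = a ++ "-" ++ o ++ ", " ++ show n ++ " processors"

main :: IO ()
main = do
  -- arch and os describe the platform the binary was compiled for;
  -- getNumProcessors is queried on the machine the binary runs on.
  n <- getNumProcessors
  putStrLn ("hello from " ++ report arch os n)
  putStrLn ("built with " ++ compilerName ++ " " ++ showVersion compilerVersion)
```

Build it on the regular x86_64 machine with `ghc -O2 Hello.hs`, copy the resulting binary over, and run it there.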

The machine I compiled it on has the following characteristics:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.0.1
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Stepping:              7
CPU MHz:               1200.000
BogoMIPS:              5199.87
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15

And the machine I ran it on (which doesn't have ghc installed):

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                68
On-line CPU(s) list:   0-67
Thread(s) per core:    1
Core(s) per socket:    68
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 87
Model name:            Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Stepping:              1
CPU MHz:               1400.000
BogoMIPS:              2793.32
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
NUMA node0 CPU(s):     0-67


So, it does appear that Intel's "binary-compatible" claim is indeed holding up. I'd be happy to do some "small" experiments if you're worried about some particular feature; let me know.

Cheers,

-Levent.