How I installed PyFR 1.6.0 with CUDA backend (and dependencies) on Windows 10

Nolan Dyck

Jun 30, 2017, 4:39:58 PM
to PyFR Mailing List
Hello PyFR community,

I recently met Brian Vermeire at a conference, who told me about PyFR and all its sleek features. I'm having some issues with memory allocation and poor CPU usage when running large ANSYS CFX simulations on a SHARCNET computing cluster, and I wanted to take PyFR for a test drive as a possible substitute. The key here is that I want to test out a bunch of stuff on my local machine before I decide to submit a formal request to have it installed on an HPC cluster.

I very quickly got the impression when browsing the website that PyFR is Linux focused (for good reason too!). So I tried installing PyFR (and dependencies) on Ubuntu (Xenial) running as a guest OS in VirtualBox hosted by Windows 7. Since my work PC is equipped with a GTX 460, it made sense to try the CUDA backend. Everything went smoothly until the part where I had to install CUDA. Since the guest OS doesn't have direct access to the underlying hardware, a workaround is needed: PCI-passthrough. There's not a lot of clear information on how this can be applied to the CUDA library, and I didn't see anyone else who has done this with PyFR, so I decided to try Windows instead.

I decided to try it on my home PC (because why not ;) ), equipped with AMD Ryzen 1600X and Nvidia GeForce GTX 1060 SC running Windows 10.

Installation Steps
Below is a summary of (from my experience) the installation steps necessary to install PyFR on Windows 10:

KEEP IN MIND:
-install/build 64-bit applications and libraries wherever possible.
-test software / module installations as you go.
-After doing this, I discovered Microsoft now supports a package manager called vcpkg, which can be used like apt-get on Ubuntu. I tested it with box2d, Lua, and METIS all of which downloaded and built without issue! Also on the list of packages is MS-MPI and CUDA (all latest versions). 
  1. Download and install Visual Studio 2015 / 2017 (I already had 2017 installed, community editions will probably suffice).
  2. Download and install the Visual Studio 2015 Redistributable packages (I don't think it's necessary if you installed VS2015). This is necessary because we need to install a 64-bit version of Python (see step []), and Python 3.5 and 3.6 are the first two versions which are distributed in 64-bit flavors on Windows.* This is also necessary for CUDA (and MS-MPI I think).
  3. Download and install Microsoft MPI**. You'll need both the library and the executables. Here is the link to version 8.1
  4. Add the directory where the executables were installed to your PATH.
  5. Download and install CUDA (I installed version 8.0.61). Make sure that you let CUDA install its own graphics drivers using the express install (lest you run into the issue in this thread). No need for Visual Studio Integration.
  6. Navigate to %CUDA_PATH%\bin\ and make a copy of the file cublas64_80.dll, name the copy cublas.dll (PyFR looks for cublas.dll, and we don't want to disappoint :) )
  7. Download and install a 64-bit version of Python 3.5+ (I got 3.6.1 from here).
  8. Add <PYTHON_ROOT>\Scripts to your PATH.
  9. Install the following modules using pip, letting it find and install dependencies as necessary:
    1. Install numpy from here (I also want to use scipy for other projects, but scipy depends on the Intel Math Kernel Library dependent functions in numpy).
    2. Install mpi4py (allow pip to find online). I didn't have any problems on my home PC, but on my work PC I had to edit the configuration file (C:\Program Files\Python36\Lib\site-packages\mpi4py\mpi.cfg) to point to my MS-MPI library and executable directories.
    3. Install pycuda from here (I initially tried to install using pip, but there is a strange issue where `import pycuda.autoinit` causes Python to crash).
    4. Install pyfr (allow pip to find online).
  10. Test the couette_flow_2d example in pyfr! The example cases aren't included in the installed pyfr module, so just download the version from the PyFR website, and follow the instructions at the bottom of the User Guide page. Hopefully it works!
  11. Build METIS:
    1. Download and install cmake (I got version 3.9.0-rc5)
    2. Download and unpack METIS (I got version 5.1.0)
    3. Follow the cmake-gui option in the BUILD-Windows instructions, and tick the SHARED checkbox before hitting Generate (PyFR needs the shared library or .dll file, not the static .lib)
  12. Create a new environment variable called PYFR_METIS_LIBRARY_PATH, and set the value to the fully qualified path of the METIS .dll (e.g. C:/Program Files/METIS/metis.dll). PyFR looks using this environment variable before searching anywhere else.
  13. This step involves editing the PyFR installed source (hopefully it will be obsolete soon). For Python >= 3.5 the ctypes module is unable to find the Visual Studio C Runtime Library using the find_msvcrt() function (see this bug report), and it looks like the method for accessing those standard libraries in Windows has changed substantially. I just messed around with the ctypes module until I was able to access the required function (fflush). The result is a tweaked constructor for the `Silence` object in util.py (see below).
  14. Run the euler_vortex_2d and inc_cylinder_2d examples. Visualize them in Paraview if desired.
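Several of the steps above only set up paths and files, so they can be sanity-checked before PyFR is run. The following is a minimal sketch and not part of PyFR: the function name `preflight` is hypothetical, and it assumes that the CUDA installer defined `CUDA_PATH` (as step 6 implies) and that `mpiexec` and `pip` are the executables exposed by steps 4 and 8. It is separate from the util.py change of step 13, which follows.

```python
import os
import shutil


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of PyFR. `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []

    # Steps 4 and 8: mpiexec and pip should be reachable through PATH.
    for exe in ('mpiexec', 'pip'):
        if shutil.which(exe, path=env.get('PATH', '')) is None:
            problems.append(f'{exe} not found on PATH')

    # Step 6: the renamed copy of the cuBLAS DLL should sit in CUDA_PATH\bin.
    cuda_path = env.get('CUDA_PATH')
    if not cuda_path:
        problems.append('CUDA_PATH is not set')
    elif not os.path.isfile(os.path.join(cuda_path, 'bin', 'cublas.dll')):
        problems.append('cublas.dll not found under CUDA_PATH/bin')

    # Step 12: the variable should hold the full path of the METIS DLL.
    metis = env.get('PYFR_METIS_LIBRARY_PATH')
    if not metis:
        problems.append('PYFR_METIS_LIBRARY_PATH is not set')
    elif not os.path.isfile(metis):
        problems.append('PYFR_METIS_LIBRARY_PATH does not point to a file')

    return problems


# Report anything that looks wrong in the current environment.
for problem in preflight():
    print('WARNING:', problem)
```

An empty result only means the files are where the steps put them; it does not prove that the libraries load or that the GPU works.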
    def __init__(self, stdout=os.devnull, stderr=os.devnull):
        import sys

        self.outfiles = stdout, stderr
        self.combine = (stdout == stderr)

        if sys.platform == 'win32':
            import ctypes
            self.libc_fflush = ctypes.windll.msvcrt.fflush
        else:
            self.libc_fflush = CDLL(find_libc()).fflush

        self.libc_fflush.argtypes = [c_void_p]

Hopefully the creators can implement this change a little more elegantly than I can.
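The platform switch in that constructor can be tried on its own, outside PyFR. This is a minimal sketch under a few assumptions: `load_fflush` is a hypothetical name, it uses `ctypes.cdll` (cdecl) instead of `ctypes.windll` (stdcall) because C runtime functions use the cdecl convention, and it relies on `msvcrt.dll` being loadable by name on Windows. Note that `msvcrt.dll` is not necessarily the same C runtime that Python or other loaded libraries use, so whether it flushes the streams PyFR cares about has not been verified here.

```python
import ctypes
import ctypes.util
import sys


def load_fflush():
    """Return the C runtime's fflush wrapped as a ctypes function."""
    if sys.platform == 'win32':
        # cdecl loader; msvcrt.dll is found through the normal DLL search.
        libc = ctypes.cdll.msvcrt
    else:
        # find_library may return None, in which case CDLL(None) exposes
        # the symbols already loaded into the current process.
        libc = ctypes.CDLL(ctypes.util.find_library('c'))

    fflush = libc.fflush
    fflush.argtypes = [ctypes.c_void_p]
    fflush.restype = ctypes.c_int
    return fflush


fflush = load_fflush()

# Passing NULL asks the C runtime to flush all of its open output streams;
# a return value of 0 indicates success.
status = fflush(None)
print('fflush returned', status)
```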

Notes:
* The PyFR website currently says that a 64-bit version of python is required because of a bug in numpy. I'm not sure what that bug is. Even if that bug is fixed, we still need the 64-bit version of python. This is because, as of CUDA version 7.0 the 32-bit version of cublas is no longer supported on Windows (it's even deprecated on Linux!), and to work with the 64-bit cublas dll in PyFR, the python installation needs to be 64-bit. I guess everything could work with 32-bit if you used CUDA <= 6.X?
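Whether a given interpreter is the 64-bit build can be checked from Python itself. A minimal sketch using only the standard library (the function name `pointer_bits` is made up for illustration):

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # 'P' is the struct format code for a native pointer (void *).
    return struct.calcsize('P') * 8


bits = pointer_bits()
print(f'This is a {bits}-bit Python ({sys.version.split()[0]})')
if bits != 64:
    print('A 64-bit build is needed to load the 64-bit cuBLAS DLL')
```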

** I've looked into several versions of MPI for Windows:
- OpenMPI: Hasn't supported Windows since version 1.6.5, which means it doesn't meet PyFR's requirements. I even downloaded it and tried to see if I could build it myself, but it has a heavy dependence on make+unix commands
- IntelMPI: Starts from $499... Nope nope nope. I got a free trial which I'm going to test out anyway (on my work PC which has an i7)
- IBM Platform MPI is the poor man's IBM Spectrum MPI. Spectrum MPI is cuda-aware, and I'll find out soon enough if Platform MPI is as well
- MS-MPI: easy to install, free, and implements the MPI 2.? standard according to Wikipedia.

Questions
Anything missing from the instructions or other hangups people encountered?
Has anyone tried the PCI-Passthrough for their VM?

Next Steps
Next I'm going to try and fix the weird command-line printout while running (see below).

Also, I'd like to do some informal benchmarking on my system. All I can say right now is that the first two examples take a few minutes and the last one takes ~20 minutes or so? I'll probably create a separate thread for that.




Brian Vermeire

Jul 1, 2017, 10:36:51 AM
to Nolan Dyck, PyFR Mailing List
Hi Nolan,

Thank you for providing your experience and guide for installing PyFR on Windows. 

One thing to be aware of, the GTX 1060 has low double precision compute capability (as do nearly all of the GeForce series cards). If you have access to a Tesla GPU you will likely see a very significant decrease in runtime for the example cases.

You could also try switching to single precision in the .ini file, which should run much faster on your system for trying PyFR initially (as the 1060 has good single precision compute). However, this comes at the cost of losing double precision accuracy, so I don't recommend it for production simulations.
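For reference, the switch can also be made programmatically. The sketch below assumes that the setting is the `precision` key in the `[backend]` section, as in the example .ini files that ship with PyFR; `set_precision` is a hypothetical helper, and the sample text is illustrative rather than a complete case file. Note that `configparser` drops comments when it rewrites a file, so editing the line by hand is just as reasonable.

```python
import configparser
import io


def set_precision(ini_text, precision):
    """Return ini_text with [backend] precision set to the given value."""
    if precision not in ('single', 'double'):
        raise ValueError('precision must be "single" or "double"')

    # Disable interpolation so that any '%' in expressions is left alone,
    # and keep option names exactly as written.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str
    cfg.read_string(ini_text)

    if not cfg.has_section('backend'):
        cfg.add_section('backend')
    cfg.set('backend', 'precision', precision)

    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


# Illustrative fragment only; a real case file has many more sections.
example = """\
[backend]
precision = double
rank-allocator = linear
"""

updated = set_precision(example, 'single')
print(updated)
```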

Cheers,

--
You received this message because you are subscribed to the Google Groups "PyFR Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyfrmailinglist+unsubscribe@googlegroups.com.
To post to this group, send email to pyfrmailinglist@googlegroups.com.
Visit this group at https://groups.google.com/group/pyfrmailinglist.
For more options, visit https://groups.google.com/d/optout.



--

Brian

nnunn

Jul 1, 2017, 6:51:05 PM
to PyFR Mailing List
Hi Nolan Dyck - thanks for the helpful setup instructions!
Re: "weird command-line printout...", I run Visual Studio 2015 on Win7, and saw that sort of output when stepping under Visual Studio debugger.
For normal runs, I get a neat console-mode "progress bar" by adding the following block near the end of pyfr/progress_bar.py:

        # Write the progress bar and pad the remaining columns
        if sys.platform == 'win32':
            #  NN: for windows console
            sys.stdout.write('\b' * 80)
            sys.stdout.write('\r')
            sys.stdout.write(s)
            sys.stdout.flush()
        else:
            sys.stderr.write('\x1b[2K\x1b[G')
            sys.stderr.write(s)
            sys.stderr.flush()

        # Update the last render time
        self._last_wallt = wallt


Nigel



Nolan Dyck

Jul 2, 2017, 6:24:46 PM
to PyFR Mailing List
Brian,

Thanks for the info! I changed the precision to 'single' in the .ini file, and the Couette flow took slightly longer than the double precision case (5:39 vs 5:43)! I don't really know what's going on right now so I'm going to start by testing my CUDA installation, pycuda, and pyfr, respectively to see what the issue is. Do you have any ideas off the top of your head?

Considering that you could trade 4 Nvidia Tesla P100 cards for a Tesla Model 3 at current market values, I don't think there will be any just lying around at school. However, if I can get some of my cases working, the new graham cluster is equipped with a bunch of heterogeneous computing nodes with Tesla P100s.

Nolan

Nolan Dyck

Jul 2, 2017, 6:56:01 PM
to PyFR Mailing List
Nigel,

Thanks for the code snippet. I pasted it in, and it didn't work right away. There's an extra space printed at the beginning of each line for some reason, and I couldn't figure out where it was being printed. I ended up just shortening the expected column width by 1 by adding the next couple of lines to the initialization function.

        self._ncol = shutil.get_terminal_size()[0] or 80
        if sys.platform == 'win32':
            self._ncol -= 1

Also, I didn't need the sys.stdout.write('\b' * 80) line.
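The two workarounds in this exchange (redrawing with a carriage return and reserving one column on Windows) can be tried outside PyFR. This is a standalone sketch, not PyFR's progress bar: `usable_columns` and `render_bar` are hypothetical names. One plausible explanation for the stray space, not confirmed in the thread, is that writing into the last console column moves the cursor to the next line, which the one-column margin avoids.

```python
import shutil
import sys
import time


def usable_columns(platform=sys.platform):
    """Terminal width, minus one column on Windows consoles."""
    ncol = shutil.get_terminal_size()[0] or 80
    if platform == 'win32':
        ncol -= 1
    return ncol


def render_bar(fraction, ncol):
    """Return a bar such as ' 50% [=====     ]' that is ncol characters wide."""
    fraction = min(max(fraction, 0.0), 1.0)
    label = f'{fraction:4.0%} '
    width = max(ncol - len(label) - 2, 0)   # 2 accounts for the brackets
    filled = int(round(fraction * width))
    return label + '[' + '=' * filled + ' ' * (width - filled) + ']'


# Redraw the same line a few times; '\r' returns to column 0 without
# relying on VT100 escape sequences.
for step in range(5):
    sys.stdout.write('\r' + render_bar(step / 4, usable_columns()))
    sys.stdout.flush()
    time.sleep(0.05)
sys.stdout.write('\n')
```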

Nolan

Vincent, Peter E

Jul 2, 2017, 7:09:31 PM
to Nolan Dyck, PyFR Mailing List
Hi Nolan,

The (2D) test cases that ship with PyFR are all pathological in terms of their ability to assess performance since they are so small.

You should try using one of the bigger 3D cases from one of our recent papers (setup files should be available as supplementary material)

Thanks for your interest in PyFR!

Peter

Sent from my iPhone

Freddie Witherden

Jul 2, 2017, 7:20:29 PM
to pyfrmai...@googlegroups.com
Hi all,

On 02/07/2017 15:56, Nolan Dyck wrote:
> Thanks for the code snippet. I pasted it in, and it didn't work right
> away. There's an extra space printed at the beginning of each line for
> some reason, and I couldn't figure out where it was being printed. I
> ended up just shortening the expected column width by 1 by adding the
> next couple lines to the initialization function.


The progress bar requires a console which understands VT100 escape
sequences, for these are the only way to get the bar to redraw reliably
(everything else is somewhat fragile and liable to break). Windows has
traditionally not supported such sequences, although it does appear as
if the most recent builds of Windows 10 do. However, in order for them
to function one needs to call into the Win32 API to enable them. See:

<https://msdn.microsoft.com/en-us/library/windows/desktop/mt638032(v=vs.85).aspx>

I'd be happy to take a patch that takes care of this (but it is
important that this does not cause issues for those who are not yet on
Windows 10 or running PyFR via other means).
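As a starting point for such a patch, the Win32 call can be made through ctypes. This is a minimal sketch and not a tested PyFR change: `enable_vt_sequences` is a hypothetical name, the constants are the documented values for `STD_ERROR_HANDLE` and `ENABLE_VIRTUAL_TERMINAL_PROCESSING`, and stderr is targeted on the assumption that this is where the escape sequences are written. On older Windows versions `SetConsoleMode` is expected to reject the flag, in which case the function returns False and the caller can fall back to plain output.

```python
import ctypes
import sys

STD_ERROR_HANDLE = 0xFFFFFFF4                # (DWORD)-12
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004


def enable_vt_sequences(std_handle=STD_ERROR_HANDLE):
    """Try to enable VT100 escape handling; return True on success."""
    if sys.platform != 'win32':
        # Assumption: terminals on other platforms already handle them.
        return True

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE,
                                        ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetConsoleMode.restype = wintypes.BOOL
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetConsoleMode.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(std_handle)
    if handle is None or handle == ctypes.c_void_p(-1).value:
        return False  # no handle, or INVALID_HANDLE_VALUE

    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # not a console, e.g. output redirected to a file

    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))


vt_ok = enable_vt_sequences()
print('VT100 sequences available:', vt_ok)
```

Because every failure path returns False rather than raising, callers on older Windows builds or with redirected output are left unaffected.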

Regards, Freddie.
