Error running beagle_GPU Transposed


Atila Iamarino

Mar 14, 2012, 4:32:34 PM
to beagl...@googlegroups.com
Dear Rambaut,

Here is the output of 'nvidia-smi -q':

==============NVSMI LOG==============

Timestamp                       : Wed Mar 14 17:28:39 2012

Driver Version                  : 295.20

Attached GPUs                   : 1

GPU 0000:01:00.0
    Product Name                : GeForce GTX 480
    Display Mode                : N/A
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : N/A
    GPU UUID                    : N/A
    VBIOS Version               : 70.00.35.00.00
    Inforom Version
        OEM Object              : N/A
        ECC Object              : N/A
        Power Management Object : N/A
    PCI
        Bus                     : 0x01
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x06C010DE
        Bus Id                  : 0000:01:00.0
        Sub System Id           : 0x22801462
        GPU Link Info
            PCIe Generation
                Max             : N/A
                Current         : N/A
            Link Width
                Max             : N/A
                Current         : N/A
    Fan Speed                   : 40 %
    Performance State           : N/A
    Memory Usage
        Total                   : 1535 MB
        Used                    : 276 MB
        Free                    : 1259 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : N/A
        Memory                  : N/A
    Ecc Mode
        Current                 : N/A
        Pending                 : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
    Temperature
        Gpu                     : 35 C
    Power Readings
        Power Management        : N/A
        Power Draw              : N/A
        Power Limit             : N/A
    Clocks
        Graphics                : N/A
        SM                      : N/A
        Memory                  : N/A
    Max Clocks
        Graphics                : N/A
        SM                      : N/A
        Memory                  : N/A
    Compute Processes           : Not Supported
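
Most fields in this log are N/A, so it can help to filter them out and look only at what the driver actually reports. Below is a minimal Python sketch of that idea. The function name `parse_smi_fields` and the flat dictionary layout are illustrative assumptions, not part of nvidia-smi or BEAGLE, and because nested field names such as "Current" repeat in the log, only the first occurrence of each name is kept.

```python
def parse_smi_fields(text):
    """Return {field: value} for each 'name : value' line of nvidia-smi -q output."""
    fields = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # section headers such as "PCI" or "Memory Usage"
        name, value = line.split(" : ", 1)
        # Nested names repeat (e.g. "Current"), so keep the first occurrence only.
        fields.setdefault(name.strip(), value.strip())
    return fields


# A few lines copied from the log above.
sample = """\
Driver Version                  : 295.20
Attached GPUs                   : 1
    Product Name                : GeForce GTX 480
    Performance State           : N/A
    Memory Usage
        Total                   : 1535 MB
        Used                    : 276 MB
        Free                    : 1259 MB
    Compute Mode                : Default
"""

fields = parse_smi_fields(sample)
reported = {k: v for k, v in fields.items() if v != "N/A"}
for name, value in reported.items():
    print(name, "=", value)
```

For this log, the fields that survive the filter are essentially the driver version, product name, memory usage, fan speed, temperature and compute mode.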


Thank you,
Atila

On Wednesday, March 14, 2012 at 06:08:18 UTC-3, rambaut wrote:
Dear Atila,

This sounds like one of two things - either an issue with the CUDA driver you have installed or a physical problem with your GPU board. If you have installed CUDA tools, try running 'nvidia-smi -q' to get a detailed probe of your GPU board.

Andrew

On 13 Mar 2012, at 15:09, Atila Iamarino wrote:

> Dear Rambaut,

> Yes, it works with -beagle_cpu and -beagle_sse. Also, my GPU is recognized, but I checked the BEAGLE installation and it fails its tests. I have forwarded the error message to the beagle-dev group.

> Thank you,
> Atila

> On Tuesday, March 13, 2012 at 07:43:51 UTC-3, rambaut wrote:
> Dear Atila,
> To answer your last question, it looks like it is a problem in BEAGLE. Firstly, does it work if you use -beagle_cpu or -beagle_sse? If so, it may be an issue with CUDA. Have you installed the latest CUDA drivers? If you run the 'nvidia-smi' utility does it recognize your GPU?

> Andrew

> On 12 Mar 2012, at 21:34, Atila wrote:

> > Hi,
> > 
> > I'm trying to run BEAST with CUDA. I have tried
> > ../../bin/beast -beagle_GPU -overwrite benchmark2.xml
> > with both BEAST 1.6.2 and 1.7.0 and get the following error:
> > 
> > 
> > Using BEAGLE TreeLikelihood
> >  Branch rate model used: strictClockBranchRates
> >  Using BEAGLE resource 1: GeForce GTX 480
> >    Global memory (MB): 1536
> >    Clock speed (Ghz): 1.50
> >    Number of cores: 480
> >    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH
> > EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_NONE THREADING_NONE
> > PROCESSOR_GPU
> >  Ignoring ambiguities in tree likelihood.
> >  With 5565 unique site patterns.
> >  Using rescaling scheme : dynamic
> > Creating the MCMC chain:
> >  chainLength=100000
> >  autoOptimize=true
> >  autoOptimize delayed for 1000 steps
> >  full evaluation test off
> > #
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x00007fb9f020d83d, pid=23711,
> > tid=140436735981312
> > #
> > # JRE version: 7.0_03-b04
> > # Java VM: Java HotSpot(TM) 64-Bit Server VM (22.1-b02 mixed mode
> > linux-amd64 compressed oops)
> > # Problematic frame:
> > # C  [libhmsbeagle-cuda.so.1+0x1283d]
> > beagle::gpu::BeagleGPUImpl<double>::updateTransitionMatrices(int, int
> > const*, int const*, int const*, double const*, int)+0xed
> > #
> > # Failed to write core dump. Core dumps have been disabled. To enable
> > core dumping, try "ulimit -c unlimited" before starting Java again
> > #
> > # An error report file with more information is saved as:
> > # /home/atila/applications/BEASTv1.7.pre/examples/Benchmarks/
> > hs_err_pid23711.log
> > #
> > # If you would like to submit a bug report, please visit:
> > #   http://bugreport.sun.com/bugreport/crash.jsp
> > 
> > I also tried OpenJDK, and the error only changes to
> > 
> > CUDA error: "Invalid value" from file <GPUInterfaceCUDA.cpp>, line
> > 485.
> > 
> > and then:
> > 
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x00007f5871d38b50, pid=23950,
> > tid=140018069825280
> > #
> > # JRE version: 6.0_23-b23
> > # Java VM: OpenJDK 64-Bit Server VM (20.0-b11 mixed mode linux-amd64
> > compressed oops)
> > # Derivative: IcedTea6 1.11pre
> > # Distribution: Ubuntu 11.10, package 6b23~pre11-0ubuntu1.11.10.2
> > # Problematic frame:
> > # C  [libhmsbeagle-cuda.so.1+0x11b50]
> > beagle::gpu::BeagleGPUImpl<float>::removeScaleFactors(int const*, int,
> > int)+0x30
> > 
> > Is this a problem with BEAST or beagle-lib?
> > 
> > Thank you,
> > Atila Iamarino
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups "beast-users" group.
> > To post to this group, send email to beast...@googlegroups.com.
> > To unsubscribe from this group, send email to beast-users+unsubscribe@googlegroups.com.
> > For more options, visit this group at http://groups.google.com/group/beast-users?hl=en.
> >


___________________________________________________________________
  Andrew Rambaut                
  Institute of Evolutionary Biology       University of Edinburgh
  Ashworth Laboratories                         Edinburgh EH9 3JT
  EMAIL - a.ra...@ed.ac.uk                TEL - +44 131 6508624          

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Atila Iamarino

Mar 15, 2012, 2:06:58 PM
to beagl...@googlegroups.com
I am running memtestG80 to see whether it is a CUDA or a GPU issue, and the GPU seems fine:

Test iteration 54 (GPU 0, 1024 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (36 ms)
Memtest86 Walking 8-bit: 0 errors (283 ms)
True Walking zeros (8-bit): 0 errors (139 ms)
True Walking ones (8-bit): 0 errors (140 ms)
Moving Inversions (random): 0 errors (36 ms)
Memtest86 Walking zeros (32-bit): 0 errors (558 ms)
Memtest86 Walking ones (32-bit): 0 errors (557 ms)
Random blocks: 0 errors (271 ms)
Memtest86 Modulo-20: 0 errors (1830 ms)
Logic (one iteration): 0 errors (21 ms)
Logic (4 iterations): 0 errors (38 ms)
Logic (shared memory, one iteration): 0 errors (24 ms)
Logic (shared-memory, 4 iterations): 0 errors (52 ms)
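
When memtestG80 runs for many iterations, checking every line by eye gets tedious. As a rough sketch, the per-test counts can be summed from a saved log. The helper `total_errors` is a hypothetical name, and the pattern assumes the "N errors (T ms)" line format shown above.

```python
import re


def total_errors(log):
    """Sum the per-test error counts in memtestG80-style output."""
    # Matches lines such as "Random blocks: 0 errors (271 ms)".
    # The "... 0 errors so far" summary line has no timing and is skipped.
    counts = re.findall(r":\s*(\d+) errors \(\d+ ms\)", log)
    return sum(int(c) for c in counts)


# A few lines copied from the output above.
sample = """\
Test iteration 54 (GPU 0, 1024 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (36 ms)
Random blocks: 0 errors (271 ms)
Memtest86 Modulo-20: 0 errors (1830 ms)
"""

print(total_errors(sample))
```

A non-zero total would point toward the hardware rather than the CUDA installation.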

Atila Iamarino

Mar 15, 2012, 4:27:27 PM
to beagl...@googlegroups.com
OK, after a lot of trouble reinstalling the NVIDIA drivers, dev drivers, and toolkit on Ubuntu 11.10, I got the SDK to work. However, beagle-lib still fails 'make check':

./genomictest.sh: line 2: 14892 Segmentation fault      ./genomictest --states 64 --sites 100 --taxa 10
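
To re-run the failing test outside 'make check' and record how it died, one option is a small Python wrapper. On POSIX systems, `subprocess` reports a process killed by a signal as a negative return code, so a segmentation fault shows up as -11. This is only a sketch: `describe_exit` is a made-up helper, and the commented-out call assumes genomictest sits in the current directory and takes the flags shown in the line above.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess return code; negative values mean 'killed by signal'."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Hypothetical usage (assumes ./genomictest exists in the current directory):
#   import subprocess
#   result = subprocess.run(
#       ["./genomictest", "--states", "64", "--sites", "100", "--taxa", "10"]
#   )
#   print(describe_exit(result.returncode))

# Demonstration with the return code a segmentation fault would produce.
print(describe_exit(-signal.SIGSEGV))
```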

Here is the output of NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery:

 Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce GTX 480"
  CUDA Driver Version / Runtime Version          4.1 / 4.1
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 1536 MBytes (1610153984 bytes)
  (15) Multiprocessors x (32) CUDA Cores/MP:     480 CUDA Cores
  GPU Clock Speed:                               1.50 GHz
  Memory Clock rate:                             2000.00 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.1, CUDA Runtime Version = 4.1, NumDevs = 1, Device = GeForce GTX 480
[deviceQuery] test results...
PASSED
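
One thing worth confirming from this output is that the driver is at least as new as the runtime, since a CUDA runtime generally needs a driver of the same or a later version. A small sketch of that check follows. The function name `cuda_versions` is an assumption, and the pattern relies on the "Driver Version / Runtime Version" line format printed by this particular deviceQuery build.

```python
import re


def cuda_versions(device_query_output):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor)) or None."""
    m = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\d+)\.(\d+) / (\d+)\.(\d+)",
        device_query_output,
    )
    if m is None:
        return None
    d_major, d_minor, r_major, r_minor = map(int, m.groups())
    return (d_major, d_minor), (r_major, r_minor)


# The relevant line copied from the deviceQuery output above.
sample = "  CUDA Driver Version / Runtime Version          4.1 / 4.1\n"

driver, runtime = cuda_versions(sample)
# Tuples compare element by element, so this is a proper version comparison.
print("driver >= runtime:", driver >= runtime)
```

Here both are 4.1, so a driver/runtime mismatch does not look like the cause.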

Andrew Rambaut

Mar 16, 2012, 3:29:18 AM
to beagl...@googlegroups.com
Hi Atila,

Do the other programs in the SDK work? Try a simple one like 'matrixMul' or 'matrixMulDrv'.

Andrew
