Hi guys.
I have an Intel Knights Landing (KNL) server and I have set up Slurm on it,
but I have failed to submit test jobs that use the high-bandwidth memory (MCDRAM).
I need your help.
When I run the command below, this is what appears in my slurmctld.log:
#srun --gres=hbm:1 numactl --membind=1 mpirun -np 24 osu_latency
===============================================================
[2017-03-02T17:19:17.426] slurmctld version 17.02.0-0pre4 started on cluster cluster
[2017-03-02T17:19:17.430] AllowMCDRAM=cache,hybrid,flat,equal,auto AllowNUMA=a2a,snc2,hemi,quad
[2017-03-02T17:19:17.430] AllowUserBoot=ALL
[2017-03-02T17:19:17.430] DefaultMCDRAM=flat DefaultNUMA=quad
[2017-03-02T17:19:17.431] McPath=/sys/devices/system/edac/mc
[2017-03-02T17:19:17.431] SyscfgPath=/usr/bin/syscfg/syscfg
[2017-03-02T17:19:17.431] UmeCheckInterval=0
[2017-03-02T17:19:17.434] layouts: no layout to initialize
[2017-03-02T17:19:17.451] layouts: loading entities/relations information
[2017-03-02T17:19:17.451] Recovered state of 1 nodes
[2017-03-02T17:19:17.452] Recovered information about 0 jobs
[2017-03-02T17:19:17.452] gres/hbm: state for knl02
[2017-03-02T17:19:17.452] gres_cnt found:TBD configured:0 avail:0 alloc:0
[2017-03-02T17:19:17.452] gres_bit_alloc:
[2017-03-02T17:19:17.452] gres_used:(null)
[2017-03-02T17:19:17.453] Recovered state of 0 reservations
[2017-03-02T17:19:17.453] _preserve_plugins: backup_controller not specified
[2017-03-02T17:19:17.453] Running as primary controller
[2017-03-02T17:19:17.454] No parameter for mcs plugin, default values set
[2017-03-02T17:19:17.454] mcs: MCSParameters = (null). ondemand set.
[2017-03-02T17:19:20.013] _update_node_avail_features: nodes knl02 available features set to: a2a,hemi,quad,snc2,snc4,cache,flat,hybrid,auto,knl
[2017-03-02T17:19:20.017] _update_node_active_features: nodes knl02 active features set to: quad,flat
[2017-03-02T17:19:20.017] gres/hbm: state for knl02
[2017-03-02T17:19:20.018] gres_cnt found:17179869184 configured:17179869184 avail:17179869184 alloc:0
[2017-03-02T17:19:20.018] gres_bit_alloc:
[2017-03-02T17:19:20.018] gres_used:(null)
[2017-03-02T17:19:20.462] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0
[2017-03-02T17:24:21.535] gres: hbm state for job 171
[2017-03-02T17:24:21.535] gres_cnt:1 node_cnt:0 type:(null)
[2017-03-02T17:24:21.535] error: gres/hbm: node knl02 gres bitmap size bad (0 < 17179869184)
==========================================================================================
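To put the numbers from the log in context, here is a small sketch of the arithmetic. It assumes the hbm count reported by Slurm is in bytes (I have not confirmed this); the variable names are only for illustration.

```shell
# Values copied from the slurmctld.log lines above.
hbm_avail=17179869184   # "avail:" field of gres/hbm on knl02
hbm_requested=1         # "gres_cnt:" field for job 171 (--gres=hbm:1)

# Assumption: the hbm count is in bytes, so convert it to GiB.
hbm_avail_gib=$((hbm_avail / 1024 / 1024 / 1024))

echo "available hbm: ${hbm_avail_gib} GiB"
echo "requested hbm: ${hbm_requested} (out of ${hbm_avail})"
```

If that assumption holds, the node advertises the full 16 GiB of MCDRAM, while my job asked for a count of only 1.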
slurm.conf
====================================
# LOGGING AND ACCOUNTING
#AccountingStorageType=accounting_storage/none
AccountingStorageType=accounting_storage/filetxt
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeFeaturesPlugins=knl_generic
DebugFlags=NodeFeatures,Gres
GresTypes=hbm
RebootProgram=/sbin/reboot
#Nodes
Nodename=knl02 Sockets=1 CoresPerSocket=68 ThreadsPerCore=4 RealMemory=95891 Feature=knl
PartitionName=hbm Default=YES MaxTime=INFINITE State=UP Nodes=knl02
#Auth
AuthType=auth/none
=======================================================
knl_generic.conf
======================================================
# Sample knl_generic.conf
SyscfgPath=/usr/bin/syscfg/syscfg
DefaultNUMA=quad # NUMA=all2all
AllowNUMA=quad,a2a,snc2,hemi
DefaultMCDRAM=flat # MCDRAM=cache
==========================================================
gres.conf
================================================
# Configure support
Name=hbm File=/dev/shm/mcdram
=================================================
What is the problem?
When I use the gres hbm option, I can run normally only with DDR memory, not with MCDRAM.
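In case it helps with the diagnosis, this is a rough sketch of how the NUMA layout can be checked on the node. It assumes that in quad/flat mode the MCDRAM shows up as a separate NUMA node without CPUs (node 1 on my system, which is why the command above uses --membind=1); numa_info is just an illustrative variable name.

```shell
# Print the NUMA layout so the MCDRAM node can be identified.
# Assumption: in quad/flat mode the MCDRAM is a CPU-less NUMA node (node 1 here).
if command -v numactl >/dev/null 2>&1; then
    # Capture the output; fall back to a message if the call fails.
    numa_info=$(numactl --hardware 2>&1) || numa_info="numactl --hardware failed"
else
    numa_info="numactl is not installed"
fi

echo "$numa_info"
```

The node sizes printed by numactl should show which node number is the 16 GB MCDRAM node before it is passed to --membind.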