Folks,
HAIL has seemed goofy these past years. Can someone help me out?
Ganeti Version: 3.0.1-1~ubuntu20.04+1
Our typical Ganeti node has:
- 512GB of RAM
- 4x 2TB SSDs
Our "medium" Ganeti instance:
- 16GB RAM
- 8 vcpus
- 128GB disk via drbd
When I run hspace, I consistently get "FailDisk", even though every node has plenty of free disk. Here's a smaller cluster:
15:23 djh@dr64-tomsk ~> sudo gnt-node list
Node       DTotal DFree MTotal  MNode  MFree Pinst Sinst
dr64-tomsk   7.0T  5.2T 376.5G 150.0G 220.5G    18    16
dr64-wuhan   7.0T  5.0T 376.5G 202.2G 237.4G    16    18
dr64-yalta   7.0T  5.2T 376.5G 140.4G 231.1G    16    16
15:25 djh@dr64-tomsk ~> sudo hspace --standard-alloc 128G,8G,8 --disk-template drbd -L -p
Initial cluster status:
F Name t_mem n_mem i_mem x_mem f_mem u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt p_fmem p_fdsk r_cpu lCpu lMem lDsk lNet
dr64-tomsk 385580 4096 241664 -86022 225842 139820 122880 7151 5333 48 168 18 16 0.3626 0.7457 3.50 18.000 18.000 34.000 18.000
dr64-wuhan 385580 4096 192512 -54101 243073 188972 122880 7151 5157 48 142 16 18 0.4901 0.7211 2.96 16.000 16.000 34.000 16.000
dr64-yalta 385580 4096 217088 -72213 236609 164396 122880 7151 5363 48 154 16 16 0.4264 0.7500 3.21 16.000 16.000 32.000 16.000
The cluster has 3 nodes and the following resources:
MEM 1156740, DSK 21968640, CPU 144, VCPU 576.
There are 50 initial instances on the cluster.
Tiered (initial size) instance spec is:
MEM 98304, DSK 1048576, CPU 32, using disk template 'drbd'.
Tiered allocation status:
F Name t_mem n_mem i_mem x_mem f_mem u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt p_fmem p_fdsk r_cpu lCpu lMem lDsk lNet
dr64-tomsk 385580 4096 241664 -86022 225842 139820 122880 7151 5333 48 168 18 16 0.3626 0.7457 3.50 18.000 18.000 34.000 18.000
dr64-wuhan 385580 4096 192512 -54101 243073 188972 122880 7151 5157 48 142 16 18 0.4901 0.7211 2.96 16.000 16.000 34.000 16.000
dr64-yalta 385580 4096 217088 -72213 236609 164396 122880 7151 5363 48 154 16 16 0.4264 0.7500 3.21 16.000 16.000 32.000 16.000
Tiered allocation results:
- no instances allocated
- most likely failure reason: FailDisk
- initial cluster score: 4.31138397
- final cluster score: 4.31138397
- memory usage efficiency: 56.30%
- disk usage efficiency: 26.10%
- vcpu usage efficiency: 80.56%
Standard (fixed-size) instance spec is:
MEM 7629, DSK 122070, CPU 8, using disk template 'drbd'.
Standard allocation status:
F Name t_mem n_mem i_mem x_mem f_mem u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt p_fmem p_fdsk r_cpu lCpu lMem lDsk lNet
dr64-tomsk 385580 4096 241664 -86022 225842 139820 122880 7151 5333 48 168 18 16 0.3626 0.7457 3.50 18.000 18.000 34.000 18.000
dr64-wuhan 385580 4096 192512 -54101 243073 188972 122880 7151 5157 48 142 16 18 0.4901 0.7211 2.96 16.000 16.000 34.000 16.000
dr64-yalta 385580 4096 217088 -72213 236609 164396 122880 7151 5363 48 154 16 16 0.4264 0.7500 3.21 16.000 16.000 32.000 16.000
Normal (fixed-size) allocation results:
- 0 instances allocated
- most likely failure reason: FailDisk
- initial cluster score: 4.31138397
- final cluster score: 4.31138397
- memory usage efficiency: 56.30%
- disk usage efficiency: 26.10%
- vcpu usage efficiency: 80.56%
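A quick sanity check of the units in the output above, as a sketch only. It assumes (inferred from the numbers, not confirmed from the hspace documentation) that an upper-case G in --standard-alloc means 10^9 bytes, that the instance spec line is printed in MiB, and that the f_dsk column is in GiB. The last assumption is at least consistent with the totals: 3 x 7151 x 1024 is roughly the DSK 21968640 shown. The helper name si_gigabytes_to_mib is made up for this example.

```python
# Sanity check of the disk numbers in the hspace output for the small cluster.
# Assumptions (inferred from the output, not from the hspace docs):
#   - an upper-case "G" suffix means 10**9 bytes
#   - the instance spec line is printed in MiB
#   - the f_dsk column of the -p table is in GiB

MIB = 2**20


def si_gigabytes_to_mib(gigabytes):
    """Convert SI gigabytes (10**9 bytes) to whole MiB, rounding down."""
    return gigabytes * 10**9 // MIB


requested_disk_mib = si_gigabytes_to_mib(128)  # --standard-alloc 128G,...
requested_mem_mib = si_gigabytes_to_mib(8)     # --standard-alloc ...,8G,...

# f_dsk per node, copied from the table above.
free_disk_gib = {"dr64-tomsk": 5333, "dr64-wuhan": 5157, "dr64-yalta": 5363}

# How many disks of the requested size would fit on each node, by raw
# free space alone (ignoring every other constraint hspace applies).
fits = {
    node: free_gib * 1024 // requested_disk_mib
    for node, free_gib in free_disk_gib.items()
}

print(requested_disk_mib, requested_mem_mib)
for node, count in fits.items():
    print(node, count)
```

The first line printed is 122070 7629, which matches the "MEM 7629, DSK 122070" spec line. By raw free space, each node could hold dozens of disks of that size, so FailDisk is not explained by free space alone under these assumptions.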
For a larger cluster:
djh@dr64-frown ~> sudo gnt-node list
Node       DTotal DFree MTotal  MNode  MFree Pinst Sinst
dr64-frown   7.0T  4.2T 503.3G 328.9G 220.2G    19    20
dr64-india   7.0T  4.6T 503.3G 305.8G 212.6G    22    18
dr64-macau   7.0T  4.4T 503.3G 188.0G 297.7G    12    28
dr64-malta   7.0T  4.1T 503.3G 293.9G 206.5G    23    16
dr64-mauve   7.0T  4.2T 503.3G 255.9G 214.5G    22    17
dr64-mocha   7.0T  4.5T 503.3G 263.2G 222.7G    27    12
dr64-nauru   7.0T  3.6T 503.3G 143.9G 314.2G    13    27
dr64-nepal   7.0T  3.7T 503.3G 209.2G 298.5G    14    25
dr64-samoa   7.0T  4.4T 503.3G 148.9G 327.4G    23    17
dr64-twice   7.0T  4.2T 503.4G 276.2G 208.6G    22    17
djh@dr64-frown ~> sudo hspace --standard-alloc 64G,8G,4 --disk-template drbd -L -p
Warning: cluster has inconsistent data:
- node dr64-mocha is missing -76829 MB ram and 184 GB disk
Initial cluster status:
F Name t_mem n_mem i_mem x_mem f_mem u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt p_fmem p_fdsk r_cpu lCpu lMem lDsk lNet
dr64-frown 515429 4096 389120 -95135 217348 122213 81920 7154 4300 48 248 20 19 0.2371 0.6012 5.17 20.000 20.000 39.000 20.000
dr64-india 515429 4096 368640 -74678 217371 142693 86016 7154 4696 48 226 22 18 0.2768 0.6565 4.71 22.000 22.000 40.000 22.000
dr64-macau 515429 4096 290816 -91818 312335 220517 176128 7154 4554 48 190 11 29 0.4278 0.6367 3.96 11.000 11.000 40.000 11.000
dr64-malta 515429 4096 372736 -73258 211855 138597 73728 7154 4230 48 231 23 16 0.2689 0.5913 4.81 23.000 23.000 39.000 23.000
dr64-mauve 515429 4096 348160 -56503 219676 163173 81920 7154 4252 48 218 22 17 0.3166 0.5945 4.54 22.000 22.000 39.000 22.000
dr64-mocha 515429 4096 360448 -76829 227714 150885 98304 7154 4440 48 226 27 12 0.2927 0.6207 4.71 27.000 27.000 39.000 27.000
dr64-nauru 515429 4096 270336 -88236 329233 240997 131072 7154 3681 48 180 12 27 0.4676 0.5147 3.75 12.000 12.000 39.000 12.000
dr64-nepal 515429 4096 286720 -81212 305825 224613 131072 7154 3752 48 182 14 25 0.4358 0.5246 3.79 14.000 14.000 39.000 14.000
dr64-samoa 515429 4096 360448 -176053 326938 150885 73728 7154 4488 48 224 24 16 0.2927 0.6274 4.67 24.000 24.000 40.000 24.000
dr64-twice 515442 4096 389120 -91463 213689 122226 73728 7154 4266 48 224 22 18 0.2371 0.5963 4.67 22.000 22.000 40.000 22.000
The cluster has 10 nodes and the following resources:
MEM 5154303, DSK 73254400, CPU 480, VCPU 1920.
There are 197 initial instances on the cluster.
Tiered (initial size) instance spec is:
MEM 98304, DSK 1048576, CPU 32, using disk template 'drbd'.
Tiered allocation status:
F Name t_mem n_mem i_mem x_mem f_mem u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt p_fmem p_fdsk r_cpu lCpu lMem lDsk lNet
dr64-frown 515429 4096 389120 -95135 217348 122213 81920 7154 4300 48 248 20 19 0.2371 0.6012 5.17 20.000 20.000 39.000 20.000
dr64-india 515429 4096 368640 -74678 217371 142693 86016 7154 4696 48 226 22 18 0.2768 0.6565 4.71 22.000 22.000 40.000 22.000
dr64-macau 515429 4096 290816 -91818 312335 220517 176128 7154 4554 48 190 11 29 0.4278 0.6367 3.96 11.000 11.000 40.000 11.000
dr64-malta 515429 4096 372736 -73258 211855 138597 73728 7154 4230 48 231 23 16 0.2689 0.5913 4.81 23.000 23.000 39.000 23.000
dr64-mauve 515429 4096 348160 -56503 219676 163173 81920 7154 4252 48 218 22 17 0.3166 0.5945 4.54 22.000 22.000 39.000 22.000
dr64-mocha 515429 4096 360448 -76829 227714 150885 98304 7154 4440 48 226 27 12 0.2927 0.6207 4.71 27.000 27.000 39.000 27.000
dr64-nauru 515429 4096 270336 -88236 329233 240997 131072 7154 3681 48 180 12 27 0.4676 0.5147 3.75 12.000 12.000 39.000 12.000
dr64-nepal 515429 4096 286720 -81212 305825 224613 131072 7154 3752 48 182 14 25 0.4358 0.5246 3.79 14.000 14.000 39.000 14.000
dr64-samoa 515429 4096 360448 -176053 326938 150885 73728 7154 4488 48 224 24 16 0.2927 0.6274 4.67 24.000 24.000 40.000 24.000
dr64-twice 515442 4096 389120 -91463 213689 122226 73728 7154 4266 48 224 22 18 0.2371 0.5963 4.67 22.000 22.000 40.000 22.000
Tiered allocation results:
- no instances allocated
- most likely failure reason: FailDisk
- initial cluster score: 43.11738389
- final cluster score: 43.11738389
- memory usage efficiency: 66.67%
- disk usage efficiency: 40.36%
- vcpu usage efficiency: 111.93%
Standard (fixed-size) instance spec is:
MEM 7629, DSK 61035, CPU 4, using disk template 'drbd'.
Standard allocation status:
F Name t_mem n_mem i_mem x_mem f_mem u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt p_fmem p_fdsk r_cpu lCpu lMem lDsk lNet
dr64-frown 515429 4096 389120 -95135 217348 122213 81920 7154 4300 48 248 20 19 0.2371 0.6012 5.17 20.000 20.000 39.000 20.000
dr64-india 515429 4096 368640 -74678 217371 142693 86016 7154 4696 48 226 22 18 0.2768 0.6565 4.71 22.000 22.000 40.000 22.000
dr64-macau 515429 4096 290816 -91818 312335 220517 176128 7154 4554 48 190 11 29 0.4278 0.6367 3.96 11.000 11.000 40.000 11.000
dr64-malta 515429 4096 372736 -73258 211855 138597 73728 7154 4230 48 231 23 16 0.2689 0.5913 4.81 23.000 23.000 39.000 23.000
dr64-mauve 515429 4096 348160 -56503 219676 163173 81920 7154 4252 48 218 22 17 0.3166 0.5945 4.54 22.000 22.000 39.000 22.000
dr64-mocha 515429 4096 360448 -76829 227714 150885 98304 7154 4440 48 226 27 12 0.2927 0.6207 4.71 27.000 27.000 39.000 27.000
dr64-nauru 515429 4096 270336 -88236 329233 240997 131072 7154 3681 48 180 12 27 0.4676 0.5147 3.75 12.000 12.000 39.000 12.000
dr64-nepal 515429 4096 286720 -81212 305825 224613 131072 7154 3752 48 182 14 25 0.4358 0.5246 3.79 14.000 14.000 39.000 14.000
dr64-samoa 515429 4096 360448 -176053 326938 150885 73728 7154 4488 48 224 24 16 0.2927 0.6274 4.67 24.000 24.000 40.000 24.000
dr64-twice 515442 4096 389120 -91463 213689 122226 73728 7154 4266 48 224 22 18 0.2371 0.5963 4.67 22.000 22.000 40.000 22.000
Normal (fixed-size) allocation results:
- 0 instances allocated
- most likely failure reason: FailDisk
- initial cluster score: 43.11738389
- final cluster score: 43.11738389
- memory usage efficiency: 66.67%
- disk usage efficiency: 40.36%
- vcpu usage efficiency: 111.93%
Can anyone please give me a hint of what is going on here? Thank you!!
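On the "inconsistent data" warning for the larger cluster: the -76829 MB figure appears to be the x_mem column for dr64-mocha. Here is a small sketch that checks this, assuming (from the numbers alone, not from the htools source) that x_mem = t_mem - n_mem - i_mem - f_mem. The values are copied from the -p table, and the function name unaccounted_mem is hypothetical.

```python
# Values in MiB, copied from the hspace -p table for the larger cluster.
# Assumption (inferred from the numbers, not from the htools source):
#   x_mem = t_mem - n_mem - i_mem - f_mem
rows = {
    # node: (t_mem, n_mem, i_mem, f_mem, x_mem as printed)
    "dr64-frown": (515429, 4096, 389120, 217348, -95135),
    "dr64-mocha": (515429, 4096, 360448, 227714, -76829),
    "dr64-samoa": (515429, 4096, 360448, 326938, -176053),
}


def unaccounted_mem(t_mem, n_mem, i_mem, f_mem):
    """Memory left over after node, instance and free memory are subtracted."""
    return t_mem - n_mem - i_mem - f_mem


# Compare the computed value with the printed x_mem column for each node.
matches = {}
for node, (t_mem, n_mem, i_mem, f_mem, x_mem) in rows.items():
    computed = unaccounted_mem(t_mem, n_mem, i_mem, f_mem)
    matches[node] = (computed == x_mem)
    print(node, computed, matches[node])
```

A negative value means n_mem + i_mem + f_mem adds up to more than t_mem on that node. The x_mem column is negative for every node in both clusters, not only for dr64-mocha.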
-danny