Slow ganeti-watcher

Алексей Миронов

May 15, 2015, 6:35:18 AM
to gan...@googlegroups.com
Hi,
I have a Ganeti cluster (version v2.12.2) consisting of 8 nodes and 169 instances.
The ganeti-watcher task (run from crontab) takes a long time: total execution time is ~30 min. Is this normal?
gnt-cluster verify completes without errors.

I found these errors in job.log:

2015-05-15 10:00:05,161: job-365016 pid=23434 INFO Op 1/1: opcode GROUP_VERIFY_DISKS(4ae50574-113b-407c-9691-923c6b93af16) waiting for locks
2015-05-15 10:04:17,773: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:07:23,089: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:13:22,551: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:16:54,124: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:19:12,969: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:22:37,582: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:25:44,027: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 10:26:01,700: job-365016 pid=23434 INFO Finished job 365016, status = success

What could be causing these errors?
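A quick way to quantify what the job.log excerpt shows is to measure how long the job spent between its first and last log lines and to count the broken-pipe retries. The following is a minimal Python sketch that assumes the line layout seen in the excerpt (timestamp with millisecond fraction, `job-<id>`, `pid=`, level, message); the function name `summarize_job` is hypothetical and not part of Ganeti.

```python
import re
from datetime import datetime

# Line layout assumed from the job.log excerpt:
# "YYYY-MM-DD HH:MM:SS,mmm: job-<id> pid=<pid> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}): "
    r"job-\d+ pid=\d+ (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def summarize_job(lines):
    """Return (seconds between first and last entry, count of Broken pipe errors)."""
    stamps = []
    broken_pipes = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like job.log entries
        # %f accepts the 3-digit millisecond fraction and pads it to microseconds.
        stamps.append(datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"))
        if m["level"] == "ERROR" and "Broken pipe" in m["msg"]:
            broken_pipes += 1
    if not stamps:
        return 0.0, 0
    return (stamps[-1] - stamps[0]).total_seconds(), broken_pipes


sample = [
    "2015-05-15 10:00:05,161: job-365016 pid=23434 INFO Op 1/1: opcode GROUP_VERIFY_DISKS(4ae50574-113b-407c-9691-923c6b93af16) waiting for locks",
    "2015-05-15 10:04:17,773: job-365016 pid=23434 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)",
    "2015-05-15 10:26:01,700: job-365016 pid=23434 INFO Finished job 365016, status = success",
]
duration, errors = summarize_job(sample)
print(f"{duration:.0f}s, {errors} broken-pipe error(s)")
```

Feeding it the full excerpt would report the same ~26-minute span but seven errors instead of one.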

Hrvoje Ribicic

May 15, 2015, 8:07:45 AM
to gan...@googlegroups.com
Hi Aleksej,

This does not look normal, although it might be resolved by fixes present in 2.12.4. Is the master suffering from heavy CPU load while the verification is happening?

Also, could you attach the relevant part of the wconfd logs? According to this log snippet, it might be that the daemon is dying randomly.

Cheers,
Riba
Hrvoje Ribicic
Ganeti Engineering
Google Germany GmbH

Apollon Oikonomopoulos

May 15, 2015, 9:14:01 AM
to gan...@googlegroups.com
On 03:35 Fri 15 May, Алексей Миронов wrote:
> Hi,
> I have a Ganeti cluster (version v2.12.2) consisting of 8 nodes and 169
> instances.
> The ganeti-watcher task (run from crontab) takes a long time: total
> execution time is ~30 min. Is this normal?

No, this is not normal. This was a regression in the 2.12 release series
affecting medium- and large-sized clusters, and it has been fixed in 2.12.4.

Cheers,
Apollon

Алексей Миронов

May 15, 2015, 3:47:14 PM
to gan...@googlegroups.com
I just updated Ganeti to version 2.12.4, and CLUSTER_VERIFY_GROUP still takes a long time (the last run took 22 min).
CPU load while the watcher runs is the same as usual.
Here is part of wconfd.log:

2015-05-15 22:00:02,092305000000 MSK: ganeti-wconfd pid=30903/ThreadId 960 INFO Successfully handled echo
2015-05-15 22:00:02,250031000000 MSK: ganeti-wconfd pid=30903/ThreadId 965 INFO Successfully handled listLocksWaitingStatus
2015-05-15 22:00:05,128543000000 MSK: ganeti-wconfd pid=30903/ThreadId 970 INFO Successfully handled readConfig
2015-05-15 22:00:05,738272000000 MSK: ganeti-wconfd pid=30903/ThreadId 977 INFO Successfully handled updateLocksWaiting
2015-05-15 22:00:05,738674000000 MSK: ganeti-wconfd pid=30903/ThreadId 981 INFO Successfully handled hasPendingRequest
2015-05-15 22:00:05,739180000000 MSK: ganeti-wconfd pid=30903/ThreadId 985 INFO Successfully handled listLocks
2015-05-15 22:00:05,739505000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 INFO Successfully handled readConfig
2015-05-15 22:00:05,994316000000 MSK: ganeti-wconfd pid=30903/ThreadId 992 INFO Successfully handled updateLocksWaiting
2015-05-15 22:00:05,994900000000 MSK: ganeti-wconfd pid=30903/ThreadId 998 INFO Successfully handled hasPendingRequest
2015-05-15 22:00:05,995383000000 MSK: ganeti-wconfd pid=30903/ThreadId 1002 INFO Successfully handled listLocks
2015-05-15 22:00:05,998904000000 MSK: ganeti-wconfd pid=30903/ThreadId 1007 INFO Successfully handled updateLocksWaiting
2015-05-15 22:00:05,999201000000 MSK: ganeti-wconfd pid=30903/ThreadId 1011 INFO Successfully handled hasPendingRequest
2015-05-15 22:00:05,999549000000 MSK: ganeti-wconfd pid=30903/ThreadId 1015 INFO Successfully handled listLocks
2015-05-15 22:00:06,001627000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 INFO Successfully handled readConfig
2015-05-15 22:00:06,297274000000 MSK: ganeti-wconfd pid=30903/ThreadId 1023 INFO Successfully handled updateLocksWaiting
2015-05-15 22:00:06,297685000000 MSK: ganeti-wconfd pid=30903/ThreadId 1027 INFO Successfully handled hasPendingRequest
2015-05-15 22:00:06,298054000000 MSK: ganeti-wconfd pid=30903/ThreadId 1031 INFO Successfully handled listLocks
2015-05-15 22:00:06,306403000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 INFO Successfully handled readConfig
2015-05-15 22:00:06,614187000000 MSK: ganeti-wconfd pid=30903/ThreadId 1039 INFO Successfully handled updateLocksWaiting
2015-05-15 22:00:06,614986000000 MSK: ganeti-wconfd pid=30903/ThreadId 1047 INFO Successfully handled hasPendingRequest
2015-05-15 22:00:06,615515000000 MSK: ganeti-wconfd pid=30903/ThreadId 1051 INFO Successfully handled listLocks
2015-05-15 22:00:06,770210000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 INFO Successfully handled readConfig
2015-05-15 22:01:06,905787000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 WARNING Error during message receiving: user error (Timeout in reading a response)
2015-05-15 22:22:01,183081000000 MSK: ganeti-wconfd pid=30903/ThreadId 1059 INFO Successfully handled freeLocksLevel
2015-05-15 22:22:01,184240000000 MSK: ganeti-wconfd pid=30903/ThreadId 1063 INFO Successfully handled freeLocksLevel
2015-05-15 22:22:01,185155000000 MSK: ganeti-wconfd pid=30903/ThreadId 1068 INFO Successfully handled freeLocksLevel
2015-05-15 22:22:01,186394000000 MSK: ganeti-wconfd pid=30903/ThreadId 1073 INFO Successfully handled freeLocksLevel
2015-05-15 22:22:01,187293000000 MSK: ganeti-wconfd pid=30903/ThreadId 1078 INFO Successfully handled dropAllReservations
2015-05-15 22:22:01,187935000000 MSK: ganeti-wconfd pid=30903/ThreadId 1082 INFO Successfully handled freeLocksLevel
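In this excerpt wconfd logs a "Timeout in reading a response" warning at 22:01:06 and then nothing at all until 22:22:01. A minimal Python sketch for locating the longest silence in such a log follows; it assumes the timestamp layout seen above (a 12-digit fractional part followed by a timezone name, which is ignored because all lines share it), keeps only microsecond precision, and the helper names `parse_ts` and `largest_gap` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as seen in the wconfd.log excerpt: seconds, a comma,
# a long fractional part, then a space before the timezone name.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d+) ")


def parse_ts(line):
    """Parse the leading timestamp, or return None if the line has none."""
    m = TS_RE.match(line)
    if not m:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    # datetime only holds microseconds, so keep the first 6 fractional digits.
    micros = int(m.group(2)[:6].ljust(6, "0"))
    return base + timedelta(microseconds=micros)


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence in the log."""
    entries = [(ts, line) for line in lines if (ts := parse_ts(line)) is not None]
    best = (timedelta(0), None, None)
    for (t0, l0), (t1, l1) in zip(entries, entries[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, l0, l1)
    return best


sample = [
    "2015-05-15 22:00:06,770210000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 INFO Successfully handled readConfig",
    "2015-05-15 22:01:06,905787000000 MSK: ganeti-wconfd pid=30903/ThreadId 975 WARNING Error during message receiving: user error (Timeout in reading a response)",
    "2015-05-15 22:22:01,183081000000 MSK: ganeti-wconfd pid=30903/ThreadId 1059 INFO Successfully handled freeLocksLevel",
]
gap, before, after = largest_gap(sample)
print(gap)  # 0:20:54.277294
```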

And here is job.log:

2015-05-15 22:00:05,363: job-365036 pid=22439 INFO Restarting job 365036
2015-05-15 22:00:05,736: job-365036 pid=22439 INFO Op 1/1: opcode GROUP_VERIFY_DISKS(4ae50574-113b-407c-9691-923c6b93af16) waiting for locks
2015-05-15 22:22:01,186: job-365036 pid=22439 ERROR Network error: [Errno 32] Broken pipe, retring (retry attempt number 1)
2015-05-15 22:22:01,324: job-365036 pid=22439 INFO Finished job 365036, status = success
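The only ERROR line in this job.log excerpt (22:22:01,186) falls within a few milliseconds of the freeLocksLevel and dropAllReservations calls in the wconfd excerpt above. A minimal sketch for lining up events from two logs within a small time window is below; the 50 ms window and the function name `correlate` are arbitrary assumptions, and both inputs are assumed to be already-parsed (timestamp, text) pairs rather than raw log lines.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def correlate(job_events, daemon_events, window=timedelta(milliseconds=50)):
    """Pair each job-log event with every daemon-log event within +/- window.

    Both arguments are lists of (datetime, text) tuples; daemon_events must be
    sorted by timestamp so that bisect can find the start of the window.
    """
    daemon_times = [t for t, _ in daemon_events]
    pairs = []
    for t, text in job_events:
        # Index of the first daemon event that is not earlier than t - window.
        i = bisect_left(daemon_times, t - window)
        while i < len(daemon_times) and daemon_times[i] <= t + window:
            pairs.append((text, daemon_events[i][1]))
            i += 1
    return pairs


# A subset of timestamps copied from the two excerpts (wconfd truncated to microseconds).
job = [(datetime(2015, 5, 15, 22, 22, 1, 186000), "ERROR Network error: [Errno 32] Broken pipe")]
wconfd = [
    (datetime(2015, 5, 15, 22, 1, 6, 905787), "WARNING Timeout in reading a response"),
    (datetime(2015, 5, 15, 22, 22, 1, 183081), "freeLocksLevel"),
    (datetime(2015, 5, 15, 22, 22, 1, 187293), "dropAllReservations"),
]
for job_line, daemon_line in correlate(job, wconfd):
    print(job_line, "<->", daemon_line)
```

A narrower window (for example 1 ms) would match nothing here, so the window size is a trade-off between missing related events and pairing unrelated ones.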

On Friday, May 15, 2015, 15:07:45 UTC+3, Hrvoje Ribicic wrote: