Job Master Crash


qsu...@gmail.com

Feb 11, 2018, 1:37:32 AM
to mogile
Hi Eric, 

My job master keeps crashing.
I am encountering the following error; can you help with it?


Feb 11 06:34:31 ipattern-core04 mogilefsd: [Sun Feb 11 06:34:31 2018] crash log: Can't use string ("0") as a HASH ref while "strict refs" in use at /usr/local/share/perl5/MogileFS/Worker/JobMaster.pm line 114.
Feb 11 06:34:32 ipattern-core04 mogilefsd: [Sun Feb 11 06:34:32 2018] Child 102476 (job_master) died: 256 (UNEXPECTED)
Feb 11 06:34:32 ipattern-core04 mogilefsd: [Sun Feb 11 06:34:32 2018] Job job_master has only 0, wants 1, making 1.
Feb 11 06:35:02 ipattern-core04 mogilefsd: [Sun Feb 11 06:35:02 2018] crash log: Can't use string ("0") as a HASH ref while "strict refs" in use at /usr/local/share/perl5/MogileFS/Worker/JobMaster.pm line 114.
Feb 11 06:35:03 ipattern-core04 mogilefsd: [Sun Feb 11 06:35:03 2018] Child 103868 (job_master) died: 256 (UNEXPECTED)
Feb 11 06:35:03 ipattern-core04 mogilefsd: [Sun Feb 11 06:35:03 2018] Job job_master has only 0, wants 1, making 1.
Feb 11 06:35:33 ipattern-core04 mogilefsd: [Sun Feb 11 06:35:33 2018] crash log: Can't use string ("0") as a HASH ref while "strict refs" in use at /usr/local/share/perl5/MogileFS/Worker/JobMaster.pm line 114.
Feb 11 06:35:34 ipattern-core04 mogilefsd: [Sun Feb 11 06:35:34 2018] Child 103905 (job_master) died: 256 (UNEXPECTED)
Feb 11 06:35:34 ipattern-core04 mogilefsd: [Sun Feb 11 06:35:34 2018] Job job_master has only 0, wants 1, making 1.
Feb 11 06:36:04 ipattern-core04 mogilefsd: [Sun Feb 11 06:36:04 2018] crash log: Can't use string ("0") as a HASH ref while "strict refs" in use at /usr/local/share/perl5/MogileFS/Worker/JobMaster.pm line 114.
Feb 11 06:36:05 ipattern-core04 mogilefsd: [Sun Feb 11 06:36:05 2018] Child 104087 (job_master) died: 256 (UNEXPECTED)
Feb 11 06:36:05 ipattern-core04 mogilefsd: [Sun Feb 11 06:36:05 2018] Job job_master has only 0, wants 1, making 1.

Eric Wong

Feb 13, 2018, 1:37:02 AM
to mog...@googlegroups.com
qsu...@gmail.com wrote:
> Feb 11 06:34:31 ipattern-core04 mogilefsd: [Sun Feb 11 06:34:31 2018] crash
> log: Can't use string ("0") as a HASH ref while "strict refs" in use at
> /usr/local/share/perl5/MogileFS/Worker/JobMaster.pm line 114.

Maybe this is the correct fix? (Sorry, I haven't dealt with this
code much.)

diff --git a/lib/MogileFS/Store.pm b/lib/MogileFS/Store.pm
index c16aec1..6ba4c7c 100644
--- a/lib/MogileFS/Store.pm
+++ b/lib/MogileFS/Store.pm
@@ -1723,7 +1723,7 @@ sub grab_queue_chunk {
     my $tries = 3;
     my $work;

-    return 0 unless $self->lock_queue($queue);
+    return () unless $self->lock_queue($queue);

     my $extwhere = shift || '';
     my $fields = 'fid, nexttry, failcount';
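
To illustrate why this one-line change could matter, here is a minimal standalone sketch. It is not the MogileFS code: grab_old, grab_new and collect_fids are hypothetical stand-ins, and it assumes the caller near JobMaster.pm line 114 collects the result in list context and dereferences each element as a hash. Under that assumption, `return 0` yields a one-element list containing 0, while `return ()` yields an empty list.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two versions of the early return;
# these are NOT the real MogileFS::Store methods.
sub grab_old { return 0 }     # before the patch: lock not acquired
sub grab_new { return () }    # after the patch: lock not acquired

# Assumed caller pattern: treat every returned element as a hash ref.
sub collect_fids {
    my @fids;
    for my $row (@_) {
        push @fids, $row->{fid};   # dies under strict refs if $row is 0
    }
    return @fids;
}

my @old = grab_old();   # list context: one element, the scalar 0
my @new = grab_new();   # list context: empty list

print "old returns ", scalar(@old), " element(s)\n";
print "new returns ", scalar(@new), " element(s)\n";

# With the empty list the loop body never runs, so nothing is dereferenced.
my @fids = collect_fids(@new);

# With the old return value the dereference of 0 throws an exception.
my $ok = eval { collect_fids(@old); 1 };
print "old style died: $@" unless $ok;
```

If the real caller follows this pattern, the patched return simply gives the job master nothing to process on that pass instead of killing the worker.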



But, I wonder if your DB is really overloaded. It seems we
wait 30s for GET_LOCK() to succeed in MySQL; and 30s is a
very generous timeout, so I wonder if something is wrong with
your MySQL instance...
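
For reference, a rough sketch of how a timed GET_LOCK() call behaves from Perl. This is not the MogileFS implementation: try_named_lock and the lock name are hypothetical, and FakeDbh is only a stand-in so the snippet runs without a database. With a real connection, $dbh would be a DBI handle. MySQL's GET_LOCK(name, timeout) returns 1 when the lock is obtained, 0 when the timeout expires, and NULL on error.

```perl
use strict;
use warnings;

# Stand-in for a DBI handle so the sketch runs without MySQL.
# It returns a canned value for any query; purely illustrative.
package FakeDbh;
sub new { my ($class, $result) = @_; return bless { result => $result }, $class }
sub selectrow_array { my $self = shift; return ($self->{result}) }

package main;

# Hypothetical helper: returns 1 if the named lock was obtained, else 0.
sub try_named_lock {
    my ($dbh, $name, $timeout) = @_;
    # 1 = obtained, 0 = timed out, NULL (undef in Perl) = error
    my ($got) = $dbh->selectrow_array('SELECT GET_LOCK(?, ?)', undef, $name, $timeout);
    return (defined $got && $got == 1) ? 1 : 0;
}

# 'example_queue_lock' is a made-up lock name; 30 is the timeout in seconds.
for my $canned (1, 0, undef) {
    my $got = try_named_lock(FakeDbh->new($canned), 'example_queue_lock', 30);
    print "lock obtained: $got\n";
}
```

A caller that sees 0 here after a 30 second timeout has waited the full interval, which is why repeated failures point at contention or trouble on the MySQL side.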

qsu...@gmail.com

Feb 13, 2018, 1:49:52 AM
to mogile
Yes, we do see GET_LOCK() failures in our MySQL, especially when we run rebalance or fsck. I can see many sessions waiting on GET_LOCK in MySQL when executing `show processlist`.
The MySQL instance is shared with other databases, and according to our monitoring the load is not heavy. However, we have 2PT of data in the MogileFS cluster and about 50 hosts.

On Tuesday, February 13, 2018 at 2:37:02 PM UTC+8, Eric Wong wrote: