I have 2 servers (A and B), each running mysql (master-master mode) and
a mogstored. I want to add a server C running only a mogstored,
because I need to remove the mogstored from server B.
I have created the host and the device for server C on server
A, and I have changed the class to "--mindevcount=3". Both show up fine
in "host list" and "device list". The replication has begun,
but it is very slow (after one day C has only 4.3GB on the device).
mogadm --conf=/etc/mogilefs/mogilefs.conf settings list
enable_rebalance = 1
schema_version = 12
I also ran "fsck reset", "fsck stop" and "fsck start".
Is there any option to make the replication go faster? How
can I tell when the replication has finished and everything is
completely replicated?
I have 4 mogilefsd (on other servers): 2 point to A's db and the
other 2 point to B's, but the mogilefs.conf on A and B lists only 2
of the trackers (the other two don't appear anywhere). Could that
affect it?
Mogstored on C is from svn as of three days ago, and A's and B's
mogstored are from a month ago.
If you need more information, ask for it. Thanks!
Don't enable rebalance.
Make sure you're on at least version 2.30, and start a fsck. Then add a
few more fsck workers (as many as you can tolerate):
telnet trackerhost 7001
!want 4 fsck
... The next major rel will allow you to monitor the fsck progress better,
but for now you can watch it progress by waiting for 'fsck status' to say
it's complete, and then wait for 'select count(1) from file_to_queue' to
return 0.
Slowly increase the fsck workers until the load is too high, but don't go
over 10 or so. It's pretty quick.
-Dormando
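
For anyone who would rather script the telnet step above, here is a minimal Python sketch. It assumes the tracker accepts the same "!want <count> <job>" line on port 7001 that you would type into telnet; the function names, the CRLF line ending, and the read size are my own choices for illustration, not anything from the MogileFS docs.

```python
import socket


def want_command(job: str, count: int) -> bytes:
    """Build the '!want' line exactly as it would be typed into telnet."""
    if count < 0:
        raise ValueError("worker count must not be negative")
    # Assumption: the tracker accepts CRLF-terminated lines, as telnet sends them.
    return f"!want {count} {job}\r\n".encode("ascii")


def send_admin_command(host: str, command: bytes,
                       port: int = 7001, timeout: float = 5.0) -> str:
    """Send one admin line to a tracker and return whatever it answers first."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        try:
            return sock.recv(4096).decode("ascii", "replace")
        except socket.timeout:
            # No reply within the timeout; treat it as an empty answer.
            return ""


if __name__ == "__main__":
    # 'trackerhost' is the placeholder name used in the mail above.
    print(send_admin_command("trackerhost", want_command("fsck", 4)))
```

Raising the count a couple of workers at a time, as described above, just means calling this again with a larger number.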
> Make sure you're on at least version 2.30, and start a fsck.
Maybe mogstored needs a --version option so we can tell which version is
installed (I installed everything from subversion).
>Then add a
> few more fsck workers (as many as you can tolerate):
>
> telnet trackerhost 7001
> !want 4 fsck
Done. I didn't know anything about this; it is not documented in the
man page. Where can I find this kind of info? I have done it on only
one of my trackers (I have five). Must I do it on all of them, or will
that load the db too much?
> ... The next major rel will allow you to monitor the fsck progress better,
> but for now you can watch it progress by waiting for 'fsck status' to say
> it's complete, and then wait for 'select count(1) from file_to_queue' to
> return 0.
Ok. Thanks for the info. I hope the replication can finish in a few hours.
... Bah, I guess the mogilefsd has !version but mogstored doesn't. If
you've upgraded one, the other will be up to date. Another todo...
> Done. I didn't know anything about this; it is not documented in the
> man page. Where can I find this kind of info? I have done it on only
> one of my trackers (I have five). Must I do it on all of them, or will
> that load the db too much?
It was a new feature in 2.30, noted in the release notes. There wasn't
even a way to set a default for number of fsck workers until yesterday.
2.32 has an annotated conf/mogilefsd.conf that notes it at least.
You add as many fsck workers as you can handle. Add a couple at a time
then watch those monitoring graphs that you certainly have and see if
things are still okay.
> Ok. Thanks for the info. I hope the replication can finish in a few hours.
Feel free to bug me privately if it looks stuck, but if it's moving at
all, that should do it.
-Dormando
I have added 7 fsck and 3 replicate workers on two of my mogilefsd, so
this is now my config on 2 of my 5 mogilefsd:
!jobs
delete count 1
delete desired 1
delete pids 4909
fsck count 7
fsck desired 7
fsck pids 4923 4975 4976 5021 5022 5023 5024
job_master count 1
job_master desired 1
job_master pids 4922
monitor count 1
monitor desired 1
monitor pids 4920
queryworker count 10
queryworker desired 10
queryworker pids 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919
reaper count 1
reaper desired 1
reaper pids 4921
replicate count 3
replicate desired 3
replicate pids 4908 4977 4978
My first mogilefsd is producing this output (192.168.2.9 and
192.168.2.7 are my oldest mogstored, my new mogstored is 192.168.2.8).
Will this still appear once the replication has finished?
[replicate(4977)] Rebalance for DevFID[d=3;f=4696]
(http://192.168.2.9:7500/dev3/0/000/004/0000004696.fid) failed: no
suitable destination devices available
[replicate(4977)] Rebalance for DevFID[d=1;f=2575]
(http://192.168.2.7:7500/dev1/0/000/002/0000002575.fid) failed: no
suitable destination devices available
PS: As I increase the replicate jobs via telnet, the rows in
file_to_replicate decrease quickly. Is it a good idea to have 3
replicate jobs on each mogilefsd?
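
A "!jobs" listing like the one pasted earlier in this message can also be checked mechanically. The sketch below is only an illustration: it assumes every line has the form "<job> count|desired|pids <values...>" as in that paste, and the function names are made up for this example.

```python
def parse_jobs(text: str) -> dict:
    """Turn '!jobs' output into {job: {'count': n, 'desired': n, 'pids': [...]}}."""
    jobs = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not '<job> <field> <value...>' (e.g. the '!jobs' echo).
        if len(parts) < 3 or parts[1] not in ("count", "desired", "pids"):
            continue
        name, field, values = parts[0], parts[1], parts[2:]
        entry = jobs.setdefault(name, {})
        if field == "pids":
            entry["pids"] = [int(v) for v in values]
        else:
            entry[field] = int(values[0])
    return jobs


def understaffed(jobs: dict) -> list:
    """Return the job names that are running fewer workers than desired."""
    return sorted(
        name for name, entry in jobs.items()
        if entry.get("count", 0) < entry.get("desired", 0)
    )
```

Running the same check against each tracker makes it easy to see which ones actually picked up the extra fsck and replicate workers.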
They're for different things. 'file_to_queue' is a generic queue for
one-off or less often used job types (like fsck). file_to_replicate is
specifically for replicate jobs. So you'll need to slowly increase the
number of replicate workers as well. Which, it appears you've done?
> My first mogilefsd is producing this output (192.168.2.9 and
> 192.168.2.7 are my oldest mogstored, my new mogstored is 192.168.2.8).
> Will this still appear once the replication has finished?
> [replicate(4977)] Rebalance for DevFID[d=3;f=4696]
> (http://192.168.2.9:7500/dev3/0/000/004/0000004696.fid) failed: no
> suitable destination devices available
> [replicate(4977)] Rebalance for DevFID[d=1;f=2575]
> (http://192.168.2.7:7500/dev1/0/000/002/0000002575.fid) failed: no
> suitable destination devices available
Disable rebalance, it doesn't work very well right now...
> PS: As I increase the replicate jobs via telnet, the rows in
> file_to_replicate decrease quickly. Is it a good idea to have 3
> replicate jobs on each mogilefsd?
As many as you need to keep the table down in size. There's a special case
where it could grow due to broken fids; search the archives for ENDOFTIME
or wait a little longer for some documentation improvements.
-Dormando
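
Since the two tables track different work, it can help to watch both row counts side by side. This is a rough Python sketch that assumes you already hold a DB-API connection to the tracker database; the function name is made up, and the in-memory sqlite3 database in the demo only stands in for the real MySQL one.

```python
import sqlite3

# The two queue tables named in this thread. They are fixed names, so
# building the query string from them is safe.
QUEUE_TABLES = ("file_to_queue", "file_to_replicate")


def queue_depths(conn) -> dict:
    """Return {table_name: row_count} for both queue tables."""
    depths = {}
    cur = conn.cursor()
    for table in QUEUE_TABLES:
        cur.execute(f"SELECT COUNT(1) FROM {table}")
        depths[table] = cur.fetchone()[0]
    cur.close()
    return depths


if __name__ == "__main__":
    # Stand-in database for demonstration only; a real setup would connect
    # to the tracker's MySQL database with a MySQL driver instead.
    demo = sqlite3.connect(":memory:")
    for table in QUEUE_TABLES:
        demo.execute(f"CREATE TABLE {table} (fid INTEGER)")
    demo.execute("INSERT INTO file_to_replicate VALUES (4696)")
    print(queue_depths(demo))
```

If file_to_replicate keeps growing while file_to_queue shrinks, that points at too few replicate workers rather than too few fsck workers.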
It puts priority on the time they were entered into the file_to_queue
table.
> I am also struggling with slow replication, and recently had a hdd
> failure, so some files ended up on one hdd only. Then I changed
> devcount about 4 days ago, and I still haven't
> gotten all my devcount 1 files replicated :(
Add more replicate workers, so long as your cluster can take on the load?
If you change devcount, make sure your file_to_replicate table has fully
drained already, then run a fsck from start to finish. That'll add all of
the files that need more copies. Just changing the devcount doesn't make
the system re-replicate files.
-Dormando
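
The "wait for the table to drain, then run fsck" ordering above can be sketched as a small polling loop. This is only an illustration in Python: get_depth is a hypothetical callable you would supply (for example one that runs a count query against file_to_replicate), and the interval and poll limit are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_until_drained(get_depth: Callable[[], int],
                       interval: float = 30.0,
                       max_polls: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_depth() until it reports 0 rows.

    Returns True once the queue is empty, or False if max_polls checks
    were made without it ever reaching 0.
    """
    polls = 0
    while True:
        if get_depth() == 0:
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(interval)
```

Only once this returns True would you start the fsck, so the files needing more copies are queued against an empty table.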