Best way to make mogstored replication?

Yuki (aka Rubén Gómez)

Oct 17, 2009, 6:19:05 AM
to mog...@googlegroups.com
Hello!

I have 2 servers (A and B), each running mysql (in master-master mode)
and a mogstored. I want to add a server C running only a mogstored,
because I need to remove the mogstored from server B.

I have created the host and the device for server C on server A, and
I have changed the class to "--mindevcount=3". Both show up fine in
"host list" and "device list". The replication has begun, but it's
very slow (after one day C has only 4.3 GB on the device).

mogadm --conf=/etc/mogilefs/mogilefs.conf settings list
enable_rebalance = 1
schema_version = 12

I also ran "fsck reset", "fsck stop" and "fsck start".

Is there any option to make the replication go faster? How can I
tell when the replication has finished and everything is fully
replicated?

I have 4 mogilefsd (on other servers): 2 use A's db and the other 2
use B's. However, the mogilefs.conf on A and B lists only 2 of the
trackers (the other two don't appear anywhere). Could that affect
replication?

The mogstored on C is from svn as of three days ago, and the
mogstored on A and B are from a month ago.

If you need more information, ask for it. Thanks!

dormando

Oct 17, 2009, 6:42:03 PM
to mog...@googlegroups.com
Hey,

Don't enable rebalance.

Make sure you're on at least version 2.30, and start a fsck. Then add a
few more fsck workers (as many as you can tolerate):

telnet trackerhost 7001
!want 4 fsck

... The next major rel will allow you to monitor the fsck progress better,
but for now you can watch it progress by waiting for 'fsck status' to say
it's complete, and then wait for 'select count(1) from file_to_queue' to
return 0.

Slowly increase the fsck workers until the load gets too high, but don't
go over 10 or so. It's pretty quick.

-Dormando
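
The telnet step above can also be scripted. The following is a minimal sketch, not part of MogileFS: the function names and the MAX_WORKERS cap are made up for illustration (the cap just encodes the "don't go over 10 or so" advice), while port 7001 and the "!want" command come from the message above. The exact text of the tracker's reply is not assumed; the sketch simply returns whatever arrives first.

```python
import socket

# Hypothetical safety cap, taken from the advice "don't go over 10 or so".
MAX_WORKERS = 10


def build_want_command(count, job="fsck"):
    """Return the management-port line asking for `count` workers of `job`."""
    if not 0 <= count <= MAX_WORKERS:
        raise ValueError(f"worker count must be between 0 and {MAX_WORKERS}")
    return f"!want {count} {job}\r\n".encode("ascii")


def send_want(host, count, job="fsck", port=7001, timeout=5.0):
    """Send the command to one tracker and return the first chunk of its reply.

    The reply format is not parsed here because it is not shown in the thread.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_want_command(count, job))
        return sock.recv(4096).decode("ascii", errors="replace")
```

Calling send_want("trackerhost", 4) would be the scripted equivalent of typing "!want 4 fsck" in the telnet session; the same helper works for "replicate" workers by passing job="replicate".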

Yuki (aka Rubén Gómez)

Oct 18, 2009, 6:50:27 AM
to mog...@googlegroups.com
On Sun, Oct 18, 2009 at 12:42 AM, dormando <dorm...@rydia.net> wrote:
>
> Hey,
>
> Don't enable rebalance,
Done.

> Make sure you're on at least version 2.30, and start a fsck.

Maybe mogstored needs a --version option so we can tell which version
is installed (I installed everything from subversion).

> Then add a
> few more fsck workers (as many as you can tolerate):
>
> telnet trackerhost 7001
> !want 4 fsck

Done. I didn't know anything about this; it is not documented in the
man page. Where can I find this kind of info? I have done it on only
one of my trackers (I have five). Must I do it on all of them, or
would that put too much load on the db?

> ... The next major rel will allow you to monitor the fsck progress better,
> but for now you can watch it progress by waiting for 'fsck status' to say
> it's complete, and then wait for 'select count(1) from file_to_queue' to
> return 0.

OK. Thanks for the info. I hope the replication can finish in a few hours.

dormando

Oct 18, 2009, 9:37:10 PM
to mog...@googlegroups.com
> > Make sure you're on at least version 2.30, and start a fsck.
> Maybe mogstored needs a --version option so we can tell which version
> is installed (I installed everything from subversion).

... Bah, I guess the mogilefsd has !version but mogstored doesn't. If
you've upgraded one, the other will be up to date. Another todo...

> Done. I didn't know anything about this; it is not documented in the
> man page. Where can I find this kind of info? I have done it on only
> one of my trackers (I have five). Must I do it on all of them, or
> would that put too much load on the db?

It was a new feature in 2.30, noted in the release notes. There wasn't
even a way to set a default for number of fsck workers until yesterday.
2.32 has an annotated conf/mogilefsd.conf that notes it at least.

You add as many fsck workers as you can handle. Add a couple at a time
then watch those monitoring graphs that you certainly have and see if
things are still okay.

> OK. Thanks for the info. I hope the replication can finish in a few hours.

Feel free to bug me privately if it looks stuck, but if it's moving at
all, that should do it.

-Dormando

Yuki (aka Rubén Gómez)

Oct 20, 2009, 7:02:02 AM
to mog...@googlegroups.com
On Mon, Oct 19, 2009 at 3:37 AM, dormando <dorm...@rydia.net> wrote:
> You add as many fsck workers as you can handle. Add a couple at a time
> then watch those monitoring graphs that you certainly have and see if
> things are still okay.
I added 5 fsck workers, and file_to_queue now reports 0 rows, but I
have 248184 rows in file_to_replicate. How do these two tables work?
When you replicate a mogstored, does it insert into file_to_queue and
afterwards into file_to_replicate? Will the replication be finished
when file_to_replicate shows 0 rows?

On two of my mogilefsd I now have 7 fsck and 3 replicate workers, so
this is the current config on 2 of my 5 mogilefsd:

!jobs
delete count 1
delete desired 1
delete pids 4909
fsck count 7
fsck desired 7
fsck pids 4923 4975 4976 5021 5022 5023 5024
job_master count 1
job_master desired 1
job_master pids 4922
monitor count 1
monitor desired 1
monitor pids 4920
queryworker count 10
queryworker desired 10
queryworker pids 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919
reaper count 1
reaper desired 1
reaper pids 4921
replicate count 3
replicate desired 3
replicate pids 4908 4977 4978
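
A listing like the one above is easy to check by script. This is a small sketch under the assumption that each "!jobs" line has the shape "<job> <count|desired|pids> <values...>", as in the output shown; parse_jobs and understaffed are hypothetical helper names, not part of MogileFS.

```python
def parse_jobs(text):
    """Parse `!jobs` output into {job: {"count": int, "desired": int, "pids": [int]}}."""
    jobs = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the echoed command, and anything with an unknown field.
        if len(parts) < 2 or parts[1] not in ("count", "desired", "pids"):
            continue
        name, field, values = parts[0], parts[1], parts[2:]
        entry = jobs.setdefault(name, {"count": 0, "desired": 0, "pids": []})
        if field == "pids":
            entry["pids"] = [int(v) for v in values]
        elif values:
            entry[field] = int(values[0])
    return jobs


def understaffed(jobs):
    """Return the names of jobs running fewer workers than desired."""
    return sorted(name for name, j in jobs.items() if j["count"] < j["desired"])


# Excerpt of the listing from this message.
SAMPLE = """\
fsck count 7
fsck desired 7
fsck pids 4923 4975 4976 5021 5022 5023 5024
replicate count 3
replicate desired 3
replicate pids 4908 4977 4978
"""
```

Running understaffed(parse_jobs(SAMPLE)) on the excerpt gives an empty list, since every job already has as many workers as desired.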

My first mogilefsd is producing this output (192.168.2.9 and
192.168.2.7 are my oldest mogstored; my new mogstored is 192.168.2.8).
Will this still appear once the replication has finished?
[replicate(4977)] Rebalance for DevFID[d=3;f=4696]
(http://192.168.2.9:7500/dev3/0/000/004/0000004696.fid) failed: no
suitable destination devices available
[replicate(4977)] Rebalance for DevFID[d=1;f=2575]
(http://192.168.2.7:7500/dev1/0/000/002/0000002575.fid) failed: no
suitable destination devices available

PS: As I increase the replicate jobs via telnet, the rows in
file_to_replicate decrease quickly. Is it a good idea to have 3
replicate jobs in each mogilefsd?

dormando

Oct 20, 2009, 12:43:57 PM
to mog...@googlegroups.com
> I added 5 fsck workers, and file_to_queue now reports 0 rows, but I
> have 248184 rows in file_to_replicate. How do these two tables work?
> When you replicate a mogstored, does it insert into file_to_queue and
> afterwards into file_to_replicate? Will the replication be finished
> when file_to_replicate shows 0 rows?

They're for different things. 'file_to_queue' is a generic queue for
one-off or less often used job types (like fsck). file_to_replicate is
specifically for replicate jobs. So you'll need to slowly increase the
number of replicate workers as well. Which, it appears you've done?

> My first mogilefsd is producing this output (192.168.2.9 and
> 192.168.2.7 are my oldest mogstored; my new mogstored is 192.168.2.8).
> Will this still appear once the replication has finished?
> [replicate(4977)] Rebalance for DevFID[d=3;f=4696]
> (http://192.168.2.9:7500/dev3/0/000/004/0000004696.fid) failed: no
> suitable destination devices available
> [replicate(4977)] Rebalance for DevFID[d=1;f=2575]
> (http://192.168.2.7:7500/dev1/0/000/002/0000002575.fid) failed: no
> suitable destination devices available

Disable rebalance, it doesn't work very well right now...

> PS: As I increase the replicate jobs via telnet, the rows in
> file_to_replicate decrease quickly. Is it a good idea to have 3
> replicate jobs in each mogilefsd?

As many as you need to keep the table down in size. There's a special case
where it could grow due to broken fids, search the archives for ENDOFTIME
or wait a little longer for some documentation improvements.

-Dormando
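
To keep an eye on the two queue tables discussed above, a row count on each is enough. The sketch below is only an illustration: it uses an in-memory sqlite3 database as a stand-in so that it runs anywhere, whereas a real MogileFS install in this thread uses MySQL. The table names come from the thread; the single "fid" column and all function names are placeholders, not the real schema or API.

```python
import sqlite3

# Table names as mentioned in the thread.
QUEUE_TABLES = ("file_to_queue", "file_to_replicate")


def queue_depths(conn):
    """Return {table: row count} for the two queue tables."""
    cur = conn.cursor()
    depths = {}
    for table in QUEUE_TABLES:  # fixed names, so string formatting is safe here
        cur.execute(f"SELECT COUNT(1) FROM {table}")
        depths[table] = cur.fetchone()[0]
    return depths


def replication_settled(depths):
    """True when both queues are empty."""
    return all(n == 0 for n in depths.values())


def make_demo_db():
    """Build an in-memory stand-in; the `fid` column is a placeholder schema."""
    conn = sqlite3.connect(":memory:")
    for table in QUEUE_TABLES:
        conn.execute(f"CREATE TABLE {table} (fid INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO file_to_replicate (fid) VALUES (?)", [(4696,), (2575,)]
    )
    return conn
```

As noted above, broken fids can keep rows in file_to_replicate, so a count that stops falling without reaching zero may point at that special case rather than at too few workers.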

Jon

Oct 21, 2009, 11:27:34 AM
to mogile
Why doesn't the tracker give priority to files with a devcount of 0 or
1 when the devcount in the class is set to 4?

I am also struggling with slow replication. I recently had an hdd
failure, so some files ended up on only one hdd. Then I changed
devcount, something like 4 days ago, and I still haven't gotten all
my devcount 1 files replicated :(

dormando

Oct 21, 2009, 7:22:27 PM
to mogile
> Why doesn't the tracker give priority to files with a devcount of 0 or
> 1 when the devcount in the class is set to 4?

It puts priority on the time they were entered into the file_to_queue
table.

> I am also struggling with slow replication. I recently had an hdd
> failure, so some files ended up on only one hdd. Then I changed
> devcount, something like 4 days ago, and I still haven't gotten all
> my devcount 1 files replicated :(

Add more replicate workers, so long as your cluster can take on the load?

If you change devcount, make sure your file_to_replicate table has fully
drained already, then run a fsck from start to finish. That'll add all of
the files that need more copies. Just changing the devcount doesn't make
the system re-replicate files.

-Dormando
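
The ordering described above (let file_to_replicate drain first, then run a full fsck) comes down to a wait loop before the fsck is started. Here is a minimal sketch of that wait, assuming the caller supplies a get_count function that returns the current number of rows in file_to_replicate; the function name and its parameters are hypothetical, and starting the fsck itself is left to the usual tools.

```python
import time


def wait_until_drained(get_count, poll_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Poll get_count() until it returns 0.

    Returns True once the count reaches 0, or False if max_polls checks
    were made without it draining. `sleep` is injectable for testing.
    """
    polls = 0
    while True:
        if get_count() == 0:
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)
```

Giving max_polls a value keeps the loop from waiting forever if the table never empties.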

Sheng-Ho Yuan

May 1, 2013, 11:47:02 PM
to mog...@googlegroups.com
Hi Dormando,

How do I get the file_to_replicate table to drain fully?

Can I just remove the mogilefs db user's insert permission on the
file_to_replicate table?