Are you marking the devices as 'dead' before removing them?
> What I have ended up with is files totally gone, tons and tons of
> policy violations never getting corrected, and severe load issues from
> trackers without priority running wild requesting the same 404 over
> and over again (example). I have had the tracker try and try on devices
> clearly marked as DOWN, and even DEAD, halting all other constructive
> tasks. It has tried to read a file where it gets a 404 in return, just
> to try again the next second. Tons and tons of errors related to mysql
> slaves getting overloaded.
This doesn't make much sense... can you post your mogilefsd.conf? Smells
like you might need to set 'rebalance_ignore_missing = 1'.
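For reference, the relevant mogilefsd.conf lines might look something like this; the DSN, credentials, and listen address here are placeholders, not your actual values:

```
# mogilefsd.conf -- sketch only, values are placeholders
db_dsn = DBI:mysql:mogilefs:host=db-a.example.com
db_user = mogile
db_pass = secret
listen = 0.0.0.0:7001
rebalance_ignore_missing = 1
```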
> What I have done to try and fix this:
> - Disabled rebalance (helped A LOT on mysql load)
Rebalance is broken, you shouldn't use it.
> - Disabled drain (also help a lot on mysql load)
rebalance_ignore_missing = 1 should stop this from spinning.
> - Increased replication count for classes
This doesn't automatically change anything. You have to run an fsck first.
> - Reset fsck log, and start it again
> - Manually removed all entries in mysql database table
> file_to_replicate where fromdevid no longer is alive
> - Any read and load balancing operation now done in my client
> application directly against mysql, to avoid any and all unneeded
> traffic to tracker
Oi. Bad bad bad.
> - Added retries in my application if tracker returns an error due to
> mysql load
Not as bad.
> My file_to_queue table is completely empty, and my fsck_log table has
> 570000 entries. My file_to_replicate table is just growing, and no
> files seems to actually replicate. This is after 2 weeks of first
> initiating reset / start of fsck
>
> My setup is 2 master-master replicated mysql server, 2 trackers
> pointing to the same mysql server, on version 2.30, 1 tracker pointing
> to the other server running 2.31 and 10 mogstored nodes.
To confirm, do you have:
Databases 'A' and 'B', in master<->master replicated pairs.
Tracker 1 (2.30) -> talks to database A
Tracker 2 (2.30) -> talks to database A
Tracker 3 (2.31) -> talks to database B
? If so, please stop *immediately*. *ALL* trackers *MUST* point to the
same database, at all times. It will absolutely not work correctly with
them split.
If I misunderstood what you just described, correct me and post your
tracker configs. That should give us an idea of what's going on.
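To be concrete, every tracker's mogilefsd.conf should carry the exact same database line. A sketch (hostname and credentials are placeholders):

```
# identical on tracker 1, 2, and 3
db_dsn = DBI:mysql:mogilefs:host=db-a.example.com
db_user = mogile
db_pass = secret
```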
> - I have all the different storage nodes in different data centers so
> traffic goes over the internet (ip restricted firewall)
Are you using a custom replication policy? How are you ensuring files go
everywhere?
> - I have two servers behind NAT, which is making the tracker time out
> on requests to localhost. This error also results in a timeout for
> localhost if I use mogadm --trackers=ouside.server.com:7001 check
I'm not sure I understand what's going on here.
> My goal:
> - Create a distributed file system for use as content delivery network
MFS can definitely help with something like this, but it seems like you've
stretched it the wrong way. Let's start with the above and move from
there...
-Dormando
Is this different on any of your hosts, besides the above dsn line?
How have you configured your classes? What is your mindevcount everywhere?
How many hosts/devices do you have?
> Not knowing what default settings for rebalance_ignore_missing, I'm
> assuming it is 0. (I added the line 'rebalance_ignore_missing = 1'
> now)
Yeah.
> "Rebalance is broken, you shouldn't use it." Noted. But what is
> broken exactly? It seems to have filled newly inserted boxes/devices
> when I enabled it before.
There are a lot of cases where it'll just spin and waste CPU. Drain has
similar issues, but fewer of them. MogileFS will prefer hosts that are more
empty and more idle for all operations. So usually it's best to just pick
the hosts you think could benefit from having fewer files and drain them
for a while.
> I can see how the ideal case is to go through the tracker for all
> file-related tasks, but I was having problems with persistent sockets
> and load. Not to mention the extra latency in my case with a mysql
> server located in another datacenter. Why is extracting paths for a
> specific key directly from the database so horrible? (Or a full set of
> keys, to minimize latency issues.)
Because the tracker has intelligence and options in the way it returns
paths. Doing that also tells me you probably don't understand the problem
so well. Either your tracker is mildly broken or you're not querying it
correctly.
You should be fetching all paths by passing a pathcount argument, and also
passing 'noverify=1', when calling get_paths. MogileFS then uses its IO
monitoring to determine which paths are safe to return, and what order is
ideal for the IO load.
noverify=1 tells MogileFS not to run HEAD requests against each path
before returning them. That's probably most of your slowdown, since those
requests will go over the internet.
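For illustration, here's a rough sketch of what that looks like at the wire level; the tracker protocol is plain text over TCP, and the domain, key, and host below are made-up examples, not anything from your setup:

```python
# Sketch of a raw get_paths request to a MogileFS tracker.
# Domain, key, and tracker address are hypothetical examples.
import socket
from urllib.parse import urlencode, parse_qs

def build_get_paths(domain, key, pathcount=2, noverify=True):
    """Build the get_paths command line the tracker expects."""
    args = {"domain": domain, "key": key, "pathcount": pathcount}
    if noverify:
        args["noverify"] = 1  # skip the per-path HEAD verification
    return "get_paths " + urlencode(args) + "\r\n"

def parse_tracker_reply(line):
    """Split an 'OK <args>' reply into a dict; raise on anything else."""
    status, _, payload = line.strip().partition(" ")
    if status != "OK":
        raise RuntimeError("tracker error: " + line.strip())
    return {k: v[0] for k, v in parse_qs(payload).items()}

def get_paths(tracker, domain, key):
    """Connect to (host, port), fetch the paths for one key."""
    with socket.create_connection(tracker, timeout=5) as s:
        s.sendall(build_get_paths(domain, key).encode())
        reply = s.makefile().readline()
    info = parse_tracker_reply(reply)
    n = int(info.get("paths", 0))
    return [info["path%d" % i] for i in range(1, n + 1)]
```

With noverify=1 the reply comes straight out of the tracker's monitoring state, so none of those cross-datacenter HEAD round trips happen on the hot path.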
> You were correct about the setup. I have since changed the version to
> 2.32 from svn.
Please use the release... SVN trunk has some broken commits in it right
now.
> So, for reference:
>
> Databases 'A' and 'B', in master<->master replicated pairs.
>
> Tracker 1 (2.32) -> talks to database A
> Tracker 2 (2.32) -> talks to database A
> Tracker 3 (2.32) -> talks to database B
>
> (I changed all trackers to point to database B now)
>
> I need to write failover logic in the tracker itself, and switch all
> trackers to database B, in case A goes down? Or what's the optimal way
> to do this?
There's an endless amount of documentation on the internet for the various
ways of doing HA MySQL setups. That's a little beyond the scope of this
discussion :) The easiest would be to use a hostname and switch that out.
Then it gets harder from there.
Once you revert back to the 2.32 tarball... You should stop all trackers,
start all trackers clean. Ensure rebalance is off, ensure nothing is
draining, or readonly. Ensure deletes are actually working, replication is
actually working. Then clear the fsck log (mogadm fsck clear ? I forget),
and re-run a fsck. Once the file_to_queue table is empty, the fsck has
completed.
Then you can look over the log to see what's going on. Replication and etc
should be working.
It might also be a good step to, *before you run fsck*, make sure that
you can upload a few files and they are actually replicating correctly.
-Dormando
The tracker keeps note of when devices are dead, marked down, etc. I guess
you'll be fine for now, but maybe we can come up with a batch paths
request or something.
> Regarding HA MySQL setups, my question is rather:
> Is there a way for me to have the trackers connect to mysql server2
> when server1 is down, without having to move the ip (not possible in my
> environment) or update the dns record (really slow) of the mysql server?
You should really do some reading and fiddle. Doing it in the
client/tracker won't be the best idea since you have things everywhere
doing requests. DNS isn't slow if you have something smart dumping changes
into hosts files, or swapping configs and restarting, or something.
You'll probably end up with an iptables redirect or a tcp proxy, I'd
bet...
> [fsck(16252)] Fixing FID 405834
> [reaper(16250)] Reaper running; looking for dead devices
> [monitor(16249)] Monitor running; scanning usage files
> [fsck(16252)] node server1.domain.com seems to be down in
> get_file_size
> [fsck(16252)] Fsck stalled: dev unreachable at /usr/local/share/perl/
> 5.8.8/MogileFS/Worker/Fsck.pm line 290, <GEN3> line 597.
> [monitor(16249)] Monitor running; scanning usage files
> [monitor(16249)] dev7: used = 12125768, total = 296362216, writeable =
> 1
> [monitor(16249)] dev9: used = 84287852, total = 296362216, writeable =
> 1
> [monitor(16249)] dev5: used = 11099084, total = 98787332, writeable =
> 1
> [reaper(16250)] Reaper running; looking for dead devices
> [replicate(16235)] source_down: Requested replication source device 7
> not available
> [monitor(16249)] Monitor running; scanning usage files
>
> Where dev5 is the only device on server1.domain.com and dev 7 is the
> only device on localhost. The dns database record for server2
> containing dev7 is added in /etc/hosts file pointing to 127.0.1.1
>
> Both servers are clearly up, and 'mogadm check' returns 'OK' on dev5
> and dev7
Mogile's not too tolerant of a very high-latency setup... as you've
noticed, I guess. It tries pretty hard not to get hung up on servers that
are limping, which both helps the server recover by avoiding giving it
traffic, and keeps MogileFS from tripping over them.
In short, there's a really low timeout for the get_file_size command
there. If you crack open MogileFS/HTTPFile.pm and look at line 243, you
can increase that timeout. If that clears it up I'll go make it
configurable...
-Dormando
That either means the monitor process hasn't told the replicate workers
that the device is actually up, or there are errors in accessing dev7 that
end up marking it as down. Do you have any errors before that?
Or, if you reset the replicate workers via '!want 0 replicate', sleep 60,
then '!want N replicate' (N is your old worker count), does it work for a
while before giving up again?
Guess I should try to make that more explicit sooner rather than later...
In fact the system tries at least a little bit to prevent you from doing
that :)
When fsck "stalls", it will retry that file after 10 minutes. If your
queue is empty then it has eventually gotten around to fixing all of them.
I'm not sure what you're doing with dev7... are you really sure this thing
works? There must be some error you're missing, since that other error you
note should only happen after that device has been noted as being down.
Can you attach output of 'mogadm check', and also "select * from
devices;"?