Forcing Replication on Existing Files

Michael Kamprath

Feb 3, 2020, 12:02:18 PM
to QFS Development
I've noticed with qfsfsck that I have a number of files with status 2, "Files reachable lost if server down". How do I force replication on these files so that I could lose a node? I did figure out that I could use qfsdataverify in -v (verbose) mode to discover the chunks that belong to a file, and then use qfsadmin force_replication to replicate them, which takes the files out of status 2. However, this is very manual and tedious. Is there a better way?

Michael
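
The manual workflow described above (collect a file's chunk IDs from verbose output, then issue one force_replication request per chunk) can be sketched as a dry-run helper. This is a minimal sketch under stated assumptions, not a tested QFS tool: the sample text and the regular expression are hypothetical and do not reflect the real qfsdataverify output format, and the command template is a placeholder that must be filled in from the qfsadmin usage text. The function names are made up for illustration. Nothing is executed; the commands are only printed for review.

```python
import re


def extract_chunk_ids(verbose_output, pattern):
    """Return unique chunk IDs (group 1 of pattern), in first-seen order."""
    seen = []
    for match in re.finditer(pattern, verbose_output):
        chunk_id = match.group(1)
        if chunk_id not in seen:
            seen.append(chunk_id)
    return seen


def build_commands(chunk_ids, template):
    """Fill a caller-supplied argv template once per chunk ID.

    Each template element may contain the placeholder {chunk}.
    """
    return [[arg.format(chunk=cid) for arg in template] for cid in chunk_ids]


if __name__ == "__main__":
    # HYPOTHETICAL sample text: the real verbose output format is not
    # shown in this thread, so adjust the pattern to what the tool prints.
    sample = "chunk: 1001 ok\nchunk: 1002 ok\nchunk: 1001 ok\n"
    ids = extract_chunk_ids(sample, r"chunk:\s*(\d+)")

    # PLACEHOLDER template: replace the bracketed parts with the actual
    # qfsadmin connection options and chunk argument syntax.
    template = [
        "qfsadmin",
        "<connection options>",
        "force_replication",
        "<chunk argument for {chunk}>",
    ]
    for argv in build_commands(ids, template):
        print(" ".join(argv))  # dry run: print only, never execute
```

Keeping the pattern and the template as parameters avoids baking in any guess about the tools' real output or flags.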

Michael Ovsiannikov

Feb 4, 2020, 9:51:43 PM
to <qfs-devel@googlegroups.com>
In theory, re-balancing should schedule re-replication in this case in order to fix the chunk placement problem.

Re-balancing scans all chunks continuously, unless it is turned off in the configuration. When it finds chunks with placement problems, and determines that it is possible to fix them given the current state of the system, it schedules chunk relocation. Placement problem correction has higher precedence than I/O and space re-balancing.

If the system is not configured with enough chunk servers or racks (failure groups), it might not be possible, even in theory, to correct placement problems. Some corner cases may exist where re-balancing does not correct the problem, though with a sufficient number of racks and servers it has worked reasonably well, at least in the past.

The offline layout emulator can be used for re-balancing verification, testing, and debugging.

The force re-replication command is intended for debugging and verification; it is not intended for system maintenance.

Given that the force re-replication command fixed the chunk placement problem, I'd suggest checking whether re-balancing is enabled, and whether the system has been running long enough for the [low CPU priority] re-balance scan to detect the placement problem.

— Mike.
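
The suggestion above to check whether re-balancing is enabled could be approached with a small configuration reader along these lines. This is a rough sketch under assumptions: it treats the meta server configuration as simple "key = value" lines with "#" comments, and the parameter name metaServer.rebalancingEnabled is quoted from memory and should be verified against the annotated sample configuration shipped with your QFS version. The function names and the sample excerpt are hypothetical.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def is_enabled(props, key, default=None):
    """Return False when the flag is '0', True for any other value.

    Returns `default` when the key is absent, in which case the server's
    built-in default applies and has to be looked up separately.
    """
    if key not in props:
        return default
    return props[key] != "0"


if __name__ == "__main__":
    # HYPOTHETICAL excerpt; the key name is an assumption to verify.
    sample = """
    # re-balancing switch (name assumed, check your annotated sample file)
    metaServer.rebalancingEnabled = 0
    """
    props = parse_properties(sample)
    print(is_enabled(props, "metaServer.rebalancingEnabled"))
```

A missing key is reported as None rather than guessed, since an absent setting only means the compiled-in default is in effect.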

Michael Kamprath

Feb 10, 2020, 12:00:21 AM
to QFS Development
For anybody who stumbles across this thread later: some offline conversations ultimately culminated in a bug report, QFS-350.

Michael K.