I had to interrupt the convertToCapped operation because it was going
to exceed the available disk space. It had run for about 16 hours and
would have taken about 2-3 days to finish the 30TB collection. I did a killOp
on the convert process, which reported it was killed, but it kept
creating datafiles. I was unable to force a server shutdown short of a
kill -9. When I brought the server back up (with the secondary down so
as not to replicate the convertToCapped operation) it came up quickly
and did a brief recovery referencing the last datafile that existed
before convertToCapped started adding new ones. However, even bringing
it up as part of a replica set (with the secondary down but the arbiter
up) results in the same symptoms that I saw before when running outside
the replica set: only 800GB of VM is mapped (it should be 8000GB) and
any db.collection.stats() operation fails with "can't map file memory".
This was what was happening before I ran
the convertToCapped, but only when I omitted the replSet argument on
startup. Now it happens every time. Since I can't see any collection
stats, I can't be certain whether it knows about the newly created
empty datafiles, so I don't know if it's safe to delete them and retry
the convertToCapped with a smaller size. Also, I can't bring up the
secondary within the replica set because the only outstanding
operation to replicate is the convertToCapped, which I don't want to
replicate. It seems my only option is to remove the system from the
replica set, recreate the filesystem (rather than delete the existing
20TB of data) and try to rebuild the replica set from the 4TB
datafiles on the remaining server, which was formerly the secondary.
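
For sizing a retry with a smaller capped size, a rough headroom check
may help. This is only a sketch, assuming the original datafiles stay
on disk until the copy completes, so the capped copy needs free space
of its own. The helper name and the numbers are hypothetical.

```javascript
// Rough headroom check before retrying convertToCapped (hypothetical helper,
// not part of MongoDB). Assumption: the original datafiles stay on disk until
// the copy finishes, so the capped copy needs its own free space.
var TB = Math.pow(1024, 4); // bytes per tebibyte

function cappedConvertHeadroom(freeBytes, cappedSizeBytes) {
  // Positive result: bytes left over. Negative: the copy would run out of disk.
  return freeBytes - cappedSizeBytes;
}

// Illustrative numbers only; substitute the real free space and target size.
var freeBytes = 6 * TB;
var cappedSizeBytes = 4 * TB;
var headroom = cappedConvertHeadroom(freeBytes, cappedSizeBytes);
console.log(headroom >= 0 ? "fits" : "does not fit", headroom / TB, "TB headroom");
```

Datafile preallocation and journal space are ignored here, so the real
requirement is likely somewhat higher than this estimate.
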
1. Why am I unable to map all files when outside of the replica set or
when the secondary is not available? This is not a ulimit issue.
2. Is there a way to delete an operation from the oplog (in this case
the convertToCapped) to prevent it from replicating?
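
For question 2, a first step would be to locate the entry in the oplog.
The sketch below builds a filter for convertToCapped command entries,
assuming the usual oplog layout in which commands are stored with op
"c", an ns of "<db>.$cmd", and the command document under o. The helper
name is hypothetical and "mydb" / "events" are placeholder names. It
only finds the entry; it does not remove it.

```javascript
// Hypothetical helper: build a filter for convertToCapped entries in the oplog.
// Assumes command entries look like
//   { op: "c", ns: "<db>.$cmd", o: { convertToCapped: "<collection>", size: ... } }
function convertToCappedOplogFilter(dbName, collName) {
  return {
    op: "c",
    ns: dbName + ".$cmd",
    "o.convertToCapped": collName
  };
}

var filter = convertToCappedOplogFilter("mydb", "events"); // placeholder names

// In the mongo shell, `db` is defined and the replica set oplog is local.oplog.rs.
// Outside the shell this branch is skipped and only the filter is printed.
if (typeof db !== "undefined") {
  db.getSiblingDB("local").oplog.rs
    .find(filter)
    .sort({ $natural: -1 }) // newest entries first
    .limit(5)
    .forEach(printjson);
} else {
  console.log(JSON.stringify(filter));
}
```
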
On Feb 8, 6:09 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> In addition, it can't reuse the old space until it is freed, after the copy.