Yeah, so the idea behind the log compaction algorithm is that incremental compaction allows for more consistent performance and reduces the overhead of replication. Taking and/or replicating a large snapshot can be costly. Incremental compaction works at its own pace on individual segments of the log; the state machine continues to operate independently of the compaction algorithm, and compaction threads can be throttled to reduce the load on the servers. Perhaps more interestingly, incremental compaction allows Copycat to exclude a lot of entries from ever being replicated to some nodes in the first place, which helps followers that briefly fall behind the rest of the cluster catch back up quickly. We've actually seen Copycat exclude large percentages of entries from replication in our systems.
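To make the segment-by-segment idea concrete, here's a minimal sketch of compacting a single segment by copying forward only the entries the state machine still needs. The `Entry`, `released`, and `SegmentCompactor` names are hypothetical, illustrative stand-ins, not Copycat's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compact one log segment by copying only "live"
// entries into a new segment. Released entries are simply dropped, but
// the surviving entries keep their original indexes.
class Entry {
    final long index;
    final boolean released; // true once the state machine no longer needs it
    Entry(long index, boolean released) {
        this.index = index;
        this.released = released;
    }
}

class SegmentCompactor {
    // Rewrites a segment without the released entries. This can run on a
    // background thread, independently of the state machine, and can be
    // throttled since it touches one segment at a time.
    static List<Entry> compact(List<Entry> segment) {
        List<Entry> compacted = new ArrayList<>();
        for (Entry entry : segment) {
            if (!entry.released) {
                compacted.add(entry);
            }
        }
        return compacted;
    }
}
```

The same "live vs. released" distinction is what lets already-compacted entries be skipped during replication to lagging followers.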
But incremental compaction doesn't suit all use cases, which is why I mentioned counters. The state of a counter is the sum of all its increments, so Copycat would have to hold every increment in the log for as long as the counter exists. Snapshots are supported primarily for those kinds of use cases. But we try to keep the size of snapshots relatively small, do counting on clients when possible (e.g. a reentrant lock/mutex), and use incremental compaction for the most active state machines like maps, queues, and other data structures.
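A rough sketch of why a counter wants snapshots rather than incremental compaction: since the state is just the sum of all increments, a snapshot of the current sum replaces every increment entry at once. The class and method names below are illustrative assumptions, not Copycat's snapshot API:

```java
// Hypothetical counter state machine. Its state is the sum of all
// increments, so no individual increment entry can ever be released on
// its own -- but a snapshot of the running sum lets the log drop all of
// the increment entries that preceded it.
class CounterStateMachine {
    private long value;

    void increment(long delta) {
        value += delta;
    }

    // The snapshot is just the current sum, which stays small no matter
    // how many increments have been applied.
    long snapshot() {
        return value;
    }

    void restore(long snapshot) {
        value = snapshot;
    }
}
```

A new server can restore from the small snapshot and then replay only the increments logged after it.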
As for the log, Copycat's log always starts at index 1 even if the actual entries on disk start at index 1,000,000. There's not really any reason for that other than simplicity and consistency. Indexes are never removed from the log; compacted entries just become null, and iterating from 1 to 1,000,000 when the server starts is trivial in terms of time. But sure, skipping straight to the first on-disk entry would be fine to do, because presumably the compacted entries can't be applied anyways.
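A tiny sketch of that iteration, assuming a hypothetical `CompactedLog` wrapper (not Copycat's actual log class): indexes always start at 1, compacted entries read back as null, and startup just walks past the nulls to the first live entry.

```java
// Hypothetical sketch: a log whose indexes start at 1 even though the
// earliest entries have been compacted away and now read back as null.
class CompactedLog {
    private final Object[] entries; // entry at index i is stored at entries[i - 1]

    CompactedLog(Object[] entries) {
        this.entries = entries;
    }

    // On startup, iterate from index 1 and skip nulls until the first
    // live entry; null entries can't be applied anyways.
    long firstLiveIndex() {
        for (int i = 0; i < entries.length; i++) {
            if (entries[i] != null) {
                return i + 1; // 1-based log index
            }
        }
        return -1; // log is entirely compacted/empty
    }
}
```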