Traditionally speaking, the “data” in each log entry is an instruction for a replicated state machine. Each node feeds the same list of instructions to its state machine in the same order, ensuring that each node will end up in the same state. If your log is [1: x=0, 2: x++, 3: y=x*2], every node eventually ends up in state {x=1, y=2}.
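That replay can be sketched in a few lines (purely illustrative — the lambda encoding of instructions is my own, not any particular implementation's):

```python
# Each log entry's "data" is an instruction; every node applies the
# same instructions in the same order, so every node reaches the
# same final state.
log = [
    ("x=0",   lambda s: s.update(x=0)),            # entry 1
    ("x++",   lambda s: s.update(x=s["x"] + 1)),   # entry 2
    ("y=x*2", lambda s: s.update(y=s["x"] * 2)),   # entry 3
]

state = {}
for _name, instruction in log:
    instruction(state)

print(state)  # {'x': 1, 'y': 2}
```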
From this perspective, your new node needs a full copy of the “data” for each entry in the log, so it too can apply each instruction in order. Just applying ‘y=x*2’ on a brand new node isn’t enough to get in sync with the other nodes.
As the log grows, bootstrapping a new node this way becomes more and more inefficient. To deal with this, implementations typically take periodic “snapshots” of the state machine: as of log entry 12345 the state was {x=67890, y=42, ...}. A new node can be bootstrapped with a recent snapshot and then only needs to replicate/apply log entries from that position forward, rather than replicating/applying every log entry since the beginning of time.
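A rough sketch of that bootstrap path, with made-up names (`snapshot["index"]`, the (key, function) entry encoding) that don't correspond to any real implementation's API:

```python
def apply(state, entry):
    # Entries are (key, function) pairs here, for illustration only.
    key, fn = entry
    state[key] = fn(state)

def bootstrap(snapshot, log):
    """Restore state from a snapshot, then apply only the log suffix."""
    state = dict(snapshot["state"])
    for entry in log[snapshot["index"]:]:  # skip the snapshotted prefix
        apply(state, entry)
    return state

log = [
    ("x", lambda s: 0),           # 1: x=0
    ("x", lambda s: s["x"] + 1),  # 2: x++
    ("y", lambda s: s["x"] * 2),  # 3: y=x*2
]
snapshot = {"index": 2, "state": {"x": 1}}  # taken after entry 2
print(bootstrap(snapshot, log))  # {'x': 1, 'y': 2}
```

The new node never sees entries 1 and 2; the snapshot stands in for them.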
A special case of this, which is maybe what you’re asking about, is if the “data” in every log entry is in fact a complete snapshot of the entire state machine. This is usually much less efficient than logging only changes, so I wouldn’t necessarily recommend it. But it does mean that you should only need to pick up the “data” from the most recent log entry to bootstrap a new node.
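In that scheme, bootstrap degenerates to reading a single entry (again just a sketch, with the log modeled as a list of full-state dicts):

```python
# Each entry's "data" is a complete snapshot of the state machine,
# so a new node only needs the most recent entry.
log = [
    {"x": 0},          # full state after entry 1
    {"x": 1},          # full state after entry 2 (x++)
    {"x": 1, "y": 2},  # full state after entry 3 (y=x*2)
]

new_node_state = dict(log[-1])
print(new_node_state)  # {'x': 1, 'y': 2}
```

The cost is that every entry carries the whole state, which is why logging deltas plus occasional snapshots is usually preferred.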
-d