etcd persistence

Stas Oskin

Jun 1, 2014, 3:56:23 PM
to coreo...@googlegroups.com
Hi,

Is there any way to persist etcd data, so in case all nodes crashed, the latest configuration will be restored?

Regards.

Brandon Philips

Jun 1, 2014, 4:07:50 PM
to coreos-dev
On Sun, Jun 1, 2014 at 12:56 PM, Stas Oskin <stas....@gmail.com> wrote:
> Is there any way to persist etcd data, so in case all nodes crashed, the
> latest configuration will be restored?

etcd persists configuration to disk already. What problem are you encountering?

Stas Oskin

Jun 1, 2014, 4:48:15 PM
to coreo...@googlegroups.com
> etcd persists configuration to disk already. What problem are you encountering?

I am running etcd on a single node, and rebooting the node causes etcd to lose all the entered settings.
I looked through the config code, but I don't see anything related to persistence.

Brandon Philips

Jun 1, 2014, 5:07:32 PM
to coreos-dev
On Sun, Jun 1, 2014 at 1:48 PM, Stas Oskin <stas....@gmail.com> wrote:
> I am running etcd on a single node, and rebooting the node causes etcd to
> lose all the entered settings.

Which configuration settings are you not seeing persisted? Can you
write a script that I could use to reproduce? What version of etcd are
you using?

Brandon

Stas Oskin

Jun 1, 2014, 5:24:04 PM
to coreo...@googlegroups.com

> Which configuration settings are you not seeing persisted? Can you
> write a script that I could use to reproduce? What version of etcd are
> you using?

For example, these keys:
        'redis_host',
        'redis_port',
        'redis_password',
        'redis_database',
        'redis_options',
        'mysql_host',
        'mysql_user',
        'mysql_password',
        'mysql_database',
        'mysql_port',
        'mysql_charset'

I am using etcdctl to insert them.

The etcd version is git master - b3c5ed60bd335a2008ce66a07bf544dc4eec4f2

Stas Oskin

Jun 1, 2014, 6:05:53 PM
to coreo...@googlegroups.com
After inspecting config.go, I found a reference to /etc/etcd/etcd.conf, which I actually don't have.
Looking into the config manual, I found out about snapshots and their directory, which I assume provide the persistence feature.

Any idea what default values are used if no such file is present?

Brandon Philips

Jun 1, 2014, 6:44:42 PM
to coreos-dev
On Sun, Jun 1, 2014 at 3:05 PM, Stas Oskin <stas....@gmail.com> wrote:
> After inspecting config.go, I found a reference to /etc/etcd/etcd.conf,
> which I actually don't have.
> Looking into the config manual, I found out about snapshots and their
> directory, which I assume provide the persistence feature.


The default directory is ${PWD}/${etcdname}.etcd. If you want to
change it, use `-data-dir`.

Brandon
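
To make the default concrete, here is a minimal Python sketch of how a path of the form ${PWD}/${etcdname}.etcd resolves. The function name `default_data_dir` and its parameters are hypothetical and only illustrate the rule quoted above; this is not etcd's own code.

```python
import os
from typing import Optional


def default_data_dir(name: str, cwd: Optional[str] = None) -> str:
    """Illustrative only: build <working dir>/<name>.etcd, the default quoted in the thread."""
    # Assumption: when no directory is passed in, use the process's current directory.
    base = os.getcwd() if cwd is None else cwd
    return os.path.join(base, name + ".etcd")


if __name__ == "__main__":
    # The same node name resolves to different directories
    # depending on where the process was started.
    print(default_data_dir("node1", "/srv/etcd"))
    print(default_data_dir("node1", "/root"))
```

Because the default depends on the working directory, the same node name points at a different directory if etcd is launched from somewhere else, which is one reason to pass `-data-dir` explicitly.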

Stas Oskin

Jun 1, 2014, 8:21:28 PM
to coreo...@googlegroups.com

> The default directory is ${PWD}/${etcdname}.etcd. If you want to
> change it, use `-data-dir`.
>
> Brandon

I do see it, but the snapshot is outdated: it is from 12 hours ago.
Is there any way to update it?

Brandon Philips

Jun 1, 2014, 9:20:14 PM
to coreos-dev
On Sun, Jun 1, 2014 at 5:21 PM, Stas Oskin <stas....@gmail.com> wrote:
> I do see it, but the snapshot is outdated: it is from 12 hours ago.
> Is there any way to update it?
>

There is also a log file for all of the commits that have not been
snapshotted. A snapshot happens whenever a certain number of
changes have been made (10000 by default), which you can change with
`-snapshot-count`.

Brandon
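
The relationship between the log and the snapshot can be sketched with a toy model in Python. This is not etcd's implementation: the class `ToyStore` and its attributes are hypothetical, and truncating the log when a snapshot is taken is an assumption of the model. It only illustrates the point above that every change reaches the log, while a snapshot appears only after `snapshot_count` changes.

```python
from typing import Dict, List, Optional, Tuple


class ToyStore:
    """Toy model of a key-value store with a change log and periodic snapshots."""

    def __init__(self, snapshot_count: int = 10000) -> None:
        self.snapshot_count = snapshot_count
        self.state: Dict[str, str] = {}
        self.snapshot: Optional[Dict[str, str]] = None
        self.log: List[Tuple[str, str]] = []  # changes made since the last snapshot

    def set(self, key: str, value: str) -> None:
        # Every change is recorded in the log first, then applied in memory.
        self.log.append((key, value))
        self.state[key] = value
        # Once enough changes have accumulated, take a snapshot and
        # drop the log entries that the snapshot now covers (model assumption).
        if len(self.log) >= self.snapshot_count:
            self.snapshot = dict(self.state)
            self.log.clear()


if __name__ == "__main__":
    store = ToyStore(snapshot_count=3)
    store.set("redis_host", "10.0.0.1")
    store.set("redis_port", "6379")
    print(store.snapshot, len(store.log))
    store.set("mysql_host", "10.0.0.2")
    print(store.snapshot, len(store.log))
```

In this model an old snapshot does not imply lost data: the changes made since that snapshot are still held in the log.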

Stas Oskin

Jun 1, 2014, 10:20:54 PM
to coreo...@googlegroups.com

> There is also a log file for all of the commits that have not been
> snapshotted. A snapshot happens whenever a certain number of
> changes have been made (10000 by default), which you can change with
> `-snapshot-count`.

Is there any way to force dumping the current key values from etcd's memory to a snapshot?

Brandon Philips

Jun 2, 2014, 12:12:08 AM
to coreos-dev
No, there isn't a way to force it. Do you have a reason for wanting to do this?

Brandon

Stas Oskin

Jun 2, 2014, 12:42:50 AM
to coreo...@googlegroups.com
If we restart the node now, we will lose the settings again...

Brandon Philips

Jun 2, 2014, 1:55:50 AM
to coreos-dev
On Sun, Jun 1, 2014 at 9:42 PM, Stas Oskin <stas....@gmail.com> wrote:
> If we restart the node now, we will lose the settings again...

Can you please tell me how I can reproduce this issue, what version
you are using, and exactly which settings you are losing?

Brandon

Alex Stockwell

Sep 9, 2014, 8:58:12 PM
to coreo...@googlegroups.com
For anyone who finds this thread with a similar question, you are probably looking for the Snapshots and Logging documentation, found here: https://github.com/coreos/etcd/blob/master/Documentation/tuning.md

Also, this commit details how the Raft protocol uses logging, but that bit seems to have since been removed from the master branch's docs: https://github.com/coreos/etcd/pull/501/files

Alex

Jeffrey Miller

Nov 19, 2014, 10:26:31 AM
to coreo...@googlegroups.com
This seems to be the most relevant thread on the internet for this question, but if not please let me know. I have consulted the Snapshots and Logging documentation, and still have a few questions. 

What is the expected behaviour for a single node etcd server that goes down and comes back up? Should it re-populate itself from the log file (if there is no snapshot)? If there is a snapshot and a log file, does it replay the logfile on top of the snapshot to re-populate? 

Assuming one of these is true, does it do this automatically or is this a process that needs to be manually kicked off?

Thanks for any insight you can provide!

Jeff

Brandon Philips

Nov 19, 2014, 2:10:29 PM
to coreos-dev
On Wed, Nov 19, 2014 at 7:26 AM, Jeffrey Miller <jef...@gmail.com> wrote:
> What is the expected behaviour for a single node etcd server that goes down
> and comes back up? Should it re-populate itself from the log file (if there
> is no snapshot)? If there is a snapshot and a log file, does it replay the
> logfile on top of the snapshot to re-populate?

It reads in the snapshot if it exists and applies that. Then it
applies the log entries. At that point it is ready to rejoin the
cluster. If it is missing log entries that the cluster has accepted,
the current leader will send those to the newly restarted follower.

> Assuming one of these is true, does it do this automatically or is this a
> process that needs to be manually kicked off?

This all happens automatically in the scenario you described: if the
machine etcd was running on simply rebooted or the etcd process was
killed and restarted.

Thank You,

Brandon
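
The restart sequence described above (apply the snapshot if one exists, then apply the log entries) can be sketched in Python as follows. This is a simplified illustration, not etcd's code: the function `recover`, the `(operation, key, value)` entry format, and the two operation names are hypothetical, and the step where a leader sends missing entries to a restarted follower is left out.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical log entry format: (operation, key, value).
Entry = Tuple[str, str, Optional[str]]


def recover(snapshot: Optional[Dict[str, str]], log: Iterable[Entry]) -> Dict[str, Optional[str]]:
    """Rebuild in-memory state from an optional snapshot plus a log of later changes."""
    # Step 1: start from the snapshot if one exists, otherwise from empty state.
    state: Dict[str, Optional[str]] = dict(snapshot) if snapshot is not None else {}
    # Step 2: replay the log entries, in order, on top of that state.
    for op, key, value in log:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError("unknown log operation: " + op)
    return state


if __name__ == "__main__":
    snapshot = {"redis_host": "10.0.0.1", "redis_port": "6379"}
    log = [
        ("set", "redis_port", "6380"),
        ("set", "mysql_host", "10.0.0.2"),
        ("delete", "redis_host", None),
    ]
    print(recover(snapshot, log))
    # With no snapshot at all, the state is rebuilt from the log alone.
    print(recover(None, [("set", "mysql_port", "3306")]))
```

Both of Jeff's cases fall out of the same two steps: with no snapshot the state comes entirely from the log, and with both present the log is replayed on top of the snapshot.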

Jeffrey Miller

Nov 19, 2014, 2:17:05 PM
to coreo...@googlegroups.com
Brandon,

Thanks for your explanation, that clears it up for me.

Jeff

Jeffrey Miller

Nov 19, 2014, 4:50:16 PM
to coreo...@googlegroups.com
So we just did some experimentation to try to isolate the issue. In the latest containerised version (0.4.6), we can see the log file being written to a persistent data mount (mounted from the docker host). It writes the log file, and we brought the snapshot count down very low, so we saw it write a snapshot too. When we bring the container down, we still see the log and snapshots, but as soon as either we restart that container or recreate a new image (with the same mount from the host), it clears out the snapshots and logs and everything is gone. Do you think this is a bug, or perhaps a configuration error? Again, this is a single node setup. Thanks!

Tyler Rivera

Nov 20, 2014, 2:35:54 PM
to coreo...@googlegroups.com
I was working on this with Jeff and was able to sort it out last night after talking to Brandon in #coreos on freenode. It turns out that we were using a Docker image downloaded from Docker's hub, which is apparently built right off of etcd's GitHub master. We've since moved over to the images built on quay.io (specifically quay.io/coreos/etcd:v0.4.6), which are nicely tagged and appear to be stable. We also found that our etcd instance was being run with the "-f" flag as an arg, which may have been picked up from a tutorial some months past. We're now up and running. Thanks for the help.