Improving container logs


Jérôme Petazzoni

Nov 25, 2013, 8:12:38 PM
to docker-dev
Hey list,

This is an attempt to gather ideas about improving the handling of container logs.
This is not about logs of Docker itself (see https://github.com/dotcloud/docker/issues/936 for that); we're talking about the logs generated by the containers here.

Current situation:
- nothing special is done regarding syslog (if a process tries to use standard syslog calls, that will go to /dev/null since /dev/log won't exist, unless you run syslog in the container).
- nothing special is done regarding regular log files (e.g. if you run something that writes logs to /var/log/blah/blah.log, they will just stay there).
- stdout+stderr of processes running in containers are captured by Docker, and stored in a JSON format looking like this:
{"log":"Creating config file /etc/mercurial/hgrc.d/hgext.rc with new version\n","stream":"stderr","time":"2013-11-01T13:51:19.763621802-07:00"}
- those log files are stored to disk, and grow boundlessly
- logs can be consumed entirely (with "docker logs") or kind of streamed (with "docker attach"), but it's not possible to stream from a given point, or consume only parts of the logs
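For reference, the JSON-lines format above is trivially consumable with stock tooling; a minimal sketch (the shortened sample record is illustrative, not a real log file):

```python
import json

# Each line of a container's log file is a standalone JSON record
# with "log", "stream" and "time" keys, as in the example above.
def parse_log_lines(lines):
    for line in lines:
        record = json.loads(line)
        yield record["time"], record["stream"], record["log"]

# A shortened sample record in the same shape:
sample = ('{"log":"Creating config file\\n","stream":"stderr",'
          '"time":"2013-11-01T13:51:19.763621802-07:00"}')

for time, stream, log in parse_log_lines([sample]):
    print(stream, log, end="")
```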

Ideally, we want to capture more log sources (syslog and regular files come to mind), and better ways to consume logs.

Specifically, it would be nice if we could...
1) handle log entries sent to syslog, since many Unix daemons use it, and syslog messages carry some extra info (facility and priority)
2) handle regular logfiles, since some programs will use that (and sometimes different log files will have different meanings, e.g. access.log and error.log)
3) store log entries in a bounded ring buffer, to make sure that logs will never fill up disk space by default
4) stream log entries, and be able to resume the stream (if the stream breaks) without losing entries
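Point 3 maps directly onto a fixed-size ring; a minimal sketch of the idea (the capacity values are arbitrary):

```python
from collections import deque

class LogRing:
    """Keep only the most recent `capacity` log entries; older entries
    are silently discarded, so memory/disk use stays bounded."""
    def __init__(self, capacity=1000):  # default capacity is an assumption
        self.entries = deque(maxlen=capacity)

    def write(self, entry):
        self.entries.append(entry)

    def tail(self, n):
        return list(self.entries)[-n:]

ring = LogRing(capacity=3)
for i in range(10):
    ring.write("line %d" % i)
print(ring.tail(2))  # only the newest entries survive
```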

I would like to know:
- if you think that those features are indeed needed (or if some should be scrapped)
- if you think that additional features are needed for container logging
- if you are willing to work on implementing that kind of stuff

I have some ideas about how to implement this, but before, I'd like to get feedback on the general idea.

Thank you!





Brian Lalor

Nov 25, 2013, 8:30:51 PM
to docker-dev
On Nov 25, 2013, at 8:12 PM, Jérôme Petazzoni <jerome.p...@docker.com> wrote:

> I would like to know:
> - if you think that those features are indeed needed (or if some should be scrapped)

Absolutely. Capturing syslogs and managing them sanely is the #1 reason that I'm using CMD ["/sbin/init"]. Installing rsyslog, logrotate and cronie is additional overhead for sure, but (a) capturing logs is essential to triaging problems and (b) rotating them is essential to long-term stability.

> - if you think that additional features are needed for container logging

You touched on the big ones, in my opinion: syslog integration and something to keep logs from growing boundlessly. If you start using syslog (/dev/log), it'll be important to have the container alias or ID in the stream if messages are fed to the host's syslog. When it comes to capturing a process's stderr and stdout, however, a ring buffer may not be sufficient. The simplest solution may be to flush those logs to an actual file and then let logrotate handle the purging.

I kind of think this belongs on docker-user, but I’ll throw my 2¢ in here (mainly because I’m not a docker-dev; I’m just here for the CentOS support!).

--
Brian Lalor


James Turnbull

Nov 25, 2013, 8:37:15 PM
to docker-dev
I am +100 on syslog integration. That's my primary interest. Happy to
help where I can.

Regards

James

--
* The Docker Book (http://dockerbook.com)
* The LogStash Book (http://logstashbook.com)
* Pro Puppet (http://tinyurl.com/ppuppet)
* Pro Linux System Administration (http://tinyurl.com/linuxadmin)
* Pro Nagios 2.0 (http://tinyurl.com/pronagios)
* Hardening Linux (http://tinyurl.com/hardeninglinux)

Kushal Pisavadia

Nov 26, 2013, 3:09:12 AM
to ja...@lovedthanlost.net, docker-dev
Yes! Logs and disk consumption are my two biggest issues with docker, to the point that I've made all of my apps log to stdout to get around the config overhead.

The issue with this is that the current log file format for stdout/stderr escapes everything and isn't smart enough to understand JSON. My apps currently log JSON to stdout and this gets interpolated as a string and then gets JSON escaped. As a result I've had to process these logs on-the-fly to unescape them.

If you could provide a way to just stream raw stdout/stderr to a file, that would be ideal and could be reused for syslogs. Something like `docker logs -f id`, or alternatively providing the raw logs (and a file descriptor) instead of wrapping them in JSON.
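The double encoding described here is easy to reproduce; a small sketch of what happens today, and the double decode a consumer is forced into (the sample event is illustrative):

```python
import json

# What an app that logs JSON writes on stdout: already-serialized JSON.
app_event = {"level": "info", "msg": "user login"}
stdout_line = json.dumps(app_event)

# Docker treats that line as an opaque string and JSON-escapes it again
# inside its own record, so all the inner quotes get backslash-escaped:
docker_record = json.dumps({"log": stdout_line + "\n", "stream": "stdout"})
print(docker_record)

# A consumer has to decode twice to recover the original event:
recovered = json.loads(json.loads(docker_record)["log"])
```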



--
You received this message because you are subscribed to the Google Groups "docker-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to docker-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--

Kushal Pisavadia
@KushalP
Government Digital Service

Maciej Mazur

Nov 26, 2013, 3:01:48 PM
to docke...@googlegroups.com


On 2013-11-26, Jérôme Petazzoni wrote:
> Specifically, it would be nice if we could...
> 1) handle log entries sent to syslog, since many unix daemons use that, and
> it allows to carry some extra info (facility and priority)

As a Docker user I can just add that it would be great if it were possible
to forward /dev/log from all containers to a single Docker container
with a syslog server installed. This is how I collect logs now: on "client"
containers I have syslog-ng installed, forwarding all logs to one container
with syslog-ng configured as the "server".

Maciej Mazur


Jérôme Petazzoni

Nov 26, 2013, 9:25:12 PM
to Maciej Mazur, docker-dev
@Kushal: if the internal format is still JSON, but tools can easily consume in JSON or raw formats, would that be acceptable?

Others: it looks like we are on the same page! I'm almost disappointed because I was expecting that someone would point out that I forgot some important feature (well, I probably did anyway...:-))

@Maciej: forwarding logs to a "collector" container — that's a nice pattern, and I think we'll have to remember it. Now, I hope that we can reach a kind of "holy grail" where we don't even have to run syslogd in all the "client" containers...

I'll start drafting some ideas about how we could achieve those goals; meanwhile, if someone has additional requirements/comments/ideas, don't hesitate to throw them in!




Kushal Pisavadia

Nov 27, 2013, 3:53:08 AM
to Jérôme Petazzoni, Maciej Mazur, docker-dev
Jérôme,

I'd be happy if it was easy to consume the raw log format.

There are already a ton of tools that can consume JSON (Heka, Lumberjack, Logstash, etc.). An issue with the current JSON format is that it splits logs up by newline instead of by writes, putting the pain of re-joining these splits on the consumer.

For example, if you had a Java app that threw an exception, the exception would cover multiple lines.

At the moment, I'm just describing my own use cases and pain points, which I can appreciate isn't as helpful. It would be great if docker described a pattern for how to log in containers and another for getting the raw logs for a container.


Geoffrey Bachelet

Nov 27, 2013, 4:13:26 AM
to docke...@googlegroups.com
Here's what I do right now: I have a daemon that listens on the /getEvents API endpoint for container creation and attaches to containers on the fly (it can also attach to already-running containers at startup). It then forwards any message received from the containers to a rabbitmq queue for later consumption (it could as well forward to syslog). It works really well for plain stdout output; it gets a bit trickier when you want specific log files, as you need to get them to the container's stdout. Using tail -f as an entrypoint works well, but you have to parse the stream to split it back into the original log files.

So basically what I'd really love would be a standardized way to get things to a container's stdout :)

Also, I was wondering if it would somehow be possible to bind-mount /dev/log? I guess it would have to be symlinked somewhere, and that directory bind-mounted. That would allow a syslog sitting on the host to receive logs from any arbitrary container using syslog as a logging facility, or am I missing something? (I'm not very well versed in syslog, so maybe I just said something really stupid.)

Dan Buch

Nov 27, 2013, 12:13:39 PM
to docke...@googlegroups.com
+1 to better syslog integration :-)

Just to add to the list of current use cases: We're running containers via upstart in the foreground and logging JSON to stdout (piped through https://github.com/modcloth-labs/loggly-pipe).  This has allowed us to rely on upstart log rotation.  We're even doing this in the case of "service containers" like PostgreSQL by sending logs to stdout, although I suspect we're missing some things that are trying to log to syslog.

My expectation is that we'll eventually hit the problem where we'll have to deal with a legacy <something> that can't be sent to stdout and has multiple log files, so having support for that sounds like a good idea to me, although it could be hacked with a bind mount pretty trivially AFAICT.





--
Dan Buch
cell: 412.440.7624
internets: meatballhat

Jérôme Petazzoni

Nov 29, 2013, 10:34:41 PM
to Dan Buch, docker-dev
So, I did some experiments, and the results are pretty interesting.
As you might know, /dev/log is a SOCK_DGRAM UNIX socket, created by your local syslog daemon.
Any process that wants to emit log entries just sends datagrams to that socket, using a specific format.
(See http://tools.ietf.org/html/rfc3164#section-5.4 for a quick example about that format; it's fairly simple.)
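The datagram format is indeed simple enough to construct by hand; a simplified sketch (real emitters usually also add a timestamp after the `<PRI>` part per RFC 3164, and the tag/message here are just placeholders):

```python
import socket

def syslog_datagram(facility, severity, tag, message):
    # RFC 3164 priority value: facility * 8 + severity, in angle brackets.
    pri = facility * 8 + severity
    return ("<%d>%s: %s" % (pri, tag, message)).encode()

# facility 3 (daemon), severity 6 (informational) -> PRI 30
data = syslog_datagram(3, 6, "myapp", "hello from a container")

# Sending it is just a datagram to the Unix socket (where /dev/log exists):
# sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
# sock.sendto(data, "/dev/log")
```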

Good news: you can bind-mount the /dev/log socket from your host, into your containers. (As hinted by Geoffrey!)
(Note: since my version of Docker won't let me do "-v /dev/log:/dev/log", I had to do "-v /hostdev:/dev" then create a symlink, but we could bind-mount it just like we bind-mount /etc/resolv.conf without a problem.)

It means that all your containers can automatically get syslog support. Any process running in a container, and using the regular syslog calls, will emit messages to the host's syslog.
Bonus: the receiving end (the syslog daemon) should be able to look up the PID of the sender, and from that PID, it can find out the name of the container, the name of the process, and the UID running it.
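That PID lookup works because, on Linux, the kernel attaches the sender's credentials (pid/uid/gid) to each datagram once the receiver enables SO_PASSCRED; a Linux-only sketch using a socketpair as a stand-in for /dev/log:

```python
import os
import socket
import struct

# Stand-in for /dev/log: a connected AF_UNIX datagram pair.
receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# Ask the kernel to attach the sender's credentials to each datagram.
receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)

sender.send(b"<30>myapp: hello")

ucred_fmt = "3i"  # struct ucred: pid, uid, gid
msg, ancdata, flags, addr = receiver.recvmsg(
    1024, socket.CMSG_SPACE(struct.calcsize(ucred_fmt)))

pid = uid = gid = None
for level, ctype, cdata in ancdata:
    if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
        pid, uid, gid = struct.unpack(ucred_fmt, cdata)
        # From the pid, a daemon could e.g. read /proc/<pid>/cgroup
        # to find out which container the message came from.
```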

PROPOSITION 1: bind-mount /dev/log into containers just like we bind-mount /etc/resolv.conf.
- result: all containers can use syslog calls, and logs are sent to the host's syslogd.
- drawback: this is only as good as your host's syslogd (i.e. with a default syslogd this will probably be very confusing; journald might cope a bit better).
- improvement: when starting Docker, specify the socket to bind (instead of /dev/log); e.g. "docker -d -devlog /var/run/logger.sock".
- further improvement: Docker creates its own "syslog-compatible" consumer socket to handle syslog events from containers.

Okay, the next challenge was about storing and retrieving logs efficiently.
We have a problem there.
- From an overall design/architecture point of view, it would be nice to keep Docker as simple as possible, and offload all the complex tasks to external things: shipping logs, archiving them, querying them...
- At the same time, we have (at least) one command, "docker logs", which makes an extremely bold assumption; this assumption is "Docker has infinite storage for your logs" :-)

PROPOSITION 2: change the behavior of "docker logs", so that it doesn't send ALL the logs for a given container, because that's an unrealistic demand.
- result: now that we're not asking Docker to do something impossible, we can focus on what's actually doable :-)
- drawback: bad news for people who relied on "docker logs" to output ALL THE THINGS.

PROPOSITION 3: convert all messages (logs emitted by Docker itself, logs emitted by containers from stdout/stderr or syslog, events about containers...) into a common format (which serializes to JSON?)
- result: we can now log "rich" messages.
- drawback: requires unification of container logs and Docker logs; if this is not desirable, Docker logs can be kept out of the equation.

PROPOSITION 4: define a "log plugin API".
A "log plugin" should expose functions to store a log message (and do whatever it wants with it), and functions to retrieve log messages for a given container, with some optional parameters. It would be reasonable to enforce each log message to bear a container ID and name (except for messages related to Docker itself), as well as a timestamp, and unique sequence number. Then, "docker logs" would be reimplemented by talking to this API (in a probably less powerful way). Other interesting commands like "docker tail" or "docker stream-logs-since-this-timestamp-or-id..." could easily be added.
A default plugin would provide a limited in-memory ring buffer.
Other plugins could use logstash, journald, etc. to provide better & faster storage, or more search features.
- result: the code of Docker remains simple enough, but can be extended to support arbitrarily powerful logging systems.
- drawback: the default plugin will only retain a very small amount of logs, and in a volatile manner (logs are lost when daemon is restarted).
- improvement A: logs can also be sent to a stream, to be stored to disk (+log-rotated) for "classical" consumption (with grep etc.), or to be shipped to other systems without much fuss.
- improvement B: the logs for each container can be kept in a tiny on-disk ring buffer, so that logs aren't lost on restarts.
- corner case: when I do "docker logs angry_linus", assuming that two containers have been using that name, what should happen? should I get the concatenation of both? or just the most recent one?
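A sketch of what such a plugin contract might look like (all names here are hypothetical, not actual Docker API), with the default in-memory ring buffer as the reference backend:

```python
from collections import deque

class LogPlugin:
    """Hypothetical plugin contract: store one message, and fetch
    messages for a container, optionally resuming after a known
    sequence number."""
    def store(self, container_id, message):
        raise NotImplementedError

    def fetch(self, container_id, since_seq=0):
        raise NotImplementedError

class RingBufferPlugin(LogPlugin):
    """Default backend: a small in-memory ring per container, so logs
    are bounded but lost on daemon restart (the stated drawback)."""
    def __init__(self, capacity=1000):  # capacity is an assumption
        self.capacity = capacity
        self.seq = 0  # monotonic sequence number across all containers
        self.buffers = {}  # container_id -> deque of (seq, message)

    def store(self, container_id, message):
        self.seq += 1
        buf = self.buffers.setdefault(container_id,
                                      deque(maxlen=self.capacity))
        buf.append((self.seq, message))
        return self.seq

    def fetch(self, container_id, since_seq=0):
        return [(s, m) for s, m in self.buffers.get(container_id, ())
                if s > since_seq]
```

With something like this, "docker logs" becomes `plugin.fetch(id)`, and a broken stream can resume with `fetch(id, since_seq=last_seen)` without losing the entries still in the buffer.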

OK, that last proposition is probably the most controversial one, and the one that we need to agree on before moving forward. Let me know if you think it's broken / flawed / or if you have something better in mind :-)

One last thing: if we want to handle "plain log files" inside containers, I think we can do it pretty easily (and independently of the previous work), with two variants of the same idea.
1. We could tell Docker (when we start a container) that a specific file will be a log file:
docker run -d -log /var/log/nginx/access.log -log /var/log/nginx/error.log nginx
Docker will create named pipes in those locations, and "attach" to the other end of the pipe, reading everything written there by the process, and injecting that into the regular log stream.
2. We could tell Docker to watch a specific set of files (because we don't know how they will be named):
docker run -d -logs /var/log/tomcat/*.log tomcat
Docker will watch files created matching this glob (or directory...?) and read them to inject them into the log stream.
In the latter scenario, the container is still responsible for rotating and cleaning up older log files; but when you have a system that just generates a bunch of log files with arbitrary names, it's not unreasonable to embed a little extra something to deal with it.
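Variant 1 rests on a plain FIFO; a minimal, Docker-free sketch of the mechanism, where the "container" side writes to what it believes is an ordinary log file and the "daemon" side reads the other end:

```python
import os
import tempfile
import threading

# What Docker would create at e.g. /var/log/nginx/access.log:
log_path = os.path.join(tempfile.mkdtemp(), "access.log")
os.mkfifo(log_path)

def container_process():
    # The process inside the container just opens and writes as usual.
    with open(log_path, "w") as f:
        f.write("GET / HTTP/1.1 200\n")

t = threading.Thread(target=container_process)
t.start()

# The daemon side reads whatever is written and would inject it
# into the container's regular log stream.
with open(log_path) as f:
    line = f.readline()
t.join()
```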

Comments / feedback / crossbow bolts welcome :-)

Alexander Larsson

Dec 2, 2013, 3:53:54 AM
to docke...@googlegroups.com, Dan Buch

On Saturday, 30 November 2013 04:34:41 UTC+1, Jérôme Petazzoni wrote:

> PROPOSITION 2: change the behavior of "docker logs", so that it doesn't send ALL the logs for a given container, because that's an unrealistic demand.
> - result: now that we're not asking Docker to do something impossible, we can focus on what's actually doable :-)
> - drawback: bad news for people who relied on "docker logs" to output ALL THE THINGS.

Do we even need "docker logs"? Logging is a complicated thing and there are several logging systems available out there with many man-years of work on development done on them and massive deployment both in terms of log generation and log consumption. All these systems can in some form implement "docker logs". Why is there a need for docker to reimplement this?
 
> PROPOSITION 3: convert all messages (logs emitted by Docker itself, logs emitted by containers from stdout/stderr or syslog, events about containers...) into a common format (which serializes to JSON?)
> - result: we can now log "rich" messages.
> - drawback: requires unification of container logs and Docker logs; if this is not desirable, Docker logs can be kept out of the equation.

There are already common formats for logging; some have RFCs, like syslog, and others are more de-facto standards, like journald. I don't see how creating a new format helps; it just means you have to write more conversion software to convert the docker logs to something that works in your existing logging infrastructure. Also, not having e.g. the journal directly consume the syslog output means it won't get the extra kernel-supplied metadata it would otherwise extract (I believe).

Geoffrey Bachelet

Dec 3, 2013, 4:44:52 AM
to Alexander Larsson, docke...@googlegroups.com, Dan Buch
> Do we even need "docker logs"?

+1 on this. "docker logs" is convenient as a quick way to see what's going on in a container, for debugging purposes for example, but in no way should it be considered a canonical way of fetching logs imho.

I love Jerome's "Proposition 4". As Alexander said, logging is a complicated matter, and several solutions already exist that docker could (should?) interface with.

For me the perfect plugin system would:

1. expose (at least) 2 interfaces: one for transforming logs (so you could marshal logs into a format understandable to syslog, for example, or any other format you need), and one for delivering logs (delivery to syslog, rabbitmq, or whatever).

2. be leverageable without knowledge of Go (using https://github.com/progrium/go-plugins for example)

3. allow multiple plugins to run simultaneously (for example, I want to be able to pipe logs through my websocket servers but I also want to persist them in an arbitrary datastore at the same time)

I'm also all for the "plain log files" Jerome described, sounds perfectly fabulous.

Brian Morearty

Dec 3, 2013, 2:28:12 PM
to docke...@googlegroups.com, Alexander Larsson, Dan Buch, geoffrey...@gmail.com
I'm also in favor of the "plain log files" option that you mentioned, Jérôme. As you said, this can be done regardless of which of your propositions (or some other proposition) is implemented.

But it would be really useful to be able to parse out the different logs visually and programmatically. It would be nice if `docker logs` would put a marker in each line of log output, to indicate where it came from. (Even better for human scanning: colorize the markers by default, but allow colors to be turned off.)

Here's an example of what I mean. It's super-easy to scan the list and see the differences between the app web logs and the heroku router logs:

[screenshot of interleaved app and Heroku router log lines omitted]

Brian Morearty
Hands on with Docker: http://handsonwith.com/

Alexander Larsson

Dec 4, 2013, 2:33:21 AM
to docke...@googlegroups.com, Alexander Larsson, Dan Buch, geoffrey...@gmail.com


On Tuesday, 3 December 2013 20:28:12 UTC+1, Brian Morearty wrote:
> I'm also in favor of the "plain log files" option that you mentioned, Jérôme. As you said, this can be done regardless of which of your propositions (or some other proposition) is implemented.
>
> But it would be really useful to be able to parse out the different logs visually and programmatically. It would be nice if `docker logs` would put a marker in each line of log output, to indicate where it came from. (Even better for human scanning: colorize the markers by default, but allow colors to be turned off.)

It's *exactly* this kind of feature creep I'm worried about. Docker is a *piece* of the linux/unix ecosystem, not a replacement for it. All the other logging systems have features like this, plus a boatload more, including things like remote logging, rate limiting, log rotation, log indexing, secure metadata for log rows, etc.

We can either make sure we integrate well with these systems, or we'd be on an eternal mission to replicate all those features (which were added because people need them). And, in practice, I think everyone who runs docker in production already has an existing production logging setup, so they won't use the docker log command anyway.

Furthermore, even a simple "forwarding to systemd journal" plugin is problematic, because by intercepting the logging event you make it impossible for the journal to collect trustworthy kernel-based metadata for the log line, so you want to allow syslog from the container to go *directly* to the journal. This is why I wondered if "docker log" was really necessary, because the existence of this command implies that the docker daemon has to listen to the syslog, which is problematic per the above. Furthermore, it means docker must store a duplicate of the log to be able to recall it (and handle log rotation, limits, etc.).

I guess it may be possible for a plugin system to have the "docker logs" command reach into the journal itself to recall the logs, rather than store it itself. That way it would be less problematic.

Kushal Pisavadia

Dec 4, 2013, 6:17:57 AM
to Alexander Larsson, docke...@googlegroups.com, Dan Buch, geoffrey...@gmail.com
I'd agree with the sentiment here: output raw logs (not bounded JSON) and build whatever functionality you want on top of this.

Docker is a nice(r) interface over LXC (and maybe in the future others). It should concentrate on those things.

It shouldn't be reimplementing log shippers when there's already a large ecosystem available that you can point users at.

What could be nice is providing separate files for stderr/stdout as you already have that functionality built.

If docker was to ship plain ol' logs I'd be happy to write a one-pager for the docs pointing people at log shippers :-)



Brian Morearty

Dec 4, 2013, 1:34:14 PM
to Alexander Larsson, docke...@googlegroups.com, Dan Buch, geoffrey...@gmail.com

Alexander,

It wasn't clear to me whether the feature creep you're talking about is Jérôme's proposal #4, or my addition of making it easy to parse out the different logs. From the way you inlined my comments, it looks like you were addressing the latter. I probably wasn't clear in giving a reason for what I said. I apologize for that.

Here’s the reason: as a simple matter of pragmatism, if Docker were to provide a -log feature that consolidates multiple logs into the docker logs output but were simply to commingle output from those different log files without any way to distinguish them, then the feature would be useless. A single stream of log output from multiple sources? How would that be useful? What’s the first thing people would do? They would have to write parsers to pull apart the lines based on where they came from. Those parsers would be based on heuristics, and they would have errors, and a lot of time would be wasted. There are no built-in or third-party tools that will automatically parse those log lines out from each other.

So my point is just that if proposal #4 is implemented, tagging the log lines is a must-have. Not a nice-to-have, and not feature creep. But if proposal #4 is not implemented, well, tagging isn’t needed. :-)

Brian




Andy Goldstein

Dec 10, 2013, 3:14:32 PM
to docke...@googlegroups.com, Alexander Larsson, Dan Buch, geoffrey...@gmail.com
Hi all,

I'm working on adding logging and metrics aggregation support to OpenShift, so I'm interested to see where Docker is headed with respect to logging. From an OpenShift perspective, we want to enable logging and metrics aggregation at the platform level. So instead of having container logs live only in the containers themselves, we'd like to allow the option of shipping logs elsewhere. We can provide a set of reasonable and minimal defaults (e.g. store container logs in someplace like /var/log/openshift/container/$uuid/logs), but we probably can't provide a configuration that will work for 100% of our users (they would presumably customize our default configuration to get what they want).

I agree that there are other tools out there that specialize in dealing with logging (shipping, parsing, storing, indexing), so it makes sense to me that Docker should facilitate these other tools in performing their specific jobs, but I don't think Docker should really do much more than that for logging. And being able to know which log each specific event came from is definitely something we'd want as well.

Andy

Geoffrey Bachelet

Jan 21, 2014, 9:50:28 AM
to docke...@googlegroups.com
Is there any news on this? I can't see anything new related to container logs on the mailing-lists or in the github issues :/



Jérôme Petazzoni

Jan 21, 2014, 10:42:45 AM
to Geoffrey Bachelet, docker-dev
Hi Geoffrey,

This is still on my radar, but I've been completely swamped over the last 2 months.
I have a draft summary that I should complete next week or so.

Best,



Dick Davies

Jan 23, 2014, 4:31:03 AM
to Jérôme Petazzoni, Geoffrey Bachelet, docker-dev
It's an area I'm interested in too - here are a few 2cents :)

I dislike the ring buffer idea - standard tools like logrotate can easily
manage that for you without risking data loss.

'docker logs' is a bit impractical from what I've seen, as it's per-container
(last I looked?), so spinning up one of those per container and piping the
output isn't going to scale well.

The best approach I've come up with is to run containers via something like
systemd, which will pass everything on nicely to journal (which is very
flexible), and then collect everything up on the host for shipping with
something like Logstash / logstash-forwarder. I know someone's working on
that project to integrate systemd/journal fully, but for now it can forward
easily enough.


I don't have a good answer to the syslog thing, and I agree that's a gap
that should be filled.

If a process can log to a remote syslog (or a pipe for apache logs) there's
nothing to do. Local syslog is the hard bit.

Feeding a /dev/log into a container that forwards everything up to the host
seems like a good approach (I'm not familiar enough with the codebase yet
to implement that myself but when I am I'll see if that's possible).

It's debatable whether that should go directly to the host's syslog
(easy and clean, but docker loses visibility of that information)
or into the core docker logging gubbins
(everything in one place, but then you have to re-implement syslog
in docker itself, which is arguably bloat).

Deni Bertović

Jan 26, 2014, 5:11:54 PM
to docke...@googlegroups.com
+1 for PROPOSITION 4.

No need for docker to reinvent the wheel, offloading to proven solutions seems like the obvious way to go, imho.

-Deni

Gabriel Monroy

Feb 2, 2014, 4:34:38 PM
to docke...@googlegroups.com, Dan Buch
+1 for the log plugin API (proposition 4).  In my opinion, handling log streams is central to Docker's raison d'être as a container engine.  A plugin system seems like the best long-term approach to handling a wide range of logging scenarios:
  • For Deis, we want Docker to outsource logging to syslog with nothing ever written to disk
  • In-memory ring-buffer for simple `docker logs` seems valuable for smallish dev installations (boot2docker?)
  • Writing to the host's /var/log is an easy integration for systems already wired up to Splunk, for example
Pasting a #docker-dev conversation with our BDFL on the subject:

[20:38:09] <gabrtv> good thread on fixing logs: https://groups.google.com/forum/#!topic/docker-dev/3paGTWD6xyw
[20:38:24] <crosbymichael> I've read that a few times
[20:38:28] <crosbymichael> what do you think about it?
[20:39:38] <crosbymichael> me today "i'm getting ddos'd by myself, let me check container logs to see what is going on..... I'm ddos'n myself with all logs from the entire month"
[20:41:01] <gabrtv> i don't have a strong preference for a specific solution, however
[20:41:50] <gabrtv> i tend to think docker should be doing less in the case of logs, not more
[20:41:57] <crosbymichael> me too
[20:42:10] <shykes> yeah anything that logs to disk should be a plugin
[20:42:24] <shykes> sending to syslog should be a plugin
[20:42:25] <gabrtv> i also think dumping logs into /var/lib/docker (as json!) is a bad idea
[20:42:54] <crosbymichael> gabrtv: it's dumped as json so we can split stderr, stdout
[20:43:03] <shykes> the original design was: start simple with a dump to disk (not in json). expose a network api to atomically "get and flush"
[20:43:26] <shykes> so that it's the responsibility of an outside system to collect and truncate logs via the remote api
[20:43:38] <bkc_> sounds like a good idea :)
[20:44:16] <crosbymichael> it will be a gold rush once we have plugins, so many  to build
[20:44:32] <shykes> in that design the disk is used as a temporary buffer
[20:44:41] <gabrtv> shykes: agree. that's essentially the same contract any other service maintains w/ system administrators.
[20:45:03] <shykes> yeah the important decision was, "let's not rely on ssh access + regular logrotate"
[20:45:12] <shykes> "let's expose it on the remote api instead"
[20:45:35] <gabrtv> shykes: so are plugins the near-term answer in your view?
[20:45:39] <shykes> what's missing currently is 1) aggregate logging (as opposed to one-shot) and 2) that actual flush-and-truncate which we never implemented
[20:45:45] <shykes> yes
[20:46:03] <shykes> well mid-term, they are really coming together
[20:46:10] <shykes> maybe a few short-term fixes before that
[20:46:14] <gabrtv> look forward to pitching in on that..
[20:46:19] <shykes> but I don't see a major overhaul of logging before plugins
[20:46:53] <shykes> unless jerome proposes an actual patch from this email thread
[20:47:36] <shykes> (or anyone else with specific suggestions in that thread)
 

Jérôme Petazzoni

Feb 25, 2014, 7:29:16 PM2/25/14
to Gabriel Monroy, docker-dev, Dan Buch
Hi,

I'm un-burying this conversation, because I had the opportunity to talk about logging recently with various people and here are some ideas that came out of it:
- Docker may inject a /dev/log socket in each container
- Docker may aggregate those logging streams + stdio streams of each container
- if a container logs to plain files, it can log to a volume, then that volume can be shared with another container whose sole purpose is to "tail" those files to syslog or stdout
- then, this "firehose" of logs can be sent to multiple places: local file (/fifo/socket), stdin of another container, journald
- Docker doesn't do any kind of buffering whatsoever; if one of the outputs of the firehose is clogged, it drops messages, and when it unblocks, it writes some information to tell that X messages were dropped (or at least, that some messages were dropped)

By default, the firehose could be sent to journald (when available), local file (with logrotate rules setup by distro scripts), stdout (otherwise).
Then, there could be different ways to plumb logs to a new destination. Fantasy examples:
- docker run -logger pipestash # starts the pipestash image and send it all the logs!
- docker log add /var/log/docker.log # log to this file
- docker log add fd:1 # log to stdout
- docker log clear # don't log anymore anywhere
- docker log add 4cd79fab0529e # send logs to stdin of this container

How does that sound?
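The drop-and-count behavior could be sketched roughly like this (all names are invented for illustration; each firehose output gets a tiny bounded queue rather than a real unbounded buffer):

```python
import queue

class FirehoseOutput:
    """One output of the log firehose: a small bounded queue that
    drops messages when the consumer is clogged and reports how many
    were dropped once it unblocks. (Invented names, not Docker code.)"""

    def __init__(self, maxsize=1024):
        self.q = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def write(self, msg):
        try:
            self.q.put_nowait(msg)      # never block the producer
        except queue.Full:
            self.dropped += 1           # clogged: drop, but count

    def read(self):
        if self.dropped:                # unblocked: report the loss first
            n, self.dropped = self.dropped, 0
            return "<%d messages dropped>" % n
        return self.q.get_nowait()
```

The key property is that `write` can never block the container being logged, whatever the consumer does.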



--
You received this message because you are subscribed to the Google Groups "docker-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to docker-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alexander Larsson

Mar 3, 2014, 8:17:26 AM3/3/14
to docke...@googlegroups.com, Gabriel Monroy, Dan Buch


On Wednesday, 26 February 2014 01:29:16 UTC+1, Jérôme Petazzoni wrote:
Hi,

I'm un-burying this conversation, because I had the opportunity to talk about logging recently with various people and here are some ideas that came out of it:
- Docker may inject a /dev/log socket in each container
- Docker may aggregate those logging streams + stdio streams of each container
- if a container logs to plain files, it can log to a volume, then that volume can be shared with another container whose sole purpose is to "tail" those files to syslog or stdout
- then, this "firehose" of logs can be sent to multiple places: local file (/fifo/socket), stdin of another container, journald
- Docker doesn't do any kind of buffering whatsoever; if one of the outputs of the firehose is clogged, it drops messages, and when it unblocks, it writes some information to tell that X messages were dropped (or at least, that some messages were dropped)

By default, the firehose could be sent to journald (when available), local file (with logrotate rules setup by distro scripts), stdout (otherwise).

This sounds about right to me. With one detail: it's important that it's possible for the /dev/log in the container to go directly to journald (rather than being pumped into the journal via docker), so that the journal gets the correct peer information about the logging process.

Johannes Ziemke

Mar 3, 2014, 11:20:31 AM3/3/14
to Jérôme Petazzoni, Gabriel Monroy, docker-dev, Dan Buch
Hi Jerome,

On Wed, Feb 26, 2014 at 1:29 AM, Jérôme Petazzoni <jerome.p...@docker.com> wrote: 
- Docker may aggregate those logging streams + stdio streams of each container

IMO it should be possible to retain the source of a log line. Not sure what the aggregation would look like, though.
 
- if a container logs to plain files, it can log to a volume, then that volume can be shared with another container whose sole purpose is to "tail" those files to syslog or stdout

But that's nothing Docker needs to support, right? IMO Docker should provide only one way to log but make it possible to use a "filter" (think svlogd) which converts the input to whatever Docker supports.
 
 
- Docker doesn't do any kind of buffering whatsoever; if one of the outputs of the firehose is clogged, it drops messages, and when it will unblock, it will write some information to tell that X messages were dropped (or at least, that some messages were dropped)

I'm not so sure about that. In general I agree that we can't block everything if the firehose is clogged, but there are scenarios where losing a log line can be a severe issue. Think of security/auditing logs, or statistics that will be generated from the logs. I think what you proposed is a good default, but we might want to make this configurable.

How does that sound?

Beside that +1

Jérôme Petazzoni

Mar 3, 2014, 12:57:35 PM3/3/14
to Alexander Larsson, docker-dev, Gabriel Monroy, Dan Buch
Good point. I forgot to mention it in my recap, but of course Docker should annotate each log entry with its origin (source: syslog/stdout/stderr, container, and maybe PID and others).

I don't know if we want a special mode where everything gets bridged directly to journald; but why not.

Jérôme Petazzoni

Mar 3, 2014, 1:02:51 PM3/3/14
to Johannes Ziemke, Gabriel Monroy, docker-dev, Dan Buch
On Mon, Mar 3, 2014 at 8:20 AM, Johannes Ziemke <johanne...@docker.com> wrote:
Hi Jerome,

On Wed, Feb 26, 2014 at 1:29 AM, Jérôme Petazzoni <jerome.p...@docker.com> wrote: 
- Docker may aggregate those logging streams + stdio streams of each container

IMO it should be possible to retain the source of a log line. Not sure what the aggregation would look like, though.

Right. I'd love to hear from logging format gurus here.
Should the firehose be JSON? BSON? protobuf? messagepack? Is there a better, standardized log format that is commonly recognized and used by other logging software?

 
 
- if a container logs to plain files, it can log to a volume, then that volume can be shared with another container whose sole purpose is to "tail" those files to syslog or stdout

But that's nothing Docker needs to support, right? IMO Docker should provide only one way to log but make it possible to use a "filter" (think svlogd) which converts the input to whatever Docker supports.

I think we agree :-)
Docker will "capture" stdout, stderr, and syslog.
If a container logs to regular files, then it's the user's responsibility to arrange for those files to be on a volume, for that volume to be shared with another container, and for that container to ship those logs. Does that make more sense?
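For illustration, the "tail those files from a shared volume" part boils down to a follower loop like this (a sketch with invented names; a real shipper would also handle rotation, multiple files, and inotify instead of polling):

```python
import os

def drain(path, offset, sink):
    """Forward complete lines appended to `path` since byte `offset`
    to `sink`, and return the new offset. Polling this in a loop is a
    crude `tail -f`, enough to ship logs out of a shared volume."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1          # never forward a partial line
    for line in data[:end].split(b"\n")[:-1]:
        sink(line.decode("utf-8", "replace"))
    return offset + end
```

The byte offset makes the follower restartable: persist it, and a log-shipping container can resume after a crash without re-reading the whole file.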
 
 
- Docker doesn't do any kind of buffering whatsoever; if one of the outputs of the firehose is clogged, it drops messages, and when it will unblock, it will write some information to tell that X messages were dropped (or at least, that some messages were dropped)

I'm not so sure about that. In general I agree that we can't block everything if the firehose is clogged, but there are scenarios where losing a log line can be a severe issue. Think of security/auditing logs, or statistics that will be generated from the logs. I think what you proposed is a good default, but we might want to make this configurable.

OK, so, it looks like we have (at least) two options:
1) Make this configurable (between drop/nodrop), but "nodrop" means "buffer": should the buffer size be bounded or unbounded? What happens when the buffer is full? Should the inputs block? What if the logging container itself is blocked?
2) Decide that "if you want lossless logging, you should set it up yourself", e.g. either ship security logs directly in a lossless manner, or send the logging firehose to a file, then share that file with a container that takes care of the buffering...?

 

Alexander Larsson

Mar 3, 2014, 2:34:14 PM3/3/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch

Well, you can supply all kinds of metadata to your log lines with the journal. But there are some specific fields that are "trusted" in the journal (the ones that start with an underscore). These are fields that the journal gets itself from a trusted source (i.e. the kernel), such as pid, uid, etc., and docker can never supply these fields itself (they wouldn't be trusted if the logging client could put whatever it wants in them).

 

Jérôme Petazzoni

Mar 3, 2014, 3:00:06 PM3/3/14
to Alexander Larsson, docker-dev, Gabriel Monroy, Dan Buch
Hmm, sorry, I think I'm missing a key point here -- why can't Docker (the daemon, not a container) generate those fields, and filter out those provided by the containers?

Alexander Larsson

Mar 4, 2014, 7:31:23 AM3/4/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
Here is the manpage for the journal fields:

http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=SYSTEMD.JOURNAL-FIELDS&sect=7

In particular, this part:

TRUSTED JOURNAL FIELDS
       Fields prefixed with an underscore are trusted fields, i.e. fields that
       are implicitly added by the journal and cannot be altered by client
       code.

       _PID=, _UID=, _GID=
           The process, user and group ID of the process the journal entry
           originates from formatted as decimal string.
...

These fields are added to each log line by the journal daemon itself from the information it gets from the kernel.

So, for example, if a container uses syslog to report a line like "error: foo" the journal then uses a kernel call (like e.g. SO_PEERCRED) to check the remote side UID and adds a _UID field to the log line in the journal.

Now, it is true that if the client logged to the docker daemon, it could similarly call SO_PEERCRED and get the remote uid. It could even pass it on as an extra field when storing it in the journal. However, it cannot call this field _UID, as this is a trusted field name that clients (and the docker daemon is a journal client) can't write. Instead, the journal will itself fill in the _UID field of the log line with the uid that the docker daemon is running under.
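The SO_PEERCRED lookup described above is a small kernel query on Linux; a sketch (Linux-only, function name invented, struct layout is pid/uid/gid as three ints):

```python
import socket, struct

def peer_creds(conn):
    """Return (pid, uid, gid) of the process at the other end of a
    connected AF_UNIX socket, as reported by the kernel via
    SO_PEERCRED (Linux only). This is the data journald trusts for
    its _PID/_UID/_GID fields -- the client cannot forge it."""
    data = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                           struct.calcsize("3i"))
    return struct.unpack("3i", data)
```

Because the values come from the kernel rather than from the message payload, a daemon reading /dev/log can attach them to each entry without trusting the client.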

Jérôme Petazzoni

Mar 4, 2014, 3:11:09 PM3/4/14
to Alexander Larsson, docker-dev, Gabriel Monroy, Dan Buch
Ah, thanks for those details, very helpful!

Now, I wonder: is there any way to provide the journal with trusted entries?
By talking to some DBUS interface, maybe?
Or does journal "trust no one"?



Kiyoto Tamura

Mar 4, 2014, 7:10:07 PM3/4/14
to docke...@googlegroups.com, Johannes Ziemke, Gabriel Monroy, Dan Buch
>Should the firehose be JSON? BSON? protobuf? messagepack? Is there a better, standardized log format that is commonly recognized and used by other logging software?

I vote for JSON. It achieves a decent balance between human and machine readability. Another option is syslog format since most log collectors out there support it (Rsyslogd, Logstash, Fluentd, etc.)

James Turnbull

Mar 4, 2014, 7:12:18 PM3/4/14
to docker-dev
Kiyoto Tamura wrote:
>>Should the firehose be JSON? BSON? protobuf? messagepack? Is there a
> better, standardized log format that is commonly recognized and used by
> other logging software?
>
> I vote for JSON. It achieves a decent balance between human and machine
> readability. Another option is syslog format since most log collectors
> out there support it (Rsyslogd, Logstash, Fluentd, etc.)

I'd like to see JSON. Preferably I'd like to see a pluggable interface
to which I can add my own logging plugins.

Cheers

James




--
* The Docker Book (http://dockerbook.com)
* The LogStash Book (http://logstashbook.com)
* Pro Puppet (http://tinyurl.com/ppuppet2 )
* Pro Linux System Administration (http://tinyurl.com/linuxadmin)
* Pro Nagios 2.0 (http://tinyurl.com/pronagios)
* Hardening Linux (http://tinyurl.com/hardeninglinux)

Clayton Coleman

Mar 5, 2014, 12:21:30 AM3/5/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
It's not possible as far as I'm aware - the only trusted entries are those set on the receiving side of the various sd_journal_* functions.

Clayton Coleman

Mar 5, 2014, 12:22:11 AM3/5/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
Ignore the link, just realized that was in Alex's previous post.

Alexander Larsson

Mar 5, 2014, 2:18:38 AM3/5/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
"Trusted" generally means that the kernel told us this. How trusted would, say, an SELinux label be if any client could make a D-Bus call saying "my SELinux label is FOO"? Not very...

So, no, there is no way to do this, which is why we want a way to have the *real* syslog node in /dev/log, so that the journal can get the actual data about the logger from the kernel.

Ian Ragsdale

Mar 28, 2014, 2:04:52 AM3/28/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
Totally agree. My 2 cents, as a person getting started with docker: adding the ability to pass any socket in as the container's /dev/log entry would be by far the most flexible option. That provides a super portable way for any container to pass on its logs, and tons of server software already knows how to write to it. That seems like the easiest way to make sure logs quickly get out of the container and somewhere useful (the local system logs as a lowest common denominator). There are already tons of log shipping systems out there; no need to reinvent the wheel.

In addition, when you combine that with the ability to mount a directory from a container, you've now got a good way to containerize your log shipping system. Just have it create a socket in the exported directory, and then you can pass in that socket to all your other containers. One simple change, and you've got a really flexible system that works well with the docker ecosystem, and you don't have to spend a ton of time writing log shipping / parsing / formatting code. Instead, you'll get a nice ecosystem of docker logging containers that people can use.

- Ian

Brad Murray

Mar 30, 2014, 5:57:50 PM3/30/14
to docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
I would vote for json, but rather than defining a new format, is it possible to use something compatible with http://logstash.net/ out of the box? It seems to be one of the leading 'structured logs' variants.

Ian Ragsdale

Mar 30, 2014, 7:35:56 PM3/30/14
to Brad Murray, docke...@googlegroups.com, Alexander Larsson, Gabriel Monroy, Dan Buch
I spent a good bit of yesterday playing with various options, and I'm starting to get the feeling that there really is no "one size fits all" option. People just have too many different log setups these days. So, here's a set of steps that I think would provide maximum flexibility to work with anybody's system, with hopefully not too much extra work within docker.

1) Allow each container to specify a path (inside the container) that contains that container's logs. If that path is specified for a container, it will be volume mapped to a per-container directory within a special docker log directory, which is configurable per docker instance. This allows sysadmins to ensure that all docker container logs are stored in a single place, so they can be watched by whatever means they like, given a special filesystem or quota, etc.

2) Have docker create a per-container datagram socket and map it to /dev/log within each container by default. Docker should then listen for log entries on this socket, tag each entry (so that they may be distinguished by container), and make sure they get logged within the host. Currently people running single-process containers could be missing a lot of log data without this.

3) Consolidate stdout & stderr from each container and any log entries from the per-container /dev/log socket into a single stream. Let the docker user decide on a per-container basis whether that stream should be written to disk in the same log directory as the logs from step #1, or whether the individual entries should be written to the host's syslog system.

4) All log entries from #2 and #3 are in JSON format like now.

I think most people are either going to be using mostly a file-based system (with a daemon that tails files on disk and does something useful with the output) or a syslog system (rsyslogd or syslog-ng). This makes it pretty easy to make sure all the output from docker goes either to a log directory or to syslog, so that people can easily integrate into whatever system they like. I don't think it's particularly important what format it's in, all the log consolidation systems I've seen have some way to translate from one format to their preferred format.

For super bonus extra credit, build in the ability to tail any container log files, tag the entries with container name, image name, and filename, and pipe them into the syslog system as well. That gives docker users a single place to catch all docker logs and ship them into whatever system they want.
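Step 2 of the proposal above, the per-container /dev/log socket, could look roughly like this on the daemon side (a sketch with invented names; real syslog datagrams carry a <priority> prefix, which is parsed very naively here):

```python
import json, os, socket

def make_dev_log(path):
    """Bind the datagram socket that docker would map to /dev/log
    inside a container (hypothetical sketch)."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock

def read_entry(sock, container_id):
    """Receive one syslog datagram, strip the <priority> prefix, and
    tag the entry with its container of origin as a JSON log line."""
    msg = sock.recv(65536).decode("utf-8", "replace")
    pri = None
    if msg.startswith("<") and ">" in msg:
        pri, msg = msg[1:].split(">", 1)
    return json.dumps({"log": msg, "stream": "syslog",
                       "priority": pri, "container": container_id})
```

Tagging at receive time is what keeps the streams distinguishable once they are consolidated into a single firehose.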

- Ian

Michael Neale

Mar 31, 2014, 3:26:51 AM3/31/14
to docke...@googlegroups.com, Alexander Larsson, Dan Buch, geoffrey...@gmail.com
On Wednesday, December 4, 2013 6:33:21 PM UTC+11, Alexander Larsson wrote:

It's *exactly* this kind of feature creep I'm worried about. Docker is a *piece* of the Linux/Unix ecosystem, not a replacement for it. All the other logging systems have features like this, plus a boatload more, including things like remote logging, rate limiting, log rotation, log indexing, secure metadata for log rows, etc.

We can either make sure we integrate well with these systems, or we'll be on an eternal mission to replicate all those features (which were added because people need them). And in practice, I think everyone who runs docker in production already has an existing production logging setup, so they won't use the docker log command anyway.

This was my reaction too. Docker mostly happily runs under supervisors like systemd that work nicely with the app-container-centric view of the world, integrating with whatever logging systems (and there are a lot) already exist.

Solomon Hykes

Mar 31, 2014, 3:53:54 AM3/31/14
to Michael Neale, docker-dev, Alexander Larsson, Dan Buch, geoffrey...@gmail.com
Hi everyone, thanks for sharing your (very diverse!) opinions on this thread. I thought it would be a good time for an official maintainer to weigh in. Just a few quick points:

1) We are actively working on a solution to logging in Docker. My opinion is that the current logging is not adequate for production use of Docker. Although we're not participating actively in this thread, I'm keeping an eye on it to make sure the final design will address everybody's concerns. So far I'm pretty confident it will.

2) We are not going to reinvent the wheel of logging. The goal is for Docker to integrate cleanly into existing logging systems. There will be reasonable builtins to integrate with major logging systems out there, and a clean API to add missing integrations. My hope is that over time the community produces and maintains nicely composable plugins for any logging scenario imaginable.

3) The current "log to a json file forever" behavior will become optional - just a plugin among others. It's a nice default for development, but as many have pointed out, it is not appropriate for production use.

4) We are NOT going to standardize on one particular logging message format and force all applications to use it, because there is no one true format out there. Some applications use syslog, some print to stdout/stderr, some write to the systemd journal, some dump json or a combination into a custom location on the filesystem. Docker should have a way to get logs out of all of them, a) without requiring applications to change how they log, and b) without requiring every docker-based infrastructure to support a matrix of multiple logging formats. As usual, it's all about separation of concerns.

Hopefully we will have something to show soon. In the meantime feel free to come by #docker-dev on IRC if you want to help out.


Thanks again for taking the time to write all this, we will make sure the 1.0 logging infrastructure is up to the expectations.




Michael Neale

Mar 31, 2014, 5:15:26 AM3/31/14
to docke...@googlegroups.com, Michael Neale, Alexander Larsson, Dan Buch, geoffrey...@gmail.com

On Monday, March 31, 2014 6:53:54 PM UTC+11, Solomon Hykes wrote:
Hi everyone, thanks for sharing your (very diverse!) opinions on this thread. I thought it would be a good time for an official maintainer to weigh in. Just a few quick points:



I think in one swoop you calmed everyone's concerns. Sounds great.

Robbie Vanbrabant

Jun 10, 2014, 8:05:24 AM6/10/14
to docke...@googlegroups.com, michae...@gmail.com, alexande...@gmail.com, d.b...@modcloth.com, geoffrey...@gmail.com
Now that docker 1.0 is available, is there any news about this? Is it now possible to prevent container logs from growing boundlessly?

Matthias Johnson

Jun 23, 2014, 12:26:15 PM6/23/14
to docke...@googlegroups.com
I've been working on aggregating logs from containers, and have currently chosen to grab the container logs written on the host under /var/lib/docker/containers/*.log

This is nice since I get JSON that I can then consume with logstash via the json_lines plugin.

Another nice benefit is that I'm able to purge/rotate those logs at the host level, which addresses a concern over boundless growth noted by others.

One downside is that all the logs look "the same". By that I mean that the info in the JSON is very generic and identifies only a few fields, such as the stream, message, and time. The time is very nice, since it provides a higher-resolution timestamp than some of the things I run in the container.

I don't know if there are potential issues with pulling those raw logs and would appreciate any feedback on that.

One item that I think would be nice is to add some additional data to the JSON stream. For example the hostname of the container, and perhaps some other configurable fields. I was thinking that this could come from ENV variables based on a prefix. For example, all environment settings beginning with DOCKER_LOG could result in additional fields in the JSON output.

For example:

DOCKER_LOG_TYPE='test app'
DOCKER_LOG_VERSION='0.77'

would end up adding the additional JSON to the output: { "type": "test app", "version": "0.77", "log": "some text" .... }

I think that would go a long way at making the raw logs easily digestible by JSON tools and provide good additional details.
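The DOCKER_LOG_ prefix idea would amount to something like this (a sketch; the field-naming convention is illustrative):

```python
import json

def enrich(entry, env, prefix="DOCKER_LOG_"):
    """Merge DOCKER_LOG_* environment variables into a log entry,
    lower-casing the suffix to get the JSON field name (illustrative)."""
    extra = {k[len(prefix):].lower(): v
             for k, v in env.items() if k.startswith(prefix)}
    extra.update(entry)                 # the entry's own fields win
    return json.dumps(extra)
```

Given DOCKER_LOG_TYPE and DOCKER_LOG_VERSION in the container's environment, each JSON line would gain "type" and "version" fields alongside the usual "log"/"stream"/"time".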

@matthias

On Monday, November 25, 2013 6:12:38 PM UTC-7, Jérôme Petazzoni wrote:
Hey list,

This is an attempt to gather ideas about improving the handling of container logs.
This is not about logs of Docker itself (see https://github.com/dotcloud/docker/issues/936 for that); we're talking about the logs generated by the containers here.

Current situation:
- nothing special is done regarding syslog (if a process tries to use standard syslog calls, that will go to /dev/null since /dev/log won't exist, unless you run syslog in the container).
- nothing special is done regarding regular log files (e.g. if you run something that writes log to /var/log/blah/blah.log, it will just stay here).
- stdout+stderr of processes running in containers are captured by Docker, and stored in a JSON format looking like this:
{"log":"Creating config file /etc/mercurial/hgrc.d/hgext.rc with new version\n","stream":"stderr","time":"2013-11-01T13:51:19.763621802-07:00"}
- those log files are stored to disk, and grow boundlessly
- logs can be consumed entirely (with "docker logs") or kind of streamed (with "docker attach"), but it's not possible to stream from a given point, or consume only parts of the logs

Ideally, we want to capture more log sources (syslog and regular files come to mind), and better ways to consume logs.

Specifically, it would be nice if we could...
1) handle log entries sent to syslog, since many unix daemons use that, and it allows to carry some extra info (facility and priority)
2) handle regular logfiles, since some programs will use that (and sometimes different log files will have different meanings, e.g. access.log and error.log)
3) store log entries in a bounded ring buffer, to make sure that logs will never fill up disk space by default
4) stream log entries, and be able to resume the stream (if the stream breaks) without losing entries

I would like to know:
- if you think that those features are needed indeed (or if some should be scrapped)
- if you think that additional features are needed for container logging
- if you are willing to work on implementing that kind of stuff

I have some ideas about how to implement this, but before, I'd like to get feedback on the general idea.

Thank you!




Jérôme Petazzoni

Jun 23, 2014, 2:37:51 PM6/23/14
to Matthias Johnson, docker-dev
Hi Matthias,

How do you deal with log rotation in that scenario?



Matthias Johnson

Jun 25, 2014, 10:09:37 AM6/25/14
to docke...@googlegroups.com, open...@gmail.com
Right now I'm using the follow logrotate config:

/var/lib/docker/containers/*/*.log {
  rotate 7
  daily
  compress
  delaycompress
  copytruncate
}


I suspect I could be more clever with a restart of a container as a postrotate step, but it would likely be more effective if docker had a way to be HUPed to close and re-open the files.

@matthias

Kevin Littlejohn

Jul 14, 2014, 8:09:29 AM7/14/14
to docke...@googlegroups.com
Heya,

Does anyone out there have an advance on this?  We're currently trying to get docker containers more widely used, and one of our challenges is logging - we aggregate to splunk currently, and have a splunk-forwarder container now that shares a log directory with each container, but that has some drawbacks and requires some knowledge of logging by the app developers.

Our ideal at this point would be a combination of: being able to collect all "docker logs" output without docker recording it to disk (that solves the "omg docker's log dir is huge" problem for long-running containers), and some other, unspecified mechanism to manage multiple logfiles inside a container without logrotate requirements (my _ideal_ here would have been a FUSE filesystem mounted from another container that takes file writes and turns them into log output, but I couldn't quite get that working).

I'm wondering if a patch to dockerd simply allowing logs to be switched off would be likely to be accepted, so we can at least deal with stdout logging without risking disk space exhaustion?

KJL

Matthias Johnson

Jul 14, 2014, 10:14:34 AM7/14/14
to Kevin Littlejohn, docke...@googlegroups.com
For what it's worth, the approach I've taken appears to work well. I did have to augment the logrotate config to restart the logstash-forwarder container. Since I added that as a postrotate step, my logs have been coming in consistently, and the logs on disk are kept from exploding.

Here is the revised logrotate config:

/var/lib/docker/containers/*/*.log {
  rotate 7
  daily
  compress
  delaycompress
  copytruncate
  sharedscripts
  postrotate
    /usr/bin/docker restart forwarder
  endscript
}

@matthias





Kevin Littlejohn

Jul 14, 2014, 4:15:04 PM7/14/14
to Matthias Johnson, docke...@googlegroups.com
Correct me if I'm wrong, but that approach relies on containerised daemons opening and seeking their logfiles (so they don't get confused if the files are rotated out from under them without a signal), and not generating too much output to stdout (so their container log doesn't grow too much)?

I think both of those are still an issue for us, because we're dealing with containers whose contents we have little control over.

Matthias Johnson

Jul 14, 2014, 5:14:27 PM7/14/14
to Kevin Littlejohn, docke...@googlegroups.com
Not quite. In my case the containers themselves use the "docker way" of logging to stdout. I've been able to get most of my services to send their logs to stdout. When that happens docker captures the logs and stores them as JSON on the host file system under /var/lib/docker/containers/*/*.log.

The tools running inside a container are then fully oblivious to any log rotation that takes place at the host level. In other words, they don't need to be HUPed or restarted.

The logstash-forwarder, which I also run in a container (though it doesn't have to be), can then read those logs, which I expose to that container as a volume.

\@matthias


Balazs Varga

Aug 11, 2014, 6:50:45 AM8/11/14
to docke...@googlegroups.com, ke...@littlejohn.id.au, Solomon Hykes
Hello,

Bumping this thread regarding the ongoing work on the logging driver proposal:

@Solomon Hykes is this in sync with your previous / ongoing work on logging?

--
Balazs Varga

Solomon Hykes

Aug 11, 2014, 9:05:53 PM8/11/14
to Balazs Varga, docker-dev, ke...@littlejohn.id.au
Yes, 7195 is where the action is for improving logging (or at least a first step in that direction). I think it's in good shape, and we can probably stop bikeshedding in the coming days and let Michael work on an implementation (and whoever wants to help him with that).

Prasanna Gautam

Nov 4, 2014, 10:26:21 PM11/4/14
to docke...@googlegroups.com, balo.s...@gmail.com, ke...@littlejohn.id.au
I'd like to help, and to compare this with other solutions for any performance concerns, if you can point me to the current state of the implementation.