Support HTTP style syslog drain URL for TCP load balance


Ivan

Nov 25, 2013, 8:19:09 PM
to vcap...@cloudfoundry.org

Hi,

I am thinking of creating a syslog application running in Cloud Foundry, which would accept the log records from loggregator, format the messages, and forward them to third-party systems. Right now, though, the CF router only supports HTTP-style requests and Connection/Upgrade-header-style TCP load balancing.

Could Cloud Foundry consider supporting an HTTP-style syslog drain URL? That is, loggregator could detect when an HTTP-style URL is used and add those special headers, so that the drain would not be a single point of failure.

Thanks.

James Bayer

Nov 26, 2013, 2:27:04 AM
to vcap...@cloudfoundry.org
the "gcf logs" command with tail support already supports websocket streaming to clients. for run.pivotal.io it's something like:
wss://loggregator.run.pivotal.io:4443/tail/?app=YOURAPPGUIDHERE
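
(for illustration only - a minimal Go sketch of consuming that endpoint with the Go websocket package (golang.org/x/net/websocket) might look like the following. the /tail/ path and the Authorization header come up in this thread; the app guid, the token and the framing details are placeholders/assumptions, and each binary frame carries a protobuf-encoded LogMessage that this sketch does not decode.)

package main

import (
    "fmt"
    "log"

    "golang.org/x/net/websocket"
)

func main() {
    origin := "http://localhost"
    url := "wss://loggregator.run.pivotal.io:4443/tail/?app=YOUR-APP-GUID"

    config, err := websocket.NewConfig(url, origin)
    if err != nil {
        log.Fatal(err)
    }
    // loggregator checks an Authorization header (see later in this thread);
    // the value below stands in for a valid OAuth token.
    config.Header.Add("Authorization", "bearer YOUR-OAUTH-TOKEN")

    ws, err := websocket.DialConfig(config)
    if err != nil {
        log.Fatal(err)
    }
    defer ws.Close()

    for {
        // each binary frame is a protobuf-encoded LogMessage; just count bytes here.
        var frame []byte
        if err := websocket.Message.Receive(ws, &frame); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("received %d bytes\n", len(frame))
    }
}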

heroku doesn't support http draining from what i can see.
loggly supports using their own restful api, but not a standard one: http://www.loggly.com/docs/basics-of-sending-data/

does the websocket approach work in your use case? 




--
Thank you,

James Bayer

Ivan

Nov 26, 2013, 4:16:05 AM
to vcap...@cloudfoundry.org

Thanks, James,

One reason is that, per the previous thread [1], Tammer stated: 'However, we highly discourage direct connections to the websocket stream (it's not considered a public API). Draining your logs to a syslog compliant endpoint is the right way to go.' So I am not sure whether Cloud Foundry has changed its strategy on this? IIUC, the websocket is not recommended?

Another reason is that I am hoping to build a logging service that pulls the log records from loggregator and forwards them to other third-party storage systems. With the websocket approach, it looks like I would need to know the user name and password of the user applications, since I can see the Authorization header check in the loggregator code. Do you have any suggestion for this? CF does not seem to support something like granting a service access to an application's resources.

[1] https://groups.google.com/a/cloudfoundry.org/forum/#!searchin/vcap-dev/gcf$20create-user-provided-service$20my-drain-service|sort:relevance/vcap-dev/lVLLvnmXG_g/jNWvENhOT7kJ

Ivan

Nov 27, 2013, 8:02:42 AM
to vcap...@cloudfoundry.org

Hi, one follow-up question: would CF consider a PR for supporting an HTTP-style syslog drain URL in loggregator? Thanks.

James Bayer

Nov 28, 2013, 12:20:56 AM
to vcap...@cloudfoundry.org
well tammer is correct that we have not documented the websocket api and consider it private. we built it for the CLI tail my logs use case. if there is a great use case to consider supporting it officially we'd like to know it.

however, for your use case we recommend syslog over tcp as a push out of CF to your logging system. you can always set up something like logstash as an intermediate component, which can then forward to the system of your choice.
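
(as a concrete toy example: before wiring up logstash itself, a throwaway TCP listener is enough to see what a syslog://host:port drain actually pushes out. a rough Go stand-in, assuming newline-terminated syslog lines - in practice you would point the drain at logstash's tcp or syslog input instead:)

package main

import (
    "bufio"
    "log"
    "net"
)

func main() {
    // toy stand-in for the intermediate receiver (logstash would normally play this role)
    ln, err := net.Listen("tcp", ":5000")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go func(c net.Conn) {
            defer c.Close()
            scanner := bufio.NewScanner(c)
            for scanner.Scan() {
                // each line should be one syslog-formatted message pushed by the drain
                log.Println(scanner.Text())
            }
        }(conn)
    }
}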

i would not recommend accepting a PR without understanding the use case better for an HTTP post (why syslog over TCP can't be used as in the RFC specs) and understanding which type of HTTP API format could be used that would be at all generic to the logging domain and not be specific to a vendor tool or customization.

Ivan

Nov 28, 2013, 1:07:18 AM
to vcap...@cloudfoundry.org

Hi,

Yeah, I would like to explain a bit. As you know, there are lots of analytics systems out there, and not all of them support the syslog protocol. I am thinking of having a logging adapter service that acts as an adapter between loggregator and a log analytics system. The logging adapter service would be deployed in CF as an application.

Currently, loggregator (strictly speaking, the dea-logging-agent) reads the syslog_drain_url configuration and sends the log messages to that URL. For the syslog_drain_url, it would be better to be able to use an http://domain_name style URL: if one of my logging node instances crashed, the log pushing would still work, because the CF router would route to another running instance. That is why I am wondering whether loggregator could support this kind of URL (adding the Connection and Upgrade headers so the router can do the load balancing).
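
To make the idea concrete, here is a rough Go sketch of the kind of drain application I have in mind, assuming loggregator connected through the router with the Connection/Upgrade headers and then streamed newline-terminated syslog lines over the upgraded connection. None of this exists today; the "syslog" upgrade token and the framing are purely my assumptions.

package main

import (
    "bufio"
    "log"
    "net/http"
    "os"
)

// Hypothetical drain app running on CF behind the router: answer the
// (assumed) Upgrade handshake, then read syslog lines off the raw
// connection and forward them to the third-party system.
func drain(w http.ResponseWriter, r *http.Request) {
    hj, ok := w.(http.Hijacker)
    if !ok {
        http.Error(w, "hijacking not supported", http.StatusInternalServerError)
        return
    }
    conn, rw, err := hj.Hijack()
    if err != nil {
        log.Println(err)
        return
    }
    defer conn.Close()

    // Complete the upgrade the router passed through; "syslog" as an
    // Upgrade token is made up for this sketch.
    rw.WriteString("HTTP/1.1 101 Switching Protocols\r\nConnection: Upgrade\r\nUpgrade: syslog\r\n\r\n")
    rw.Flush()

    scanner := bufio.NewScanner(rw)
    for scanner.Scan() {
        // Reformat / forward to the third-party system here.
        log.Printf("log line: %s", scanner.Text())
    }
}

func main() {
    http.HandleFunc("/drain", drain)
    log.Fatal(http.ListenAndServe(":"+os.Getenv("PORT"), nil))
}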

Thoughts? Thanks.

Doug McClure

Nov 28, 2013, 1:34:00 PM
to vcap...@cloudfoundry.org
Ivan - we should work on this internally. Our log analytics system will need something similar to consume loggregator's syslog drains. We're considering some sort of loggregator2logstash component in the CF area that would then send out to a centralized pool of logstash instances for mediation and routing to the ultimate endpoints. This is a simple starting point for us, as we support both syslog and logstash.

Please ping me internally and we can chat on ideas and requirements. I'm pretty familiar with your side of things. We might benefit from working on a common component here.

Doug

David Laing

Nov 28, 2013, 5:45:02 PM
to vcap-dev

My team has had some success with bundling lumberjack with our CF apps to ship logs/* to an external logstash endpoint.

(eg. https://github.com/cityindex/logsearch-flowdock-bot/blob/master/start_lumberjack_shipper.rb)

We've been thinking of pulling this out into a separate "wrapper" buildpack like the multi-buildpack.

Does this technique sound of interest?

James Bayer

Nov 29, 2013, 1:28:52 AM
to vcap...@cloudfoundry.org
david, one potential hangup for something like this is that the DEA/warden currently only monitors a single PID in the warden container. so if something happens to lumberjack, then you wouldn't necessarily have that issue surface because the system would think your app is running fine. are you concerned about that at all?

Ivan

Nov 29, 2013, 3:17:43 AM
to vcap...@cloudfoundry.org

It is good to know that other people are also thinking about how to use loggregator's syslog_drain_url ;-)

Hi, James, my feeling is that, as more and more people use the syslog_drain_url, users are likely to develop their own small tools for analyzing their own business logs, and they would definitely deploy those analytics applications in CF, too. Seen that way, could making loggregator support an HTTP-style syslog drain URL be a reasonable feature request?

Another quick question, about the websocket: as I mentioned in another thread, it seems that loggregator will only push application logs to syslog_drain_urls. With the websocket, I would assume users could also fetch app-related logs from other CF components, e.g. the access log from the gorouter. From that angle, would you consider making the websocket 'public'?

Thanks.

David Laing

Nov 29, 2013, 3:29:34 AM
to vcap...@cloudfoundry.org
james,

Architecturally I agree that running your log shipper inside your app container isn't the right way to go.  However, in use cases where syslog isn't feasible...

Anyway, as to how to run multiple processes in a single app: I've bumped into several scenarios where this is really helpful (ssh tunnel to an external service, background worker, log shipping, etc.). I've solved it in two ways.

1.  The .net-buildpack has a Procfile container [1], which runs multiple processes using forego [2]
2.  A "run in parallel" bash script [3]; set your --command to "start.sh proc1 proc2 proc3"

In both these cases warden is watching the pid of the "container" script, which is set to exit if any of its sub-processes fails.
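
(The scripts themselves aren't pasted here, but as a rough illustration of the same "exit when any child exits" idea - written in Go rather than bash, with placeholder command strings - something like this works:)

package main

import (
    "log"
    "os"
    "os/exec"
)

func main() {
    // e.g. ./parallel "bundle exec rackup" "./lumberjack -config lumberjack.conf"
    cmds := os.Args[1:]
    done := make(chan error, len(cmds))

    for _, c := range cmds {
        cmd := exec.Command("/bin/sh", "-c", c)
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        if err := cmd.Start(); err != nil {
            log.Fatalf("failed to start %q: %v", c, err)
        }
        go func(c string, cmd *exec.Cmd) {
            err := cmd.Wait()
            log.Printf("%q exited: %v", c, err)
            done <- err
        }(c, cmd)
    }

    // Exit as soon as any child exits, so the DEA/warden (which only watches
    // this wrapper's PID) notices and restarts the instance.
    <-done
    os.Exit(1)
}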

This technique has worked pretty well for me over the last few months.

Regards

d



--
David Laing
Open source @ City Index - github.com/cityindex
http://davidlaing.com
Twitter: @davidlaing

James Bayer

Dec 1, 2013, 2:11:55 AM
to vcap...@cloudfoundry.org
david, those are good techniques for running multiple processes in a buildpack, thanks for sharing.

ivan, i'm not following your suggestion. david is talking about using something like this: https://github.com/elasticsearch/logstash-forwarder
inside of his buildpack to send logs someplace else running logstash. lumberjack uses a custom protocol as far as i can tell. it sounds like doug m above is interested in this space. after you guys sync up, let us know what you come up with.


Thank you,

James Bayer

Doug McClure

Dec 2, 2013, 11:23:19 PM
to vcap...@cloudfoundry.org, da...@davidlaing.com
David,

I'm interested in learning & experimenting! I'm very much still trying to wrap my head around all the terminology, components, etc., so I'm definitely still a noob on the CF front.

I'm looking for the best way to plumb a given environment's loggregator feed(s) and get them all shipped outbound towards some smarter thing that gives me more flexibility for mediating the log stream (breaking it into unique flows by application/service component, routing to unique destinations, archiving, etc.). In my case, there could be hundreds of unique customer CF application/service environments within the various CF environments, so I'll need to be able to handle each unique environment's logs individually, securely, and reliably, without any impact on performance.

We use Logstash today as a mediation tool, so my first experiment would be to find the best way to ship logs out of each unique app/service environment in a way that allows for scale, performance, reliability, flexibility, security, etc. This might be logstash-forwarder, logstash, rsyslog, some hybrid thing, etc. Not sure yet - many open questions like these:

Are all possible application/service and platform logs for a given unique customer CF environment available via loggregator?
Are there differences in the types of logs available via loggregator versus a syslog drain (or are they the same thing)?
What scenarios would require having a log shipping agent (e.g. logstash-forwarder) actually within a client's CF app/service environment? (Why did you add logstash-forwarder to your buildpack?)
What metadata is available for each log source/log record emitted from a given loggregator/syslog drain output? Will they all be uniquely identifiable to a specific app/service/client/group/org, etc., or will this data need to be added to the log stream externally? (E.g. I need to be able to uniquely identify one type of log record from another for a given client app and act on them differently.)

What tests can we work on?

Doug

Ivan

Dec 3, 2013, 10:03:45 AM
to vcap...@cloudfoundry.org
I just found the link below; does it mean that Heroku also supports HTTP drains?
$ heroku drains:add https://logs.loggly.com/inputs/$$CUSTOMER_TOKEN$$/tag/$$TAG1,TAG2$$ --app $$HEROKU_APP_NAME$$
[1] http://www.loggly.com/g1-support/logging-from-heroku/
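
(If loggregator ever grew a comparable HTTP drain, the receiving side would essentially be a POST handler. A minimal Go sketch, assuming each request body is a batch of newline-separated syslog lines - the exact framing Heroku/Loggly use isn't specified in this thread:)

package main

import (
    "bufio"
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/drain", func(w http.ResponseWriter, r *http.Request) {
        defer r.Body.Close()
        scanner := bufio.NewScanner(r.Body)
        for scanner.Scan() {
            // Reformat / forward each log line to the third-party system here.
            log.Printf("drained: %s", scanner.Text())
        }
        w.WriteHeader(http.StatusNoContent)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}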





--
Ivan

James Bayer

Dec 3, 2013, 2:24:08 PM
to vcap...@cloudfoundry.org