Warden drops packets unexpectedly


lv...@huawei.com

Dec 4, 2013, 3:50:35 AM
to vcap...@cloudfoundry.org
I am interested in the Go version of the router in CF v141 and want to collect some performance data for it. When I tried to send traffic at 3000 connections per second, requests began to fail quickly, within 2 minutes.
The app pushed to Warden was a simple JSP server with 2 GB of memory assigned. I found the errors below. My guess is that the problem is in Warden, because the "ListenDrops" counter in the Warden container has increased sharply. I think this is caused by a resource limit or by network parameters, so I tried setting [somaxconn] to 20480 and [tcp_max_syn_backlog] to 819200 for Warden, but that did not help.
Maybe my reasoning is wrong. What is the problem?
 
Gorouter log:
{"timestamp":1385832421.657804251,"process_id":25048,"source":"router.proxy.request-handler","log_level":"warn","message":"proxy.endpoint.failed","data":{"Error":"read tcp 60.60.60.244:61003: connection reset by peer","Host":"cf2bak.mutilcloud.com","RemoteAddr":"10.67.142.47:17747","RouteEndpoint":{"ApplicationId":"cd7eb745-1f95-489a-9483-1132a26022b6","Host":"60.60.60.244","Port":61003,"Tags":{"component":"dea-1"}},"X-Forwarded-For":null,"X-Forwarded-Proto":null}}
{"timestamp":1385832421.657999754,"process_id":25048,"source":"router.proxy.request-handler","log_level":"warn","message":"502 Bad Gateway: Registered endpoint failed to handle the request.","data":{"Error":"read tcp 60.60.60.244:61003: connection reset by peer","Host":"cf2bak.mutilcloud.com","RemoteAddr":"10.67.142.47:17747","RouteEndpoint":{"ApplicationId":"cd7eb745-1f95-489a-9483-1132a26022b6","Host":"60.60.60.244","Port":61003,"Tags":{"component":"dea-1"}},"X-Forwarded-For":null,"X-Forwarded-Proto":null}}

DEA log:
Dec  4 15:08:00 localhost kernel: [623337.939595] TCP: Possible SYN flooding on port 61011. Sending cookies.
Dec  4 15:08:00 localhost kernel: [623337.939607] TCP: Possible SYN flooding on port 61011. Sending cookies.
Dec  4 15:08:05 localhost kernel: [623342.954539] net_ratelimit: 1726 callbacks suppressed

Warden:
root@17ckfek1u0c:~# cat /proc/net/netstat | awk '/TcpExt/ { print $21,$22 }'
ListenOverflows ListenDrops
35427 35427
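
The awk one-liner above selects columns 21 and 22, which works only while those positions happen to hold ListenOverflows and ListenDrops; the column layout of /proc/net/netstat can differ between kernel versions. A minimal Python sketch that looks the counters up by name instead, assuming the usual pairing of a header line followed by a value line (the function names are illustrative and not from this thread):

```python
from pathlib import Path


def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # Lines come in pairs: counter names first, then the matching values.
    for header, values in zip(rows[0::2], rows[1::2]):
        prefix = header[0].rstrip(":")
        stats[prefix] = dict(zip(header[1:], map(int, values[1:])))
    return stats


def listen_counters(text):
    """Return the two accept-queue counters from the TcpExt group by name."""
    tcp_ext = parse_netstat(text).get("TcpExt", {})
    return {name: tcp_ext.get(name) for name in ("ListenOverflows", "ListenDrops")}


if __name__ == "__main__":
    path = Path("/proc/net/netstat")  # Linux only
    if path.exists():
        print(listen_counters(path.read_text()))
```

Running it twice a few seconds apart and comparing the values shows whether the counters are still growing under load, which is more telling than a single absolute reading.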

James Bayer

Dec 4, 2013, 10:03:38 AM
to vcap...@cloudfoundry.org
thanks for this report. i've added a chore to the icebox for runtime to investigate [1]. mark kropf, the pm for this team, will consider prioritizing it against their other priorities. are you trying to find the breaking point for cloud foundry in terms of max performance (there will always be some limiting resource), or are you trying to meet a known performance goal?






--
Thank you,

James Bayer

lv...@huawei.com

Dec 6, 2013, 4:52:03 AM
to vcap...@cloudfoundry.org
Hi James,
Sorry for the late reply. I have just started to study Cloud Foundry and am doing some analysis on PaaS. I read your blog post "High performance dynamic routing in Enterprise" [1], and I want to learn more about the performance of CF components such as the router, DEA, and Warden: how much each can achieve under given resource limits. Could you share some capacity information or data? Based on the results, I can plan the deployment of an internal CF PaaS.

Thank you,
Lory


On Wednesday, December 4, 2013 at 11:03:38 PM UTC+8, James Bayer wrote:

James Bayer

Dec 7, 2013, 2:13:08 AM
to vcap...@cloudfoundry.org
lory,

here is what alex suraci from the runtime team wrote back:

We are not using the bandwidth limit features but I believe we set limits on things like file descriptors (16k) and # of processes (512). With a barrage of requests they could be hitting the FD limit.
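
One way to check whether a process is close to the file descriptor limit mentioned above is to compare its open descriptors with its soft limit. A minimal sketch for Linux, assuming the standard /proc/<pid>/fd and /proc/<pid>/limits layout; the function names are hypothetical, and reading another process's entries normally requires the same user or root:

```python
import os


def parse_max_open_files(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max open files <soft> <hard> files
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid="self"):
    """Return (open descriptor count, soft limit) for a process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        return open_fds, parse_max_open_files(fh.read())


if __name__ == "__main__":
    if os.path.isdir("/proc/self/fd"):  # Linux only
        used, limit = fd_usage()
        print(f"open fds: {used}, soft limit: {limit}")
```

Passing the PID of the app server inside the container instead of "self" and sampling during the load test would show whether the count climbs toward the limit before the failures start.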

again, if you have a particular scaling load and use case in mind, we would like to understand it. we know there are going to be limits and breaking points to the system somewhere depending on how heavily it's stressed. if your test harness is public, please share the details with us.

we have load tests that we execute as part of our integration tests and expect basic web performance to work well. we also have users that have run big loads on cloud foundry in production. here is an example from last year: the payment system of comic relief, a UK charity, which cloud foundry handled well [1]. of course we can always improve and we welcome any specific feedback or contributions that help improve performance.

A year of planning for Red Nose Day 2013 resulted in a 6 week media campaign culminating in 7 hours of prime time television coverage on the 15th March. Their donations platform is required to process in the region of 600,000 transactions in this 7 hour period, handle in excess of 10,000 concurrent call centre operators, and handle peaks of 500 donations completing per second from web, mobile and call centers. 
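
To put the figures quoted above next to the 3000 connections per second from the original test, a short worked calculation follows. It uses only numbers stated in this thread; note that a connection and a completed donation are not the same unit, so the ratio is a rough orientation and not a benchmark comparison.

```python
def average_rate(total, hours):
    """Average events per second over a period given in hours."""
    return total / (hours * 3600)


# Figures quoted in the thread.
transactions = 600_000   # donations processed over the event
event_hours = 7
peak_per_second = 500    # peak donations completing per second
test_rate = 3000         # connections per second in the original load test

avg = average_rate(transactions, event_hours)
print(f"average: {avg:.1f}/s")
print(f"peak is {peak_per_second / avg:.0f}x the average")
print(f"test rate is {test_rate / peak_per_second:.0f}x the quoted peak")
```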
