Hey Jan!
You're correct about the multiple postings; we weren't sure they were going through.
We currently run this in production on Cloudant and were hoping to set up a backup using the new CouchDB 2.0. Replication itself works consistently.
The memory leak happens when we kick off indexing of a new view.
beam.smp is then killed by the kernel's OOM killer.
Checking /var/log/syslog shows:
Jan 31 18:32:44 couchdb7 kernel: [594086.565577] Out of memory: Kill process 23731 (beam.smp) score 961 or sacrifice child
Jan 31 18:32:44 couchdb7 kernel: [594086.565622] Killed process 23773 (memsup) total-vm:4228kB, anon-rss:12kB, file-rss:0kB
Jan 31 18:32:44 couchdb7 kernel: [594086.569327] Out of memory: Kill process 23731 (beam.smp) score 961 or sacrifice child
Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731 (beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB
Jan 31 18:32:56 couchdb7 monit[9113]: 'couchdb' process is not running
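The kernel reports the sizes in the OOM line in kB (units of 1024 bytes). As a rough illustration, here is a hypothetical Node.js helper, `parseKilledProcess`, that pulls the figures out of the `Killed process` line and converts them to GiB. It assumes the exact line format shown above; for beam.smp it works out to roughly 61.7 GiB resident (anon-rss) out of about 120.7 GiB of virtual memory.

```javascript
// Hypothetical helper: parse a kernel OOM "Killed process" line and convert
// its kB figures (1 kB = 1024 bytes in kernel logs) to GiB.
// Assumes the format shown in the syslog excerpt above.
function parseKilledProcess(line) {
  const m = line.match(
    /Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB/
  );
  if (!m) return null; // not an OOM "Killed process" line

  const kbToGiB = (kb) => kb / (1024 * 1024);
  return {
    pid: Number(m[1]),
    name: m[2],
    totalVmGiB: kbToGiB(Number(m[3])),
    anonRssGiB: kbToGiB(Number(m[4])),
    fileRssGiB: kbToGiB(Number(m[5])),
  };
}

const line =
  "Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731 (beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB";
const info = parseKilledProcess(line);

// Prints: beam.smp 61.7 120.7
console.log(info.name, info.anonRssGiB.toFixed(1), info.totalVmGiB.toFixed(1));
```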
The couchdb.log file at the time of the crash contains:
1981936-[debug] 2017-01-31T17:16:35.355774Z cou...@couchdb7.geopoll.com <0.9036.262> -------- OS Process #Port<0.63437> Input :: ["map_doc",{"_id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","_rev":"5-b90c6c87a0a48e647528a1b3c5bfe12b","MetaData":{"PollId":"147402","CarrierId":"25504","UserPollStateId":"3362564708"},"UserId":"1002449829201","CreateDate":"2015-11-23T06:42:40.0285675Z","LastModifiedDate":"2015-11-23T06:43:07.5474967Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CallbackUri":"http://de-geopoll-1:8645/billingcallback","CallbackSent":true,"Activities":[{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0297329Z","State":"PROCESSING"},{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0307329Z","State":"SUCCESS"}],"Currency":"US_Dollar_USD","ConsumerIdentifier":"250025308","ToBeBilledIdentifier":"255763398389","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":0.11,"BillProcessingState":"SUCCESS","BillingProvider":"TRANSFERTO","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CreatedDate":"2015-11-23T06:42:40.0285675Z","ModifiedDate":"2015-11-23T06:43:07.5474967Z","Type":"Bill"}]
1981937-[debug] 2017-01-31T17:16:35.355856Z cou...@couchdb7.geopoll.com <0.11910.262> -------- OS Process #Port<0.63508> Output :: [[[["GeoPoll","8921801"],null]],[[["77802","PRETUPS"],null]],[[["77802","PRETUPS","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","SUCCESS","2014","03","05"],null],[["77802","ALL","SUCCESS","2014","03","05"],null],[["77802","PRETUPS","ALL","2014","03","05"],null],[["ALL","ALL","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","ALL","2014","03","05"],null],[["77802","ALL","ALL","2014","03","05"],null],[["ALL","ALL","ALL","2014","03","05"],null]],[[["77802","2014","3","05"],null]],[["254788760292",null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS"],null]],[["254788760292",null]],[["1000374925501",null]],[[[2014,3,5,"PRETUPS","SUCCESS"],null]]]
1981938-[debug] 2017-01-31T17:16:35.356012Z cou...@couchdb7.geopoll.com <0.9036.262> -------- OS Process #Port<0.63437> Output :: [[[["147402","TRANSFERTO","SUCCESS"],null]],[[["TRANSFERTO","SUCCESS","2015-11-23T06:43:07.5474967Z"],null]],[[["TRANSFERTO","SUCCESS","0001-01-01T00:00:00"],null]]]
1981939-[debug] 2017-01-31T17:16:35.356108Z cou...@couchdb7.geopoll.com <0.11910.262> -------- OS Process #Port<0.63508> Input :: ["map_doc",{"_id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","_rev":"3-832e63f45b45d5e3008b7e7bbe2b7392","MetaData":{"PollId":"77802","CarrierId":"25402","UserPollStateId":"3256532401","CarrierName":"Airtel-Kenya","Pretups.Version":"5.1","Pretups.Uri":"https://41.223.56.108:8093/pretups/C2SReceiver","Auth.Login":"pretups","Auth.Password":"0971500a350af5c3d1c0b12221a0558c","Auth.GatewayCode":"EXTGW","Auth.GatewayType":"EXTGW","Auth.ServicePort":"190","Auth.SourceType":"EXT","Cmd.ExtNwCode":"KE","Cmd.Msisdn":"732810086","Cmd.Pin":"2549","Cmd.Login":"","Cmd.Password":"","Cmd.ExtCode":"2468","CountryCode":"254","MobilePhoneLength":"9","TestMobileNumber":"254733621719","Currency":"KES"},"UserId":"1000277123401","CreateDate":"2014-03-05T13:45:49.6889321Z","LastModifiedDate":"2014-03-05T13:46:14.8050931Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CallbackUri":"http://uk-app-3:8645/billingcallback","Activities":[{"CreateDate":"2014-03-05T13:46:14.2902898Z","State":"PROCESSING"},{"MetaData":{"Type":"EXRCTRFRESP","Txnid":"R140305.1648.210003","Txnstatus":"200","Date":"05/03/2014 16:48:40","Extrefnum":"","Data":null},"CreateDate":"2014-03-05T13:46:14.2912898Z","State":"SUCCESS"}],"Currency":"Kenyan_Shilling_KES","ConsumerIdentifier":"8963201","ToBeBilledIdentifier":"254735960469","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":43.0,"BillProcessingState":"SUCCESS","BillingProvider":"PRETUPS","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CreatedDate":"2014-03-05T13:45:49.6889321Z","ModifiedDate":"2014-03-05T13:46:14.8050931Z","Type":"Bill"}]
1981940:[debug] 2017-01-31T17:32:57.300061Z cou...@couchdb7.geopoll.com <0.111.0> -------- Supervisor couch_log_sup started couch_log_monitor:start_link() at pid <0.114.0>
1981941:[debug] 2017-01-31T17:32:57.301585Z cou...@couchdb7.geopoll.com <0.111.0> -------- Supervisor couch_log_sup started config_listener_mon:start_link(couch_log_sup, nil) at pid <0.115.0>
1981942:[info] 2017-01-31T17:32:57.301605Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application couch_log started on node 'cou...@couchdb7.geopoll.com'
1981943:[debug] 2017-01-31T17:32:57.302447Z cou...@couchdb7.geopoll.com <0.119.0> -------- Supervisor folsom_sup started folsom_sample_slide_sup:start_link() at pid <0.120.0>
1981944:[debug] 2017-01-31T17:32:57.303229Z cou...@couchdb7.geopoll.com <0.119.0> -------- Supervisor folsom_sup started folsom_meter_timer_server:start_link() at pid <0.121.0>
1981945:[debug] 2017-01-31T17:32:57.303979Z cou...@couchdb7.geopoll.com <0.119.0> -------- Supervisor folsom_sup started folsom_metrics_histogram_ets:start_link() at pid <0.122.0>
1981946:[info] 2017-01-31T17:32:57.304074Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application folsom started on node 'cou...@couchdb7.geopoll.com'
1981947:[debug] 2017-01-31T17:32:57.325716Z cou...@couchdb7.geopoll.com <0.126.0> -------- Supervisor couch_stats_sup started couch_stats_aggregator:start_link() at pid <0.127.0>
1981948:[debug] 2017-01-31T17:32:57.326519Z cou...@couchdb7.geopoll.com <0.126.0> -------- Supervisor couch_stats_sup started couch_stats_process_tracker:start_link() at pid <0.177.0>
1981949:[info] 2017-01-31T17:32:57.326595Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application couch_stats started on node 'cou...@couchdb7.geopoll.com'
1981950:[info] 2017-01-31T17:32:57.326673Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application khash started on node 'cou...@couchdb7.geopoll.com'
1981951:[debug] 2017-01-31T17:32:57.330327Z cou...@couchdb7.geopoll.com <0.182.0> -------- Supervisor couch_event_sup2 started couch_event_server:start_link() at pid <0.183.0>
1981952:[debug] 2017-01-31T17:32:57.331211Z cou...@couchdb7.geopoll.com <0.185.0> -------- Supervisor couch_event_os_sup started config_listener_mon:start_link(couch_event_os_sup, nil) at pid <0.186.0>
1981953:[debug] 2017-01-31T17:32:57.331268Z cou...@couchdb7.geopoll.com <0.182.0> -------- Supervisor couch_event_sup2 started couch_event_os_sup:start_link() at pid <0.185.0>
1981954:[info] 2017-01-31T17:32:57.331367Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application couch_event started on node 'cou...@couchdb7.geopoll.com'
1981955:[debug] 2017-01-31T17:32:57.334167Z cou...@couchdb7.geopoll.com <0.190.0> -------- Supervisor ibrowse_sup started ibrowse:start_link() at pid <0.191.0>
1981956:[info] 2017-01-31T17:32:57.334239Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application ibrowse started on node 'cou...@couchdb7.geopoll.com'
1981957:[debug] 2017-01-31T17:32:57.335727Z cou...@couchdb7.geopoll.com <0.196.0> -------- Supervisor ioq_sup started config_listener_mon:start_link(ioq_sup, nil) at pid <0.197.0>
1981958:[debug] 2017-01-31T17:32:57.336685Z cou...@couchdb7.geopoll.com <0.196.0> -------- Supervisor ioq_sup started ioq:start_link() at pid <0.198.0>
1981959:[info] 2017-01-31T17:32:57.336756Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application ioq started on node 'cou...@couchdb7.geopoll.com'
1981960:[info] 2017-01-31T17:32:57.336829Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application mochiweb started on node 'cou...@couchdb7.geopoll.com'
1981961:[info] 2017-01-31T17:32:57.336899Z cou...@couchdb7.geopoll.com <0.7.0> -------- Application oauth started on node 'cou...@couchdb7.geopoll.com'
1981962:[info] 2017-01-31T17:32:57.340965Z cou...@couchdb7.geopoll.com <0.204.0> -------- Apache CouchDB 2.0.0 is starting.
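The `Input`/`Output` debug entries above are the traffic between CouchDB and the couchjs view server. As I understand the query server protocol, the reply to `map_doc` holds one entry per map function registered for that design document, each entry being the list of `[key, value]` rows emitted for the document. A hypothetical Node.js helper, `summarizeMapDocOutput` (a sketch, not part of CouchDB), counts the rows per function in one such payload, which may help spot a view that emits unusually many rows per document. The payload is the `<0.9036.262>` Output entry from the log.

```javascript
// Hypothetical helper: summarise one couchjs "map_doc" Output payload.
// Assumes the payload is an array with one entry per map function, where
// each entry is an array of [key, value] rows emitted for the document.
function summarizeMapDocOutput(payload) {
  const perFunction = JSON.parse(payload);
  const rowCounts = perFunction.map((rows) => rows.length);
  const totalRows = rowCounts.reduce((sum, n) => sum + n, 0);
  return { functions: perFunction.length, rowCounts, totalRows };
}

// Output payload copied from the <0.9036.262> entry in the log.
const output =
  '[[[["147402","TRANSFERTO","SUCCESS"],null]],[[["TRANSFERTO","SUCCESS","2015-11-23T06:43:07.5474967Z"],null]],[[["TRANSFERTO","SUCCESS","0001-01-01T00:00:00"],null]]]';

// Prints: {"functions":3,"rowCounts":[1,1,1],"totalRows":3}
console.log(JSON.stringify(summarizeMapDocOutput(output)));
```

Running the same helper over the much longer `<0.11910.262>` Output entry would show which of that design document's functions emit several rows per document.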
On the large database it happened when we kicked off just 1 of its 39 views; on the smaller database, however, I had to kick off all 5 of its views.
The large database has 9 design documents, while the smaller database has only 1.
The views are all JS.
Other than Fail2Ban, UFW, Logwatch, LogRotate, Monit and Zabbix-Agent, nothing else runs on the server, except when we build it with Dreyfus and Clouseau.
Example of one of the larger design documents:
{
  "_id": "_design/bills",
  "_rev": "4-b0ed6cf8f871391add5004f7e67bc3a8",
  "language": "javascript",
  "auto_update": true,
  "views": {
    "by_bill_date_and_bill_provider": {
      "map": "function(doc) {\n if (doc._id.indexOf(\"bill-\") === 0){\n var date = new Date(doc.CreatedDate?doc.CreatedDate:doc.CreateDate);\n var year = date.getFullYear();\n var month = (date.getMonth() + 1);\n var day = date.getDate();\n emit([year, month, day, doc.BillingProvider, doc.BillProcessingState], null);\n }\n}",
      "reduce": "_count"
    },
    "by_poll_id_and_bill_date": {
      "map": "function(doc) {\n if ((doc._id.indexOf(\"bill-\") === 0) && doc.MetaData.PollId){\n var date = new Date(doc.CreateDate);\n var year = date.getFullYear().toString();\n var month = (date.getMonth() + 1).toString();\n var day = date.getDate().toString();\n if (day.length == 1){\n day = \"0\" + day;\n }\n\n emit([doc.MetaData.PollId, year, month, day], null);\n }\n}",
      "reduce": "_count"
    }
  }
}
Example of a doc within the larger database:
{
  "_id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  "_rev": "5-b40e00a54059c6c79004c0afd584fc60",
  "MetaData": {
    "PollId": "1844608",
    "CarrierId": "2701",
    "UserPollStateId": "12614468108"
  },
  "UserId": "1002196088104",
  "CreateDate": "2017-01-31T07:20:58",
  "LastModifiedDate": "2017-01-31T07:21:14.2473555Z",
  "SystemSource": "GeoPoll",
  "AttemptCount": 1,
  "BillingIdentifier": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  "CallbackUri": "http://XXXXXXXXXXX:8645/billingcallback",
  "CallbackSent": true,
  "Activities": [
    {
      "MetaData": {},
      "CreateDate": "2017-01-31T07:21:11.182049Z",
      "State": "PROCESSING"
    },
    {
      "MetaData": {
        "VoucherPin": "",
        "OrderRef": "113234210",
        "TicketNumber": "",
        "BoxNumber": "",
        "BatchNumber": "",
        "ProcessingTime": "3064.3064"
      },
      "CreateDate": "2017-01-31T07:21:11.1820491Z",
      "State": "SUCCESS"
    }
  ],
  "Currency": "South_African_Rand_ZAR",
  "ConsumerIdentifier": "XXXXXXXXXXXX",
  "ToBeBilledIdentifier": "XXXXXXXXXXXX",
  "BillType": "Carrier",
  "BillProcessingStateAsString": "SUCCESS",
  "Value": 2,
  "BillProcessingState": "SUCCESS",
  "BillingProvider": "VODACOMSA",
  "NextProcessingTime": "0001-01-01T00:00:00",
  "NextProcessingTimeAsLong": 0,
  "FinalProcessingTime": 0,
  "LastSubmittedDate": "0001-01-01T00:00:00",
  "Id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  "CreatedDate": "2017-01-31T07:20:58",
  "ModifiedDate": "2017-01-31T07:21:14.2473555Z",
  "Type": "Bill"
}
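To see what a map function emits for a single document without building the whole index, one option is to evaluate the stored function source in Node.js with a stub `emit`. `runMap` is a hypothetical helper, not a CouchDB API, and it only approximates couchjs (CouchDB 2.0 runs map functions in SpiderMonkey), so details such as parsing timestamps that lack a time zone may differ. The example uses the `by_bill_date_and_bill_provider` map function and a trimmed copy of the sample document.

```javascript
// Hypothetical helper: evaluate a view map function's source outside CouchDB,
// collecting every emit(key, value) call into an array of [key, value] rows.
function runMap(mapSource, doc) {
  const rows = [];
  const emit = (key, value) => rows.push([key, value]);
  // Build the function so that `emit` inside its body resolves to our stub.
  const mapFn = new Function("emit", "return (" + mapSource + ");")(emit);
  mapFn(doc);
  return rows;
}

// Map source of "by_bill_date_and_bill_provider", unescaped from the design doc.
const mapSource = `function(doc) {
  if (doc._id.indexOf("bill-") === 0){
    var date = new Date(doc.CreatedDate?doc.CreatedDate:doc.CreateDate);
    var year = date.getFullYear();
    var month = (date.getMonth() + 1);
    var day = date.getDate();
    emit([year, month, day, doc.BillingProvider, doc.BillProcessingState], null);
  }
}`;

// Trimmed copy of the sample document (only the fields this view reads).
const doc = {
  _id: "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  CreateDate: "2017-01-31T07:20:58",
  CreatedDate: "2017-01-31T07:20:58",
  BillingProvider: "VODACOMSA",
  BillProcessingState: "SUCCESS",
};

// In Node this prints: [[[2017,1,31,"VODACOMSA","SUCCESS"],null]]
console.log(JSON.stringify(runMap(mapSource, doc)));
```

The same approach works for the other views by pasting their map source; it does not reproduce couchjs memory behaviour, only what each function emits.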
Docs usually go through 4-5 updates before they are finalized.
The larger database holds 16,201,998 docs totaling 23 GB, with no attachments.
There is no other traffic, including replication, besides a single user (me).
No other patterns stand out (to me, at least). Memory usage grows steadily until it eventually consumes the swap space and the process is OOM-killed.
The other 11 nodes are affected.
Thanks for your assistance!!
-Tayven
________________________________
From: Jan Lehnardt <j...@apache.org>
Sent: Tuesday, January 31, 2017 4:38 AM
To: us...@couchdb.apache.org
Cc: Tayven Bigelow; Nick Becker
Subject: Re: Crashing due to memory use
Performance - Couchdb Wiki <https://wiki.apache.org/couchdb/Performance>
>
>
> Nothing out of the ordinary is thrown in the logs. The only way to catch it is by watching memory use.
>
>
> I'm wondering if there's a configuration/setting somewhere that I am missing that could be causing this issue.
>
>
> Thanks!
>
> Tayven
>
>
>
--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
Email: cou...@neighbourhood.ie