too many open files when adding file watcher, but limit is 1M


g...@tanzenmb.com

Apr 5, 2018, 1:18:55 PM
to Prometheus Users
Hi all - 

I'm trying to use two file watchers to monitor my application. One hits the application's endpoint, the other hits a node_exporter endpoint.

My application runs in a Docker container on Mesos/Marathon. It has multiple instances, the IPs of which I can get via an API call (so I can monitor the individual containers). I have a Python script that makes the API call and writes the IPs/ports out to two different JSON files (one for the app, one for node_exporter).
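
For reference, each file is in Prometheus's file_sd JSON format; roughly what the script writes looks like this (the IPs, ports and label here are made up for illustration):

[
  {
    "targets": ["10.0.0.11:8080", "10.0.0.12:8080"],
    "labels": {"env": "prod"}
  }
]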

On startup, I get a "too many open files" error.  Unlike most of the other cases I've seen, my limits appear to be plenty high.

This is the entire log on startup (I do have --log.level=debug, but no debug messages show up):

level=info ts=2018-04-05T17:04:58.26146891Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.1, branch=HEAD, revision=bc6058c81272a8d938c05e75607371284236aadc)"
level=info ts=2018-04-05T17:04:58.261570154Z caller=main.go:221 build_context="(go=go1.10, user=root@149e5b3f0829, date=20180314-14:15:45)"
level=info ts=2018-04-05T17:04:58.261600951Z caller=main.go:222 host_details="(Linux 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Dec 28 14:23:39 EST 2017 x86_64 9793c3ecf950 (none))"
level=info ts=2018-04-05T17:04:58.261627063Z caller=main.go:223 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-04-05T17:04:58.35498776Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-04-05T17:04:58.355060632Z caller=web.go:382 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-04-05T17:04:58.454805243Z caller=main.go:514 msg="TSDB started"
level=info ts=2018-04-05T17:04:58.454869605Z caller=main.go:588 msg="Loading configuration file" filename=/app/prometheus.yml
level=info ts=2018-04-05T17:04:58.455488363Z caller=main.go:491 msg="Server is ready to receive web requests."
level=error ts=2018-04-05T17:04:58.455502888Z caller=file.go:230 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
level=error ts=2018-04-05T17:04:58.455525562Z caller=file.go:230 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"


prometheus.yml, app name redacted to <myapp>:
global:
  scrape_interval:     15s
  evaluation_interval: 15s
  
  external_labels:
    monitor: '<myapp>-monitor'

rule_files:

scrape_configs:
  - job_name: '<myapp>-apps'
    scrape_interval: 15s
    file_sd_configs:
      - files:
        - 'targets/apps.json'
        refresh_interval: 2m

  - job_name: '<myapp>-nodes'
    scrape_interval: 15s
    file_sd_configs:
      - files:
        - 'targets/nodes.json'
        refresh_interval: 2m

  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: 
        - 'localhost:9090'
        - 'localhost:9100'

Brian Brazil

Apr 5, 2018, 1:22:16 PM
to g...@tanzenmb.com, Prometheus Users
On 5 April 2018 at 18:18, <g...@tanzenmb.com> wrote:
Hi all - 

I'm trying to use two file watchers to monitor my application. One hits the application's endpoint, the other hits a node_exporter endpoint.

My application runs in a Docker container on Mesos/Marathon. It has multiple instances, the IPs of which I can get via an API call (so I can monitor the individual containers). I have a Python script that makes the API call and writes the IPs/ports out to two different JSON files (one for the app, one for node_exporter).

Linux does have an inotify watch limit, so you must have hit that. I'd check what else on your system is using inotify.
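
For example, something along these lines shows the relevant limits and which PIDs currently hold inotify instances (the one-liner is just one way to do it, output will vary):

sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | cut -d/ -f3 | sort -u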

Brian
 

g...@tanzenmb.com

Apr 5, 2018, 4:43:58 PM
to Prometheus Users
Right before kicking off the app, I ran:

cat /proc/sys/fs/inotify/max_user_watches - output 8192
lsof | grep -i inotify | wc -l - output 0

So I'm not sure that Prometheus would use up 8k watchers on startup. Again, this is in a container. The only things running are a Python script to update the target files (I also tried static target files without running that script; same thing), node_exporter, and Prometheus.

Eric King

Nov 8, 2018, 3:43:12 PM
to Prometheus Users
Did you ever sort this out?

eric_...@homedepot.com

Nov 27, 2018, 9:57:50 AM
to Prometheus Users
Max user instances defaulted to just 128 on my RHEL machines; updating to fs.inotify.max_user_instances=65000 fixed the issue for me.
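
For anyone who finds this later, roughly the commands I mean (the 65000 value and the sysctl.d filename are just what I picked, adjust to taste):

sysctl fs.inotify.max_user_instances                    # check the current limit
sudo sysctl -w fs.inotify.max_user_instances=65000      # raise it for the running system
echo 'fs.inotify.max_user_instances=65000' | sudo tee /etc/sysctl.d/99-inotify.conf   # persist across reboots
sudo sysctl --system                                    # reload settings from sysctl.d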