Textfile Collector reading only 1 prom file


Bibin John

Feb 28, 2020, 3:24:45 AM2/28/20
to Prometheus Users

Hi All,

I have configured node_exporter with the textfile collector, but it is reading metrics from only one .prom file. Could you please help?

Host operating system: output of uname -a

Linux YYYY 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version (branch: , revision: )
build user:
build date:
go version: go1.13.6

node_exporter command line flags

node_exporter --web.listen-address=:9100 --collector.textfile.directory=/home/nodeexp/db --log.level=debug --no-collector.arp --no-collector.bcache --no-collector.bonding --no-collector.conntrack --no-collector.cpu --no-collector.cpufreq --no-collector.diskstats --no-collector.edac --no-collector.entropy --no-collector.filefd --no-collector.filesystem --no-collector.hwmon --no-collector.infiniband --no-collector.ipvs --no-collector.loadavg --no-collector.mdadm --no-collector.meminfo --no-collector.netclass --no-collector.netdev --no-collector.netstat --no-collector.nfs --no-collector.nfsd --no-collector.powersupplyclass --no-collector.pressure --no-collector.schedstat --no-collector.sockstat --no-collector.softnet --no-collector.stat --no-collector.thermal_zone --no-collector.time --no-collector.timex --no-collector.uname --no-collector.vmstat --no-collector.xfs --no-collector.zfs

Are you running node_exporter in Docker?

No

What did you do that produced an error?

I have copied multiple files with the .prom extension into the /home/nodeexp/db directory, but node_exporter is reading from only one of them.

What did you expect to see?

As per the documentation, it should read from all files with the .prom extension.

What did you see instead?

It is reading from only one file.


Brian Candler

Feb 28, 2020, 3:52:26 AM2/28/20
to Prometheus Users
Look at the stderr output from node_exporter. My guess is that one of the metrics is in an invalid format; if so, the textfile collector will report an error and abandon the rest of the file (maybe the rest of the directory - I haven't tested this).

Another possibility is permissions on the files.

You may also get more clues from strace:

strace -f -p <pid-of-node-exporter>

Look for accesses to /home/nodeexp/db/
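Another quick check (a sketch, assuming the directory from this thread and that promtool, which ships with the Prometheus server tarball, is on your PATH) is to run each .prom file through `promtool check metrics`, which validates the text exposition format read from stdin:

```shell
# check_promfiles DIR - validate every *.prom file in DIR with promtool.
check_promfiles() {
  dir=$1
  for f in "$dir"/*.prom; do
    # The glob stays literal when nothing matches; bail out in that case.
    [ -e "$f" ] || { echo "no .prom files in $dir"; return 1; }
    if command -v promtool >/dev/null 2>&1; then
      # promtool check metrics reads the exposition format from stdin
      # and exits non-zero on problems.
      promtool check metrics < "$f" && echo "OK: $f" || echo "FAILED: $f"
    else
      echo "promtool not on PATH; skipped $f"
    fi
  done
}

# e.g. check_promfiles /home/nodeexp/db
```

If one file reports a parse error, that would explain node_exporter silently exposing only the other one.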

Bibin John

Feb 29, 2020, 1:30:58 PM2/29/20
to Prometheus Users
Both files have the same permissions.

/home/nodeexp/db $ ls -ltr
total 8
-rwxr-xr-x 1 user user 400 Feb 29 13:13 1.prom
-rwxr-xr-x 1 user user 396 Feb 29 13:14 2.prom

Please find the content of the files:

/home/nodeexp/db $ cat 1.prom
kafka_topic_cluster_last_update{cluster="test", ts="1582844402760"} 1582844402760
kafka_topic_rf{cluster="test", atopic="APPC-LCM-READ-REGRESSION-1848"} 3
kafka_topic_partitioncount{cluster="test", atopic="APPC-LCM-READ-REGRESSION-1848"} 8
kafka_topic_details{cluster="test", atopic="APPC-LCM-READ-REGRESSION-1848", bpartition="0", cleader="2", dreplicas="2,3,1", eisr="1,2,3", ts="1582844402760"} 3

/home/nodeexp/db $ cat 2.prom
kafka_topic_cluster_last_update{cluster="dev", ts="1582844402760"} 1582844402760
kafka_topic_rf{cluster="dev", atopic="APPC-LCM-READ-REGRESSION-1848"} 3
kafka_topic_partitioncount{cluster="dev", atopic="APPC-LCM-READ-REGRESSION-1848"} 8
kafka_topic_details{cluster="dev", atopic="APPC-LCM-READ-REGRESSION-1848", bpartition="0", cleader="2", dreplicas="2,3,1", eisr="1,2,3", ts="1582844402760"} 3


Data from node_exporter. This shows that it tried to read both .prom files but exposes the content of only one:

# HELP kafka_topic_cluster_last_update Metric read from /home/nodeexp/db/1.prom
# TYPE kafka_topic_cluster_last_update untyped
kafka_topic_cluster_last_update{cluster="test",ts="1582844402760"} 1.58284440276e+12
# HELP kafka_topic_details Metric read from /home/nodeexp/db/1.prom
# TYPE kafka_topic_details untyped
kafka_topic_details{atopic="APPC-LCM-READ-REGRESSION-1848",bpartition="0",cleader="2",cluster="test",dreplicas="2,3,1",eisr="1,2,3",ts="1582844402760"} 3
# HELP kafka_topic_partitioncount Metric read from /home/nodeexp/db/1.prom
# TYPE kafka_topic_partitioncount untyped
kafka_topic_partitioncount{atopic="APPC-LCM-READ-REGRESSION-1848",cluster="test"} 8
# HELP kafka_topic_rf Metric read from /home/nodeexp/db/1.prom
# TYPE kafka_topic_rf untyped
kafka_topic_rf{atopic="APPC-LCM-READ-REGRESSION-1848",cluster="test"} 3
# HELP node_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which node_exporter was built.
# TYPE node_exporter_build_info gauge
node_exporter_build_info{branch="",goversion="go1.13.6",revision="",version=""} 1
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="textfile"} 0.000220073
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="textfile"} 1
# HELP node_textfile_mtime_seconds Unixtime mtime of textfiles successfully read.
# TYPE node_textfile_mtime_seconds gauge
node_textfile_mtime_seconds{file=".prom"} 1.580203921e+09
node_textfile_mtime_seconds{file="1.prom"} 1.583000029e+09
node_textfile_mtime_seconds{file="2.prom"} 1.583000042e+09
# HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE node_textfile_scrape_error gauge
node_textfile_scrape_error 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.03
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 200000
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 9.89184e+06
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58300043655e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.21330176e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 5
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 5
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Bibin John

Feb 29, 2020, 1:37:38 PM2/29/20
to Prometheus Users
This is the strace output:

[pid 33318] nanosleep({0, 20000}, NULL) = 0
[pid 33318] nanosleep({0, 20000}, NULL) = 0
[pid 33318] nanosleep({0, 20000}, NULL) = 0
[pid 33318] nanosleep({0, 20000},  <unfinished ...>
[pid 33321] write(4, "HTTP/1.1 200 OK\r\nContent-Encodin"..., 2182 <unfinished ...>
[pid 33318] <... nanosleep resumed> NULL) = 0
[pid 33321] <... write resumed> )       = 2182
[pid 33318] nanosleep({0, 20000},  <unfinished ...>
[pid 33321] futex(0x1073648, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 33317] <... futex resumed> )       = 0
[pid 33317] nanosleep({0, 3000},  <unfinished ...>
[pid 33321] read(4, 0xc000204000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 33318] <... nanosleep resumed> NULL) = 0
[pid 33321] futex(0xc0000be148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 33318] nanosleep({0, 20000},  <unfinished ...>
[pid 33317] <... nanosleep resumed> NULL) = 0
[pid 33317] futex(0x1073648, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 33318] <... nanosleep resumed> NULL) = 0
[pid 33318] futex(0x1072dd0, FUTEX_WAIT_PRIVATE, 0, {60, 0} <unfinished ...>
[pid 42464] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 42464] futex(0x1072dd0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 33318] <... futex resumed> )       = 0
[pid 42464] futex(0xc00055e148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 33318] nanosleep({0, 20000}, NULL) = 0
[pid 33318] futex(0x1072dd0, FUTEX_WAIT_PRIVATE, 0, {60, 0}

Brian Candler

Feb 29, 2020, 5:13:13 PM2/29/20
to Prometheus Users
I can replicate your problem here when creating 1.prom and 2.prom.

However, if I concatenate the two files into one, it works. I also note that the metrics with the same metric name are grouped together under the same heading (even though they weren't adjacent in the source):

# HELP kafka_topic_cluster_last_update Metric read from /tmp/prom/all.prom
# TYPE kafka_topic_cluster_last_update untyped
kafka_topic_cluster_last_update{cluster="dev",ts="1582844402760"} 1.58284440276e+12
kafka_topic_cluster_last_update{cluster="test",ts="1582844402760"} 1.58284440276e+12
# HELP kafka_topic_details Metric read from /tmp/prom/all.prom
# TYPE kafka_topic_details untyped
kafka_topic_details{atopic="APPC-LCM-READ-REGRESSION-1848",bpartition="0",cleader="2",cluster="dev",dreplicas="2,3,1",eisr="1,2,3",ts="1582844402760"} 3
kafka_topic_details{atopic="APPC-LCM-READ-REGRESSION-1848",bpartition="0",cleader="2",cluster="test",dreplicas="2,3,1",eisr="1,2,3",ts="1582844402760"} 3
# HELP kafka_topic_partitioncount Metric read from /tmp/prom/all.prom
# TYPE kafka_topic_partitioncount untyped
kafka_topic_partitioncount{atopic="APPC-LCM-READ-REGRESSION-1848",cluster="dev"} 8
kafka_topic_partitioncount{atopic="APPC-LCM-READ-REGRESSION-1848",cluster="test"} 8
# HELP kafka_topic_rf Metric read from /tmp/prom/all.prom
# TYPE kafka_topic_rf untyped
kafka_topic_rf{atopic="APPC-LCM-READ-REGRESSION-1848",cluster="dev"} 3
kafka_topic_rf{atopic="APPC-LCM-READ-REGRESSION-1848",cluster="test"} 3

I can only hypothesise that this grouping only works when all the metrics with a given metric name are in the same file, and that the textfile collector doesn't support using the same metric name in two different files. But I couldn't find such a limitation documented anywhere.
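If duplicate metric names across files are indeed the trigger, one workaround (a sketch, using hypothetical paths; the staging directory and merged filename are assumptions) is to have the producing scripts write per-cluster fragments into a staging directory outside the collector directory, then concatenate them into a single .prom file and rename it into place. rename(2) is atomic on the same filesystem, so node_exporter never reads a half-written file:

```shell
# merge_promfiles STAGE_DIR DEST_FILE - concatenate fragments into one
# textfile and move it into the collector directory atomically.
merge_promfiles() {
  stage=$1
  dest=$2
  # Create the temp file next to the destination (same filesystem, so the
  # final mv is an atomic rename). node_exporter ignores it anyway, since
  # the "kafka.prom.XXXXXX" name doesn't end in .prom.
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  cat "$stage"/*.prom > "$tmp"
  mv "$tmp" "$dest"
}

# e.g. merge_promfiles /home/nodeexp/stage /home/nodeexp/db/kafka.prom
```

Since your concatenated all.prom test above worked, the merged file should expose both clusters' series.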

Bibin John

Mar 1, 2020, 12:08:35 AM3/1/20
to Prometheus Users
Thanks, Brian, for your help on this. I tested files with different metric names and it worked.