Hey there, new to Filebeat.
I'm watching a large directory at the root (/log/*/**) and sending the logs to another team's Logstash that I don't control. We are seeing what looks like major lag at times from some systems, so I was trying to check some Filebeat stats. First I enabled JSON logging to the logfile. There I only see two harvester stats, and they are both always empty:
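For reference, this is roughly what I enabled (a sketch in Filebeat 7.x syntax; the selector names are my assumption and may differ between versions, and 8.x logs JSON by default):

```yaml
# filebeat.yml fragment (sketch, not verified against every version)
logging.level: debug
logging.selectors: ["input", "harvester"]   # limit debug noise to input internals
logging.to_files: true
logging.json: true

# expose the local stats endpoint for curl
http.enabled: true
http.host: localhost
http.port: 5066
```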
I must be missing something. We have already tweaked the Logstash output to add more workers and increase the bulk size, but it doesn't look like that is the issue here. I'm trying to find the bottleneck, and the harvester is a black hole for me.
[20.2k/s] [20.2k/s]
[21.5k/s] [20.7k/s]
[ 22k/s] [ 21k/s]
[22.2k/s] [21.3k/s]
[23.4k/s] [21.4k/s]
[ 21k/s] [21.4k/s]
[22.8k/s] [21.3k/s]
[22.3k/s] [21.4k/s]
[21.4k/s] [21.4k/s]
I definitely do not have any such log to speak of, yet lsof shows that filebeat has 4000 files open. Am I sending files without a harvester? I don't think that's even possible. I even copied his log settings to be sure:
As you can see there are no references to any harvesters being spun up and the stats seem to confirm that. Yet lsof shows that all relevant filehandles are open.
It also appears that the debug option still doesn't actually turn on debug.
Q1. Are you saying that logs about the harvester are no longer expected in later versions?
Q2. Are you saying that FB is not actually designed to handle monitoring a large directory with a few thousand logs, some of them updating at a fairly high rate?
Q3. Even if I split this up further, the logs are still a big black hole if they don't give me any information about each harvester. I do need that info.
No, I am not directly saying that... I just haven't looked into it. I think filestream inputs do not report as traditional harvesters: I do not see harvesters with type filestream, but I do with type log.
Well, I am saying that a single Filebeat is very unlikely to process 50K EPS; yes, IMHO it is a fairly thin edge shipper. If it is a BIG box you are monitoring, perhaps Logstash with more pipelines might make more sense...
You can use an ingest processor on the Elasticsearch side and set the ingest timestamp...
The difference between the event timestamp and the ingest timestamp is your latency (be careful to account for all timezones, etc.: all dates are stored in UTC, and the timezone of the local system is used to interpret them on read/write).
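Something like this hypothetical pipeline would do it (the pipeline name is made up; `event.ingested` is the conventional ECS field and `{{_ingest.timestamp}}` is the standard ingest metadata value):

```json
PUT _ingest/pipeline/add-ingest-ts
{
  "description": "Stamp each document with the time Elasticsearch ingested it",
  "processors": [
    { "set": { "field": "event.ingested", "value": "{{_ingest.timestamp}}" } }
  ]
}
```

Then latency per document is simply `event.ingested` minus `@timestamp`.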
There are several BUs and a few thousand files under /log, depending on how many facilities each source uses. We have about 2000 log sources; some use only one facility and some use 2 or 3, so you can easily double that number to estimate how many files we have.
However, there is another team that needs this data and they wanted us to send it with filebeat. They have a Logstash/Kafka/Nifi custom layer cake there that I don't have control over. They just gave me the LS endpoint. They do not currently have ES as an option.
So I tried to do the exact same thing with Filebeat that I did with the Splunk forwarder:
It looked fine until someone noticed that some logs are sometimes way behind. We already checked all timestamps; they are all UTC. It's not always 8h; I'm told some were even more than that. When I check randomly they say "Oh, right now it's fine," so it's very hard to troubleshoot.
I was told that it's probably my harvesters not catching up, or the filehandles not being closed early enough, but it's all wild guesses since I have little visibility into how filebeat is doing. That's why I started looking for those harvester logs, but as I mentioned there aren't any. All references I find online are for the "log" input, which has been replaced by the "filestream" input. The only stats I get are from the http option with curl. Here is a current prd output:
Does this mean there are no harvesters? The docs would have you believe that a harvester is still spun up for each file, but where are the logs? There would have to be at least a few hundred harvesters running at any time, yet I see barely 50-60 filebeat threads at any given moment. Max harvesters is set to 0, i.e. unlimited.
If I understand this right, this is supposed to measure the speed at which filebeat can read files off the raw fs. Let me know if I did this wrong.
Since I cannot have more than one output at the same time, I obviously cannot add the LS target back, so this only seems good for measuring how fast FB can read files from the fs. No idea if 20k is good, bad, or average... Is there a similar test to see the rate at which FB is able to pump those 20k/s to the LS endpoint?
Unless I can grab this stat here and do some math on it?
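Something like the following sketch is what I had in mind: sample the local HTTP stats endpoint twice and turn the counter delta into an EPS figure. The counter path `libbeat.output.events.acked` is my assumption and may differ per Filebeat version.

```python
# Sketch: derive output EPS from two snapshots of Filebeat's stats endpoint
# (e.g. curl -s http://localhost:5066/stats). "libbeat.output.events.acked"
# is a monotonically increasing total, so the rate is just the delta divided
# by the sampling interval. Field names are an assumption, not verified.

def acked_eps(prev: dict, curr: dict, interval_s: float) -> float:
    """Events per second acknowledged by the output between two samples."""
    path = ("libbeat", "output", "events", "acked")

    def dig(snapshot: dict) -> int:
        node = snapshot
        for key in path:
            node = node[key]
        return node

    return (dig(curr) - dig(prev)) / interval_s

# Example with two fabricated samples taken 30 s apart:
prev = {"libbeat": {"output": {"events": {"acked": 1_000_000}}}}
curr = {"libbeat": {"output": {"events": {"acked": 1_600_000}}}}
print(acked_eps(prev, curr, 30.0))  # → 20000.0
```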
If 4-7k open files were too many for filebeat to handle, I'd expect a ton of errors, but I see nothing but clean logs and the occasional monitor stats. This is why I'm a little reluctant to believe that filebeat cannot handle all the files. Am I not correct in assuming that there would be errors if that were the case?
Next time they tell me that logfile xyz is lagging, I'd need explicit information about that very file, like harvester stats or whatever, but the "filestream" input doesn't seem to track individual files/harvesters anymore. All the other numbers are global, so it's hard to track down the individual logs that lag when others don't.
For now I would forget about the harvester; from what I saw in the code, the filestream input works differently and you will not see any log or mention of a harvester. But someone from Elastic needs to confirm this and provide more insight.
First, are they using a persistent queue in the Logstash configuration that receives these logs? If they are, is the queue getting filled, or is it almost real time? You can check this by looking at the queue directory: if there is more than one page file, the queue is actively being used.
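A minimal sketch of that check, assuming the default Logstash persistent-queue layout where data pages are named `page.N` inside the queue directory:

```python
# Sketch: a Logstash persistent queue with more than one "page.N" file is
# actively buffering, which usually points at backpressure downstream.
# The "page." naming convention is assumed from Logstash's default PQ layout.
from pathlib import Path

def queue_backed_up(queue_dir: str) -> bool:
    """True if the queue directory holds more than one data page file."""
    pages = [p for p in Path(queue_dir).iterdir() if p.name.startswith("page.")]
    return len(pages) > 1
```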
From what you shared I didn't see anything that would indicate an issue with Filebeat, and since it is not a timezone issue, it could be a backpressure issue where Logstash cannot output to its destination fast enough and tells Filebeat to stop sending logs for a moment.
One option would be to spin up another Logstash that you control on another server, which would just receive the logs and write them to files; this would help you test whether Filebeat can send the logs in real time.
You can also do this in parallel by running another instance of Filebeat with different paths for the configuration and registry: for example, download the Filebeat tar.gz and unpack it in a different path. This would allow you to read the same files and output them to this other Logstash.
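Roughly like this (a sketch; the `/opt/fb-test` paths are placeholders, and `-c`, `--path.data`, and `--path.logs` keep the test instance's state separate from production):

```shell
# Second, side-by-side Filebeat instance with its own config and registry
./filebeat -c /opt/fb-test/filebeat.yml \
  --path.data /opt/fb-test/data \
  --path.logs /opt/fb-test/logs
```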
Leandro
Thank you. I asked the team your questions. They are working to get an ES backend going so we can enable X-Pack.
In the meantime, I'm looking for a full reference document that explains the numerous monitor stats that FB outputs. I can't find it. Tough to debug when we don't know for sure what all the counters actually mean.
I started graphing some of them and noticed that certain fields are only returned infrequently.
E.g. pipeline.queue.max_events
The last time this field was returned was 12:05, and it's 13:11 right now. Am I supposed to assume that a value not returned equals 0?
A full explanation of all these values would help.
Thx
Because I see 1000s of them sometimes, and it's always the same #011. As if it's trying to close a handle and can't? It logs a new line every few seconds. I'd understand if it were a different handle number each time, but no. I'll see how long this goes on.
3rd) The first snippet you provided is per batch / the every-30s metric report.
In that 30s period you were doing about 10K EPS. Since we don't know whether you were shipping to Console, File, or Logstash, it is hard to say how performant that is, but 10K EPS is not trivial for Filebeat; that is a solid number.
And after all that, I agree with @leandrojmp: configuring another target under your control to export to will be the next step in understanding what is going on... in the end Filebeat may not be the right approach.
Ok, I have some rsyslog impstats. If I can believe them, we easily get up to 60K EPS at peak time, though less than 100K. Is 100K the limit that you propose here? Btw, this instance has 10 cores and 124G RAM.
Look for "failed" and you'll see that it is only sent 1 of the 3 times. Many other counters are just like it, e.g. "toomany" or "duplicates".
I can also prove that by grepping through the entire file (too large to paste here), but the stats clearly show it. If I compare a constant key like "acked" with "failed", I see it there too.