Hi All,
I'm having problems getting certain items into fields when searching for messages.
The extractors I am using are either from Tom's videos or ones I found in the Marketplace, and when I try to validate the regular expression it fails.
For example, taken from the Marketplace: `^filterlog:\s+.*,(in|out),4,.*,tcp,.*$`
I get the error "Does not match! Extractor would not run."
@LTS_Tom your video is a great starter video for this. I have around 50 or so devices pointing to it with 4 cores and 8GB of memory. Absolutely no issues. Future replacement for Splunk?!? Let's find out.
NETWIRE is a Remote Access Tool (RAT) that has been used since at least 2014. It is a publicly available commodity malware and has been observed being used by financially motivated and nation-state actors.
In the second half of 2022, we noticed an uptick in the prevalence of NETWIRE usage in our telemetry data. This prompted the Elastic Security Labs team to develop a configuration extractor to assist the security community in collecting atomic indicators within the configurations. Using this extractor will support threat tracking and improve detection, prevention, and response times.
Our initially collected batch of samples came as a mixture of executable files and memory dumps. The extractor will only work on unmapped files, so the dumps which were already mapped were run through pe_unmapper.
Syslog (RFC 3164, RFC 5424) has been the de facto standard logging protocol since the 1980s and was originally developed as part of the sendmail project. It comes with some annoying shortcomings that we tried to improve in GELF for application logging.
We decided not to write custom message inputs and parsers for all those thousands of devices, formats, firmwares, and configuration parameters out there, but instead came up with the concept of extractors, introduced in the v0.20.0 series of Graylog.
Extractors allow you to instruct Graylog nodes how to extract data from any text in the received message (no matter which format, or even from an already extracted field) into message fields. You may already know why structuring data into fields is important if you are using Graylog: full text searches offer plenty of analysis possibilities, but the real power of log analytics unveils itself when you can run queries like `http_response_code:>=500 AND user_id:9001` to get all internal server errors that were triggered by a specific user.
Creating extractors is possible via either Graylog REST API calls or from the web interface using a wizard. Select a message input on the System -> Inputs page and hit Manage extractors in the actions menu. The wizard allows you to load a message to test your extractor configuration against. You can extract data using, for example, regular expressions, Grok patterns, substrings, or even by splitting the message into tokens by separator characters. The wizard looks like this and should be pretty intuitive:
You can also choose to apply so-called converters to the extracted value, for example to convert a string consisting of numbers to an integer or double value (important for range searches later), anonymize IP addresses, lower-/uppercase a string, build a hash value, and much more.
The recommended way of importing extractors in Graylog is using content packs. The Graylog Marketplace provides access to many content packs that you can easily download and import into your Graylog setup.
You can still import extractors from JSON if you want to. Just copy the JSON extractor export into the import dialog of a message input of the fitting type (every extractor set entry in the directory tells you what type of input to spawn, e.g. syslog, GELF, or Raw/plaintext) and you are good to go. The next messages coming in should already include the extracted fields, with possibly converted values.
However, one key question that is often raised is how to match a string in a case-insensitive manner. Java regular expressions are case sensitive by default. Certain flags, such as the one to ignore case, can be set either in code or as an inline flag in the regular expression.
Most of the other flags supported by Java are rarely used in the context of matching stream rules or extractors, but if you need them, their use is documented on the same Javadoc page by Oracle. One common reason to use special regular expression syntax is to make use of what are called non-capturing groups. These are parentheses that only group alternatives but do not make Graylog extract the data they match, and they are indicated by (?:).
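Both constructs can be seen in a short sketch. Python's `re` module shares the inline-flag and non-capturing-group syntax with Java regular expressions, so it works for a quick illustration; the log line is made up:

```python
import re

line = "Connection from 192.0.2.7 closed: TIMEOUT"

# (?i) makes the match case insensitive; without it, "timeout" would not match "TIMEOUT".
assert re.search(r"(?i)timeout", line)

# (?:...) groups the alternatives without capturing, so group(1) is the IP, not the keyword.
m = re.search(r"(?:from|to) (\d{1,3}(?:\.\d{1,3}){3})", line)
print(m.group(1))  # 192.0.2.7
```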
Simple regular expressions are often sufficient to extract a single word or number from a log line, but if you know the entire structure of a line beforehand, for example for an access log or the format of a firewall log, using Grok is advantageous.
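As a sketch, a hypothetical firewall log line such as `2023-01-05T12:31:04Z DROP TCP 203.0.113.7 10.0.0.5 4431 443` could be described with standard Grok patterns like:

```
%{TIMESTAMP_ISO8601:timestamp} %{WORD:action} %{WORD:protocol} %{IP:src_ip} %{IP:dst_ip} %{NUMBER:src_port} %{NUMBER:dst_port}
```

Each named pattern becomes its own message field, so the whole line is structured in one pass.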
This will add the relevant extracted fields to our log message, allowing Graylog to search on those individual fields, whichcan lead to more effective search queries by allowing to specifically look for packets that came from a specific source IPinstead of also matching destination IPs if one would only search for the IP across all fields.
If the Grok pattern creates many fields, which can happen if you make use of heavily nested patterns, you can tell Graylog to skip certain fields (and the output of their subpatterns) by naming a field with the special keyword UNWANTED.
However, this would create three fields named type, bytes, and errors. Even leaving the first and last patterns unnamed would still create a field named BASE10NUM. To ignore fields while still requiring them to match, use UNWANTED:
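A sketch of the idea, assuming a made-up line like `type:session, bytes:1024, errors:0` where only the byte count should become a field:

```
type:%{WORD:UNWANTED}, bytes:%{NUMBER:bytes}, errors:%{NUMBER:UNWANTED}
```

The first and last patterns must still match for the extractor to fire, but only bytes is created as a message field.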
On the create extractor page, you can also customize how lists of elements, keys, and key/value pairs are separated. It is also possible to flatten JSON structures or expand them into multiple fields, as shown in the example above.
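To illustrate what flattening a JSON structure into fields means, here is a minimal sketch; the field names and the underscore separator are assumptions for illustration, not Graylog's exact output:

```python
def flatten(obj, prefix="", sep="_"):
    """Recursively turn nested JSON objects into a flat field dict."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name, sep))
        else:
            fields[name] = value
    return fields

print(flatten({"level": "error", "ctx": {"user": 9001, "ip": "10.0.0.5"}}))
# {'level': 'error', 'ctx_user': 9001, 'ctx_ip': '10.0.0.5'}
```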
You can use one or more non-capturing groups to specify alternatives for the field names while still being able to extract the parenthesized group in the regular expression. Remember that Graylog will extract data from the first matched group of the regular expression. An example of a regular expression matching the destination IP field of all those log messages from above is:
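A quick sketch of the principle, with made-up log lines that name the destination field differently:

```python
import re

lines = [
    "DROP src=192.0.2.4 dst=10.0.0.5 proto=tcp",
    "DROP src=192.0.2.4 dstip=10.0.0.5 proto=tcp",
]
# (?:dst|dstip) only groups the alternative field names; the IP is the first
# (and only) capturing group, so it is what an extractor would pull out.
pattern = re.compile(r"(?:dst|dstip)=(\d{1,3}(?:\.\d{1,3}){3})")
for line in lines:
    print(pattern.search(line).group(1))  # 10.0.0.5 both times
```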
Note that the flexible date converter uses UTC as the time zone by default, unless you have time zone information in the parsed text or have configured another time zone when adding the flexible date converter to an extractor (see the comprehensive list of time zones available for the flexible date converter).
The PyPCAPKit project is an open source Python program focused on PCAP parsing and analysis, which works as a streaming PCAP file extractor. With the support of DictDumper, it supports multiple output report formats.
Unlike popular PCAP file extractors such as Scapy, dpkt, and PyShark, pcapkit uses a streaming strategy to read input files. That is, it reads frame by frame, which decreases memory usage and improves efficiency in some ways.
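The streaming idea can be sketched in plain Python. This is not pcapkit's implementation, just a minimal illustration for the classic little-endian pcap format, with a tiny two-frame capture built in memory so the sketch is self-contained:

```python
import io
import struct

def iter_frames(stream):
    """Yield (timestamp, frame bytes) one at a time instead of loading the whole file."""
    global_header = stream.read(24)  # magic, version, tz, sigfigs, snaplen, linktype
    magic = struct.unpack("<I", global_header[:4])[0]
    assert magic == 0xA1B2C3D4, "little-endian classic pcap expected"
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack("<IIII", record_header)
        yield ts_sec, stream.read(incl_len)  # only one frame held in memory at a time

# Build a tiny two-frame capture in memory to demonstrate the generator.
buf = io.BytesIO()
buf.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
for ts, payload in [(1, b"\x01\x02"), (2, b"\x03\x04\x05")]:
    buf.write(struct.pack("<IIII", ts, 0, len(payload), len(payload)))
    buf.write(payload)
buf.seek(0)
for ts, frame in iter_frames(buf):
    print(ts, len(frame))  # 1 2, then 2 3
```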
Besides, due to the complexity of pcapkit, its extraction procedure takes around 0.0009 seconds per packet, which is not ideal. Thus pcapkit introduced alternative extraction engines to accelerate this procedure. Currently pcapkit supports Scapy, DPKT, and PyShark. In addition, pcapkit supports two multiprocessing strategies (server & pipeline). For more information, please refer to the documentation.
For the last two boots, baloo_file_extractor has been hitting my CPU rather hard for quite some time. htop shows between 90 and 98% CPU usage on baloo_file_extractor. The fans ramp up and stay up for maybe 5 to 10 minutes, though eventually the behavior only lasts a few minutes. I had just installed Manjaro KDE over Manjaro GNOME using the Architect ISO.
I noticed this first after uninstalling amd-ucode and rebooting (I have an Intel CPU).
I have not experienced this behavior on Manjaro before on any DE, so I am wondering what I can do about it.
One should note that SEKOIA.IO also provides various enrichments for data such as email addresses, URLs, domain names, filenames, and IPv6 addresses. This example focuses on IPv4 addresses, but the interested reader can refer to the SEKOIA.IO documentation for an exhaustive list.
To rationalize the number of queries against the SEKOIA.IO APIs and improve the performance of your instance, we recommend attaching a caching strategy to the lookup table. This strategy can be configured to keep the last thousand SEKOIA.IO API responses in the Graylog node's memory for one hour (3600 seconds).
The last configuration step is the creation of the lookup table component that ties together the previously created data adapter and cache. The created lookup table can later be used by extractors, converters, pipeline functions, and decorators in Graylog.
Synapse's Spotlight Tool simplifies the process of extracting analytically relevant information from prose reports. With Spotlight, users can load a PDF document or have Spotlight retrieve content from a URL and convert it to PDF format. Users can then review and process the report's content in Spotlight.
Using Synapse's extensible scrape library, Spotlight automatically recognizes and creates many common indicators of compromise (IOCs), such as hashes, IPv4 addresses, and domains (to name a few). Power-Ups may extend these capabilities; for example, if the Synapse-MITRE-ATTACK Power-Up is installed, Spotlight can recognize references to MITRE ATT&CK elements such as techniques or groups and create the corresponding nodes.
Anything extracted via Spotlight is automatically linked to the document's media:news node using a -(refs)> (for "references") light edge. This makes it easy to identify all of the nodes referenced in a given report, or to take any node and find all of the reports that reference it.
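In Storm, those two lookups can be sketched roughly as follows; the report title and the IP address are made-up values for illustration:

```
// From a report node, walk the light edge to everything it references:
media:news:title="acme breach report" -(refs)> *

// From an indicator, walk the edge backwards to the reports citing it:
inet:ipv4=203.0.113.7 <(refs)- media:news
```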
If you want to extract (and link) information beyond what Spotlight can automatically identify, you can highlight the relevant text in the report and tell Spotlight what kind of node to create using the right-click context menu.