Ability to define variables that are not processed / stored ?

Kick Megatron

Feb 28, 2014, 11:22:20 AM
to megatron...@googlegroups.com
Hi, I am still working on that same import script with the Windows epoch time (now fixed with an external Perl script that runs before the data is fed to Megatron).

The question I have for this email is a different one.
To keep the regex for a complex line readable and testable, I define every variable explicitly. In my case the lines have 28 fields.

It now works with all the fields defined, but I do not want to store every field in the DB (many of them are useless).
Does something like this exist: parser.item.noimport.additionalItem.postalcode=[^\t]*|   ??



Replacing the variables I do not want with the actual regex seems very difficult to get right (besides, it ruins the readability of parser.lineRegExp).

Just asking :-)

Regards,
Kick

Tor Johnson

Mar 3, 2014, 2:28:31 AM
to megatron...@googlegroups.com, Kick Megatron
> In order for me to get the RegEx for a complex line readable and able to test, I define every variable explicitly.

Readable reg-exp? I don't understand what you mean... ;)

> Ability to define variables that are not processed / stored

Nope, it's not supported. The reg-exp for every variable, e.g. $ipAddress or
$additionalItem_postalCode, is enclosed in parentheses and then substituted
into the "parser.lineRegExp" expression. This means every variable becomes a
capturing group that tokenizes the input row.
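
To illustrate the substitution described above, here is a minimal sketch. It is not Megatron's actual code: the class name, the expand() helper, and the two example regexes are all made up for the illustration. It only shows why every variable ends up as a capturing group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableExpansionSketch {
    // Matches a "$name" placeholder; a bare "$" (end-of-line anchor) is left alone.
    private static final Pattern VARIABLE = Pattern.compile("\\$(\\w+)");

    // Hypothetical helper: wraps each known variable's regex in parentheses
    // and substitutes it into the line template. Unknown names are kept as-is.
    public static String expand(String template, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String regex = variables.get(m.group(1));
            String replacement = (regex == null) ? m.group() : "(" + regex + ")";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("ipAddress", "\\d{1,3}(?:\\.\\d{1,3}){3}"); // assumed regex
        variables.put("asn", "\\d+");                              // assumed regex

        String expanded = expand("^$ipAddress,$asn$", variables);
        Matcher row = Pattern.compile(expanded).matcher("192.0.2.1,64496");
        if (row.matches()) {
            // One capturing group per variable, in template order.
            for (int i = 1; i <= row.groupCount(); i++) {
                System.out.println("group " + i + " = " + row.group(i));
            }
        }
    }
}
```

Because each variable contributes exactly one group, a variable cannot be present in the expression without also producing a value.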

I would suggest that you use comments plus non-capturing groups ("(?:X)", see [1])
to solve the problem. This technique is used in the "shadowserver-spam-url" config [2]:

# Skips the following fields: "region","city","subject","src_region","src_city","sender"
parser.lineRegExp=^"$logTimestamp",$url,$hostname,$ipAddress,$asn,$countryCode,(?:".*?"|""|),(?:".*?"|""|),(?:".*?"|""|),$ipAddress2,$asn2,$countryCode2,(?:".*?"|""|),(?:".*?"|""|),(?:".*?"|""|)

[1] http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
[2] https://github.com/cert-se/megatron-java/blob/master/conf/job-type/shadowserver-spam-url.properties
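
A way to test the skipping behaviour outside Megatron is a small standalone check with java.util.regex. The sketch below is only an illustration: the class name, the parse() helper, and the three-field sample row are assumptions, not taken from the real config. The middle field uses the same (?:".*?"|""|) construct as above and produces no group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipFieldsSketch {
    // Field 1 and field 3 are captured; field 2 is matched by a
    // non-capturing group, so it is consumed but never returned.
    private static final Pattern LINE =
            Pattern.compile("^\"(.*?)\",(?:\".*?\"|\"\"|),\"(.*?)\"$");

    // Hypothetical helper: returns the captured fields, or null if the row does not match.
    public static String[] parse(String row) {
        Matcher m = LINE.matcher(row);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample rows: hostname, city (skipped), country code.
        String[] rows = {
            "\"example.com\",\"Stockholm\",\"SE\"",
            "\"example.com\",,\"SE\""
        };
        for (String row : rows) {
            String[] fields = parse(row);
            System.out.println(fields == null ? "no match" : String.join(" | ", fields));
        }
    }
}
```

Both sample rows yield only two fields, because the skipped column never becomes a group. With 28 fields, a comment above the expression listing the skipped columns (as in the config above) keeps the line understandable.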

/Tor