> In order for me to get the RegEx for a complex line readable and able to test, I define every variable explicitly.
Readable reg-exp? I don't understand what you mean... ;)
> Ability to define variables that are not processed / stored
Nope, it's not supported. The reg-exp for every variable, e.g. $ipAddress or
$additionalItem_postalCode, is enclosed in parentheses and then substituted
into the "parser.lineRegExp" expression. This means every variable becomes a
capturing group, and each group captures (and stores) one field of the input row.
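To make the mechanism concrete, here is a minimal Java sketch of that substitution. It is not Megatron's actual code: the class LineRegExpExpansion, the expand() helper and the variable patterns in sampleVariables() are all hypothetical, chosen only to show that each $variable ends up as exactly one capturing group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineRegExpExpansion {

    // Hypothetical variable definitions, for illustration only.
    static Map<String, String> sampleVariables() {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("ipAddress", "\\d{1,3}(?:\\.\\d{1,3}){3}");
        vars.put("asn", "\\d+");
        return vars;
    }

    // Replaces each $name in lineRegExp with "(" + regex + ")", so every
    // variable becomes its own capturing group. Matching \w+ means that
    // $ipAddress2 is looked up as "ipAddress2", not as "ipAddress" + "2".
    static String expand(String lineRegExp, Map<String, String> vars) {
        Matcher m = Pattern.compile("\\$(\\w+)").matcher(lineRegExp);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String regex = vars.get(m.group(1));
            if (regex == null) {
                throw new IllegalArgumentException("Unknown variable: $" + m.group(1));
            }
            // quoteReplacement keeps backslashes in the variable regex literal
            m.appendReplacement(sb, Matcher.quoteReplacement("(" + regex + ")"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String expanded = expand("^$ipAddress,$asn$", sampleVariables());
        Matcher m = Pattern.compile(expanded).matcher("192.0.2.1,64496");
        System.out.println(expanded);                    // ^(\d{1,3}(?:\.\d{1,3}){3}),(\d+)$
        System.out.println("groups: " + m.groupCount()); // groups: 2
        if (m.matches()) {
            System.out.println(m.group(1) + " / " + m.group(2)); // 192.0.2.1 / 64496
        }
    }
}
```

Note that the "(?:...)" inside the hypothetical $ipAddress pattern adds no group; a plain "(...)" there would shift the numbering of every group after it.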
I would suggest that you use comments plus non-capturing groups ("(?:X)", see [1])
to skip the fields you don't want stored. This technique is used in the "shadowserver-spam-url" config [2]:
# Skips the following fields: "region","city","subject","src_region","src_city","sender"
parser.lineRegExp=^"$logTimestamp",$url,$hostname,$ipAddress,$asn,$countryCode,(?:".*?"|""|),(?:".*?"|""|),(?:".*?"|""|),$ipAddress2,$asn2,$countryCode2,(?:".*?"|""|),(?:".*?"|""|),(?:".*?"|""|)
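A minimal Java sketch of why this works, using a cut-down version of the line above: one field, two skipped columns, one more field. The skip group is copied verbatim from the config; FIELD is a hypothetical stand-in for an expanded variable such as $ipAddress (assumed here to match a quoted value), and the sample row is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipFieldsSketch {

    // Non-capturing "skip" group copied from the shadowserver-spam-url line:
    // a quoted value, an empty quoted value, or nothing at all.
    static final String SKIP = "(?:\".*?\"|\"\"|)";

    // Hypothetical stand-in for an expanded variable such as $ipAddress:
    // a quoted field wrapped in a capturing group.
    static final String FIELD = "(\"[^\"]*\")";

    // ipAddress, region (skipped), city (skipped), ipAddress2
    static final Pattern LINE =
            Pattern.compile("^" + FIELD + "," + SKIP + "," + SKIP + "," + FIELD + "$");

    public static void main(String[] args) {
        Matcher m = LINE.matcher("\"192.0.2.1\",\"Stockholm\",,\"198.51.100.7\"");
        // Only the two FIELD groups count; the skipped columns are matched but not captured.
        System.out.println("groups: " + m.groupCount()); // groups: 2
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2)); // "192.0.2.1" "198.51.100.7"
        }
    }
}
```

However many columns are skipped this way, the group count only reflects the real variables, and the comment line above the expression documents what was skipped.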
[1] http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
[2] https://github.com/cert-se/megatron-java/blob/master/conf/job-type/shadowserver-spam-url.properties
/Tor