Regex lookahead and lookbehind in an import filter script

Ihor Mykhalevych

Nov 22, 2023, 11:07:06 AM
to xnat_discussion
I am writing a script to filter specific series out of the sessions that we import.
For example, a session has two series:

ep2d-advdiff-3Scan-4bval_spair_std_ADC
ep2d-advdiff-3Scan-4bval_spair Upper Prostate_ADC

So I tried a negative lookbehind like this:

.*?(?<!Upper\sProstate.{1,20})ADC

in the whitelist import script.

It works correctly on https://regex101.com/r/8rzry7/1, but the XNAT upload still accepts both series. Can you suggest a way forward?

John Flavin

Nov 27, 2023, 11:58:41 AM
to xnat_di...@googlegroups.com
Short answer: I think you should experiment with your regular expression some more and try to simplify the lookbehind. Specifically, not every regex engine permits the .{1,20} quantifier inside a lookbehind.

Longer answer: The odd thing is that Java's regex engine should accept that quantifier, since it has an upper bound (I'm getting this from https://www.regular-expressions.info/lookaround.html#limitbehind). But even if it is supposed to be accepted, my guess is still that it's the reason the filter isn't working. Regardless, matching a string with that expression would be very slow: lookahead and lookbehind are slow in general, and you're doing it up to 20 times, which seems like a lot.
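To check the lookbehind theory outside XNAT, the expression from the original post can be run through java.util.regex directly. This is a minimal sketch under two assumptions: that the import filter compiles each pattern with java.util.regex.Pattern, and that it is matched against the series description. The thread doesn't say whether XNAT calls matches() (whole string) or find() (any substring), so the sketch reports both. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class LookbehindCheck {
    // The whitelist expression from the original post as a Java string literal
    // (each backslash doubled). Pattern.compile would throw PatternSyntaxException
    // here if Java rejected the bounded .{1,20} inside the lookbehind.
    static final Pattern WHITELIST =
            Pattern.compile(".*?(?<!Upper\\sProstate.{1,20})ADC");

    // Whole-string match, in case the filter uses matches().
    static boolean fullMatch(String description) {
        return WHITELIST.matcher(description).matches();
    }

    // Substring search, in case the filter uses find().
    static boolean partialMatch(String description) {
        return WHITELIST.matcher(description).find();
    }

    public static void main(String[] args) {
        String[] descriptions = {
            "ep2d-advdiff-3Scan-4bval_spair_std_ADC",
            "ep2d-advdiff-3Scan-4bval_spair Upper Prostate_ADC"
        };
        for (String d : descriptions) {
            System.out.printf("%s%n  matches()=%b  find()=%b%n",
                    d, fullMatch(d), partialMatch(d));
        }
    }
}
```

If this compiles and both calls reject the Upper Prostate description in a standalone run, Java is accepting the lookbehind, and the cause is more likely in how the filter is configured or applied than in the expression itself.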

If there is any way to be more specific about exactly what you're matching in that lookbehind, and especially how many characters away it should be, that would probably help. I'm not sure this would work for your use case, but maybe you could flip the match: instead of allowing everything that doesn't match, you filter out anything that does match, which lets you simplify the regex to something like
.*?Upper\sProstate.{1,20}ADC$
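A sketch of the flipped approach, again assuming java.util.regex and using hypothetical names. The blacklist pattern is the expression above verbatim, and a series is dropped when find() succeeds. The second pattern is an alternative not proposed in this thread: a whitelist that avoids lookbehind entirely by anchoring a negative lookahead at the start of the description.

```java
import java.util.regex.Pattern;

final class FilterPatterns {
    // Flipped approach: drop a series when this expression matches.
    static final Pattern BLACKLIST =
            Pattern.compile(".*?Upper\\sProstate.{1,20}ADC$");

    // Alternative (not from the thread): keep a series only if it does NOT
    // contain "Upper Prostate" anywhere and it ends in ADC. The lookahead is
    // anchored at ^, so it is evaluated once at the start of the string.
    static final Pattern WHITELIST_LOOKAHEAD =
            Pattern.compile("^(?!.*Upper\\sProstate).*ADC$");

    static boolean blacklisted(String description) {
        return BLACKLIST.matcher(description).find();
    }

    static boolean whitelisted(String description) {
        return WHITELIST_LOOKAHEAD.matcher(description).matches();
    }

    public static void main(String[] args) {
        String[] descriptions = {
            "ep2d-advdiff-3Scan-4bval_spair_std_ADC",
            "ep2d-advdiff-3Scan-4bval_spair Upper Prostate_ADC"
        };
        for (String d : descriptions) {
            System.out.printf("%s%n  blacklisted=%b  whitelisted=%b%n",
                    d, blacklisted(d), whitelisted(d));
        }
    }
}
```

Note that the lookahead version is stricter than the original lookbehind: it rejects Upper Prostate anywhere in the description, not only within 20 characters before ADC.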

Anyway, these are all just guesses about the problem. Let us know if you were able to solve it or need more help.

John Flavin
Backend Team Lead
He/They/Any

