My first thought would be to express your 'A and B' regex as:
(A.*B)|(B.*A)
with whatever padding, etc, is necessary. You can even substitute in the
sub-regex for A and B to avoid writing them out twice.
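As a rough sketch of that substitution in Python: the helper name both_in_either_order and the \b word-boundary "padding" are assumptions made for illustration, not something specified above.

```python
import re

def both_in_either_order(a, b):
    """Build (A.*B)|(B.*A) from two sub-patterns that are written only once."""
    # Non-capturing groups keep each alternative self-contained.
    return re.compile(r"(?:{a}.*{b})|(?:{b}.*{a})".format(a=a, b=b))

# Hypothetical example: both names must occur as whole words, in either order.
pattern = both_in_either_order(r"\bJohn\b", r"\bSmith\b")

print(bool(pattern.search("Smith, John")))    # True
print(bool(pattern.search("John Smithson")))  # False: 'Smith' is not a whole word here
```

Note that '.' does not match a newline by default, so this only works within a single line unless re.DOTALL is passed.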
--
Craig Ringer
Regular expressions are designed to define and detect repetition and
alternatives. These are easily implemented with finite state machines.
REs are not meant for conjunction. 'And' can be done but, as I remember,
only messily and slowly. The demonstration I once read was definitely
theoretical, not practical.
Python was designed for 'and' logic (among everything else). If you want
practical code, use it.
if match1 and match2: do whatever.
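Spelled out a little more, that might look like the following; contains_both is a hypothetical helper name, and the two patterns are placeholders for whatever A and B really are.

```python
import re

def contains_both(text, pattern_a, pattern_b):
    """Run two independent searches and combine the results with 'and'."""
    match1 = re.search(pattern_a, text)
    match2 = re.search(pattern_b, text)
    return bool(match1 and match2)

if contains_both("Dong Mao Ze", r"\bMao\b", r"\bDong\b"):
    print("both found")  # do whatever
```

Because the two searches are independent, the order of A and B in the text does not matter.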
Terry J. Reedy
Provided you are careful to avoid overlapping matches, e.g.
data = 'Fred Johnson', query = ('John', 'Johnson').
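To make the pitfall concrete, here is a small sketch using that data. Anchoring on \b word boundaries is one possible guard, chosen here as an assumption; the variable names are illustrative only.

```python
import re

data = "Fred Johnson"
query = ("John", "Johnson")

# Naive substring-style search: 'John' is found inside 'Johnson',
# so both query terms appear to be present.
naive = all(re.search(re.escape(q), data) for q in query)

# Requiring whole words stops 'John' from matching inside 'Johnson'.
whole_words = all(re.search(r"\b" + re.escape(q) + r"\b", data) for q in query)

print(naive)        # True
print(whole_words)  # False
```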
Even this approach (A follows B or B follows A) gets tricky in the real
world of the OP, who appears to be attempting some sort of name
matching, where the word order may be scrambled. Problem is, punters
can have more than 2 words in their names, e.g. Mao Ze Dong[*], Louise
de la Valliere, and Johann Georg Friedrich von und zu Hohenlohe ... or
misreading handwriting can change the number of perceived words, e.g.
Walenkamp -> Wabu Kamp (no kidding).
[*] aka Mao Zedong aka Mao Tse Tung -- difficult enough before we start
considering variations in the order of the words.
That won't work because of overlaps. Consider
barkeep
with a search for A='bark' and B='keep'.
Neither A.*B nor B.*A will match because the 'k' needs to
be in both A and B.
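A quick check of that claim in Python (a sketch; the variable names are mine):

```python
import re

text = "barkeep"
either_order = re.compile(r"(?:bark.*keep)|(?:keep.*bark)")

# The alternation cannot match: after 'bark' only 'eep' is left.
regex_found = either_order.search(text) is not None

# Yet both substrings are present, sharing the letter 'k'.
both_present = "bark" in text and "keep" in text

print(regex_found)   # False
print(both_present)  # True
```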
The OP asked for words, i.e. runs of consecutive letters delimited
by non-letters or by the start or end of the string. With that
restriction this solution will work.
Another possibility is to use positive assertions, as in
(?=A)(?=.*B)|(?=B)(?=.*A)
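A sketch of the assertion form with A='bark' and B='keep' substituted in. Lookaheads consume no characters, so the overlapping case is accepted; the match itself is empty, so only its existence is meaningful.

```python
import re

lookahead = re.compile(r"(?=bark)(?=.*keep)|(?=keep)(?=.*bark)")

for text in ("barkeep", "keep the bark", "bark only"):
    m = lookahead.search(text)
    print(text, "->", m is not None)
# barkeep -> True
# keep the bark -> True
# bark only -> False
```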
The best solution is to do a string.find and not worry about
implementing this as a regexp.
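For plain substrings that could be as simple as the sketch below. It assumes the str.find method of current Python in place of the string module function, and contains_all is a hypothetical helper name.

```python
def contains_all(text, substrings):
    """True if every substring occurs somewhere in text; overlaps are fine."""
    # str.find returns -1 when the substring is absent.
    return all(text.find(s) != -1 for s in substrings)

print(contains_all("barkeep", ["bark", "keep"]))  # True
print(contains_all("barkeep", ["bark", "beer"]))  # False
```

This checks substrings, not whole words, so it would need extra work if word boundaries matter.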
Andrew
da...@dalkescientific.com