Why no RE match of A AND B?

Paddy

unread,

Mar 2, 2003, 4:00:12 PM3/2/03

to

Just looking through the docs to confirm it and their seems to be no regular expression
construct that matches 'A and B' - an equivalent to the '|' character that matches A or B
To paraphrase the documentation it could be documented as:

"&"
A&B, where A and B can be arbitrary REs, creates a regular expression that will match both
A and B in any order. An arbitrary number of REs can be separated by the "&" in this way.
This can be used inside groups as well. REs separated by "&" are tried in arbitrary order,
the beginning of the first matching branch up to the end of the last branch that allows
the complete pattern (all branches) to match is considered the span of the match.

I envision it being used where a string is accepted at present if it matches multiple REs.
The REs could be rewritten as r"<RE#1>&<RE2>&<RE3>...".

The use of "&" could be supported by more efficient, (simultaneous?), matching of all
branches of the "&" construct - if such algorithms exist!

Just an idea,

Paddy.

Jeff Epler

unread,

Mar 2, 2003, 4:49:16 PM3/2/03

to

Do you have some use cases?

Am I correct in thinking that
A&B
is equivalent in meaning to
(AB)|(BA)
except for the lack of grouping characters and repetition of the
subpatterns?

Jeff

Rene Pijlman

unread,

Mar 2, 2003, 5:44:05 PM3/2/03

to

Paddy:

>Just looking through the docs to confirm it and their seems to be no regular expression
>construct that matches 'A and B' - an equivalent to the '|' character that matches A or B
>To paraphrase the documentation it could be documented as:
>
> "&"
>A&B, where A and B can be arbitrary REs, creates a regular expression that will match both
>A and B in any order.

I take it this means:

match(r1&r2,s) <==> match(r1,s) and match(r2,s)

For example:

r1 = aa? (a or aa)
r2 = aaa? (aa or aaa)

implies that only "aa" matches r1&r2

But what's the purpose?

I assume (without a formal prove at this point) that r1&r2 can
always be reformulated as a simpler expression, BIMBW.

--
René Pijlman

Anders J. Munch

unread,

Mar 2, 2003, 6:42:16 PM3/2/03

to

"Rene Pijlman" <repl...@the.newsgroup> wrote:
> I take it this means:
>
> match(r1&r2,s) <==> match(r1,s) and match(r2,s)
>
>

> I assume (without a formal prove at this point) that r1&r2 can
> always be reformulated as a simpler expression, BIMBW.

It can always be reformulated as an expression without an intersection
('&') operator. But not necessarily a simpler one.

BIMBW?

I've never understood either why the intersection operator is usually
missing from regular expression implementations. Tradition?

- Anders

Erik Max Francis

unread,

Mar 2, 2003, 7:13:55 PM3/2/03

to

"Anders J. Munch" wrote:

> It can always be reformulated as an expression without an intersection
> ('&') operator. But not necessarily a simpler one.
>
> BIMBW?
>
> I've never understood either why the intersection operator is usually
> missing from regular expression implementations. Tradition?

Probably for the reason you mention: It can easily be reexpressed as
another regular expression, whether simpler or more complicated. That
isn't true in general, for, say, the | regular expression operator

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ All the people in her neighborhood turn around and get mad and sing
\__/ Public Enemy
CSBuddy / http://www.alcyone.com/pyos/csbuddy/
A Counter-Strike server log file monitor in Python.

Rene Pijlman

unread,

Mar 2, 2003, 7:26:35 PM3/2/03

to

Anders J. Munch:

>"Rene Pijlman" <repl...@the.newsgroup> wrote:
>> I take it this means:
>>
>> match(r1&r2,s) <==> match(r1,s) and match(r2,s)
>>
>> I assume (without a formal prove at this point) that r1&r2 can
>> always be reformulated as a simpler expression, BIMBW.
>
>It can always be reformulated as an expression without an intersection
>('&') operator. But not necessarily a simpler one.

Could you give an example?

>BIMBW?

But I May Be Wrong :-)

>I've never understood either why the intersection operator is usually
>missing from regular expression implementations. Tradition?

The Dragon Book seems to take it for granted. It's not even an
exercise.

There's a whole body of theory built on the concept of regular
expressions, including formal language theory, finite state
automata and computational complexity. All of that may have to
be redesigned to accomodate the intersection in regular
expressions. I dunno, but if it aint broken... :-)

--
René Pijlman

Rene Pijlman

unread,

Mar 2, 2003, 7:31:52 PM3/2/03

to

Erik Max Francis:

>"Anders J. Munch" wrote:
>> I've never understood either why the intersection operator is usually
>> missing from regular expression implementations. Tradition?
>
>Probably for the reason you mention: It can easily be reexpressed as
>another regular expression, whether simpler or more complicated. That
>isn't true in general, for, say, the | regular expression operator

But OTOH there are notational shortcuts in the usual RE
languages. For example:

a+ == aa*
r{3,5} == rrrr?r?

So what makes the intersection different?

--
René Pijlman

Andrew Koenig

unread,

Mar 2, 2003, 7:34:14 PM3/2/03

to

Anders> It can always be reformulated as an expression without an
Anders> intersection ('&') operator. But not necessarily a simpler
Anders> one.

Anders> BIMBW?

Anders> I've never understood either why the intersection operator is usually
Anders> missing from regular expression implementations. Tradition?

Because in general, the intersection of two regular expressions is not
a regular expression, in the sense that it cannot generally be represented
by a regular grammar or parsed efficiently by a finite automaton.

--
Andrew Koenig, a...@research.att.com, http://www.research.att.com/info/ark

Erik Max Francis

unread,

Mar 2, 2003, 7:58:37 PM3/2/03

to

Rene Pijlman wrote:

> But OTOH there are notational shortcuts in the usual RE
> languages. For example:
>
> a+ == aa*
> r{3,5} == rrrr?r?
>
> So what makes the intersection different?

Probably frequency of use and utility. I can't think of a single case
in my long history of usage of regular expressions (both simple and
extended) where an intersection would have been useful (and warranted as
opposed to just writing out what that was equivalent to without
intersections). (In fact, it wasn't entirely clear to me what the
original poster intended with his & operator, which is why I withheld
comment until others had made their guesses.)

Tim Peters

unread,

Mar 2, 2003, 6:31:38 PM3/2/03

to

[Paddy]

> Just looking through the docs to confirm it and their seems to be
> no regular expression construct that matches 'A and B' - an equivalent

> to the '|' character that matches A or B.

This is usually done by running two regexp searches. It can be faked via
positive lookahead assertions, like

pat = re.compile(r'(?=.*x)(?=.*y)')

Then pat.match(string) succeeds if and only if string contains x and string
contains y (but both before the first newline character (if any), since .
doesn't match a newline by default).

> ..

> The use of "&" could be supported by more efficient, (simultaneous?),
> matching of all branches of the "&" construct - if such algorithms
> exist!

For a so-called DFA algorithm, yes, because the intersection of regular
languages is also regular, and membership in any regular language can be
recognized in one pass over the string in question. What to do in the
presence of (non-regular) backreferences is clear as mud, though, and
Python's regexp package isn't a DFA engine anyway. I don't know of any
regexp package that supports an intersection operator, so it would also
suffer from novelty. All in all, better to do the obvious thing (i.e., run
more than one regexp).

jcm

unread,

Mar 2, 2003, 8:04:19 PM3/2/03

to

Andrew Koenig <a...@research.att.com> wrote:
...

> Because in general, the intersection of two regular expressions is not
> a regular expression, in the sense that it cannot generally be represented
> by a regular grammar or parsed efficiently by a finite automaton.

I believe you're incorrect about this. Regular languages are closed
over intersection.

Andrew Koenig

unread,

Mar 2, 2003, 10:19:46 PM3/2/03

to

jcm> I believe you're incorrect about this. Regular languages are closed
jcm> over intersection.

I guess you're right -- you can always make a DFA with states that
consist of the cartesian product of the states of the operand machines.
But implementing it efficiently may be another matter.

Tim Peters

unread,

Mar 2, 2003, 10:12:24 PM3/2/03

to

[Andrew Koenig]

> Because in general, the intersection of two regular expressions is not
> a regular expression, in the sense that it cannot generally be represented
> by a regular grammar or parsed efficiently by a finite automaton.

Give that a little more thought. If you have DFAs for regular languages L
and M, the intersection corresponds to running them both in lockstep, where
an accepting state of the combined DFA is hit whenever you land in
acceptings states for both of the original DFAs. This can be formalized
easily enough be constructing a new DFA whose states are the cross-product
of the original DFAs' states -- then the rest is obvious <wink>.

Anders J. Munch

unread,

Mar 3, 2003, 3:53:54 AM3/3/03

to

"Erik Max Francis" <m...@alcyone.com> wrote:
> Rene Pijlman wrote:
>
> > But OTOH there are notational shortcuts in the usual RE
> > languages. For example:
> >
> > a+ == aa*
> > r{3,5} == rrrr?r?
> >
> > So what makes the intersection different?
>
> Probably frequency of use and utility.

Frequency of use for a feature not implemented is usually 0 ;-)

- Anders

Anders J. Munch

unread,

Mar 3, 2003, 4:13:32 AM3/3/03

to

"Andrew Koenig" <a...@research.att.com> wrote:
> jcm> I believe you're incorrect about this. Regular languages are closed
> jcm> over intersection.
>
> I guess you're right -- you can always make a DFA with states that
> consist of the cartesian product of the states of the operand machines.
> But implementing it efficiently may be another matter.

Have done it, it's easy at the DFA level.

- Anders

Erik Max Francis

unread,

Mar 3, 2003, 4:35:31 AM3/3/03

to

"Anders J. Munch" wrote:

> Frequency of use for a feature not implemented is usually 0 ;-)

Frequency of use predicated on the supposition that it would exist,
obviously. That is, I don't hear a lot of people clamoring for such
functionality, so it's probably not really high on anyone's list. (This
is the first time I've ever heard of someone suggesting such a feature
in many years' use of regular expressions.)

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE

/ \ Custom reconciles us to everything.
\__/ Edmund Burke
Computer science / http://www.alcyone.com/max/reference/compsci/
A computer science reference.

Harvey Thomas

unread,

Mar 3, 2003, 3:59:02 AM3/3/03

to

Paddy wrote:
>
> Just looking through the docs to confirm it and their seems
> to be no regular expression
> construct that matches 'A and B' - an equivalent to the '|'

> character that matches A or B
> To paraphrase the documentation it could be documented as:
>
> "&"
> A&B, where A and B can be arbitrary REs, creates a regular
> expression that will match both

> A and B in any order. An arbitrary number of REs can be
> separated by the "&" in this way.
> This can be used inside groups as well. REs separated by "&"
> are tried in arbitrary order,
> the beginning of the first matching branch up to the end of
> the last branch that allows
> the complete pattern (all branches) to match is considered
> the span of the match.
>
> I envision it being used where a string is accepted at
> present if it matches multiple REs.
> The REs could be rewritten as r"<RE#1>&<RE2>&<RE3>...".
>

> The use of "&" could be supported by more efficient,
> (simultaneous?), matching of all
> branches of the "&" construct - if such algorithms exist!
>

> Just an idea,
>
> Paddy.
>

One of the simplifications of SGML made in the creation of XML was the dropping of the & operator in content models. I believe there were two reasons for this:

1) & was hardly ever used.
2) Implementation problems. Suppose you have a&b&c&d&e&f&g&h. There are 40320 patterns that match this, but this is more than the number of backtracking states that the Python RE engine supports.

_____________________________________________________________________
This message has been checked for all known viruses by the MessageLabs Virus Scanning Service.

Anders J. Munch

unread,

Mar 3, 2003, 7:02:34 AM3/3/03

to

"Erik Max Francis" <m...@alcyone.com> wrote:

> "Anders J. Munch" wrote:
>
> > Frequency of use for a feature not implemented is usually 0 ;-)
>
> Frequency of use predicated on the supposition that it would exist,
> obviously.

Taking statistical evidence from a hypothetical scenario, hm.

>That is, I don't hear a lot of people clamoring for such
> functionality, so it's probably not really high on anyone's list. (This
> is the first time I've ever heard of someone suggesting such a feature
> in many years' use of regular expressions.)
>

How can one miss what one doesn't know?

- Anders

Daniel Dittmar

unread,

Mar 3, 2003, 7:22:56 AM3/3/03

to

Erik Max Francis wrote:
> obviously. That is, I don't hear a lot of people clamoring for such
> functionality, so it's probably not really high on anyone's list.
> (This is the first time I've ever heard of someone suggesting such a
> feature in many years' use of regular expressions.)

The old IBM editors allowed to search for multiple words in any order, but
it was not based on regular expressions. If I remember right, it was used
mainly when you couldn't remember if a variable was 'string_length' or
'length_of_string', so it's more for interactive use than for writing
programs.

Daniel

Paddy

unread,

Mar 3, 2003, 2:03:21 PM3/3/03

to Jeff Epler

Hmmm, use cases.

I'm doing some thought experiments around the problem of Property matching in Electronics.
You create properties describing your design, i.e. "if R=0 then R=1 in the next cycle
prove that S=1 within the next 3 cycles" for your design.

You can mathematically prove some assertions on designs but for complex assertions or
large designs, formal proof is too resource hungry so what you can do is simulate your
design, producing the state of signals R and S over many cycles, then check if the
property holds for this simulation, and how many times the property holds in this
simulation (COVERAGE RESULTS).

Their is likely to be many properties to check on the design

In my thought experiments I thought that properties could be translated to REs in some
way, and simulation results into strings.
When thinking of the need to match several properties, (OK, it could be done serially), I
came up with my original question!

Pad.

Paddy

unread,

Mar 3, 2003, 2:05:57 PM3/3/03

to Jeff Epler

Hmmm, use cases.

Paddy

unread,

Mar 3, 2003, 2:26:31 PM3/3/03

to

Thanks all, for your input so far.

Having thought of it, and seen the posts on if it is necessary or not, or needed or not;
it seems a bit odd that we are left with one of the few places where the or 'operation' is
defined, but the orthogonal and 'operation' is not ?!

Paddy.

Erik Max Francis

unread,

Mar 3, 2003, 2:45:09 PM3/3/03

to

"Anders J. Munch" wrote:

> Taking statistical evidence from a hypothetical scenario, hm.

It's a judgement call.

> How can one miss what one doesn't know?

Well, since I keep saying in the parts that you're clipping, I won't
bother repeating myself again.

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE

/ \ Strange is our situation here upon earth.
\__/ Albert Einstein
Esperanto reference / http://www.alcyone.com/max/lang/esperanto/
An Esperanto reference for English speakers.

John Machin

unread,

Mar 3, 2003, 4:20:34 PM3/3/03

to

Tim Peters <tim...@comcast.net> wrote in message news:<mailman.1046651583...@python.org>...

>
> For a so-called DFA algorithm, yes, because the intersection of regular
> languages is also regular, and membership in any regular language can be
> recognized in one pass over the string in question. What to do in the
> presence of (non-regular) backreferences is clear as mud, though, and
> Python's regexp package isn't a DFA engine anyway. I don't know of any
> regexp package that supports an intersection operator, so it would also
> suffer from novelty. All in all, better to do the obvious thing (i.e., run
> more than one regexp).

I seem to detect two quite different usages of "intersection":

(1) The regular language (RL) characterised by the RE "abc" produces
just one string i.e. 'abc'. Ditto the RL described by the RE "xyz"
produces just one string 'xyz'. I (a mug punter, not an expert) would
understand the intersection of the two RLs to be empty i.e. [] i.e.
the two RLs produce no common strings. Same with "a+" and "b+",
whereas "a*" and "b*" have a non-empty intersection [''].

(2) The effect that the OP (together with Tim and other posters)
seemed to expect from the & operator was that (abc)&(xyz) would match
(abcxyz)|(xyzabc).

<:-)>
And now for a pragmatic point: if the & operator was ever so slightly
useful, or ever so slightly demanded, or ever so slightly easy to
implement, it would have surfaced in a certain language long before
now.
</:-)>

Joshua Marshall

unread,

Mar 3, 2003, 4:31:45 PM3/3/03

to

John Machin <sjma...@lexicon.net> wrote:
> Tim Peters <tim...@comcast.net> wrote in message news:<mailman.1046651583...@python.org>...
>>
>> For a so-called DFA algorithm, yes, because the intersection of regular
>> languages is also regular, and membership in any regular language can be
>> recognized in one pass over the string in question. What to do in the
>> presence of (non-regular) backreferences is clear as mud, though, and
>> Python's regexp package isn't a DFA engine anyway. I don't know of any
>> regexp package that supports an intersection operator, so it would also
>> suffer from novelty. All in all, better to do the obvious thing (i.e., run
>> more than one regexp).

> I seem to detect two quite different usages of "intersection":

> (1) The regular language (RL) characterised by the RE "abc" produces
> just one string i.e. 'abc'. Ditto the RL described by the RE "xyz"
> produces just one string 'xyz'. I (a mug punter, not an expert) would
> understand the intersection of the two RLs to be empty i.e. [] i.e.
> the two RLs produce no common strings. Same with "a+" and "b+",
> whereas "a*" and "b*" have a non-empty intersection [''].

> (2) The effect that the OP (together with Tim and other posters)
> seemed to expect from the & operator was that (abc)&(xyz) would match
> (abcxyz)|(xyzabc).

I think the difference is that in Python, the RE "abc" matches any
string containing 'abc', not just the exact string itself. I'd guess
most formalists who aren't familiar with Python would write that
regular expression as ".*abc.*", and expect they's need to write
"(.*abc.*)&(.*xyz.*)" to match both 'abcxyz' and 'xyzabc'.

David LeBlanc

unread,

Mar 3, 2003, 6:22:12 PM3/3/03

to

<snip>

> And now for a pragmatic point: if the & operator was ever so slightly
> useful, or ever so slightly demanded, or ever so slightly easy to
> implement, it would have surfaced in a certain language long before
> now.
> </:-)>

A certain molluskian by-product language seems to have Big Things in mind
for REs in it's next incarnation, should it happen. The write-up is detailed
and interesting.

Dave LeBlanc
Seattle, WA USA

Tim Peters

unread,

Mar 3, 2003, 10:59:22 PM3/3/03

to

[John Machin]

> I seem to detect two quite different usages of "intersection":
>
> (1) The regular language (RL) characterised by the RE "abc" produces
> just one string i.e. 'abc'. Ditto the RL described by the RE "xyz"
> produces just one string 'xyz'.

Most people think of the Python regexp "abc" matching any string containing
"abc", though. You can't get very far discussing regexps in terms of
regular expressions <0.9 wink>.

> I (a mug punter, not an expert) would understand the intersection of the
> two RLs to be empty i.e. [] i.e. the two RLs produce no common strings.

Yes.

> Same with "a+" and "b+", whereas "a*" and "b*" have a non-empty
> intersection [''].

Both so.

> (2) The effect that the OP (together with Tim and other posters)
> seemed to expect from the & operator was that (abc)&(xyz) would match
> (abcxyz)|(xyzabc).

I expected that the OP expected it would be equivalent to the RE

^[\x00-\xff]*(abc[\x00-\xff]*xyz|xyz[\x00-\xff]*abc)[\x00-\xff]*\Z

in terms of whether or not it matched.

> <:-)>
> And now for a pragmatic point: if the & operator was ever so slightly
> useful, or ever so slightly demanded, or ever so slightly easy to
> implement, it would have surfaced in a certain language long before
> now.
> </:-)>

RL intersection *is* routinely implemented in finite-state machine packages.
For example, see

http://www.research.att.com/~mohri/fsm/

and that package's fsmintersect program. I can think of very few plausible
use cases for it in string-matching, though ("it's a Sunday and a Thursday"
or "it's an identifier and it's a number" make less sense than "it's a floor
wax and a dessert topping"), and in this respect it's relevant that common
notations for grammar specification (like EBNF) don't bother with an
intersection operator either.

John Machin

unread,

Mar 4, 2003, 2:46:17 AM3/4/03

to

Joshua Marshall <jmar...@mathworks.com> wrote in message news:<b40hk1$aig$1...@ginger.mathworks.com>...

> I think the difference is that in Python, the RE "abc" matches any
> string containing 'abc', not just the exact string itself.

I am using "match" in the ordinary sense which coincides with the
Python definition of re.match(), whereas you seem to be using "match"
to mean what re.search() does. If I had have meant the RE ".*abc.*", I
would have said that.

Paddy

unread,

Mar 4, 2003, 3:37:17 AM3/4/03

to

David LeBlanc wrote:
> <snip>
>
>>And now for a pragmatic point: if the & operator was ever so slightly
>>useful, or ever so slightly demanded, or ever so slightly easy to
>>implement, it would have surfaced in a certain language long before
>>now.
>></:-)>

But do they think indentation for block structure is useful? Demanded?

>
>
> A certain molluskian by-product language seems to have Big Things in mind
> for REs in it's next incarnation, should it happen. The write-up is detailed
> and interesting.
>
> Dave LeBlanc
> Seattle, WA USA
>
>

Yeh,
I've bin there, read about it. from that I got the point that something that has been
stagnant for many years - a languages regexp package - is now being re-thought in quite
revolutionary ways.
If they can do it, it makes me think that the re packages contents are up for review!

(That document on perl 6 REs I guess were both on about).

Pad.

Anders J. Munch

unread,

Mar 4, 2003, 6:08:21 AM3/4/03

to

"Joshua Marshall" <jmar...@mathworks.com> wrote:
>
> > I seem to detect two quite different usages of "intersection":
>

[...]

>
> I think the difference is that in Python, the RE "abc" matches any
> string containing 'abc', not just the exact string itself. I'd guess
> most formalists who aren't familiar with Python would write that
> regular expression as ".*abc.*", and expect they's need to write
> "(.*abc.*)&(.*xyz.*)" to match both 'abcxyz' and 'xyzabc'.

Nope, formalists like me (well in this case anyway) would say that
regular expression "abc" doesn't actively match anything by itself.
It just defines a set of strings. 'abcxyz' and 'xyzabc' are not
members of the set defined by the RE "abc". To use it for something,
you need to use an active component such as re.match or re.search,
which do not check if the entire string and the RE match, but look for
matching _sub_strings.

I used the word "intersection" exactly to clarify that, regardless of
what the OP meant for an "&" operator to do, what I was talking about
was the "formalist" operation of intersection, which really is just
set intersection. Please reserve "intersection" for that use, or the
confusion will be total.

- Anders

Anders J. Munch

unread,

Mar 4, 2003, 6:24:17 AM3/4/03

to

"Erik Max Francis" <m...@alcyone.com> wrote:

> "Anders J. Munch" wrote:
>
> > Taking statistical evidence from a hypothetical scenario, hm.
>
> It's a judgement call.
>
> > How can one miss what one doesn't know?
>
> Well, since I keep saying in the parts that you're clipping, I won't
> bother repeating myself again.

I'm sorry, I don't follow you. If you have written something earlier
that addresses "How can one miss what one doesn't know?" then I am
unable to find it.

- Anders

Anders J. Munch

unread,

Mar 4, 2003, 7:26:38 AM3/4/03

to

"Rene Pijlman" <repl...@the.newsgroup> wrote:
> Anders J. Munch:
> >"Rene Pijlman" <repl...@the.newsgroup> wrote:
> >> I take it this means:
> >>
> >> match(r1&r2,s) <==> match(r1,s) and match(r2,s)
> >>
> >> I assume (without a formal prove at this point) that r1&r2 can
> >> always be reformulated as a simpler expression, BIMBW.
> >
> >It can always be reformulated as an expression without an intersection
> >('&') operator. But not necessarily a simpler one.
>
> Could you give an example?

This is not in the usual syntax, but I hope you can make something of
it anyway. (Ever since I wrote a utility with cleaned-up regexp
syntax, traditional syntax is just too painful for anything
non-trivial.)

Here's a regular expression that matches a C comment unambiguously.

:ccomment = "/*" ([^"*"] | "*"+ [^"*/"])* "*"+ "/"

Say I have another regexp that matches certain sorts of special syntax
that might occur in comments:

:encoding = "-*-" [" "A-Za-z]* "coding:" [A-Za-z0-9"- "]+ "-*-"

Now combine these two to form a regular expression to match comments
that contain encoding instructions. With intersection, it's fairly
simple:

:containsencoding = .* :encoding .*
:commentwithencoding = :ccomment & :containsencoding

I haven't tried to express this without intersection, and I wouldn't
care to try either.

the-real-fun-begins-when-you-have-a-complement-operator-as-well-ly
y'rs, Anders

Anders J. Munch

unread,

Mar 4, 2003, 7:47:21 AM3/4/03

to

"Tim Peters" <tim...@comcast.net> wrote:
> RL intersection *is* routinely implemented in finite-state machine packages.
> For example, see
>
> http://www.research.att.com/~mohri/fsm/
>
> and that package's fsmintersect program. I can think of very few plausible
> use cases for it in string-matching, though ("it's a Sunday and a Thursday"
> or "it's an identifier and it's a number" make less sense than "it's a floor
> wax and a dessert topping"), and in this respect it's relevant that common
> notations for grammar specification (like EBNF) don't bother with an
> intersection operator either.

It's useful in combination with complement ("it's a .*day but not a
Sunday").

But then complement is probably too spooky for most people.

- Anders

Tim Peters

unread,

Mar 4, 2003, 9:16:27 PM3/4/03

to

[Anders J. Munch]
> It's [regexp intersection] useful in combination with complement ("it's

> a .*day but not a Sunday").

If that's the use case, it's more an argument for adding a difference
operator (like R-S = the strings matched by R less the strings matched by
S).

> But then complement is probably too spooky for most people.

Python has negative lookahead assertions now -- but I'm not sure I've ever
seen them used <wink>.

Anders J. Munch

unread,

Mar 5, 2003, 6:13:40 AM3/5/03

to

"Tim Peters" <tim...@comcast.net> wrote:
> [Anders J. Munch]
> > It's [regexp intersection] useful in combination with complement ("it's
> > a .*day but not a Sunday").
>
> If that's the use case, it's more an argument for adding a difference
> operator (like R-S = the strings matched by R less the strings matched by
> S).

That might be so. Not that I'm proposing to add anything of either
kind.

It's easier though to express difference through intersection and
complement
R-S = R & ~S
than the other way around
R & S = (R|S) - (R-S) - (S-R)

- Anders