Lookahead (and, to a lesser extent, lookbehind) assertions are extremely useful for crafting powerful regular expressions, even in non-streaming contexts.
For example, if I want to use a regular expression to parse an HTML document (yes yes, bad idea, but bear with me), and I'm interested in grabbing the very first <pre> tag, I might write a regular expression that looks like
This is great, unless there are two <pre> tags in the document. Now I might adjust it to look like
so I only get the first. But what if the <pre> tag has another <pre> tag nested in it? At the time I wrote this regex, perhaps this never happened, but the document was modified later. I'd rather have my regex fail than give me bad data, so how do I do that? This is not that hard if we have the atomic operator (?>…) (called the possessive operator in the re2 syntax), except re2 doesn't support that either, so we have to craft a very weird pattern (and we're simplifying here by assuming there's no whitespace in the tag)
(I make no guarantees about the correctness of the pattern)
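To make the first two steps concrete, here is a minimal Go sketch of the greedy and non-greedy cases and of how the non-greedy one silently returns bad data on nested input. The patterns `(?s)<pre>(.*)</pre>` and `(?s)<pre>(.*?)</pre>`, the helper `firstGroup`, and the sample documents are all assumed for illustration; they are not necessarily the exact patterns discussed above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns, assumed for illustration only.
// (?s) lets . match newlines, since <pre> content usually spans lines.
var (
	greedyPre = regexp.MustCompile(`(?s)<pre>(.*)</pre>`)
	lazyPre   = regexp.MustCompile(`(?s)<pre>(.*?)</pre>`)
)

// firstGroup returns the first capture group of re in doc,
// or "" if the pattern does not match.
func firstGroup(re *regexp.Regexp, doc string) string {
	m := re.FindStringSubmatch(doc)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	twoTags := "<pre>one</pre><p>x</p><pre>two</pre>"
	nested := "<pre>a<pre>b</pre>c</pre>"

	// Greedy: runs through to the last </pre> in the document.
	fmt.Printf("%q\n", firstGroup(greedyPre, twoTags)) // "one</pre><p>x</p><pre>two"

	// Non-greedy: stops at the first </pre>.
	fmt.Printf("%q\n", firstGroup(lazyPre, twoTags)) // "one"

	// Nested input: still "succeeds", but the content is truncated.
	fmt.Printf("%q\n", firstGroup(lazyPre, nested)) // "a<pre>b"
}
```

The last case is the one that matters here: the match succeeds, so nothing signals that the captured text is wrong.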
The atomic version looks like
Here we atomically match <pre>, followed by a non-matching token (\A, which we know cannot match at this point). The atomic operator means that once we encounter <pre> we cannot backtrack, and thus the pattern is guaranteed to fail.
The version with the lookahead operator is rather trivial
Here we just test for a lookahead on <pre> before consuming each character.
This is actually a real-world example. I have some source code right now that pulls a document from the web and parses out the content of the first <pre> tag. I tried using an HTML parser, but the go-html-transform package can't handle this document (filed as issue #5), and I'm not building Go from source, so I don't have the exp/html package. Luckily the document in question has a very simple structure that hasn't changed in years, so I don't feel so bad about using string parsing. However, I would like my regular expression to error out if the document does, in fact, change in structure, but I don't want to introduce the unreadable mess that is the third pattern into my code.