> do :'s really need to be escaped? that could explain why some regex i wrote the other day wouldn't work
It is not necessary, but escaping is the safer habit. There are situations where ":" is part of a construct with special meaning in regex (for example, a non-capturing group starts with "(?:" and ends with ")"), so it is better to escape ":" when you want to match it as a literal.
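A quick way to check this yourself, sketched with Python's re module (an assumption, since nobody in the thread has said which engine is in use; the sample string is made up):

```python
import re

text = "time: 12:30"  # made-up sample input

# An unescaped and an escaped colon match the same literal character.
plain = re.findall(r"\d+:\d+", text)
escaped = re.findall(r"\d+\:\d+", text)
print(plain, escaped)

# Directly after "(?" the colon is syntax: "(?:...)" groups without capturing.
capturing = re.search(r"(\d+):(\d+)", text)
non_capturing = re.search(r"(?:\d+):(\d+)", text)
print(capturing.groups())      # two captured groups
print(non_capturing.groups())  # only the second number is captured
```

Other engines may treat an escaped ":" differently, so it is worth running the equivalent test in whatever tool you actually use.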
> also.. one thing i don't get about | -- how does it know you're looking for "red" or "blue" and not "re" + ("d" or "b") + "lue"?
Thanks to round brackets! Here, the first option starts right after the opening round bracket and ends before the pipe; the second option starts right after the pipe and ends before the closing round bracket.
For single letters or digits, character class syntax is more suitable. If you want to match "a" or "b" or "c", use the [abc] construct, although you won't be punished if you use the (a|b|c) construct instead.
The (a|b|c) construct is meant for words, not for single letters.
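To see where the brackets put the boundaries, here is a small sketch using Python's re module (again an assumption about the engine; the word list is made up for illustration):

```python
import re

candidates = ["red", "blue", "redlue", "reblue"]  # made-up test words

# The brackets bound the alternation: the whole of "red" or the whole of "blue".
whole_words = re.compile(r"(red|blue)")
# Here the alternation only covers the single letter between "re" and "lue".
inner_letter = re.compile(r"re(d|b)lue")

whole_hits = [w for w in candidates if whole_words.fullmatch(w)]
inner_hits = [w for w in candidates if inner_letter.fullmatch(w)]
print(whole_hits)
print(inner_hits)

# For single characters, a character class and an alternation find the same things.
class_hits = re.findall(r"[abc]", "cabbage")
alt_hits = re.findall(r"(a|b|c)", "cabbage")
print(class_hits == alt_hits)
```

The first pattern accepts only "red" and "blue", the second only "redlue" and "reblue", so moving the brackets is what changes the meaning.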
On Sat, Oct 31, 2009 at 11:57 AM, Eugeny Sattler <eugeny.satt...@gmail.com> wrote:
> > also.. one thing i don't get about | -- how does it know you're looking for "red" or "blue" and not "re" + ("d" or "b") + "lue"?
> Thanks to round brackets! Here, the first option starts right after opening round bracket and ends before a pipe, the second option starts right after pipe and ends before closing round bracket.
> For single letters or digits a character class syntax is more suitable. If you want to match "a" or "b" or "c" use [abc] construct. Although you won't be punished if you use (a|b|c) construct.
> (a|b|c) construct was meant for words, not for single letters.
But how greedy is the |?
i mean for example if i have a(.*?b)+?(?=hi|bye).blah|hi|green$ .. how is that grouped? where does the alternative before the "|hi" begin?
> But how greedy is the |?
> i mean for example if i have a(.*?b)+?(?=hi|bye).blah|hi|green$ .. how is that grouped? where does the alternative before the "|hi" begin?
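1) That syntax looks suspicious to me, because the pipe between "blah" and "hi" and the one between "hi" and "green" are not enclosed in round brackets. It is still valid, though: a pipe outside any brackets splits the whole expression, so you get three alternatives, "a(.*?b)+?(?=hi|bye).blah", "hi" and "green$". Ask your regex processor if it has the same opinion. :)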
2) The boundaries of a lookahead construct can serve as the boundaries of an alternation, so there is no need to write (?=(hi|bye)) when (?=hi|bye) is enough,
just as you did in your example. Just bear in mind that "(?=(hi|bye))" and "(?=hi|bye)" match the same way (the extra brackets only add a capturing group). The first option starts after "?=" and ends before the pipe. The second option starts after the pipe and ends before the closing round bracket.
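A small sketch of point 2, using Python's re module (an assumption about the engine; the sample sentence and the "\w+ " prefix are made up for illustration):

```python
import re

text = "say hi then bye"  # made-up sample input

# The lookahead's own brackets delimit the alternation.
bare = re.compile(r"\w+ (?=hi|bye)")
# Extra brackets inside the lookahead: same matches, plus one capturing group.
wrapped = re.compile(r"\w+ (?=(hi|bye))")

bare_matches = [m.group(0) for m in bare.finditer(text)]
wrapped_matches = [m.group(0) for m in wrapped.finditer(text)]
print(bare_matches == wrapped_matches)
print(bare.groups, wrapped.groups)  # number of capturing groups in each pattern
```

Both patterns match at the same places; the only visible difference is that the second one reports a capturing group.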
My advice: stop thinking that the regex engine first finds a pipe in your expression and afterwards looks for the ends of your alternate options to the right and to the left (in a greedy or lazy way). The regex engine always processes a regular expression from left to right, and the extent of each option is fixed by the nearest enclosing round brackets. If there are none, the options run all the way to the start and the end of the expression.
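One way to ask a regex processor how it groups the pattern from the question is to compare it against a version with every top-level option bracketed explicitly. This is only a sketch with Python's re module (the thread does not name an engine, and the sample strings are made up):

```python
import re

# The pattern from the question, exactly as written.
pattern = re.compile(r"a(.*?b)+?(?=hi|bye).blah|hi|green$")

# The same pattern with each top-level option wrapped in a non-capturing group.
explicit = re.compile(r"(?:a(.*?b)+?(?=hi|bye).blah)|(?:hi)|(?:green$)")

samples = ["hi", "green", "say hi", "evergreen", "greenish", "blah"]  # made up

def span_or_none(regex, s):
    """Return the (start, end) of the first match, or None if there is none."""
    m = regex.search(s)
    return m.span() if m else None

results = {s: span_or_none(pattern, s) for s in samples}
same = all(span_or_none(pattern, s) == span_or_none(explicit, s) for s in samples)
print(results)
print(same)
```

On these samples both patterns agree, and "hi" on its own is enough for a match, which suggests that the option before "|hi" begins at the very start of the expression rather than at "blah".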