A regex puzzle


Edward K. Ream

Aug 17, 2020, 9:49:13 AM
to leo-editor
I would like a regex that finds complete, disjoint TypeScript multiline block comments.

The following does not work, because the flags do not play well together.

re.compile(r'(/\*.*?\*/)(.*)', re.DOTALL | re.MULTILINE)

For example, even with .*?, the pattern will match the entire string:

/* first comment */
body
/* second comment */
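
A minimal sketch for reproducing this, assuming the pattern is applied with `search()` (the post does not say which call is used). The variable names are only illustrative:

```python
import re

# The pattern from the post; using search() here is an assumption.
pattern = re.compile(r'(/\*.*?\*/)(.*)', re.DOTALL | re.MULTILINE)
s = "/* first comment */\nbody\n/* second comment */"

m = pattern.search(s)
whole = m.group(0)   # the overall match
first = m.group(1)   # the lazy comment group
rest = m.group(2)    # everything after it; DOTALL lets '.' cross newlines

print(repr(first))
print(repr(rest))
print(whole == s)
```

Under these assumptions the first group stops at the first `*/`, while the trailing `(.*)` extends the overall match to the end of the string.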

The relevant code is in the bug-1617 branch. It's not pretty.

I don't know of any elegant solution. Do you?

Edward

Edward K. Ream

Aug 17, 2020, 9:54:12 AM
to leo-editor
On Monday, August 17, 2020 at 8:49:13 AM UTC-5, Edward K. Ream wrote:

I would like a regex that finds complete, disjoint TypeScript multiline block comments.

If you are interested in this puzzle, I recommend using https://pythex.org/ to play with possible solutions.

Edward

Thomas Passin

Aug 17, 2020, 10:21:45 AM
to leo-editor
s1 = '''/*comment one*/
not a comment
another non-comment line
/*comment 2*/
more non-comment text'''

bits = s1.split('/*')
pieces = [x.split('*/') for x in bits]

print(pieces)
[[''], ['comment one', '\nnot a comment\nanother non-comment line\n'], ['comment 2', '\nmore non-comment text']]

for p in pieces:
    print(p[0])

comment one
comment 2

vitalije

Aug 17, 2020, 10:53:37 AM
to leo-editor
re.compile(r'(/\*(?:.*?)\*/)')

The inner comment text must be grouped separately so that the *? operator applies to just the inner characters. Without this grouping, the *? operator applies to all matched characters to the left. Your regex would match a smaller part if you had nested comments, like:
/* first comment /* blah blah */ */
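
A quick sketch of how a lazy block-comment pattern behaves on that nested example and on ordinary input. It uses the non-capturing form `(?:...)` and adds `re.DOTALL`, which is an assumption here so that comments spanning several lines are covered:

```python
import re

# Lazy block-comment pattern. re.DOTALL is an assumption added here so
# that '.' also matches newlines inside multiline comments.
comment_re = re.compile(r'(/\*(?:.*?)\*/)', re.DOTALL)

nested = "/* first comment /* blah blah */ */"
s = "/* first comment */\nbody\n/* second\ncomment */"

# The lazy quantifier stops at the first closing '*/'.
nested_match = comment_re.search(nested).group(1)

# With a single capturing group, findall returns group 1 of each
# non-overlapping match.
comments = comment_re.findall(s)

print(nested_match)
print(comments)
```

On the nested input the match ends at the first `*/`, so the trailing ` */` is left out.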



Vitalije

Thomas Passin

Aug 17, 2020, 10:59:39 AM
to leo-editor
Nested comments aren't allowed in js and ts, are they?

vitalije

Aug 17, 2020, 11:35:08 AM
to leo-editor


On Monday, August 17, 2020 at 4:59:39 PM UTC+2, Thomas Passin wrote:
Nested comments aren't allowed in js and ts, are they?


Probably not, but that was not my point.
Looking again at the original regex, it seems the first group would match any number of sequences of triplets containing "/*" followed by any character. For example:
/*a/*b/*c/*d.... The second group will match everything that follows. In the given example, the first group doesn't match anything, and the second group matches the whole string.

Vitalije

Thomas Passin

Aug 17, 2020, 12:04:16 PM
to leo-editor
"When you use a regex to solve a problem, then you have another problem".

Edward K. Ream

Aug 17, 2020, 12:19:13 PM
to leo-editor
On Mon, Aug 17, 2020 at 11:04 AM Thomas Passin <tbp1...@gmail.com> wrote:
"When you use a regex to solve a problem, then you have another problem".

This aphorism is misleading. Believing it was one of the worst mistakes I have made in my coding career. Regexes are usually the simplest way to detect patterns. The alternatives are often much worse.

Edward

Edward K. Ream

Aug 17, 2020, 12:20:22 PM
to leo-editor
On Mon, Aug 17, 2020 at 9:53 AM vitalije <vita...@gmail.com> wrote:

re.compile(r'(/\*(?:.*?)\*/)')

Thanks for this. Apparently the second group, (.*), is causing the problem.

Without this second group, the original first group works!

This is good news, because the second group can easily be computed.  Stay tuned...
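
One way the text after each comment could be computed from match positions is sketched below. This is not the code in the bug-1617 branch; the helper name `comments_and_bodies` is hypothetical, and `re.DOTALL` is assumed so that comments may span lines:

```python
import re

# Only the comment group is kept; the body is computed from positions.
comment_re = re.compile(r'(/\*.*?\*/)', re.DOTALL)

def comments_and_bodies(s):
    """Hypothetical helper: pair each comment with the text that follows it."""
    matches = list(comment_re.finditer(s))
    result = []
    for i, m in enumerate(matches):
        # The body runs from the end of this comment to the start of the
        # next comment, or to the end of the string for the last one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(s)
        result.append((m.group(1), s[m.end():end]))
    return result

s = "/* first comment */\nbody\n/* second comment */"
pairs = comments_and_bodies(s)
print(pairs)
```

Any text before the first comment is ignored in this sketch.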

Edward

Thomas Passin

Aug 17, 2020, 1:13:07 PM
to leo-editor
Yeah, I know. I was just being a bit flip. If you need what it can do, you can hardly do without it. And sometimes you can find a way to format the regex string that makes it a lot clearer.
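
As one possible illustration of that kind of formatting, here is a sketch of the comment pattern written with `re.VERBOSE`, which lets the pattern carry whitespace and inline comments. Combining it with `re.DOTALL` is an assumption so that multiline comments match:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows '#' comments.
comment_re = re.compile(r'''
    (           # group 1: the whole comment
      /\*       # opening delimiter
      .*?       # comment text, as little as possible
      \*/       # closing delimiter
    )
''', re.VERBOSE | re.DOTALL)

s = "/* first comment */\nbody\n/* second comment */"
found = comment_re.findall(s)
print(found)
```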