Regular expressions in include and exclude lists are not documented as anchored

Sergei Morozov

Sep 19, 2022, 6:01:12 PM
to debezium
Hi List,

As a developer configuring Debezium connectors, I did not find it obvious that the regular expressions used in the include and exclude lists (e.g. table.exclude.list) are anchored. The only way I could figure that out was by looking into the implementation and searching online for details.

If I were to take over an existing configuration that contains, say, database.exclude.list=^foo$, and I wanted to exclude all databases whose names start with "bar_", I would add something like ^bar_ to the list. That wouldn't work as expected, because the regular expression is anchored (it must match the entire database name, not just its beginning). The ^ and $ anchors in the existing configuration are allowed and syntactically correct, but they are redundant and misleading.
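To make the anchoring concrete, here is a minimal Java sketch. It assumes each list entry is applied with whole-name semantics, like java.util.regex.Matcher.matches(); the helper name fullMatch and the database names are illustrative only, not Debezium code.

```java
import java.util.regex.Pattern;

public class AnchoredMatchDemo {

    // Whole-name check: true only if the pattern covers the entire name.
    // Assumed here to mirror how include/exclude list entries are applied.
    static boolean fullMatch(String regex, String name) {
        return Pattern.compile(regex).matcher(name).matches();
    }

    public static void main(String[] args) {
        // With whole-name matching, the explicit anchors in "^foo$" change nothing.
        System.out.println(fullMatch("^foo$", "foo"));         // true
        System.out.println(fullMatch("foo", "foo"));           // true

        // "^bar_" only matches the literal name "bar_", not names starting with it.
        System.out.println(fullMatch("^bar_", "bar_sales"));   // false

        // To cover every name with the "bar_" prefix, the rest of the name must be matched too.
        System.out.println(fullMatch("bar_.*", "bar_sales"));  // true
    }
}
```

Under these assumptions, the entry that excludes all "bar_" databases would be bar_.* rather than ^bar_.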

The current documentation on connector configuration (I looked at the MySQL connector) doesn't mention that the regular expressions are anchored, nor does it provide any examples that might help users figure this out.

Besides the documentation, how else could the user experience in this regard be improved?

Thanks.

jiri.p...@gmail.com

Sep 20, 2022, 1:23:52 AM
to debezium
Hi,

I'd personally just vote for documentation improvements. The reason is that many people use the list simply as an enumeration of table names. With non-anchored regexes, there is a risk of matching unexpected tables. So from my PoV, this is a trade-off between basic and power-user use cases.
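The risk described above can be illustrated with a short Java sketch that compares substring search (Matcher.find()) with whole-name matching (Matcher.matches()). The table names are hypothetical, and the sketch does not reproduce Debezium's own filtering code.

```java
import java.util.regex.Pattern;

public class UnanchoredRiskDemo {

    // Substring search: true if the pattern occurs anywhere in the name.
    static boolean partialMatch(String regex, String name) {
        return Pattern.compile(regex).matcher(name).find();
    }

    // Whole-name match: true only if the pattern covers the entire name.
    static boolean fullMatch(String regex, String name) {
        return Pattern.compile(regex).matcher(name).matches();
    }

    public static void main(String[] args) {
        // A plain table name, written the way many users enumerate tables.
        String entry = "inventory.orders";
        for (String table : new String[] {"inventory.orders", "inventory.orders_archive"}) {
            System.out.printf("%-26s find=%b matches=%b%n",
                    table, partialMatch(entry, table), fullMatch(entry, table));
        }
    }
}
```

With substring search, the entry also picks up inventory.orders_archive; with whole-name matching, it does not. Note that the unescaped "." is itself a regex metacharacter that matches any single character.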

J.

Gunnar Morling

Sep 20, 2022, 4:38:26 AM
to debezium
Hey Sergei,

Thanks for raising this; it's an interesting detail. The use of "match" in the description of the include/exclude lists suggests to me that the given expressions are applied to the entire name (rather than finding substrings), but I'm all for clarifying that further. I've logged https://issues.redhat.com/browse/DBZ-5625 for this requirement.

Cheers,

--Gunnar

Sergei Morozov

Sep 20, 2022, 12:06:41 PM
to debezium
Thank you all,

> In case of non-anchored regexes there is a risk of matching unexpected tables.

I agree that this would be too risky. I was wondering whether there was a way to signal to the user that explicitly provided anchors are redundant. But unless the standard library supports that, it is fine to just document this. It would be overkill to pre-process the regular expressions on the Debezium end.
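For illustration only, a naive check for redundant anchors might look like the Java sketch below. The class and method names are hypothetical and not part of Debezium; it assumes list entries are separated by plain commas (no escaped commas inside an entry) and only flags a leading "^" or a trailing "$" that is not preceded by a backslash.

```java
import java.util.ArrayList;
import java.util.List;

public class RedundantAnchorCheck {

    // Hypothetical helper: returns one warning per redundant anchor found in a
    // comma-separated include/exclude list, assuming whole-name matching.
    static List<String> findRedundantAnchors(String list) {
        List<String> warnings = new ArrayList<>();
        for (String raw : list.split(",")) {
            String entry = raw.trim();
            if (entry.startsWith("^")) {
                warnings.add("'" + entry + "': leading ^ is redundant");
            }
            // Skip a trailing "\$", which matches a literal dollar sign.
            if (entry.endsWith("$") && !entry.endsWith("\\$")) {
                warnings.add("'" + entry + "': trailing $ is redundant");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        for (String warning : findRedundantAnchors("^foo$, bar_.*")) {
            System.out.println(warning);
        }
    }
}
```

A check like this only flags the obvious cases; it does not parse the regular expression, which is part of why pre-processing on the Debezium end would be overkill.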

> The usage of "match" in the description of the include/exclude lists suggests to me that the given expressions are applied to the entire name [...]

That makes sense given a Java background, but it's not the case in other ecosystems (e.g. Perl-compatible regular expressions, where a "match" succeeds if the pattern is found anywhere in the string).

I agree that the documentation should solve the issue.

Regards,