Jira (PUP-6875) [Spike] Investigate usage of ReGex \w character class in Puppet

Ethan Brown (JIRA)

Nov 3, 2016, 5:21:03 PM
to puppe...@googlegroups.com
Ethan Brown created an issue
 
Puppet / Bug PUP-6875
[Spike] Investigate usage of ReGex \w character class in Puppet
Issue Type: Bug
Assignee: Unassigned
Created: 2016/11/03 2:20 PM
Priority: Normal
Reporter: Ethan Brown

The \w character class in Ruby does not support Unicode and will only match [a-zA-Z0-9_], which is limiting.

Instead, we should generally be using [[:word:]] where Unicode is important within a regex. This ticket involves auditing the existing code to identify critically important areas where this may be a problem in types / providers, gems, etc. Note that the Unicode-compliant word character class will be slower than the ASCII version, so it would be good to keep an eye on performance as changes are proposed / made.

This could also extend to supported modules, but should probably be a new ticket.
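
To make the difference concrete, here is a minimal sketch of how the two character classes behave on ASCII and non-ASCII input. The constant and method names (ASCII_WORD, UNICODE_WORD, word_match_report) are hypothetical and exist only for illustration; they are not taken from the Puppet codebase.

```ruby
# Hypothetical illustration: compare \w with [[:word:]] on whole strings.
ASCII_WORD   = /\A\w+\z/          # \w only matches [a-zA-Z0-9_]
UNICODE_WORD = /\A[[:word:]]+\z/  # POSIX bracket class, Unicode-aware

# Report whether each pattern matches the entire string.
def word_match_report(str)
  {
    ascii:   !(str =~ ASCII_WORD).nil?,
    unicode: !(str =~ UNICODE_WORD).nil?
  }
end

word_match_report("puppet_01")  # both patterns match
word_match_report("café")       # only the [[:word:]] pattern matches
```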

This message was sent by Atlassian JIRA (v6.4.14#64029-sha1:ae256fe)

Geoff Nichols (JIRA)

Nov 9, 2016, 1:38:03 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
h5. In scope for Agent + Platform Team
- Identify areas of concern, file tickets, assess impact and priority.
- Analysis of gems should limit scope (using {{--without development}} flag to bundler).
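
As a rough sketch of how such an audit could start, the helper below flags source lines that contain a literal \w so they can be reviewed by hand. The method name lines_using_backslash_w is hypothetical, and a plain substring check will also flag occurrences outside regex literals, so treat the output as a list of candidates, not confirmed problems.

```ruby
# Hypothetical audit helper: list [line_number, line] pairs that contain
# a literal backslash-w. This is a plain substring check, not a parser.
def lines_using_backslash_w(source)
  hits = []
  source.each_line.with_index(1) do |line, number|
    hits << [number, line.chomp] if line.include?('\w')
  end
  hits
end

# Possible usage over a checkout (the glob path is an assumption):
#   Dir.glob('lib/**/*.rb').each do |path|
#     lines_using_backslash_w(File.read(path)).each do |number, line|
#       puts "#{path}:#{number}: #{line}"
#     end
#   end
```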

Geoff Nichols (JIRA)

Nov 9, 2016, 1:39:03 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: AP 2016-11-30 → AP 2016-12-28

Geoff Nichols (JIRA)

Nov 9, 2016, 1:39:04 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: AP  Grooming  2016-11-30

Geoff Nichols (JIRA)

Nov 9, 2016, 1:40:42 PM
to puppe...@googlegroups.com

Henrik Lindberg (JIRA)

Nov 10, 2016, 9:35:07 AM
to puppe...@googlegroups.com
Henrik Lindberg commented on Bug PUP-6875
 
Re: [Spike] Investigate usage of ReGex \w character class in Puppet

Please note that simply changing \w to [[:word:]] is not doable for elements of the Puppet language: names are case-independent, and for many characters in the [[:word:]] class there is no way to change case without also having a locale. We cannot make the language locale-dependent. To work well, we would need to either make the language case-dependent or use Ruby's wonky "ASCII is up/down-cased, but not others" mode.
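
A minimal sketch of what the "ASCII is up/down-cased, but not others" mode means for name comparison, assuming Ruby 2.4 or later, where String#downcase accepts an :ascii option. The helper names ascii_fold and same_name? are hypothetical and are not how the Puppet lexer is implemented.

```ruby
# Hypothetical helpers; assumes Ruby 2.4+ for String#downcase(:ascii).

# Lower-case only A-Z and leave every other character untouched.
def ascii_fold(name)
  name.downcase(:ascii)
end

# Compare two names case-insensitively for ASCII letters only.
def same_name?(a, b)
  ascii_fold(a) == ascii_fold(b)
end

same_name?("Foo::Bar", "foo::bar")  # ASCII letters fold, so these compare equal
same_name?("Ärger", "ärger")        # Ä is not folded, so these stay distinct
```

Under this mode, names containing non-ASCII letters would in effect be case-sensitive while ASCII names are not, which is the inconsistency the comment above describes.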

Geoff Nichols (JIRA)

Nov 30, 2016, 12:24:07 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: AP 2016-12-28 → AP 2017-01-11

Geoff Nichols (JIRA)

Dec 13, 2016, 8:27:07 PM
to puppe...@googlegroups.com

Geoff Nichols (JIRA)

Dec 14, 2016, 1:43:09 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: AP 2017-01-25 → AP 2017-02-08

Ethan Brown (JIRA)

Jan 9, 2017, 8:02:02 PM
to puppe...@googlegroups.com
Ethan Brown commented on Bug PUP-6875
 
Re: [Spike] Investigate usage of ReGex \w character class in Puppet

Henrik Lindberg thanks for the note - when you say "elements of the puppet language", can you be a bit more specific on the scope there? Do you mean keywords / identifiers?

We would definitely have you review anything in parsing / lexing / etc. should changes eventually be proposed there (which I don't believe we were anticipating). This ticket was filed because of a few errors that end users encountered and that were then fixed; it is meant to proactively identify any similar issues.

Henrik Lindberg (JIRA)

Jan 10, 2017, 6:50:02 AM
to puppe...@googlegroups.com

Ethan Brown I used that vague term "elements of the language" to describe that there are various regular expressions in the lexer that match various parts of the language - it is not as simple as one regexp == identifier; maybe "the regular expressions used by the lexer to identify the tokens of the language then recognized as syntactical elements of the language by the parser" is more accurate. Several of those tokens are for "elements of the language" (tokens put together in the parser to have a specific meaning).

Or put differently: we cannot up/down-case arbitrary Unicode, since that requires both a locale and a gem to compute (Ruby on its own cannot do this), and the gems for this are written in Ruby and have severe performance implications. We should therefore not alter the regular expressions in the lexer or parser. It is simply a can of worms where it is best to keep the lid on.

Geoff Nichols (JIRA)

Jan 11, 2017, 1:48:11 PM
to puppe...@googlegroups.com

Geoff Nichols (JIRA)

Feb 8, 2017, 1:41:14 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: AP 2017-02-22 → AP 2017-03-08

Geoff Nichols (JIRA)

Feb 22, 2017, 12:55:06 PM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: AP  2017-03-08  Ready for Engineering

Geoff Nichols (JIRA)

Mar 22, 2017, 9:03:03 AM
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: Agent Ready for Engineering  1  0

Geoff Nichols (JIRA)

Apr 11, 2017, 4:29:09 PM
to puppe...@googlegroups.com

John Duarte (JIRA)

May 16, 2017, 5:01:29 PM
to puppe...@googlegroups.com

Josh Cooper (JIRA)

Feb 7, 2018, 8:23:02 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-6875
 
Re: [Spike] Investigate usage of ReGex \w character class in Puppet

One note: the Unicode version covers a much larger set of characters and has caused performance problems with tags, so we have to be careful about which regexps we change.
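
One way to keep an eye on this is to time both patterns over representative input before changing a regexp. The sketch below is only an assumption about how such a check might look: the method name time_word_classes and the sample strings are hypothetical, and the numbers it produces depend on the Ruby version and the data.

```ruby
# Hypothetical timing helper: match every sample against both patterns
# and return the elapsed seconds for each.
def time_word_classes(samples, iterations = 1_000)
  patterns = {
    ascii:   /\A\w+\z/,
    unicode: /\A[[:word:]]+\z/
  }
  patterns.each_with_object({}) do |(label, pattern), timings|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    iterations.times { samples.each { |s| s =~ pattern } }
    timings[label] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Example call with made-up tag-like strings:
#   time_word_classes(%w[web db_01 café], 10_000)
```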


Josh Cooper (Jira)

Jun 6, 2020, 7:53:03 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
 
Change By: Josh Cooper
Team: Coremunity → Night's Watch