backslashes in regex_replace filter

rjwagn...@gmail.com

Jan 8, 2024, 4:15:54 PM
to Ansible Project
Hi - Does anyone (who understands how backslashes work in Ansible/YAML) know why both of the following tasks work:

(ansible2_15_8) rowagn@localhost:~#> cat d.yml
- hosts: all
  gather_facts: no
  vars:
    s: 'This is a string containing 1 and 2.'
    t:
      - p1_xyz
      - p2_xyz
      - p4_xyz

  tasks:
  - name: single backslash
    debug:
      msg: '{{ item }} is in s'
    loop: '{{ t }}'
    when: ( item | regex_replace('^p(\d+).*$', '\\1') ) in s

  - name: double backslash
    debug:
      msg: '{{ item }} is in s'
    loop: '{{ t }}'
    when: ( item | regex_replace('^p(\\d+).*$', '\\1') ) in s

(ansible2_15_8) rowagn@localhost:~#> ansible-playbook -i l d.yml

PLAY [all] ******************************************************************************************************************************************************

TASK [single backslash] *****************************************************************************************************************************************
ok: [localhost] => (item=p1_xyz) => {
    "msg": "p1_xyz is in s"
}
ok: [localhost] => (item=p2_xyz) => {
    "msg": "p2_xyz is in s"
}
skipping: [localhost] => (item=p4_xyz)

TASK [double backslash] *****************************************************************************************************************************************
ok: [localhost] => (item=p1_xyz) => {
    "msg": "p1_xyz is in s"
}
ok: [localhost] => (item=p2_xyz) => {
    "msg": "p2_xyz is in s"
}
skipping: [localhost] => (item=p4_xyz)

PLAY RECAP ******************************************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0



The tasks extract the number from each string in list t and then look for that number in string s. What is strange is that the second example at https://docs.ansible.com/ansible/latest/collections/ansible/builtin/regex_replace_filter.html#examples indicates that the backslashes in both parameters need to be doubled, but the testing above shows that double backslashes are not required in the first parameter (they are required in the second parameter).

Thanks
Rob
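
For readers who want to try the same logic outside Ansible, here is a minimal sketch in plain Python. It assumes the regex_replace filter behaves like Python's re.sub for this pattern; the helper name extract_number is made up for the illustration. Raw strings are used so that no quoting layer touches the backslashes.

```python
import re

s = "This is a string containing 1 and 2."
t = ["p1_xyz", "p2_xyz", "p4_xyz"]

def extract_number(item):
    # Raw strings keep the backslashes literal, so the regex engine sees
    # \d (digit class) in the pattern and \1 (first group) in the replacement.
    return re.sub(r"^p(\d+).*$", r"\1", item)

# Mirror the `when:` condition: keep items whose number appears in s.
matches = [item for item in t if extract_number(item) in s]
print(matches)  # ['p1_xyz', 'p2_xyz']
```

This reproduces the playbook result: p1_xyz and p2_xyz match, p4_xyz is skipped.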

Matt Martz

Jan 8, 2024, 4:52:37 PM
to ansible...@googlegroups.com
This is a result of some normalization code in jinja2 that attempts to unescape strings:


That code results in those becoming '^p(\\d+).*$' and '\\1'.

Those 2 when statements, when processed by pyyaml become:

["( item | regex_replace('^p(\\d+).*$', '\\\\1') ) in s",
 "( item | regex_replace('^p(\\\\d+).*$', '\\\\1') ) in s"]

Then if we apply the .encode/.decode:

>>> "( item | regex_replace('^p(\\d+).*$', '\\\\1') ) in s".encode("ascii", "backslashreplace").decode("unicode-escape")

"( item | regex_replace('^p(\\d+).*$', '\\1') ) in s"

>>> "( item | regex_replace('^p(\\\\d+).*$', '\\\\1') ) in s".encode("ascii", "backslashreplace").decode("unicode-escape")

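The round trip above can be reproduced as a small script. This is only a sketch: the function name jinja_like_unescape is hypothetical, and the two input strings are written as raw strings that hold the same characters as the pyyaml values listed above. It suggests that both `when:` expressions end up identical after the unescape step.

```python
import warnings

def jinja_like_unescape(text):
    # Same encode/decode round trip as in the REPL session above.
    with warnings.catch_warnings():
        # Newer Python versions warn about the invalid escape "\d" while
        # decoding; the warning is silenced here to keep the output clean.
        warnings.simplefilter("ignore")
        return text.encode("ascii", "backslashreplace").decode("unicode-escape")

# Raw strings: exactly the characters pyyaml produces for each `when:` value.
single = r"( item | regex_replace('^p(\d+).*$', '\\1') ) in s"
double = r"( item | regex_replace('^p(\\d+).*$', '\\1') ) in s"

print(jinja_like_unescape(single))
print(jinja_like_unescape(double))
print(jinja_like_unescape(single) == jinja_like_unescape(double))  # True
```

In the first string `\d` is not a recognized escape, so the decoder leaves it alone; in the second, `\\d` collapses to `\d`. In both, `\\1` collapses to `\1`.
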
Rob Wagner

Jan 8, 2024, 6:58:19 PM
to ansible...@googlegroups.com
Thanks Matt, but I still don't get why the first parameter (\\d) MAY be double backslashed but the second parameter (\\1) MUST be double backslashed. However, I'm starting to think it's at the Python level. https://stackoverflow.com/a/33582215 says Python's string parser causes both \d and \\d to become \d. But why? A little more searching takes me to https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences, where I think I see why \\1 becomes \1 and \1 becomes a non-printable character (octal 1). By analogy, \\d should become \d (it does), but why doesn't \d raise an error, since it's not listed as a valid escape sequence?

Maybe I'll take this over to the Python list.


Rowe, Walter P. (Fed)

Jan 9, 2024, 7:53:40 AM
to ansible...@googlegroups.com
The \\1 must be double-backslashed because the backref needs to be backslash-digit (\1). Doubling the backslash escapes the backslash.

Walter
--
Walter Rowe, Division Chief
Infrastructure Services Division
Mobile: 202.355.4123


Rowe, Walter P. (Fed)

Jan 9, 2024, 8:32:05 AM
to ansible...@googlegroups.com
regex_replace('^p(\d+).*$', '\\1')

'\\1' in the second argument is a "backref" (backwards reference) to the (\d+) in the first argument. It seems it is looking for an expression with digits and extracting the digits.

Your list 't' has the names p1_xyz, p2_xyz, and p4_xyz, so this regex would extract the digits 1, 2, and 4 from those strings.

Your string 's' has digits 1 and 2. You are getting two lines of output as expected.


Walter

Rob Wagner

Jan 9, 2024, 9:04:41 AM
to ansible...@googlegroups.com
Right, but why doesn’t the \\d need to be double-backslashed?  Backslash-d is regex for matching on a digit.  I just don’t get why doubling the backslash is needed on the 1 but not on the d.


Rowe, Walter P. (Fed)

Jan 9, 2024, 9:19:28 AM
to ansible...@googlegroups.com
Perhaps because you have single quotes inside double quotes so everything inside the single quotes is automatically escaped?


Walter

Rob Wagner

Jan 9, 2024, 9:37:12 AM
to ansible...@googlegroups.com
But the \1 is also inside single and double quotes, so if that were the reason, I wouldn’t have to double backslash the 1.


Matt Martz

Jan 11, 2024, 6:19:27 PM
to ansible...@googlegroups.com
Part of the problem is also knowing what characters are escape sequences in python.

\1 is an escape sequence, equivalent to `\x01`, and not equivalent to the literal `\1`.  As such a literal `\1` needs to be represented in python as `\\1`. \d is not an escape sequence and thus can be written as a literal `\d` without escaping the `\`
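
A short sketch of that difference in plain Python, using re.sub directly as a stand-in for the filter (an assumption for illustration only):

```python
import re

# "\1" in a normal string literal is an octal escape: one character, U+0001.
assert "\1" == "\x01" and len("\1") == 1

# "\\1" is two characters, a backslash and the digit 1, same as the raw r"\1".
assert "\\1" == r"\1" and len("\\1") == 2

pattern = r"^p(\d+).*$"
print(repr(re.sub(pattern, "\\1", "p1_xyz")))  # '1'     (backreference to group 1)
print(repr(re.sub(pattern, "\1", "p1_xyz")))   # '\x01'  (a control character)
```

With the single backslash the replacement is a control character rather than a backreference, so the result would never be found in s.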

There is also a difference with quoting in YAML as mentioned above, between single quotes and double quotes.  But note that the behavior of YAML with quotes only applies to quotes that surround the entire YAML value.  So the single quotes you have in the middle of your string do not affect the YAML quoting differences.  When not using quotes surrounding the full value in YAML, you are using "Plain Style" which has different rules than both single and double quoted values.

YAML single quotes are basically equivalent to python raw strings, where a backslash is always treated as literal. Double quotes require escaping backslashes.  You can read more about the flow scalar styles of YAML at https://yaml.org/spec/1.2.2/#73-flow-scalar-styles



--
Matt Martz
@sivel
sivel.net

Rob Wagner

Jan 17, 2024, 1:44:03 PM
to ansible...@googlegroups.com
Thanks everyone.  I'm going to chalk this up to a Python anomaly.  IMO, since \d is not a valid escape sequence, Python should raise an error rather than transparently converting it into \\d.
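
For what it's worth, recent Python versions do flag an unrecognized escape such as \d in a normal string literal, although only as a warning. The sketch below captures that warning when the literal is compiled; the exact warning category depends on the Python version (it is assumed here to be DeprecationWarning or SyntaxWarning), and the Python documentation says it may become an error in a future release.

```python
import warnings

source = r"'\d'"  # four characters of source text: quote, backslash, d, quote

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(source)  # compiles the literal the way the parser would

print(repr(value))                             # the backslash is kept as-is
print([w.category.__name__ for w in caught])   # category varies by version
```

Writing the pattern as a raw string, or doubling the backslash, avoids the warning entirely.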
