How to extract a value of a field within a list of deeply nested dictionaries when some keys match other values

73 views
Skip to first unread message

jean-christophe manciot

unread,
Oct 15, 2020, 12:32:04 PM10/15/20
to Ansible Project
Let's assume the following fictional dictionary with nested lists, dictionaries, and strings, of arbitrary depth.:
domain_definition.yml
```
---
domain_definition:
        - name: server01
          cluster: cluster0
          port: '8070'
        - name: server11
          cluster: cluster1
          port: '8080'
          workstations:
                  - name: debian1
                    revision: '2016-06-21'
                    url: http://debian1.example.com:9911/2016-06-21
                  - name: ubuntu1
                    revision: '2017-08-16'
                    url: http://ubuntu1.example.com:9912/2017-08-16
        - name: server12
          cluster: cluster2
          port: '8090'
          workstations:
                  - name: debian2
                    revision: '2018-04-15'
                    url: http://debian2.example.com:9921/2018-04-15
                  - name: ubuntu2
                    revision: '2020-01-12'
                    url: http://ubuntu2.example.com:9922/2020-01-12
```
The goal is to extract as efficiently as possible (the original file can be quite large) a workstation <url> matching:
  • a workstation os <name> with its <revision>: for instance ubuntu1 <--> 2017-08-16
  • only a workstation os <name>: for instance debian2
As you can see from example above, the <workstations> key is not always set.
A major constraint is to avoid "json_query" as it cannot deal properly with keys  containing a colon (as it may happen in my real context), last time I checked

I unsuccessfully tried different tasks, including the following:
```
- name: Extracting a value of a field deep within a list of complex dictionaries when some keys match other values
  hosts:
        - localhost
  strategy: debug
  tasks:
        - name: Including domain_definition
          include_vars: "domain_definition.yml"

        - name: Extracting the OS URL when <os_revision> is known
          vars:
                - os_name: ubuntu1
                - os_revision: 2017-08-16
          debug:
                msg: 
                        "os_url: {{ domain_definition |
                                    selectattr('workstations') |
                                    list |
                                    selectattr('name', 'eq', os_name) |
                                    list |
                                    selectattr('revision', 'eq', os_revision) |
                                    map(attribute= 'url') |
                                    first }}"
          ignore_errors: yes

        - name: Extracting the OS URL when <os_revision> is known
          vars:
                - os_name: ubuntu1
                - os_revision: 2017-08-16
          debug:
                msg: 
                        "os_url: {{ domain_definition |
                                    selectattr('workstations.name', 'eq', os_name) |
                                    list |
                                    selectattr('workstations.revision', 'eq', os_revision) |
                                    map(attribute= 'workstations.url') |
                                    first }}"
          ignore_errors: yes

        - name: Extracting the OS URL when <os_revision> is unknown
          vars:
                - os_name: debian2
          debug:
                msg: 
                        "os_url: {{ domain_definition |
                                    selectattr('workstations') |
                                    list |
                                    selectattr('name', 'eq', os_name) |
                                    map(attribute= 'url') |
                                    first }}"
          ignore_errors: yes

        - name: Extracting the OS URL when <os_revision> is unknown
          vars:
                - os_name: debian2
          debug:
                msg: 
                        "os_url: {{ domain_definition |
                                    selectattr('workstations.name', 'eq', os_name) |
                                    map(attribute= 'workstations.url') |
                                    first }}"
          ignore_errors: yes
```
I always get the error msg: 'dict object' has no attribute 'workstations'.

Any suggestion?

Vladimir Botka

unread,
Oct 15, 2020, 2:47:44 PM10/15/20
to jean-christophe manciot, ansible...@googlegroups.com
On Thu, 15 Oct 2020 09:32:04 -0700 (PDT)
jean-christophe manciot <actionm...@gmail.com> wrote:

> As you can see from example above, the <workstations> key is not always set.
> A major constraint is to avoid "json_query" as it cannot deal properly with
> keys containing a colon (as it may happen in my real context), last time I
> checked
> I always get the error msg: 'dict object' has no attribute 'workstations'.

Add an empty list if the attribute is missing. For example

- include_vars:
file: domain_definition.yml
name: domain_definition_raw
- set_fact:
domain_definition: "{{ domain_definition|default([]) +
[my_item] }}"
loop: "{{ domain_definition_raw.domain_definition }}"
vars:
empty_workstations:
workstations: []
my_item: "{{ (item.workstations is defined)|
ternary(item, item|
combine(empty_workstations)) }}"
- debug:
var: domain_definition

give

domain_definition:
- cluster: cluster0
name: server01
port: '8070'
workstations: []
- cluster: cluster1
name: server11
port: '8080'
workstations:
- name: debian1
revision: '2016-06-21'
url: http://debian1.example.com:9911/2016-06-21
- name: ubuntu1
revision: '2017-08-16'
url: http://ubuntu1.example.com:9912/2017-08-16
- cluster: cluster2
name: server12
port: '8090'
workstations:
- name: debian2
revision: '2018-04-15'
url: http://debian2.example.com:9921/2018-04-15
- name: ubuntu2
revision: '2020-01-12'
url: http://ubuntu2.example.com:9922/2020-01-12

This should solve the problems.

--
Vladimir Botka

jean-christophe manciot

unread,
Oct 16, 2020, 10:10:24 AM10/16/20
to Ansible Project
Thanks for your answer Vladimir.
Now, there is a 'workstations' attribute all the time.
However, I still cannot get 'os_url'.
The following includes your few lines:
```
- name: Extracting a value of a field deep within a list of complex dictionaries when some keys match other values
  hosts:
        - localhost
  strategy: debug
  tasks:
        - include_vars:
                file: domain_definition.yml
                name: domain_definition_raw
                
        - set_fact:
                domain_definition: "{{ domain_definition|default([]) + [my_item] }}"
          loop: "{{ domain_definition_raw.domain_definition }}"
          vars:
                empty_workstations:
                        workstations: []
                my_item: "{{ (item.workstations is defined)|
                              ternary(item, item|combine(empty_workstations)) }}"

        - name: (1) Extracting the OS URL when <os_revision> is known
          vars:
                - os_name: ubuntu1
                - os_revision: 2017-08-16
          debug:
                msg: 
                        "os_url: {{ domain_definition |
                                    selectattr('workstations') |
                                    list |
                                    selectattr('name', 'eq', os_name) |
                                    list |
                                    selectattr('revision', 'eq', os_revision) |
                                    map(attribute= 'url') |
                                    first }}"
          ignore_errors: yes
```
leads to:
"The task includes an option with an undefined variable. The error was: No first item, sequence was empty..."
If I try to list only the workstations dictionaries with:
```
        - name: (0) Listing all workstations
          debug:
                msg: 
                        "workstations: {{ domain_definition |
                                          selectattr('workstations') |
                                          list }}"
          ignore_errors: yes
```
I get:
ok: [localhost] => {
    "msg": "workstations: [{'name': 'server11', 'cluster': 'cluster1', 'port': '8080', 'workstations': [{'name': 'debian1', 'revision': '2016-06-21', 'url': 'http://debian1.example.com:9911/2016-06-21'}, {'name': 'ubuntu1', 'revision': '2017-08-16', 'url': 'http://ubuntu1.example.com:9912/2017-08-16'}]}, {'name': 'server12', 'cluster': 'cluster2', 'port': '8090', 'workstations': [{'name': 'debian2', 'revision': '2018-04-15', 'url': 'http://debian2.example.com:9921/2018-04-15'}, {'name': 'ubuntu2', 'revision': '2020-01-12', 'url': 'http://ubuntu2.example.com:9922/2020-01-12'}]}]"
}

On top of that, even if I were able to extract only the 'workstations' dictionaries instead of the whole structure, there would also probably be an issue with the empty one(s) since I would not be able to access their non-existent nested attributes (name, version and url).

jean-christophe manciot

unread,
Oct 16, 2020, 10:15:03 AM10/16/20
to Ansible Project
Regarding my last remark, I probably should rather define the empty workstations as:
```
                empty_workstations:
                        workstations:
                          - name:
                            revision:
                            url:
```
What are your thoughts?

Vladimir Botka

unread,
Oct 16, 2020, 10:49:12 AM10/16/20
to jean-christophe manciot, ansible...@googlegroups.com
It's up to you. If you need default values put them into the
empty_workstation, or better in this case, into the
default_workstations. For example

default_workstations:
workstations:
- name: name_default
revision: revision_default
url: url_default

Otherwise, you might want to simply skip the empty lists. For example

- debug:
msg: "{{ item.name }}:
{{ item.workstations|
selectattr('name', 'eq', os_name)|
selectattr('revision', 'eq', os_revision)|
map(attribute= 'url')|
list }}"
loop: "{{ domain_definition }}"
when: item.workstations|length > 0
vars:
os_name: ubuntu1
os_revision: '2017-08-16'

--
Vladimir Botka

jean-christophe manciot

unread,
Oct 16, 2020, 12:13:22 PM10/16/20
to Ansible Project
There are 2 issues:. 
  1. The loop can become costly as the number of entries rise: I have files with hundreds (potentially thousands) of servers/workstations.
  2. it also results in as many answers as the number of entries, although only one is non empty. I don't know how to set a variable with the single right answer.
```
        - debug:
                msg: "{{ item.workstations|
                           selectattr('name', 'eq', os_name)|
                           selectattr('revision', 'eq', os_revision)|
                           map(attribute= 'url')|
                           list }}"
          loop: "{{ domain_definition }}"
          when: item.workstations|length > 0
          vars:
                os_name: ubuntu1
                os_revision: '2017-08-16'
```
leads to:
```
ok: [localhost] => (item={'name': 'server01', 'cluster': 'cluster0', 'port': '8070', 'workstations': [{'name': 'name_default', 'revision': 'revision_default', 'url': 'url_default'}]}) => {
    "msg": []
}
ok: [localhost] => (item={'name': 'server11', 'cluster': 'cluster1', 'port': '8080', 'workstations': [{'name': 'debian1', 'revision': '2016-06-21', 'url': 'http://debian1.example.com:9911/2016-06-21'}, {'name': 'ubuntu1', 'revision': '2017-08-16', 'url': 'http://ubuntu1.example.com:9912/2017-08-16'}]}) => {
    "msg": [
    ]
}
ok: [localhost] => (item={'name': 'server12', 'cluster': 'cluster2', 'port': '8090', 'workstations': [{'name': 'debian2', 'revision': '2018-04-15', 'url': 'http://debian2.example.com:9921/2018-04-15'}, {'name': 'ubuntu2', 'revision': '2020-01-12', 'url': 'http://ubuntu2.example.com:9922/2020-01-12'}]}) => {
    "msg": []
}
```
Isn't there a loopless solution? 

Vladimir Botka

unread,
Oct 16, 2020, 3:32:48 PM10/16/20
to jean-christophe manciot, ansible...@googlegroups.com
On Fri, 16 Oct 2020 09:13:21 -0700 (PDT)
jean-christophe manciot <actionm...@gmail.com> wrote:

> - debug:
> msg: "{{ item.workstations|
> selectattr('name', 'eq', os_name)|
> selectattr('revision', 'eq', os_revision)|
> map(attribute= 'url')|
> list }}"
> loop: "{{ domain_definition }}"
> when: item.workstations|length > 0
> vars:
> os_name: ubuntu1
> os_revision: '2017-08-16'
>
> Isn't there a loopless solution?

Next option is a custom filter. For example,

shell> cat filter_plugins/custom_select_url.py
def custom_select_url(l, os_name, os_revision):
l1 = []
for i in l:
if 'workstations' in i.keys():
for j in i['workstations']:
if (j['name'] == os_name) and \
(j['revision'] == os_revision):
l1.append(j['url'])
return l1

class FilterModule(object):

def filters(self):
return {
'custom_select_url': custom_select_url
}

and the playbook

shell> cat pb.yml
- hosts: localhost
tasks:
- include_vars: domain_definition.yml
- debug:
msg: "{{ domain_definition|
custom_select_url('ubuntu1', '2017-08-16') }}"

give

msg:
- http://ubuntu1.example.com:9912/2017-08-16


--
Vladimir Botka
Reply all
Reply to author
Forward
0 new messages