Filtering a complex list of dictionaries into another list of dictionaries in an efficient way


jean-christophe manciot

Jan 22, 2020, 1:32:56 PM
to Ansible Project
ansible 2.9.4

Let's assume the following list of dictionaries:

list:
  - key1:
      - 'abc'
      - 'def'
    key2: 'ghi'
    key3: 'jkl'
  - key1:
      - 'mno'
      - 'pqr'
    key3: 'stu'
    key4: 'dfg'
  - key1:
      - 'vwx'
      - 'yza'
    key3: 'okl'
    key4: 'azel'



The goal is threefold:
  • to extract only the records that match a regex criterion on one of the keys, for instance those whose "key3" contains a 'k'
  • to keep only a selection of keys, for instance only "key1", "key3" and "key4", some of which may not be present in every record
  • to be as efficient as possible with thousands of records
In this case, we expect the resulting list:
result_list:
  - key1:
      - 'abc'
      - 'def'
    key3: 'jkl'
  - key1:
      - 'vwx'
      - 'yza'
    key3: 'okl'
    key4: 'azel'


The difficulty I'm experiencing concerns "key1" as a list. 

My solution works without considering "key1":

- name: Filtering a complex list of dictionaries into another list of dictionaries
  vars:
    "list": [
      {
        "key1": [
          "abc",
          "def"
        ],
        "key2": "ghi",
        "key3": "jkl"
      },
      {
        "key1": [
          "mno",
          "pqr"
        ],
        "key3": "stu",
        "key4": "dfg"
      },
      {
        "key1": [
          "vwx",
          "yza"
        ],
        "key3": "okl",
        "key4": "azel"
      }
    ]
  set_fact:
    result_list: "{{ result_list|default([]) + [ {'key3': item.key3, 'key4': item.key4|default('')} ] }}"
  loop: "{{ list }}"
  when: item.key3 | regex_search('(^.*k.*$)')


leads to:

TASK [yang : Filtering a complex list of dictionaries into another list of dictionaries] *********************************************************************
task path: test.yml:133
<172.16.136.116> attempting to start connection
<172.16.136.116> using connection plugin network_cli
<172.16.136.116> found existing local domain socket, using it!
<172.16.136.116> updating play_context for connection
<172.16.136.116>
<172.16.136.116> local domain socket path is .ansible/pc/521e859c25
ok: [TEST] => (item={'key1': ['abc', 'def'], 'key2': 'ghi', 'key3': 'jkl'}) => {
    "ansible_facts": {
        "result_list": [
            {
                "key3": "jkl",
                "key4": ""
            }
        ]
    },
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "key1": [
            "abc",
            "def"
        ],
        "key2": "ghi",
        "key3": "jkl"
    }
}
skipping: [TEST] => (item={'key1': ['mno', 'pqr'], 'key3': 'stu', 'key4': 'dfg'})  => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "key1": [
            "mno",
            "pqr"
        ],
        "key3": "stu",
        "key4": "dfg"
    },
    "skip_reason": "Conditional result was False"
}
ok: [TEST] => (item={'key1': ['vwx', 'yza'], 'key3': 'okl', 'key4': 'azel'}) => {
    "ansible_facts": {
        "result_list": [
            {
                "key3": "jkl",
                "key4": ""
            },
            {
                "key3": "okl",
                "key4": "azel"
            }
        ]
    },
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "key1": [
            "vwx",
            "yza"
        ],
        "key3": "okl",
        "key4": "azel"
    }
}


How can we insert "key1" in the picture?

Also, when the list contains thousands of records, it may be less compute-intensive to use `json_query`, but I don't know how to use it in this context.

Vladimir Botka

Jan 22, 2020, 3:49:58 PM
to jean-christophe manciot, ansible...@googlegroups.com
On Wed, 22 Jan 2020 10:32:56 -0800 (PST)
jean-christophe manciot <actionm...@gmail.com> wrote:

> list:
>   - key1:
>       - 'abc'
>       - 'def'
>     key2: 'ghi'
>     key3: 'jkl'
>   - key1:
>       - 'mno'
>       - 'pqr'
>     key3: 'stu'
>     key4: 'dfg'
>   - key1:
>       - 'vwx'
>       - 'yza'
>     key3: 'okl'
>     key4: 'azel'
> [...]
> - extract records if "key3" contains a 'k'
> - keep only keys "key1", "key3" and "key4"
> - to be as efficient as possible with thousands of records
>
> In this case, we expect the resulting list:
> result_list:
>   - key1:
>       - 'abc'
>       - 'def'
>     key3: 'jkl'
>   - key1:
>       - 'vwx'
>       - 'yza'
>     key3: 'okl'
>     key4: 'azel'

The task below does the job

- set_fact:
    result_list: "{{ result_list|
                     default([]) + [
                     dict(keys|
                          zip(keys|
                              map('extract', item)|
                              list))] }}"
  vars:
    keys: "{{ ['key1', 'key3', 'key4']|
              intersect(item.keys()|list) }}"
  loop: "{{ list|
            selectattr('key3', 'regex', '^.*k.*$')|
            list }}"
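
To make the expression above easier to follow, here is a minimal plain-Python sketch of what each stage of the pipe roughly corresponds to for a single loop item. The variable names (`wanted`, `values`, `selected`) are illustrative only, and the list comprehensions merely approximate the `intersect` and `map('extract', ...)` filters; this is not how Ansible implements them.

```python
# One loop item, taken from the example data in this thread.
item = {"key1": ["abc", "def"], "key2": "ghi", "key3": "jkl"}
wanted = ["key1", "key3", "key4"]

# intersect: keep only the wanted keys that this item actually has
# (key4 is absent from this item, so it drops out).
keys = [k for k in wanted if k in item]

# map('extract', item): look up each remaining key in the item.
values = [item[k] for k in keys]

# dict(keys | zip(values)): pair keys with values and rebuild a dictionary.
selected = dict(zip(keys, values))

print(selected)
```

Because the lookup copies whatever value sits under each key, a list such as "key1" needs no special handling.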

I don't think json_query is of much use here. If you have any benchmarks on
large data sets, I'd be interested to see them. Thank you.

HTH,

-vlado

Vladimir Botka

Jan 22, 2020, 5:25:56 PM
to jean-christophe manciot, ansible...@googlegroups.com
On Wed, 22 Jan 2020 21:49:43 +0100
Vladimir Botka <vbo...@gmail.com> wrote:

> The task below does the job
>
> - set_fact:
>     result_list: "{{ result_list|
>                      default([]) + [
>                      dict(keys|
>                           zip(keys|
>                               map('extract', item)|
>                               list))] }}"
>   vars:
>     keys: "{{ ['key1', 'key3', 'key4']|
>               intersect(item.keys()|list) }}"
>   loop: "{{ list|
>             selectattr('key3', 'regex', '^.*k.*$')|
>             list }}"

A custom filter should improve the efficiency. For example, a filter that
selects a list of keys from a dictionary:

$ cat filter_plugins/dict_utils.py
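def dict_select_list(d, l):
    # Return a new dict holding only the keys of d listed in l.
    # Every key in l must exist in d; the task intersects the keys first.
    d2 = {}
    for k in l:
        d2[k] = d[k]
    return d2

class FilterModule(object):

    def filters(self):
        # Map the filter name used in templates to the Python function.
        return {
            'dict_select_list': dict_select_list
        }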

The task below gives the same result

- set_fact:
    result_list: "{{ result_list|
                     default([]) + [
                     item|dict_select_list(keys)] }}"
  vars:
    keys: "{{ ['key1', 'key3', 'key4']|
              intersect(item.keys()|list) }}"
  loop: "{{ list|
            selectattr('key3', 'regex', '^.*k.*$')|
            list }}"
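
Going one step further, the whole selection could in principle be moved into a single custom filter, so that the playbook needs neither a loop nor a repeated set_fact. The sketch below is only an illustration: the filter name `select_records` and its parameters are hypothetical, and it assumes that records whose match key is missing or not a string should simply be dropped. No benchmark is given in this thread, so any efficiency gain is an assumption to verify on real data.

```python
import re


def select_records(records, match_key, pattern, keep):
    """Return the records whose match_key matches pattern, reduced to the keys in keep."""
    rx = re.compile(pattern)  # compile once, reuse for every record
    return [
        # Copy only the requested keys that the record actually has.
        {k: rec[k] for k in keep if k in rec}
        for rec in records
        # Assumption: a missing or non-string match_key means "no match".
        if isinstance(rec.get(match_key), str) and rx.search(rec[match_key])
    ]


class FilterModule(object):
    """Standard Ansible filter plugin entry point."""

    def filters(self):
        return {"select_records": select_records}
```

With such a plugin in filter_plugins/, the task would reduce to a single expression along the lines of `"{{ list | select_records('key3', 'k', ['key1', 'key3', 'key4']) }}"`.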

HTH,

-vlado

jean-christophe manciot

Jan 23, 2020, 1:54:53 PM
to Ansible Project
fantastic! :-)

My real use case is a little more complex:
  1. there are other attributes like "key1", or similar to "list" (nested within the top-level "list"), which must be taken into account
  2. the regex filter on "key3" is a little more complex (a logical OR of several patterns)
I tried to implement these 2 points by expanding your solution, and it works beautifully, even though some of the new keys are themselves lists of dictionaries instead of simple lists or strings.
I used something like:
...
      vars:
        keys: "{{ ['key1', 'key3', 'key4', 'key5', 'key6']|
                  intersect(item.keys()|list) }}"
      loop: "{{ list|
                selectattr('key3', 'regex', 'regex1|regex2|regex3')|
                list }}"




Finally, is there any online documentation that you would recommend for learning the tools needed to build such filters (the Ansible documentation is very sparse on that subject)?
For instance, if I need to add another constraint, such as another attribute having to match another regex alongside "key3" (as a logical AND), I have no clue.

Vladimir Botka

Jan 23, 2020, 2:42:50 PM
to jean-christophe manciot, ansible...@googlegroups.com
On Thu, 23 Jan 2020 10:54:52 -0800 (PST)
jean-christophe manciot <actionm...@gmail.com> wrote:

> loop: "{{ list|
>           selectattr('key3', 'regex', 'regex1|regex2|regex3')|
>           list }}"
> [...]
> For instance, if I need to add another constraint like another attribute
> must match another regex alongside "key3" (as an logical AND), I have no
> clue.

It's possible to extend the pipe. Each additional selectattr narrows the
result further, so chained tests act as a logical AND. For example:

loop: "{{ list|
          selectattr('key3', 'regex', 'regex1|regex2|regex3')|
          selectattr('key4', 'defined')|
          list }}"
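
The same combination logic can be sketched in plain Python: alternation inside one pattern gives the OR, and requiring every (key, pattern) pair to match gives the AND. The helper name `matches_all` and the patterns below are hypothetical, chosen only to exercise the sample data from this thread; a record that lacks one of the keys is assumed not to match.

```python
import re

records = [
    {"key1": ["abc", "def"], "key2": "ghi", "key3": "jkl"},
    {"key1": ["mno", "pqr"], "key3": "stu", "key4": "dfg"},
    {"key1": ["vwx", "yza"], "key3": "okl", "key4": "azel"},
]


def matches_all(rec, constraints):
    """True when every (key, pattern) pair matches; a missing key fails."""
    return all(
        isinstance(rec.get(key), str) and re.search(pattern, rec[key])
        for key, pattern in constraints
    )


# OR within one attribute via regex alternation, AND across attributes.
constraints = [("key3", "k|u"), ("key4", "^a")]
selected = [rec for rec in records if matches_all(rec, constraints)]

print(selected)
```

In the playbook, the equivalent would be one selectattr per constraint, each using the 'regex' test.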

HTH,

-vlado