Exclude 'None' from list comprehension of dicts

Loris Bennett

Aug 4, 2022, 7:51:43 AM

Hi,

I am constructing a list of dictionaries via the following list
comprehension:

data = [get_job_efficiency_dict(job_id) for job_id in job_ids]

However,

get_job_efficiency_dict(job_id)

uses 'subprocess.Popen' to run an external program and this can fail.
In this case, the dict should just be omitted from 'data'.

I can have 'get_job_efficiency_dict' return 'None' and then run

filtered_data = list(filter(None, data))

but is there a more elegant way?

Cheers,

Loris

--
This signature is currently under construction.

Loris Bennett

Aug 4, 2022, 8:58:47 AM

r...@zedat.fu-berlin.de (Stefan Ram) writes:

> "Loris Bennett" <loris....@fu-berlin.de> writes:
>>data = [get_job_efficiency_dict(job_id) for job_id in job_ids]

> ...
>>filtered_data = list(filter(None, data))
>
> You could have "get_job_efficiency_dict" return an iterable
> that yields either zero dictionaries or one dictionary.
> For example, a list with either zero entries or one entry.
>
> Then, use "itertools.chain.from_iterable" to merge all those
> lists with empty lists effectively removed. E.g.,
>
> print( list( itertools.chain.from_iterable( [[ 1 ], [], [ 2 ], [ 3 ]])))
>
> will print
>
> [1, 2, 3]

'itertools' is a bit of a blind spot of mine, so thanks for pointing that
out.
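
A minimal sketch of that suggestion, assuming a hypothetical variant
'get_job_efficiency_dicts' (not the real function) that returns a list
with one dict on success and an empty list on failure:

```python
import itertools

def get_job_efficiency_dicts(job_id):
    """Hypothetical stand-in: [dict] on success, [] on failure."""
    if job_id % 2 == 0:  # pretend that even job IDs fail
        return []
    return [{"job_id": job_id, "efficiency": 0.5}]

job_ids = [1, 2, 3, 4]

# chain.from_iterable flattens the per-job lists; empty lists contribute nothing.
data = list(itertools.chain.from_iterable(
    get_job_efficiency_dicts(job_id) for job_id in job_ids
))
print(data)
```

The failure case then needs no explicit filtering step, at the cost of a
slightly unusual return type.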

> . Or, consider a boring old "for" loop:
>
> data = []
> for job_id in job_ids:
>     dictionary = get_job_efficiency_dict( job_id )
>     if dictionary:
>         data.append( dictionary )
>
> . It might not be "elegant", but it's quite readable to me.

To me too. However, 'data' can occasionally consist of many 10,000s of
elements. Would there be a potential performance problem here? Even if
there is, it wouldn't be so bad, as the aggregation of the data is not
time-critical and only occurs once a month. Still, I wouldn't want the
program to be unnecessarily inefficient.
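
One way to check is to time both forms with 'timeit'. The sketch below
uses a hypothetical 'fake_lookup' in place of 'get_job_efficiency_dict',
so it only measures the overhead of the loop itself; in the real program
the subprocess call will most likely dominate either way.

```python
import timeit

def fake_lookup(job_id):
    """Hypothetical stand-in for get_job_efficiency_dict; runs no subprocess."""
    return None if job_id % 10 == 0 else {"job_id": job_id}

job_ids = range(50_000)

def with_loop():
    data = []
    for job_id in job_ids:
        dictionary = fake_lookup(job_id)
        if dictionary:
            data.append(dictionary)
    return data

def with_comprehension():
    return [d for d in map(fake_lookup, job_ids) if d]

# Run each variant a few times and report the total wall-clock time.
loop_time = timeit.timeit(with_loop, number=5)
comp_time = timeit.timeit(with_comprehension, number=5)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```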

Antoon Pardon

Aug 4, 2022, 2:36:20 PM

Op 4/08/2022 om 13:51 schreef Loris Bennett:

> Hi,
>
> I am constructing a list of dictionaries via the following list
> comprehension:
>
> data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
>
> However,
>
> get_job_efficiency_dict(job_id)
>
> uses 'subprocess.Popen' to run an external program and this can fail.
> In this case, the dict should just be omitted from 'data'.
>
> I can have 'get_job_efficiency_dict' return 'None' and then run
>
> filtered_data = list(filter(None, data))
>
> but is there a more elegant way?

Just wondering, why don't you return an empty dictionary in case of a failure?
In that case your list will be all dictionaries and empty ones will be processed
fast enough.

--
Antoon Pardon.

MRAB

Aug 4, 2022, 2:50:46 PM

On 2022-08-04 12:51, Loris Bennett wrote:
> Hi,
>
> I am constructing a list of dictionaries via the following list
> comprehension:
>
> data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
>
> However,
>
> get_job_efficiency_dict(job_id)
>
> uses 'subprocess.Popen' to run an external program and this can fail.
> In this case, the dict should just be omitted from 'data'.
>
> I can have 'get_job_efficiency_dict' return 'None' and then run
>
> filtered_data = list(filter(None, data))
>
> but is there a more elegant way?
>

I'm not sure how elegant it is, but:

data = [result for job_id in job_ids if (result :=
get_job_efficiency_dict(job_id)) is not None]
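
A self-contained version of the same idea, with a hypothetical stand-in
for 'get_job_efficiency_dict' so that it can be run as is. The ':='
operator requires Python 3.8 or later.

```python
def get_job_efficiency_dict(job_id):
    """Hypothetical stand-in: returns None when the external program fails."""
    if job_id == 2:  # pretend that job 2 fails
        return None
    return {"job_id": job_id}

job_ids = [1, 2, 3]

# The function is called once per job; the result is bound and tested in one step.
data = [
    result
    for job_id in job_ids
    if (result := get_job_efficiency_dict(job_id)) is not None
]
print(data)
```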

Weatherby,Gerard

Aug 4, 2022, 3:25:42 PM

Or:

data = [d for d in [get_job_efficiency_dict(job_id) for job_id in job_ids] if d is not None]

or

data = []
for job_id in job_ids:
    if (d := get_job_efficiency_dict(job_id)) is not None:
        data.append(d)

Personally, I’d go with the latter in my own code.

—
Gerard Weatherby | Application Architect NMRbox | NAN | Department of Molecular Biology and Biophysics
UConn Health 263 Farmington Avenue, Farmington, CT 06030-6406 uchc.edu

Loris Bennett

Aug 5, 2022, 1:50:47 AM

When the list of dictionaries is processed, I would have to check each
element to see if it is empty. That strikes me as being less efficient
than filtering out the empty dictionaries in one go, although obviously
one would need to benchmark that.

avi.e...@gmail.com

Aug 5, 2022, 4:45:05 PM

Benchmarking aside, Loris, there are some ideas about such things.

You are describing a case, in abstract terms, where an algorithm grinds away
and produces results that may include an occasional or a common unwanted
result. The question is when to eliminate the unwanted. Do you eliminate
them immediately at the expense of some extra code at that point, or do you
wait till much later or even at the end?

The answer is it DEPENDS and let me point out that many problems can start
multi-dimensional (as in processing a 5-D matrix) and produce a linear
output (as in a 1-D list) or it can be the other way around. Sometimes what
you want eliminated is something like duplicates. Is it easier to remove
duplicates as they happen, or later when you have some huge data structure
containing oodles of copies of each duplicate?

You can imagine many scenarios and sometimes you need to also look at costs.
What does it cost to check if a token is valid, as in can the word be found
in a dictionary? Is it cheaper to wait till you have lots of words including
duplicates and do one lookup to find a bad word then mark it so future
occurrences are removed without that kind of lookup? Or is it better to read
in the dictionary once and hash it so later access is easy?

In your case, you have a single simple criterion for recognizing an item to
leave out. So the above may not apply. But I note we often use pre-created
software that simply returns a result and then the only reasonable way to
remove things is after calling it. Empty or unwanted items may take up some
room, though, so a long-running process may be better off pruning as it
goes.
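
As a rough illustration of pruning as you go, a generator can drop the
failures at the point where they occur, so they never accumulate. The
names 'successful_results' and the stand-in lookup below are
hypothetical, not part of the original program.

```python
def get_job_efficiency_dict(job_id):
    """Hypothetical stand-in: returns None when the external program fails."""
    return None if job_id < 0 else {"job_id": job_id}

def successful_results(job_ids):
    """Yield each successful result as soon as it is available."""
    for job_id in job_ids:
        result = get_job_efficiency_dict(job_id)
        if result is not None:
            yield result

# Failures are discarded inside the generator and never stored.
data = list(successful_results([1, -1, 2]))
print(data)
```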

--
https://mail.python.org/mailman/listinfo/python-list

Antoon Pardon

Aug 15, 2022, 8:56:44 AM

Op 5/08/2022 om 07:50 schreef Loris Bennett:

I may be missing something but why would you have to check each element
to see if it is empty? What would go wrong if you just treated empty
dictionaries the same as non-empty dictionaries?

--
Antoon Pardon.

dn

Aug 15, 2022, 6:21:12 PM

On 16/08/2022 00.56, Antoon Pardon wrote:
> Op 5/08/2022 om 07:50 schreef Loris Bennett:

> I may be missing something but why would you have to check each element
> to see if it is empty? What would go wrong if you just treated empty
> dictionaries the same as non-empty dictionaries?

'Truthiness':-

>>> bool( {} )
False
>>> bool( { "a":1 } )
True
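
This matters for the filtering approaches discussed earlier in the
thread: 'filter(None, ...)' and a plain 'if d' test drop every falsy
value, including empty dictionaries, whereas 'is not None' drops only
'None'. A small sketch with made-up sample data:

```python
results = [{"a": 1}, None, {}, {"b": 2}]

# Truthiness test: removes None and the empty dict.
by_truthiness = list(filter(None, results))

# Identity test: removes only None and keeps the empty dict.
by_identity = [r for r in results if r is not None]

print(by_truthiness)
print(by_identity)
```

So if failures are signalled by an empty dictionary rather than 'None',
the truthiness test filters them out and the identity test does not.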

--
Regards,
=dn

Antoon Pardon

Aug 16, 2022, 3:32:32 PM

Op 16/08/2022 om 00:20 schreef dn:

> On 16/08/2022 00.56, Antoon Pardon wrote:
>> Op 5/08/2022 om 07:50 schreef Loris Bennett:

>> I may be missing something but why would you have to check each element
>> to see if it is empty? What would go wrong if you just treated empty
>> dictionaries the same as non-empty dictionaries?
> 'Truthiness':-

In what way is that relevant in this case?

--
Antoon Pardon