Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

best way to compare contents of 2 lists?

10 views
Skip to first unread message

Esmail

unread,
Apr 23, 2009, 9:31:23 PM4/23/09
to pytho...@python.org
What is the best way to compare the *contents* of two different
lists regardless of their respective order? The lists will have
the same number of items, and be of the same type.

E.g. a trivial example (my lists will be larger),

a=[1, 2, 3]

b=[2, 3, 1]

should yield true if a==b

I suppose I could sort them and then compare them. I.e.,

sorted(a)==sorted(b)


I am wondering if there is a more efficient/preferred way to do so.

Thanks,
Esmail

Hans DushanthaKumar

unread,
Apr 23, 2009, 9:37:09 PM4/23/09
to pytho...@python.org
'set' comes to mind, though I'm not sure if there are performance
inplications with large amopunts of data.

>>> a=[1, 2, 3]
>>> b=[2, 3, 1]

>>> set(a) == set(b)
True

Cheers,
Hans

a=[1, 2, 3]

b=[2, 3, 1]

sorted(a)==sorted(b)

Thanks,
Esmail

--
http://mail.python.org/mailman/listinfo/python-list

David Robinow

unread,
Apr 23, 2009, 9:41:17 PM4/23/09
to Esmail, pytho...@python.org
On Thu, Apr 23, 2009 at 9:31 PM, Esmail <ebo...@hotmail.com> wrote:
> What is the best way to compare the *contents* of two different
> lists regardless of their respective order? The lists will have
> the same number of items, and be of the same type.
>
> E.g. a trivial example (my lists will be larger),
>
> a=[1, 2, 3]
>
> b=[2, 3, 1]
>
> should yield true if a==b
>
> I suppose I could sort them and then compare them. I.e.,
>
> sorted(a)==sorted(b)
>
>
> I am wondering if there is a more efficient/preferred way to do so.
>
> Thanks,
> Esmail
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

set(a) == set(b) # test if a and b have the same elements

# check that each list has the same number of each element
# i.e. [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2]
for elem in set(a):
a.count(elem) == b.count(elem)

Esmail

unread,
Apr 23, 2009, 9:40:55 PM4/23/09
to pytho...@python.org
Esmail wrote:
> What is the best way to compare the *contents* of two different
> lists regardless of their respective order? The lists will have
> the same number of items, and be of the same type.
>
> E.g. a trivial example (my lists will be larger),
>
> a=[1, 2, 3]
>
> b=[2, 3, 1]
>
> should yield true if a==b
>
> I suppose I could sort them and then compare them. I.e.,
>
> sorted(a)==sorted(b)
>
>
> I am wondering if there is a more efficient/preferred way to do so.

oh, I forgot to mention that each list may contain duplicates. So I
suppose this means I would have to sort the two lists and compare them
afterall, unless there's another way?


Esmail

unread,
Apr 23, 2009, 9:51:42 PM4/23/09
to pytho...@python.org
David Robinow wrote:

> On Thu, Apr 23, 2009 at 9:31 PM, Esmail <ebo...@hotmail.com> wrote:
>> What is the best way to compare the *contents* of two different
>> lists regardless of their respective order? The lists will have
>> the same number of items, and be of the same type.
>>
>> E.g. a trivial example (my lists will be larger),
>>
>> a=[1, 2, 3]
>>
>> b=[2, 3, 1]
>>
>> should yield true if a==b
>>
>> I suppose I could sort them and then compare them. I.e.,
>>
>> sorted(a)==sorted(b)
>>
>>
>> I am wondering if there is a more efficient/preferred way to do so.
>>
>> Thanks,
>> Esmail
>>
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>
> set(a) == set(b) # test if a and b have the same elements
>
> # check that each list has the same number of each element
> # i.e. [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2]
> for elem in set(a):
> a.count(elem) == b.count(elem)

Ah .. this part would take care of different number of duplicates
in the lists. Cool.

thanks,
Esmail

> --
> http://mail.python.org/mailman/listinfo/python-list
>

MRAB

unread,
Apr 23, 2009, 9:52:44 PM4/23/09
to pytho...@python.org
Esmail wrote:

> Esmail wrote:
>> What is the best way to compare the *contents* of two different
>> lists regardless of their respective order? The lists will have
>> the same number of items, and be of the same type.
>>
>> E.g. a trivial example (my lists will be larger),
>>
>> a=[1, 2, 3]
>>
>> b=[2, 3, 1]
>>
>> should yield true if a==b
>>
>> I suppose I could sort them and then compare them. I.e.,
>>
>> sorted(a)==sorted(b)
>>
>>
>> I am wondering if there is a more efficient/preferred way to do so.
>
> oh, I forgot to mention that each list may contain duplicates. So I
> suppose this means I would have to sort the two lists and compare them
> afterall, unless there's another way?
>
You could use Raymond Hettinger's Counter class:

http://code.activestate.com/recipes/576611/

on both lists and compare them for equality.

Esmail

unread,
Apr 23, 2009, 9:54:02 PM4/23/09
to pytho...@python.org
Hans DushanthaKumar wrote:
> 'set' comes to mind,

Yes, I thought about that, but I wasn't sure how to potentially
deal with different number of duplicates in the lists. The post
by David seems to address that issue nicely.

> though I'm not sure if there are performance
> inplications with large amopunts of data.

Yes, I wonder about these sort of things too, not knowing too
much about how Python does things internally.

Thanks for the suggestion,
Esmail

John Yeung

unread,
Apr 23, 2009, 10:09:10 PM4/23/09
to
Esmail <ebo...@hotmail.com> wrote:
> What is the best way to compare the *contents* of two different
> lists regardless of their respective order? The lists will have
> the same number of items, and be of the same type.

"Best" can mean different things. Fastest? Shortest code? Most
readable?

> David Robinow wrote:
> > set(a) == set(b)    # test if a and b have the same elements
>
> > # check that each list has the same number of each element
> > # i.e.    [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2]
> > for elem in set(a):
> >   a.count(elem) == b.count(elem)
>
> Ah .. this part would take care of different number of duplicates
> in the lists. Cool.

It takes care of the duplicates, but so does your initial solution,
which I like best:

> sorted(a)==sorted(b)

This is concise, clear, and in my opinion, the most Pythonic. It may
well even be the fastest. (If you didn't have to match up the numbers
of duplicates, the set solution would be most Pythonic and probably
fastest.)

John

bearoph...@lycos.com

unread,
Apr 24, 2009, 2:12:38 AM4/24/09
to
Esmail:

> oh, I forgot to mention that each list may contain duplicates.

Comparing the sorted lists is a possible O(n ln n) solution:

a.sort()
b.sort()
a == b

Another solution is to use frequency dicts, O(n):

from itertools import defaultdict
d1 = defaultdict(int)
for el in a:
d1[el] += 1
d2 = defaultdict(int)
for el in b:
d2[el] += 1
d1 == d2

As the arrays (Python lists) become large the second solution may
become faster.

Bye,
bearophile

Arnaud Delobelle

unread,
Apr 24, 2009, 3:08:34 AM4/24/09
to
On Apr 24, 7:12 am, bearophileH...@lycos.com wrote:
[...]

> Another solution is to use frequency dicts, O(n):
>
> from itertools import defaultdict
> d1 = defaultdict(int)
> for el in a:
>     d1[el] += 1
> d2 = defaultdict(int)
> for el in b:
>     d2[el] += 1
> d1 == d2

Thanks to the power of negative numbers, you only need one dict:

d = defaultdict(int)
for x in a:
d[x] += 1
for x in b:
d[x] -= 1
# a and b are equal if d[x]==0 for all x in d:
not any(d.itervalues())

--
Arnaud

Piet van Oostrum

unread,
Apr 24, 2009, 3:43:22 AM4/24/09
to
>>>>> John Yeung <gallium....@gmail.com> (JY) wrote:

>JY> It takes care of the duplicates, but so does your initial solution,
>JY> which I like best:

>>> sorted(a)==sorted(b)

>JY> This is concise, clear, and in my opinion, the most Pythonic. It may
>JY> well even be the fastest. (If you didn't have to match up the numbers
>JY> of duplicates, the set solution would be most Pythonic and probably
>JY> fastest.)

But it may fail if the list cannot be sorted because it contains
incomparable elements:

>>> a = [1, 2, 3+4j]
>>> b = [2, 3+4j, 1]
>>> set(a) == set(b)
True
>>> sorted(a)==sorted(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

In Python 3 there are even more incomparable objects pairs.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

Steven D'Aprano

unread,
Apr 24, 2009, 4:25:23 AM4/24/09
to
On Thu, 23 Apr 2009 21:51:42 -0400, Esmail wrote:

>> set(a) == set(b) # test if a and b have the same elements
>>
>> # check that each list has the same number of each element # i.e.
>> [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2] for elem in set(a):
>> a.count(elem) == b.count(elem)
>
> Ah .. this part would take care of different number of duplicates in the
> lists. Cool.

At significant cost of extra work.

Counting the number of times a single element occurs in the list is O(N).
Counting the number of times every element occurs in the list is O(N**2).
Sorting is O(N*log N), so for large lists, sorting will probably be much
cheaper.


--
Steven

Esmail

unread,
Apr 24, 2009, 6:44:18 AM4/24/09
to pytho...@python.org
Piet van Oostrum wrote:
>>>>>> John Yeung <gallium....@gmail.com> (JY) wrote:
>
>> JY> It takes care of the duplicates, but so does your initial solution,
>> JY> which I like best:
>
>>>> sorted(a)==sorted(b)
>
>> JY> This is concise, clear, and in my opinion, the most Pythonic. It may
>> JY> well even be the fastest. (If you didn't have to match up the numbers
>> JY> of duplicates, the set solution would be most Pythonic and probably
>> JY> fastest.)
>
> But it may fail if the list cannot be sorted because it contains
> incomparable elements:

Ah .. good point .. something I had not considered. Lucky for me in
this case it's not an issue, but something I'll keep in mind for the
future.

Esmail

unread,
Apr 24, 2009, 6:47:12 AM4/24/09
to pytho...@python.org
MRAB wrote:
>
> You could use Raymond Hettinger's Counter class:
>
> http://code.activestate.com/recipes/576611/
>
> on both lists and compare them for equality.

thanks for the pointer, I'll study the code provided.

Esmail

Esmail

unread,
Apr 24, 2009, 6:48:58 AM4/24/09
to pytho...@python.org
Thanks all, after reading all the posting and suggestions for
alternatives, I think I'll be going with

sorted(a)==sorted(b)

it seems fast, intuitive and clean and can deal with
duplicates too.

Best,
Esmail

Esmail

unread,
Apr 24, 2009, 6:46:10 AM4/24/09
to pytho...@python.org
John Yeung wrote:
> so does your initial solution,
> which I like best:
>
>> sorted(a)==sorted(b)
>
> This is concise, clear, and in my opinion, the most Pythonic. It may
> well even be the fastest.

Great .. I can live with that :-)

bearoph...@lycos.com

unread,
Apr 24, 2009, 7:20:16 AM4/24/09
to Arnaud Delobelle
Arnaud Delobelle:

> Thanks to the power of negative numbers, you only need one dict:
>
> d = defaultdict(int)
> for x in a:
>     d[x] += 1
> for x in b:
>     d[x] -= 1
> # a and b are equal if d[x]==0 for all x in d:
> not any(d.itervalues())

Very nice, I'll keep this for future use.
Someday I'll have to study this new new kind of numbers that can
represent borrowed items too.

Bye,
bearophile

Terry Reedy

unread,
Apr 24, 2009, 11:53:07 AM4/24/09
to pytho...@python.org
Steven D'Aprano wrote:
> On Thu, 23 Apr 2009 21:51:42 -0400, Esmail wrote:
>
>>> set(a) == set(b) # test if a and b have the same elements
>>>
>>> # check that each list has the same number of each element # i.e.
>>> [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2] for elem in set(a):
>>> a.count(elem) == b.count(elem)
>> Ah .. this part would take care of different number of duplicates in the
>> lists. Cool.
>
> At significant cost of extra work.
>
> Counting the number of times a single element occurs in the list is O(N).
> Counting the number of times every element occurs in the list is O(N**2).

A frequency dict should be O(n) also, and hence faster than sorting.

norseman

unread,
Apr 24, 2009, 1:39:39 PM4/24/09
to Esmail, pytho...@python.org
Esmail wrote:
> What is the best way to compare the *contents* of two different
> lists regardless of their respective order? The lists will have
> the same number of items, and be of the same type.
>
> E.g. a trivial example (my lists will be larger),
>
> a=[1, 2, 3]
>
> b=[2, 3, 1]
>
> should yield true if a==b
>
> I suppose I could sort them and then compare them. I.e.,
>
> sorted(a)==sorted(b)
>
>
> I am wondering if there is a more efficient/preferred way to do so.
>
=========================================
Technically, == is reserved for identical, as in byte for byte same

->If sorted(listA) == sorted(listB):
-> etc....
Is the preferred.

While there are other ways, the ones I know of are much more computer
intensive.

Things like:
get (next) line from file1
if checking it against each line of file2 yields a found
loop
else
...
are a real I/O killers.
Indexes are good but introduce a separate maintenance issue.


By the way - does original order of original files count?
If not, sorting AND KEEPING can speed up future things.

Of course each method has its time and place of use and Python has some
well oiled list search and list maintain routines of its own. List
comparisons are most accurate when using presorted ones. (Some things
don't lend themselves to sorting very well. Like paragraphs, books,
chapters, computer programs, manuals, etc... These need the searchers
(equivalent to the Unix diff) for checking equivalence.)


HTH

Steve

Steven D'Aprano

unread,
Apr 24, 2009, 1:52:00 PM4/24/09
to
On Fri, 24 Apr 2009 10:39:39 -0700, norseman wrote:

> Technically, == is reserved for identical, as in byte for byte same

Really? Then how do you explain these?

>>> u'abc' == 'abc'
True
>>> 1 == 1.0
True
>>> 2L == 2
True
>>>
>>> import decimal
>>> decimal.Decimal('42') == 42
True


Here's one to think about:

>>> d1 = {-1: None, -2: None}
>>> d2 = {-2: None, -1: None}
>>> print d1, d2
{-2: None, -1: None} {-1: None, -2: None}
>>> d1 == d2
True

If d1 and d2 are equal and both are dictionaries with the same keys and
same values, why do they print in different orders?

--
Steven

norseman

unread,
Apr 24, 2009, 3:41:13 PM4/24/09
to Steven D'Aprano, pytho...@python.org
=============
(How do I) ...explain these?
Python cooks the test function.
That explains a few things I thought were kinda weird.

...one to think about:
Either someone has a programming glitch or else likes stack use.
LoadA push, read-LILO, readB, compare, loop to read-LILO.
And they decided to just pre-load as a matter of course, but maybe
forgot to read-LILO in print... ie.. didn't finish cooking print.
(or left it out as a reminder or....)

Technically, == is reserved for identical, as in byte for byte same

(when == is used to check identical, uncooked)


As a matter of self - I don't favor cooking. I prefer 'cooked' things
have a different name so I can get what I want, when I want.
Choice on wall:
fish, deep fried in light beer batter
sashimi

in case you didn't notice - cooking takes longer :)

I can get the sashimi and take it home and BBQ it, I can roast it, I can
steam it, I can wok it,..., but the other is what it is. (greasy)

Besides, who says I like your cooking? :)

Steve

ajaksu

unread,
Apr 24, 2009, 6:09:05 PM4/24/09
to
On Apr 24, 4:41 pm, norseman <norse...@hughes.net> wrote:
> (How do I) ...explain these?
[...]

> I can get the sashimi and take it home and BBQ it, I can roast it, I can
> steam it, I can wok it,..., but the other is what it is. (greasy)
>
> Besides, who says I like your cooking?  :)

Err... http://mail.python.org/pipermail/python-list/2008-September/679360.html

Esmail

unread,
Apr 25, 2009, 7:39:09 PM4/25/09
to pytho...@python.org
norseman wrote:
>
> Of course each method has its time and place of use and Python has some
> well oiled list search and list maintain routines of its own. List
> comparisons are most accurate when using presorted ones. (Some things
> don't lend themselves to sorting very well. Like paragraphs, books,
> chapters, computer programs, manuals, etc... These need the searchers
> (equivalent to the Unix diff) for checking equivalence.)
>
>
> HTH
>
> Steve

Thanks Steve for bringing up various items to consider for these sort
of tasks, very helpful indeed.

Best,
Esmail

0 new messages