E.g. a trivial example (my lists will be larger),
a=[1, 2, 3]
b=[2, 3, 1]
should yield true if a==b
I suppose I could sort them and then compare them. I.e.,
sorted(a)==sorted(b)
I am wondering if there is a more efficient/preferred way to do so.
Thanks,
Esmail
>>> a=[1, 2, 3]
>>> b=[2, 3, 1]
>>> set(a) == set(b)
True
Cheers,
Hans
a=[1, 2, 3]
b=[2, 3, 1]
sorted(a)==sorted(b)
Thanks,
Esmail
set(a) == set(b) # test if a and b have the same elements
# check that each list has the same number of each element
# i.e. [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2]
for elem in set(a):
a.count(elem) == b.count(elem)
oh, I forgot to mention that each list may contain duplicates. So I
suppose this means I would have to sort the two lists and compare them
afterall, unless there's another way?
Ah .. this part would take care of different number of duplicates
in the lists. Cool.
thanks,
Esmail
http://code.activestate.com/recipes/576611/
on both lists and compare them for equality.
Yes, I thought about that, but I wasn't sure how to potentially
deal with different number of duplicates in the lists. The post
by David seems to address that issue nicely.
> though I'm not sure if there are performance
> inplications with large amopunts of data.
Yes, I wonder about these sort of things too, not knowing too
much about how Python does things internally.
Thanks for the suggestion,
Esmail
"Best" can mean different things. Fastest? Shortest code? Most
readable?
> David Robinow wrote:
> > set(a) == set(b) # test if a and b have the same elements
>
> > # check that each list has the same number of each element
> > # i.e. [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2]
> > for elem in set(a):
> > a.count(elem) == b.count(elem)
>
> Ah .. this part would take care of different number of duplicates
> in the lists. Cool.
It takes care of the duplicates, but so does your initial solution,
which I like best:
> sorted(a)==sorted(b)
This is concise, clear, and in my opinion, the most Pythonic. It may
well even be the fastest. (If you didn't have to match up the numbers
of duplicates, the set solution would be most Pythonic and probably
fastest.)
John
Comparing the sorted lists is a possible O(n ln n) solution:
a.sort()
b.sort()
a == b
Another solution is to use frequency dicts, O(n):
from itertools import defaultdict
d1 = defaultdict(int)
for el in a:
d1[el] += 1
d2 = defaultdict(int)
for el in b:
d2[el] += 1
d1 == d2
As the arrays (Python lists) become large the second solution may
become faster.
Bye,
bearophile
Thanks to the power of negative numbers, you only need one dict:
d = defaultdict(int)
for x in a:
d[x] += 1
for x in b:
d[x] -= 1
# a and b are equal if d[x]==0 for all x in d:
not any(d.itervalues())
--
Arnaud
>JY> It takes care of the duplicates, but so does your initial solution,
>JY> which I like best:
>>> sorted(a)==sorted(b)
>JY> This is concise, clear, and in my opinion, the most Pythonic. It may
>JY> well even be the fastest. (If you didn't have to match up the numbers
>JY> of duplicates, the set solution would be most Pythonic and probably
>JY> fastest.)
But it may fail if the list cannot be sorted because it contains
incomparable elements:
>>> a = [1, 2, 3+4j]
>>> b = [2, 3+4j, 1]
>>> set(a) == set(b)
True
>>> sorted(a)==sorted(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers
In Python 3 there are even more incomparable objects pairs.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org
>> set(a) == set(b) # test if a and b have the same elements
>>
>> # check that each list has the same number of each element # i.e.
>> [1,2,1,2] == [1,1,2,2], but [1,2,2,2] != [1,1,1,2] for elem in set(a):
>> a.count(elem) == b.count(elem)
>
> Ah .. this part would take care of different number of duplicates in the
> lists. Cool.
At significant cost of extra work.
Counting the number of times a single element occurs in the list is O(N).
Counting the number of times every element occurs in the list is O(N**2).
Sorting is O(N*log N), so for large lists, sorting will probably be much
cheaper.
--
Steven
Ah .. good point .. something I had not considered. Lucky for me in
this case it's not an issue, but something I'll keep in mind for the
future.
thanks for the pointer, I'll study the code provided.
Esmail
sorted(a)==sorted(b)
it seems fast, intuitive and clean and can deal with
duplicates too.
Best,
Esmail
Great .. I can live with that :-)
Very nice, I'll keep this for future use.
Someday I'll have to study this new new kind of numbers that can
represent borrowed items too.
Bye,
bearophile
A frequency dict should be O(n) also, and hence faster than sorting.
->If sorted(listA) == sorted(listB):
-> etc....
Is the preferred.
While there are other ways, the ones I know of are much more computer
intensive.
Things like:
get (next) line from file1
if checking it against each line of file2 yields a found
loop
else
...
are a real I/O killers.
Indexes are good but introduce a separate maintenance issue.
By the way - does original order of original files count?
If not, sorting AND KEEPING can speed up future things.
Of course each method has its time and place of use and Python has some
well oiled list search and list maintain routines of its own. List
comparisons are most accurate when using presorted ones. (Some things
don't lend themselves to sorting very well. Like paragraphs, books,
chapters, computer programs, manuals, etc... These need the searchers
(equivalent to the Unix diff) for checking equivalence.)
HTH
Steve
> Technically, == is reserved for identical, as in byte for byte same
Really? Then how do you explain these?
>>> u'abc' == 'abc'
True
>>> 1 == 1.0
True
>>> 2L == 2
True
>>>
>>> import decimal
>>> decimal.Decimal('42') == 42
True
Here's one to think about:
>>> d1 = {-1: None, -2: None}
>>> d2 = {-2: None, -1: None}
>>> print d1, d2
{-2: None, -1: None} {-1: None, -2: None}
>>> d1 == d2
True
If d1 and d2 are equal and both are dictionaries with the same keys and
same values, why do they print in different orders?
--
Steven
...one to think about:
Either someone has a programming glitch or else likes stack use.
LoadA push, read-LILO, readB, compare, loop to read-LILO.
And they decided to just pre-load as a matter of course, but maybe
forgot to read-LILO in print... ie.. didn't finish cooking print.
(or left it out as a reminder or....)
Technically, == is reserved for identical, as in byte for byte same
(when == is used to check identical, uncooked)
As a matter of self - I don't favor cooking. I prefer 'cooked' things
have a different name so I can get what I want, when I want.
Choice on wall:
fish, deep fried in light beer batter
sashimi
in case you didn't notice - cooking takes longer :)
I can get the sashimi and take it home and BBQ it, I can roast it, I can
steam it, I can wok it,..., but the other is what it is. (greasy)
Besides, who says I like your cooking? :)
Steve
Err... http://mail.python.org/pipermail/python-list/2008-September/679360.html
Thanks Steve for bringing up various items to consider for these sort
of tasks, very helpful indeed.
Best,
Esmail