Comparing huge lists in python


Muhammad Irfan Ali Shah

Jun 28, 2012, 6:58:42 AM
to networkx...@googlegroups.com
I have two very large lists of entries and I want to compare them for matches. The lists are in this form:

list1 = [('a.gretton', 'k.m.borgwardt'), ('a.gretton', 'd.zhou'), ('a.gretton', 'o.bousquet'), ('a.gretton', 'b.scholkopf'), ..., ('a.gretton', 'm.j.rasch')]
list2 = [('a.gretton', 'h.kriegel'), ('a.gretton', 'b.scholkopf'), ('a.gretton', 'j.weston'), ..., ('a.gretton', 'a.j.smola')]

In this example, the 4th entry of list1 and the 2nd entry of list2 are a match. I want each match to be printed out/displayed, and in effect all such matches. Please help me out.

Raf Guns

Jun 28, 2012, 7:15:54 AM
to networkx...@googlegroups.com
Hi,

Not really a networkx question? Anyway... If you only want to know which items are common to both lists (and you don't care about the order of elements or about an item appearing more than once in the same list), the fastest way is probably a set intersection:

>>> list1 = [('a.gretton', 'k.m.borgwardt'), ('a.gretton', 'd.zhou'), ('a.gretton', 'o.bousquet'), ('a.gretton', 'b.scholkopf'), ('a.gretton', 'm.j.rasch')]
>>> list2 = [('a.gretton', 'h.kriegel'), ('a.gretton', 'b.scholkopf'), ('a.gretton', 'j.weston'), ('a.gretton', 'a.j.smola')]
>>> set(list1) & set(list2)
set([('a.gretton', 'b.scholkopf')])
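(The session above uses Python 2; in Python 3 the same intersection displays as {('a.gretton', 'b.scholkopf')}.) A minimal sketch, assuming Python 3, that prints each match on its own line as the question asked. The lists are the shortened examples from the session, and the sorting is only there to make the output order deterministic:

```python
list1 = [('a.gretton', 'k.m.borgwardt'), ('a.gretton', 'd.zhou'),
         ('a.gretton', 'o.bousquet'), ('a.gretton', 'b.scholkopf'),
         ('a.gretton', 'm.j.rasch')]
list2 = [('a.gretton', 'h.kriegel'), ('a.gretton', 'b.scholkopf'),
         ('a.gretton', 'j.weston'), ('a.gretton', 'a.j.smola')]

# Tuples are hashable, so they can go straight into sets.
# The & operator keeps only the pairs present in both sets.
common = set(list1) & set(list2)

# Sets are unordered; sort so the printed order is stable.
for pair in sorted(common):
    print(pair)
# prints: ('a.gretton', 'b.scholkopf')
```

Note that converting to sets discards duplicates and the original order, which is exactly the trade-off described above.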

Cheers,

Raf

Daπid

Jun 28, 2012, 7:26:21 AM
to networkx...@googlegroups.com
You can use a list comprehension:

[x == y for x, y in zip(list1, list2)]

This gives a list of True/False values, one per position, saying whether
the two elements at that position are equal. If the lists are really
huge, you may want to turn it (or the zip) into a generator to save
memory, or unroll it into an explicit loop (slower, but with minimal
memory consumption).
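A sketch of the two memory-saving variants just described, assuming Python 3, where zip already returns a lazy iterator (in Python 2, itertools.izip plays that role). The two short lists are made up for illustration. Like the list comprehension, this only compares elements at the same position:

```python
# Made-up data: only index 1 holds the same pair in both lists.
list1 = [('a.gretton', 'k.m.borgwardt'), ('a.gretton', 'b.scholkopf')]
list2 = [('a.gretton', 'h.kriegel'), ('a.gretton', 'b.scholkopf')]

# Generator expression: yields one True/False per position without
# building a full list of booleans in memory.
flags = (x == y for x, y in zip(list1, list2))
same_position_count = sum(flags)  # True counts as 1, False as 0

# "Unrolled" explicit loop: records only the indices where the
# elements at the same position are equal.
equal_positions = []
for i, (x, y) in enumerate(zip(list1, list2)):
    if x == y:
        equal_positions.append(i)

print(same_position_count)  # 1
print(equal_positions)      # [1]
```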

On Thu, Jun 28, 2012 at 12:58 PM, Muhammad Irfan Ali Shah
<09mscs...@seecs.edu.pk> wrote:

franck kalala

Jun 28, 2012, 7:30:44 AM
to networkx...@googlegroups.com
Not a networkx question.

Look here

http://stackoverflow.com/questions/1388818/how-can-i-compare-two-lists-in-python-and-return-matches

Cheers

Franck


From: Muhammad Irfan Ali Shah <09mscs...@seecs.edu.pk>
To: networkx...@googlegroups.com
Sent: Thursday 28 June 2012, 11:58
Subject: [networkx-discuss] Comparing huge lists in python

Simon Knight

Jun 28, 2012, 10:28:21 PM
to networkx...@googlegroups.com
It's not really a NetworkX question, but the designers of NetworkX
allow us to work with nodes/edges using existing Python approaches,
which is really powerful.

If you want to compare the lists for elements in the same position,
then a generator or functions from itertools would save memory.
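One way to do the same-position comparison with itertools, sketched under the assumption of Python 3 (where map is lazy): map(operator.eq, ...) compares the two lists element by element, and itertools.compress keeps the elements of list1 whose flag is True. Nothing is built in memory until the result is consumed. The data is made up for illustration:

```python
import itertools
import operator

# Made-up data: only index 2 holds the same pair in both lists.
list1 = [('a', 'b'), ('a', 'c'), ('a', 'd')]
list2 = [('a', 'x'), ('a', 'y'), ('a', 'd')]

# map() with two iterables lazily calls operator.eq(list1[i], list2[i])
# and stops at the end of the shorter list.
flags = map(operator.eq, list1, list2)

# compress() yields list1[i] only where the matching flag is truthy.
matches = list(itertools.compress(list1, flags))

print(matches)  # [('a', 'd')]
```

Drop the list() call and iterate over compress() directly if even the result list is too large to hold.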

If you want to test whether items in one sequence are in the other
sequence, then use a set. Checking whether an item is in a list means
scanning the list element by element, so each check can take time
proportional to the length of the list. Using a set is much faster:
set elements are stored by their hash (the same as in a dictionary),
so lookups take constant time on average. This presentation explains
a bit more: http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona
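A sketch of the set-based lookup described above, assuming Python 3: build a set from one list once, then test each element of the other list against it. Unlike a plain intersection, this keeps list1's order (and any duplicates in list1). The data is the shortened example from the question:

```python
list1 = [('a.gretton', 'k.m.borgwardt'), ('a.gretton', 'd.zhou'),
         ('a.gretton', 'o.bousquet'), ('a.gretton', 'b.scholkopf'),
         ('a.gretton', 'm.j.rasch')]
list2 = [('a.gretton', 'h.kriegel'), ('a.gretton', 'b.scholkopf'),
         ('a.gretton', 'j.weston'), ('a.gretton', 'a.j.smola')]

# Build the hash-based lookup table once: O(len(list2)).
lookup = set(list2)

# Each membership test is an average O(1) hash lookup, so the whole
# pass is roughly O(len(list1) + len(list2)), instead of
# O(len(list1) * len(list2)) when testing `pair in list2`.
matches = [pair for pair in list1 if pair in lookup]

for pair in matches:
    print(pair)
```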

Cheers
Simon

Alli Quaknaa

Jun 29, 2012, 5:49:13 AM
to networkx...@googlegroups.com
Also note that David's approach ([ x==y for x,y in zip(list1,list2)])
doesn't answer the question: it only finds matches at the exact same
position (i.e. list1[i] == list2[i]). The set intersection approach is
the best in this case.
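If you also need to know where each match sits (the question mentions the 4th entry of list1 and the 2nd entry of list2), one option not proposed in the thread is to index list2 in a dictionary mapping each pair to its positions, then scan list1 once. This is a sketch assuming Python 3.6+ (for f-strings); positions are printed 1-based to match the question's wording:

```python
from collections import defaultdict

list1 = [('a.gretton', 'k.m.borgwardt'), ('a.gretton', 'd.zhou'),
         ('a.gretton', 'o.bousquet'), ('a.gretton', 'b.scholkopf'),
         ('a.gretton', 'm.j.rasch')]
list2 = [('a.gretton', 'h.kriegel'), ('a.gretton', 'b.scholkopf'),
         ('a.gretton', 'j.weston'), ('a.gretton', 'a.j.smola')]

# Map each pair in list2 to every index where it occurs
# (a pair may appear more than once).
positions2 = defaultdict(list)
for j, pair in enumerate(list2):
    positions2[pair].append(j)

# One pass over list1; dictionary lookups are hash-based, like sets.
# .get() does not insert missing keys into the defaultdict.
matches = []
for i, pair in enumerate(list1):
    for j in positions2.get(pair, []):
        matches.append((i, j, pair))

for i, j, pair in matches:
    # Convert 0-based indices to the 1-based positions used in the question.
    print(f"list1 entry {i + 1} matches list2 entry {j + 1}: {pair}")
```

This finds matches at any pair of positions, not only list1[i] == list2[i], at roughly the same cost as the set-based approach.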

al-Quaknaa