You only need to check that the incremented tally is 5, which is to say,
that the about-to-be-incremented tally is 4.
t = tallies[ident]
if t < 4: tallies[ident] = t+1
else: return ident
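A minimal, self-contained sketch of the tally check above, assuming the items are (value, identifier) pairs. The function name `first_ident_reaching`, the `threshold` parameter, and the use of `collections.defaultdict` are illustrative assumptions, not part of the original code.

```python
from collections import defaultdict

def first_ident_reaching(pairs, threshold=5):
    """Return the first identifier whose tally reaches `threshold`, else None."""
    tallies = defaultdict(int)  # identifiers not seen yet start at 0
    for _value, ident in pairs:
        t = tallies[ident]
        if t < threshold - 1:
            tallies[ident] = t + 1
        else:
            # The about-to-be-incremented tally is threshold - 1,
            # so this occurrence is the threshold-th one.
            return ident
    return None

# Made-up data: 'b' occurs for the fifth time at the last position.
data = [(1, 'b'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'a'), (6, 'b'), (7, 'b')]
result = first_ident_reaching(data)
```

The scan returns as soon as one identifier reaches the threshold, so nothing after that point is examined.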
I think you can use itertools.groupby(L, lambda el: el[1]) to group the
elements of your *sorted* list L by the value el[1] (i.e. the
identifier) and then iterate through these groups until you find one
containing the desired number of instances.
Here is an example:
>>> from itertools import groupby
>>> instances = [(1, 'b'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'c'), (6, 'c'), (7, 'd')]
>>> k = 3
>>> grouped_by_identifier = groupby(instances, lambda el: el[1])
>>> grouped_by_identifier = ((identifier, list(group)) for identifier, group in grouped_by_identifier)
>>> k_instances = (group for identifier, group in grouped_by_identifier if len(group) == k)
>>> next(k_instances)
[(4, 'c'), (5, 'c'), (6, 'c')]
>>> next(k_instances)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration
There are certainly millions of ways to do this and most of them will be
better than my proposal here, but you might like this approach. Another
approach would use itertools.takewhile() or itertools.ifilter() ... Just
have a look :-)
yours sincerely
--
.''`. Wolodja Wentland <went...@cl.uni-heidelberg.de>
: :' :
`. `'` 4096R/CAF14EFC
`- 081C B7CD FF04 2BA9 94EA 36B2 8B7F 7D30 CAF1 4EFC
This will generally not return the same result. It depends on whether OP
wants *any* item appearing at least 5 times or whether the order is
significant and the OP literally wants the first. Sorting the entire
list may also take a *lot* longer.
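To illustrate the difference, here is a small sketch that uses a threshold of 2 instead of 5 to keep the data short. The helper names `first_by_scan` and `first_by_sorted_groupby` are hypothetical, and the second function is only one possible reading of the sort-then-groupby suggestion.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def first_by_scan(pairs, k):
    """Linear scan: the identifier that first reaches k occurrences in list order."""
    tallies = defaultdict(int)
    for _value, ident in pairs:
        tallies[ident] += 1
        if tallies[ident] == k:
            return ident
    return None

def first_by_sorted_groupby(pairs, k):
    """Sort by identifier, then return the first group with at least k members."""
    ordered = sorted(pairs, key=itemgetter(1))
    for ident, group in groupby(ordered, key=itemgetter(1)):
        if len(list(group)) >= k:
            return ident
    return None

# Made-up data: both 'a' and 'c' occur twice, but 'c' gets there first.
pairs = [(1, 'c'), (2, 'a'), (3, 'c'), (4, 'a')]
scan_result = first_by_scan(pairs, 2)              # 'c' reaches 2 at the third item
sorted_result = first_by_sorted_groupby(pairs, 2)  # 'a' sorts before 'c'
```

Both identifiers qualify here, so either answer is acceptable if *any* such item will do. Only the scan reports the one that qualifies first in the original order.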
Terry Jan Reedy
Order is preserved by itertools.groupby - Have a look:
>>> instances = [(1, 'b'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'c'), (6, 'c'), (7, 'b'), (8, 'b')]
>>> grouped_by_identifier = groupby(instances, lambda el: el[1])
>>> grouped_by_identifier = ((identifier, list(group)) for identifier, group in grouped_by_identifier)
>>> k_instances = (group for identifier, group in grouped_by_identifier if len(group) == 2)
>>> for group in k_instances:
...     print group
...
[(1, 'b'), (2, 'b')]
[(7, 'b'), (8, 'b')]
So the first element yielded by the k_instances generator will be the
first group of elements from the original list whose identifier appears
exactly k times in a row.
> Sorting the entire list may also take a *lot* longer.
Than what?
Am I missing something? Is the "*sorted*" the culprit? If so, just
forget it, as it is not relevant.
Sorting does not preserve order.
>> Sorting the entire list may also take a *lot* longer.
> Than what?
Than linearly scanning for the first 5x item, as in my corrected version
of the original code.
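One way to see the cost difference without timing anything is to count how many items each approach has to look at. The sketch below wraps the input in a hypothetical `counting` generator; the data and all names are made up for illustration. `sorted()` must consume the whole input before any grouping can start, while the scan stops at the first identifier that reaches the threshold.

```python
from collections import defaultdict

def counting(iterable, counter):
    """Yield items unchanged while recording how many were consumed."""
    for item in iterable:
        counter[0] += 1
        yield item

def first_by_scan(pairs, k=5):
    """Linear scan: the identifier that first reaches k occurrences."""
    tallies = defaultdict(int)
    for _value, ident in pairs:
        tallies[ident] += 1
        if tallies[ident] == k:
            return ident
    return None

# 'a' reaches 5 occurrences within the first 5 items of a 1000-item list.
pairs = [(i, 'a') for i in range(5)] + [(i, 'z') for i in range(5, 1000)]

scan_seen = [0]
found = first_by_scan(counting(pairs, scan_seen))

sort_seen = [0]
ordered = sorted(counting(pairs, sort_seen), key=lambda el: el[1])
```

Here the scan stops after 5 items, whereas sorting reads all 1000 before returning. How much this matters in practice depends on where the first qualifying item sits in the list.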
Terry Jan Reedy