```
[100, 200] overlaps with [150, 250]
[250, 500] overlaps with [300, 400]
```
| | 0 | 1 | overlaps_ix |
|---|---|---|---|
| 0 | 100 | 200 | [?,?] |
| 1 | 150 | 250 | [?,?] |
| 2 | 300 | 400 | [?,?] |
| 3 | 250 | 500 | [?,?] |
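For reference, the naive way to fill the overlaps_ix column is to compare every pair of rows. This is a minimal sketch in plain Python rather than pandas. It assumes half-open [start, end) intervals, so that [150, 250] and [250, 500] do not overlap, which matches the expected output above. `overlaps_bruteforce` and `rows` are hypothetical names.

```python
def overlaps_bruteforce(intervals):
    """Return, for each row, the indices of the other rows it overlaps.

    Intervals are treated as half-open [start, end): two intervals overlap
    when each one starts before the other ends. Every pair is checked,
    so the work grows as O(n**2).
    """
    result = [[] for _ in intervals]
    for i, (s1, e1) in enumerate(intervals):
        for j, (s2, e2) in enumerate(intervals):
            if i != j and s1 < e2 and s2 < e1:
                result[i].append(j)
    return result


# Rows 0-3 from the table: (column 0, column 1)
rows = [(100, 200), (150, 250), (300, 400), (250, 500)]
print(overlaps_bruteforce(rows))  # [[1], [0], [3], [2]]
```

Each inner list is what would go into overlaps_ix for that row.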
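This doesn't answer your question, but in case the reason you're looking at pandas is speed: a much, much more efficient algorithm (O(n log n) to sort the endpoints, plus time proportional to the number of overlapping pairs reported, instead of O(n**2)) is to sweep over the sorted start and end points, something like: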
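```python
# Sample intervals from the table above (half-open, integer endpoints).
intervals = [(100, 200), (150, 250), (300, 400), (250, 500)]

events = []
for (start, end) in intervals:
    event = (start, end)
    events.append((start, "start", event))
    # Shift the end back by 0.5 so an interval ending at x is closed before
    # one starting at x opens (half-open semantics, integer endpoints only).
    events.append((end - 0.5, "end", event))
events.sort()

current = set()  # intervals that are open at the current sweep position
overlaps = []
for (_, edge, event) in events:
    if edge == "start":
        # Every interval that is still open overlaps the one starting now.
        for other in current:
            overlaps.append([other, event])
        current.add(event)
    else:
        assert edge == "end"
        current.remove(event)

for a, b in overlaps:
    print(a, "overlaps with", b)
```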
(This assumes half-open intervals, like Python's range, with integer endpoints, since the end - 0.5 shift relies on that. NB: untested code typed on a phone; use for entertainment purposes only.)
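To map the sweep back onto the question's overlaps_ix column, one option is to track row indices instead of (start, end) tuples. This also keeps duplicate intervals distinct, whereas identical tuples collapse in a set. The sketch below is a variant under stated assumptions, not the reply's exact code. `overlaps_sweep` and `rows` are hypothetical names. It breaks ties by event kind (ends before starts at the same position) rather than using the end - 0.5 shift, so non-integer endpoints also work. Empty intervals (start >= end) are skipped, because under half-open semantics they overlap nothing.

```python
def overlaps_sweep(intervals):
    """For each row, return the indices of rows whose half-open
    [start, end) interval overlaps it, using a sweep over sorted endpoints."""
    END, START = 0, 1  # at equal positions, ends sort (and are processed) first
    events = []
    for ix, (start, end) in enumerate(intervals):
        if start < end:  # empty intervals overlap nothing
            events.append((start, START, ix))
            events.append((end, END, ix))
    events.sort()

    result = [[] for _ in intervals]
    active = set()  # rows whose interval contains the current sweep position
    for _, kind, ix in events:
        if kind == START:
            # Each active row overlaps the row that is starting now.
            for other in active:
                result[ix].append(other)
                result[other].append(ix)
            active.add(ix)
        else:
            active.remove(ix)
    # Sort each list only so the output is deterministic and easy to read.
    return [sorted(r) for r in result]


rows = [(100, 200), (150, 250), (300, 400), (250, 500)]
print(overlaps_sweep(rows))  # [[1], [0], [3], [2]]
```

Each overlapping pair is appended once per side, so apart from the initial sort and the final per-row sorting, the work is proportional to the number of overlaps found.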
-n
--
You received this message because you are subscribed to the Google Groups "PyData" group.