I have a simple algorithm to identify this changepoint, but it looks
crude: is there a cleaner, more elegant way to do this?
flag = True
i=-1
j=0
while flag and i < len(retHist)-1:
i += 1
if retHist[i] == 0:
j = 0
else:
j += 1
if j == 5:
flag = False
del retHist[:i-4]
Thanks in advance for your help
Thomas Philips
from itertools import islice
def start_good1(alist, good_ones=4):
"""
Maybe more efficient for Python
>>> start_good = start_good1
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
4
>>> start_good([])
-1
>>> start_good([0, 0])
-1
>>> start_good([0, 0, 0])
-1
>>> start_good([0, 0, 0, 0, 1])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 3])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4])
4
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4, 5])
4
>>> start_good([1, 2, 3, 4])
0
>>> start_good([1, 2, 3])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 0, 4])
-1
"""
for i in xrange(len(alist) - good_ones + 1):
if all(islice(alist, i, i+good_ones)):
return i
return -1
def start_good2(alist, good_ones=4):
"""
Maybe more efficient for Psyco
>>> start_good = start_good2
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
4
>>> start_good([])
-1
>>> start_good([0, 0])
-1
>>> start_good([0, 0, 0])
-1
>>> start_good([0, 0, 0, 0, 1])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 3])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4])
4
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4, 5])
4
>>> start_good([1, 2, 3, 4])
0
>>> start_good([1, 2, 3])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 0, 4])
-1
"""
n_good = 0
for i, el in enumerate(alist):
if alist[i]:
if n_good == good_ones:
return i - good_ones
else:
n_good += 1
else:
n_good = 0
if n_good == good_ones:
return len(alist) - good_ones
else:
return -1
if __name__ == "__main__":
import doctest
doctest.testmod()
print "Doctests done\n"
Bye,
bearophile
With:
for i in xrange(len(alist)):
because Psyco doesn't digest enumerate well.
Bye,
bearophile
Maybe this will do?
reHist = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
count = 0
for i, d in enumerate(reHist):
if d == 0:
count = 0
else:
count += 1
if count == 5:
break
else:
raise Exception("No data found")
reHist = reHist[i-4:]
print reHist
-Matt
Here's my attempt:
LL = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
i = 0
while (i<len(LL)) and (0 in LL[i:i+5]):
i += 1
print i, LL[i:i+5]
##
## 4 [1, 2, 3, 4, 5]
##
>>> for ii,dummy in enumerate(retHist):
... if 0 not in retHist[ii:ii+5]:
... break
>>> del retHist[:ii]
Well, to the extent short and sweet is elegant...
Emile
With regular expressions:
import re
hist = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
hist_str = ''.join(str(i) for i in hist)
match = re.search(r'[1-9]{5, }', hist_str)
hist = hist[-5:] if match is None else hist[match.start():]
Or slightly more concise:
import re
hist = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
match = re.search(r'[1-9]{5, }', ''.join(str(i) for i in hist))
hist = hist[-5:] if match is None else hist[match.start():]
Tommy McDaniel
This is just what the doctor ordered. Thank you, everyone, for the
help.
Sincerely
Thomas Philips
Matthew Fitzgibbons wrote:
> tkp...@hotmail.com wrote:
>
> reHist = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> count = 0
> for i, d in enumerate(reHist):
> if d == 0:
> count = 0
> else:
> count += 1
> if count == 5:
> break
> else:
> raise Exception("No data found")
> reHist = reHist[i-4:]
> print reHist
This is what I would have suggested, except that the 'if count' test
should be left under the else clause, as in the original, so I consider
it the best of the responses ;-)
I thought of the repeated slicing alternative, but it would be slightly
slower. However, for occasional runs, the difference would be trivial.
Worrying about what Psyco does for this problem is rather premature
optimization.
My quarter's worth....
tjr
Here is one that can go iterator-to-iterator:
def started(source):
src = iter(source)
lead = []
for x in src:
if x:
lead.append(x)
if len(lead) == 5:
return itertools.chain(lead, src)
else:
lead = []
print list(started([0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
--Scott David Daniels
Scott....@Acm.Org
> On Aug 26, 5:49 pm, tkp...@hotmail.com wrote:
>> I have a list that starts with zeros, has sporadic data, and then has
>> good data. I define the point at which the data turns good to be the
>> first index with a non-zero entry that is followed by at least 4
>> consecutive non-zero data items (i.e. a week's worth of non-zero data).
>> For example, if my list is [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], I
>> would define the point at which data turns good to be 4 (1 followed by
>> 2, 3, 4, 5).
...
> With regular expressions:
Good grief. If you're suggesting that as a serious proposal, and not just
to prove it can be done, that's surely an example of "when all you have
is a hammer, everything looks like a nail" thinking.
In this particular case, your regex "solution" gives the wrong result,
indicating that you didn't test your code before posting. Hint:
re.search(r'[1-9]{5, }', "123456")
returns None.
The obvious fix for that specific bug is to use r'[1-9]{5,5}', but even
that will fail. Hint: what happens if an item has more than one digit?
Before posting another regex solution, make sure it does the right thing
with this:
[0, 0, 101, 0, 1002, 203, 3050, 4105, 5110, 623, 777]
--
Steven
data = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
def itergood(indata):
indata = iter(indata)
buf = []
while len(buf) < 4:
buf.append(indata.next())
if buf[-1] == 0:
buf[:] = []
for x in buf:
yield x
for x in indata:
yield x
for d in itergood(data):
print d
Note that the version above (as well as most others posted) fail for
boundary cases; check out bearophile's doctest to see some of them.
Below are two more versions that pass all the doctests: the first
works only for lists and modifies them in place and the second works
for arbitrary iterables:
def clean_inplace(seq, good_ones=4):
start = 0
n = len(seq)
while start < n:
try: end = seq.index(0, start)
except ValueError: end = n
if end-start >= good_ones:
break
start = end+1
del seq[:start]
def clean_iter(iterable, good_ones=4):
from itertools import chain, islice, takewhile, dropwhile
iterator = iter(iterable)
is_zero = float(0).__eq__
while True:
# consume all zeros up to the next non-zero
iterator = dropwhile(is_zero, iterator)
# take up to `good_ones` non-zeros
good = list(islice(takewhile(bool,iterator), good_ones))
if not good: # iterator exhausted
return iterator
if len(good) == good_ones:
# found `good_ones` consecutive non-zeros;
# chain them to the rest items and return them
return chain(good, iterator)
HTH,
George
This seems the most efficient so far for arbitrary iterables. With a
few micro-optimizations it becomes:
from itertools import chain
def itergood(indata, good_ones=4):
indata = iter(indata); get_next = indata.next
buf = []; append = buf.append
while len(buf) < good_ones:
next = get_next()
if next: append(next)
else: del buf[:]
return chain(buf, indata)
$ python -m timeit -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from
itergood import itergood" "list(itergood(x))"
100 loops, best of 3: 3.09 msec per loop
And with Psyco enabled:
$ python -m timeit -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from
itergood import itergood" "list(itergood(x))"
1000 loops, best of 3: 466 usec per loop
George
This one probably scores well with Psyco ;-)
def start_good3(seq, good_ones=4):
"""
>>> start_good = start_good3
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
4
>>> start_good([])
-1
>>> start_good([0, 0])
-1
>>> start_good([0, 0, 0])
-1
>>> start_good([0, 0, 0, 0, 1])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 3])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4])
4
>>> start_good([0, 0, 1, 0, 1, 2, 3, 4, 5])
4
>>> start_good([1, 2, 3, 4])
0
>>> start_good([1, 2, 3])
-1
>>> start_good([0, 0, 1, 0, 1, 2, 0, 4])
-1
"""
n_good = 0
pos = 0
for el in seq:
if el:
if n_good == good_ones:
return pos - good_ones
else:
n_good += 1
elif n_good:
n_good = 0
pos += 1
if n_good == good_ones:
return pos - good_ones
else:
return -1
Bye,
bearophile
There, that's the regular machine for it. Too much thinking in
objects, and you can't even write a linked list anymore, right?
I think if you update this so that it returns the "good" iterable
instead of the starting index, it is equivalent to Gerard's solution.
George
And you're still wondering why do people killfile you or think you're
a failed AI project...
Just jumping on the bandwagon, George. And you see, everyone else's
passed the doctests perfectly. Were all the running times O( n* k )?
I always forget the 'del slice' method for clearing a list, thanks.
I think that returning a `chain` means that the function is not itself a
generator. And so if the indata has length less than or equal
to the threshold (good_ones), an unhandled StopIteration is raised
before the return statement is reached.
G.
You gave me an idea-- maybe an arbitrary 'lookahead' iterable could be
useful. I haven't seen them that much on the newsgroup, but more than
once. IOW a buffered consumer. Something that you could check a
fixed number of next elements of. You might implement it as a
iterator with a __getitem__ method.
Example, unproduced:
>>> import itertools
>>> a= itertools.count( )
>>> a.next()
0
>>> a.next()
1
>>> a[ 3 ]
5
>>> a.next()
2
>>> a[ 3 ]
6
Does this make sense at all?
Hey, it's clearer than a lot of the other proposals here. Too bad it
doesn't work. This is why you don't post after 8 p.m. after being at
work all day. I was seeing what I now recall as incorrect answers, but
at the time I was in the midst of a brainfart and for some reason took
them to be right. It can be made to work by removing the space in
"{5, }", inserting some kind of marker between the numbers, and using
the right regular expression to recognize nonzero numbers between the
markers, but I think I've already said too much in this thread.
Tommy McDaniel
Maybe I'm stumbling into a "REs are evil" flamewar here. Anyway:
He has a point though: this *can* be seen as a regex problem. Only a
solution which builds a string first is only good for laughs or
(possibly) quick hacks. What's missing is an RE library for lists of
objects, rather than just strings and Unicode strings.
Not sure such a library would be worth implementing -- problems like
this one are rare, I think.
/Jorgen
--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
Every now and then, you see a proposal or a package for a finite state
machine--- how would you encode comparing of values into a string, if
you're not comparing a string?