The second one's complexity is O(N), while the first one's is O(N *
log N).
Still, the second one works much better, because C code is being used
instead of pythons.
Still, being a programmer, using the first way (a.insert(x);
a.sort()), does not feel right.
What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?
Correct me if I am wrong here but isn't the second one is O(log N)?
Binary search?
That is when you have an already sorted list from somewhere and you
are inserting just one new value.
In case you are building the whole list yourself it's the same (N * log N)
>
> Still, the second one works much better, because C code is being used
> instead of pythons.
>
> Still, being a programmer, using the first way (a.insert(x);
> a.sort()), does not feel right.
>
> What has the community to say about this ? What is the best (fastest)
> way to insert sorted in a list ?
> --
> http://mail.python.org/mailman/listinfo/python-list
>
--
Regards
Shashank Singh
http://www.cse.iitb.ac.in/~shashanksingh
Check out the bisect module.
~Ethan~
Finding the position to insert is O(log n), but the actual insert is
O(n). This is because Python lists are implemented with arrays and
everything after the inserted item has to be moved in memory to make
space for the insert.
This is not quite right; see below.
>Still, the second one works much better, because C code is being used
>instead of pythons.
>
>Still, being a programmer, using the first way (a.insert(x);
>a.sort()), does not feel right.
>
>What has the community to say about this ? What is the best (fastest)
>way to insert sorted in a list ?
In this case, the "best" way is most likely "don't do that at all".
First, we should note that a python list() data structure is actually
an array. Thus, you can locate the correct insertion point pretty
fast, by using a binary or (better but not as generally applicable)
interpolative search to find the proper insertion point.
Having found that point, though, there is still the expense of
the insertion, which requires making some room in the array-that-
makes-the-list (I will use the name "a" as you did above):
position = locate_place_for_insert(a, the_item)
# The above is O(log n) for binary search,
# O(log log n) for interpolative search, where
# n is len(a).
a.insert(position, the_item)
# This is still O(log n), alas.
Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with:
# wrong way:
# for item in large_list:
# a.append(item)
# right way, but fundamentally still the same cost (constant
# factor is much smaller due to built-in append())
a.append(large_list)
If len(large_list) is m, this is O(m). Inserting each item in
the "right place" would be O(m log (n + m)). But we still
have to sort:
a.sort()
This is O(log (n + m)), hence likely better than repeatedly inserting
in the correct place.
Depending on your data and other needs, though, it might be best
to use a red-black tree, an AVL tree, or a skip list. You might
also investigate radix sort, radix trees, and ternary search trees
(again depending on your data).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
~Ethan~
Surely you mean O((n + m) log (n + m)).
In article <mailman.96.13083486...@python.org>
Ethan Furman <et...@stoneleaf.us> wrote:
>> a.append(large_list)
> ^- should be a.extend(large_list)
Er, right. Posted in haste (had to get out the door). I also
wrote:
>> If len(large_list) is m, this is O(m). Inserting each item in
>> the "right place" would be O(m log (n + m)). But we still
>> have to sort:
>>
>> a.sort()
In article <mailman.98.13083536...@python.org>,
Ian Kelly <ian.g...@gmail.com> wrote:
>> This is O(log (n + m)), hence likely better than repeatedly inserting
>> in the correct place.
>Surely you mean O((n + m) log (n + m)).
Er, "maybe"? (It depends on the relative values of m and n, and
the underlying sort algorithm to some extent. Some algorithms are
better at inserting a relatively small number of items into a
mostly-sorted large list. As I recall, Shell sort does well with
this.) But generally, yes. See "posted in haste" above. :-)
There are a lot of other options, such as sorting just the list of
"items to be inserted", which lets you do a single merge pass:
# UNTESTED
def merge_sorted(it1, it2, must_copy = True):
"""
Merge two sorted lists/iterators it1 and it2.
Roughly equivalent to sorted(list(it2) + list(it2)),
except for attempts to be space-efficient.
You can provide must_copy = False if the two iterators
are already lists and can be destroyed for the purpose
of creating the result.
"""
# If it1 and it1 are deque objects, we don't need to
# reverse them, as popping from the front is efficient.
# If they are plain lists, popping from the end is
# required. If they are iterators or tuples we need
# to make a list version anyway. So:
if must_copy:
it1 = list(it1)
it2 = list(it2)
# Reverse sorted lists (it1 and it2 are definitely
# lists now) so that we can pop off the end.
it1.reverse()
it2.reverse()
# Now accumulate final sorted list. Basically, this is:
# take first (now last) item from each list, and put whichever
# one is smaller into the result. When either list runs
# out, tack on the entire remaining list (whichever one is
# non-empty -- if both are empty, the two extend ops are
# no-ops, so we can just add both lists).
#
# Note that we have to re-reverse them to get
# them back into forward order before extending.
result = []
while it1 and it2:
# Note: I don't know if it might be faster
# to .pop() each item and .append() the one we
# did not want to pop after all. This is just
# an example, after all.
last1 = it1[-1]
last2 = it2[-1]
if last2 < last1:
result.append(last2)
it2.pop()
else:
result.append(last1)
it1.pop()
it1.reverse()
it2.reverse()
result.extend(it1)
result.extend(it2)
return result
So, now if "a" is the original (sorted) list and "b" is the not-yet-
sorted list of things to add:
a = merge_sorted(a, sorted(b), must_copy = False)
will work, provided you are not required to do the merge "in place".
Use the usual slicing trick if that is necessary:
a[:] = merge_sorted(a, sorted(b), must_copy = False)
If list b is already sorted, leave out the sorted() step. If list
b is not sorted and is particularly long, use b.sort() to sort in
place, rather than making a sorted copy.
> What has the community to say about this ? What is the best (fastest)
> way to insert sorted in a list ?
if you're doing repeated insertions into an already sorted list, there's
no question that the bisect module is the right way to do it.
(Unless you have a very old version of Python, version 2.3 or older.)
>>> from timeit import Timer
>>> setup = """
... L = list(range(10000)) + list(range(10100, 30000))
... from bisect import insort
... def sort_insert(a, x):
... a.append(x)
... a.sort()
... """
>>> t_bisect = Timer("for i in range(10000, 10100): insort(L, i)", setup)
>>> t_sort = Timer("for i in range(10000, 10100): sort_insert(L, i)", setup)
>>>
>>> t_sort.timeit(number=100)
19.430757999420166
>>> t_bisect.timeit(number=100)
0.3741610050201416
(For those unfamiliar with timeit, small numbers are faster.)
--
Steven
> There are basically two ways to go about this.
[...]
> What has the community to say about this ? What is the best (fastest)
> way to insert sorted in a list ?
a third way maybe using a SkipList instead of a list
on http://infohost.nmt.edu/tcc/help/lang/python/examples/pyskip/ you can
find a pure python implementation (but afaik there are many others)
--
ZeD