Ordering.sortedCopy() & immutableSortedCopy()

466 views
Skip to first unread message

eric.giese

unread,
Jun 4, 2012, 8:50:18 AM6/4/12
to guava-discuss
Hello, just a question out of curiosity: Why does Ordering first
create an Arraylist and then give this to Collections.sort?

@Override
public <E extends T> List<E> sortedCopy(Iterable<E> iterable) {
List<E> list = Lists.newArrayList(iterable);
Collections.sort(list, this);
return list;
}

@Override
public <E extends T> ImmutableList<E> immutableSortedCopy(Iterable<E>
iterable) {
return ImmutableList.copyOf(sortedCopy(iterable));
}



As far as I now, Collections.sort(List list) has to be a rather
inefficient implementation since it must at first create an Array out
of the list, sort it and then has to reinserts the Array back again in
the List.

Shouldn't it be more efficient to create and sort an object array
directly and then create the datastructures right out of the array?
For example like this:

/* Creates an Object array but cast it to E[]. Safe only for internal
use with object-based collections */
private <E extends T> Object[] sortedArray(Iterable<E> iterable,
Class<) {
@SuppressWarnings("unchecked")
val arr = (E[]) Iterables.toArray(iterable, Object.class);
Arrays.sort(arr, this);
return arr;
}

@Override
public <E extends T> List<E> sortedCopy(Iterable<E> iterable) {
return Lists.newArrayList(sortedArray(iterable));
}

@Override
public <E extends T> ImmutableList<E> immutableSortedCopy(Iterable<E>
iterable) {
return ImmutableList.copyOf(sortedArray(iterable));
}

Louis Wasserman

unread,
Jun 4, 2012, 9:31:30 AM6/4/12
to eric.giese, guava-discuss
The overhead is linear, and dominated by the expense of sorting in the first place, no?

In any event, if we're doing those kinds of optimizations, there are still better optimizations to make; I'm working on some.  (In particular, why copy the array a second time when we can use it to back the ImmutableList directly?)

eric.giese

unread,
Jun 4, 2012, 1:32:01 PM6/4/12
to guava-discuss
Great, thanks for the quick reply.

The performance impact of the copying is neglible compared to the non-
linear sorting time, of course. I was just wondering since the change
above is so simple and has no real drawbacks. But yes, since Ordering
is in the same package, it could just use ImmutableLists internal
method to avoid the copying.

Looking forward to see these other changes you are talking about. I
will never be able to measure these in my projects. But it makes fun
to see things improving anyway. :-)

Maaartin G

unread,
Jun 4, 2012, 4:04:30 PM6/4/12
to guava-...@googlegroups.com
On Monday, June 4, 2012 7:32:01 PM UTC+2, eric.giese wrote:
Great, thanks for the quick reply.

The performance impact of the copying is neglible compared to the non-
linear sorting time, of course.

I'm not sure about it being really negligible. It's linear, but sorting with O(n*log(n)) comes quite close. Moreover, the input iterable might be already nearly sorted and then TimSort gets very fast. Additionally, the increased memory footprint might lead to a lot of cache misses.... I can't tell how big the performance impact may get in typical case, can you?

Louis Wasserman

unread,
Jun 4, 2012, 4:49:34 PM6/4/12
to Maaartin G, guava-...@googlegroups.com
I mean, on one end, we could reimplement ArrayList completely for ourselves so we do only one copy overall, but that would...be overkill, I think it's fair to say.

All that said, I know these days sorting has a pretty low constant factor, but I think a single System.arraycopy -- which is what it reduces to -- has a pretty low constant factor itself.

Martin Buchholz

unread,
Jun 4, 2012, 8:57:50 PM6/4/12
to Louis Wasserman, Maaartin G, guava-...@googlegroups.com
On Mon, Jun 4, 2012 at 1:49 PM, Louis Wasserman <wasserm...@gmail.com> wrote:
I mean, on one end, we could reimplement ArrayList completely for ourselves so we do only one copy overall, but that would...be overkill, I think it's fair to say.

All that said, I know these days sorting has a pretty low constant factor, but I think a single System.arraycopy -- which is what it reduces to -- has a pretty low constant factor itself.

I would say that general purpose libraries that aim to be used by everyone should try harder to achieve close to optimal performance.  Using arrays in the implementation to minimize copies is a reasonable technique. 

The biggest cost of copying is pressure on the garbage collector rather than the cycles due to the memory copy itself (which modern computers do very well these days). 

Martin

eric.giese

unread,
Jun 5, 2012, 3:26:58 AM6/5/12
to guava-discuss
I have to defend Louis standpoint. I had created a collection library
once by myself, and I know it can be very tricky to achieve a
"perfect" implementation without large amounts of duplicate code or
overcomplex APIs (at least internally).
The most important design guideline for a commonly used API should be
a simple API, a bugfree implementation and reasonable performance. I
guess guava handles these priorities pretty well. A high performance
can be achieved later on, if needed.

I guess what guava in general might need is a package-internal
structure which roughly resembles ArrayList, but is more an
ArrayBuilder. In a lot of cases, first an ArrayList is created which
is later on used to create an Array. Internal Builders might help
there, to reuse data buffers when possible. I used these a lot to
prevent redundant copying when possible.

Without such a change, I guess my suggested optimization above is
rather senseless, since Iterables.toArray() first creates an ArrayList
and then calls toArray. So at least for the mutable sortedCopy
implementation, the improvement has no effect.

Maaartin G

unread,
Jun 5, 2012, 10:04:25 AM6/5/12
to guava-...@googlegroups.com
On Tuesday, June 5, 2012 9:26:58 AM UTC+2, eric.giese wrote:
I guess what guava in general might need is a package-internal
structure which roughly resembles ArrayList, but is more an
ArrayBuilder. In a lot of cases, first an ArrayList is created which
is later on used to create an Array. Internal Builders might help
there, to reuse data buffers when possible. I used these a lot to
prevent redundant copying when possible.

Using an own re-implementation of ArrayList would leave us in most cases with an internal array larger then needed. Such an array could be actually used for the subsequent sorting with some hassle. Alternatively, there's a simple trick for the ArrayBuilder eliminating most of the copying during the list creation: Use ArrayList<Object[]> where the elements' sizes grow themselves. The only operations needed are "add" and "toArray", both rather trivial. Or do you use something smarter?
 
Without such a change, I guess my suggested optimization above is
rather senseless, since Iterables.toArray() first creates an ArrayList
and then calls toArray. So at least for the mutable sortedCopy
implementation, the improvement has no effect.

I'm afraid, you're right. If it were possible for the standard ArrayList to take over the supplied array, the final copying could be saved.

Kevin Bourrillion

unread,
Jun 5, 2012, 10:25:05 AM6/5/12
to Maaartin G, guava-...@googlegroups.com
On Tue, Jun 5, 2012 at 7:04 AM, Maaartin G <graj...@seznam.cz> wrote:
Using an own re-implementation of ArrayList would leave us in most cases with an internal array larger then needed. Such an array could be actually used for the subsequent sorting with some hassle. Alternatively, there's a simple trick for the ArrayBuilder eliminating most of the copying during the list creation: Use ArrayList<Object[]> where the elements' sizes grow themselves. The only operations needed are "add" and "toArray", both rather trivial. Or do you use something smarter?

Note: I prototyped pretty much exactly this a year or two ago, and could dig up that code if anyone plans to follow up on the idea. It wasn't clear enough to me at the time that it would be a big win for me to continue with it


--
Kevin Bourrillion @ Google

Louis Wasserman

unread,
Jun 5, 2012, 11:04:22 AM6/5/12
to eric.giese, guava-discuss
Actually, FYI, Ordering.immutableSortedCopy is already implemented with a toArray(), sort the array, and then use that array to directly back the ImmutableList.  I don't know why I didn't check this myself.
On Mon, Jun 4, 2012 at 8:50 AM, eric.giese <eric...@googlemail.com> wrote:

eric.giese

unread,
Jun 5, 2012, 1:42:12 PM6/5/12
to guava-discuss
Duh. I always use guava together with a source package of the latest
version (12), but I did not look up the current source in the
repository.

Sorry, but great to that it had already been changed. :-)
Reply all
Reply to author
Forward
0 new messages