Results for: C#, F# Algorithm: Faster than QuickSort?


Ben

Aug 29, 2012, 7:34:06 PM
to orc...@googlegroups.com
Tentative results:

1 - bawr, AvgRatio: 23.303, SD: 0.062, WorstCase: 23.201
2 - Zaher, AvgRatio: 17.245, SD: 0.556, WorstCase: 16.822
3 - c6c, AvgRatio: 11.495, SD: 0.413, WorstCase: 10.428
4 - SoftwareBender, AvgRatio: 10.259, SD: 0.033, WorstCase: 10.204
5 - ErwinReid, AvgRatio: 9.879, SD: 0.052, WorstCase: 9.785
6 - V_Tom_R, AvgRatio: 9.713, SD: 0.022, WorstCase: 9.666
7 - Mihai, AvgRatio: 9.656, SD: 0.097, WorstCase: 9.425
8 - Renze, AvgRatio: 9.219, SD: 0.303, WorstCase: 8.438
9 - MoustafaS, AvgRatio: 9.069, SD: 0.368, WorstCase: 8.362
10 - aus1, AvgRatio: 5.025, SD: 0.025, WorstCase: 4.961

All the code submissions and benchmarks can be downloaded from here.

I would be interested to see what results others get on their machines, so if you can post your results on this thread, that would be great.

Some thoughts:
- I was surprised no one used unmanaged code.
- Has anyone tried a "bucket sort" implementation? (If so, what was the result?)

Something along the lines of: grab the first and last DateTime values, subdivide that range into n buckets, then use an insertion sort over each bucket. Each bucket shouldn't contain more than 15 or so elements. (A rough sketch follows this list.)

- CLR via C# has a good chapter on arrays and how to get the best performance out of them.
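
Here is that rough, untested sketch of the bucket sort idea in C#. It is illustrative only, not any contestant's entry; the method name, the use of Ticks for bucketing, and the target bucket size of 15 are my assumptions.

using System;
using System.Collections.Generic;

public static class BucketSortSketch
{
    // Bucket each value by its position within the overall tick range,
    // then insertion-sort each (hopefully small) bucket.
    public static DateTime[] BucketSort(DateTime[] values)
    {
        if (values.Length < 2) return values;

        long min = long.MaxValue, max = long.MinValue;
        foreach (var v in values)
        {
            if (v.Ticks < min) min = v.Ticks;
            if (v.Ticks > max) max = v.Ticks;
        }
        if (min == max) return values; // all values are equal

        // Aim for roughly 15 elements per bucket, as suggested above.
        int bucketCount = Math.Max(1, values.Length / 15);
        var buckets = new List<DateTime>[bucketCount];
        for (int i = 0; i < bucketCount; i++) buckets[i] = new List<DateTime>();

        foreach (var v in values)
        {
            // Map the tick value proportionally onto [0, bucketCount - 1].
            int index = (int)((double)(v.Ticks - min) / (max - min) * (bucketCount - 1));
            buckets[index].Add(v);
        }

        // Insertion-sort each bucket, then concatenate in bucket order.
        var result = new DateTime[values.Length];
        int pos = 0;
        foreach (var bucket in buckets)
        {
            for (int i = 1; i < bucket.Count; i++)
            {
                DateTime key = bucket[i];
                int j = i - 1;
                while (j >= 0 && bucket[j] > key)
                {
                    bucket[j + 1] = bucket[j];
                    j--;
                }
                bucket[j + 1] = key;
            }
            foreach (var v in bucket) result[pos++] = v;
        }
        return result;
    }
}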


The next step will be to come up with better benchmarking data, and perhaps to run a competition which ranks entries by taking an average over multiple different benchmark data sets.
If you have suggestions for possible data set candidates, please post them on this thread. (I already have a few, which you will find in the code: GetDateRanges2() and GetDateRanges3().)

The winner for this competition will be announced once everyone has had a chance to look at the code and provide feedback.

Please keep your comments and discussions concerning this competition on this thread, as that makes it easier to manage and read.

Cheers,

Ben

Ben

Aug 29, 2012, 7:39:47 PM
to orc...@googlegroups.com
Forgot to mention: please double-check that the code included in the project is indeed your last submitted code.

Renze de Waal

Aug 30, 2012, 3:35:32 AM
to orc...@googlegroups.com
Hello,

A comment:
bawr's solution returns a LINQ query without a ToArray or ToList at the end, postponing part of the work (creating the DateTime instances) until the result is enumerated. This moves part of the work needed to obtain the result outside the time measurement. Adding a ToList() or ToArray() (pulling that work back inside the timing) will change the order of the results.
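
To see how much this matters, here is a minimal, self-contained demo of the deferred-execution effect (illustrative only; this is not bawr's code):

using System;
using System.Diagnostics;
using System.Linq;

class DeferredExecutionDemo
{
    static void Main()
    {
        int[] source = Enumerable.Range(0, 1000000).Reverse().ToArray();

        var sw = Stopwatch.StartNew();
        var lazy = source.OrderBy(x => x); // OrderBy is lazy: almost no work happens here
        sw.Stop();
        Console.WriteLine("Without ToArray: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        var eager = source.OrderBy(x => x).ToArray(); // ToArray forces the sort to run now
        sw.Stop();
        Console.WriteLine("With ToArray:    {0} ms", sw.ElapsedMilliseconds);
    }
}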

Is this covered by the "There are no rules" rule, or should bawr's solution be revised?

I hope to reply later today with some test cases that cause some of the solutions to produce incorrect results.

Cheers,
Renze.

Renze de Waal

Aug 30, 2012, 5:27:11 AM
to orc...@googlegroups.com
Hello,

Below you will find a simple test case. Please fill in the following names for X, and you will find that the test case fails:
- Zaher
- ErwinReid
- Mihai
- bawr

I have not been able to find test cases for which the other solutions fail (I have deliberately excluded the empty input list from the test case candidates).

Cheers,
Renze.

[TestMethod]
public void GetSortedDateTimes_SimpleInput_X()
{
    DateTime dateNow = DateTime.Now;

    var orderedDateRanges = new List<DateRange>();
    orderedDateRanges.Add(new DateRange(dateNow, dateNow.AddMinutes(9)));
    orderedDateRanges.Add(new DateRange(dateNow.AddMinutes(69), dateNow.AddMinutes(69)));
    var result = orderedDateRanges.GetSortedDateTimes("X");

    var correctDateTimes = new List<DateTime>();
    orderedDateRanges.ForEach(x => correctDateTimes.AddRange(new List<DateTime> { x.StartTime, x.EndTime }));
    correctDateTimes.Sort();

    Assert.Equal(correctDateTimes, result);
}

Ben Orca

Aug 30, 2012, 6:43:28 AM
to orc...@googlegroups.com
Hi Renze,

Your assumption is correct. The LINQ queries must be executed within the benchmark measurement.
I will have a look and modify the code.

Cheers,

Ben






Ben Orca

Aug 30, 2012, 6:47:29 AM
to orc...@googlegroups.com
Interesting. I thought the unit test provided with the competition covered a "moment in time" (a range whose start and end are equal), but it obviously needs to be one right at the end of the list.
Thanks. I will include it in the unit tests, along with the empty-list scenario.

Cheers,
Ben


Renze de Waal

Aug 30, 2012, 6:57:59 AM
to orc...@googlegroups.com
Ben,

I assume it was by accident, but the size of the benchmark (numberOfDateRanges) has changed from 1000000 to 100000.

Cheers,
Renze.

Ben Orca

Aug 30, 2012, 6:59:53 AM
to orc...@googlegroups.com
Yes, that was an accident! I remember setting it to 1e5 because 1e6 was taking too long. I don't think it would change the rankings that much, though... Let me see.


Moustafa El-Sayed

Aug 30, 2012, 7:49:19 AM
to orc...@googlegroups.com
Yeah, I think you should check all the solutions that use LINQ. I created an all-LINQ solution and it gave me a ratio of 120000 :).

Zaher Al-Sibai

Aug 30, 2012, 8:53:09 AM
to orc...@googlegroups.com
Hi Ben,

Actually, when I was working on the contest I was using the Release configuration, and the average ratio was more than 24; when I ran in Debug mode I was getting more than 14. So I submitted my code knowing that it was faster than the other implementations in both configurations. When you updated the results, I could tell from the numbers that you were using the Debug configuration, because they were similar to mine.

Now, when I set the old Orcomp solution to use the Release config, I get an average ratio of more than 24, but when I use the new solution that has the code of all contestants, in Release mode I only get more than 18.

So I deleted the other contestants' code and ran it again; this time I got an average ratio of more than 21!! I believe this is caused by the compiler or the JIT being unable to properly optimize the code, so this is not the proper way to compare the performance of each implementation against QuickSort. Each implementation should be tested separately, or should at least reside in a different class.

I'm interested to hear what the others think about this issue.

Kind regards,

Zaher.

Renze de Waal

Aug 30, 2012, 3:17:57 PM
to orc...@googlegroups.com
Zaher, Ben, all,

To test this, I have separated the solutions into separate methods (not one method with a lot of ifs, but ten methods with different names and otherwise the same signature). I then altered the benchmark so that each contestant's method is passed in as a delegate. The second thing I did was change the benchmark size from 1e5 back to 1e6 (what it originally was). I did not change bawr's solution (ToArray/ToList).
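
For those following along, the delegate approach looks something like the sketch below (illustrative names only, not the actual benchmark code):

using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Benchmark
{
    // Shared signature: every entry takes the date ranges and returns the sorted DateTimes.
    public delegate List<DateTime> SortEntry(List<DateRange> ranges);

    // Each entry stays in its own method and class, so the JIT optimizes it in isolation.
    public static double MeasureMilliseconds(SortEntry entry, List<DateRange> ranges, int runs)
    {
        var sw = new Stopwatch();
        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            entry(ranges);
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best; // best-of-n reduces noise from GC and OS scheduling
    }
}

// Usage, assuming each contestant exposes a method with the shared signature:
// double ms = Benchmark.MeasureMilliseconds(RenzeSolution.GetSortedDateTimes, data, 5);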

Below I have gathered four different sets of timings:

I agree that keeping the methods separate is cleaner, and it yields different results from combining them. From the results I suspect that you did your measurements with the 1e5 benchmark size, but I cannot be sure.

From Friday to Sunday I will probably not be able to reply (I'm going away for a long weekend).

Cheers,

Renze.

(A) Separate methods (delegates), benchmark size 1e6

1 - bawr, AvgRatio: 24.667, SD: 0.621, WorstCase: 23.670
2 - Zaher, AvgRatio: 21.096, SD: 0.318, WorstCase: 20.623
3 - Renze, AvgRatio: 18.332, SD: 0.294, WorstCase: 17.752
4 - SoftwareBender, AvgRatio: 14.956, SD: 0.321, WorstCase: 14.465
5 - V_Tom_R, AvgRatio: 13.270, SD: 0.145, WorstCase: 13.121
6 - Mihai, AvgRatio: 13.208, SD: 0.200, WorstCase: 12.991
7 - c6c, AvgRatio: 10.290, SD: 0.516, WorstCase: 8.948
8 - MoustafaS, AvgRatio: 9.560, SD: 0.133, WorstCase: 9.294
9 - ErwinReid, AvgRatio: 9.232, SD: 0.150, WorstCase: 9.039
10 - aus1, AvgRatio: 6.081, SD: 0.139, WorstCase: 5.776

(B) Separate methods (delegates), benchmark size 1e5

1 - bawr, AvgRatio: 22.135, SD: 0.580, WorstCase: 20.942
2 - Zaher, AvgRatio: 19.785, SD: 0.406, WorstCase: 18.805
3 - Renze, AvgRatio: 14.846, SD: 0.629, WorstCase: 13.735
4 - SoftwareBender, AvgRatio: 13.241, SD: 0.197, WorstCase: 12.847
5 - V_Tom_R, AvgRatio: 12.305, SD: 0.376, WorstCase: 11.519
6 - Mihai, AvgRatio: 11.489, SD: 0.622, WorstCase: 9.942
7 - c6c, AvgRatio: 10.318, SD: 0.476, WorstCase: 9.118
8 - ErwinReid, AvgRatio: 9.355, SD: 0.340, WorstCase: 8.676
9 - MoustafaS, AvgRatio: 9.223, SD: 0.397, WorstCase: 8.400
10 - aus1, AvgRatio: 5.835, SD: 0.145, WorstCase: 5.502

(C) One method (ifs), benchmark size 1e6

1 - bawr, AvgRatio: 27.362, SD: 0.516, WorstCase: 26.433
2 - Zaher, AvgRatio: 21.361, SD: 0.052, WorstCase: 21.277
3 - SoftwareBender, AvgRatio: 14.435, SD: 0.194, WorstCase: 14.088
4 - Mihai, AvgRatio: 13.951, SD: 0.207, WorstCase: 13.561
5 - Renze, AvgRatio: 12.300, SD: 0.375, WorstCase: 11.871
6 - V_Tom_R, AvgRatio: 11.188, SD: 1.290, WorstCase: 8.921
7 - c6c, AvgRatio: 10.487, SD: 0.269, WorstCase: 9.992
8 - MoustafaS, AvgRatio: 10.357, SD: 0.294, WorstCase: 9.978
9 - ErwinReid, AvgRatio: 10.043, SD: 0.171, WorstCase: 9.756
10 - aus1, AvgRatio: 6.261, SD: 0.188, WorstCase: 5.862

(D) One method (ifs), benchmark size 1e5

1 - bawr, AvgRatio: 22.148, SD: 0.244, WorstCase: 21.606
2 - Zaher, AvgRatio: 16.193, SD: 0.188, WorstCase: 15.801
3 - SoftwareBender, AvgRatio: 12.877, SD: 0.172, WorstCase: 12.608
4 - Renze, AvgRatio: 12.520, SD: 0.195, WorstCase: 12.196
5 - Mihai, AvgRatio: 11.774, SD: 0.111, WorstCase: 11.592
6 - V_Tom_R, AvgRatio: 11.299, SD: 0.029, WorstCase: 11.237
7 - c6c, AvgRatio: 10.211, SD: 0.152, WorstCase: 9.903
8 - MoustafaS, AvgRatio: 9.940, SD: 0.929, WorstCase: 7.564
9 - ErwinReid, AvgRatio: 9.511, SD: 0.326, WorstCase: 8.882
10 - aus1, AvgRatio: 5.851, SD: 0.102, WorstCase: 5.599

V_TOM_R

Aug 30, 2012, 3:42:20 PM
to orc...@googlegroups.com
Hello,

In my opinion, using GetDateRanges3 to generate the benchmarking data would be much better, because in this contest most submissions (mine included) exploit the fact that the EndDates are already sorted. So the fastest array-merging algorithm wins (not the best sorting algorithm).
I tried to use a "bucket sort", but it gave worse results than Array.Sort; perhaps combining it with multithreading would give better results.
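
For concreteness, here is a minimal sketch of that merging idea (an illustration, not any contestant's actual submission), assuming the input ranges are sorted by StartTime and their EndTimes are also in ascending order:

public static DateTime[] MergeSortedEndpoints(IList<DateRange> ranges)
{
    int n = ranges.Count;
    var result = new DateTime[2 * n];
    int s = 0, e = 0, pos = 0;

    // Classic O(n) merge of the two already-sorted endpoint sequences.
    while (s < n && e < n)
    {
        if (ranges[s].StartTime <= ranges[e].EndTime)
            result[pos++] = ranges[s++].StartTime;
        else
            result[pos++] = ranges[e++].EndTime;
    }
    while (s < n) result[pos++] = ranges[s++].StartTime;
    while (e < n) result[pos++] = ranges[e++].EndTime;

    return result;
}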

/Tom

Ben Orca

Aug 31, 2012, 6:52:51 PM
to orc...@googlegroups.com
Thanks for pointing this out. Looks like Renze has found the cause and provided a solution. I will incorporate his solution into the project.


Ben Orca

Aug 31, 2012, 6:53:59 PM
to orc...@googlegroups.com
Thanks for showing the updates. Hopefully I will have some time over the weekend to have a look at this.
Cheers,

Ben



Ben Orca

Aug 31, 2012, 6:58:27 PM
to orc...@googlegroups.com
Hi Tom,

Yes, you make some good observations, and thanks for the feedback on the "bucket sort" result.

I would like to set up another competition that uses different benchmarking data, and I am open to suggestions. GetDateRanges3() is the worst-case scenario I could think of, but I would be interested to see what benchmark data others come up with.

Cheers,

Ben



c6c

Sep 2, 2012, 7:14:59 AM
to orc...@googlegroups.com
Hi Ben,

I propose the following benchmarks. GetDateRanges4() generates a list based on randomly generated dates. GetDateRanges5() creates a list with the EndTimes in descending order. I also suggest using the same amount of data in every benchmark; currently GetDateRanges3() generates a list of only 250000 elements.

public static IEnumerable<DateRange> GetDateRanges4(int count)
{
    var date = DateTime.Now;
    var r = new Random();
    var dateRanges = new List<DateRange>(count);

    for (int i = 0; i < count; i++)
    {
        DateTime startTime = date.AddSeconds(r.Next(-1000000, 1000000));
        DateTime endTime = startTime.AddSeconds(r.Next(0, 1000000));
        dateRanges.Add(new DateRange(startTime, endTime));
    }

    dateRanges.Sort();
    return dateRanges;
}

public static IEnumerable<DateRange> GetDateRanges5(int count)
{
    var date = DateTime.Now;
    var dateRanges = new DateRange[count];

    for (int i = 0; i < count; i++)
    {
        dateRanges[count - i - 1] = new DateRange(date.AddMinutes(-i), date.AddMinutes(i));
    }

    return new List<DateRange>(dateRanges);
}

Cheers

Robert

Ben

Sep 5, 2012, 7:33:50 AM
to orc...@googlegroups.com

Hi Robert,

Thanks for the methods. I will make sure that all benchmark data sets are the same length.
We can then take the average over 4 or 5 different benchmarks. I will try to work on this over the weekend and post back here before starting a new competition.

Cheers,

Ben

Renze de Waal

Sep 6, 2012, 4:08:25 PM
to orc...@googlegroups.com
Hello,

I would like to suggest two things:

1) The random benchmarks may increase the variance of the ratios or time measurements if the different entries are presented with different data. This can be solved by using a fixed seed for the random number generator in each benchmark. Alternatively, the random generator can be used once to generate a benchmark and serialize it (so it can be deserialized for the actual benchmark measurements).

2) Please add a test case that randomly generates a large number of test inputs (say 1000000) and compares the result of each entry with the reference implementation. This way, entries can be checked and fixed before they are submitted.
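
For illustration, here is a hedged sketch of both suggestions, reusing c6c's GetDateRanges4() from earlier in this thread; ReferenceSort and AllEntries are hypothetical names, not part of the project:

// Suggestion 1: a seed parameter makes the "random" benchmark reproducible,
// so every entry is measured on identical data.
public static IEnumerable<DateRange> GetDateRanges4(int count, int seed)
{
    var date = new DateTime(2012, 9, 6); // fixed base date, also needed for reproducibility
    var r = new Random(seed);            // fixed seed => the same sequence on every run
    var dateRanges = new List<DateRange>(count);

    for (int i = 0; i < count; i++)
    {
        DateTime startTime = date.AddSeconds(r.Next(-1000000, 1000000));
        DateTime endTime = startTime.AddSeconds(r.Next(0, 1000000));
        dateRanges.Add(new DateRange(startTime, endTime));
    }

    dateRanges.Sort();
    return dateRanges;
}

// Suggestion 2: fuzz every entry against the reference implementation.
[TestMethod]
public void AllEntries_MatchReference_OnRandomData()
{
    for (int trial = 0; trial < 1000; trial++)
    {
        var data = new List<DateRange>(GetDateRanges4(1000, trial));
        var expected = ReferenceSort(data);   // hypothetical reference implementation
        foreach (var entry in AllEntries)     // hypothetical list of entry delegates
        {
            CollectionAssert.AreEqual(expected, entry(data));
        }
    }
}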

Cheers,
Renze.

Ben Orca

Oct 2, 2012, 6:07:27 AM
to orc...@googlegroups.com
Hi everybody,

Sorry for the long delay. I have finally had time to merge all the changes Renze and c6c suggested, and I have created a GitHub account, which will make things a lot easier in the future. You can now find the latest Orcomp library at https://github.com/Orcomp/Orcomp

I have also run the benchmarks with the delegate approach, and these are the final results:

1 - Zaher, AvgRatio: 18.167, SD: 0.804, WorstCase: 16.136
2 - Renze, AvgRatio: 16.415, SD: 0.036, WorstCase: 16.343
3 - V_Tom_R, AvgRatio: 14.073, SD: 0.114, WorstCase: 13.919
4 - SoftwareBender, AvgRatio: 12.829, SD: 0.058, WorstCase: 12.739
5 - c6c, AvgRatio: 12.526, SD: 0.069, WorstCase: 12.380
6 - ErwinReid, AvgRatio: 11.452, SD: 0.036, WorstCase: 11.372
7 - Mihai, AvgRatio: 10.425, SD: 3.642, WorstCase: 5.725
8 - MoustafaS, AvgRatio: 10.024, SD: 0.186, WorstCase: 9.761
9 - aus1, AvgRatio: 7.972, SD: 0.186, WorstCase: 7.489
10 - bawr, AvgRatio: 5.317, SD: 0.240, WorstCase: 4.835


Congratulations to Zaher for winning this competition! I know there is an issue with one of the unit tests; however, the failing unit test was not part of the initial competition, so it does not count. It is now part of the new competition, though. :-)

The next proposed competition is already on GitHub under the "FasterThanQuicksortExtended" branch. Please feel free to fork the project. When I get a bit of time, I will create a competition on vWorker.

Cheers,

Ben