MergeSort returns unsorted array for large arrays

Victor Campmany

unread,

Jun 24, 2017, 3:04:08 PM6/24/17

to CUDPP

I am trying to use the mergeSort function in large array sizes (i.e. 10^8) but the output array that I get turns out not to be sorted. I tried to compile my file with CUDA 8 and CUDA 6.5. I am using a GTX Titan X (compute capability 5.2). I attachthe code that I'm using, I took the code from /app/simpleCUDPP/simpleCUDPP.cu and adapted it to call the mergesort algorithm. I might be doing something wrong, but I cannot discover what is wrong in the code.

Thanks!

mergesort_test.cu

John Owens

unread,

Jun 24, 2017, 3:06:22 PM6/24/17

to cu...@googlegroups.com

What does the output look like? Is it identical to the input, for instance? Does mergeSort do anything?

I'd actually advise moderngpu's merge sort these days. It's, well, more modern.

https://moderngpu.github.io/mergesort.html

JDO

> On Jun 24, 2017, at 10:43 AM, Victor Campmany <vcam...@gmail.com> wrote:
>
> I am trying to use the mergeSort function in large array sizes (i.e. 10^8) but the output array that I get turns out not to be sorted. I tried to compile my file with CUDA 8 and CUDA 6.5. I am using a GTX Titan X (compute capability 5.2). I attachthe code that I'm using, I took the code from /app/simpleCUDPP/simpleCUDPP.cu and adapted it to call the mergesort algorithm. I might be doing something wrong, but I cannot discover what is wrong in the code.
>
> Thanks!
>
>

> --
> You received this message because you are subscribed to the Google Groups "CUDPP" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to cudpp+un...@googlegroups.com.
> To post to this group, send email to cu...@googlegroups.com.
> Visit this group at https://groups.google.com/group/cudpp.
> For more options, visit https://groups.google.com/d/optout.
> <mergesort_test.cu>

Victor Campmany

unread,

Jun 24, 2017, 5:57:37 PM6/24/17

to CUDPP

Thanks for pointing me to moderngpu, but I for now I want to focus on this code since I'm planning to do some modifications on it.

The output is different to the input. I checked with cuda-memcheck and there is no memory error and with nvprof and the kernels are being executed.
I does not work for 10^8 array size, however it does work for sizes of 10^7 and lower.

John Owens

unread,

Jun 27, 2017, 12:56:18 AM6/27/17

to cu...@googlegroups.com

It's possible there's a limit somewhere in our code. What kind of keys are you using? I note in the paper, figure 7a shows our results up to ~16M fixed-size keys. (It may actually be that you're going beyond a memory-size limit that we faced 5 years ago, actually, and we never got to test with anything that big.)

Does the output appear partially sorted? If you run the same input many times, is the output always the same, or different? It would probably help to know at what size it stops working (just a guess, (2^24)-1 vs. (2^24)+1 would be worth testing).

JDO

Victor Campmany

unread,

Jun 27, 2017, 6:17:47 PM6/27/17

to CUDPP

I'm using unsigned ints, ints and floats for the keys.

For 2^24 and (2^24)+1 works with unsigned int and int types. It stops working somewhere between ( 2^24 + (2^24)/4 ) and ( 2^24 + (2^24)/2 ). The array is almost sorted, for example for size ( 2^24 + (2^24)/2 ) there is 181 keys that are wrong. By wrong I mean that the previous key is bigger than the current one, in the case of ascending sorting.

I also tried with float keys and it works for 2^21 but it does not work for (2^21)+1. In this case, for (2^21)+1 elements there is 60 wrong keys.

In all the cases running the code multiple times leads to the same output results.

John Owens

unread,

Jun 30, 2017, 11:51:49 PM6/30/17

to cu...@googlegroups.com

I'm afraid I'm going to have to say "we're not going to be able to debug this code". The student who wrote it has graduated a couple of years ago (and has a new baby). Our NSF support for maintaining this library expired last summer. I'm sympathetic and curious, but we just don't have anyone who can sit down and grind through this and figure out what's wrong.

JDO

Marco

unread,

Jul 3, 2017, 7:11:40 PM7/3/17

to CUDPP

This answer is disappointing (understandable, and not unusual, but still ... disappointing - does this mean that CUDPP in general is basically abandoned?)

I tried to have a look at the code, but probably won't be able to chew through all this in reasonable time. There are some `define`d constants, and one could probably mess around with them just "to see whether it works then", but a systematic approach would involve more effort.

Just one minor question: The testrig, around https://github.com/cudpp/cudpp/blob/master/apps/cudpp_testrig/test_mergesort.cpp#L228 , seems to test mergesort with larger input sizes - do these tests work for you?
(Also: For which compute capability did you compile the library?)

John Owens

unread,

Jul 3, 2017, 8:08:39 PM7/3/17

to cu...@googlegroups.com

We'll try to answer CUDPP questions, but yes, the funding to maintain it expired last summer (2016) and while I can sometimes persuade *current* students who wrote code in it to look if there's a bug report, it's not reasonable to ask someone who's graduated to go back and debug old code on his/her own time.

The testrig, upon every release we've done, has passed (completely), on whatever machines we've tested it on (I think those are in the release notes). The cmake script shows how we've compiled it (in terms of compute capability), but of course you can add all your own. It takes a long time to compile, as you know, so that's why we try to do it once (at compile time) instead of doing runtime compilation (for a compute capability we didn't target).

JDO

> On Jul 3, 2017, at 4:11 PM, Marco <goo...@javagl.de> wrote:
>
> This answer is disappointing (understandable, and not unusual, but still ... disappointing - does this mean that CUDPP in general is basically abandoned?)
>
> I tried to have a look at the code, but probably won't be able to chew through all this in reasonable time. There are some `define`d constants, and one could probably mess around with them just "to see whether it works then", but a systematic approach would involve more effort.
>

> Just one minor question: The testrig, around https://github.com/cudpp/cudpp/blob/master/apps/cudpp_testrig/test_mergesort.cpp#L228, seems to test mergesort with larger input sizes - do these tests work for you?

Reply all

Reply to author

Forward