Lab 3 Table 2

Mainak Sarkar

Aug 5, 2010, 2:01:31 PM
to VSCSE Many-core Processors 2010
I have some questions about filling in Table 2 of Lab 3. This
is the output I get when I run with parameters 80 and 16.

Reading parameters
Number of samples = 30144488
Grid Size = 128x128x128
Kernel Width = 3.000000
GPU Binsize = 80
Stddev: 16.00
Reading input data from files
Running CUDA version
Time taken for binning is 19.676022
Time taken for GPU computing is 0.000064
Size of CPU workload is 11535370 samples
Time taken for CPU computing is 19.590949
Total CUDA runtime is 41.121781
Pass
Parboil parallel benchmark suite, version 0.2

From this, how do I calculate:

% of wasted bin locations (is it Size of CPU workload / Number of samples * 100?)

GPU computing time (is it 0.000064 or 41.121781?)

Total execution time = ?

Thanks,
Mainak

Heeseok Kim

Aug 5, 2010, 2:34:21 PM
to vscse-many-core...@googlegroups.com
Hi,

The number of bins in the simulation volume equals the "Grid Size" in the message below. Each bin can hold at most 80 elements. Thus, the total memory allocated would be (128*128*128) * 80 * sizeof(element).

Meanwhile, the number of elements captured by bins is 30144488 (Number of samples) - 11535370 (Size of CPU workload) = 18609118. In this case, the utilization of the bins is 18609118 / (128*128*128*80) ~= 0.11.

I agree that the message below needs some interpretation. As for GPU computing time, it refers to the time spent on gridding, step 5 of the algorithm as shown on page 31 of the manual. Total CUDA runtime is the total runtime, summing both GPU time and CPU time.
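The arithmetic above can be checked with a short script (a sketch; the 24-byte element size is taken from a later message in this thread and is an assumption here):

```python
# Bin allocation size and utilization from the numbers in this thread.
n_samples    = 30144488   # "Number of samples"
cpu_workload = 11535370   # "Size of CPU workload"
grid         = 128 * 128 * 128
bin_size     = 80
elem_bytes   = 24         # assumed element size (see later in the thread)

binned = n_samples - cpu_workload          # elements captured by bins
capacity = grid * bin_size                 # total bin slots allocated
utilization = binned / capacity

print(f"allocated: {capacity * elem_bytes / 2**30:.2f} GiB")  # 3.75 GiB
print(f"utilization: {utilization:.2f}")                      # 0.11
```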

Mario DAngelo

Aug 5, 2010, 2:58:42 PM
to VSCSE Many-core Processors 2010
So with each bin maxing out at 80 elements, is the 112 bin-size row of
Table 2 correct in returning a memory allocation error, or does it
require a memFree()?

Heeseok Kim

Aug 5, 2010, 3:21:12 PM
to vscse-many-core...@googlegroups.com
Using 112 instead of 80 increases the capacity of each bin accordingly. It shouldn't fail - I hope - to allocate the memory. The expected observation is that even with 40% more bin capacity, the performance increase is not dramatic: due to the non-uniformity of the input distribution, only a small portion of the bins benefit from the increased capacity.
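A toy model illustrates why the bigger bin helps so little on skewed input (the bin count, sample count, and distribution below are made up for illustration, not the lab's actual data):

```python
import random

def cpu_overflow(bin_ids, n_bins, capacity):
    """Count samples that spill to the CPU once their bin is full."""
    counts = [0] * n_bins
    overflow = 0
    for b in bin_ids:
        if counts[b] < capacity:
            counts[b] += 1
        else:
            overflow += 1
    return overflow

random.seed(42)
n_bins = 1000
# Heavy-tailed placement: a few "hot" bins receive most of the samples.
bin_ids = [int(random.paretovariate(1.0)) % n_bins for _ in range(200_000)]

over_80  = cpu_overflow(bin_ids, n_bins, 80)
over_112 = cpu_overflow(bin_ids, n_bins, 112)
# The 40% larger bins remove only a small fraction of the CPU workload,
# because only the few already-full bins can use the extra slots.
print(over_80, over_112)
```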

Christoph Kirsch

Aug 5, 2010, 4:25:35 PM
to VSCSE Many-core Processors 2010
I get a "Memory allocation error!" in the output for bin size 112 too.
It works for bin sizes up to 80.

david chen

Aug 5, 2010, 4:39:00 PM
to vscse-many-core...@googlegroups.com
I had the same error.

=======================================

David (Wei) Chen, Ph.D.

Mario DAngelo

Aug 5, 2010, 5:09:35 PM
to VSCSE Many-core Processors 2010
From the discussion, I think I understood it was a resource limitation
given the current number of concurrent users. Maybe off-peak hours
would be a better time to run the bin-112 set.

John Stratton

Aug 5, 2010, 5:41:56 PM
to vscse-many-core...@googlegroups.com
The real answer is that we're not sure: it could be that some AC cluster nodes have slightly different configurations, causing them to fail those allocations, or it could be concurrent users, or a combination, or something else entirely that we haven't thought of.

--John
================
John Stratton
217-621-9501
507 W Green St Apt 10
Champaign, IL 61820

ly...@lsu.edu

Aug 5, 2010, 6:01:11 PM
to VSCSE Many-core Processors 2010
It failed with the same error message even when I ran it within an
interactive job, so it doesn't appear to be a contention problem.

From a quick calculation, a bin size of 112 translates to
128*128*128*112*24 ~= 5.2 GB, which exceeds the 4 GB global memory
limit shown by deviceQuery, while 80 is roughly 3.8 GB. Could this be
the root of the problem?

Cheers,
Le


Heeseok Kim

Aug 5, 2010, 6:09:57 PM
to vscse-many-core...@googlegroups.com
Yes, I think so.
The design is simply too realistic(?) to be used in labs where hundreds of students are running their applications on limited resources. Real MRI image reconstruction is actually this size, by the way.

John Stratton

Aug 5, 2010, 6:11:58 PM
to vscse-many-core...@googlegroups.com
Huh, I didn't know that it was 24 bytes of data per point.  Yeah, that would do it. 
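Six single-precision floats per sample would account for the 24 bytes. A sketch (the field names below are hypothetical, not taken from the lab code):

```python
import struct

# A plausible 24-byte sample layout: six 4-byte floats, e.g. a k-space
# coordinate plus a complex value and a density-compensation weight.
# These field names are assumptions, not the lab's actual struct.
SAMPLE_FMT = "6f"   # kx, ky, kz, real, imag, sdc (assumed layout)
print(struct.calcsize(SAMPLE_FMT))   # 24
```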

--John



Nasser Anssari

Aug 6, 2010, 4:25:31 AM
to VSCSE Many-core Processors 2010
A bin capacity of 112 was deliberately chosen to give a "memory
allocation error". As mentioned in one of the replies, the size of the
bin array in this case exceeds the memory available to a GPU on the AC
cluster nodes. This was supposed to shed some light on the
memory-requirement limitations of the binning approach of Lab2b when
used with non-uniform data. An explicit note should probably be added
to the manual acknowledging the error in this case.

As you may have seen already, Part 2 of the lab starts with using a
bin capacity of 80 and goes all the way up to 176 without any memory
allocation errors. This is one advantage of the second approach.

Sorry for any confusion...