Lab 1 question
From: Xiaowei Zhan <zha...@gmail.com>
Date: Fri, 6 Aug 2010 15:38:20 -0400
Subject: Lab 1 question
Hello,

My idea is to split each xy plane into 16x16 tiles and launch each GPU
block with 18x18 threads, so that every thread loads one element (tile
plus halo) into shared memory in a single pass.
However, the code below does not work as I expected.
Could a TA help with Lab 1?

In main.cu:

  dim3 block_new(BLOCK_SIZE_TOTAL,BLOCK_SIZE_TOTAL);
  dim3 grid_new( (nx-2)/BLOCK_SIZE_CORE, (ny-2)/BLOCK_SIZE_CORE);
  block2D_opt_2<<<grid_new, block_new>>>(fac, d_A0, d_Anext, nx, ny,  nz);

In kernel1.2.cu:

__global__ void block2D_opt_2(float fac, float *A0, float *Anext,
                              int nx, int ny, int nz)
{
    //For optimization 2
    int inx=threadIdx.x;
    int iny=threadIdx.y;

    // coordinates at original nx * ny matrix
    int i = blockIdx.x*(blockDim.x-2)+inx;
    int j = blockIdx.y*(blockDim.y-2)+iny;

    __shared__ float sh_A0[BLOCK_SIZE_TOTAL][BLOCK_SIZE_TOTAL];
    sh_A0[iny][inx]=0.0f;
    __syncthreads();

    bool w_region = (i > 0) && (j > 0) && (i < nx-1) && (j < ny-1);
    for(int k=1;k<nz-1;k++) {
        sh_A0[iny][inx]=A0[Index3D (nx, ny, i, j, k)];
        __syncthreads();

        if(w_region)
        {
            Anext[Index3D (nx, ny, i, j, k)] =
                A0[Index3D (nx, ny, i, j, k + 1)] +
                A0[Index3D (nx, ny, i, j, k - 1)] +
                sh_A0[iny+1][inx  ]  +
                sh_A0[iny-1][inx  ]  +
                sh_A0[iny  ][inx+1]  +
                sh_A0[iny  ][inx-1]
                - 6.0f * sh_A0[iny][inx] / (fac*fac);
        }
        __syncthreads();
    }

}

Xiaowei

 
From: Liwen Chang <ddd...@gmail.com>
Date: Fri, 6 Aug 2010 16:59:18 -0500
Subject: Re: [Many-core Processors] Lab 1 question

OK, in this code you set the block size to 18x18 threads,
but each block only computes a 16x16 tile of output.
That means you need to add an if condition before you do the computation.
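
The guard described above could be sketched as a small predicate. This is only an illustration, not the lab's reference solution: the helper name `is_core_thread` and the `HD` macro are hypothetical, and the values of `BLOCK_SIZE_CORE` (16) and `BLOCK_SIZE_TOTAL` (18) are assumed from the numbers quoted in this thread.

```cuda
// Assumed from the thread: a 16x16 output tile plus a one-cell halo on
// each side, so each block is launched with 18x18 threads.
#define BLOCK_SIZE_CORE  16
#define BLOCK_SIZE_TOTAL (BLOCK_SIZE_CORE + 2)

// Hypothetical convenience macro: lets the helper compile under nvcc
// (host and device) and under a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: true only for the 16x16 threads in the middle of
// the 18x18 block, i.e. thread indices 1..16 in both x and y.
HD inline bool is_core_thread(int tx, int ty)
{
    return tx >= 1 && tx <= BLOCK_SIZE_CORE &&
           ty >= 1 && ty <= BLOCK_SIZE_CORE;
}
```

Inside the kernel the check would wrap only the compute-and-store step, for example `if (is_core_thread(threadIdx.x, threadIdx.y)) { ... }`. The halo threads still have to perform their load and reach every `__syncthreads()`, so the barrier itself should stay outside the condition.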

Liwen


 
From: John Stratton <john.a.strat...@gmail.com>
Date: Fri, 6 Aug 2010 17:09:40 -0500
Subject: Re: [Many-core Processors] Lab 1 question

Remember that your threads overlap the 16x16 tile.  So the input index that
thread (0,0) should access isn't blockIdx*16 + tid, but blockIdx*16 + tid - 1.
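
As a rough illustration of that index shift, the mapping from (block, thread) to a global coordinate might look like the sketch below. The helper name `tile_to_global` and the `HD` macro are hypothetical, and the tile width of 16 is assumed from the thread.

```cuda
// Assumed from the thread: each block produces a 16-wide output tile.
#define BLOCK_SIZE_CORE 16

// Hypothetical convenience macro: lets the helper compile under nvcc
// (host and device) and under a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: global coordinate (along one axis) read by
// thread `tid` of block `bid`. Thread 0 is the halo cell just before
// the tile, threads 1..16 are the tile itself, and thread 17 is the
// halo cell just after it.
HD inline int tile_to_global(int bid, int tid)
{
    return bid * BLOCK_SIZE_CORE + tid - 1;
}
```

With this mapping, thread 0 of block 0 lands on coordinate -1, which lies outside the grid, so the load needs a bounds check. The output tiles also start at coordinate 0, so the launch would need enough blocks to cover all of nx and ny (rounded up to a multiple of 16) rather than nx-2 and ny-2.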

What Li-wen said also applies.  When computing from shared memory or
writing output, you need to check not just that the output is inside the
grid, but that the input is as well.  More than that, you only want
threads inside the 16x16 centered tile (threads 1 to 16 in x and y) to
compute and write output.
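
Putting these checks together, one possible shape for the kernel is sketched below. It is not the lab's reference solution: the kernel name `block2D_halo_sketch` is hypothetical, the `Index3D` layout (x fastest, then y, then z) is an assumption because its definition is not shown in this thread, and the stencil expression is copied unchanged from the original post.

```cuda
#include <cuda_runtime.h>

// Assumed from the thread: 16x16 output tile, 18x18 threads per block.
#define BLOCK_SIZE_CORE  16
#define BLOCK_SIZE_TOTAL (BLOCK_SIZE_CORE + 2)

// Assumed memory layout (not shown in the thread): x fastest, then y, then z.
#define Index3D(nx, ny, i, j, k) ((i) + (nx) * ((j) + (ny) * (k)))

// Hypothetical sketch. Launch with
//   dim3 block(BLOCK_SIZE_TOTAL, BLOCK_SIZE_TOTAL);
//   dim3 grid((nx + BLOCK_SIZE_CORE - 1) / BLOCK_SIZE_CORE,
//             (ny + BLOCK_SIZE_CORE - 1) / BLOCK_SIZE_CORE);
__global__ void block2D_halo_sketch(float fac, const float *A0, float *Anext,
                                    int nx, int ny, int nz)
{
    const int tx = threadIdx.x;
    const int ty = threadIdx.y;

    // Shift by one so that thread 0 lands on the halo cell before the tile.
    const int i = blockIdx.x * BLOCK_SIZE_CORE + tx - 1;
    const int j = blockIdx.y * BLOCK_SIZE_CORE + ty - 1;

    // Input check: is this thread's cell inside the grid at all?
    const bool in_grid = (i >= 0) && (i < nx) && (j >= 0) && (j < ny);
    // Only threads 1..16 in x and y own an output cell.
    const bool is_core = (tx >= 1) && (tx <= BLOCK_SIZE_CORE) &&
                         (ty >= 1) && (ty <= BLOCK_SIZE_CORE);
    // Output check: boundary cells of the plane are never written.
    const bool interior = (i > 0) && (i < nx - 1) && (j > 0) && (j < ny - 1);

    __shared__ float sh_A0[BLOCK_SIZE_TOTAL][BLOCK_SIZE_TOTAL];

    for (int k = 1; k < nz - 1; k++) {
        // Every thread loads (or zero-fills) exactly one shared-memory cell.
        sh_A0[ty][tx] = in_grid ? A0[Index3D(nx, ny, i, j, k)] : 0.0f;
        __syncthreads();

        if (is_core && interior) {
            Anext[Index3D(nx, ny, i, j, k)] =
                A0[Index3D(nx, ny, i, j, k + 1)] +
                A0[Index3D(nx, ny, i, j, k - 1)] +
                sh_A0[ty + 1][tx] +
                sh_A0[ty - 1][tx] +
                sh_A0[ty][tx + 1] +
                sh_A0[ty][tx - 1]
                - 6.0f * sh_A0[ty][tx] / (fac * fac);
        }
        // Keep the barrier outside the condition so all threads reach it
        // before the tile is overwritten in the next iteration.
        __syncthreads();
    }
}
```

Because a thread that passes both `is_core` and `interior` has all four xy neighbours inside the grid and inside the 18x18 shared tile, the shared-memory reads stay in bounds even when nx or ny is not a multiple of 16.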

--John


 