My implementation on Lab1.2, with question and code attached
From: Wei Lu <learza2...@gmail.com>
Date: Tue, 3 Aug 2010 14:38:20 -0500
Local: Tues, Aug 3 2010 3:38 pm
Subject: My implementation on Lab1.2, with question and code attached

Hey everyone,

I am trying to be the first one to discuss the lab exercise with some real code.
The reason for doing this is that some bugs are really annoying and subtle, and they
are easier to fix when the code is shown.

My idea for implementing the kernel is attached to this thread.
It currently compiles but produces wrong results. Using a small
failing case (5-by-5-by-5), I found that everything seemed fine except that the shared memory
variable which holds the current frame,
sh_current[(j + 1) % BLOCK_SIZE_TOTAL][i % BLOCK_SIZE_TOTAL]
was not computed correctly for a certain (j, k). However, I got the right value by removing
the i < nx - 1 condition from w_core_boundary.
This is really weird, and I do not even understand why that change would affect the result.
Is anyone interested in taking a look?

Btw, I also changed tx and ty in main.cu to BLOCK_SIZE_TOTAL for my kernel.
I can attach my source code and the failing test case if anyone asks.

Thanks,
Wei


 
From: Liwen Chang <ddd...@gmail.com>
Date: Tue, 3 Aug 2010 14:45:04 -0500
Local: Tues, Aug 3 2010 3:45 pm
Subject: Re: [Many-core Processors] My implementation on Lab1.2, with question and code attached

You seem to have forgotten to copy data from current to bottom, and from top to
current.

Liwen

[Attachment: block2D_opt_2.png, 41K]

 
From: Rayne <learza2...@gmail.com>
Date: Tue, 3 Aug 2010 12:57:07 -0700 (PDT)
Local: Tues, Aug 3 2010 3:57 pm
Subject: Re: My implementation on Lab1.2, with question and code attached
I think that is already implemented in the "Update shared memory frames"
block.

On Aug 3, 2:45 pm, Liwen Chang <ddd...@gmail.com> wrote:


 
From: Xiao-Long Wu <xiaol...@illinois.edu>
Date: Tue, 3 Aug 2010 15:01:30 -0500
Local: Tues, Aug 3 2010 4:01 pm
Subject: Re: [Many-core Processors] My implementation on Lab1.2, with question and code attached

Another issue I saw is the __syncthreads() call inside the if-statement. You must make sure that all threads in the block go into the if-statement and reach the barrier. Otherwise, your kernel will probably hang there.
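
A minimal sketch of the problem and one common way around it, assuming a hypothetical 1D kernel. The names shift_left, TILE, in, out, and n are illustrative and are not taken from Wei's attachment, and unified memory is used only to keep the host code short. The bounds check guards the loads and stores, while the barrier stays outside the conditional so that every thread in the block reaches it:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 8  // hypothetical block size, chosen only for this sketch

// Each thread stages one element in shared memory, then reads its right
// neighbour's element, so out[i] = in[i + 1] (0 past the end).
__global__ void shift_left(const float *in, float *out, int n)
{
    __shared__ float tile[TILE + 1];
    int i = blockIdx.x * TILE + threadIdx.x;

    // Problematic pattern (do not do this):
    //   if (i < n) { tile[threadIdx.x] = in[i]; __syncthreads(); ... }
    // Threads with i >= n skip the barrier, and the kernel is then likely
    // to hang or misbehave.

    // Every thread writes its slot; out-of-range threads write a padding 0.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    if (threadIdx.x == 0) {
        int h = i + TILE;  // halo element just past this block's tile
        tile[TILE] = (h < n) ? in[h] : 0.0f;
    }

    __syncthreads();  // unconditional: reached by all threads in the block

    if (i < n)
        out[i] = tile[threadIdx.x + 1];
}

int main()
{
    const int n = 13;  // deliberately not a multiple of TILE
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { in[i] = (float)i; out[i] = -1.0f; }

    shift_left<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n);
    cudaDeviceSynchronize();

    // Compare against the host-side definition of the same shift.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        float expected = (i + 1 < n) ? (float)(i + 1) : 0.0f;
        if (out[i] != expected) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);
    cudaFree(in);
    cudaFree(out);
    return mismatches != 0;
}
```

The same idea carries over to a 2D or 3D tile: compute the boundary predicate once, use it around the memory accesses, and keep every barrier at a point that all threads of the block execute.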

Xiao-Long


 
From: Liwen Chang <ddd...@gmail.com>
Date: Tue, 3 Aug 2010 15:01:35 -0500
Local: Tues, Aug 3 2010 4:01 pm
Subject: Re: [Many-core Processors] Re: My implementation on Lab1.2, with question and code attached

OK, I was wrong.

But in each iteration, you actually read three planes of data from global memory.

In optimization 2, we want you to practice register tiling along the z
direction.
With it, you only need to read one plane of data from global memory per iteration.
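
To illustrate the idea, here is a minimal sketch of register tiling along z, assuming a 7-point stencil with illustrative coefficients c0 and c1. The kernel name stencil_regtile, the helper idx3, the BLOCK size, and the halo layout are assumptions made for this example; this is not the lab's reference solution or Wei's attached code, and unified memory is used only to shorten the host side. Each thread keeps bottom, current, and top in registers, shares only the current plane through shared memory, and issues one global read (the new top value) per iteration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 8  // hypothetical threads per tile edge, halo included

// Flattened index into an nx*ny*nz volume, x fastest.
__host__ __device__ inline int idx3(int i, int j, int k, int nx, int ny)
{
    return i + nx * (j + ny * k);
}

__global__ void stencil_regtile(const float *in, float *out,
                                int nx, int ny, int nz, float c0, float c1)
{
    __shared__ float sh_current[BLOCK][BLOCK];
    if (nz < 3) return;  // same decision for all threads, so no barrier is skipped

    int tx = threadIdx.x, ty = threadIdx.y;
    // Adjacent tiles overlap by two cells; edge threads only load the halo.
    int i = blockIdx.x * (BLOCK - 2) + tx;
    int j = blockIdx.y * (BLOCK - 2) + ty;
    bool in_plane = (i < nx) && (j < ny);
    bool compute = tx >= 1 && tx <= BLOCK - 2 && ty >= 1 && ty <= BLOCK - 2 &&
                   i <= nx - 2 && j <= ny - 2;

    // Registers hold this thread's z column: planes k-1, k, k+1.
    float bottom = 0.0f;
    float current = in_plane ? in[idx3(i, j, 0, nx, ny)] : 0.0f;
    float top = in_plane ? in[idx3(i, j, 1, nx, ny)] : 0.0f;

    for (int k = 1; k < nz - 1; ++k) {
        bottom = current;  // rotate the column up by one plane
        current = top;
        top = in_plane ? in[idx3(i, j, k + 1, nx, ny)] : 0.0f;  // single global read

        sh_current[ty][tx] = current;
        __syncthreads();  // current plane is visible to the whole block
        if (compute) {
            float sum = sh_current[ty][tx - 1] + sh_current[ty][tx + 1] +
                        sh_current[ty - 1][tx] + sh_current[ty + 1][tx] +
                        bottom + top;
            out[idx3(i, j, k, nx, ny)] = c1 * sum - c0 * current;
        }
        __syncthreads();  // finish reading before the next plane overwrites it
    }
}

int main()
{
    const int nx = 5, ny = 5, nz = 5, n = nx * ny * nz;
    const float c0 = 6.0f, c1 = 1.0f;  // illustrative coefficients
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int t = 0; t < n; ++t) { in[t] = (float)((t * 7) % 11); out[t] = 0.0f; }

    dim3 block(BLOCK, BLOCK);
    dim3 grid((nx - 2 + BLOCK - 3) / (BLOCK - 2), (ny - 2 + BLOCK - 3) / (BLOCK - 2));
    stencil_regtile<<<grid, block>>>(in, out, nx, ny, nz, c0, c1);
    cudaDeviceSynchronize();

    // Host reference over interior points, summed in the same order as the kernel.
    int mismatches = 0;
    for (int k = 1; k < nz - 1; ++k)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i) {
                float sum = in[idx3(i - 1, j, k, nx, ny)] + in[idx3(i + 1, j, k, nx, ny)] +
                            in[idx3(i, j - 1, k, nx, ny)] + in[idx3(i, j + 1, k, nx, ny)] +
                            in[idx3(i, j, k - 1, nx, ny)] + in[idx3(i, j, k + 1, nx, ny)];
                float ref = c1 * sum - c0 * in[idx3(i, j, k, nx, ny)];
                if (out[idx3(i, j, k, nx, ny)] != ref) ++mismatches;
            }
    printf("mismatches: %d\n", mismatches);
    cudaFree(in);
    cudaFree(out);
    return mismatches != 0;
}
```

Note that both barriers sit outside the if (compute) branch, so halo threads and out-of-range threads still reach them.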

Liwen


 
From: Wei Lu <learza2...@gmail.com>
Date: Tue, 3 Aug 2010 15:54:50 -0500
Local: Tues, Aug 3 2010 4:54 pm
Subject: Re: [Many-core Processors] My implementation on Lab1.2, with question and code attached

Yes, the program passes my previously failing case now. Thanks, Xiao-Long.

On Aug 3, 2010, at 3:01 PM, Xiao-Long Wu wrote:


 
From: Wei Lu <learza2...@gmail.com>
Date: Tue, 3 Aug 2010 15:58:16 -0500
Local: Tues, Aug 3 2010 4:58 pm
Subject: Re: [Many-core Processors] Re: My implementation on Lab1.2, with question and code attached

Thanks, Liwen.
Now I see that your suggestion is better, because reading from registers is faster than reading from shared memory.

On Aug 3, 2010, at 3:01 PM, Liwen Chang wrote:


 