Reduce IntValue values are always equal to zero


kindohm

Jun 8, 2011, 9:59:20 PM
to Twister4Azure
When I run the word count example, or implement my own variation on
word count, the IntValues passed into the reducer are always zero.
Is there something I'm doing wrong? Here is the Reduce method being
used:

public override int Reduce(
    StringKey key,
    List<IntValue> values,
    string programArgs,
    IOutputCollector<StringKey, StringValue> outputCollector)
{
    var count = 0;
    foreach (var value in values)
    {
        // *** these values are always zero
        count += value.Value;
    }

    var newValue = new StringValue();
    newValue.value = count.ToString();
    outputCollector.Collect(key, newValue);

    return 0;
}

Here is the corresponding Map method from the Mapper:

protected override int Map(
    IntKey key,
    StringValue value,
    string programArgs,
    List<KeyValuePair<NullKey, NullValue>> dynamicData,
    IOutputCollector<StringKey, IntValue> outputCollector)
{
    // sample line:
    // 1/4/2011; John Doe; 2
    var line = value.GetTextValue();
    var parts = line.Split(';');
    var newKey = StringKey.GetInstance(parts[1].Trim());
    var newValue = IntValue.GetInstance(int.Parse(parts[2].Trim()));
    outputCollector.Collect(newKey, newValue);
    return 0;
}
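
One way to rule out the Map and Reduce code itself is to run the same parse-and-sum logic outside the framework. The following is only a minimal sketch under stated assumptions: it uses plain .NET types in place of Twister4Azure's StringKey, IntValue and IOutputCollector, and the WordCountCheck class and its method names are hypothetical, not part of Twister4Azure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Map/Reduce pair above, using plain .NET types
// instead of Twister4Azure's StringKey, IntValue and IOutputCollector.
public static class WordCountCheck
{
    // Mirrors Map: "1/4/2011; John Doe; 2" becomes the pair ("John Doe", 2).
    public static KeyValuePair<string, int> MapLine(string line)
    {
        var parts = line.Split(';');
        return new KeyValuePair<string, int>(
            parts[1].Trim(),
            int.Parse(parts[2].Trim()));
    }

    // Mirrors Reduce: groups the mapped pairs by key and sums their values.
    public static Dictionary<string, int> Run(IEnumerable<string> lines)
    {
        return lines
            .Select(MapLine)
            .GroupBy(pair => pair.Key)
            .ToDictionary(
                group => group.Key,
                group => group.Sum(pair => pair.Value));
    }
}
```

Passing the lines "1/4/2011; John Doe; 2" and "1/5/2011; John Doe; 3" to WordCountCheck.Run gives a total of 5 for "John Doe". If the totals are right here but zero in the job, the values are more likely being lost in the framework than in the Map or Reduce code.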


Thilina Gunarathne

Jun 9, 2011, 2:48:22 AM
to twiste...@googlegroups.com
Hi Mike,
Thanks for the feedback. It was due to a bug (introduced by a method extension experiment gone wrong) that slipped past us while we were testing the alpha-3 release. It seems we focused too much on the iterative MapReduce features for that release :(
 
The word count package at [1] should fix this particular issue. I added the corrected output format to the word count project itself, so you can get by with updating only that project. We are planning a more stable release, tested on Azure SDK 1.4, very soon. My apologies for the inconvenience.
 
thanks,
Thilina
 


--
http://salsahpc.indiana.edu/twister4azure/
You received this message because you are subscribed to the Google Groups "Twister4Azure" group.
To unsubscribe from this group, send email to twister4azur...@googlegroups.com



--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina

kindohm

Jun 9, 2011, 8:50:14 AM
to Twister4Azure
That worked great, thanks!

-Mike

Thilina Gunarathne

Jun 9, 2011, 4:20:44 PM
to twiste...@googlegroups.com
Great, good to hear that. Let us know about any issues or suggestions you come across.

When we have a new release, I'll make sure to announce it on this list.

thanks,
Thilina


kindohm

Jun 9, 2011, 5:26:19 PM
to Twister4Azure
Thilina -

I'm now receiving some exceptions in the SequenceOutputFormat class
that you provided. In the UploadValues method, an exception is thrown
when resultBlob.UploadText() is called.

StorageClientException
"The specified blob already exists."

I tried modifying the code to use a new, unique Guid for the blob
name, but the exception still gets thrown:

string newRef = Guid.NewGuid().ToString();
CloudBlockBlob resultBlob = container.GetBlockBlobReference(newRef + ".txt");
resultBlob.UploadText(buffer.ToString());

Any ideas?

I started running into this problem when I began supplying more
than one input file in the input blob container. I haven't yet tried
removing the extra files and processing only one file to see if the
problem goes away.

-Mike

Thilina Gunarathne

Jun 9, 2011, 6:06:17 PM
to twiste...@googlegroups.com
That's a bit strange. Before sending the fix your way, I tested it successfully with multiple input files. In fact, I just did it again in my local development fabric. I'm using the Azure SDK 1.4.1 (April 2011). The strange thing is that even when I use the same output container for multiple jobs (the intermediate container name is derived from the output container name), the blobs get overwritten without any complaints.

Are you using a large input with the local development storage? I have read in some forums (e.g. [1]) that the dev storage throws this error intermittently for large files.

thanks,
Thilina




kindohm

Jun 9, 2011, 10:41:17 PM
to Twister4Azure
When I broke up the input data into a larger number of smaller files,
the "blob already exists" exceptions went away. Thanks for pointing
out this quirk with dev storage.
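
For anyone hitting the same quirk, the splitting step can be scripted before the files are uploaded. This is only a rough sketch, assuming line-oriented input files on local disk; the InputSplitter class, the part-file naming and the linesPerFile parameter are hypothetical choices and not part of Twister4Azure.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: splits one line-oriented input file into smaller files.
public static class InputSplitter
{
    // Writes the lines of inputPath into numbered part files holding at most
    // linesPerFile lines each, and returns the paths of the files written.
    public static List<string> Split(string inputPath, string outputDir, int linesPerFile)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        Directory.CreateDirectory(outputDir);
        var written = new List<string>();
        var baseName = Path.GetFileNameWithoutExtension(inputPath);
        StreamWriter writer = null;
        var linesInCurrent = 0;

        try
        {
            foreach (var line in File.ReadLines(inputPath))
            {
                // Start a new part file on the first line and whenever the
                // current part has reached its line limit.
                if (writer == null || linesInCurrent == linesPerFile)
                {
                    writer?.Dispose();
                    var partPath = Path.Combine(
                        outputDir, $"{baseName}-part{written.Count:D4}.txt");
                    writer = new StreamWriter(partPath);
                    written.Add(partPath);
                    linesInCurrent = 0;
                }

                writer.WriteLine(line);
                linesInCurrent++;
            }
        }
        finally
        {
            writer?.Dispose();
        }

        return written;
    }
}
```

Splitting on line boundaries keeps each record intact, which matters here because the Map method parses one line at a time.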

I'm going to try running my test scenario with a larger amount of data
in my real Azure environment tomorrow. I'm really eager to tinker with
the instance count and thread count settings and see how the system
responds.

I appreciate all your help!

-Mike

On Jun 9, 5:06 pm, Thilina Gunarathne <cset...@gmail.com> wrote:
> That's a bit strange. Before sending the fix your way, I tested it
> successfully with multiple input files. In fact, I just did it again in
> my local development fabric. I'm using the Azure SDK 1.4.1 (April 2011). The
> strange thing is that even when I use the same output container for
> multiple jobs (the intermediate container name is derived from the output
> container name), the blobs get overwritten without any complaints.
>
> Are you using a large input with the local development storage? I have read
> in some forums (e.g. [1]) that the dev storage throws this
> error intermittently for large files.
>
> thanks,
> Thilina
>
> [1] http://social.msdn.microsoft.com/Forums/hu-HU/windowsazuredata/thread...

Thilina Gunarathne

Jun 10, 2011, 11:22:03 AM
to twiste...@googlegroups.com
Hi Mike,
Glad to know that the error went away. It would be great if you could keep me posted about your results. I'm really interested, as I have never had a chance to benchmark the word count sample, though I have benchmarked all the other samples. Let me know if you come across any issues.

thanks,
Thilina