Fetching Counters Progressively


ANIKET MORE

Aug 3, 2016, 10:27:04 AM
to cascading-user

Hi,

In my project, I use sub-assemblies to process data, and I want to find the number of tuples processed by each sub-assembly.

I count the tuples with a Counter that is incremented from an Each pipe in each assembly.

I retrieve the count by getting the counter group from FlowStepStats. For example:

 

long recordCount = 0L;
for (String counter : flowStepStats.getCountersFor(COUNTER_GROUP)) {
    // each pass overwrites recordCount with the current counter's value
    recordCount = flowStepStats.getCounterValue(COUNTER_GROUP, counter);
}

 

If I call getCountersFor() while the flow step is running, I get an empty list.

The counters become available only after the flow step completes.
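
A minimal sketch of the loop above, for readers who want either a group total or one value per counter. `CounterSource` is a hypothetical stand-in that declares only the two methods called in this thread (`getCountersFor` and `getCounterValue`), so the example compiles without Cascading on the classpath; `CounterTotals` and its method names are likewise made up for illustration. An empty group yields 0 or an empty map, which matches the empty list reported while the step is still running.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the two stats methods used in this thread,
// so the sketch compiles without Cascading on the classpath.
interface CounterSource {
    Collection<String> getCountersFor(String group);
    long getCounterValue(String group, String counter);
}

final class CounterTotals {
    private CounterTotals() {}

    // Sums every counter in the group. Returns 0 when the group is empty.
    static long sum(CounterSource stats, String group) {
        long total = 0L;
        for (String counter : stats.getCountersFor(group)) {
            total += stats.getCounterValue(group, counter);
        }
        return total;
    }

    // Keeps one value per counter name instead of a single total.
    static Map<String, Long> perCounter(CounterSource stats, String group) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (String counter : stats.getCountersFor(group)) {
            values.put(counter, stats.getCounterValue(group, counter));
        }
        return values;
    }
}
```

If each sub-assembly increments its own counter name within the group, `perCounter` gives the per-assembly figures and `sum` gives the overall total.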

 

The scenario above is for Hadoop local mode, where counters are not available at the flow node level.

In remote mode, we can retrieve counters at the flow node level by getting the counter group from FlowNodeStats. For example:

 

long recordCount = 0L;
for (String counter : flowNodeStats.getCountersFor(COUNTER_GROUP)) {
    // each pass overwrites recordCount with the current counter's value
    recordCount = flowNodeStats.getCounterValue(COUNTER_GROUP, counter);
}

 

In remote mode, if I try to retrieve the counters while any flow node of the MapReduce job is still running, I get an empty list.

The counters can be retrieved only after the flow node completes.

 

So in both Hadoop local mode and remote mode, I cannot get counters progressively.

I get them only after a flow node (remote mode) or a flow step (local mode) completes.

Is there any way to fetch the counters progressively (as the SubAssembly progresses) in Hadoop remote and local mode?

Can someone please suggest a solution for this use case?


Cascading version: 3.1.0

 

Thanks!!

Andre Kelpe

Aug 3, 2016, 11:14:57 AM
to cascading-user
Please find the answers inline.

On Wed, Aug 3, 2016 at 4:27 PM, ANIKET MORE <dr.mes...@gmail.com> wrote:
> Hi,
>
> In my project, I have sub assemblies to process the data and finding the
> number of tuples processed by each assembly.
>
> I have used Counter to calculate the tuples processed by an assembly by
> using Each pipe.
>
> I am retrieving the count by getting counter groups from flowstepstats. For
> eg:
>
>
>
> Long RecordCount = 0L;
>
> for (String counter : flowStepStats.getCountersFor(COUNTER_GROUP)) {
>
> RecordCount = flowStepStats.getCounterValue(COUNTER_GROUP, counter);
>
> }
>
>
>
> If I tried to access getCountersFor() method while flowstep is running, I am
> getting empty list.
>
> Counters are available only after flowstep completes.

There is no way around it. We cannot hammer the ResourceManager with
queries for counters every 100 ms. If you have thousands of slices
running, these calls are quite expensive, so we populate the data when
it makes sense. Also, while a job is running, the counters may not be
available yet. The only time they are reliable is when the step is
done.
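
One way to act on this advice is to read counters only once the step reports that it has finished. The sketch below assumes a stats object that exposes `isFinished()` next to the two counter methods from the question; `StepStatsView`, `FinishedCounters`, and `totalIfFinished` are hypothetical names, not Cascading API, and how the check gets triggered (for example from a completion callback) is left out.

```java
import java.util.Collection;
import java.util.OptionalLong;

// Hypothetical stand-in for a stats object; it declares only the methods
// this sketch needs so the example compiles without Cascading.
interface StepStatsView {
    boolean isFinished();
    Collection<String> getCountersFor(String group);
    long getCounterValue(String group, String counter);
}

final class FinishedCounters {
    private FinishedCounters() {}

    // Returns the group total only when the step has finished.
    // While the step is still running the result is empty, so callers
    // never mistake a partial or missing value for a real count.
    static OptionalLong totalIfFinished(StepStatsView stats, String group) {
        if (!stats.isFinished()) {
            return OptionalLong.empty();
        }
        long total = 0L;
        for (String counter : stats.getCountersFor(group)) {
            total += stats.getCounterValue(group, counter);
        }
        return OptionalLong.of(total);
    }
}
```

Returning an empty `OptionalLong` rather than 0 keeps "not available yet" distinct from "zero records processed".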


>
> The above scenario is for hadoop local mode as we cannot get counters at
> flownode level on local mode.

If this is on MapReduce, then you are out of luck. There are no
task attempts in local mode, and therefore we have no way to return
the counters at such a low level.


>
> In remote mode, we are able to retrieve counters at flownode level by
> getting counter groups from flownodeStats. For eg:
>
>
>
> Long recordCount = 0L;
>
> for (String counter : flowNodeStats.getCountersFor(COUNTER_GROUP)) {
>
> recordCount = flowNodeStats.getCounterValue(COUNTER_GROUP, counter);
>
> }
>
>
>
> In remote mode, if we tried to retrieve counters while any of the flownode
> keeps running in MR job,I am getting empty list.
>
> Counters can only be retrieved after a flownode completes.
>

See above.


>
> In hadoop local mode and remote mode, I am not able to get counters
> progressively.
>
> I am getting counters only after a flownode(for remote) or flowstep(for
> local) completes.
>
> Is there any way to fetch the counters progressively(as SubAssembly
> progresses) in Hadoop remote and local mode ?

No.

>
> Can someone please suggest me some solution for the above use case?

What is the actual use case here? What type of system needs those
counters in-flight? I don't fully understand the need for the
requirement, to be honest.

- André

>
>
> Cascading Version : 3.1.0
>
>
>
> Thanks!!
>



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

ANIKET MORE

Aug 8, 2016, 11:03:02 AM
to cascading-user
Thanks for the reply, Andre.

Below is my use case:
I am developing an application that lets users build data flows from Cascading pipes and sub-assemblies. Throughout the flow, I want to log the number of records processed by each pipe or sub-assembly.
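
For a logging use case like this, a sketch along the following lines might be enough once a flow has completed. It assumes one counter per pipe or sub-assembly, named after it, and that the values have already been read into a map after completion; `RecordCountReport` and `lines` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RecordCountReport {
    private RecordCountReport() {}

    // Builds one log line per counter, sorted by counter name so the
    // report is stable from run to run.
    static List<String> lines(Map<String, Long> countsByAssembly) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(countsByAssembly).entrySet()) {
            out.add(entry.getKey() + ": " + entry.getValue() + " records");
        }
        return out;
    }
}
```

The returned lines can be passed to whatever logger the application already uses.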