I think sir only wants us to measure the run time of the application on a CPU as well as a GPU, and wouldn't mind if we use some other method for that purpose. Many of the sample OpenCL applications use this method for performance measurement.
I have attached a sample project and you could use that code if you like.
This is how I did it.
// for the performance counters
#include <Windows.h>
#include <conio.h>
#include <iostream>
#include <cstdio>   // printf
LARGE_INTEGER g_PerfFrequency;
LARGE_INTEGER g_PerformanceCountNDRangeStart;
LARGE_INTEGER g_PerformanceCountNDRangeStop;
/*to be placed just before the call to
clEnqueueNDRangeKernel (which queues the kernel with its work-item and work-group sizes)*/
QueryPerformanceCounter(&g_PerformanceCountNDRangeStart);
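/*to be placed before clEnqueueReadBuffer (which reads back the output), and after the
kernel has actually finished, e.g. after calling clFinish() on the command queue.
clEnqueueNDRangeKernel only queues the kernel and returns immediately, so without
that wait the counter stops too early.*/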
QueryPerformanceCounter(&g_PerformanceCountNDRangeStop);
/*After execution of the kernel this call can be placed anywhere in main to print*/
QueryPerformanceFrequency(&g_PerfFrequency);
/*double is used instead of float so large counter values do not lose precision*/
printf("NDRange perf. counter time %f ms.\n",
       1000.0 * (double)(g_PerformanceCountNDRangeStop.QuadPart - g_PerformanceCountNDRangeStart.QuadPart)
              / (double)g_PerfFrequency.QuadPart);
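If you would rather not depend on Windows.h, the same start/stop pattern can be sketched with std::chrono from standard C++. This is not what the attached project does, just a portable alternative. The names elapsed_ms and sum_of_squares are made up for illustration, and the CPU loop is only a stand-in for the real workload.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical helper (not from the attached project): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsed_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Stand-in CPU workload, purely illustrative.
double sum_of_squares(const std::vector<float>& v) {
    double sum = 0.0;
    for (float x : v) {
        sum += static_cast<double>(x) * x;
    }
    return sum;
}
```

You would call it as elapsed_ms([&] { result = sum_of_squares(data); }) and print the returned value with printf("%f ms\n", ms). For the GPU run, the lambda would wrap the clEnqueueNDRangeKernel call followed by clFinish on the same queue, so that the kernel's execution is included in the measured interval.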