PAPI usage with OMP

McCoy, Xandra
Sep 21, 2025, 11:26:26 AM
to ptools-...@icl.utk.edu, Cameron, Kirk
Good morning, 

I am using PAPI in my research on modeling performance on HPC systems. I am profiling the OpenMP implementation of the LULESH benchmark and want cycle counts, cache-miss counts, and stall counts covering the program's entire workload, which spans many OpenMP parallel regions. If each thread starts its counters once and leaves them running across multiple omp parallel regions, will the totals be correct for those threads, or is the only way to get correct counts to initialize an event set and start the counters inside every parallel region? Also, when a thread reads its counters, do the values reflect only the activity of the thread that called read?

I would like to do something like the following, using threadprivate EventSet and papi_start variables:


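   /* Declared at file scope: each OpenMP thread keeps its own copy */
   int       EventSet   = PAPI_NULL;
   long long papi_start = 0;
   #pragma omp threadprivate(EventSet, papi_start)

   /* PAPI initialization */
   init_papi_library();
   init_papi_threads();

   /* threadprivate values persist between parallel regions only if the
      thread count stays the same and dynamic adjustment is disabled */
   omp_set_dynamic(0);

#pragma omp parallel
   {
      #pragma omp critical
      {
         std::cerr << "There are " << omp_get_num_threads() << " threads\n";
         std::cerr << "Starting counters for thread " << omp_get_thread_num() << "\n";
      }

      /* Turn on PAPI for this thread */
      papi_start = start_papi(&EventSet); // adds events to the event set and starts counters
   }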

   /* Work of LULESH, contains multiple parallel regions */
   while((locDom->time() < locDom->stoptime()) && (locDom->cycle() < opts.its)) {

      TimeIncrement(*locDom) ;
      LagrangeLeapFrog(*locDom) ;

      // if ((opts.showProg != 0) && (opts.quiet == 0) && (myRank == 0)) {
      //
      //    std::cout << "cycle = " << locDom->cycle()       << ", "
      //              << std::scientific
      //              << "time = " << double(locDom->time()) << ", "
      //              << "dt="     << double(locDom->deltatime()) << "\n";
      //    std::cout.unsetf(std::ios_base::floatfield);
      // }
   }

#pragma omp parallel
   {
      /* Read counters for this thread */
      long long values[4];   // one slot per event added by start_papi()
      long long papi_end = complete_papi(values, EventSet); // stops counters and reads events into values

      #pragma omp critical
      {
         pretty_print(values, papi_start, papi_end);
         std::cerr << "Ending counters for thread " << omp_get_thread_num() << "\n";
      }
   }
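The final region above prints each thread's counts separately. If each thread's readings cover only that thread's activity (the second question above), a whole-program count for an event is the sum over all threads. Below is a minimal, PAPI-independent C++ sketch of that reduction; CounterTotals and kNumEvents are hypothetical names, and the four-event size simply mirrors the values[4] array in the snippet.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Hypothetical: number of events added by start_papi(); matches values[4] above.
constexpr std::size_t kNumEvents = 4;

// Accumulates per-thread counter readings into program-wide totals.
// Assumes each thread's values count only that thread, so summing is valid.
class CounterTotals {
public:
    // Called once per thread with that thread's counter values.
    void add(const long long (&values)[kNumEvents]) {
        std::lock_guard<std::mutex> lock(mutex_);  // threads may call concurrently
        for (std::size_t i = 0; i < kNumEvents; ++i) {
            totals_[i] += values[i];
        }
        ++threads_;
    }

    // Summed count for one event index; throws std::out_of_range if invalid.
    long long total(std::size_t event) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return totals_.at(event);
    }

    // Number of threads that have contributed readings.
    int threads() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return threads_;
    }

private:
    mutable std::mutex mutex_;
    std::array<long long, kNumEvents> totals_{};
    int threads_ = 0;
};
```

In the final parallel region, each thread would call totals.add(values) right after complete_papi(values, EventSet), with totals shared across threads; after the region ends, a single thread can print the summed counts. The mutex keeps concurrent add() calls from racing on the shared totals.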

Thank you,

Xandra McCoy
Computer Science
Virginia Tech