OpenMP multithreading degrading performance


Jonathan Takeshita

unread,
Jul 8, 2021, 1:31:59 PM7/8/21
to sup...@graphene-project.io, Colin McKechney, Taeho Jung, Justin Pajak
We have a scenario where using OpenMP to parallelize some code running on SGX via Graphene is actually degrading performance. Is there a known reason for such behavior?

Thanks,


Jonathan S. Takeshita
Department of Computer Science and Engineering
University of Notre Dame
Notre Dame, IN

Taeho Jung

unread,
Jul 8, 2021, 2:23:13 PM7/8/21
to Jonathan Takeshita, sup...@graphene-project.io, Colin McKechney, Justin Pajak
As further information:

We noticed that computing the sum of 1M floating-point numbers takes the least time when we use only one thread, and takes more time the more threads we use for divide-and-conquer (i.e., with 2 threads we compute the sums of the two halves in parallel and add them at the end; with 4 threads we compute the sums of the quarters and add them at the end).

t refers to the number of threads used.

t = 1: 4.1195 seconds
t = 2: 4.548463 seconds
t = 4: 5.010339 seconds
t = 8: 6.182503 seconds
t = 16: 6.207797 seconds

P.S. We used read() instead of fread() to read all the floating-point values from the file, so that we could measure the impact of system calls, and then summed the values using the divide-and-conquer approach with multiple threads (except in the t = 1 case).
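
For reference, here is a minimal sketch of the kind of code we are describing (the file name, exact sizes, and use of num_threads() are illustrative placeholders, not our exact code):

/* Minimal sketch (not our exact code): read all floats with read(), then
 * sum them with one partial sum per thread, combined at the end.
 * "input.bin" and N are placeholders. Compile with: gcc -O2 -fopenmp sum.c */
#include <fcntl.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N 1000000  /* 1M floating-point numbers */

int main(void)
{
    float *buf = malloc(N * sizeof(float));
    int fd = open("input.bin", O_RDONLY);
    if (!buf || fd < 0)
        return 1;

    /* unbuffered read() loop (one syscall per chunk, unlike fread()) */
    size_t want = N * sizeof(float), got = 0;
    while (got < want) {
        ssize_t r = read(fd, (char *)buf + got, want - got);
        if (r <= 0)
            return 1;
        got += r;
    }
    close(fd);

    double t0 = omp_get_wtime();
    int t = omp_get_max_threads();
    double *partial = calloc(t, sizeof(double));

    /* divide-and-conquer: thread i sums elements [i*N/t, (i+1)*N/t) */
    #pragma omp parallel num_threads(t)
    {
        int id = omp_get_thread_num();
        size_t lo = (size_t)id * N / t, hi = (size_t)(id + 1) * N / t;
        double s = 0.0;
        for (size_t i = lo; i < hi; i++)
            s += buf[i];
        partial[id] = s;
    }
    double sum = 0.0;
    for (int i = 0; i < t; i++)
        sum += partial[i];

    printf("sum = %f, compute time = %f s\n", sum, omp_get_wtime() - t0);
    free(partial);
    free(buf);
    return 0;
}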

Best,
Taeho
--
Assistant Professor
Department of Computer Science and Engineering, University of Notre Dame
Notre Dame, IN, United States

Michał Kowalczyk

unread,
Jul 11, 2021, 10:13:24 AM7/11/21
to Taeho Jung, Jonathan Takeshita, sup...@graphene-project.io, Colin McKechney, Justin Pajak
Hi,

Could you also share the timings for both a native run and graphene-direct (i.e., non-SGX)?

Best,
Michał

Taeho Jung

unread,
Jul 11, 2021, 10:52:58 AM7/11/21
to Michał Kowalczyk, Colin McKechney, Jonathan Takeshita, Justin Pajak, sup...@graphene-project.io
Hi Michał,

Do you mean running plain, without anything, and running with Graphene but without SGX?

Best,
Taeho

Michał Kowalczyk

unread,
Jul 11, 2021, 11:03:42 AM7/11/21
to Taeho Jung, Colin McKechney, Jonathan Takeshita, Justin Pajak, sup...@graphene-project.io
Yes.

Taeho Jung

unread,
Jul 11, 2021, 11:05:45 AM7/11/21
to Michał Kowalczyk, Colin McKechney, Jonathan Takeshita, Justin Pajak, sup...@graphene-project.io
Okie dokie. Will try that and share the results with you -- might take some time because my team is on leave now :)

Thanks for looking into this.

Best,
Taeho

Taeho Jung

unread,
Aug 12, 2021, 7:30:38 PM8/12/21
to Michał Kowalczyk, Colin McKechney, Jonathan Takeshita, Justin Pajak, sup...@graphene-project.io
Hi Michał,

We did what you suggested and dug more deeply into the program. We found that the issue was in the I/O, which was not parallelized properly. After handling that, we observed a speedup from multithreading the application running inside SGX via Graphene.
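
For anyone who runs into the same problem, one way to parallelize the reads is to let each thread pread() its own chunk of the file at an explicit offset (an illustrative sketch, not necessarily the exact change we made):

/* Illustrative sketch of parallelized I/O: each OpenMP thread pread()s its
 * own chunk of the file at an explicit offset (no shared file position) and
 * sums it; the partial sums are combined by the reduction.
 * "input.bin" and N are placeholders. Compile with: gcc -O2 -fopenmp parsum.c */
#include <fcntl.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N 1000000  /* number of floats in the file */

int main(void)
{
    int fd = open("input.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    double sum = 0.0;
    #pragma omp parallel reduction(+:sum)
    {
        int t = omp_get_num_threads(), id = omp_get_thread_num();
        size_t lo = (size_t)id * N / t, hi = (size_t)(id + 1) * N / t;
        size_t bytes = (hi - lo) * sizeof(float);
        float *chunk = malloc(bytes);
        size_t done = 0;
        while (chunk && done < bytes) {
            ssize_t r = pread(fd, (char *)chunk + done, bytes - done,
                              (off_t)(lo * sizeof(float) + done));
            if (r <= 0)
                break;
            done += r;
        }
        for (size_t i = 0; i < done / sizeof(float); i++)
            sum += chunk[i];
        free(chunk);
    }
    close(fd);
    printf("sum = %f\n", sum);
    return 0;
}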

Best,
Taeho

Sankaranarayanan Venkatasubramanian

unread,
Aug 12, 2021, 11:21:36 PM8/12/21
to Taeho Jung, Michał Kowalczyk, Colin McKechney, Jonathan Takeshita, Justin Pajak, sup...@graphene-project.io
Hi Taeho,

Could you also use numactl to limit execution to a single NUMA node (both CPU and memory)? Also, could you set the KMP_AFFINITY parameters? Something like below:

KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 numactl --cpubind=0 --membind=0 graphene-sgx...

~Sankar V 
