ck_ring: consumer slower, thoroughly confused: kindly guide


Aval Sarri

May 20, 2014, 11:35:07 AM
to concurr...@googlegroups.com
Hello;

First, thanks for a wonderful library; it allows a novice like me to experiment and learn a great deal.

I am a newbie and cannot grasp why the consumer is not consuming items as fast as the producer produces them. In my attached code, I have one producer thread that calls gettimeofday and enqueues the result. The consumer thread dequeues the data and calculates the difference between the two timestamps, that is, between when the data was inserted and when it was removed. On my machine I can see this difference going up to about 1 millisecond, so I am thinking that the producer spins trying to enqueue data, but then the consumer running on the second core should be able to dequeue it as fast as possible? Is the gettimeofday call messing things up? Am I doing something stupid? Should I not even try ck_ring with 0.2.17?
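For reference, the measurement described above comes down to subtracting two struct timeval readings. A minimal sketch, assuming the attached test-ck.c does something along these lines (the helper name timeval_diff_us is made up for illustration and is not taken from the attachment or from CK):

```c
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed from *enq to *deq.
 * Seconds and microseconds are folded into one value before
 * subtracting, so a tv_usec wrap across a second boundary does
 * not produce a bogus difference. */
static long long timeval_diff_us(const struct timeval *enq,
                                 const struct timeval *deq)
{
    long long enq_us = (long long)enq->tv_sec * 1000000LL + enq->tv_usec;
    long long deq_us = (long long)deq->tv_sec * 1000000LL + deq->tv_usec;

    return deq_us - enq_us;
}
```

Note that gettimeofday reports wall-clock time, which can be adjusted while the program runs, so a difference computed this way is not guaranteed to be non-negative.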

I am trying to see how fast enqueue and dequeue can be done, to learn more about threads and about how long data stays in the buffer, since in my application I want to dequeue data as fast as possible with minimum time spent in the buffer. Also, the max time varies greatly from run to run. I have tried clearing the cache using

sync; echo 3 > /proc/sys/vm/drop_caches

Once I get
      ctrl +c pressed:  Total enq:68692296 total deq:68692296 max_time:1527 difference of enq and deq: 0
another time I get
      ctrl +c pressed:  Total enq:62748264 total deq:62748264 max_time:493 difference of enq and deq: 0


I am experimenting with ck-0.2.17.tar.gz.
I am using Ubuntu 12.04.3 LTS, with kernel 3.8.0-32-generic #47~precise1-Ubuntu SMP, on a 64-bit machine with an Intel(R) Core(TM) i3-2350M CPU @ 2.30GHz.
Also, I am not able to compile the attached code against the 0.4.2 release.

Kindly guide me on what more I should be learning and reading, and what I am doing wrong here. Thanks for reading this email and for your time.

test-ck.c

Samy Al Bahra

May 22, 2014, 12:38:35 PM
to concurr...@googlegroups.com
Hi,

The version of Concurrency Kit that you're using is ancient. Could you
please use the latest version? We can work from there. However, it
seems what you're experiencing is a side effect of jitter / etc... and
not ck_ring itself (which is extremely fast).

--
Samy Al Bahra [http://repnop.org]

Aval Sarri

May 26, 2014, 5:46:32 AM
to concurr...@googlegroups.com

Thank you for your response. It will take me some time to understand 0.4 and to compile my code against it, since the ##name functions have changed. I wanted to understand one thing: earlier we could directly assign a structure entry to the queue, and now ck_ring uses memcpy. Is memcpy better than the earlier method? Also, the generic functions now take a sizeof argument; as a programmer I liked the earlier version better.

Thank you for your time and patience.



Aval Sarri

May 27, 2014, 8:14:46 AM
to concurr...@googlegroups.com


I have some usage doubts with the new version. Please correct me if my understanding is wrong:

We now have

ck_ring_t              my_ring;
ck_ring_buffer_t    *my_buff;

which we initialize using, say, the following two statements

my_buff = malloc(sizeof(ck_ring_buffer_t) * 1024);
ck_ring_init(&my_ring, 1024);

So when I do

ck_ring_enqueue_spsc(&my_ring, my_buff, &tmp);

here tmp is my structure, of size ~64 bytes.

My doubt is: when we malloc my_buff, should the size be based on sizeof(ck_ring_buffer_t) or on the size of my structure?

In ck_ring I can see a memcpy onto the buffer at the producer offset. So another doubt is: do I have to maintain a separate array for my elements?

I am trying to code using the latency.c file, and I am not able to clearly understand the user buffer, ck_ring_buffer_t, and how these are mapped to each other.

I am attaching my effort to understand 0.4.2.

test-ck.c

Samy Al Bahra

May 27, 2014, 10:39:51 AM
to concurr...@googlegroups.com
Hi Aval,

If you are going to be enqueuing pointers then you should be passing
in a pointer to storage space whose lifetime matches that of the
lifetime of the object. In this case ck_ring_buffer_t has sufficient
storage for a buffer of pointers. In your example, it appears you are
trying to store the object value directly in the ring. I do not
recommend doing that at 64 bytes or at least make sure to benchmark
things. If you want additional examples, look at
http://facebook.github.io/libphenom/
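To illustrate the pointer case described above: the ring itself only holds one pointer per slot, and the objects live in separate storage that must stay valid until the consumer is done with them. The following is a minimal single-producer/single-consumer sketch written with C11 atomics. It is a stand-in for illustration only and is NOT ck_ring; ptr_ring, struct packet and pointer_ring_roundtrip are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical 64-byte element; it lives OUTSIDE the ring. */
struct packet { char payload[64]; };

/* Stand-in pointer ring (not the ck_ring implementation). */
struct ptr_ring {
    void **slots;               /* one pointer per slot */
    unsigned int mask;          /* size - 1, size is a power of two */
    _Atomic unsigned int head;  /* next slot the consumer reads */
    _Atomic unsigned int tail;  /* next slot the producer writes */
};

static bool ptr_ring_init(struct ptr_ring *r, unsigned int size)
{
    if (size < 2 || (size & (size - 1)) != 0)
        return false;
    /* Sized by pointer, not by sizeof(struct packet). */
    r->slots = malloc(sizeof(void *) * size);
    if (r->slots == NULL)
        return false;
    r->mask = size - 1;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return true;
}

/* Producer side. Returns false when full (one slot is kept empty). */
static bool ptr_ring_enqueue(struct ptr_ring *r, void *entry)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (((tail + 1) & r->mask) == head)
        return false;
    r->slots[tail] = entry;
    /* Publish the slot only after the pointer has been written. */
    atomic_store_explicit(&r->tail, (tail + 1) & r->mask, memory_order_release);
    return true;
}

/* Consumer side. Returns false when empty. */
static bool ptr_ring_dequeue(struct ptr_ring *r, void **out)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;
    *out = r->slots[head];
    atomic_store_explicit(&r->head, (head + 1) & r->mask, memory_order_release);
    return true;
}

/* Returns 1 if the pointer that comes out is the one that went in. */
static int pointer_ring_roundtrip(void)
{
    struct ptr_ring ring;
    struct packet *pool = calloc(8, sizeof *pool); /* separate element storage */
    void *out = NULL;
    int ok;

    if (pool == NULL || !ptr_ring_init(&ring, 8)) {
        free(pool);
        return 0;
    }
    pool[0].payload[0] = 'x';
    ok = ptr_ring_enqueue(&ring, &pool[0]) &&
         ptr_ring_dequeue(&ring, &out) &&
         out == &pool[0] &&
         ((struct packet *)out)->payload[0] == 'x';
    free(ring.slots);
    free(pool); /* only safe once the consumer no longer uses the entry */
    return ok;
}
```

In this layout the answer to "do I need a separate array for my elements" is yes: the pool is that array, and enqueuing the address of a stack variable that is later overwritten would hand the consumer a pointer to stale data.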

If you wish to store the objects directly inside of the ring buffer
(which may not be beneficial at 64 bytes an entry) then please use the
upper-case family of ck_ring beginning with CK_RING_PROTOTYPE. This
defines a ring whose individual storage elements are of the type of
the value you are attempting to store.
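The idea of a ring whose slots are of the element type can be sketched as follows. This is a hand-written stand-in using C11 atomics and memcpy, not the code that CK_RING_PROTOTYPE generates; val_ring and struct packet are hypothetical names chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 64-byte element stored by value in the ring. */
struct packet { char payload[64]; };

/* Stand-in value ring (not generated by CK_RING_PROTOTYPE). */
struct val_ring {
    struct packet *slots;       /* storage sized by the element type */
    unsigned int mask;          /* size - 1, size is a power of two */
    _Atomic unsigned int head;  /* next slot the consumer reads */
    _Atomic unsigned int tail;  /* next slot the producer writes */
};

static bool val_ring_init(struct val_ring *r, unsigned int size)
{
    if (size < 2 || (size & (size - 1)) != 0)
        return false;
    r->slots = malloc(sizeof(struct packet) * size);
    if (r->slots == NULL)
        return false;
    r->mask = size - 1;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return true;
}

/* Producer side: copies *entry into the ring. False when full. */
static bool val_ring_enqueue(struct val_ring *r, const struct packet *entry)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (((tail + 1) & r->mask) == head)
        return false;
    memcpy(&r->slots[tail], entry, sizeof *entry);
    atomic_store_explicit(&r->tail, (tail + 1) & r->mask, memory_order_release);
    return true;
}

/* Consumer side: copies the oldest element into *out. False when empty. */
static bool val_ring_dequeue(struct val_ring *r, struct packet *out)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;
    memcpy(out, &r->slots[head], sizeof *out);
    atomic_store_explicit(&r->head, (head + 1) & r->mask, memory_order_release);
    return true;
}
```

The trade-off is visible in the two memcpy calls: every element is copied once on the way in and once on the way out, but the producer may reuse its source buffer as soon as the enqueue returns, and no separate allocation has to be tracked.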

Aval Sarri

Jun 5, 2014, 11:13:44 AM
to concurr...@googlegroups.com


Thank you for your guidance; a few of my doubts got cleared. I am attaching my code. Earlier we had the CK_RING_INIT macro, but (I hope I am correct) now we have a function that takes the name and size, while the buffer is initialized and set using ck_ring_buffer_t.

So now with the old version I get (max time is in microseconds):

 ctrl +c pressed:  Total enq:50000003 total deq:50000001 max_time:714 difference of enq and deq: 2

and new version:

  ctrl +c pressed:  Total enq:50000003 total deq:50000001 max_time:317 difference of enq and deq: 2

I need to run it a few more times after clearing the cache, but these are the initial results.

I am still doing a structure copy, but on your point about passing a pointer, I have a doubt (this has nothing to do with CK, only with my application). I receive a compressed (XceedZip from Xceed.com) buffer over TCP/IP, which gets uncompressed into a chunk containing 1 to 300 packets (of 65 bytes each). From this thread I enqueue the data into two ck_ring buffers (I am not able to use multiple consumers, since each chunk needs to be processed in sequence). The processing takes anywhere between 1 and 20 microseconds, and I am reusing the decompression buffer in the reading thread. So in this case:

1) Is ck_ring_* the correct data structure to use, or should I look at other things?

2) What is the best method to give up the CPU at times? For example, if my decompression is in progress and data is not ready, then I want the consumers to wait. Is using ck_pr_stall in a busy-wait loop correct, or can something else be done? Is it better to use the same buffer and make the producer wait in a while (CK_ENQUE...) loop?

Thanks again for your time and guidance.
Regards
Aval S.

test-ck.c

sbahra

Sep 16, 2014, 12:14:07 PM
to concurr...@googlegroups.com
Hi Aval,

ck_ring is the right structure to use if you want bounded FIFO semantics; overwrite is currently not supported but is possible to support. For question 2, consider using the "try*" family of functions. This way you can block in any way you wish; spinning unbounded on the CPU may not be the best bet (to see sane blocking semantics, search for adaptive mutex implementations).
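One way to read the advice above is as a bounded spin followed by progressively cheaper waiting. The sketch below is an assumption about what such a loop could look like, not something taken from CK: try_fn stands in for whatever non-blocking attempt you use (for example a wrapper around a dequeue that returns false when the ring is empty), and wait_with_backoff, succeed_after and the limits are hypothetical names and values.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Illustrative limits only; tune them by measurement. */
#define SPIN_LIMIT   100u      /* attempts made back to back */
#define YIELD_LIMIT  100u      /* further attempts separated by sched_yield */
#define MAX_SLEEP_NS 1000000L  /* cap for the sleep backoff (1 ms) */

/* Non-blocking attempt: returns true once the operation succeeded. */
typedef bool (*try_fn)(void *ctx);

/* Spin briefly, then yield the CPU, then sleep with a capped exponential
 * backoff. Gives up and returns false after max_attempts tries. */
static bool wait_with_backoff(try_fn attempt, void *ctx, unsigned int max_attempts)
{
    long sleep_ns = 1000;
    unsigned int i;

    for (i = 0; i < max_attempts; i++) {
        if (attempt(ctx))
            return true;
        if (i < SPIN_LIMIT)
            continue;                       /* phase 1: busy retry */
        if (i < SPIN_LIMIT + YIELD_LIMIT) {
            sched_yield();                  /* phase 2: let others run */
            continue;
        }
        struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
        nanosleep(&ts, NULL);               /* phase 3: actually sleep */
        if (sleep_ns < MAX_SLEEP_NS)
            sleep_ns *= 2;
    }
    return false;
}

/* Hypothetical attempt used to exercise the loop: fails `remaining`
 * times, then succeeds. */
struct countdown { unsigned int remaining; };

static bool succeed_after(void *ctx)
{
    struct countdown *c = ctx;

    if (c->remaining == 0)
        return true;
    c->remaining--;
    return false;
}
```

Sleeping adds wake-up latency, so with processing times in the 1 to 20 microsecond range the spin and yield phases would likely need to be sized by measurement on the target machine.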

Michael Haberler

Sep 12, 2015, 7:18:17 AM
to Concurrency Kit
Hi Samy,


Could you shed light on what the issue is with storing objects in the ring directly instead of pointers?

'not be beneficial at 64 bytes an entry' - why? Is that too large or too small to make sense?

Background: I need rings for rather short structs, a tad larger than a uint64 but generally under, say, 32 bytes, for coordination between threads. I would like to avoid managing the buffers at all, so writing those structs directly to the ring would make the most sense to me.

- Michael

Samy Al Bahra

Sep 12, 2015, 9:39:29 AM
to Concurrency Kit

Hi Michael,

It really depends on the use case and requires that you measure it. You just need to make sure that the copy overhead is smaller than the dereference overhead (and the sometimes expensive overhead of the allocator).

Samy Al Bahra

Sep 12, 2015, 10:17:23 AM
to concurr...@googlegroups.com

Besides the correctness issues here, the granularity of gettimeofday is unclear, as is its cost relative to the dequeue.
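Both unknowns can be estimated on the machine in question before trusting the latency numbers. A rough sketch, assuming a POSIX system with clock_gettime; the function names are made up for this example and the figures it produces depend entirely on the hardware and kernel:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Average cost of one gettimeofday call in nanoseconds, timed with
 * the monotonic clock. Returns 0.0 if calls is 0. */
static double gettimeofday_cost_ns(unsigned int calls)
{
    struct timespec start, end;
    struct timeval tv;
    unsigned int i;
    double elapsed_ns;

    if (calls == 0)
        return 0.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < calls; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                 (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / calls;
}

/* Smallest positive step seen between consecutive gettimeofday
 * readings, in microseconds. Returns 0 if no step was observed. */
static long long gettimeofday_min_step_us(unsigned int samples)
{
    struct timeval prev, now;
    long long min_step = 0;
    unsigned int i;

    gettimeofday(&prev, NULL);
    for (i = 0; i < samples; i++) {
        long long delta;

        gettimeofday(&now, NULL);
        delta = (long long)(now.tv_sec - prev.tv_sec) * 1000000LL +
                (now.tv_usec - prev.tv_usec);
        if (delta > 0 && (min_step == 0 || delta < min_step))
            min_step = delta;
        prev = now;
    }
    return min_step;
}
```

Since struct timeval carries microseconds, the observed step cannot be finer than 1 microsecond; if the per-call cost turns out to be comparable to an enqueue or dequeue, the timestamping itself is a significant part of what is being measured.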

