Specialized HyperLoops/RingBuffers


Rajiv Kurian

Feb 19, 2014, 18:26:49
to: la...@googlegroups.com
I like how the project has specialized (int and long) versions of the HyperLoop. Just thinking out loud, but one could use code generation to produce specialized HyperLoop/ring-buffer code for classes that are composed solely of primitives. With the proper getters and setters generated, an end user could reuse the ring-buffer entries trivially. This would allow a C-struct-like layout.

Example use case:

Say we use the epoll support in Landz to create a game server. A producer thread could accept connections and read data. It could use the specialized HyperLoop to transfer data to the thread that updates game state. An example event could be a struct composed of the FD of the remote client and a few ints/longs required for the player's game control data. All of this would be laid out sequentially like a C struct, without any indirection. The consumer thread could consume these events, update game state, and send any necessary updates (using the FD) to the client.
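
A minimal sketch of what such generated code might look like, assuming a fixed slot layout over one direct ByteBuffer. The class name PlayerEventSlots, the field names, and the offsets are hypothetical and are not part of Landz. The sketch only shows the memory layout and the accessors; it does not include the producer/consumer coordination.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical generated flyweight: each slot is laid out as [fd:int][buttons:int][x:long][y:long]. */
final class PlayerEventSlots {
    static final int FD_OFFSET = 0;
    static final int BUTTONS_OFFSET = 4;
    static final int X_OFFSET = 8;
    static final int Y_OFFSET = 16;
    static final int SLOT_SIZE = 24;

    private final ByteBuffer buffer; // one contiguous off-heap block, no per-event objects
    private final int mask;

    PlayerEventSlots(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = ByteBuffer.allocateDirect(capacity * SLOT_SIZE).order(ByteOrder.nativeOrder());
        mask = capacity - 1;
    }

    /** Byte offset of the slot that a sequence number maps to. */
    private int base(long sequence) {
        return ((int) (sequence & mask)) * SLOT_SIZE;
    }

    void setFd(long seq, int fd)           { buffer.putInt(base(seq) + FD_OFFSET, fd); }
    void setButtons(long seq, int buttons) { buffer.putInt(base(seq) + BUTTONS_OFFSET, buttons); }
    void setX(long seq, long x)            { buffer.putLong(base(seq) + X_OFFSET, x); }
    void setY(long seq, long y)            { buffer.putLong(base(seq) + Y_OFFSET, y); }

    int fd(long seq)      { return buffer.getInt(base(seq) + FD_OFFSET); }
    int buttons(long seq) { return buffer.getInt(base(seq) + BUTTONS_OFFSET); }
    long x(long seq)      { return buffer.getLong(base(seq) + X_OFFSET); }
    long y(long seq)      { return buffer.getLong(base(seq) + Y_OFFSET); }

    public static void main(String[] args) {
        PlayerEventSlots slots = new PlayerEventSlots(8);
        slots.setFd(0, 17);
        slots.setX(0, 100L);
        slots.setY(0, 200L);
        System.out.println("fd=" + slots.fd(0) + " x=" + slots.x(0) + " y=" + slots.y(0));
    }
}
```

Every field of an event lives at a fixed offset inside its slot, so reading an event touches one contiguous region of memory instead of following an object reference.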

Jin Mingjian

Feb 19, 2014, 20:41:21
to: Rajiv Kurian, la...@googlegroups.com
Hi Rajiv,

Thanks for coming back :) I recently happened to read some of your posts on Dmitriy's lock-free. I think we are arriving at the same place on some key points of backend engineering :)

As for this idea, I am not sure I fully understand you. But allowing a C-struct-like slot in the HL/RB is not a trivial thing.

As you may recall from the discussion in your first post (which was also this group's first post ^_^), it is not trivial to guarantee the atomicity (and hence the thread safety) of consuming an Object slot. The same applies to your C-struct-like slot.

Suppose we just return the address of a slot: how does anyone know when the consumer has finished consuming the whole slot? We need an interaction from the consumer to indicate this. If we rely on copying instead, then we lose the direct, in-place access. Although copying may be possible, it involves many memory operations on the bus plus an mfence. This turns the non-full-fence HL into a full-fence one, like the Disruptor.

If you have read some of Landz's source, you will see that I use an off-heap area (an array) in correspondence with the HyperLoop in z's http module. This just adds the indirection cost of one memory access. (We did also discuss this scheme in your first post, but we did not discuss any practical case there.)
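
One possible reading of this scheme, sketched under stated assumptions: the payload lives in an off-heap array of fixed-size records, and only the record's index travels through the queue. The class SideArrayHandoff, the record layout, and the free-index queue used for recycling are all hypothetical. ArrayBlockingQueue is only a stand-in for an int-specialized HyperLoop and says nothing about its performance or API.

```java
import java.nio.ByteBuffer;
import java.util.OptionalLong;
import java.util.concurrent.ArrayBlockingQueue;

/** Illustrative only: off-heap records plus a queue of indexes (not Landz code). */
final class SideArrayHandoff {
    static final int RECORD_SIZE = 16; // hypothetical record: [fd:int][pad:int][payload:long]

    private final ByteBuffer records;                 // off-heap storage, one record per index
    private final ArrayBlockingQueue<Integer> ready;  // stand-in for an int-specialized loop
    private final ArrayBlockingQueue<Integer> free;   // indexes the consumer has finished with

    SideArrayHandoff(int capacity) {
        records = ByteBuffer.allocateDirect(capacity * RECORD_SIZE);
        ready = new ArrayBlockingQueue<>(capacity);
        free = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.add(i);
        }
    }

    /** Producer: fills a free record, then hands over only its index. Returns false if none is free. */
    boolean publish(int fd, long payload) {
        Integer index = free.poll();
        if (index == null) {
            return false;
        }
        int base = index * RECORD_SIZE;
        records.putInt(base, fd);
        records.putLong(base + 8, payload);
        ready.add(index); // cannot overflow: at most `capacity` indexes exist
        return true;
    }

    /** Consumer: takes an index, reads the record in place, then recycles the index. */
    OptionalLong consumePayload() {
        Integer index = ready.poll();
        if (index == null) {
            return OptionalLong.empty();
        }
        long payload = records.getLong(index * RECORD_SIZE + 8); // the one extra memory access
        free.add(index);
        return OptionalLong.of(payload);
    }
}
```

The extra cost compared with passing the value itself is the lookup of the record through its index, which matches the "one memory access" of indirection mentioned above.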

This is not finished yet :( But I have not seen any obstacle to it. I just need to finish the primary supporting module now. I plan to continue the http module work today or tomorrow.

Finally, this request of yours is especially interesting for several reasons. I admit that I have not fully thought through whether the interaction from consumer to producer would become simple or stay complex. But I, personally, do not like all the complex work needed to drive the Disruptor's RB (I say this only about the current structure of the Disruptor, not about the idea of the Disruptor or the people behind it). Maybe I, or we, can brainstorm more some day.

best regards,
Jin




Rajiv Kurian

Feb 19, 2014, 20:53:37
to: la...@googlegroups.com, Rajiv Kurian


On Wednesday, February 19, 2014 5:41:21 PM UTC-8, Jin Mingjian wrote:
> Hi Rajiv,
>
> Thanks for coming back :) I recently happened to read some of your posts on Dmitriy's lock-free. I think we are arriving at the same place on some key points of backend engineering :)

:)

> As for this idea, I am not sure I fully understand you. But allowing a C-struct-like slot in the HL/RB is not a trivial thing.
>
> As you may recall from the discussion in your first post (which was also this group's first post ^_^), it is not trivial to guarantee the atomicity (and hence the thread safety) of consuming an Object slot. The same applies to your C-struct-like slot.
>
> Suppose we just return the address of a slot: how does anyone know when the consumer has finished consuming the whole slot? We need an interaction from the consumer to indicate this. If we rely on copying instead, then we lose the direct, in-place access. Although copying may be possible, it involves many memory operations on the bus plus an mfence. This turns the non-full-fence HL into a full-fence one, like the Disruptor.

Yup, you need an interaction from the consumer to indicate that you have consumed a complete slot. It definitely won't work without that. I guess you don't like the Disruptor's technique for indicating when it has finished consuming an event.

> If you have read some of Landz's source, you will see that I use an off-heap area (an array) in correspondence with the HyperLoop in z's http module. This just adds the indirection cost of one memory access. (We did also discuss this scheme in your first post, but we did not discuss any practical case there.)

I'll look at the source to figure out what you mean exactly. It's not completely clear ATM.

> This is not finished yet :( But I have not seen any obstacle to it. I just need to finish the primary supporting module now. I plan to continue the http module work today or tomorrow.
>
> Finally, this request of yours is especially interesting for several reasons. I admit that I have not fully thought through whether the interaction from consumer to producer would become simple or stay complex. But I, personally, do not like all the complex work needed to drive the Disruptor's RB (I say this only about the current structure of the Disruptor, not about the idea of the Disruptor or the people behind it). Maybe I, or we, can brainstorm more some day.

Any more details on why you don't like the API exactly?

Jin Mingjian

Feb 19, 2014, 22:10:38
to: Rajiv Kurian, la...@googlegroups.com
Rajiv, if you read the Disruptor source (I showed some small parts of it in your first post), it uses at least one full fence (a CAS) and many branches (if/else) in xxxProcessor to make the whole thing work. The current HL is not full-fence based. The advantage of avoiding a full fence is in latency (though with careful design, higher throughput is there too, of course). Landz's HL has (substantially) lower round-trip latency than the Disruptor in a two-thread ping-pong test (the Disruptor has only one ping-pong test, so I do not feel safe adding comparisons with more threads).

From the API side, leaking the internals to the outside is not very good if we assume clients may do something stupid. Then we need something like xxxProcessor to wrap the consuming logic to get things done. Then we need a similar facility again to make xxxProcessor thread-safe. Then the question is: why don't you and I just use the Disruptor? :) One answer is that the Disruptor does not offer off-heap objects. That is why I say "this request of yours is especially interesting".

I will evaluate your request further :) I want to do some experiments to see whether this more complex structure gives us more. I may ask you to discuss it again next week or the week after :)

Many thanks,
Jin

Rajiv Kurian

Feb 19, 2014, 22:54:35
to: la...@googlegroups.com, Rajiv Kurian


On Wednesday, February 19, 2014 7:10:38 PM UTC-8, Jin Mingjian wrote:
> Rajiv, if you read the Disruptor source (I showed some small parts of it in your first post), it uses at least one full fence (a CAS) and many branches (if/else) in xxxProcessor to make the whole thing work. The current HL is not full-fence based. The advantage of avoiding a full fence is in latency (though with careful design, higher throughput is there too, of course). Landz's HL has (substantially) lower round-trip latency than the Disruptor in a two-thread ping-pong test (the Disruptor has only one ping-pong test, so I do not feel safe adding comparisons with more threads).

I know I am bike-shedding a bit, but which CAS are you talking about? Is it this one? That seems like an extraneous feature for cases where you might migrate a consumer from one thread to another (via halt). It's only called once during the lifetime of a typical consumer. A simple putOrderedLong should be enough to let the producer know where the consumer is in the SPSC use case.

> From the API side, leaking the internals to the outside is not very good if we assume clients may do something stupid. Then we need something like xxxProcessor to wrap the consuming logic to get things done. Then we need a similar facility again to make xxxProcessor thread-safe. Then the question is: why don't you and I just use the Disruptor? :) One answer is that the Disruptor does not offer off-heap objects. That is why I say "this request of yours is especially interesting".

Right, it definitely adds more responsibility on the producer and consumer side to keep the contract up.

> I will evaluate your request further :) I want to do some experiments to see whether this more complex structure gives us more. I may ask you to discuss it again next week or the week after :)

I was just thinking out loud. Definitely not a priority feature. Also, as you noted, it is fundamentally impossible unless the API is more Disruptor-like, with a separate markConsumed function.
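
To make the SPSC idea concrete, here is a minimal sketch, assuming one producer thread and one consumer thread. SpscLongRing and its method names (including markConsumed) are hypothetical and are neither Landz nor Disruptor code. AtomicLong.lazySet is used as the public-API counterpart of putOrderedLong.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative single-producer/single-consumer ring of longs with an explicit release step. */
final class SpscLongRing {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next sequence to consume; written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // next sequence to produce; written only by the producer

    SpscLongRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    /** Producer side: returns false when the consumer has not yet released the oldest slot. */
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false; // ring is full
        }
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store: publishes the slot write without a full fence
        return true;
    }

    /** Consumer side: true when there is nothing to read. */
    boolean isEmpty() {
        return head.get() == tail.get();
    }

    /** Consumer side: reads the oldest value in place without releasing its slot. */
    long peek() {
        long h = head.get();
        if (h == tail.get()) {
            throw new IllegalStateException("ring is empty");
        }
        return slots[(int) (h & mask)];
    }

    /** Consumer side: tells the producer that the current slot may be reused. */
    void markConsumed() {
        head.lazySet(head.get() + 1); // ordered store of the consumer's position
    }
}
```

The producer never overwrites a slot until it has observed the consumer's position moving past it, so the consumer can work on the slot in place between peek and markConsumed.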

Looking forward to more progress. It would be really interesting to see the HTTP module done, so one could compare the performance and the API with other NIO based libraries out there.

Jin Mingjian

Feb 20, 2014, 0:25:01
to: Rajiv Kurian, la...@googlegroups.com
> It's only called once during the lifetime of a typical consumer.

Right, for your use case. That is, you wrap all the logic in the EventProcessor. But this is not good for one common ITC case: I have two threads that want to talk to each other at some point. Most of the time they do not care about each other, until they want to talk. The two threads do very different things. We cannot wrap the whole logic in one Processor, for many reasons.

The possible way I can see is still to wrap the consuming logic in the Processor but invoke it yourself somewhere. (Or do you have a suggestion?) Or you customize your own processing logic (maybe this is the case shown in your example). But now you have left the Disruptor, so the Disruptor can no longer guarantee your safety. You guarantee it yourself. This is only good for someone who is familiar with the Disruptor framework. And this is why a pure data structure like z's HL lives on, while a framework-style RB screws up. But again, for now, it indeed does not directly cover your request :)

Yes to your idea: putOrderedLong should be OK, but it adds n (n = number of consumers) memory accesses and the corresponding checks on the producer side. OK, these costs are very small. But sometimes, if you can guarantee the safety of consuming an object for some other reason, then all of these operations may be unnecessary. "The devil is in the details." Landz is designed with bare metal in mind. That is the meaning of the current base data structures in the Channel APIs.
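
The producer-side cost mentioned here can be sketched as follows, assuming each consumer publishes its own position in an AtomicLong. The class GatingCheck and its method names are hypothetical. The sketch shows only the check itself: one read per consumer before a slot is reused.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative producer-side check against n consumer positions (not Landz or Disruptor code). */
final class GatingCheck {
    /** Returns the position of the slowest consumer; costs one volatile read per consumer. */
    static long minCursor(AtomicLong[] consumerCursors) {
        long min = Long.MAX_VALUE;
        for (AtomicLong cursor : consumerCursors) {
            min = Math.min(min, cursor.get());
        }
        return min;
    }

    /**
     * True if the producer may write sequence `next` into a ring of `capacity` slots.
     * Each cursor holds the next sequence its consumer will read.
     */
    static boolean canPublish(long next, int capacity, AtomicLong[] consumerCursors) {
        return next - minCursor(consumerCursors) < capacity;
    }
}
```

With a single consumer this collapses to one read and one comparison, which is the small cost conceded above. With no such check at all, safety has to come from somewhere else in the application.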

I am also excited to see the benchmark in the next week or so :)
 
Jin


