more on the peer code


borislav...@gmail.com

Feb 9, 2009, 2:29:43 PM

to HyperGraphDB

Cipri,

I'll keep posting questions as I encounter them. Feel free to respond whenever you have time. So, two more:

1) Looks like every new task takes up a Java thread until it's complete? Why is that? It's probably simpler to implement, but it doesn't seem right to me. Threads are generally used only during processing, and should be free for other tasks while waiting for another peer's message to arrive.

2) Looks like broadcasting a message to all peers in the group is implemented by looping over all peers known to the current peer. At least that's how QueryTaskClient is implemented. Doesn't JXTA have a broadcast mechanism already implemented? If so, we should probably have a "BroadcastActivity" analogous to the SendActivity, no?

3) The (performative, action) pair that identifies task types is working fine, but looking back at it, it doesn't seem very clean conceptually :( Why the need for such a composite key? The performative by itself is of no use... maybe there's a need to rethink this a little bit. The semantics of the performatives are not implemented at their level of abstraction.

I'm thinking of implementing some sort of HGDB-based identity mechanism on top of JXTA where the unique ID is an auto-generated UUID, instead of a user-assigned peer name. The latter still remains and has an informative value, but I think it's going to be problematic to ensure uniqueness of those names. One of the reasons for this sort of ID is persistence: other peers should be able to store information about a particular peer and have some level of confidence that when that peer comes online, the information they've stored remains valid. So the ID should be tied to the underlying HGDB instance.

The ID is going to be represented by the HGPeerIdentity class and it will contain the UUID by which everybody else should be able to refer to the peer. It'll also contain the user-assigned name, the hostname, IP address and directory location of the underlying HGDB instance. A problem arises when an HGDB instance is copied on the filesystem from one location to another - this is a very common pattern (at least in my work), because it's much faster to load a database once and then copy it byte-for-byte than to load it separately at multiple locations. But when that happens, the original peer ID gets replicated and there's a conflict which must be resolved by some sort of negotiation: the original peer can be identified by the extra identity attributes such as hostname and directory path to the HGDB, and it should then keep its UUID; copies will simply generate a new UUID for themselves. That's roughly my thinking for now, but it's not without problems because peers don't come online in a synchronous fashion. It could be simplified so that when a peer detects that its hostname and/or graph location has changed, it assigns itself a new UUID right away. But that won't work in the very valid use case where an HGDB is moved (as opposed to copied over) to a different location and it makes sense to preserve its identity.

Cheers,
Boris

Ciprian Costa

Feb 10, 2009, 4:21:59 PM

to hyperg...@googlegroups.com

Hi Boris

On Mon, Feb 9, 2009 at 9:29 PM, <borislav...@gmail.com> wrote:

> Cipri,
>
> I'll keep posting questions as I encounter them. Feel free to respond whenever you have time. So, two more:
>
> 1) Looks like every new task takes up a Java thread until it's complete? Why is that? It's probably simpler to implement, but it doesn't seem right to me. Threads are generally used only during processing, and should be free for other tasks while waiting for another peer's message to arrive.

Yes, I agree. I was thinking that we could do this transparently to the tasks. We could have an executor that manages the threads in the background.

> 2) Looks like broadcasting a message to all peers in the group is implemented by looping over all peers known to the current peer. At least that's how QueryTaskClient is implemented. Doesn't JXTA have a broadcast mechanism already implemented? If so, we should probably have a "BroadcastActivity" analogous to the SendActivity, no?

Our broadcast has a filter in front of it; we do not just publish to everyone. Even if in this case QueryTaskClient just loops over all the peers, I think that it will be very important to control the traffic in the network, so it makes sense to have our own tools for initializing conversations with various peers.
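
A minimal sketch of what "a filter in front of the broadcast" could look like, for illustration only. The class name `FilteredBroadcast`, the predicate and the `send` callback are all hypothetical; none of them come from the HGDB peer code or from JXTA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical helper: sends a message to every known peer accepted by a filter.
final class FilteredBroadcast {
    private FilteredBroadcast() {}

    // Returns the peers that were actually contacted, in iteration order.
    static <P, M> List<P> broadcast(Iterable<P> knownPeers,
                                    Predicate<? super P> filter,
                                    M message,
                                    BiConsumer<? super P, ? super M> send) {
        List<P> contacted = new ArrayList<>();
        for (P peer : knownPeers) {
            if (filter.test(peer)) {
                send.accept(peer, message); // assumed to be the point-to-point send
                contacted.add(peer);
            }
        }
        return contacted;
    }
}
```

With a filter that accepts everyone this degenerates to the plain loop in QueryTaskClient; a stricter filter is where traffic control would plug in.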

> 3) The (performative, action) pair that identifies task types is working fine, but looking back at it, it doesn't seem very clean conceptually :( Why the need for such a composite key? The performative by itself is of no use... maybe there's a need to rethink this a little bit. The semantics of the performatives are not implemented at their level of abstraction.

The idea was that the same performative could apply to different actions that would be handled by different tasks. For example, Request can be for atom interests or query. Also, the same action can be invoked with different performatives; for example, atom interests could be invoked with Request or Inform.
Do you think we should remove the performative from the key, have multiple tasks per action and a way to choose between the various tasks associated with an action based on the performative and other components of the message?

> I'm thinking of implementing some sort of HGDB-based identity mechanism on top of JXTA where the unique ID is an auto-generated UUID, instead of a user-assigned peer name. The latter still remains and has an informative value, but I think it's going to be problematic to ensure uniqueness of those names. One of the reasons for this sort of ID is persistence: other peers should be able to store information about a particular peer and have some level of confidence that when that peer comes online, the information they've stored remains valid. So the ID should be tied to the underlying HGDB instance.
>
> The ID is going to be represented by the HGPeerIdentity class and it will contain the UUID by which everybody else should be able to refer to the peer. It'll also contain the user-assigned name, the hostname, IP address and directory location of the underlying HGDB instance. A problem arises when an HGDB instance is copied on the filesystem from one location to another - this is a very common pattern (at least in my work), because it's much faster to load a database once and then copy it byte-for-byte than to load it separately at multiple locations. But when that happens, the original peer ID gets replicated and there's a conflict which must be resolved by some sort of negotiation: the original peer can be identified by the extra identity attributes such as hostname and directory path to the HGDB, and it should then keep its UUID; copies will simply generate a new UUID for themselves. That's roughly my thinking for now, but it's not without problems because peers don't come online in a synchronous fashion. It could be simplified so that when a peer detects that its hostname and/or graph location has changed, it assigns itself a new UUID right away. But that won't work in the very valid use case where an HGDB is moved (as opposed to copied over) to a different location and it makes sense to preserve its identity.

We could change the membership service to not require an id that we generate, and use the ids that are generated by JXTA. These are not stored in the database, but instead configured in the configuration file (if you do not specify one, one is generated by JXTA). Maybe in this way it would also be easier to manage the scenarios you mentioned by requiring the user to change the configuration depending on what he actually intends to do (I think this is safer than trying to guess what he wants).

Regards,
Cipri


Borislav Iordanov

Feb 15, 2009, 10:55:41 PM

to hyperg...@googlegroups.com

Hi Cipri,

>> 1) Looks like every new task takes up a Java thread until it's complete?
>> Why is that? It's probably simpler to implement, but it doesn't seem right
>> to me. Threads are generally used only during processing, and should be free
>> for other tasks while waiting for another peer's message to arrive.
>
> Yes, I agree. I was thinking that we could do this transparently to the
> tasks. We could have an executor that manages the threads in the background.

There's already an executor that manages threads in the background,
the executor service for the PeerInterface (hmm, actually I don't
remember anymore if I added this). But the TaskActivity.doRun method
is designed around the assumption that a task instance takes up a
thread and doesn't release it until it reaches an end state. I spent
some time trying to understand how tasks and conversations interact.
It's a bit complicated, but that's not necessarily due to the design,
which after all deals with decoupled asynchronous processes :) We'll
have to see if the flow can be made easier to follow from the code.
For example, one other thing that confuses me a little bit is why
conversations should be activities, since all the work and transition
methods are supposed to be in the task. I'm also having trouble with
the assumptions and rules behind the logic of the framework - they're
not so easy to see by reading the code. For example, I can't answer a
very simple question for myself: what do I need to do to implement a
conversation between two peers and when, during the workflow within a
task, is it appropriate to do so? There's a
TaskActivity.createNewConversation method that returns null, but from
the wiki page (http://code.google.com/p/hypergraphdb/wiki/HowToCreateTasks)
I was under the assumption that I only need to register conversation
handlers in order to have a conversation going with some peer.

Another note is that the TaskActivity.doRun method, essentially the
method that eats up a thread for the lifetime of a task, has logic
that seems to make potentially dangerous assumptions:

- if there's an activity queue for the current state, it blocks and
waits for an activity to appear in that queue. This assumes that an
activity will eventually appear in the queue. It also assumes that the
state will not change in the meantime.
- if there's no activity for the current state, it waits for the state
to change. But what if, in the meantime, a new activity appears in the
queue and the state never changes before that activity is executed?
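
One way to sidestep both hazards listed above, sketched here under my own assumptions, is to guard the state and the per-state queues with a single monitor, so that one wait covers both "state changed" and "activity arrived", and to bound the wait with a timeout. `StateQueues` is a hypothetical stand-in and not the actual TaskActivity code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the scheduling part of a task; not the HGDB TaskActivity.
final class StateQueues<S> {
    private final Map<S, Queue<Runnable>> queues = new HashMap<>();
    private S state;

    StateQueues(S initial) { this.state = initial; }

    // Both mutators notify under the same monitor, so a waiter sees either event.
    synchronized void setState(S newState) {
        state = newState;
        notifyAll();
    }

    synchronized void enqueue(S forState, Runnable activity) {
        queues.computeIfAbsent(forState, k -> new ArrayDeque<>()).add(activity);
        notifyAll();
    }

    // Waits until an activity exists for whatever the current state is at that
    // moment, re-checking after every wake-up. Returns null on timeout or interrupt.
    synchronized Runnable nextActivity(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Queue<Runnable> q = queues.get(state);
            if (q != null && !q.isEmpty()) {
                return q.poll();
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

This still ties a thread to the waiting task; it only removes the assumption that the state and the queue cannot change while the thread is blocked.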

But maybe I simply didn't understand how the framework works in
enough detail. Basically, I tried to see how I can make that method
disappear, but I don't understand what needs to be done. It doesn't
look so easy, because once we give up the threads associated with each
task, we'd need to basically maintain a slew of data structures of
scheduled activities for different tasks and conversations, prioritize
them and assign them to threads in the pool by monitoring state
changes. My first reaction was to create and submit a Runnable
instance in the TaskActivity.stateChanged method instead of submitting
an activity to the queue for the new state. But I'm not sure that's
gonna work because the "interestedState" (I'm referring to a variable
in that method) might not be the current state of the task. In summary,
the whole design might need to be changed. Or?
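
To make the "give up the thread per task" idea concrete, here is a rough sketch of a per-task mailbox that borrows a pool thread only while there is a message to process. It is a general pattern, not a proposal for the exact HGDB classes: `TaskMailbox` and its handler are hypothetical names, and it deliberately ignores the state/activity-queue bookkeeping discussed above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical: delivers messages to one task serially, using a pool thread
// only while there is something to process.
final class TaskMailbox<M> {
    private final Executor pool;
    private final Consumer<M> handler;
    private final Queue<M> pending = new ArrayDeque<>();
    private boolean draining = false;

    TaskMailbox(Executor pool, Consumer<M> handler) {
        this.pool = pool;
        this.handler = handler;
    }

    // Called by the network layer; never blocks waiting for the task.
    void deliver(M message) {
        boolean schedule;
        synchronized (this) {
            pending.add(message);
            schedule = !draining;
            if (schedule) draining = true;
        }
        if (schedule) pool.execute(this::drain);
    }

    // Runs on a pool thread; at most one drain is active per mailbox.
    private void drain() {
        while (true) {
            M next;
            synchronized (this) {
                next = pending.poll();
                if (next == null) {
                    draining = false; // the thread goes back to the pool
                    return;
                }
            }
            try {
                handler.accept(next);
            } catch (RuntimeException e) {
                // A real implementation would log this or fail the task.
            }
        }
    }
}
```

A task waiting for a peer's reply then holds no thread at all; it simply has an empty mailbox until the next `deliver` call.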

Finally, a smaller design point: new tasks are created and submitted
for execution the first time a message is received, and from then on
new messages are handled with 'handleMessage'. So the first message
has a special status: it's passed in the constructor and then
'startTask' uses it to respond. But it seems to me that the pattern
should be that all messages are dealt with in 'handleMessage' and
there shouldn't be a need to have a 'msg' member variable in the task
just so that the first message can be responded to. It's a bit
unnatural...
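
A small sketch of the pattern suggested above, where the first message takes the same path as every later one. All types here (`Msg`, `MessageTask`, `TaskDispatcher`) are hypothetical and only illustrate the shape; they are not the existing framework classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical message and task shapes, for illustration only.
final class Msg {
    final String taskId;
    final String body;
    Msg(String taskId, String body) { this.taskId = taskId; this.body = body; }
}

interface MessageTask {
    void handleMessage(Msg msg);
}

final class TaskDispatcher {
    private final Map<String, MessageTask> tasks = new HashMap<>();
    private final Supplier<MessageTask> factory;

    TaskDispatcher(Supplier<MessageTask> factory) { this.factory = factory; }

    // The first message creates the task, then goes through handleMessage
    // like every later one, so the task needs no 'msg' member.
    void onMessage(Msg msg) {
        tasks.computeIfAbsent(msg.taskId, id -> factory.get()).handleMessage(msg);
    }

    int taskCount() { return tasks.size(); }
}
```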

Cheers,
Boris

Borislav Iordanov

Feb 15, 2009, 11:06:23 PM

to hyperg...@googlegroups.com

>> 3) The (performative, action) pair that identifies task types is working
>> fine, but looking back at it, it doesn't seem very clean conceptually :( Why
>> the need for such a composite key? The performative by itself is of no
>> use... maybe there's a need to rethink this a little bit. The semantics of
>> the performatives are not implemented at their level of abstraction
>
> The idea was that the same performative could apply to different actions that
> would be handled by different tasks. For example Request can be for atom
> interests or query. Also, the same action can be invoked with different
> performatives, for example atom interests could be invoked with Request or
> Inform.
> Do you think we should remove the performative from the key, have multiple
> tasks per action and a way to choose between the various tasks associated
> with an action based on the performative and other components of the
> message?

I think that it comes down to finding an appropriate factory when a
new message is received. And I would say that the action/task type
should be enough to identify the TaskFactory. Then if the TaskFactory
is implemented to create different types of tasks for different
performatives, that's irrelevant to the framework since after that
tasks are looked up by ID.
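
Roughly what I mean, as a sketch with made-up types: the registry is keyed by action only, and whether the factory branches on the performative is its own business. `Performative`, `PeerTask`, `TaskFactory` and `TaskFactoryRegistry` below are hypothetical shapes that mirror the discussion, not the actual framework API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical types; the names mirror the discussion, not the actual HGDB API.
enum Performative { REQUEST, INFORM }

interface PeerTask {
    String describe();
}

interface TaskFactory {
    PeerTask newTask(Performative performative);
}

final class TaskFactoryRegistry {
    private final Map<String, TaskFactory> byAction = new HashMap<>();

    void register(String action, TaskFactory factory) {
        byAction.put(action, factory);
    }

    // The framework looks at the action only; the factory may branch on the performative.
    PeerTask createTask(String action, Performative performative) {
        TaskFactory factory = byAction.get(action);
        if (factory == null) {
            throw new IllegalArgumentException("No factory for action: " + action);
        }
        return factory.newTask(performative);
    }
}
```

A factory for atom interests could then return one task type for Request and another for Inform without the framework knowing about it.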

Probably because of my unfamiliarity with JXTA, I simply did my own
separate thing where peers establish identities through an
AffirmIdentityTask whenever a new peer joins the network. Currently,
conflict is unlikely since upon startup each peer ensures that
hostname/ip/graphlocation remain the same; otherwise it changes
identity by generating a new UUID.
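
The startup rule described above can be sketched as follows. `PeerIdentitySketch` is a hypothetical value class written just for this illustration; the real HGPeerIdentity may carry different fields and the real check lives elsewhere.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical value class; the real HGPeerIdentity may carry different fields.
final class PeerIdentitySketch {
    final UUID id;
    final String hostname;
    final String ipAddress;
    final String graphLocation;

    PeerIdentitySketch(UUID id, String hostname, String ipAddress, String graphLocation) {
        this.id = id;
        this.hostname = hostname;
        this.ipAddress = ipAddress;
        this.graphLocation = graphLocation;
    }

    // Keeps the stored UUID only if hostname, IP and graph location all match the
    // current environment; otherwise (or if nothing is stored) a new UUID is generated.
    static PeerIdentitySketch affirm(PeerIdentitySketch stored,
                                     String hostname, String ipAddress, String graphLocation) {
        boolean unchanged = stored != null
                && Objects.equals(stored.hostname, hostname)
                && Objects.equals(stored.ipAddress, ipAddress)
                && Objects.equals(stored.graphLocation, graphLocation);
        UUID id = unchanged ? stored.id : UUID.randomUUID();
        return new PeerIdentitySketch(id, hostname, ipAddress, graphLocation);
    }
}
```

Note that this rule also hands a new UUID to a database that was legitimately moved rather than copied, which is the open problem mentioned earlier in the thread.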