cs5012009-CHAPTER 6 Synchronization Problems

7,742 views

Skip to first unread message

Hongfei Yan

unread,

Apr 21, 2009, 11:04:36 PM4/21/09

to cs50...@googlegroups.com

1. Q: Name at least three sources of delay that can be introduced between WWV broadcasting the time and the processors in a distributed system setting their internal clocks.

2. Q: Consider the behavior of two machines in a distributed system. Both have clocks that are supposed to tick 1000 times per millisecond. One of them actually does, but the other ticks only 990 times per millisecond. If UTC updates come in once a minute, what is the maximum clock skew that will occur?

4. Q: When a node synchronizes its clock to that of another node, it is generally a good idea to take previous measurements into account as well. Why? Also, give an example of how such past readings could be taken into account.

6. Q: To achieve totally-ordered multicasting with Lamport timestamps, is it strictly necessary that each message is acknowledged?

7. Q: Consider a communication layer in which messages are delivered only in the order that they were sent. Give an example in which even this ordering is unnecessarily restrictive.

8. Q: Many distributed algorithms require the use of a coordinating process. To what extent can such algorithms actually be considered distributed? Discuss.

9. Q: In the centralized approach to mutual exclusion (Fig. 6-0), upon receiving a message from a process releasing its exclusive access to the resources it was using, the coordinator normally grants permission to the Œrst process on the queue. Give another possible algorithm for the coordinator.

10. Q: Consider Fig. 6-0 again. Suppose that the coordinator crashes. Does this always bring the system down? If not, under what circumstances does this happen? Is there any way to avoid the problem and make the system able to tolerate coordinator crashes?

11. Q: Ricart and Agrawala's algorithm has the problem that if a process has crashed and does not reply to a request from another process to access a resources, the lack of response will be interpreted as denial of permission. We suggested that all requests be answered immediately to make it easy to detect crashed processes. Are there any circumstances where even this method is insufŒcient? Discuss.

12. Q: How do the entries in Fig. 6-0 change if we assume that the algorithms can be implemented on a LAN that supports hardware broadcasts?

13. Q: A distributed system may have multiple, independent resources. Imagine that process 0 wants to access resource A and process 1 wants to access resource B. Can Ricart and Agrawala's algorithm lead to deadlocks? Explain your answer.

14. Q: Suppose that two processes detect the demise of the coordinator simultaneously and both decide to hold an election using the bully algorithm. What happens?

15. Q: In Fig. 6-0 we have two ELECTION messages circulating simultaneously. While it does no harm to have two of them, it would be more elegant if one could be killed off. Devise an algorithm for doing this without affecting the operation of the basic election algorithm.

Hongfei Yan

unread,

Jun 8, 2009, 4:12:35 AM6/8/09

to cs50...@googlegroups.com

1. Q: Name at least three sources of delay that can be introduced between WWV broadcasting the time and the processors in a distributed system setting their internal clocks.

A: First we have signal propagation delay in the atmosphere. Second we might have collision delay while the machines with the WWV receivers fight to get on the Ethernet. Third, there is packet propagation delay on the LAN. Fourth, there is delay in each processor after the packet arrives, due to interrupt processing and internal queueing delays.

2. Q: Consider the behavior of two machines in a distributed system. Both have clocks that are supposed to tick 1000 times per millisecond. One of them actually does, but the other ticks only 990 times per millisecond. If UTC updates come in once a minute, what is the maximum clock skew that will occur?

A: The second clock ticks 990,000 times per second, giving an error of 10 msec per second. In a minute this error has grown to 600 msec. Another way of looking at it is that the second clock is one percent slow, so after a minute it is off by 0. 01 ´ 60 sec, or 600 msec.

4. Q: When a node synchronizes its clock to that of another node, it is generally a good idea to take previous measurements into account as well. Why? Also, give an example of how such past readings could be taken into account.

A: The obvious reason is that there may be an error in the current reading. Assuming that clocks need only be gradually adjusted, one possibility is to consider the last N values and compute a median or average. If the measured value falls outside a current interval, it is not taken into account (but is added to the list). Likewise, a new value can be computed by taking a weighted average, or an aging algorithm.

6. Q: To achieve totally-ordered multicasting with Lamport timestamps, is it strictly necessary that each message is acknowledged?

A: No, it is sufficient to multicast any other type of message, as long as that message has a timestamp larger than the received message. The condition for delivering a message m to the application, is that another message has been received from each other process with a large timestamp. This guarantees that there are no more messages underway with a lower timestamp.

7. Q: Consider a communication layer in which messages are delivered only in the order that they were sent. Give an example in which even this ordering is unnecessarily restrictive.

A: Imagine the transfer of a large image which, to that end, has been divided into consecutive blocks. Each block is identified by its position in the original image, and possibly also its width and height. In that case, FIFO ordering is not necessary, as the receiver can simply paste each incoming block into the correct position.

8. Q: Many distributed algorithms require the use of a coordinating process. To what extent can such algorithms actually be considered distributed? Discuss.

A: In a centralized algorithm, there is often one, fixed process that acts as coordinator. Distribution comes from the fact that the other processes run on different machines. In distributed algorithms with a nonfixed coordinator, the coordinator is chosen (in a distributed fashion) among the processes that form part of the algorithm. The fact that there is a coordinator does not make the algorithm less distributed.

9. Q: In the centralized approach to mutual exclusion (Fig. 6-14), upon receiving a message from a process releasing its exclusive access to the resources it was using, the coordinator normally grants permission to the Œrst process on the queue. Give another possible algorithm for the coordinator.
A: Requests could be associated with priority levels, depending on their importance. The coordinator could then grant the highest priority request first.

10. Q: Consider Fig. 6-0 again. Suppose that the coordinator crashes. Does this always bring the system down? If not, under what circumstances does this happen? Is there any way to avoid the problem and make the system able to tolerate coordinator crashes?

A: Suppose that the algorithm is such that every request is answered immediately, either with permission or with denial. If there are no processes accessing resources and no processes queued, then a crash is not fatal. The next process to request permission will fail to get any reply at all, and can initiate the election of a new coordinator. The system can be made even more robust by having the coordinator store every incoming request on disk before sending back a reply. In this way, in the event of a crash, the new coordinator can reconstruct the list of accessed resources and the queue by reading the file from the disk.

11. Q: Ricart and Agrawala's algorithm has the problem that if a process has crashed and does not reply to a request from another process to access a resources, the lack of response will be interpreted as denial of permission. We suggested that all requests be answered immediately to make it easy to detect crashed processes. Are there any circumstances where even this method is insuffcient? Discuss.
A: Suppose that a process denies permission and then crashes. The requesting process thinks that it is alive, but permission will never come. One way out is to have the requester not actually block, but rather go to sleep for a fixed period of time, after which it polls all processes that have denied permission to see if they are still running.

12. Q: How do the entries in Fig. 6-17 change if we assume that the algorithms can be implemented on a LAN that supports hardware broadcasts?
A: Only the entries for the distributed case change. Because sending a point-to-point message is as expensive as doing a broadcast, we need only send one broadcast message to all processes requesting access to the resource. Likewise, only one release broadcast message is needed. The delay becomes 1 + (n - 1): one delay coming from the broadcast request, and an additional n -1 as we
still need to receive a message from each other process before being allowed to access the resource.

13. Q: A distributed system may have multiple, independent resources. Imagine that process 0 wants to access resource A and process 1 wants to access resource B. Can Ricart and Agrawala's algorithm lead to deadlocks? Explain your answer.

A: It depends on the ground rules. If processes access resources strictly sequentially, that is, a process holding a resource may not attempt to access another one, then there is no way that it can block while holding a resource that some other process wants. The system is then deadlock free. On the other hand, if process 0 may hold resource A and then try to access resource B, a deadlock can occur if some other process tries to acquire them in the reverse order. The Ricart and Agrawala algorithm itself does not contribute to deadlock
since each resource is handled independently of all the others.

14. Q: Suppose that two processes detect the demise of the coordinator simultaneously and both decide to hold an election using the bully algorithm. What happens?

A: Each of the higher-numbered processes will get two ELECTION messages, but will ignore the second one. The election will proceed as usual.

15. Q: In Fig. 6-21 we have two ELECTION messages circulating simultaneously. While it does no harm to have two of them, it would be more elegant if one could be killed off. Devise an algorithm for doing this without affecting the operation of the basic election algorithm.
A: When a process receives an ELECTION message, it checks to see who started it. If it itself started it (i.e., its number is at the head of the list), it turns the message into a COORDINATOR message as described in the text. If it did not start any ELECTION message, it adds its process number and forwards it along the ring. However, if it did send its own ELECTION message earlier and it has just discovered a competitor, it compares the originator's process number with its own. If the other process has a lower number, it discards the message
instead of passing it on. If the competitor is higher, the message is forwarded in the usual way. In this way, if multiple ELECTION messages are started, the one whose first entry is highest survives. The rest are killed off along the route.

Reply all

Reply to author

Forward

0 new messages