A Generic Fault-Tolerant Architecture for Real-Time Dependable Systems, by David Powell

The design of computers to be embedded in critical real-time applications is a complex task. Such systems must not only guarantee to meet demanding real-time deadlines imposed by their physical environment, they must guarantee to do so dependably, despite both physical faults (in hardware) and design faults (in hardware or software). A fault-tolerance approach is mandatory for these guarantees to be commensurate with the safety and reliability requirements of many life- and mission-critical applications. This book explains the motivations and the results of a collaborative project whose objective was to significantly decrease the lifecycle costs of such fault-tolerant systems. The end-user companies participating in this project already deploy fault-tolerant systems in critical railway, space and nuclear-propulsion applications. However, these are proprietary systems whose architectures have been tailored to meet domain-specific requirements. This has led to very costly, inflexible, and often hardware-intensive solutions that, by the time they are developed, validated and certified for use in the field, can already be out-of-date in terms of their underlying hardware and software technology.

Inter-Channel Communication Network

We have thus chosen the second alternative. This implies that, in the case n = 3, the probability of occurrence of a Byzantine clock (expected to be extremely small) should be considered for very critical applications.

2 The GUARDS Algorithm

The algorithm actually implemented in GUARDS is a convergence-averaging algorithm, with a fault-tolerant averaging function F that depends on the number of active nodes (in the four-node case, it is the LL algorithm).
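In a convergence-averaging algorithm, each node periodically reads the other clocks, applies a fault-tolerant averaging function F to the readings, and corrects its own clock by the difference. The sketch below uses one well-known choice of F, the fault-tolerant midpoint of Lundelius Welch and Lynch: discard the m smallest and m largest readings, then take the midpoint of what remains. This is an assumption for illustration; it is not necessarily the exact F used in GUARDS, and the function names are hypothetical.

```python
from typing import Sequence


def fault_tolerant_midpoint(readings: Sequence[float], m: int) -> float:
    """Illustrative fault-tolerant averaging function (assumed, not GUARDS' F).

    Sorts the clock readings, discards the m smallest and m largest values
    (which may come from up to m arbitrarily faulty clocks), and returns the
    midpoint of the smallest and largest remaining readings.
    """
    n = len(readings)
    if n < 2 * m + 1:
        raise ValueError("need at least 2m + 1 readings to discard 2m of them")
    kept = sorted(readings)[m:n - m]
    return (kept[0] + kept[-1]) / 2.0


def correction(own_clock: float, readings: Sequence[float], m: int) -> float:
    """Adjustment a node would apply at resynchronisation: F(readings) - own clock."""
    return fault_tolerant_midpoint(readings, m) - own_clock


# Four nodes, one of which reports a wildly wrong value (units: microseconds).
readings_us = [1000, 1020, 1050, 9900]
print(fault_tolerant_midpoint(readings_us, m=1))  # 1035.0
```

Discarding the extremes means a single faulty reading, however far off, cannot drag the result outside the range of the correct readings. Note that guaranteeing agreement in the presence of m Byzantine clocks classically requires n >= 3m + 1 clocks, so with m = 1 three nodes are not enough, which appears consistent with the remark above about Byzantine clocks when n = 3.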

The adopted synchronisation algorithm is then detailed, including how initial synchronisation is achieved.

1 Definitions and Notations

We consider a set of fully-connected channels, or nodes. Each node has a physical clock and computes a logical clock time.

Definition 1 (Clock Time): Each node maintains a logical clock time T = C(t), meaning that at real time t the clock time of the local node is T. By convention, variables associated with clock time (local to a given node) are written in uppercase, whereas variables associated with real time (measured in an assumed global Newtonian frame) are written in lowercase.
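Definition 1 can be made concrete with a small model of one node's clock. The sketch below assumes that the logical clock is the physical clock reading plus an accumulated correction, and that a non-faulty physical clock ticks at a rate within 1 ± ρ of real time; the class and its fields are hypothetical and only illustrate the uppercase/lowercase convention (real time t in, clock time T out).

```python
from dataclasses import dataclass


@dataclass
class NodeClock:
    """Hypothetical model of one node's clock, for illustration only.

    Follows the document's convention: lowercase t is real time,
    uppercase T = C(t) is the node's logical clock time.
    """
    rate: float              # physical ticks per real second; within 1 +/- rho if non-faulty
    offset: float = 0.0      # physical clock reading at real time t = 0
    adjustment: float = 0.0  # total correction applied by resynchronisation

    def physical_time(self, t: float) -> float:
        """Uncorrected physical clock reading at real time t."""
        return self.offset + self.rate * t

    def logical_time(self, t: float) -> float:
        """Logical clock time T = C(t): physical reading plus correction."""
        return self.physical_time(t) + self.adjustment

    def adjust(self, correction: float) -> None:
        """Apply a resynchronisation correction to the logical clock."""
        self.adjustment += correction


# A fast clock with a 2 ms head start, and a slow clock, observed at t = 10 s.
a = NodeClock(rate=1.0001, offset=0.002)
b = NodeClock(rate=0.9999)
skew = a.logical_time(10.0) - b.logical_time(10.0)  # about 4 ms: 2 ms offset + 2 ms drift
a.adjust(-skew)  # after correction the two logical clocks agree at t = 10 s
```

Keeping the correction separate from the physical reading mirrors the distinction in the text between the physical clock and the logical clock time each node computes from it.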

Symbol  Description                                               Typical value
ρ       Maximum drift rate of all non-faulty physical clocks      10^-4 s/s
f       Quartz frequency of physical clocks                       10 MHz
δ       Maximum skew between any two non-faulty logical clocks    100 µs
d       Upper bound on the transmission delay                     50 µs
ε       Upper bound on the read error                             1 µs
n       Number of clocks                                          ≤ 4
m       Maximum number of (arbitrarily) faulty clocks             1
R       Resynchronisation interval                                500 ms

Existing Algorithms

Many clock synchronisation algorithms are described in the literature; [Ramanathan et al. 1990] provides a good survey of them.
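The typical values in the parameter table above can be related through some simple arithmetic. The sketch below computes a few derived quantities: quartz ticks per resynchronisation interval, worst-case drift of one clock over R, and worst-case divergence of two non-faulty clocks drifting in opposite directions. It also checks the classical n >= 3m + 1 condition for masking arbitrary clock faults. The function names are hypothetical, and N = 4 is taken as the maximum number of clocks in the table.

```python
# Typical values from the parameter table (only those used below).
RHO = 1e-4     # maximum drift rate of a non-faulty physical clock (s/s)
F_HZ = 10e6    # quartz frequency of the physical clocks (Hz)
R = 0.5        # resynchronisation interval (s)
N = 4          # number of clocks (maximum in the table)
M = 1          # maximum number of arbitrarily faulty clocks


def ticks_per_interval(f_hz: float = F_HZ, r: float = R) -> int:
    """Quartz ticks counted during one resynchronisation interval."""
    return round(f_hz * r)


def drift_per_interval(rho: float = RHO, r: float = R) -> float:
    """Worst-case deviation of one non-faulty clock from real time over R (s)."""
    return rho * r


def pairwise_divergence(rho: float = RHO, r: float = R) -> float:
    """Worst case for two non-faulty clocks, one fast and one slow, over R (s)."""
    return 2 * rho * r


def tolerates_arbitrary_faults(n: int = N, m: int = M) -> bool:
    """Classical bound: n >= 3m + 1 clocks are needed to tolerate m arbitrarily
    faulty (Byzantine) clocks when messages are not authenticated."""
    return n >= 3 * m + 1


print(ticks_per_interval())                      # 5000000
print(f"{drift_per_interval() * 1e6:.0f} µs")    # 50 µs
print(f"{pairwise_divergence() * 1e6:.0f} µs")   # 100 µs
print(tolerates_arbitrary_faults(), tolerates_arbitrary_faults(n=3))  # True False
```

These figures are back-of-the-envelope bounds derived from the typical values only; an actual precision analysis would also account for the transmission delay d and read error ε.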

