Redundancy to mask failures
Types of redundancy
Information: Add extra bits to the data units so that damaged bits can be recovered.
Temporal: Design the system so that an action can be performed again if something goes wrong. Typically used when failures are transient or intermittent.
Physical: Add extra equipment or processes so that the system tolerates the failure of one or more components. This type is typically used in distributed systems.
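As an illustration of information redundancy, the sketch below uses a triple-repetition code, one of the simplest ways to add extra bits. The function names `encode` and `decode` and the sample message are assumptions made for this example. Real systems usually rely on more efficient codes such as Hamming codes.

```python
def encode(bits):
    """Triple-repetition code: transmit every bit three times."""
    return [b for b in bits for _ in range(3)]


def decode(coded):
    """Recover each original bit by majority vote over its three copies."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


message = [1, 0, 1, 1]
sent = encode(message)

# Simulate damage in transit: flip one copy of the second bit.
received = sent.copy()
received[4] ^= 1

recovered = decode(received)
print(recovered == message)
```

This code corrects any single flipped bit per group of three copies. Two flips within the same group would be decoded incorrectly.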
Process resilience
Protect against malfunctioning processes by replicating them and organizing the replicas into process groups. A distinction is made between flat groups and hierarchical groups.


Groups and failure masking
K-fault tolerant group
A group is k-fault tolerant when it can mask any k concurrent failures of its members (k is called the degree of fault tolerance).
How big does it need to be?
With halting failures (crash/omission/timing): we need k+1 members. No member produces an incorrect result, so the result of a single surviving member is sufficient.
With arbitrary failures: we need 2k+1 members. Even if k members return wrong results, the remaining k+1 correct members form a majority, so the correct result can be obtained by majority vote.
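The two sizing rules can be sketched as follows. The helper names `min_group_size` and `majority_vote` and the sample replies are assumptions made for this example. The sketch also assumes that all correct members return the same value.

```python
from collections import Counter


def min_group_size(k, arbitrary_failures):
    """Smallest group that masks k concurrent member failures."""
    return 2 * k + 1 if arbitrary_failures else k + 1


def majority_vote(replies, k):
    """Return the value reported by at least k+1 members, or None.

    With at most k faulty members, any value backed by k+1 replies
    must have been reported by at least one correct member.
    """
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= k + 1 else None


k = 2
print(min_group_size(k, arbitrary_failures=False))  # halting failures
print(min_group_size(k, arbitrary_failures=True))   # arbitrary failures

# Five members, two of which return arbitrary (wrong) values.
replies = [42, 42, 42, 7, 13]
result = majority_vote(replies, k)
print(result)
```

With only 2k members, k faulty replies could tie with k correct ones, which is why the extra member is needed.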
Important assumptions
All members are identical.
All members process commands in the same order.
Only then can we be sure that all processes do exactly the same thing.
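The importance of the ordering assumption can be seen in a small sketch. The `Replica` class, its starting value, and the two commands are hypothetical and chosen only to show that identical members agree when they apply commands in the same order and may diverge otherwise.

```python
class Replica:
    """A deterministic group member holding a single integer state."""

    def __init__(self):
        self.state = 100

    def apply(self, command):
        op, amount = command
        if op == "add":
            self.state += amount
        elif op == "mul":
            self.state *= amount


def run(commands):
    """Apply the commands, in the given order, to a fresh replica."""
    replica = Replica()
    for command in commands:
        replica.apply(command)
    return replica.state


commands = [("add", 10), ("mul", 2)]

# Three identical replicas, same command order: all states agree.
same_order_states = [run(commands) for _ in range(3)]

# One replica receives the commands in the opposite order and diverges.
diverged_state = run(list(reversed(commands)))

print(same_order_states, diverged_state)
```

Because the two commands do not commute, the replica that sees them in a different order ends in a different state, so its reply could no longer be used interchangeably with the others.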