Redundancy to mask failures
Information: Add extra bits to the data units so that errors can be detected and corrected when bits are damaged.
Temporal: Design the system so that an action can be performed again if something goes wrong. This is typically used when failures are transient or intermittent.
Physical: Add extra equipment or processes so that the system keeps working when one or more components fail. This is typically used in distributed systems.
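The first two kinds of redundancy can be sketched in a few lines. This is only an illustration under assumptions: the repetition code stands in for whatever error-correcting code a real system would use, `ConnectionError` is taken as the transient fault, and the names `encode_repetition`, `decode_repetition`, and `retry` are hypothetical.

```python
def encode_repetition(bits, copies=3):
    """Information redundancy: send every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]


def decode_repetition(coded, copies=3):
    """Recover each bit by majority vote over its copies."""
    return [
        int(sum(coded[i:i + copies]) * 2 > copies)
        for i in range(0, len(coded), copies)
    ]


def retry(action, attempts=3):
    """Temporal redundancy: repeat an action that may fail transiently."""
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except ConnectionError as err:  # assumed to be a transient fault
            last_error = err
    raise last_error


# One damaged bit per group of three is corrected by the majority.
data = [1, 0, 1, 1]
coded = encode_repetition(data)
coded[4] ^= 1  # flip one bit "in transit"
recovered = decode_repetition(coded)

# An action that fails twice and then succeeds is masked by retrying.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry(flaky)
```

With three copies per bit, one flipped bit in each group is corrected; two flipped bits in the same group are not. Retrying only helps when the fault goes away on its own, which is why it suits transient or intermittent failures.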
Protect against malfunctioning processes by replicating them and organizing the replicas into process groups. Groups can be flat or hierarchical.
A group is k-fault tolerant when it can mask any k concurrent failures of its members (k is the degree of fault tolerance).
With halting failures (crash/omission/timing): we need k+1 members. No member produces an incorrect result, so the result of a single surviving member is sufficient.
With arbitrary failures: we need 2k+1 members so that the correct result can be obtained by majority vote.
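The two group sizes and the vote can be written out as a small sketch. The function names `min_group_size` and `vote` are hypothetical, and the sketch assumes that every member's reply has already been collected into a list.

```python
from collections import Counter


def min_group_size(k, arbitrary):
    """Members needed to mask k concurrent failures."""
    return 2 * k + 1 if arbitrary else k + 1


def vote(replies, k):
    """Return the value reported by at least k+1 members, else None.

    With 2k+1 members and at most k faulty ones, the correct value
    always gets at least k+1 votes and a wrong value at most k.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= k + 1 else None


k = 2
halting_size = min_group_size(k, arbitrary=False)
arbitrary_size = min_group_size(k, arbitrary=True)

# Five members, two of them returning arbitrary values.
agreed = vote([42, 42, 42, 7, 9], k)
# Three faulty members exceed k, so no value reaches k+1 votes.
undecided = vote([42, 42, 7, 7, 9], k)
```

For k = 2 this gives 3 members under halting failures and 5 under arbitrary failures. The threshold of k+1 matching replies is what makes the majority trustworthy: k faulty members alone can never reach it.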
All members are identical.
All members process commands in the same order.
Only then can we be sure that all processes do exactly the same thing.
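A toy example shows why the ordering requirement matters. The `Replica` class and its `add`/`mul` commands are hypothetical, chosen only because addition and multiplication do not commute. The sketch assumes the replicas are deterministic.

```python
class Replica:
    """A deterministic state machine holding a single integer."""

    def __init__(self):
        self.value = 0

    def apply(self, command):
        op, arg = command
        if op == "add":
            self.value += arg
        elif op == "mul":
            self.value *= arg
        else:
            raise ValueError(f"unknown command: {op}")


def run(commands):
    """Feed a command sequence to a fresh replica and return its state."""
    replica = Replica()
    for command in commands:
        replica.apply(command)
    return replica.value


commands = [("add", 2), ("mul", 5), ("add", 1)]

# Three identical replicas that see the same order end in the same state.
same_order = {run(commands) for _ in range(3)}

# A replica that sees the first two commands swapped diverges.
reordered = run([commands[1], commands[0], commands[2]])
```

The three replicas that apply the commands in the same order all end at 11, while the reordered one ends at 3. A voter comparing replies would then treat a perfectly healthy replica as faulty.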