Redundancy to mask failures

Types of redundancy

  • Information: Add extra bits to the data units so that the original data can be recovered when some bits are damaged.

  • Temporal: Design the system so that an action can be performed again if something goes wrong. Usually used when the failures are transient or intermittent.

  • Physical: Add extra equipment or processes so that the system as a whole can tolerate the failure of one or more components. This is the type typically used in distributed systems.
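As an illustration of information redundancy, the sketch below uses a Hamming(7,4) code: 3 parity bits are added to every 4 data bits, which is enough to correct any single damaged bit. These notes do not name a specific code, so the choice of Hamming(7,4) and the function names (hamming74_encode, hamming74_decode, flip_bit) are assumptions made only for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one damaged bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the damaged bit (0 = no error).
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def flip_bit(codeword, index):
    """Simulate damage: return a copy with the bit at a 0-based index flipped."""
    damaged = list(codeword)
    damaged[index] ^= 1
    return damaged


data = [1, 0, 1, 1]
sent = hamming74_encode(data)
received = flip_bit(sent, 4)          # one bit is damaged in transit
recovered = hamming74_decode(received)
print(recovered == data)              # True
```

Note that this code only masks a single damaged bit per codeword; two damaged bits would be "corrected" into the wrong data.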

Process resilience

Protect against the malfunction of a process by replicating it and organizing the replicas into a process group. Groups can be flat (all members are peers) or hierarchical (a coordinator directs the other members).

Groups and failure masking

K-fault tolerant group

A group is k-fault tolerant when it can mask any k concurrent failures of its members (k is the degree of fault tolerance).

How big does it need to be?

  • With halting failures (crash/omission/timing): we need k+1 members. No member produces an incorrect result, so the result of a single surviving member is sufficient.

  • With arbitrary failures: we need 2k+1 members, so that the k+1 correct members still form a majority and the correct result can be obtained by majority vote.
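The two sizing rules above can be sketched in a few lines. This is a minimal illustration, not a full protocol: the names min_group_size and majority_vote are hypothetical, and the vote assumes that every member's reply has already been collected.

```python
from collections import Counter


def min_group_size(k, arbitrary_failures):
    """Smallest group that can mask k concurrent member failures."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Halting failures: one survivor is enough, so k + 1 members.
    # Arbitrary failures: k + 1 correct members must outvote k faulty ones.
    return 2 * k + 1 if arbitrary_failures else k + 1


def majority_vote(replies):
    """Return the value reported by a strict majority, or None if there is none."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None


print(min_group_size(1, arbitrary_failures=False))  # 2
print(min_group_size(1, arbitrary_failures=True))   # 3

# k = 1 with arbitrary failures: one of the three members replies incorrectly.
print(majority_vote([42, 42, 7]))                   # 42
```

With only 2k members, k faulty replies could tie with k correct ones, and the vote would return no result.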

Important assumptions

  • All the members are identical.

  • All the members process the commands in the same order.

In other words, we need to be sure that all the processes do exactly the same thing.
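A small sketch of why both assumptions matter. It assumes each member is a deterministic state machine whose state is a single integer; the command format and the function name run_replica are hypothetical and chosen only for this example.

```python
def run_replica(commands, initial=1):
    """Apply commands in the given order to a deterministic replica."""
    state = initial
    for operation, argument in commands:
        if operation == "add":
            state += argument
        elif operation == "mul":
            state *= argument
        else:
            raise ValueError(f"unknown operation: {operation}")
    return state


commands = [("add", 10), ("mul", 2)]

# Identical replicas that process the same commands in the same order agree.
replica_a = run_replica(commands)
replica_b = run_replica(commands)
print(replica_a, replica_b)            # 22 22

# A replica that sees the commands in a different order diverges.
replica_c = run_replica(list(reversed(commands)))
print(replica_c)                       # 12
```

Here replica_c has not failed in any way, yet its result differs from the others, so a vote could no longer tell a faulty member from a correct one.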
