Notes - MIECT
Computação Distribuída

Reliable RPCs



What can go wrong?

  • The client does not find the server.

  • The message with the request from the client to the server can get lost.

  • The server crashes after receiving the request.

  • The response from the server gets lost.

  • The client crashes after sending the request.

Solutions

  • Server cannot be located: report the failure to the client.

  • Lost request: re-send the request after a timeout.

Server crashes

Problem

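After the server receives a request, three sequences of events are possible:

  • (a) The request is executed and the reply is sent (the normal case).

  • (b) The request is executed, but the server crashes before sending the reply.

  • (c) The server crashes before executing the request.

Cases (b) and (c) need different handling, yet the client cannot tell them apart: all it sees is a missing reply.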

Two approaches

  • At-least-once semantics: the operation is guaranteed to be executed at least once, and possibly more than once.

  • At-most-once semantics: the operation is guaranteed to be executed at most once, and possibly not at all.
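The difference between the two semantics can be sketched with a toy server that loses some replies. This is only an illustration under stated assumptions: `FlakyServer`, `call_with_retries`, and the request-ID cache are hypothetical names, and the cache lives in memory, so a real server crash would lose it.

```python
import uuid


class FlakyServer:
    """Toy server (hypothetical) whose first `drop_replies` replies are lost."""

    def __init__(self, drop_replies):
        self.drop_replies = drop_replies
        self.executions = 0   # how many times the operation really ran
        self.seen = {}        # request_id -> cached result (duplicate filter)

    def handle(self, request_id, dedup):
        if dedup and request_id in self.seen:
            # At-most-once: a duplicate is answered from the cache, not re-run.
            result = self.seen[request_id]
        else:
            self.executions += 1
            result = f"done#{self.executions}"
            self.seen[request_id] = result
        if self.drop_replies > 0:
            # The work is done, but the client never sees the reply.
            self.drop_replies -= 1
            raise TimeoutError("reply lost")
        return result


def call_with_retries(server, dedup, max_attempts=5):
    """Re-send the same request (same ID) until a reply arrives."""
    request_id = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return server.handle(request_id, dedup)
        except TimeoutError:
            continue
    raise TimeoutError("giving up")
```

With retries alone the operation may run several times (at-least-once). Adding the duplicate filter keeps it to a single execution as long as the server does not crash and lose its cache.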

Transparent recovery from server failures

Why is fully transparent recovery impossible?

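Three distinct events can occur at the server:

  • M: send the completion message.

  • P: complete the processing of the document.

  • C: crash.

These events can occur in six different orders (events in parentheses never happen because of the crash):

  • M->P->C: Crash after reporting completion and after processing.

  • M->C(->P): Crash after reporting completion, but before processing.

  • P->M->C: Crash after processing and after reporting completion.

  • P->C(->M): Processing occurred, but the server crashed before reporting completion.

  • C(->P->M): Crash before any action.

  • C(->M->P): Crash before any action.

The client sees at most the completion message, so it cannot distinguish all six orderings, and no fixed retransmission policy yields exactly one execution in every case.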
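The six orderings can be checked mechanically against possible client behaviours. The sketch below assumes four client reissue strategies (always, never, only when the completion message was received, only when it was not). These strategies are not listed in these notes; they follow the usual textbook treatment and are an assumption here. It also assumes a reissued request is executed exactly once after the server recovers.

```python
# Events the server completed before crashing, one entry per ordering.
ORDERINGS = {
    "M->P->C":   {"M", "P"},
    "M->C(->P)": {"M"},
    "C(->M->P)": set(),
    "P->M->C":   {"P", "M"},
    "P->C(->M)": {"P"},
    "C(->P->M)": set(),
}

# Hypothetical client policies: given "did I get the completion message?",
# decide whether to re-send the request after the server recovers.
REISSUE = {
    "always":              lambda acked: True,
    "never":               lambda acked: False,
    "only when ACKed":     lambda acked: acked,
    "only when not ACKed": lambda acked: not acked,
}


def outcome(done, reissue):
    """Count executions: ZERO (never ran), OK (once), DUP (twice)."""
    acked = "M" in done
    runs = int("P" in done) + int(reissue(acked))
    return ("ZERO", "OK", "DUP")[runs]


def table():
    """Map each reissue policy to its outcomes over the six orderings."""
    return {
        name: [outcome(done, rule) for done in ORDERINGS.values()]
        for name, rule in REISSUE.items()
    }


if __name__ == "__main__":
    for name, row in table().items():
        print(f"{name:20s}", row)
```

Under these assumptions every policy produces at least one ZERO or DUP, which is the sense in which fully transparent recovery is not achievable.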

Message lost

The client only notices that no answer arrives. It has no way of knowing the cause: the server may have crashed, or the response may have been lost.

Solution (partial)

Design the server so that its operations are idempotent: repeating an operation has the same effect as running it only once. Examples:

  • Pure read operations.

  • Strict overwrite operations.

This is only a partial solution, because many operations are inherently non-idempotent, such as many bank transactions: a transfer that is executed twice moves the money twice.
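A minimal sketch of why idempotence matters when a request is re-sent after a lost reply. `Account` and `deliver_twice` are hypothetical names used only for illustration: an overwrite survives duplication unchanged, while an increment does not.

```python
class Account:
    """Toy account with one idempotent and one non-idempotent operation."""

    def __init__(self, balance=0):
        self.balance = balance

    def set_balance(self, amount):
        # Overwrite: running it twice leaves the same state as running it once.
        self.balance = amount

    def deposit(self, amount):
        # Increment: every extra execution changes the state again.
        self.balance += amount


def deliver_twice(operation, *args):
    """Simulate a retry after a lost reply: the server runs the request twice."""
    operation(*args)
    operation(*args)
```

For example, duplicating `set_balance(150)` on an account holding 100 still ends at 150, whereas duplicating `deposit(50)` ends at 200 instead of 150.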

Client crashes

Problem

The server keeps working and holding resources for a client that no longer exists. Such work is called an orphan computation.

Solution

  • Extermination: when the client recovers, it kills the orphan itself.

  • Reincarnation: when the client recovers, it broadcasts a new epoch number; each server then kills the orphans belonging to that client.

  • Expiration: each computation is required to finish within at most T time units; computations older than that are simply removed.
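The epoch broadcast and the time limit T can be sketched together. This is a toy model under stated assumptions: `Server`, `on_new_epoch`, and `expire` are hypothetical names, time is passed in explicitly as a number, and the broadcast is modelled as a direct method call.

```python
class Server:
    """Toy server (hypothetical) that tracks running computations per client."""

    def __init__(self):
        # Each job records its owner, the owner's epoch, and its deadline.
        self.jobs = []

    def start(self, client_id, epoch, now, ttl):
        """Accept a computation that must finish within `ttl` time units."""
        self.jobs.append(
            {"client": client_id, "epoch": epoch, "deadline": now + ttl}
        )

    def on_new_epoch(self, client_id, epoch):
        """The client rebooted and announced a new epoch: kill its older jobs."""
        self.jobs = [
            j for j in self.jobs
            if not (j["client"] == client_id and j["epoch"] < epoch)
        ]

    def expire(self, now):
        """Drop every computation whose time limit has passed."""
        self.jobs = [j for j in self.jobs if j["deadline"] > now]
```

The epoch check removes orphans as soon as the client comes back, while the deadline also covers a client that never recovers, at the cost of having to choose a sensible T.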

Reliable multicast in the non-simple case:

  • Reliable communication in the presence of failed processes.

    • Communication is said to be reliable when it guarantees that a message received by the group is subsequently delivered to all non-failing members of the group.

  • Difficulty

    • Agreement on who is in the group is needed before a message can be delivered.

Figures (not reproduced): simple reliable multicast; simple reliable group communication.