Notes - MIECT
Computação Distribuída

Map Reduce


Last updated 1 year ago

What is it?

A programming paradigm for processing large quantities of data in parallel.

It is divided into two stages:

  1. "Map" is responsible for selecting, sorting, and processing the information, with each piece handled independently and in isolation. The result is a set of ordered key-value pairs.

  2. "Reduce" is responsible for combining and grouping the information from the previous stage into a much smaller set of data (in the limit, a single value).

Algorithm

Chunks of information are processed by Mappers in isolation and, because of that, the Mappers can be distributed across a network.

The Mappers' output is an intermediate product that is supplied to the Reducers in a process called shuffling.

The result of the Reducers is the final result.
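The three steps above (map, shuffle, reduce) can be sketched in a single process. This is a minimal illustration, not how any particular framework implements them: the names `map_reduce`, `parity_mapper`, and `sum_reducer` and the input numbers are made up for the example.

```python
from collections import defaultdict


def map_reduce(chunks, mapper, reducer):
    """Run mapper on each chunk, shuffle by key, then reduce each group."""
    # Map: each chunk is processed on its own, with no shared state.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(mapper(chunk))

    # Shuffle: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: combine each group's values into one result per key.
    return {key: reducer(key, values) for key, values in groups.items()}


def parity_mapper(chunk):
    # Hypothetical mapper: tag each number as "even" or "odd".
    return [("even" if n % 2 == 0 else "odd", n) for n in chunk]


def sum_reducer(key, values):
    # Hypothetical reducer: add up all values that share a key.
    return sum(values)


totals = map_reduce([[1, 2, 3], [4, 5], [6]], parity_mapper, sum_reducer)
print(totals)  # {'odd': 9, 'even': 12}
```

In a real deployment the loop over chunks would run on different machines, and the shuffle would move data over the network; here everything runs sequentially to keep the data flow visible.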

Example

Histogram of the words in Os Lusíadas.

  • Begin by creating chunks - to simplify, each verse is a chunk.

  • Map - for each verse, a list of key-value pairs is created, with each word and the number of times it appears in that verse.

  • Reduce - the counts from the previous step are grouped by word and summed.

  • In the end, the histogram is the result.
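A small sketch of the word-histogram example, assuming each verse is one chunk. The two lines in `verses` are placeholders, not quotations from the poem, and the function names are chosen only for this illustration.

```python
from collections import Counter, defaultdict

# Placeholder lines standing in for verses; not quotations from the poem.
verses = [
    "the sea and the ships",
    "the wind and the sea",
]


def map_verse(verse):
    # Map: one verse -> list of (word, count within that verse).
    return list(Counter(verse.split()).items())


def shuffle(pairs):
    # Shuffle: collect all per-verse counts that belong to the same word.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(groups):
    # Reduce: sum the per-verse counts of each word.
    return {word: sum(counts) for word, counts in groups.items()}


mapped = [pair for verse in verses for pair in map_verse(verse)]
histogram = reduce_counts(shuffle(mapped))
print(histogram)  # {'the': 4, 'sea': 2, 'and': 2, 'ships': 1, 'wind': 1}
```

Splitting on whitespace is a simplification: real text would also need punctuation and capitalisation handled before counting.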

Advantages

Parallel processing

Each task is completely independent; the problem is divided into smaller parts to simplify it.
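Because the tasks share no state, they can be handed to several workers at once and still give the same answer as a sequential run. The sketch below uses a thread pool from Python's standard library as a stand-in for separate nodes; the `count_words` task and the chunks are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    # A map task that depends only on its own chunk.
    return len(chunk.split())


chunks = ["one two three", "four five", "six"]

# Each chunk is submitted as an independent task; map() keeps input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_counts = list(pool.map(count_words, chunks))

sequential_counts = [count_words(chunk) for chunk in chunks]
print(parallel_counts == sequential_counts)  # True
```

Threads are used here only for simplicity; in a cluster each task would run on a different machine.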

Information placement

The data is not centralized, but distributed across all the computation nodes.

Instead of transmitting data between nodes, the Map and Reduce code migrates to where the data is located.
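A toy illustration of moving the computation to the data, under the assumption that each node already holds some chunks locally. The `node_storage` dictionary and the `run_on_node` helper are hypothetical stand-ins for a cluster and its task dispatcher; no real network is involved.

```python
from collections import Counter

# Hypothetical cluster: each node already stores some chunks locally.
node_storage = {
    "node-a": ["red green", "red"],
    "node-b": ["green blue green"],
}


def run_on_node(node, task):
    # Stand-in for shipping the task to a node: the task reads the node's
    # local chunks, and only its small result travels back.
    return task(node_storage[node])


def local_word_count(chunks):
    # Runs "on the node", next to the data it needs.
    counts = Counter()
    for chunk in chunks:
        counts.update(chunk.split())
    return counts


partials = [run_on_node(node, local_word_count) for node in node_storage]
total = sum(partials, Counter())
print(total["green"], total["red"], total["blue"])  # 3 2 1
```

The chunks themselves never leave their node; only the per-node counts are combined at the end.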