Notes - MIECT
Redes E Sistemas Autónomos

Reinforcement Learning

Reinforcement learning is a machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.

  1. Environment - The physical world in which the agent operates.

  2. State - The current situation of the agent.

  3. Reward - Feedback from the environment.

  4. Policy - The method that maps the agent's state to actions.

  5. Value - The expected future reward the agent would receive by taking an action in a particular state.

Q-learning

Q-learning updates the Q-values, which denote the value of performing action a in state s. The following value update rule is the core of the Q-learning algorithm.
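
In its standard form:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

where α is the learning rate, γ the discount factor, r the reward obtained after taking action a in state s, and s' the resulting next state.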

Reward example (best path with available resources): reward = path bandwidth / path length.

The learning rate (α) and the discount factor (γ) both take values in the open interval ]0, 1[.
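
A minimal Python sketch of Q-learning driven by this reward (bandwidth / length). The single-state setting, the three candidate paths and their bandwidth and length figures are hypothetical, chosen only to illustrate the update rule:

```python
import random

# Hypothetical candidate paths (not from the notes): bandwidth in Mbps, length in hops.
paths = {
    0: {"bandwidth": 100.0, "length": 4},
    1: {"bandwidth": 54.0,  "length": 2},
    2: {"bandwidth": 300.0, "length": 10},
}

alpha = 0.5    # learning rate, in ]0, 1[
gamma = 0.9    # discount factor, in ]0, 1[
epsilon = 0.2  # exploration probability (epsilon-greedy policy)

# Single-state problem: Q maps each action (path choice) to its estimated value.
Q = {a: 0.0 for a in paths}

def reward(action):
    # Reward as in the notes: path bandwidth / path length.
    p = paths[action]
    return p["bandwidth"] / p["length"]

for episode in range(1000):
    # Policy: epsilon-greedy over the current Q-values.
    if random.random() < epsilon:
        a = random.choice(list(paths))
    else:
        a = max(Q, key=Q.get)

    r = reward(a)

    # Q-learning update; with a single state, max_a' Q(s', a') is max(Q.values()).
    Q[a] += alpha * (r + gamma * max(Q.values()) - Q[a])

best = max(Q, key=Q.get)
print("Q values:", {a: round(v, 2) for a, v in Q.items()})
print("Best path:", best)
```

With these made-up figures, the agent converges to the path with the highest bandwidth-to-length ratio (path 2).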