Reinforcement Learning
Last updated
A type of machine learning in which an agent learns in an interactive environment by trial and error, using feedback from its own actions and experiences.
Environment - World (physical or simulated) in which the agent operates.
State - Current situation of the agent.
Reward - Feedback from the environment.
Policy - Mapping from the agent's states to actions.
Value - Expected cumulative future reward that an agent would receive by taking an action in a particular state.
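
The terms above can be tied together in a minimal sketch of the agent-environment loop. Everything named here (LineWorld, random_policy, run_episode) is hypothetical and chosen only for illustration: the environment is a line of five positions, the state is the agent's position, the reward is 1 on reaching the last position, and the policy picks a direction at random.

```python
import random


class LineWorld:
    """Hypothetical environment: positions 0..4 on a line, goal at position 4."""

    def __init__(self):
        self.state = 0  # State: the agent's current position

    def step(self, action):
        # Action is -1 (left) or +1 (right); the position is clamped to [0, 4].
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else 0.0  # Reward: feedback from the environment
        return self.state, reward, done


def random_policy(state):
    # Policy: maps a state to an action (this one ignores the state).
    return random.choice([-1, 1])


def run_episode(env, policy, max_steps=100):
    # Trial-and-error loop: act, observe the reward, repeat until done.
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(env.state)
        _, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling run_episode(LineWorld(), random_policy) returns the total reward collected in one episode. A learning agent would replace random_policy with one that improves from the rewards it observes.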
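Q-learning - Updates Q-values Q(s, a), which denote the value of performing action a in state s. The following value update rule is the core of the Q-learning algorithm:
Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]
where α is the learning rate, γ is the discount factor, r is the reward received, and s' is the next state.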
Reward example: to find the best path given its resources, use reward = path bandwidth / path length, so that wider and shorter paths score higher.
Learning rate (α) and discount factor (γ): each takes a value in the open interval ]0, 1[, i.e. strictly between 0 and 1.
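
A single Q-learning update, using the bandwidth-over-length reward, might look like the sketch below. It assumes a tabular Q function stored in a dictionary keyed by (state, action). The names path_reward and q_update, the state and action labels, and the default values alpha=0.1 and gamma=0.9 are illustrative assumptions, not part of the original notes. Terminal states are not treated specially here.

```python
from collections import defaultdict


def path_reward(bandwidth, length):
    # Reward example from the notes: path bandwidth divided by path length.
    return bandwidth / length


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q[(state, action)] and return the new value.

    alpha (learning rate) and gamma (discount factor) are assumed to lie
    strictly between 0 and 1.
    """
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target = immediate reward plus discounted best future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: two candidate paths, all Q-values start at 0.
q = defaultdict(float)
actions = ["path_a", "path_b"]
r = path_reward(bandwidth=100.0, length=4)
new_value = q_update(q, "s0", "path_a", r, "s1", actions)
```

With all Q-values starting at zero, the first update moves Q("s0", "path_a") a fraction alpha of the way toward the observed reward. A larger alpha makes each update more aggressive, while a larger gamma gives more weight to estimated future value.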