Notes - MCS
Machine Learning Applied to Security

Gradient descent


Gradient descent is a very simple and powerful algorithm that is used to find a local minimum of a function.

"Classical" example

Let's start with something simple - the "classic" parabola example. For it we need two things: the function and its derivative.

As a function I'm going to use: $f(x) = x^2$

And we can easily calculate its derivative: $f'(x) = 2x$

In the plot above we can see our parabola and its derivative (red line). The derivative is nothing more than the instantaneous rate of change of the function at a given point. In this particular case, when the derivative of our function is equal to 0, we are at the global minimum of the function.

Now let's implement the gradient descent algorithm.

The algorithm is very simple and can be divided into several steps:

  1. Define a starting x coordinate from which we want to descend to a local/global minimum

  2. Calculate derivative at this point

  3. Subtract the derivative at this point from the current point. To prevent divergence we need to multiply the derivative by a small number "alpha", the learning rate.

  4. Repeat from step 2

So the main gradient descent formula can be written as: $x_{next} = x_{start} - \alpha \cdot f'(x_{start})$
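
A minimal Python sketch of these steps, where the starting point, learning rate and number of iterations are arbitrary choices made for illustration:

```python
def f(x):
    return x ** 2            # the parabola

def df(x):
    return 2 * x             # its derivative

x = 5.0                      # starting point (arbitrary)
alpha = 0.1                  # learning rate (arbitrary)

for step in range(50):
    x = x - alpha * df(x)    # x_next = x_start - alpha * derivative

print(x)                     # ends up very close to 0, the global minimum
```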

Gradient descent of a multivariable function

Now let's take something more interesting - a multivariable function with 2 input variables and 1 output.

$f(x, y)=\sin(\frac{1}{2}x^2-\frac{1}{4}y^2+3)\cos(2x+1-e^y)$

As in the previous example, to successfully descend to a local minimum we need to calculate derivatives, but because we now have a multivariable function we need the partial derivatives with respect to X and Y:

1. Partial derivative with respect to X:

$\frac{\partial f}{\partial x}=x\cos(\frac{1}{2}x^2-\frac{1}{4}y^2+3)\cos(2x+1-e^y)-2\sin(\frac{1}{2}x^2-\frac{1}{4}y^2+3)\sin(2x+1-e^y)$

2. Partial derivative with respect to Y:

$\frac{\partial f}{\partial y}=-\frac{1}{2}y\cos(\frac{1}{2}x^2-\frac{1}{4}y^2+3)\cos(2x+1-e^y)+e^y\sin(\frac{1}{2}x^2-\frac{1}{4}y^2+3)\sin(2x+1-e^y)$
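
Hand-derived partial derivatives are easy to get wrong, so here is a small Python sanity check that compares them against central finite differences (the test point and the step size h are arbitrary choices):

```python
import math

def f(x, y):
    return math.sin(0.5 * x**2 - 0.25 * y**2 + 3) * math.cos(2 * x + 1 - math.exp(y))

def dfdx(x, y):
    u = 0.5 * x**2 - 0.25 * y**2 + 3
    v = 2 * x + 1 - math.exp(y)
    return x * math.cos(u) * math.cos(v) - 2 * math.sin(u) * math.sin(v)

def dfdy(x, y):
    u = 0.5 * x**2 - 0.25 * y**2 + 3
    v = 2 * x + 1 - math.exp(y)
    return -0.5 * y * math.cos(u) * math.cos(v) + math.exp(y) * math.sin(u) * math.sin(v)

# Analytical partials vs. central finite differences at an arbitrary point
x0, y0, h = 0.7, -0.3, 1e-6
print(dfdx(x0, y0), (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h))
print(dfdy(x0, y0), (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h))
```

Both pairs of numbers should agree to several decimal places.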

These partial derivatives are the components of the gradient - the vector that points in the direction of steepest ascent of the function:

$\nabla f(x, y)=\begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$

So, if we take the opposite of that vector, it will show us the direction of steepest descent. First, let's plot the function to see what it looks like.
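
One possible NumPy/matplotlib sketch of both steps - plotting the surface and then stepping along the negative gradient - where the plot range, start point, learning rate and iteration count are arbitrary choices for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(0.5 * x**2 - 0.25 * y**2 + 3) * np.cos(2 * x + 1 - np.exp(y))

def grad_f(x, y):
    # Gradient assembled from the two partial derivatives above
    u = 0.5 * x**2 - 0.25 * y**2 + 3
    v = 2 * x + 1 - np.exp(y)
    dfdx = x * np.cos(u) * np.cos(v) - 2 * np.sin(u) * np.sin(v)
    dfdy = -0.5 * y * np.cos(u) * np.cos(v) + np.exp(y) * np.sin(u) * np.sin(v)
    return np.array([dfdx, dfdy])

# Surface plot over an arbitrary range
xs = np.linspace(-3, 3, 200)
ys = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(xs, ys)
ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(X, Y, f(X, Y), cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()

# Step in the direction opposite to the gradient; the function has many
# local minima, so where we end up depends on the start point and alpha
point = np.array([0.5, -0.5])
alpha = 0.05
for _ in range(200):
    point = point - alpha * grad_f(*point)
print(point, f(*point))
```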