Notes - MCS
Machine Learning Applied to Security

Linear Regression

The goal is to explore a classifier named Logistic Regression. However, to introduce the inner workings of such a model, let us start by analyzing the Linear Regression one.

This will allow us to understand:

  1. what a linear model is

  2. how to define a cost function to train a model

  3. how to fit a linear model given a cost function

  4. traditional optimization methods

The linear regression model is defined by:

h(x, m, b) = m \times x + b

The typical cost function used to fit the linear regression model is the following:

e = \frac{\sum_{i=0}^{n}(y_i - h(x_i, m, b))^2}{2n}
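The two formulas above can be sketched directly in code (a minimal illustration; the toy data is made up, and the variable names follow the formulas):

```python
# Linear model h(x, m, b) = m*x + b and its squared-error cost,
# written to mirror the two formulas above.

def h(x, m, b):
    return m * x + b

def cost(xs, ys, m, b):
    # Average of squared errors, divided by 2 as in the formula.
    n = len(xs)
    return sum((y - h(x, m, b)) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# Toy data lying exactly on y = 2x + 1: the true parameters give zero cost.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(cost(xs, ys, 2, 1))      # 0.0
print(cost(xs, ys, 2, 0))      # 0.5 — wrong intercept, positive cost
```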

Linear Regression Classifiers

Linear regression classifiers are a type of supervised machine learning algorithm used for predicting a continuous outcome variable based on one or more predictor variables. While linear regression is commonly used for regression tasks (predicting a continuous value), it can also be adapted for classification tasks by applying a threshold to the predicted values.

Here's how linear regression classifiers work:

Steps

1. Model Representation

The linear regression model is represented by an equation of the form Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon.

  • Y is the predicted outcome variable.

  • \beta_0 is the intercept term.

  • \beta_1, \beta_2, \dots, \beta_n are the coefficients for the predictor variables X_1, X_2, \dots, X_n.

  • \epsilon represents the error term.
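This multivariate form can be sketched as a simple dot product (the coefficient values are hypothetical, chosen only for illustration):

```python
# Multivariate linear model Y = b0 + b1*x1 + ... + bn*xn.
# Epsilon is the irreducible noise term, so it is not part of the prediction.

def predict(betas, xs):
    b0, coeffs = betas[0], betas[1:]
    return b0 + sum(c * x for c, x in zip(coeffs, xs))

# Hypothetical model: intercept 1.0 and two predictor coefficients.
print(predict([1.0, 2.0, -0.5], [3.0, 4.0]))   # 1 + 6 - 2 = 5.0
```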

2. Training Phase

During the training phase, the algorithm adjusts the coefficients (βββ values) to minimize the difference between the predicted values and the actual values in the training data. This is typically done using a method like least squares, which minimizes the sum of squared differences between the predicted and actual values.
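For the single-variable model, this least-squares minimization has a closed-form solution, sketched below (the training data is made up for the example):

```python
# Ordinary least squares for h(x) = m*x + b:
#   m = cov(x, y) / var(x),   b = mean(y) - m * mean(x)

def fit_least_squares(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    m = cov / var
    b = my - m * mx
    return m, b

# Data generated from y = 3x - 1: least squares recovers the parameters.
xs = [0, 1, 2, 3, 4]
ys = [-1, 2, 5, 8, 11]
m, b = fit_least_squares(xs, ys)
print(m, b)   # 3.0 -1.0
```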

3. Predictions

Once the model is trained, it can be used to make predictions on new data. For classification tasks, a threshold is applied to the predicted values. For example, if the predicted value is greater than 0.5, the instance might be classified as one category; otherwise, it's classified as another.
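The thresholding step is a one-liner (a sketch; 0.5 is the example cutoff mentioned above):

```python
def classify(score, threshold=0.5):
    # Map a continuous regression output to a binary class label.
    return 1 if score > threshold else 0

print(classify(0.7))   # 1
print(classify(0.3))   # 0
```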

Example

Let's consider a binary classification example where we want to predict whether a student passes (1) or fails (0) an exam based on the number of hours they studied.

  • The linear regression equation might be Pass = \beta_0 + \beta_1 \times HoursStudied + \epsilon. During training, the algorithm adjusts the coefficients (the \beta values) to minimize the difference between the predicted pass probabilities and the actual pass/fail labels.

Once the model is trained, predictions for new students can be made. If the predicted probability is, say, 0.7, we might classify the student as likely to pass.
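The whole student example can be put together in a short sketch (the hours-studied values and pass/fail labels are invented for illustration):

```python
# Fit Pass ≈ b0 + b1 * hours by least squares on made-up data,
# then threshold the predicted score at 0.5.

def fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b1 = cov / var
    return my - b1 * mx, b1

hours  = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]      # hypothetical pass/fail labels

b0, b1 = fit(hours, passed)
score = b0 + b1 * 5              # predicted "pass score" for 5 hours studied
print(score > 0.5)               # True: classified as likely to pass
```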

In practice, logistic regression is often preferred for binary classification tasks over linear regression because it models the probability directly and has a logistic (S-shaped) curve, making it suitable for classification. However, understanding linear regression classifiers helps build a foundation for more advanced techniques.
