Linear regression model
"That just means fitting a straight line to your data. It's probably the most widely used learning algorithm in the world today."[1]
Linear regression predicts a number based on your data. "A linear regression model describes the relationship between a dependent variable, y, and one or more independent variables, X. The dependent variable is also called the response variable. Independent variables are also called explanatory or predictor variables."[2]
The key focus is on defining a cost function, which is crucial for assessing the model's performance and guiding efforts to improve it. The linear regression model is represented by the function f_{w,b}(x) = wx + b, where w and b are parameters that are adjusted during training to improve the model's fit. The terms "parameters," "coefficients," and "weights" all refer to these adjustable variables in machine learning models; they determine the shape of the linear function. The goal of linear regression is to find values of w and b that make the line fit the training data well.
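As a minimal sketch of this model in Python (the function name predict and the example numbers are illustrative, not from the source):

```python
def predict(x, w, b):
    """Linear regression model: f_{w,b}(x) = w*x + b."""
    return w * x + b

# With w = 2 and b = 1, the line predicts 2*3 + 1 = 7 at x = 3.
print(predict(3, w=2, b=1))  # 7
```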
To judge a particular choice of w and b, linear regression uses a cost function, denoted J(w, b), which measures the difference between the model's predictions and the actual values in the training set.
To measure how well the line fits the data, the cost function is defined as the average squared error between the predicted values and the actual target values across the entire training set: each error (the difference between a predicted and an actual value) is squared, the squared errors are summed, and the sum is divided by twice the number of training examples, 2m. Dividing by 2m rather than m is a convention that simplifies later derivative calculations; it is not strictly necessary. The cost function is expressed as

J(w, b) = (1/(2m)) sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2.
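A minimal sketch of this cost in Python, assuming the training data are NumPy arrays (the names compute_cost, x_train, and y_train are illustrative):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost: J(w, b) = (1/(2m)) * sum((f(x^(i)) - y^(i))^2)."""
    m = x.shape[0]
    predictions = w * x + b          # f_{w,b}(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy training set (illustrative) whose points lie exactly on y = 2x + 1:
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])
print(compute_cost(x_train, y_train, w=2.0, b=1.0))  # 0.0 -- a perfect fit
```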
The objective in linear regression is to minimize this cost function by finding the values of w and b that result in the smallest possible error. Linear regression "is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression."[3]
The gradient descent algorithm is an optimization technique used to train many machine learning models, including linear regression, by minimizing the model's cost function. It iteratively adjusts the parameters, such as the weights, in the direction of steepest descent (opposite the gradient), aiming to find the values that yield the lowest cost or error. The learning rate, a small positive number typically between 0 and 1, controls the step size during this iterative process: a larger learning rate takes more aggressive steps, while a smaller one takes smaller steps. Selecting an appropriate learning rate is crucial for the convergence and efficiency of the algorithm.
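Written out for this model, with α denoting the learning rate, the updates that gradient descent repeats are

w := w - α · ∂J(w, b)/∂w
b := b - α · ∂J(w, b)/∂b

where, differentiating the cost J(w, b) defined above, the partial derivatives work out to

∂J/∂w = (1/m) sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)}) x^{(i)}
∂J/∂b = (1/m) sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})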
The derivative term provides the direction in which a parameter should be adjusted and, in combination with the learning rate, determines the size of the adjustment. Since there are two parameters, w and b, both must be updated simultaneously in each iteration: both derivatives are computed from the current values of w and b before either parameter is overwritten. The algorithm is repeated until convergence, meaning w and b no longer change significantly from one iteration to the next.
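A compact sketch of the full loop under the same illustrative assumptions as above (the function name, the tolerance-based convergence test, and the toy data are my own, not from the source):

```python
import numpy as np

def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.01, tol=1e-8, max_iters=10_000):
    """Batch gradient descent for simple linear regression.

    alpha is the learning rate; the loop stops when both parameters
    change by less than tol (a simple convergence test) or after
    max_iters iterations.
    """
    m = x.shape[0]
    for _ in range(max_iters):
        err = (w * x + b) - y            # f_{w,b}(x^(i)) - y^(i) for every example
        dj_dw = np.sum(err * x) / m      # dJ/dw
        dj_db = np.sum(err) / m          # dJ/db
        # Simultaneous update: both derivatives above use the old w and b.
        new_w = w - alpha * dj_dw
        new_b = b - alpha * dj_db
        if abs(new_w - w) < tol and abs(new_b - b) < tol:
            return new_w, new_b
        w, b = new_w, new_b
    return w, b

x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])   # generated by y = 2x + 1
w, b = gradient_descent(x_train, y_train, alpha=0.1)
print(round(w, 3), round(b, 3))            # approximately 2.0 and 1.0
```

Note that dj_dw and dj_db are both computed before either parameter is overwritten; updating w first and then using the new w to compute dj_db would not be a simultaneous update.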
[1] Andrew Ng, DeepLearning.AI & Stanford University, Machine Learning Specialization, Course 1.
[2] https://www.mathworks.com/help/stats/what-is-linear-regression.html#
[3] https://en.wikipedia.org/wiki/Linear_regression