If we have a complex, difficult-to-solve system of equations for which we need to find a minimum value, it can be easier to just pick a random set of inputs, calculate the result (which is usually much easier), and then see if we can change the inputs to make the output value go down. If it goes up, we can move the inputs in the other direction. But there is an easier way to figure out which direction to move: take the derivative of the original formula. That gives us the slope, and we can just roll the input down the hill to the bottom.
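As a minimal sketch of the two approaches, assume a one-input function f(x) = (x - 3)^2 that stands in for the hard problem. The function, the starting point and the step sizes are illustrative choices, not part of the original text.

```octave
% Hypothetical example function with its minimum at x = 3.
f  = @(x) (x - 3).^2;
df = @(x) 2 .* (x - 3);   % derivative of f, i.e. the slope at x

% 1) Trial and error: nudge the input and keep the change only if f goes down.
x = 0; step = 0.1;
for k = 1:100
  if f(x + step) < f(x)
    x = x + step;
  elseif f(x - step) < f(x)
    x = x - step;
  end
end
x_probe = x;

% 2) Use the slope: step in the direction opposite to the derivative.
x = 0; alpha = 0.1;
for k = 1:100
  x = x - alpha * df(x);
end
x_slope = x;

printf("probe: %.4f  slope: %.4f\n", x_probe, x_slope);
```

Both loops end up near x = 3. The probing version needs extra function evaluations at every step and can get no closer than its fixed step size, while the slope version takes smaller steps as it approaches the minimum.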

In practice, we have a guess, call it theta, which represents the inputs to the formula. To change theta to a better value, we subtract from it a small step size (represented by α, or alpha) times the slope of our error. Doing this again and again will slowly move our prediction over the entire training set toward an optimal line IF the value of alpha is small enough. If alpha is too large, the new value of theta may overshoot the ideal value and bounce out of control. Of course, very small values of alpha may converge to the ideal parameters very slowly.
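The effect of alpha can be seen on a toy problem. This sketch assumes the cost f(theta) = theta^2, whose slope is 2*theta and whose minimum is at theta = 0. The three alpha values are illustrative only.

```octave
% Compare a too-small, a reasonable, and a too-large alpha on f(theta) = theta^2.
alphas = [0.001, 0.1, 1.1];
final  = zeros(size(alphas));
for i = 1:numel(alphas)
  theta = 1;                                  % same starting guess each time
  for iter = 1:50
    theta = theta - alphas(i) * (2 * theta);  % step against the slope
  end
  final(i) = theta;
end
disp(final);
```

After 50 iterations the smallest alpha has barely moved away from the starting guess, the middle one is very close to zero, and the largest one has grown in magnitude on every step instead of converging.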

Note that we are computing the derivative of the cost function to find its slope so that we know which direction to move and by how much. That's why we use half the MSE as our cost function: the derivative of ½x² is x, which is easy to calculate. The slope for parameter θ_j is simply (θᵀx − y)·x_j averaged over the training examples, or the actual error (the difference, not the MSE) times x_j.
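The slope can be written out step by step. The notation here is an assumption added for illustration: J is the cost, m is the number of training examples, and a superscript (i) marks the i-th example.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right)^{2}

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right) x_j^{(i)}

\nabla_{\theta} J = \frac{1}{m} X^{T} \left( X\theta - y \right)
```

The factor of ½ cancels the 2 that comes down from the square, which is why only the plain error times the input remains.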

```
% Basic Linear Regression Gradient Descent with multiple parameters in Octave
% Assumes X (m-by-p inputs), y (m-by-1 targets), theta (p-by-1 initial guess)
% and num_iters (number of iterations) are already defined.
alpha = 0.01; % step size; also try 0.03, 0.1, 0.3, etc...
m = length(y); % number of training examples
p = size(X,2); % number of parameters (second dimension of X)

for iter = 1:num_iters
  hyp = X*theta;        % calculate our hypothesis using current parameters
  err = hyp - y;        % find the error between that and the real data
  s = (X' * err) ./ m;  % find the slope of the error
                        % (for real-valued X, ' and .' give the same result)
  % Note: This is the derivative of our cost function
  theta = theta - alpha .* s;
  % adjust our parameters by a small distance along that slope.
end
```
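To try the loop, it needs X, y, theta and num_iters. The following is a sketch under assumed conditions: the data are synthetic (exactly y = 2 + 3x), a column of ones is added for the intercept, and alpha and num_iters are picked for this data set only.

```octave
% Hypothetical training data: y = 2 + 3*x with no noise.
x = (0:0.1:1)';
X = [ones(length(x), 1), x];   % first column of ones models the intercept
y = 2 + 3 .* x;

theta = zeros(2, 1);           % initial guess
alpha = 0.1;                   % step size (assumed; tune for your own data)
num_iters = 5000;
m = length(y);

for iter = 1:num_iters
  err = X * theta - y;                         % prediction error
  theta = theta - alpha .* (X' * err) ./ m;    % step against the slope
end

theta_exact = X \ y;           % direct least-squares solution, for comparison
disp([theta, theta_exact]);
```

Comparing against the backslash operator is a quick sanity check on small problems: if the two columns disagree, alpha or num_iters probably needs adjusting.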

Given an error curve that smoothly approaches a local minimum, the correction applied to each theta shrinks as the slope of our error shrinks, even though alpha stays the same from one iteration to the next. This helps us converge quickly when the error is large, but slow down and not overshoot our goal as we approach the best fit. This nice curve is not always present.
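A small sketch of this shrinking correction, again assuming the toy cost f(theta) = theta^2 with slope 2*theta. The starting value and alpha are illustrative.

```octave
% With a fixed alpha, the size of each update shrinks as the slope flattens.
theta = 10; alpha = 0.1;
steps = zeros(1, 20);
for iter = 1:20
  s = 2 * theta;                  % slope of f(theta) = theta^2
  steps(iter) = abs(alpha * s);   % size of the correction on this iteration
  theta = theta - alpha * s;
end
disp(steps(1:5));
```

Each recorded step is smaller than the one before it, although alpha never changes.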

 file: /Techref/method/ai/gradientdescent.htm, updated: 2019/11/18 23:18, owner: JMN-EFP-786
