1. Hypothesis: h_θ(x) = θ_0 + θ_1·x
2. Cost function: J(θ_0, θ_1) = (1/2m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²
3. Goal: minimize J(θ_0, θ_1) with respect to θ_0 and θ_1
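A minimal Python sketch of the hypothesis and cost function above; the function names and the toy dataset are illustrative, not from the notes.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta_0 + theta_1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # J(theta_0, theta_1) = (1 / 2m) * sum((h_theta(x_i) - y_i)^2)
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# illustrative data: m = 4 training examples
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 3.5, 4.0])
print(cost(0.0, 1.0, x, y))  # 0.1875
```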
4. Gradient descent algorithm
repeat until convergence {
    θ_j := θ_j - α · (∂/∂θ_j) J(θ_0, θ_1)    (for j = 0 and j = 1)
}
note: simultaneous update (compute both new values from the current θ_0, θ_1 before assigning either; see the sketch below)
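A minimal sketch of what the simultaneous update means, assuming hypothetical callables grad0 and grad1 that return ∂J/∂θ_0 and ∂J/∂θ_1 at the current parameters:

```python
def gradient_descent_step(theta0, theta1, alpha, grad0, grad1):
    # evaluate BOTH partial derivatives at the current (theta0, theta1) ...
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    # ... and only then assign, so theta1's update never sees the new theta0
    return temp0, temp1
```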
α: learning rate
if α is too small, gradient descent can be slow.
if α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
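As a quick illustration (not from the notes), gradient descent on the toy function J(θ) = θ², whose derivative is 2θ, shows all three behaviors:

```python
def run(alpha, steps=10, theta=1.0):
    # minimize J(theta) = theta^2 with gradient dJ/dtheta = 2 * theta
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(alpha=0.01))  # too small: ~0.82 after 10 steps, still far from 0 (slow)
print(run(alpha=0.1))   # reasonable: ~0.11, close to the minimum at 0
print(run(alpha=1.5))   # too large: |theta| doubles each step and diverges (1024)
```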
5. Gradient descent algorithm for one variable (linear regression)
repeat until convergence {
    θ_0 := θ_0 - α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))
    θ_1 := θ_1 - α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)
}
(update θ_0 and θ_1 simultaneously)
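A runnable sketch of the update rule above; the dataset, learning rate, and iteration count are illustrative choices, not from the notes. Because every iteration sums over all m training examples, this is the "batch" variant described in note 6.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Batch gradient descent for h_theta(x) = theta_0 + theta_1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y        # h_theta(x_i) - y_i for all i
        # simultaneous update: both gradients use the current theta0, theta1
        grad0 = np.sum(errors) / m
        grad1 = np.sum(errors * x) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# illustrative data: the least-squares fit is theta_0 = 1.25, theta_1 = 0.7
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 3.5, 4.0])
print(gradient_descent(x, y))  # approximately (1.25, 0.7)
```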
6. "batch" gradient descent: each step of gradient descent uses all the training examples