W3
Notes
Classification and Logistic Regression
- Sigmoid function (logistic function): g(z) = 1/(1 + e^(-z))
- Logistic regression hypothesis: h_theta(x) = g(theta^T x) = 1/(1 + e^(-theta^T x)); it estimates the probability that y = 1 given input x.
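A minimal NumPy sketch of these two definitions (my own code, not from the course; each row of X is assumed to be one example):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x): estimated P(y = 1 | x; theta) for each row of X."""
    return sigmoid(X @ theta)
```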
Decision Boundary
- The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.
- Directly reusing the squared-error cost function from linear regression for logistic regression leads to a non-convex function with many local optima.
- New cost function: cost(h_theta(x), y) = -log(h_theta(x)) if y = 1, and cost(h_theta(x), y) = -log(1 - h_theta(x)) if y = 0. This guarantees that when h_theta(x) gives a confidently wrong answer, a very large penalty is imposed.
- Single expression: cost(h_theta(x), y) = -y*log(h_theta(x)) - (1 - y)*log(1 - h_theta(x))
- Advantages: 1. it corresponds to maximum likelihood estimation; 2. it is convex.
- The gradient descent update for logistic regression has the same form as for linear regression; only the hypothesis changes (a sketch combining these pieces follows this list).
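A sketch pulling the pieces above together (my own NumPy code, not the course's Octave implementation; alpha and num_iters are illustrative settings, and X is assumed to already contain the intercept column of ones). Prediction uses the 0.5 threshold, which is the same as checking theta^T x >= 0, i.e. which side of the decision boundary a point falls on:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """y = 1 whenever h_theta(x) >= 0.5, i.e. whenever theta^T x >= 0."""
    return (X @ theta >= 0).astype(int)

def cost(theta, X, y):
    """J(theta) = (1/m) * sum(-y*log(h) - (1 - y)*log(1 - h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

def gradient_descent(theta, X, y, alpha=0.1, num_iters=1000):
    """Same update form as linear regression; only h differs (sigmoid instead of X @ theta)."""
    m = len(y)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        theta = theta - (alpha / m) * (X.T @ (h - y))
    return theta
```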
Advanced Optimization
- Gradient descent, Conjugate gradient, BFGS, L-BFGS
- Advanced algorithm advantages:
- No need to manually pick the learning rate alpha
- Often faster than gradient descent
- Disadvantage: much more complex
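As an illustration (not the course's Octave fminunc example), SciPy's minimize can run L-BFGS given the cost and its gradient, with no learning rate to pick by hand:

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return the logistic regression cost and its gradient in one call."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = (X.T @ (h - y)) / m
    return J, grad

# Usage (X, y, and the initial theta come from your data):
# result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]), args=(X, y),
#                   jac=True, method="L-BFGS-B")
# theta_opt = result.x
```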
Multiclass Classification
- One-vs-all (also called one-vs-rest)
- We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.
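A sketch of one-vs-all on top of any binary logistic regression trainer (my own code; train_binary is a hypothetical placeholder for, e.g., the gradient descent or L-BFGS fits sketched above):

```python
import numpy as np

def one_vs_all(X, y, classes, train_binary):
    """Fit one binary classifier per class: that class is labeled 1, all others 0."""
    return {c: train_binary(X, (y == c).astype(int)) for c in classes}

def predict_one_vs_all(thetas, X):
    """Pick the class whose classifier returns the highest probability."""
    labels = list(thetas)
    probs = np.column_stack(
        [1.0 / (1.0 + np.exp(-(X @ thetas[c]))) for c in labels]
    )
    return np.array([labels[i] for i in np.argmax(probs, axis=1)])
```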
Overfitting
- underfitting – high bias; overfitting – high variance
Regularization
- Penalize the less useful terms
- Small values for the parameters lead to a simpler hypothesis
- Add a regularization term to the cost function: lambda * sum(theta_j^2), summed over j = 1 to n
- The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
- New gradient descent update with the regularization term: theta_j := theta_j*(1 - alpha*lambda/m) - (alpha/m) * sum_i((h_theta(x^(i)) - y^(i)) * x_j^(i)), for j >= 1; theta_0 is updated without the shrink factor (see the sketch below).
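A sketch of regularized logistic regression (my own NumPy code; following the usual convention the penalty is scaled by 1/(2m) inside the cost and theta_0 is left unregularized; lam, alpha, and num_iters are illustrative settings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_regularized(theta, X, y, lam):
    """Cross-entropy cost plus (lam / (2m)) * sum(theta_j^2) for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return data_term + penalty

def gradient_descent_regularized(theta, X, y, alpha=0.1, lam=1.0, num_iters=1000):
    """theta_j := theta_j*(1 - alpha*lam/m) - (alpha/m)*sum((h - y)*x_j); theta_0 is not shrunk."""
    m = len(y)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        grad = (X.T @ (h - y)) / m
        new_theta = theta * (1 - alpha * lam / m) - alpha * grad
        new_theta[0] = theta[0] - alpha * grad[0]  # intercept: plain gradient step
        theta = new_theta
    return theta
```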
- preconception – bias, prejudice