LZN's Blog CodePlayer

Coursera Machine Learning : W3

2021-07-08
LZN

W3

Notes

Classification and Logistic Regression
  • Sigmoid function – Logistic function 1/(1+e^(-z))
  • Logistic regression: thetax=1/(1+e^(thetaX)), estimates the probability that y=1 on input x
Decision Boundary
  • The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.
  • Directly use the cost function of logistic regression will lead to a non-convex function with many local optima.
  • New cost function, cost(hthetax, y)=-log(htheta(x)), if y=1; and cost(hthetax,y)= -log(1-htheta(x)), if y=0. This will promise when hthetax gives a wrong answer, a very large penalty will be imposed.
  • Single expression: cost(thetax, y)=-y*log(htheta(x))-(1-y)*log(1-htheta(x))
  • Advantages: 1. Maximum likelihood estimation 2. Convex
  • The form of gradient decent of logistic regression is the same as linear regression, just the hypothesis changes.
Advanced Optimization
  • Gradient descent, Conjugate gradient, BFGS, L-BFGS
  • Advanced algorithm advantages:
    • No need to manually pick alpha
    • faster
  • Disadvantage: Much more complex
Multiclass Classification
  • One-vs-all, One-vs-rest
  • We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.
Overfitting
  • underfitting – high bias; overfitting – high variance
Regularization
  • Penalize the unuseful term
  • Small values for parameters tend to a simpler hypothesis
  • Add regularization term to cost function: lambda*sigma(thetaj^2), theta from 1 to n
  • The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
  • New gradient descent with regularized term: thetaj=thetaj(1-alphalambda/m)

  • preconception 偏见

Comments

Content