LZN's Blog CodePlayer

Coursera Machine Learning : W8 Unsupervised Learning

2021-08-21
LZN

W8 Unsupervised Learning

Notes

KMeans
  • Process:
      1. Randomly initialize cluster centroids
      1. Assign all data points to their nearest cluster centroids
      1. Move cluster centroids to the mean value of all of the samples assigned to each previous centroid.
      1. Repeats these last two steps until this value is less than a threshold.
    • If no sample is assigned to the k-th centroid, just eliminate that centroid.
  • Optimization Objectives
    • Distortion cost function
    • Multiple random initializations to avoid local optima. For large K, not work.
    • Elbow method: Cost function J as a function of numer of clusters K.
Dimentionality Reduction
  • Aim: 1. Data compression 2. Visualization
  • Princical Component Analysis (PCA)
  • Feature scaling / mean normalization before performing PCA.
  • PCA procedure:
      1. compute covariance matrix Sigma
      1. eigen vector: [U, S, V]=svd(Sigma) or eig(Sigma)
      1. U=[U1, U2, …, Uk (col)] the first k cols will represent k-dim features reduced from n-dim original features.
      1. z=transpose(Ureduce)*x
  • Choose k with the smallest value that 99% of variance is retained.
  • [U, S, V], S is used for check variance, with diagnal entries S11 S22 … added up to check the variance.
  • Do not use PCA to avoid overfitting (no info from Y)

  • aptitude 资质

Comments

Content