Introduction to Machine Learning

This note is based on the Spring 2019 COMPSCI 189/289A course at the University of California, Berkeley, taught by Jonathan Shewchuk.

#Supervised Learning


example 1: classify digits 1 and 7

  1. each image is an $N\times N$ matrix of pixels
  2. flatten it into a vector of length $N^2$
  3. create a classifier for the $N^2$-dimensional space
    Note: the decision boundary is a hyperplane
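As a minimal sketch of step 2, the snippet below flattens a small matrix row by row. The function name `flatten` and the 3 x 3 pixel values are made up for illustration; they are not from the course.

```python
def flatten(image):
    """Flatten an N x N matrix (a list of rows) into a vector of length N*N, row by row."""
    return [pixel for row in image for pixel in row]

# Hypothetical 3 x 3 grayscale "image" (values invented for illustration).
image = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]
x = flatten(image)
print(len(x))  # 9, i.e. N*N features
```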

example 2: Bank loan default prediction

See notes for M3S17 Quantitative Methods for Finance

Features / Independent Variables / Predictor Variables



The model is fitted so specifically to one data set that it does not predict new data well.

The test error gets worse because the classifier becomes too sensitive to outliers or to other spurious patterns.

Sinuous decision boundaries fit the sample points so well that they do not classify future points well.

Quantify Overfitting

Decision Boundary

The boundary chosen by the classifier to separate items in the class from those that are not.

Decision Function

A function $f(x)$ that maps a sample point $x$ to a scalar value such that
$f(x)>0$ if $x \in$ class C
$f(x)\leq 0$ if $x \notin$ class C

For such a decision function, the decision boundary is $\{x \in \mathbb{R}^d : f(x)=0\}$, usually a $(d-1)$-dimensional surface in $\mathbb{R}^d$.

Isosurface/iso contours

An isosurface of a function $f$ is $\{x: f(x)=0\}$; here 0 is the isovalue.
Note: the prefix 'iso-' means 'equal-'.

Linear classifier

The decision boundary is a line/plane (a hyperplane in general).
It usually uses a linear decision function.
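A minimal sketch of a linear classifier, assuming a decision function of the form $f(x)=w\cdot x+\alpha$ and the rule "in class C when $f(x)>0$". The weights below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def decision(w, alpha, x):
    """Linear decision function f(x) = w . x + alpha."""
    return sum(wi * xi for wi, xi in zip(w, x)) + alpha

def in_class(w, alpha, x):
    """Predict 'in class C' when f(x) > 0, otherwise 'not in C'."""
    return decision(w, alpha, x) > 0

w, alpha = [1.0, -1.0], 0.5  # hypothetical weights

print(in_class(w, alpha, [2.0, 1.0]))  # f = 2 - 1 + 0.5 = 1.5   -> True
print(in_class(w, alpha, [0.0, 3.0]))  # f = 0 - 3 + 0.5 = -2.5  -> False
```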




Uppercase roman: matrix, random variables, set
Lowercase roman: vector
Greek: scalar
Other scalar:

  • n, number of sample points
  • d, number of features
  • i,j,k, integer indices

Function: f(), s(),…


  • Euclidean norms
  • Normalize a vector: $\frac{x}{|x|}$
  • dot product: $x\cdot y=\sum_i x_i y_i$
    • length: $|x|=\sqrt{x\cdot x}=\sqrt{\sum_i x_i^2}$
    • angle: $\cos(\theta)=\frac{x\cdot y}{|x||y|}$
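The operations above can be sketched in plain Python. The helper names (`dot`, `norm`, `normalize`, `angle`) are illustrative choices, and the ratio is clamped to $[-1, 1]$ before `acos` to guard against floating-point rounding.

```python
import math

def dot(x, y):
    """Dot product: sum of x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean length: sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def normalize(x):
    """Scale x to unit length (x must be nonzero)."""
    n = norm(x)
    return [xi / n for xi in x]

def angle(x, y):
    """Angle between x and y in radians, from cos(theta) = x . y / (|x| |y|)."""
    c = dot(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))

x, y = [3.0, 4.0], [4.0, -3.0]
print(norm(x))       # 5.0
print(normalize(x))  # [0.6, 0.8]
print(angle(x, y))   # pi/2, since x . y = 0
```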


Given a linear decision function $f(x)=w\cdot x+\alpha$, the decision boundary is $H=\{x: w\cdot x =-\alpha \}$.
The set H is a hyperplane.

For any $x$, $y$ on H, $w\cdot(y-x)=0$

So $w$ is a normal vector of H: it is orthogonal to every vector that lies in H.

If $w$ is a unit vector, $w\cdot x +\alpha$ is the signed distance from $x$ to H,
i.e. positive on one side of H, negative on the other side

Note: with $w$ a unit vector, the signed distance from the origin to H is $\alpha$, so the distance itself is $|\alpha|$.
Note 2: $\alpha =0$ iff H passes through the origin
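A small sketch of the signed distance, under the assumption that $w$ may not be a unit vector, so the function divides by $|w|$; for a unit $w$ this reduces to $w\cdot x+\alpha$. The function name and the example hyperplane are hypothetical.

```python
import math

def signed_distance(w, alpha, x):
    """Signed distance from x to H = {x : w . x + alpha = 0}.

    Dividing by |w| makes this valid for a non-unit normal vector w.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + alpha) / norm_w

w, alpha = [3.0, 4.0], -10.0  # hypothetical hyperplane 3*x1 + 4*x2 = 10

print(signed_distance(w, alpha, [0.0, 0.0]))  # -2.0: the origin is on the negative side
print(signed_distance(w, alpha, [6.0, 8.0]))  # 8.0: positive side
print(signed_distance(w, alpha, [2.0, 1.0]))  # 0.0: the point lies on H
```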


The coefficients in $w$, together with $\alpha$, are called weights or regression coefficients.

Linearly separable

The input data is linearly separable if there exists a hyperplane that separates all the sample points in C from those not in C.

Centroid classifier

Compute the mean $\mu_c$ of all vectors in class C and the mean $\mu_x$ of all vectors NOT in class C.

  • Decision function:
    $f(x)=(\mu_c-\mu_x)\cdot x-(\mu_c-\mu_x)\cdot \frac{\mu_c+\mu_x}{2}$
    $(\mu_c-\mu_x)$ is the normal vector
    $\frac{\mu_c+\mu_x}{2}$ is the midpoint between $\mu_c$ and $\mu_x$
    so the decision boundary is the hyperplane that perpendicularly bisects the line segment $\overline{\mu_c\mu_x}$
  • good at: classifying samples drawn from two Gaussian (normal) distributions, especially when the sample size is large
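A minimal sketch of the centroid classifier, assuming the normal vector is $\mu_c-\mu_x$ and the boundary passes through the midpoint $\frac{\mu_c+\mu_x}{2}$. The function names and the four sample points are invented for illustration.

```python
def mean(points):
    """Componentwise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def fit_centroid(in_class, not_in_class):
    """Return (w, alpha) so that f(x) = w . x + alpha is the centroid decision function."""
    mu_c, mu_x = mean(in_class), mean(not_in_class)
    w = [a - b for a, b in zip(mu_c, mu_x)]               # normal vector mu_c - mu_x
    midpoint = [(a + b) / 2 for a, b in zip(mu_c, mu_x)]  # point on the boundary
    alpha = -dot(w, midpoint)
    return w, alpha

def decision(w, alpha, x):
    return dot(w, x) + alpha

# Hypothetical sample points.
in_c = [[2.0, 2.0], [4.0, 2.0]]     # mean (3, 2)
not_c = [[-2.0, 0.0], [0.0, 0.0]]   # mean (-1, 0)

w, alpha = fit_centroid(in_c, not_c)
print(w, alpha)                         # [4.0, 2.0] -6.0
print(decision(w, alpha, [3.0, 2.0]))   # 10.0  -> in class C
print(decision(w, alpha, [-1.0, 0.0]))  # -10.0 -> not in C
print(decision(w, alpha, [1.0, 1.0]))   # 0.0   -> the midpoint lies on the boundary
```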

Perceptron Algorithm

Slow, but correct for linearly separable points.
Uses a numerical optimisation algorithm, namely gradient descent.

Sample points $X_1,X_2,\dots,X_n$.
For each sample point, $y_i=1$ if $X_i\in C$ and $y_i=-1$ if $X_i\notin C$.

For simplicity, assume the decision boundary passes through the origin.
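The note stops at the setup, so the following is only a sketch of one common formulation, not necessarily the exact version from the lectures. It assumes the risk $R(w)=\sum_{i:\ y_i X_i\cdot w\le 0} -y_i X_i\cdot w$ and takes batch gradient descent steps $w \leftarrow w+\epsilon\sum y_i X_i$ over the currently misclassified points. The step size `step`, the iteration cap `max_iters`, the zero starting vector, and the sample data are all assumptions.

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def perceptron(X, y, step=1.0, max_iters=1000):
    """Batch gradient descent on the assumed perceptron risk.

    The boundary passes through the origin, so the classifier is just sign(w . x).
    Stops early once every point satisfies y_i * (w . X_i) > 0.
    """
    d = len(X[0])
    w = [0.0] * d
    for _ in range(max_iters):
        # Points with y_i * (w . X_i) <= 0 are misclassified (or on the boundary).
        wrong = [(xi, yi) for xi, yi in zip(X, y) if yi * dot(w, xi) <= 0]
        if not wrong:
            break
        # Gradient step: w <- w + step * sum of y_i * X_i over misclassified points.
        for j in range(d):
            w[j] += step * sum(yi * xi[j] for xi, yi in wrong)
    return w

# Hypothetical linearly separable data.
X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]

w = perceptron(X, y)
print(all(yi * dot(w, xi) > 0 for xi, yi in zip(X, y)))  # True when training has converged
```

The iteration cap matters because the loop would never terminate on data that is not linearly separable.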


#Unsupervised Learning


Dimensionality Reduction


Hold back a subset of the training data for later testing: the validation set (used to tune hyperparameters and choose the model).

test set: final evaluation

  1. train a classifier multiple times, with different models/hyperparameters
  2. test each one on NEW data (the validation set)
  3. choose the setting that gives the best validation result
    Why: we want the model to work on data in general. In many cases (k-NN with k=1, for example), feeding in data that was already used for training always gives the right output, which tells us nothing useful when evaluating the model.
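The three steps above can be sketched with a tiny k-nearest-neighbour classifier in plain Python. Everything here is an illustrative assumption: the function names, the 25% hold-out fraction, the candidate values of k, and the synthetic two-cluster data.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda pl: sum((a - b) ** 2 for a, b in zip(pl[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, data, k):
    """Fraction of (point, label) pairs in data that k-NN trained on train gets wrong."""
    wrong = sum(1 for x, label in data if knn_predict(train, x, k) != label)
    return wrong / len(data)

def choose_k(data, candidates, holdout=0.25, seed=0):
    """Hold back part of the data as a validation set and pick the k with the lowest error on it."""
    data = data[:]
    random.Random(seed).shuffle(data)
    n_val = max(1, int(len(data) * holdout))
    val, train = data[:n_val], data[n_val:]
    errors = {k: error_rate(train, val, k) for k in candidates}
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic data: two well-separated 4 x 4 grids of points, labelled +1 and -1.
data = [([float(i), float(j)], 1) for i in range(4) for j in range(4)]
data += [([i + 10.0, j + 10.0], -1) for i in range(4) for j in range(4)]

best, errors = choose_k(data, candidates=[1, 3, 5])
print(best, errors)

# Evaluating 1-NN on its own training data always looks perfect,
# which is why the comparison uses held-out points instead.
print(error_rate(data, data, 1))  # 0.0
```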

Types of error

  1. training set error: fraction of training images not classified correctly
  2. test set error: fraction of new data points classified incorrectly
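Both error types are the same computation applied to different data. A minimal sketch, assuming a hypothetical one-dimensional threshold classifier and made-up points:

```python
def error_rate(classify, points, labels):
    """Fraction of points whose predicted label differs from the true label."""
    wrong = sum(1 for x, y in zip(points, labels) if classify(x) != y)
    return wrong / len(points)

def classify(x):
    """Hypothetical 1-D classifier: predict class C (+1) when x > 0, else -1."""
    return 1 if x > 0 else -1

train_x, train_y = [-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1]
test_x, test_y = [-0.5, 0.5, 1.5, -3.0], [-1, -1, 1, -1]

print(error_rate(classify, train_x, train_y))  # 0.0  (training set error)
print(error_rate(classify, test_x, test_y))    # 0.25 (test set error: x = 0.5 is misclassified)
```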


Outliers: sample points that are atypical


Hyperparameters: control overfitting/underfitting (e.g. k in k-NN)

© 2019 Z.Ran All Rights Reserved.