Definition

Scalar: a single number, $s \in \mathbb{R}$.

Vector: an ordered array of numbers, $\boldsymbol{x} \in \mathbb{R}^n$.

Linearity: $f(a\boldsymbol{x} + b\boldsymbol{y}) = a\,f(\boldsymbol{x}) + b\,f(\boldsymbol{y})$.

Estimator

A point estimator is a function of the data: $\hat{\theta}_m = g(x^{(1)}, \dots, x^{(m)})$.

Bias

$\mathrm{bias}(\hat{\theta}_m) = \mathbb{E}[\hat{\theta}_m] - \theta$; the estimator is unbiased if $\mathbb{E}[\hat{\theta}_m] = \theta$.

Mean Squared Error

$\mathrm{MSE} = \mathbb{E}\big[(\hat{\theta}_m - \theta)^2\big] = \mathrm{bias}(\hat{\theta}_m)^2 + \mathrm{Var}(\hat{\theta}_m)$

For a linear model with Gaussian noise, minimizing the mean squared error yields the same solution as the maximum likelihood estimator.
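The equivalence can be seen from the log-likelihood of the Gaussian-noise linear model (a standard derivation, sketched here):

```latex
% Model: y = w^T x + eps, with eps ~ N(0, sigma^2)
\log p(\boldsymbol{y} \mid \boldsymbol{X}; \boldsymbol{w})
  = \sum_{i=1}^{m} \log \mathcal{N}\!\left(y^{(i)};\, \boldsymbol{w}^\top \boldsymbol{x}^{(i)},\, \sigma^2\right)
  = -\frac{m}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \boldsymbol{w}^\top \boldsymbol{x}^{(i)}\right)^2
```

Maximizing over $\boldsymbol{w}$ drops the constant term, leaving exactly the minimization of the squared-error sum.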

Moore-Penrose pseudoinverse

$A^+ = \lim_{\alpha \to 0^+} (A^\top A + \alpha I)^{-1} A^\top$; the minimum-norm least-squares solution of $A\boldsymbol{x} = \boldsymbol{y}$ is $\boldsymbol{x} = A^+\boldsymbol{y}$.
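A minimal NumPy sketch (toy data, made up for illustration) showing that the pseudoinverse gives the least-squares solution of an overdetermined system:

```python
import numpy as np

# Least squares via the Moore-Penrose pseudoinverse.
# For an overdetermined system A x = y, pinv(A) @ y gives the
# minimum-norm least-squares solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

x = np.linalg.pinv(A) @ y                        # pseudoinverse solution
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)  # reference least-squares solver

print(np.allclose(x, x_lstsq))  # → True: both give the same least-squares x
```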

Probability

$P(A, B)$ represents the probability that $A$ and $B$ both happen.

$P(A \mid B)$ represents the probability of $A$ given that $B$ is already known to have happened.

Conditional Probability

$P(A \mid B) = P(A, B) / P(B)$ for $P(B) > 0$. If $A$ and $B$ are independent, with $P(A, B) = P(A)\,P(B)$, then $P(A \mid B) = P(A)$.

Bayes's Rule

$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$, where $P(B) = \sum_A P(B \mid A)\,P(A)$.
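A small worked example of Bayes's rule on a toy diagnostic-test scenario (all numbers are made up for illustration):

```python
# Bayes's rule: P(D|+) = P(+|D) P(D) / P(+)
p_disease = 0.01              # prior P(D)
p_pos_given_disease = 0.95    # likelihood P(+|D)
p_pos_given_healthy = 0.05    # false-positive rate P(+|not D)

# Total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior: even with a positive test, the disease is still unlikely,
# because the prior is small.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # → 0.161
```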

Expectation

$\mathbb{E}_{x \sim p}[f(x)] = \sum_x p(x)\,f(x)$ (an integral in the continuous case).

Variance

$\mathrm{Var}(f(x)) = \mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big]$

Covariance

$\mathrm{Cov}(f(x), g(y)) = \mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])\,(g(y) - \mathbb{E}[g(y)])\big]$

Normal Distribution

$\mathcal{N}(x; \mu, \sigma^2) = \sqrt{\dfrac{1}{2\pi\sigma^2}}\,\exp\!\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$

If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $\mathbb{E}[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$.

Logistic sigmoid

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$

Softplus function

$\zeta(x) = \log(1 + e^{x})$; its derivative is the logistic sigmoid.

Softmax

$\mathrm{softmax}(\boldsymbol{x})_i = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$
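A standard numerically stable softmax sketch: subtracting $\max(x)$ leaves the result unchanged (softmax is shift-invariant) but avoids overflow in the exponential:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the output is identical
    # because e^(x - c) / sum(e^(x - c)) = e^x / sum(e^x).
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
p = softmax(x)
print(p.sum())                              # probabilities sum to 1
print(np.allclose(softmax(x + 1000.0), p))  # shift-invariant and overflow-safe
```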

$L^p$ norm

$\|\boldsymbol{x}\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$

XOR

A linear model cannot represent the XOR operation, because the four XOR input-output pairs are not linearly separable.
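A minimal demonstration of the claim: fitting a linear model (with bias) to the four XOR points by least squares gives a constant prediction of 0.5 everywhere, so no linear decision boundary exists:

```python
import numpy as np

# The four XOR input-output pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

A = np.hstack([np.ones((4, 1)), X])        # add a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares weights

# By symmetry the best linear fit is the constant 0.5:
print(A @ w)  # → [0.5 0.5 0.5 0.5]
```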

General Problem

For a dataset with $m$ samples $\{(\boldsymbol{x}^{(i)}, y^{(i)})\}_{i=1}^{m}$

Linear Case

Model

Likelihood function

The likelihood $L(\theta \mid x)$ is the probability that a particular outcome $x$ is observed when the true value of the parameter is $\theta$: $L(\theta \mid x) = p(x; \theta)$.

Unlike probabilities, likelihood functions do not have to integrate (or sum) to 1 over $\theta$.
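A quick numerical check of this point, using the Bernoulli likelihood $L(\theta) = \theta^k (1-\theta)^{n-k}$ for $k = 7$ successes in $n = 10$ trials (numbers chosen for illustration):

```python
import numpy as np

# Bernoulli likelihood as a function of theta for fixed data (n, k).
n, k = 10, 7
theta = np.linspace(0.0, 1.0, 100001)
L = theta**k * (1.0 - theta)**(n - k)

# Trapezoidal integration over theta: the area is 1/1320 here, not 1,
# so L(theta) is not a probability density in theta.
area = np.sum(0.5 * (L[1:] + L[:-1]) * np.diff(theta))
print(area)  # far from 1
```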

Quadratic Problem

If $X^\top X$ is invertible, the closed-form least-squares solution is $\boldsymbol{w} = (X^\top X)^{-1} X^\top \boldsymbol{y}$.
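A sketch of the closed-form solution via the normal equations, on synthetic noise-free data (the true weights are made up for the check):

```python
import numpy as np

# Normal equations: w = (X^T X)^{-1} X^T y, assuming X^T X is invertible.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                 # noise-free targets for a clean check

# Solve (X^T X) w = X^T y rather than forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w, true_w))  # → True: recovers the true weights
```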

Summary

  • maximum likelihood estimation
  • least-squares regression
  • minimizing the cross-entropy between the distributions
  • minimizing the KL divergence

Preventing overfitting

  • maximum a posteriori estimation
  • regularized least squares

Cross-entropy: the negative log-likelihood of a Bernoulli or softmax distribution.

PCA

  • maximize variance of data after projection

maximize variance -- Lagrange multiplier --> leading eigenvector of the covariance matrix
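A sketch of this chain on synthetic 2-D data: the variance of the data projected onto the top eigenvector of the covariance matrix equals the largest eigenvalue:

```python
import numpy as np

# Synthetic data stretched along the first axis (std 3 vs 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center the data
C = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order

pc1 = eigvecs[:, -1]                    # first principal component
proj_var = np.var(Xc @ pc1, ddof=1)     # variance after projection

# The projected variance is exactly the top eigenvalue: v^T C v = lambda.
print(np.isclose(proj_var, eigvals[-1]))  # → True
```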

Theorem (Bochner’s theorem)

A continuous function of the form $k(x, y) = k(x - y)$ is positive definite if and only if $k(\delta)$ is the Fourier transform of a non-negative measure.

Random Fourier features — Random walks
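Random Fourier features build directly on Bochner's theorem: sampling frequencies from the (non-negative) Fourier measure of a shift-invariant kernel gives an explicit feature map whose inner product approximates the kernel. A minimal sketch for the RBF kernel $k(x, y) = \exp(-\|x - y\|^2 / 2)$, whose Fourier measure is a standard Gaussian (the dimensions and test points are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 3, 20000                     # input dim, number of random features

# Frequencies from the Gaussian Fourier measure; random phases in [0, 2*pi).
W = rng.normal(size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    # Feature map: phi(x)^T phi(y) approximates k(x - y).
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = np.array([0.2, -0.1, 0.4])
y = np.array([0.0, 0.3, 0.1])

exact = np.exp(-np.sum((x - y) ** 2) / 2.0)
approx = phi(x) @ phi(y)
print(abs(exact - approx))          # small for large D, shrinking as O(1/sqrt(D))
```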