Definition
Scalar: a single number, $x \in \mathbb{R}$.
Vector: an ordered array of numbers, $\mathbf{x} \in \mathbb{R}^n$.
Linearity: $f$ is linear if $f(a\mathbf{x} + b\mathbf{y}) = a f(\mathbf{x}) + b f(\mathbf{y})$.
Estimator: a function of the data, $\hat{\theta}_m = g(x^{(1)}, \dots, x^{(m)})$.
Bias: $\operatorname{bias}(\hat{\theta}_m) = \mathbb{E}[\hat{\theta}_m] - \theta$;
unbiased if $\mathbb{E}[\hat{\theta}_m] = \theta$.
Mean Squared Error: $\operatorname{MSE} = \mathbb{E}\big[(\hat{\theta}_m - \theta)^2\big] = \operatorname{bias}(\hat{\theta}_m)^2 + \operatorname{Var}(\hat{\theta}_m)$.
Under a Gaussian noise model, minimizing the mean squared error is equivalent to maximum likelihood estimation.
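The bias–variance decomposition of the MSE can be checked by simulation. A minimal sketch (the estimator, sample sizes, and distribution are arbitrary choices for illustration): estimate the mean of a Gaussian with the sample mean, then verify $\operatorname{MSE} = \operatorname{bias}^2 + \operatorname{Var}$ empirically.

```python
import numpy as np

# Hypothetical setup: estimate mu of N(3, 1) with the sample mean of 20 draws,
# repeated 50,000 times to approximate the estimator's sampling distribution.
rng = np.random.default_rng(0)
true_mu = 3.0
estimates = rng.normal(true_mu, 1.0, size=(50_000, 20)).mean(axis=1)

bias = estimates.mean() - true_mu           # E[theta_hat] - theta
variance = estimates.var()                  # Var(theta_hat), ddof=0
mse = np.mean((estimates - true_mu) ** 2)   # E[(theta_hat - theta)^2]

# the decomposition MSE = bias^2 + variance is an algebraic identity
assert abs(mse - (bias ** 2 + variance)) < 1e-8
```

The identity holds exactly for any finite sample, since it is just an expansion of the square; the simulation only makes the three terms concrete.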
Moore-Penrose pseudoinverse: $A^{+} = \lim_{\alpha \to 0} (A^\top A + \alpha I)^{-1} A^\top$; when the columns of $A$ are linearly independent, $A^{+} = (A^\top A)^{-1} A^\top$.
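A sketch checking the closed form against NumPy's `pinv` for a tall matrix with independent columns (the matrix and right-hand side are made-up examples):

```python
import numpy as np

# For a tall A with linearly independent columns, A+ = (A^T A)^{-1} A^T.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A_plus = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_plus, np.linalg.pinv(A))

# A+ b is the least-squares solution of A x = b; this b lies in the
# column space of A, so the fit is exact.
b = np.array([1.0, 2.0, 3.0])
x = A_plus @ b
assert np.allclose(A @ x, b)
```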
Probability
$P(A, B)$ is the probability that $A$ and $B$ both happen.
$P(A \mid B)$ is the probability of $A$ given that $B$ is known to have happened.
Conditional Probability
$P(A \mid B) = \dfrac{P(A, B)}{P(B)}$. If $A$ and $B$ are independent, so that $P(A, B) = P(A)\,P(B)$, then $P(A \mid B) = P(A)$.
Bayes's Rule: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
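Bayes's rule is easy to misjudge numerically; a classic worked example (the prevalence, sensitivity, and false-positive rate below are hypothetical numbers):

```python
# P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_disease = 0.01                 # prevalence
p_pos_given_disease = 0.99       # sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# marginalize: P(positive) = sum over both cases
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
# despite the accurate test, P(disease | positive) is only 1/6
```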
Expectation: $\mathbb{E}_{x \sim P}[f(x)] = \sum_x P(x)\,f(x)$ (discrete) or $\int p(x)\,f(x)\,dx$ (continuous).
Variance: $\operatorname{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]$
Covariance: $\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big]$
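A sketch computing variance and covariance directly from their definitions and checking against NumPy's built-ins (the data are arbitrary correlated samples):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)          # y correlated with x

var_x = np.mean((x - x.mean()) ** 2)          # E[(X - E[X])^2]
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

assert abs(var_x - x.var()) < 1e-12
assert abs(cov_xy - np.cov(x, y, ddof=0)[0, 1]) < 1e-12
```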
Normal Distribution: $\mathcal{N}(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(\!-\dfrac{(x - \mu)^2}{2\sigma^2}\Big)$
If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $\mathbb{E}[X] = \mu$ and $\operatorname{Var}(X) = \sigma^2$.
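A sketch checking that the Gaussian density integrates to 1 and that the sample mean and variance match $\mu$ and $\sigma^2$ (the parameters, grid, and tolerances are arbitrary choices):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # N(x; mu, sigma^2) density
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

rng = np.random.default_rng(1)
samples = rng.normal(2.0, 0.5, size=200_000)
assert abs(samples.mean() - 2.0) < 0.01       # E[X] = mu
assert abs(samples.var() - 0.25) < 0.01       # Var(X) = sigma^2

# trapezoid rule over a wide grid: the density integrates to 1
xs = np.linspace(-4.0, 8.0, 10_001)
fx = normal_pdf(xs, 2.0, 0.5)
integral = np.sum((fx[:-1] + fx[1:]) / 2.0) * (xs[1] - xs[0])
assert abs(integral - 1.0) < 1e-6
```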
Logistic sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
Softplus function: $\zeta(x) = \log(1 + e^{x})$
Softmax: $\operatorname{softmax}(\mathbf{x})_i = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$
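Minimal sketches of the sigmoid, softplus, and softmax, with the usual numerical-stability tricks (subtracting the max before exponentiating in softmax, and the `log1p` rewrite of softplus):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # stable form of log(1 + e^x): max(x, 0) + log1p(e^{-|x|})
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def softmax(x):
    z = np.exp(x - np.max(x))    # shifting by max(x) avoids overflow
    return z / z.sum()

x = np.array([1.0, 2.0, 3.0])
assert abs(softmax(x).sum() - 1.0) < 1e-12
assert np.allclose(softmax(x), softmax(x + 100.0))  # shift invariance
assert np.allclose(softplus(x) - softplus(-x), x)   # zeta(x) - zeta(-x) = x
assert abs(sigmoid(0.0) - 0.5) < 1e-12
```

The shift invariance of softmax is what makes the max-subtraction trick valid, and the softplus identity $\zeta(x) - \zeta(-x) = x$ is a quick sanity check on the stable rewrite.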
$L^p$ norm: $\|\mathbf{x}\|_p = \Big(\sum_i |x_i|^p\Big)^{1/p}$
XOR
A linear model cannot represent the XOR function: no line in the plane separates $\{(0,0), (1,1)\}$ from $\{(0,1), (1,0)\}$.
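This failure can be made concrete: fitting $\hat{y} = \mathbf{w}^\top \mathbf{x} + b$ to the four XOR points by least squares gives the constant prediction $0.5$, so no threshold on the output can separate the two classes. A sketch:

```python
import numpy as np

# the four XOR input/output pairs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

A = np.hstack([X, np.ones((4, 1))])     # append a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w

# the best linear fit is w = 0, b = 0.5: the output is 0.5 everywhere
assert np.allclose(pred, 0.5)
```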
General Problem
For a data set with $m$ samples
Linear Case
Model: $\hat{y} = \mathbf{w}^\top \mathbf{x} + b$
Likelihood function
The likelihood $L(\theta \mid x) = p(x \mid \theta)$ is the probability that a particular outcome $x$ is observed when the true value of the parameter is $\theta$.
Unlike probabilities, likelihood functions do not have to integrate (or sum) to 1 over $\theta$.
Quadratic Problem
If $X^\top X$ is invertible, $\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$.
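A sketch of the normal-equations solution on synthetic data (the dimensions, true weights, and noise level are arbitrary), checked against NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)   # small Gaussian noise

# closed form: w = (X^T X)^{-1} X^T y (valid when X^T X is invertible)
w = np.linalg.inv(X.T @ X) @ X.T @ y
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(w, w_lstsq)
assert np.allclose(w, true_w, atol=0.05)       # recovers the true weights
```

In practice `lstsq` (or a QR/SVD-based solver) is preferred over forming $(X^\top X)^{-1}$ explicitly, since inverting squares the condition number.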
Summary
The following are equivalent for a linear model with Gaussian noise:
- maximum likelihood estimation
- least-squares regression
- minimizing the cross-entropy between the empirical and model distributions
- minimizing the KL divergence
To prevent overfitting (equivalent, with a Gaussian prior on the weights):
- maximum a posteriori (MAP) estimation
- regularized least squares
Cross-entropy: the negative log-likelihood of a Bernoulli or softmax distribution.
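A sketch of this identity in the binary case (the labels and predicted probabilities are made-up numbers): the binary cross-entropy loss is exactly the average negative log-likelihood of a Bernoulli model.

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 1.0])       # labels
p = np.array([0.9, 0.2, 0.7, 0.6])       # predicted P(y = 1)

# binary cross-entropy loss
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
# Bernoulli negative log-likelihood: -log P(y | p), averaged
nll = -np.mean(np.log(np.where(y == 1, p, 1 - p)))

assert abs(cross_entropy - nll) < 1e-12
```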
PCA
- maximize the variance of the data after projection
maximize variance -- Lagrange multiplier --> eigenvector of the covariance matrix
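The eigenvector route can be sketched directly (the data below are an arbitrary anisotropic Gaussian cloud): the top eigenvector of the covariance matrix is the direction whose projection has maximal variance, and that variance equals the top eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
# anisotropic 2-D data: stretch an isotropic cloud with a fixed linear map
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)                   # center the data

cov = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
top = eigvecs[:, -1]                      # first principal component

# variance along the top eigenvector equals the largest eigenvalue ...
assert abs((Xc @ top).var(ddof=1) - eigvals[-1]) < 1e-9
# ... and no other unit direction does better
for theta in np.linspace(0.0, np.pi, 100):
    d = np.array([np.cos(theta), np.sin(theta)])
    assert (Xc @ d).var(ddof=1) <= eigvals[-1] + 1e-9
```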
Theorem (Bochner’s theorem)
A continuous function of the form $k(x, y) = k(x - y)$ is positive definite if and only if $k(\delta)$ is the Fourier transform of a non-negative measure.
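A sketch of the theorem in action, via random Fourier features (a standard consequence, not part of the theorem's statement; the points and sample count are arbitrary): the RBF kernel $k(\delta) = e^{-\|\delta\|^2/2}$ is the Fourier transform of a Gaussian measure, so averaging $\cos(w^\top(x - y))$ over frequencies $w \sim \mathcal{N}(0, I)$ approximates the kernel.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_features = 3, 200_000
W = rng.normal(size=(n_features, d))      # frequencies w ~ N(0, I_d)

x = np.array([0.1, -0.2, 0.3])
y = np.array([0.4, 0.0, -0.1])

# Monte Carlo estimate of k(x - y) = E_w[cos(w^T (x - y))]
approx = np.cos(W @ (x - y)).mean()
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)
assert abs(approx - exact) < 0.01
```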