Bits and Bytes

Shreyas Srivastava

19 June 2023

Deep Learning Book (Goodfellow), Chapter 8: Optimization


Empirical risk minimization

Optimization problem formulation: minimize the empirical risk, the expected loss under the empirical distribution defined by the training set,

\[\mathbb{E}_{(\mathbf{x}, \mathrm{y}) \sim \hat{p}_{\text{data}}} L(f(\boldsymbol{x} ; \boldsymbol{\theta}), y)=\frac{1}{m} \sum_{i=1}^{m} L\left(f\left(\boldsymbol{x}^{(i)} ; \boldsymbol{\theta}\right), y^{(i)}\right)\]

where $L$ is the per-example loss, $f(\boldsymbol{x} ; \boldsymbol{\theta})$ is the predicted output for input $\boldsymbol{x}$, and $m$ is the training set size.

Gradient of the log-likelihood

For maximum likelihood, the objective is an expectation over the training set, and so is its gradient:

\[\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})=\mathbb{E}_{(\mathbf{x}, \mathrm{y}) \sim \hat{p}_{\text{data}}} \nabla_{\boldsymbol{\theta}} \log p_{\text{model}}(\boldsymbol{x}, y ; \boldsymbol{\theta})\]

Computing this expectation over the full training set is expensive, so minibatch algorithms estimate it from a small random sample of examples.
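
As a minimal sketch (not from the notes: the logistic-regression model, the synthetic data, and the batch size of 128 are all illustrative assumptions), the minibatch estimator of this gradient looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logistic-regression data (illustrative only).
m, d = 10_000, 5
X = rng.normal(size=(m, d))
theta_true = rng.normal(size=d)
y = (rng.random(m) < 1 / (1 + np.exp(-X @ theta_true))).astype(float)

def log_likelihood_grad(theta, X, y):
    """Average gradient of log p_model(y | x; theta) for logistic regression."""
    p = 1 / (1 + np.exp(-X @ theta))
    return X.T @ (y - p) / len(y)

theta = np.zeros(d)

# Full-batch gradient: the exact expectation over the empirical distribution.
g_full = log_likelihood_grad(theta, X, y)

# Minibatch estimator: the same expectation, estimated from a random sample.
idx = rng.choice(m, size=128, replace=False)
g_mini = log_likelihood_grad(theta, X[idx], y[idx])

print("full-batch:", g_full)
print("minibatch :", g_mini)  # unbiased but noisy estimate of g_full
```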

Practical considerations of batch size: larger batches give a more accurate gradient estimate, but with less-than-linear returns, since the standard error of the estimate falls only as $1/\sqrt{m}$; very small batches can have a regularizing effect but under-utilize multicore hardware; and memory usage typically scales with batch size.
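
A small sketch of the $1/\sqrt{m}$ effect (the per-example "gradients" below are synthetic scalars standing in for a real model): quadrupling the batch size only halves the estimate's standard error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-example gradients: noisy scalar observations of a true mean gradient.
true_grad = 2.0
per_example = true_grad + rng.normal(scale=4.0, size=1_000_000)

for m in [16, 64, 256, 1024]:
    # Standard deviation of minibatch gradient estimates of size m.
    batches = per_example[: (len(per_example) // m) * m].reshape(-1, m)
    estimates = batches.mean(axis=1)
    print(f"batch size {m:4d}: std of estimate = {estimates.std():.4f}")
# Each 4x increase in batch size halves the error: less-than-linear returns.
```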

Challenges in Neural Network Optimization

Ill-Conditioning

When the Hessian $\boldsymbol{H}$ of the cost is ill-conditioned, gradient descent struggles: a second-order Taylor expansion predicts that a gradient step of $-\epsilon \boldsymbol{g}$ moves the cost to approximately

$$f\left(\boldsymbol{x}^{(0)}-\epsilon \boldsymbol{g}\right) \approx f\left(\boldsymbol{x}^{(0)}\right)-\epsilon \boldsymbol{g}^{\top} \boldsymbol{g}+\frac{1}{2} \epsilon^{2} \boldsymbol{g}^{\top} \boldsymbol{H} \boldsymbol{g}$$

Ill-conditioning becomes a problem when $\frac{1}{2} \epsilon^{2} \boldsymbol{g}^{\top} \boldsymbol{H} \boldsymbol{g}$ exceeds $\epsilon \boldsymbol{g}^{\top} \boldsymbol{g}$: the step then increases the cost, forcing the learning rate to shrink even while the gradient is still large.
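
A minimal NumPy sketch (the quadratic, its condition number of 1000, and the learning rate are made-up illustrations) comparing the two Taylor terms:

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T H x, condition number 1000.
H = np.diag([1.0, 1000.0])
x = np.array([1.0, 1.0])
g = H @ x                                # gradient of f at x
eps = 0.01                               # learning rate

first_order = eps * g @ g                # predicted decrease in cost
curvature = 0.5 * eps**2 * g @ H @ g     # curvature correction

print(f"eps g^T g         = {first_order:.1f}")
print(f"0.5 eps^2 g^T H g = {curvature:.1f}")
# When the curvature term dominates, this gradient step *increases* the cost.
```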

Local minima

Neural network cost functions are non-convex and contain many local minima, partly due to model non-identifiability: weight-space symmetry means permuting a layer's hidden units yields a different parameter vector computing exactly the same function, so any minimum comes with many equivalent copies. In practice, most local minima appear to have low cost, making them less of a problem than once feared.
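
A quick sketch of weight-space symmetry (toy one-hidden-layer network with made-up sizes): permuting hidden units changes the parameters but not the function.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny one-hidden-layer network: y = W2 @ tanh(W1 @ x).
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))
x = rng.normal(size=3)

def forward(W1, W2, x):
    return W2 @ np.tanh(W1 @ x)

# Permute the hidden units: reorder rows of W1 and the matching columns of W2.
perm = [2, 0, 3, 1]
print(forward(W1, W2, x))                 # original parameters
print(forward(W1[perm], W2[:, perm], x))  # permuted parameters, same output
```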

Plateaus, Saddle Points and Other Flat Regions

In high-dimensional problems, saddle points, where the gradient is zero but the Hessian has both positive and negative eigenvalues, are far more common than local minima. Gradient descent empirically escapes them, while methods that jump to points of zero gradient, such as Newton's method, can be attracted to them. Wide, flat plateaus with near-zero gradient are also costly to traverse.
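
A minimal sketch (hand-picked saddle $f(x, y) = x^2 - y^2$, made-up start point and step size) showing gradient descent drifting away from a saddle while a Newton step lands exactly on it:

```python
import numpy as np

# f(x) = x0^2 - x1^2 has a saddle point at the origin.
grad = lambda x: np.array([2 * x[0], -2 * x[1]])
H_inv = np.diag([0.5, -0.5])             # inverse Hessian (constant here)

x0 = np.array([1.0, 1e-3])               # start slightly off the saddle's axis

x_gd = x0.copy()
for _ in range(50):
    x_gd = x_gd - 0.1 * grad(x_gd)       # gradient descent steps

x_newton = x0 - H_inv @ grad(x0)         # one Newton step

print("gradient descent:", x_gd)         # second coordinate grows: escapes
print("newton step:     ", x_newton)     # lands exactly on the saddle (0, 0)
```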

Cliffs and exploding gradients

Highly nonlinear cost functions can contain cliff structures where the gradient becomes extremely large, so a single gradient step can catapult the parameters far from the current solution. Gradient clipping caps the gradient's norm before the update, keeping the descent direction while limiting the step size.
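
A minimal norm-clipping sketch (the `max_norm` threshold is a made-up hyperparameter; frameworks ship equivalents, e.g. PyTorch's `clip_grad_norm_`):

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; direction is preserved."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([300.0, -400.0])           # exploding gradient near a cliff (norm 500)
print(clip_by_norm(g, max_norm=5.0))    # [ 3. -4.], norm 5, same direction
```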

Long-term dependencies

Very deep computational graphs that repeatedly apply the same operation, as recurrent networks apply the same weight matrix $\boldsymbol{W}$ at every time step, face vanishing and exploding gradients: after $t$ steps the computation involves $\boldsymbol{W}^{t}$, so its effect scales with the eigenvalues of $\boldsymbol{W}$ raised to the power $t$, and gradients either shrink to zero or blow up exponentially.
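
A minimal sketch (random matrices rescaled to chosen spectral radii, made-up sizes) of how repeated multiplication by the same matrix drives vector norms to zero or infinity:

```python
import numpy as np

rng = np.random.default_rng(3)

def norm_after_repeats(spectral_radius, steps=100):
    """Norm of v after `steps` multiplications by a W with this spectral radius."""
    W = rng.normal(size=(10, 10))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # rescale eigenvalues
    v = np.ones(10)
    for _ in range(steps):
        v = W @ v
    return np.linalg.norm(v)

print("spectral radius 0.9:", norm_after_repeats(0.9))  # vanishes toward zero
print("spectral radius 1.1:", norm_after_repeats(1.1))  # explodes exponentially
```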

Inexact gradients

Minibatch sampling gives only a noisy, and in some cases biased, estimate of the true gradient, and some objectives are themselves intractable and approximated by surrogates; optimization algorithms for deep learning must therefore tolerate inexact gradients, for example by decaying the learning rate over time.
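
A minimal sketch (toy 1-D quadratic, made-up noise level, a $1/k$ learning-rate decay) of SGD converging despite noisy gradient estimates:

```python
import numpy as np

rng = np.random.default_rng(4)

theta = 10.0                              # minimize f(theta) = 0.5 * theta^2
for k in range(1, 10_001):
    g = theta + rng.normal(scale=2.0)     # exact gradient plus sampling noise
    eps_k = 1.0 / k                       # decaying learning rate
    theta -= eps_k * g

print(theta)  # close to the true minimum at 0 despite the inexact gradients
```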
