Image
Notes about image processing.
Convolution
Implementation of convolution
Convolution kernels that extract grayscale features and edge features:
import numpy as np

# Set up convolutional weights holding 2 filters, each 3x3, over 3 input channels
w = np.zeros((2, 3, 3, 3))
# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]
# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
im2col
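A rough sketch of the im2col idea: unroll every receptive-field patch into a column, so the convolution (strictly, cross-correlation, as in CNNs) collapses into a single matrix multiply. This toy version assumes a single-channel image, stride 1, and no padding; the function name and sizes are illustrative, not a particular library's API.

```python
import numpy as np

def im2col(img, kh, kw):
    """Unroll all kh x kw patches of a 2-D image into columns."""
    H, W = img.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))

# The convolution becomes one matrix product over the unrolled patches
out = (kernel.ravel() @ im2col(img, 3, 3)).reshape(2, 2)
```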
Deconvolution
- Visualizing and Understanding Convolutional Networks (arXiv)
Face recognition
Eigenfaces
- Projecting all training samples into the PCA subspace.
- Projecting the query image into the PCA subspace.
- Finding the nearest neighbor between the projected training images and the projected query image (sketched below).
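The three steps above, sketched in NumPy; the random stand-in data, the number of components \(k\), and all variable names are assumptions for illustration.

```python
import numpy as np

M, N = 20, 32 * 32                       # M training faces, each flattened to N pixels
X = np.random.rand(M, N)                 # stand-in for real training images
query = np.random.rand(N)                # stand-in for the query image

mean = X.mean(axis=0)
Xc = X - mean

# PCA subspace via SVD: rows of Vt are the eigenfaces
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
W = Vt[:k]                               # top-k eigenfaces, shape (k, N)

train_proj = Xc @ W.T                    # project all training samples
query_proj = (query - mean) @ W.T        # project the query image

# nearest neighbor in the PCA subspace
best = np.argmin(np.linalg.norm(train_proj - query_proj, axis=1))
```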
Question: From your linear algebra lessons you know that an \(M \times N\) matrix with \(M > N\) can only have \(N - 1\) non-zero eigenvalues.
Quick notes
- Deep learning in image recognition
- LFW, Labeled Faces in the Wild
- Human performance, central face region only: 97.53%
- Human performance, whole image: 99.15%
- Eigenface: 60%
- Best non-deep-learning accuracy: 96.33%
- Deep learning: 99.47%
- ImageNet, PASCAL VOC
- Deep learning disentangles the various complex factors of variation through nonlinear transformations
- A shallow network can approximate any classification function, but matching that capacity requires exponentially more parameters, and correspondingly more training samples
- GoogLeNet: feature representations at the intermediate and lower layers should also be able to classify the training data accurately
- Transposed Convolution, Fractionally Strided Convolution, or Deconvolution
- The article is very clearly structured, and its blog layout is also worth learning from
- Opening by stating what problem the article solves is important!
- It is easy to see that the backward pass of a convolution layer is just multiplication by the transpose of the convolution matrix \(C\) (?); see the sketch after these notes
- Generative Adversarial Networks
- “What I cannot create, I do not understand.” —Richard Feynman
- A generative model is a neural network with far fewer parameters than the amount of training data, so to produce outputs that resemble the training data, it is forced to discover the essential internal structure of the data.
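To make the \(C^T\) note above concrete: the sketch below builds the matrix \(C\) for a 1-D valid convolution, so the forward pass is \(y = Cx\) and the transposed convolution (the backward pass of convolution) is multiplication by \(C^T\). The kernel and sizes are toy assumptions.

```python
import numpy as np

k = np.array([1., 2., 1.])        # toy kernel
n = 5                             # input length; 'valid' output length is n - 2

# Each row of C slides the kernel one step to the right
C = np.zeros((n - 2, n))
for i in range(n - 2):
    C[i, i:i + 3] = k

x = np.random.randn(n)
y = C @ x                         # forward convolution
x_back = C.T @ y                  # transposed convolution: length 3 back to length 5
```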
GAN
Optimization objective:
$$
\min_G \max_D \; \mathbb{E}_{x\sim p_{\rm data}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log (1-D(G(z)))]
$$
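A minimal training-loop sketch of this minimax objective, assuming PyTorch; the toy MLPs, sizes, learning rates, and the stand-in data distribution are all illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator (shapes are arbitrary assumptions)
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(64, 2) + 3.0   # stand-in for samples x ~ p_data
    z = torch.randn(64, 16)        # z ~ p_z

    # Discriminator ascends E[log D(x)] + E[log(1 - D(G(z)))]
    loss_D = -(torch.log(D(x)) + torch.log(1 - D(G(z).detach()))).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator descends E[log(1 - D(G(z)))]
    # (in practice often replaced by ascending E[log D(G(z))] to avoid saturation)
    loss_G = torch.log(1 - D(G(z))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```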
GoogLeNet and Inception Module
- GoogLeNet
- The evolution of the Inception Module
- Inception architecture: Short history of the Inception deep learning architecture
Understanding the three transformative architectures ResNet, Inception, and Xception, no math background required:
- GoogLeNet: Going Deeper with Convolutions
- Inception v2, v3: Rethinking the Inception Architecture for Computer Vision
- Inception v4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Xception: Xception: Deep Learning with Depthwise Separable Convolutions
Tips and tricks
Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei)
- Data Augmentation
- horizontal flipping
- random crops
- color jittering
- fancy PCA?
- Pre-processing, usually not used with CNNs
- zero-center, normalize (normalization is not strictly necessary, since pixel values already lie in [0, 255])
- PCA Whitening?
- Initializations
- small random numbers, e.g. \(\text{weights} \sim 0.001 \times N(0, 1)\) (see the sketch below)
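A quick sketch of the augmentation and initialization notes above, in NumPy; the crop size, jitter scale, and function name are arbitrary assumptions.

```python
import numpy as np

def augment(img, crop=24, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # horizontal flip with probability 0.5
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # random crop to crop x crop
    H, W, _ = img.shape
    top = rng.integers(0, H - crop + 1)
    left = rng.integers(0, W - crop + 1)
    img = img[top:top + crop, left:left + crop, :]
    # color jittering: small random per-channel scaling
    img = img * (1 + 0.1 * rng.standard_normal(3))
    return np.clip(img, 0, 255)

aug = augment(np.random.rand(32, 32, 3) * 255)

# initialization with small random numbers, as noted above
weights = 0.001 * np.random.randn(64, 64)
```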
Batch normalization
Internal covariate shift.
Batch normalization has a slight regularization effect; a larger mini-batch size weakens it, since the batch statistics become less noisy.
It also helps with gradient problems (exploding or vanishing gradients).
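A minimal forward-pass sketch of batch normalization in NumPy (training mode, per-feature statistics); eps, gamma, and beta follow the paper, the remaining names and sizes are illustrative assumptions.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                    # mean over the mini-batch, per feature
    var = x.var(axis=0)                    # variance over the mini-batch, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta            # learned scale and shift

x = np.random.randn(64, 100) * 5 + 2       # a poorly scaled mini-batch
out = batchnorm_forward(x, np.ones(100), np.zeros(100))
# out now has roughly zero mean and unit variance per feature
```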
- Deeplearning.ai: Why Does Batch Norm Work?
- Why does Batch Normalization work so well in deep learning? - Zhihu https://www.zhihu.com/question/38102762
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Walking through networks
- 10 deep learning architectures every skilled computer vision practitioner should know (with code implementations)
- 10 Advanced Deep Learning Architectures Data Scientists Should Know! - 2017.8.9
ResNet
- \(x\): input
- \(f\): mapping function
- \(y\): target, the value we want
For a given layer, the conventional approach is to learn \(f\) such that:
$$f(x) \approx y$$
The residual approach is to learn \(f\) such that:
$$f(x) + x \approx y$$
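A tiny NumPy sketch of the difference, with a two-layer mapping standing in for the block's weight layers; shapes and scales are assumptions for illustration.

```python
import numpy as np

def f(x, W1, W2):
    # a small two-layer mapping with ReLU, standing in for the conv layers
    return np.maximum(0, x @ W1) @ W2

x = np.random.randn(8, 64)
W1 = 0.01 * np.random.randn(64, 64)
W2 = 0.01 * np.random.randn(64, 64)

y_plain    = f(x, W1, W2)       # conventional: learn f(x) ≈ y directly
y_residual = f(x, W1, W2) + x   # residual: f only has to fit the residual y - x
```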