Image
Notes about image processing.
Modules
Ideas
Incremental-class classification
How can n_class in a classifier be increased without retraining the whole network?
- The basic problem is always “0 or 1”, isn’t it?
- Such units could then be assembled into more complex networks; see the sketch below.
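A minimal sketch of this idea, assuming a one-vs-rest setup in Keras (the feature extractor, shapes, and names here are all hypothetical): each class gets its own independent 0-or-1 head on top of a shared, frozen feature extractor, so adding a class only means training one new head.

from keras.models import Model
from keras.layers import Input, Dense

# Hypothetical shared feature extractor (a pretrained CNN in practice).
inp = Input(shape=(64,))
features = Dense(32, activation='relu')(inp)
extractor = Model(inp, features)
extractor.trainable = False  # frozen, so existing heads stay valid

def new_binary_head():
    # One independent "is it this class?" unit -- the basic 0-or-1 problem.
    f = Input(shape=(32,))
    p = Dense(1, activation='sigmoid')(f)
    head = Model(f, p)
    head.compile(optimizer='adam', loss='binary_crossentropy')
    return head

heads = [new_binary_head() for _ in range(3)]  # 3 known classes

# Later, a 4th class arrives: train only its head, retrain nothing else.
heads.append(new_binary_head())
# heads[3].fit(extractor.predict(x_new), y_new, epochs=5)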
Convolution
Implementation of convolution
During computation, the kernel is multiplied element-wise with each m×m patch of the image, and the products are summed to produce a single value; the kernel then shifts by one stride and the same computation repeats, until the whole input image has been traversed.
Convolution kernels that extract a grayscale feature and an edge feature:
import numpy as np

# Set up convolutional weights holding 2 filters, each 3x3 over 3 channels
w = np.zeros((2, 3, 3, 3))
# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]
# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
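A naive sketch of applying these filters, following the sliding-window procedure described above (assumes an H x W x 3 input, stride 1, no padding; for illustration only):

def conv_forward(img, w, stride=1):
    # Slide each filter over the image, multiply element-wise with every
    # k x k patch, and sum over the patch and all channels.
    n_filters, n_channels, k, _ = w.shape
    H, W, _ = img.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((n_filters, out_h, out_w))
    for f in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                patch = img[i*stride:i*stride+k, j*stride:j*stride+k, :]
                # w[f] is (channels, k, k); transpose to (k, k, channels)
                out[f, i, j] = np.sum(patch * w[f].transpose(1, 2, 0))
    return out

# gray_and_edges = conv_forward(image, w)  # image: H x W x 3 array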
im2col
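im2col turns convolution into a single matrix multiplication by stacking every k×k patch of the input as one column. A minimal single-channel sketch (stride 1; names made up):

import numpy as np

def im2col(img, k):
    # Stack each k x k patch of a 2-D image as one column.
    H, W = img.shape
    cols = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            cols.append(img[i:i+k, j:j+k].ravel())
    return np.array(cols).T  # shape: (k*k, n_patches)

# Convolution as a single matrix product:
# out = kernel.ravel() @ im2col(img, 3)  # then reshape to the output grid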
Deconvolution
- Visualizing and Understanding Convolutional Networks arXiv
Face recognition
Eigenfaces
- Projecting all training samples into the PCA subspace.
- Projecting the query image into the PCA subspace.
- Finding the nearest neighbor between the projected training images and the projected query image.
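A compact numpy sketch of these three steps (train is assumed to be an (M, D) matrix of flattened training faces and query a length-D vector; both are placeholders):

import numpy as np

def eigenfaces_fit(train, n_components):
    mean = train.mean(axis=0)
    X = train - mean
    # Trick: eigendecompose the small M x M matrix X X^T instead of the
    # huge D x D covariance matrix, then map eigenvectors back to D dims.
    vals, vecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(vals)[::-1][:n_components]
    basis = X.T @ vecs[:, order]             # D x n_components
    basis /= np.linalg.norm(basis, axis=0)   # normalized eigenfaces
    return mean, basis

mean, basis = eigenfaces_fit(train, 50)
train_proj = (train - mean) @ basis   # step 1: project training set
query_proj = (query - mean) @ basis   # step 2: project query image
nearest = np.argmin(np.linalg.norm(train_proj - query_proj, axis=1))  # step 3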
Question: From your linear algebra lessons you know that an \(M \times N\) matrix with \(M > N\) can only have \(N - 1\) non-zero eigenvalues.
Quick notes
- LFW, Labeled Faces in the Wild
- Human eye, central face region: 97.53%
- Human eye, whole image: 99.15%
- Eigenface: 60%
- Best non-deep-learning method: 96.33%
- Deep learning: 99.47%
- ImageNet, PASCAL VOC
- Deep learning disentangles the various complex factors of variation in a non-linear way
- A shallow network can approximate any classification function, but matching the same capacity requires exponentially more parameters, and correspondingly more training samples
- GoogLeNet: the feature representations of the intermediate and lower layers are also required to classify the training data accurately
Transposed Convolution, Fractionally Strided Convolution or Deconvolution
- The article is very clearly structured, and the blog's layout style is also worth learning from
- Stating at the start what problem the article solves is important!
- It is then easy to see that the backward pass of a convolutional layer is just multiplication by the transpose of \(C\) (the matrix form of the convolution)?
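This can be checked numerically. A small numpy sketch (assumed 4×4 input, 3×3 kernel, stride 1) builds the matrix \(C\) so that the forward pass is C @ x and the backward/transposed pass is C.T @ y:

import numpy as np

def conv_matrix(kernel, H, W):
    # Build C such that (C @ x.ravel()) equals the valid cross-correlation
    # (what deep-learning frameworks call convolution) of the H x W input x.
    k = kernel.shape[0]
    out_h, out_w = H - k + 1, W - k + 1
    C = np.zeros((out_h * out_w, H * W))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(k):
                for b in range(k):
                    C[i * out_w + j, (i + a) * W + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(9.0).reshape(3, 3)
C = conv_matrix(kernel, 4, 4)   # 4 x 16 matrix
x = np.random.randn(16)
y = C @ x                       # forward: convolution
x_grad = C.T @ y                # backward / transposed convolution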
- Generative Adversarial Networks
- “What I cannot create, I do not understand.” —Richard Feynman
- The generative model is a neural network with far fewer parameters than the amount of training data, so in order to produce outputs resembling the training data, it is forced to discover the intrinsic essence of the data.
GAN
Optimization objective:
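The objective itself is missing here; presumably the standard minimax game from the original GAN paper is meant:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

\(D\) is trained to maximize \(V\) (distinguish real from generated samples), while \(G\) is trained to minimize it (fool \(D\)).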
GoogLeNet and Inception Module
- GoogLeNet
- The evolution of the Inception Module
- Inception architecture: Short history of the Inception deep learning architecture
Understanding the three revolutionary architectures ResNet, Inception, and Xception, no math background required
- GoogLeNet: Going Deeper with Convolutions
- Inception v2, v3: Rethinking the Inception Architecture for Computer Vision
- Inception v4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Xception: Xception: Deep Learning with Depthwise Separable Convolutions
Tips and tricks
Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei)
- Data Augmentation
- horizontal flipping
- random crops
- color jittering
- fancy PCA?
- Pre-processing (not commonly used with CNNs)
- zero-center; normalize (not strictly necessary, since pixel values already share the [0, 255] scale)
- PCA Whitening?
- Initializations
- small random numbers, like \(weights \sim 0.001 \times N(0,1) \)
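A quick numpy sketch of a few of these tricks (flip probability, crop size, and weight shapes are arbitrary):

import numpy as np

def augment(img, crop=24):
    # Random horizontal flip + random crop (img: H x W x C, values 0..255).
    if np.random.rand() < 0.5:
        img = img[:, ::-1, :]              # horizontal flip
    H, W, _ = img.shape
    i = np.random.randint(0, H - crop + 1)
    j = np.random.randint(0, W - crop + 1)
    return img[i:i+crop, j:j+crop, :]      # random crop

def zero_center(batch):
    return batch - batch.mean(axis=0)      # per-pixel mean subtraction

w = 0.001 * np.random.randn(256, 128)      # small random initialization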
Batch normalization
Internal covariate shift.
Batch normalization has a slight regularization effect; with a larger mini-batch size this effect diminishes.
It also mitigates gradient problems (exploding or vanishing gradients).
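A minimal numpy sketch of the batch-norm forward pass in training mode (gamma and beta are the learned scale and shift; eps is for numerical stability):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch. Normalize each feature to zero mean / unit
    # variance over the batch, then let the network rescale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# The batch statistics (mu, var) inject a little noise per mini-batch,
# which is the source of the slight regularization effect noted above.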
- Deeplearning.ai: Why Does Batch Norm Work?
- Why does Batch Normalization work so well in deep learning? - Zhihu https://www.zhihu.com/question/38102762
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Walking through networks
- 10 major deep learning architectures every computer vision practitioner should know (with code implementations)
- 10 Advanced Deep Learning Architectures Data Scientists Should Know! - 2017.8.9
ResNet
- \(x\): input
- \(f\): mapping function
- \(y\): target, the value we want
For a given layer, the conventional approach is to learn \(f\) such that:
$$f(x) \approx y$$
The residual approach is instead to learn \(f\) such that:
$$f(x) + x \approx y$$
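A minimal Keras sketch of a residual block computing \(f(x) + x\) (filter count and kernel size are arbitrary; assumes the shortcut and branch shapes already match):

from keras.layers import Conv2D, BatchNormalization, Activation, add

def residual_block(x, filters=64):
    # y = f(x) + x: the layers only have to learn the residual f.
    f = Conv2D(filters, 3, padding='same')(x)
    f = BatchNormalization()(f)
    f = Activation('relu')(f)
    f = Conv2D(filters, 3, padding='same')(f)
    f = BatchNormalization()(f)
    y = add([f, x])              # the identity shortcut
    return Activation('relu')(y)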
Sudden drops in the error curve, which typically coincide with the scheduled learning-rate decreases:
from keras.callbacks import LearningRateScheduler, ReduceLROnPlateau

def lr_sch(epoch):
    # 200 epochs in total: step the learning rate down at epochs 50 and 100.
    if epoch < 50:
        return 1e-3
    if 50 <= epoch < 100:
        return 1e-4
    return 1e-5

lr_scheduler = LearningRateScheduler(lr_sch)
# Additionally reduce the LR when val_acc plateaus for 5 epochs
# (note: min_lr floors this plateau reduction at 1e-3).
lr_reducer = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                               mode='max', min_lr=1e-3)
# Pass both to model.fit(..., callbacks=[lr_scheduler, lr_reducer]).
Baselines
Fine-tuning
- truncate the last layer
- use a smaller learning rate
- freeze the weights of the first few layers
- How transferable are features in deep neural networks? [pdf]
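A hedged Keras sketch of these steps (the base network, number of frozen layers, and learning rate are placeholders, not a prescription):

from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.optimizers import SGD

n_new_classes = 10  # placeholder

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:10]:   # freeze the weights of the first few layers
    layer.trainable = False

x = Flatten()(base.output)
out = Dense(n_new_classes, activation='softmax')(x)   # replace the last layer
model = Model(base.input, out)
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),   # smaller learning rate
              loss='categorical_crossentropy', metrics=['accuracy'])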
Depthwise & pointwise convolutions
SeparableConv2D
: Separable convolutions consist in first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes together the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. (Spatial and channel processing are decoupled, which reduces the parameter count; see the sketch below.)
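A small Keras sketch comparing the parameter counts (shapes arbitrary):

from keras.models import Sequential
from keras.layers import Conv2D, SeparableConv2D

# Standard conv: 3*3*64*128 + 128 biases = 73,856 parameters
regular = Sequential([Conv2D(128, 3, input_shape=(32, 32, 64))])

# Separable: depthwise 3*3*64 + pointwise 64*128 + 128 biases = 8,896 parameters
separable = Sequential([SeparableConv2D(128, 3, input_shape=(32, 32, 64))])

regular.summary()
separable.summary()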