Image
Notes about image processing.
Modules
Ideas
Incremental-class classification
How can n_class in a classifier be increased without retraining the whole network?
- The basic problem is always “0 or 1”, isn’t it?
- Such units could then be assembled into more complex networks; see the sketch below.
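A minimal sketch of this idea, assuming a one-vs-rest setup in Keras (the feature extractor, shapes, and names here are all hypothetical): each class gets its own independent 0-or-1 head on top of a shared, frozen feature extractor, so adding a class only means training one new head.

from keras.models import Model
from keras.layers import Input, Dense

# Hypothetical shared feature extractor (a pretrained CNN in practice).
inp = Input(shape=(64,))
features = Dense(32, activation='relu')(inp)
extractor = Model(inp, features)
extractor.trainable = False  # frozen, so existing heads stay valid

def new_binary_head():
    # One independent "is it this class?" unit -- the basic 0-or-1 problem.
    f = Input(shape=(32,))
    p = Dense(1, activation='sigmoid')(f)
    head = Model(f, p)
    head.compile(optimizer='adam', loss='binary_crossentropy')
    return head

heads = [new_binary_head() for _ in range(3)]  # 3 known classes

# Later, a 4th class arrives: train only its head, retrain nothing else.
heads.append(new_binary_head())
# heads[3].fit(extractor.predict(x_new), y_new, epochs=5)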
Convolution
Implementation of convolution
During computation, the kernel is multiplied element-wise with each m×m patch of the image, and the products are summed to produce a single value; the kernel then shifts by one stride and the same computation repeats, until the whole input image has been traversed.
Convolution kernels that extract a grayscale feature and an edge feature:
import numpy as np

# Set up convolutional weights holding 2 filters, each 3x3 over 3 channels
w = np.zeros((2, 3, 3, 3))
# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]
# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
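A naive sketch of applying these filters, following the sliding-window procedure described above (assumes an H x W x 3 input, stride 1, no padding; for illustration only):

def conv_forward(img, w, stride=1):
    # Slide each filter over the image, multiply element-wise with every
    # k x k patch, and sum over the patch and all channels.
    n_filters, n_channels, k, _ = w.shape
    H, W, _ = img.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((n_filters, out_h, out_w))
    for f in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                patch = img[i*stride:i*stride+k, j*stride:j*stride+k, :]
                # w[f] is (channels, k, k); transpose to (k, k, channels)
                out[f, i, j] = np.sum(patch * w[f].transpose(1, 2, 0))
    return out

# gray_and_edges = conv_forward(image, w)  # image: H x W x 3 array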
im2col
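im2col turns convolution into a single matrix multiplication by stacking every k×k patch of the input as one column. A minimal single-channel sketch (stride 1; names made up):

import numpy as np

def im2col(img, k):
    # Stack each k x k patch of a 2-D image as one column.
    H, W = img.shape
    cols = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            cols.append(img[i:i+k, j:j+k].ravel())
    return np.array(cols).T  # shape: (k*k, n_patches)

# Convolution as a single matrix product:
# out = kernel.ravel() @ im2col(img, 3)  # then reshape to the output grid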
Deconvolution
- Visualizing and Understanding Convolutional Networks arXiv
Face recognition
Eigenfaces
- Projecting all training samples into the PCA subspace.
- Projecting the query image into the PCA subspace.
- Finding the nearest neighbor between the projected training images and the projected query image.
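A compact numpy sketch of these three steps (train is assumed to be an (M, D) matrix of flattened training faces and query a length-D vector; both are placeholders):

import numpy as np

def eigenfaces_fit(train, n_components):
    mean = train.mean(axis=0)
    X = train - mean
    # Trick: eigendecompose the small M x M matrix X X^T instead of the
    # huge D x D covariance matrix, then map eigenvectors back to D dims.
    vals, vecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(vals)[::-1][:n_components]
    basis = X.T @ vecs[:, order]             # D x n_components
    basis /= np.linalg.norm(basis, axis=0)   # normalized eigenfaces
    return mean, basis

mean, basis = eigenfaces_fit(train, 50)
train_proj = (train - mean) @ basis   # step 1: project training set
query_proj = (query - mean) @ basis   # step 2: project query image
nearest = np.argmin(np.linalg.norm(train_proj - query_proj, axis=1))  # step 3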
Question: From your linear algebra lessons you know that an \(M \times N\) matrix with \(M > N\) can only have \(N - 1\) non-zero eigenvalues.
Quick notes
- LFW, Labeled Faces in the Wild
- Human eye, central face region: 97.53%
- Human eye, whole image: 99.15%
- Eigenface: 60%
- Best non-deep-learning method: 96.33%
- Deep learning: 99.47%
- ImageNet, PASCAL VOC
- Deep learning disentangles the various complex factors of variation in a non-linear way
- A shallow network can approximate any classification function, but matching the same capacity requires exponentially more parameters, and correspondingly more training samples
- GoogLeNet: the feature representations of the intermediate and lower layers are also required to classify the training data accurately
Transposed Convolution, Fractionally Strided Convolution or Deconvolution
- The article is very clearly structured, and the blog's layout style is also worth learning from
- Stating at the start what problem the article solves is important!
- It is then easy to see that the backward pass of a convolutional layer is just multiplication by the transpose of \(C\) (the matrix form of the convolution)?
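This can be checked numerically. A small numpy sketch (assumed 4×4 input, 3×3 kernel, stride 1) builds the matrix \(C\) so that the forward pass is C @ x and the backward/transposed pass is C.T @ y:

import numpy as np

def conv_matrix(kernel, H, W):
    # Build C such that (C @ x.ravel()) equals the valid cross-correlation
    # (what deep-learning frameworks call convolution) of the H x W input x.
    k = kernel.shape[0]
    out_h, out_w = H - k + 1, W - k + 1
    C = np.zeros((out_h * out_w, H * W))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(k):
                for b in range(k):
                    C[i * out_w + j, (i + a) * W + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(9.0).reshape(3, 3)
C = conv_matrix(kernel, 4, 4)   # 4 x 16 matrix
x = np.random.randn(16)
y = C @ x                       # forward: convolution
x_grad = C.T @ y                # backward / transposed convolution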
- Generative Adversarial Networks
- “What I cannot create, I do not understand.” —Richard Feynman
- The generative model is a neural network with far fewer parameters than the amount of training data, so in order to produce outputs resembling the training data, it is forced to discover the intrinsic essence of the data.
GAN
Optimization objective:
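The objective itself is missing here; presumably the standard minimax game from the original GAN paper is meant:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

\(D\) is trained to maximize \(V\) (distinguish real from generated samples), while \(G\) is trained to minimize it (fool \(D\)).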
GoogLeNet and Inception Module
- GoogLeNet
- The evolution of the Inception Module
- Inception architecture: Short history of the Inception deep learning architecture
Understanding the three revolutionary architectures ResNet, Inception, and Xception, no math background required
- GoogLeNet: Going Deeper with Convolutions
- Inception v2, v3: Rethinking the Inception Architecture for Computer Vision
- Inception v4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Xception: Xception: Deep Learning with Depthwise Separable Convolutions
Tips and tricks
Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei)
- Data Augmentation
- horizontal flipping
- random crops
- color jittering
- fancy PCA?
- Pre-processing (not commonly used with CNNs)
- zero-center; normalize (not strictly necessary, since pixel values already share the [0, 255] scale)
- PCA Whitening?
- Initializations
- small random numbers, like \(weights \sim 0.001 \times N(0,1) \)
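A quick numpy sketch of a few of these tricks (flip probability, crop size, and weight shapes are arbitrary):

import numpy as np

def augment(img, crop=24):
    # Random horizontal flip + random crop (img: H x W x C, values 0..255).
    if np.random.rand() < 0.5:
        img = img[:, ::-1, :]              # horizontal flip
    H, W, _ = img.shape
    i = np.random.randint(0, H - crop + 1)
    j = np.random.randint(0, W - crop + 1)
    return img[i:i+crop, j:j+crop, :]      # random crop

def zero_center(batch):
    return batch - batch.mean(axis=0)      # per-pixel mean subtraction

w = 0.001 * np.random.randn(256, 128)      # small random initialization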
Batch normalization
Internal covariate shift.
Batch normalization has a slight regularization effect; with a larger mini-batch size this effect diminishes.
It also mitigates gradient problems (exploding or vanishing gradients).
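A minimal numpy sketch of the batch-norm forward pass in training mode (gamma and beta are the learned scale and shift; eps is for numerical stability):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch. Normalize each feature to zero mean / unit
    # variance over the batch, then let the network rescale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# The batch statistics (mu, var) inject a little noise per mini-batch,
# which is the source of the slight regularization effect noted above.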
- Deeplearning.ai: Why Does Batch Norm Work?
- Why does Batch Normalization work so well in deep learning? - Zhihu https://www.zhihu.com/question/38102762
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Walking through networks
- 10 major deep learning architectures every computer vision practitioner should know (with code implementations)
- 10 Advanced Deep Learning Architectures Data Scientists Should Know! - 2017.8.9
ResNet
- \(x\): input
- \(f\): mapping function
- \(y\): target, the value we want
For a given layer, the conventional approach is to learn \(f\) such that:
$$f(x) \approx y$$
The residual approach is instead to learn \(f\) such that:
$$f(x) + x \approx y$$
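A minimal Keras sketch of a residual block computing \(f(x) + x\) (filter count and kernel size are arbitrary; assumes the shortcut and branch shapes already match):

from keras.layers import Conv2D, BatchNormalization, Activation, add

def residual_block(x, filters=64):
    # y = f(x) + x: the layers only have to learn the residual f.
    f = Conv2D(filters, 3, padding='same')(x)
    f = BatchNormalization()(f)
    f = Activation('relu')(f)
    f = Conv2D(filters, 3, padding='same')(f)
    f = BatchNormalization()(f)
    y = add([f, x])              # the identity shortcut
    return Activation('relu')(y)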
Sudden drops in the error curve, which typically coincide with the scheduled learning-rate decreases:
from keras.callbacks import LearningRateScheduler, ReduceLROnPlateau

def lr_sch(epoch):
    # 200 epochs in total: step the learning rate down at epochs 50 and 100.
    if epoch < 50:
        return 1e-3
    if 50 <= epoch < 100:
        return 1e-4
    return 1e-5

lr_scheduler = LearningRateScheduler(lr_sch)
# Additionally reduce the LR when val_acc plateaus for 5 epochs
# (note: min_lr floors this plateau reduction at 1e-3).
lr_reducer = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                               mode='max', min_lr=1e-3)
# Pass both to model.fit(..., callbacks=[lr_scheduler, lr_reducer]).
Baselines
Fine-tuning
- truncate the last layer
- use a smaller learning rate
- freeze the weights of the first few layers
- How transferable are features in deep neural networks? [pdf]
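A hedged Keras sketch of these steps (the base network, number of frozen layers, and learning rate are placeholders, not a prescription):

from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.optimizers import SGD

n_new_classes = 10  # placeholder

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:10]:   # freeze the weights of the first few layers
    layer.trainable = False

x = Flatten()(base.output)
out = Dense(n_new_classes, activation='softmax')(x)   # replace the last layer
model = Model(base.input, out)
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),   # smaller learning rate
              loss='categorical_crossentropy', metrics=['accuracy'])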
Depthwise & pointwise convolutions
SeparableConv2D
: Separable convolutions consist in first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes together the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. (Spatial and channel processing are decoupled, which reduces the parameter count; see the sketch below.)
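A small Keras sketch comparing the parameter counts (shapes arbitrary):

from keras.models import Sequential
from keras.layers import Conv2D, SeparableConv2D

# Standard conv: 3*3*64*128 + 128 biases = 73,856 parameters
regular = Sequential([Conv2D(128, 3, input_shape=(32, 32, 64))])

# Separable: depthwise 3*3*64 + pointwise 64*128 + 128 biases = 8,896 parameters
separable = Sequential([SeparableConv2D(128, 3, input_shape=(32, 32, 64))])

regular.summary()
separable.summary()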