Forward Learning
Notes on papers read while researching forward deep learning algorithms.
Quick notes
- Forward reasoning vs. backward reasoning.
- Does high accuracy in the lower layers imply high accuracy in the higher layers?
- For networks with the same architecture: if one has higher final accuracy, does a classifier cut from its intermediate layers also achieve higher accuracy?
- Combining feature extraction at different granularities.
- Granularity vs. number of convolutional layers? What is the essential difference between different depths?
- After tuning a few hyper-parameters, the more influential ones become roughly apparent, e.g. the number of convolution kernels.
- Tuning priority: underfitting > overfitting.
- When training on CIFAR-10, the test-set loss suddenly jumps at some iteration and then recovers, forming a spike?
- Splitting a problem into sub-problems; but the problems we try to solve with deep learning rarely decompose cleanly into sub-problems.
Orthogonal Bipolar Target Vectors [1]
Can OBVs be used to construct intermediate targets for a CNN?
A kind of target representation.
- conventional
- BNV - binary vector: \((0, 0, 1, 0, 0)\)
- BPV - bipolar vector: \((-1, -1, 1, -1, -1)\)
- OBV - orthogonal bipolar vectors
- NOV - non-orthogonal vectors
  - used as the baseline for comparison (see the sketch after this list)
  - \(V_i=(\overbrace{-1 , \cdots , -1}^{i-1}, 1, \overbrace{-1 , \cdots , -1}^{n-i})\)
  - \(\cos \theta = \frac{n-4}{n}\), since \(V_i \cdot V_j = (n-2) - 2 = n-4\) (the \(n-2\) shared \(-1\) components contribute \(+1\) each, the two differing positions \(-1\) each) and \(\|V_i\| = \|V_j\| = \sqrt{n}\)
- degraded characters?
  - They use degraded license plate images as experimental data.
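A quick numerical check of the NOV cosine above (my own sketch, not from the paper; `nov` is a hypothetical helper):

```python
import numpy as np

def nov(n: int, i: int) -> np.ndarray:
    """Non-orthogonal vector of length n: all -1 except a +1 at position i."""
    v = -np.ones(n)
    v[i] = 1.0
    return v

for n in (2, 5, 10, 100):
    v1, v2 = nov(n, 0), nov(n, 1)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    print(n, cos, (n - 4) / n)  # the two columns agree
```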
How to generate OBVs from a conventional target?
OBV (Orthogonal Bipolar Vector)
- Bipolar: \(0 \to -1 \)
- Orthogonal: \( V_{2^k}^{i} \cdot V_{2^k}^{j} = 0 \)
- \(V_{2^k}^m\)
- \(2^k\) - can be used to represent \(2^k\) classes
- \(k\) - can be constructed in \(k\) splitting (doubling) steps
- \(m\) - the \(m\)-th vector
Number of components in an OBV:
$$
n = 2^k m \\
V_{m}^{0} = (\overbrace{1, 1, \cdots , 1}^{m})^T
$$
Example of generating OBVs
Take four-class classification as an example. Say the four labels are 1, 2, 3, and 4.
Step.1 Initialize parameters.
$$ m=1, k=2 $$
- \(m\) can be set to 1, 2, 3, 4, …
- \(k\) should satisfy \(2^k \ge 4\)
Step.2 Initialize \(V_1^0 = (1)^T\)
Step.3
$$
\begin{align}
& V_2^1 = ({V_1^0}^T, {V_1^0}^T)^T = (1, 1)^T \\
& V_2^2 = ({V_1^0}^T, -{V_1^0}^T)^T = (1, -1)^T
\end{align}
$$
- Obviously, \(V_2^1 \cdot V_2^2 = 0\)
Step.4
$$
\begin{align}
& V_4^1 = ({V_2^1}^T, {V_2^1}^T)^T = (1, 1, 1, 1)^T \\
& V_4^2 = ({V_2^1}^T, -{V_2^1}^T)^T = (1, 1, -1, -1)^T \\
& V_4^3 = ({V_2^2}^T, {V_2^2}^T)^T = (1, -1, 1, -1)^T \\
& V_4^4 = ({V_2^2}^T, -{V_2^2}^T)^T = (1, -1, -1, 1)^T
\end{align}
$$
- We can use these four vectors as targets for the labels 1, 2, 3, and 4 (a code sketch follows below)
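A minimal sketch of the doubling construction (my own code, not the paper's): starting from the all-ones seed \(V_m^0\), each of the \(k\) steps maps every vector \(v\) to \((v, v)\) and \((v, -v)\), doubling both the vector length and the number of classes.

```python
import numpy as np

def generate_obvs(m: int, k: int) -> np.ndarray:
    """Rows are 2^k mutually orthogonal bipolar vectors of length 2^k * m."""
    vectors = [np.ones(m)]  # the seed V_m^0
    for _ in range(k):      # each step doubles length and class count
        vectors = [np.concatenate([v, s * v]) for v in vectors for s in (1, -1)]
    return np.stack(vectors)

obvs = generate_obvs(m=1, k=2)  # the four-class example above
print(obvs)           # rows are V_4^1 .. V_4^4
print(obvs @ obvs.T)  # 4 * I: pairwise dot products vanish off the diagonal
```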
ELM
Batch mode ELM
Batch ELM Algorithm [2]
$$
\begin{aligned}
& \text{Given a training set $\aleph = \{(\mathbf x_i, \mathbf t_i) \mid \mathbf x_i \in \mathbf R^n, \mathbf t_i \in \mathbf R^m, i=1,\cdots,N\}$,} \\
& \qquad \text{an activation function $g$ or kernel function $\phi$,} \\
& \qquad \text{and the number of hidden neurons or kernels $\tilde{N}$:} \\
\\
& \text{Step 1.} \quad \text{Assign arbitrary input weights $\mathbf{w}_i$ and biases $b_i$, or} \\
& \qquad \text{centers $\mu_i$ and impact widths $\sigma_i$, $i=1,\cdots,\tilde N$.} \\
& \text{Step 2.} \quad \text{Calculate the hidden layer output matrix $\mathbf H$.} \\
& \text{Step 3.} \quad \text{Estimate the output weights $\beta$: $\hat \beta = \mathbf H^\dagger \mathbf T$.}
\end{aligned}
$$
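The three steps translate almost directly into code. A minimal sketch (my own, assuming \(g = \tanh\) and a Moore-Penrose pseudo-inverse; the names and shapes are my choices, not the paper's):

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """X: (N, n) inputs; T: (N, m) targets (e.g. rows of an OBV matrix)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # Step 1: random input weights
    b = rng.standard_normal(n_hidden)                #         and biases
    H = np.tanh(X @ W + b)                           # Step 2: hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                     # Step 3: beta = H^dagger T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

With OBV targets, a prediction can be decoded by taking the OBV row with the largest dot product against the network output.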