# Papers

## General

• LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” Nature 521.7553 (2015): 436-444. [pdf] (Three Giants’ Survey)

2021-01-28: After reading this I have a much clearer picture of DL (my earlier black-box intuition was roughly the same, but here it is stated explicitly): Deep Learning is primarily about learning representations of data with multiple levels of abstraction.

Representation Learning is a set of methods that allows a machine to be fed with raw data and automatically discover the representations needed for detection or classification.

Deep Learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.
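This "composition of simple non-linear modules" can be sketched in a few lines. Everything here is illustrative (the layer widths, ReLU choice, and random weights are my own assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(h, w, b):
    """One simple non-linear module: an affine map followed by ReLU."""
    return np.maximum(0.0, h @ w + b)

# Raw input (e.g., flattened images) and three stacked modules.
x = rng.normal(size=(4, 16))             # batch of 4 raw inputs
sizes = [16, 32, 8, 2]                   # widths chosen arbitrarily
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

# Compose the modules: each one maps the representation at one level
# into a slightly more abstract representation at the next level.
h = x
for w, b in params:
    h = layer(h, w, b)

print(h.shape)  # the final, most abstract representation
```

Each iteration of the loop is one "level of representation"; training would adjust `params` by backpropagation, but the structural point is just the composition.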

## Attention

• Sawant, Shriraj P., and Shruti Singh. “Understanding attention: in minds and machines.” arXiv preprint arXiv:2012.02659 (2020). [pdf]

• Soft vs. Hard Attention: first proposed for the image-captioning task. In soft attention, the alignment weights are spread over the entire source image; hard attention instead selects one patch of the image to attend to at a time.
• Local vs. Global Attention: first proposed for machine translation. Global attention is similar to soft attention in that it considers all hidden states of the encoder when deriving the context; local attention focuses on a small context window.
• Self-Attention: the mechanism that captures the different relations between words at different positions within the same sequence.
• Hierarchical Attention: designed to take into account the hierarchical nature of the data (e.g., words compose sentences, and sentences compose documents).
• Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014). [pdf]

When the source sequence length T is large, a single fixed-length context vector C can hardly summarize all of the information; too much is lost, which makes it hard for the decoder to perform well.
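Attention replaces the single fixed context vector with a per-step weighted sum over all encoder states. A minimal numpy sketch of the additive (Bahdanau-style) scoring, where all names, shapes, and random values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 5, 8                              # source length, hidden size
enc = rng.normal(size=(T, d))            # encoder hidden states h_1..h_T
s = rng.normal(size=d)                   # current decoder state s_{t-1}

# Additive scoring: e_j = v^T tanh(W_a s + U_a h_j)
W_a = rng.normal(scale=0.1, size=(d, d))
U_a = rng.normal(scale=0.1, size=(d, d))
v = rng.normal(scale=0.1, size=d)

scores = np.tanh(s @ W_a + enc @ U_a) @ v          # (T,) one score per h_j
weights = np.exp(scores) / np.exp(scores).sum()    # softmax alignment
context = weights @ enc                            # weighted sum, shape (d,)
```

Because `context` is recomputed at every decoder step from *all* T encoder states, nothing has to be squeezed through one fixed vector, which is exactly the bottleneck described above.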

• Recurrent Models of Visual Attention[pdf]

What: a novel framework for attention-based, task-driven visual processing with neural networks.

How (to train): backpropagation trains the neural-network components, while policy gradient addresses the non-differentiability introduced by the discrete control problem of choosing where to look.
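The policy-gradient part can be illustrated with the score-function (REINFORCE) estimator on a toy glimpse-selection problem. The rewards, learning rate, and batch size below are made up for illustration; the point is only that sampling a discrete location blocks backpropagation, so we use grad E[R] = E[R · grad log p(a)] instead:

```python
import numpy as np

rng = np.random.default_rng(2)
logits = np.zeros(3)                     # 3 candidate glimpse locations
rewards = np.array([0.1, 1.0, 0.2])      # hypothetical reward per location

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.5
for _ in range(300):
    p = softmax(logits)
    a = rng.choice(3, size=64, p=p)      # non-differentiable sampling step
    r = rewards[a]                       # observed rewards, shape (64,)
    onehot = np.eye(3)[a]                # (64, 3)
    # d log p(a) / d logits = onehot(a) - p; average over the batch.
    grad = (r[:, None] * (onehot - p)).mean(axis=0)
    logits += lr * grad                  # REINFORCE ascent step

print(softmax(logits))  # the policy should concentrate on the best location
```

No gradient ever flows through the sampling itself; the estimator only needs the log-probability of whichever action was taken, which is what makes this combinable with ordinary backpropagation for the differentiable parts of the network.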

# Emmmm

You need to understand the durable, lasting insights underlying how neural networks work. Technologies come and technologies go, but insight is forever.