Applications of the Attention Mechanism in Natural Language Processing

1 Attention Research Progress
2 Recurrent Models of Visual Attention
3 Attention-based RNN in NLP
4 Attention-based CNN in NLP
5 Summary

1 Attention Research Progress


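    The attention mechanism was first proposed in computer vision, and the idea probably dates back to the 1990s. It really took off, however, with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used a similar mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to bring attention to NLP. Extensions of attention-based RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.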

[Figure: general trend of attention research]

2 Recurrent Models of Visual Attention

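    Before turning to attention in NLP, I want to briefly describe the idea behind attention for images. The representative paper, "Recurrent Models of Visual Attention" [14], was itself motivated by human attention. When people look at an image, they do not take in every pixel of the whole image at once; they mostly focus on specific parts of the image as needed. Moreover, humans learn from what they have already observed where to focus their attention next. The figure below shows the paper's core model.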

[Figure: core model diagram from Recurrent Models of Visual Attention]
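The idea of looking at only part of the image can be sketched in a few lines. The following is a minimal illustration, not the model from [14]: the `glimpse` function, the toy image, and the hand-picked locations are all assumptions made for this example. In the paper the next location is chosen by the recurrent network rather than fixed by hand.

```python
def glimpse(image, center, size):
    """Crop a size x size patch centered at (row, col).

    Positions that fall outside the image are filled with zeros.
    """
    half = size // 2
    r0, c0 = center[0] - half, center[1] - half
    rows, cols = len(image), len(image[0])
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else 0)
        patch.append(row)
    return patch


# Toy 6x6 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(6)]

# Hypothetical attention location; a trained model would predict this.
patch = glimpse(image, (2, 3), 3)
print(patch)  # [[12, 13, 14], [22, 23, 24], [32, 33, 34]]
```

Only the 3x3 patch is passed on for further processing, so the cost per step depends on the patch size rather than on the size of the full image.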


3 Attention-based RNN in NLP

3.1 Neural Machine Translation by Jointly Learning to Align and Translate [1]

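    This paper is regarded as the first work to use attention in NLP. The authors applied attention to neural machine translation (NMT). NMT is a typical sequence-to-sequence model, that is, an encoder-decoder model. Traditional NMT uses two RNNs: one encodes the source sentence into a fixed-dimensional intermediate vector, and the other decodes that vector into the target language. The traditional model is shown in the figure below: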

[Figure: traditional encoder-decoder NMT model]
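To make the "fixed-dimensional intermediate vector" concrete, here is a minimal sketch of the encoder side only. It assumes a vanilla tanh RNN with random, untrained weights; the function names, sizes, and inputs are illustrative assumptions, not the configuration used in [1].

```python
import math
import random


def rnn_step(W_x, W_h, x, h):
    """One vanilla RNN step: h_new = tanh(W_x x + W_h h)."""
    return [
        math.tanh(
            sum(wx * xi for wx, xi in zip(W_x[k], x))
            + sum(wh * hi for wh, hi in zip(W_h[k], h))
        )
        for k in range(len(h))
    ]


def encode(source, W_x, W_h, hidden_size):
    """Read the whole source sequence and keep only the final hidden state."""
    h = [0.0] * hidden_size
    for x in source:
        h = rnn_step(W_x, W_h, x, h)
    return h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
embed_size, hidden_size = 4, 3
W_x = random_matrix(hidden_size, embed_size, rng)
W_h = random_matrix(hidden_size, hidden_size, rng)

# Two made-up "sentences" of word vectors: 2 words and 50 words.
short_sentence = [[rng.uniform(-1, 1) for _ in range(embed_size)] for _ in range(2)]
long_sentence = [[rng.uniform(-1, 1) for _ in range(embed_size)] for _ in range(50)]

c_short = encode(short_sentence, W_x, W_h, hidden_size)
c_long = encode(long_sentence, W_x, W_h, hidden_size)
print(len(c_short), len(c_long))  # prints "3 3"
```

Both sentences are squeezed into a vector of the same size, however long the input is. This is the bottleneck that attention relaxes by letting the decoder look back at every encoder state.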


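Judging from the results, attention-based NMT improves considerably on traditional NMT (RNNsearch is the attention-based NMT, RNNenc the traditional NMT). Its most distinctive feature is that the alignment can be visualized, and it also has an advantage on long sentences.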
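The alignment that can be visualized comes from a set of attention weights over the source positions. Below is a minimal sketch of the additive scoring used in [1]: a score e_j = v_a^T tanh(W_a s + U_a h_j) for each source position j, a softmax over the scores, and a context vector formed as the weighted sum of the encoder states. The helper names, sizes, and random parameters are assumptions for illustration; a real system learns these parameters jointly with the translation model.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev, annotations, W_a, U_a, v_a):
    """Return (weights, context) for one decoder step.

    s_prev      -- previous decoder state
    annotations -- encoder states h_1..h_T, one per source word
    """
    Ws = matvec(W_a, s_prev)
    scores = []
    for h in annotations:
        Uh = matvec(U_a, h)
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        scores.append(sum(v * t for v, t in zip(v_a, hidden)))
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * h[d] for w, h in zip(weights, annotations)) for d in range(dim)]
    return weights, context


rng = random.Random(0)
hidden, attn_size, src_len = 3, 4, 5


def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


annotations = [rand_vec(hidden) for _ in range(src_len)]  # made-up encoder states
s_prev = rand_vec(hidden)                                  # made-up decoder state
W_a = [rand_vec(hidden) for _ in range(attn_size)]
U_a = [rand_vec(hidden) for _ in range(attn_size)]
v_a = rand_vec(attn_size)

weights, context = additive_attention(s_prev, annotations, W_a, U_a, v_a)
print([round(w, 3) for w in weights])  # one weight per source word
```

Collecting the weight vectors of all decoder steps into a matrix gives the kind of alignment picture mentioned above. Because the context vector is recomputed at every step, the decoder no longer depends on a single fixed-size summary of the source sentence.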

3.2 Effective Approaches to Attention-based Neural Machine Translation [2]



4 Attention-based CNN in NLP

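    Attention-based RNN models then came into wide use in NLP, not only for sequence-to-sequence models but for all kinds of classification problems as well. Can the convolutional neural network (CNN), which is as popular as the RNN in deep learning, also use attention? The paper "ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs" [13] proposes three ways of using attention in a CNN and is an early exploratory work on attention in CNNs.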


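    The first method, ABCNN-1, applies attention before the convolution: an attention matrix is used to compute an attention feature map for each sentence of the pair, which is then fed into the convolution layer together with the original feature map. The computation is as follows.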
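A minimal sketch of this step, based on my reading of [13] and not on the authors' code. It assumes the match score 1 / (1 + Euclidean distance) between word representations, and it stores one word per row, so the projection matrices `P0` and `P1` play the role of the transposed weight matrices in the paper. All names and the toy values are assumptions for illustration.

```python
import math


def match_score(x, y):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)


def attention_matrix(F0, F1):
    """A[i][j] scores word i of sentence 0 against word j of sentence 1."""
    return [[match_score(x, y) for y in F1] for x in F0]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def abcnn1_inputs(F0, F1, P0, P1):
    """Build the two-channel inputs to the convolution layer.

    F0: s0 x d, F1: s1 x d  (one word representation per row)
    P0: s1 x d, P1: s0 x d  (projections, learned in a real model)
    """
    A = attention_matrix(F0, F1)        # s0 x s1
    F0_att = matmul(A, P0)              # s0 x d, attention feature map of sentence 0
    F1_att = matmul(transpose(A), P1)   # s1 x d, attention feature map of sentence 1
    return [F0, F0_att], [F1, F1_att], A


# Toy sentence pair: 3 words vs. 2 words, 2-dimensional representations.
F0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
F1 = [[1.0, 0.0], [0.0, 1.0]]
P0 = [[1.0, 0.0], [0.0, 1.0]]               # identity, so F0_att equals A
P1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

channels0, channels1, A = abcnn1_inputs(F0, F1, P0, P1)
F0_att, F1_att = channels0[1], channels1[1]
print(A[0][0])  # 1.0, identical word vectors get the maximum score
```

Each sentence now has two feature maps of the same shape, the original one and the attention one, which can be stacked as two input channels for the convolution.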


5 Summary



[1] Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015, 1–15 (2014).

[2] Luong, M., Pham, H. & Manning, C. D. Effective Approaches to Attention-based Neural Machine Translation. EMNLP, 1412–1421 (2015).

[3] Rush, A. M., Chopra, S. & Weston, J. A Neural Attention Model for Abstractive Sentence Summarization. EMNLP (2015).

[4] Allamanis, M., Peng, H. & Sutton, C. A Convolutional Attention Network for Extreme Summarization of Source Code. arXiv (2016).

[5] Hermann, K. M. et al. Teaching Machines to Read and Comprehend. arXiv 1–13 (2015).

[6] Yin, W., Ebert, S. & Schütze, H. Attention-Based Convolutional Neural Network for Machine Comprehension. 7 (2016).

[7] Kadlec, R., Schmid, M., Bajgar, O. & Kleindienst, J. Text Understanding with the Attention Sum Reader Network. arXiv:1603.01547v1 [cs.CL] (2016).

[8] Dhingra, B., Liu, H., Cohen, W. W. & Salakhutdinov, R. Gated-Attention Readers for Text Comprehension. (2016).

[9] Vinyals, O. et al. Grammar as a Foreign Language. arXiv 1–10 (2015).

[10] Wang, L., Cao, Z., De Melo, G. & Liu, Z. Relation Classification via Multi-Level Attention CNNs. ACL, 1298–1307 (2016).

[11] Zhou, P. et al. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. Proc. 54th Annu. Meet. Assoc. Comput. Linguist. (Volume 2: Short Papers), 207–212 (2016).

[12] Yang, Z. et al. Hierarchical Attention Networks for Document Classification. NAACL (2016).

[13] Yin, W., Schütze, H., Xiang, B. et al. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. arXiv:1512.05193 (2015).

[14] Mnih, V., Heess, N. & Graves, A. Recurrent Models of Visual Attention. Advances in Neural Information Processing Systems, 2204–2212 (2014).