Keras attention layer on top of an LSTM

Problem description:

I'm using Keras 1.0.1 and I'm trying to add an attention layer on top of an LSTM. This is what I have so far, but it doesn't work.

input_ = Input(shape=(input_length, input_dim))
lstm = GRU(self.HID_DIM, input_dim=input_dim, input_length = input_length, return_sequences=True)(input_)
att = TimeDistributed(Dense(1)(lstm))
att = Reshape((-1, input_length))(att)
att = Activation(activation="softmax")(att)
att = RepeatVector(self.HID_DIM)(att)
merge = Merge([att, lstm], "mul")
hid = Merge("sum")(merge)

last = Dense(self.HID_DIM, activation="relu")(hid)

The network should apply an LSTM over the input sequence. Each hidden state of the LSTM should then be fed into a fully connected layer, over which a softmax is applied. The softmax output is replicated for each hidden dimension and multiplied elementwise by the LSTM hidden states. The resulting vectors should then be averaged.
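In equation form, my reading of that description (writing $h_t$ for the LSTM hidden state at step $t$, $w$ and $b$ for the weights of the per-timestep scoring layer, and $T$ for input_length) is:

$$ e_t = w^\top h_t + b, \qquad \alpha = \mathrm{softmax}(e_1, \ldots, e_T), \qquad c = \frac{1}{T} \sum_{t=1}^{T} \alpha_t \, h_t $$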

EDIT: This compiles, but I'm not sure it does what I think it should do.

input_ = Input(shape=(input_length, input_dim))
lstm = GRU(self.HID_DIM, input_dim=input_dim, input_length = input_length, return_sequences=True)(input_)
att = TimeDistributed(Dense(1))(lstm)
att = Flatten()(att)
att = Activation(activation="softmax")(att)
att = RepeatVector(self.HID_DIM)(att)
att = Permute((2,1))(att)
mer = merge([att, lstm], "mul")
hid = AveragePooling1D(pool_length=input_length)(mer)
hid = Flatten()(hid)

The first piece of code you shared is incorrect. The second piece of code looks correct except for one thing: do not use TimeDistributed, as the weights will be the same. Use a regular Dense layer with a non-linear activation.


    input_ = Input(shape=(input_length, input_dim))
    lstm = GRU(self.HID_DIM, input_dim=input_dim, input_length = input_length, return_sequences=True)(input_)
    att = Dense(1, activation='tanh')(lstm)      # one attention score per timestep
    att = Flatten()(att)                         # (batch, input_length)
    att = Activation(activation="softmax")(att)  # normalise the scores over time
    att = RepeatVector(self.HID_DIM)(att)        # (batch, HID_DIM, input_length)
    att = Permute((2,1))(att)                    # (batch, input_length, HID_DIM)
    mer = merge([att, lstm], "mul")              # attention-weighted hidden states

Now you have the weight-adjusted states. How you use them is up to you. Most versions of attention I have seen simply sum them over the time axis and then use the result as the context.
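For completeness, here is a minimal sketch of that summation step, assuming the `mer` tensor and `self.HID_DIM` from the snippet above and the Keras 1.x functional/backend API:

    from keras import backend as K
    from keras.layers import Lambda

    # Sum the attention-weighted states over the time axis (axis=1) to get a
    # single context vector of shape (batch, HID_DIM).
    context = Lambda(lambda x: K.sum(x, axis=1),
                     output_shape=(self.HID_DIM,))(mer)

You can then feed `context` into whatever downstream Dense layers your task needs.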