Using Keras, how can I load weights generated from CuDNNLSTM into an LSTM model?

Problem description:

I've developed an NN model with Keras, based on the LSTM layer. In order to increase speed on Paperspace (a GPU cloud processing infrastructure), I've switched the LSTM layer to the new CuDNNLSTM layer. However, this is usable only on machines with GPU cuDNN support. PS: CuDNNLSTM is available only on Keras master, not in the latest release.

So I've generated the weights and saved them in hdf5 format on the cloud, and I'd like to use them locally on my MacBook. Since the CuDNNLSTM layer is not available there, I've switched back to LSTM for my local installation only.
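For reference, the round trip is just Keras's standard weight save/load; a minimal sketch, with an illustrative file name:

# On the Paperspace machine (model built with CuDNNLSTM): save the trained weights.
model.save_weights('weights.hdf5')  # 'weights.hdf5' is an illustrative name

# On the MacBook (same architecture, built with LSTM): load them back.
model.load_weights('weights.hdf5')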

Reading this tweet about cuDNN from @fchollet, I thought it would work just fine: simply read the weights back into the LSTM model.

However, when I try to import them, Keras throws this error:

Traceback (most recent call last):
{...}
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096].
{...}
ValueError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096]

Analyzing the hdf5 files with h5cat, I can see that the two structures are different.
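The h5cat output is not shown here; as an alternative way to see the same difference, here is a small h5py sketch (file names are illustrative) that prints the shape of every dataset stored in a saved weight file:

import h5py

def print_weight_shapes(path):
    # Walk the HDF5 file and print the shape of every stored weight array.
    with h5py.File(path, 'r') as f:
        f.visititems(lambda name, obj: print(name, obj.shape)
                     if isinstance(obj, h5py.Dataset) else None)

print_weight_shapes('weights_cudnn.hdf5')  # the bias datasets are twice as long here
print_weight_shapes('weights_lstm.hdf5')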

TL;DR

I cannot load weights generated from CuDNNLSTM into an LSTM model. Am I doing something wrong? How can I get them to work seamlessly?

Here is my model:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation
from keras.layers import CuDNNLSTM  # available on Keras master only

# is_gpu_enabled() is a helper defined elsewhere in this project
SelectedLSTM = CuDNNLSTM if is_gpu_enabled() else LSTM
# ... (HIDDEN_DIM, SEQ_LENGTH and vocab_size are defined elsewhere)
model = Sequential()
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=True, input_shape=(SEQ_LENGTH, vocab_size)))
model.add(Dropout(0.2))
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=False))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

Answer:

The reason is that the CuDNNLSTM layer has a bias twice as large as that of LSTM, because of the underlying implementation of the cuDNN API. You can compare the following equations (from the cuDNN user's guide) to the usual LSTM equations:
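The equations referred to above are not reproduced in this copy; the cuDNN LSTM formulation in the cuDNN developer guide is essentially the following, with a separate input bias (b_W*) and recurrent bias (b_R*) for every gate:

i_t = σ(W_i x_t + R_i h_{t-1} + b_{Wi} + b_{Ri})
f_t = σ(W_f x_t + R_f h_{t-1} + b_{Wf} + b_{Rf})
o_t = σ(W_o x_t + R_o h_{t-1} + b_{Wo} + b_{Ro})
c'_t = tanh(W_c x_t + R_c h_{t-1} + b_{Wc} + b_{Rc})
c_t = f_t ∘ c_{t-1} + i_t ∘ c'_t
h_t = o_t ∘ tanh(c_t)

The usual LSTM equations have a single bias per gate instead, e.g. i_t = σ(W_i x_t + R_i h_{t-1} + b_i).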

cuDNN uses two bias terms per gate, so the number of bias weights is doubled. To convert the weights back to what LSTM uses, the two bias terms need to be summed; a sketch of that step follows.
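As an illustration of that step (not the exact code of the merged PR; the function name is mine), summing the two halves of the CuDNNLSTM bias vector gives the bias that LSTM expects:

import numpy as np

def convert_cudnn_bias_to_lstm(cudnn_bias, units):
    # CuDNNLSTM stores an input bias and a recurrent bias for each of the
    # four gates, concatenated into a single vector of shape (8 * units,).
    assert cudnn_bias.shape == (8 * units,)
    input_bias, recurrent_bias = np.split(cudnn_bias, 2)
    # The plain LSTM layer uses a single bias of shape (4 * units,).
    return input_bias + recurrent_bias

Applied to the shapes in the error above, this turns a 4096-element cuDNN bias into the 2048-element bias the LSTM layer expects. (The kernel weights are also stored differently between the two layers; the merged PR handles the complete conversion, not just the bias.)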

I've submitted a PR to do the conversion, and it has been merged. You can install the latest Keras from GitHub, and the weight-loading problem should be solved.
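For example, one common way to install Keras straight from the GitHub repository (URL for the keras-team organization; adjust if you use a different fork):

pip install git+https://github.com/keras-team/keras.git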