Error when saving the model while implementing BERT + BiLSTM named entity recognition with kashgari. Help needed!

Problem description:

Here is my code:

import tensorflow as tf
import time
import jieba as jb
import random
import kashgari
import sys,io
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model
from kashgari.embeddings import BertEmbedding

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf-8') # Change default encoding to utf8

start = time.process_time()
train_x,train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

embedding = BertEmbedding('chinese_L-12_H-768_A-12')
model = BiLSTM_Model(embedding,sequence_length=100)
model.fit(train_x,train_y,valid_x,valid_y,epochs=1)

model.save('model_learn2/bilstm_ner')

end = time.process_time()
step = end - start
print("Total time: %0.3f seconds, i.e. %0.3f minutes" % (step, step / 60))

Running it produces an error when the model is saved.

My TensorFlow version is 2.1.0 and my kashgari version is 2.0.1. The Chinese BERT model I am using is BERT-Base, Chinese from Google Cloud.

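Hello. Fixing this error requires a small change in the library source code. I have sent you a private message, please take a look.

In D:\dev\anaconda\lib\site-packages\kashgari\tasks\abs_task_model.py, at line 82, change `open(filename) as f` to `open(filename, encoding='utf-8') as f`.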
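Why this kind of change tends to help: when `open()` is called without an `encoding` argument, Python falls back to the platform's locale encoding, which on a Chinese-language Windows install is usually GBK rather than UTF-8. The sketch below is not kashgari's code. It only illustrates passing `encoding='utf-8'` explicitly, and the file name and contents are hypothetical.

```python
import os
import tempfile

# Hypothetical contents standing in for a config file with non-ASCII text.
text = '{"label": "人名"}'

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, chosen only for this illustration.
    path = os.path.join(tmp_dir, "model_config.json")

    # Write with an explicit encoding instead of relying on the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read back with the same explicit encoding.
    with open(path, encoding="utf-8") as f:
        loaded = f.read()

# The round trip is lossless because both sides agree on UTF-8.
print(loaded == text)
```

Using the same explicit encoding on both the write and the read side is what makes the result independent of the machine's locale settings.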

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gbk') # Change default encoding to gbk

or

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030') # Change default encoding to gb18030

Give one of these a try.
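To see what the `encoding` argument of `io.TextIOWrapper` actually changes, here is a minimal sketch that wraps an in-memory buffer instead of `sys.stdout.buffer`, so it can be run without touching the real console. The helper name `encode_via_wrapper` is made up for this illustration.

```python
import io


def encode_via_wrapper(text, encoding):
    """Write text through a TextIOWrapper and return the raw bytes produced."""
    buffer = io.BytesIO()
    wrapper = io.TextIOWrapper(buffer, encoding=encoding)
    wrapper.write(text)
    wrapper.flush()  # push the encoded bytes into the underlying buffer
    return buffer.getvalue()


utf8_bytes = encode_via_wrapper("中文", "utf-8")
gbk_bytes = encode_via_wrapper("中文", "gbk")

# The same two characters become different byte sequences per encoding.
print(len(utf8_bytes), len(gbk_bytes))  # prints "6 4"
```

The wrapper only controls how text is turned into bytes on the way out. Whether the console shows those bytes correctly depends on the encoding the console itself expects, which is probably why both GBK and GB18030 are suggested here as things to try.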

#!/usr/bin/python
# -*- coding: utf-8 -*-

import tensorflow as tf
import time
import jieba as jb
import random
import kashgari
import sys, io
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model
from kashgari.embeddings import BertEmbedding

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')  # Change default encoding to utf8

start = time.process_time()
train_x, train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

embedding = BertEmbedding('chinese_L-12_H-768_A-12')
model = BiLSTM_Model(embedding, sequence_length=100)
model.fit(train_x, train_y, valid_x, valid_y, epochs=1)

model.save('model_learn2/bilstm_ner')

end = time.process_time()
step = end - start
print("Total time: %0.3f seconds, i.e. %0.3f minutes" % (step, step / 60))

Try adding an encoding declaration at the top of the file, as in the code above.

Why can't I import kashgari on my machine?