Error when saving the model in a BERT + BiLSTM named entity recognition setup built with kashgari. Please help!
Problem description:
Here is my code:
import tensorflow as tf
import time
import jieba as jb
import random
import kashgari
import sys,io
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model
from kashgari.embeddings import BertEmbedding
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf-8') # Change default encoding to utf8
start = time.process_time()
train_x,train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')
embedding = BertEmbedding('chinese_L-12_H-768_A-12')
model = BiLSTM_Model(embedding,sequence_length=100)
model.fit(train_x,train_y,valid_x,valid_y,epochs=1)
model.save('model_learn2/bilstm_ner')
end = time.process_time()
step = end - start
print("总共耗时:%0.3f 秒,相当于 %0.3f 分钟" % (step,step / 60))
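Running this raises an error when the model is saved.
My TensorFlow version is 2.1.0 and my kashgari version is 2.0.1. The Chinese BERT model is the BERT-Base, Chinese checkpoint downloaded from Google Cloud.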
Answer
Hello. This error has to be fixed in the library source. I have sent you a private message, please take a look.
In D:\dev\anaconda\lib\site-packages\kashgari\tasks\abs_task_model.py, at line 82, change open(filename) as f to open(filename, encoding='utf-8') as f.
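To illustrate why the explicit encoding matters: on Chinese-language Windows, open() without an encoding argument falls back to the locale encoding (often GBK), so a file written as UTF-8 can fail to read back, and the reverse. Below is a minimal sketch of the same idea outside kashgari. The helper names save_json_utf8 and load_json_utf8, the file name, and the sample dictionary are made up for illustration and are not kashgari's actual code.

```python
import json
import os
import tempfile


def save_json_utf8(path, data):
    # Pass encoding explicitly so the result does not depend on the OS locale.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json_utf8(path):
    # Read with the same explicit encoding that was used for writing.
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# Hypothetical config containing a Chinese token, similar in spirit to a vocabulary file.
config = {'token2idx': {'[PAD]': 0, '中': 1, '国': 2}}
path = os.path.join(tempfile.mkdtemp(), 'model_config.json')

save_json_utf8(path, config)
restored = load_json_utf8(path)
print(restored == config)
```

The same pattern applies to any open() call that touches text containing Chinese characters: write and read with a matching, explicitly stated encoding.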
Answer
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gbk') # Change default encoding to gbk
or
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030') # Change default encoding to gb18030
Give one of these a try.
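Whether a given stdout encoding works depends on whether it can represent the characters being printed. The sketch below checks this without touching the real sys.stdout by wrapping an in-memory buffer instead. The helper name can_write and the sample string are assumptions chosen for illustration.

```python
import io


def can_write(text, encoding):
    # Wrap an in-memory byte buffer the same way the thread wraps sys.stdout.buffer.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        # The encoding cannot represent at least one character in the text.
        return False
    return True


sample = "总共耗时"  # a short Chinese sample string
results = {enc: can_write(sample, enc)
           for enc in ('ascii', 'gbk', 'gb18030', 'utf-8')}
print(results)
```

If the console itself expects GBK, wrapping stdout as UTF-8 will not raise an error but can display garbled text, which is a separate symptom from a UnicodeEncodeError.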
Answer
#!/usr/bin/python
# -*- coding: utf-8 -*-
import tensorflow as tf
import time
import jieba as jb
import random
import kashgari
import sys, io
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model
from kashgari.embeddings import BertEmbedding
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8') # Change default encoding to utf8
start = time.process_time()
train_x, train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')
embedding = BertEmbedding('chinese_L-12_H-768_A-12')
model = BiLSTM_Model(embedding, sequence_length=100)
model.fit(train_x, train_y, valid_x, valid_y, epochs=1)
model.save('model_learn2/bilstm_ner')
end = time.process_time()
step = end - start
print("总共耗时:%0.3f 秒,相当于 %0.3f 分钟" % (step, step / 60))
Try adding the encoding declaration at the top of the file, as shown above.
Answer
Why can't I import kashgari on my machine?