How to replace strings in Python using a dictionary with multiple values per key

Problem description:

I have a dictionary that maps each word to its closest related words.

I want to replace the related words in a string with the original word. Currently I can replace words when a key has only one value, but I cannot do it when a key has multiple values. How can this be done?

Sample input

North Indian Restaurant
South India  Hotel
Mexican Restrant
Italian  Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.

How the dictionary is built

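# `word` (the vocabulary) and `model` (the object exposing `wv.most_similar`) are defined elsewhere.
words = list(word)
# For each word, keep only the neighbours whose similarity score exceeds 0.7.
similar = [[item[0] for item in model.wv.most_similar(w) if item[1] > 0.7] for w in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']]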

The result is a DataFrame with two columns, both holding lists:

Orginal_Word    Related_Words
[Indian]        [India,Ind,ind.]    
[Restaurant]    [Hotel,Restrant,Hotpot]   
[Pub]           [Bar,Baar, Beer]     
[1888]          [188, 188., 18] 

Dictionary

similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()

{'Indian ': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}

Expected output

North Indian Restaurant
South Indian  Restaurant
Mexican Restaurant
Italian  Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888

Thanks for your help.

I think you can build a new dictionary of regex patterns (as in this answer) and pass it to replace:

d = {'Indian': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}

d1 = {r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}

df['col'] = df['col'].replace(d1, regex=True)
print (df)
                        col
0   North Indian Restaurant
1   South Indian Restaurant
2        Mexican Restaurant
3       Italian  Restaurant
4                  Cafe Pub
5                 Irish Pub
6               Maggiee Pub
7           Jacky Craft Pub
8               Bristo 1888
9               Bristo 1888
10              Bristo 1888
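To see what the inverted dictionary d1 contains and why the lookarounds matter, here is a minimal sketch using only the standard re module. It assumes that applying the patterns one after another with re.sub is a fair stand-in for the dict-based replace above; replace_line is a hypothetical helper, not part of pandas. `(?<!\S)` and `(?!\S)` require that the match is not touched by a non-whitespace character on either side, so only whole whitespace-delimited tokens are replaced and "Ind" does not fire inside "Indian".

```python
import re

# Same mapping as in the answer: original word -> comma-separated related words.
d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

# Invert it: one whitespace-bounded pattern per related word -> original word.
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1
      for k1, v1 in d.items() for k in v1.split(',')}


def replace_line(text, patterns):
    """Apply each pattern in turn (hypothetical helper for illustration)."""
    for pattern, original in patterns.items():
        text = re.sub(pattern, original, text)
    return text


for line in ['South India  Hotel', 'Maggiee Baar',
             'Bristo 188', 'North Indian Restaurant']:
    print(repr(line), '->', repr(replace_line(line, d1)))
```

Existing spacing is preserved, because only the matched token is substituted.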

EDIT (the code above wrapped in a function):

def replace_words(d, col):
    d1={r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
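If you would rather not depend on a global df, one possible variation is to build a reusable function once and apply it to each string. This is a sketch of a different technique from the dict-based replace: it compiles a single alternation and looks up each match in a plain dictionary. make_replacer is a hypothetical name, and the related words are passed through re.escape so that characters such as "." are matched literally.

```python
import re


def make_replacer(d):
    """Return a function that maps related words back to their original word.

    Hypothetical helper: `d` maps original word -> comma-separated related words.
    """
    # related word -> original word
    lookup = {k.strip(): original
              for original, related in d.items()
              for k in related.split(',')}
    # Try longer related words first so a word is preferred over its own prefix.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r'(?<!\S)(?:' + '|'.join(re.escape(a) for a in alternatives) + r')(?!\S)'
    )

    def replace(text):
        # Each whole-token match is swapped for its original word in one pass.
        return pattern.sub(lambda m: lookup[m.group(0)], text)

    return replace


d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

replace = make_replacer(d)
for line in ['South India  Hotel', 'Cafe Bar', 'Bristo 188.']:
    print(replace(line))
```

With a DataFrame, the returned function could be applied with something like df['col'].map(replace), assuming every value in the column is a string.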

If you get the following error:

regex error- missing ), unterminated subpattern at position 7

then it is necessary to escape the regex special characters in the values used to build the pattern keys:

import re

def replace_words(d, col):
    d1={r'(?<!\S)'+ re.escape(k.strip()) + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
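The following sketch illustrates what escaping changes, using only the standard re module. build_pattern is a hypothetical helper and '(abc' is a made-up related word. The prefix `(?<!\S)` is 7 characters long, so an error reported at position 7 most likely points at the first character of a related word, for example an unbalanced "(".

```python
import re

PREFIX, SUFFIX = r'(?<!\S)', r'(?!\S)'


def build_pattern(word, escape=True):
    """Wrap a related word in whitespace lookarounds (hypothetical helper)."""
    body = re.escape(word) if escape else word
    return PREFIX + body + SUFFIX


# Unescaped, a related word starting with "(" opens a group that never closes.
try:
    re.compile(build_pattern('(abc', escape=False))
except re.error as exc:
    print('unescaped pattern failed:', exc)

# Escaped, the same word compiles and matches the literal text.
print(re.sub(build_pattern('(abc'), 'X', 'foo (abc bar'))

# Escaping also changes what '188.' matches: unescaped, "." is a wildcard.
print(re.sub(build_pattern('188.', escape=False), '1888', 'Bristo 1889'))
print(re.sub(build_pattern('188.'), '1888', 'Bristo 1889'))
print(re.sub(build_pattern('188.'), '1888', 'Bristo 188.'))
```

One consequence worth checking against your data: without escaping, the pattern built from '188.' also matches '1889', which is how 'Bristo 1889' can end up as 'Bristo 1888'. With re.escape, only the literal '188.' is replaced, so '1889' would need its own entry in the dictionary if it should be mapped too.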