Replacing a Unicode character in a pandas DataFrame column

Problem description:

I have a problem with a pandas DataFrame that, among other things, contains the number of rooms in an apartment (stored as strings).

This data contains the Unicode character u"\u00BD" (https://www.fileformat.info/info/unicode/char/00bd/index.htm).

How do I efficiently replace this character with a decimal value, so that the data reads 2.5, 3.5, 4.5, etc. (still as strings) instead of containing the Unicode character?

It currently looks like this: 2½, 3½, 4½, etc. I want the values in the column to be 2.5, 3.5, 4.5, etc.

You can fix your column with:

df['rooms'] = df['rooms'].str.replace("½", ".5")

To convert it to float as well:

df['rooms'] = df['rooms'].str.replace("½", ".5").apply(float)
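For reference, here is a minimal self-contained sketch of both steps. The sample values and the extra `rooms_float` column are made up for illustration; only the `rooms` column name comes from the snippets above. It passes `regex=False` so the replacement is treated as a literal string regardless of pandas version, and uses `astype(float)`, which should behave like `.apply(float)` for clean input.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'rooms' comes from the answer.
df = pd.DataFrame({"rooms": ["2½", "3½", "4½", "3"]})

# Replace U+00BD (VULGAR FRACTION ONE HALF) with ".5" as a literal string.
# The values remain strings after this step.
df["rooms"] = df["rooms"].str.replace("\u00bd", ".5", regex=False)
rooms_as_text = df["rooms"].tolist()

# Optional: convert the cleaned strings to floats in a separate column
# (the column name 'rooms_float' is an assumption for this example).
df["rooms_float"] = df["rooms"].astype(float)
rooms_as_float = df["rooms_float"].tolist()

print(rooms_as_text)
print(rooms_as_float)
```

Note that a value consisting of only "½" would become ".5", which `float` still parses. Other fraction characters such as "¼" would need their own replacements.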