Converting numbers in exponential notation to strings - explanation
I have a DataFrame from this question:
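import io

import pandas as pd

temp = """Total,Price,test_num
0,71.7,2.04256e+14
1,39.5,2.04254e+14
2,82.2,2.04188e+14
3,42.9,2.04171e+14"""
# pd.compat.StringIO was removed in pandas 1.0; io.StringIO works in all versions
df = pd.read_csv(io.StringIO(temp))
print(df)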
Total Price test_num
0 0 71.7 2.042560e+14
1 1 39.5 2.042540e+14
2 2 82.2 2.041880e+14
3 3 42.9 2.041710e+14
If I convert the floats to strings, I get a trailing .0:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
The solution is to convert the floats to int64 first:
print (df['test_num'].astype('int64'))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: int64
print (df['test_num'].astype('int64').astype(str))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: object
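The int64 cast above is only safe when every float is a finite whole number that float64 can hold exactly (up to 2**53). A minimal sketch of guarding the conversion, assuming a hypothetical helper named float_col_to_int_str (not a pandas API) and assuming you would rather fail loudly than silently truncate:

```python
import pandas as pd


def float_col_to_int_str(s: pd.Series) -> pd.Series:
    """Hypothetical helper: turn a float column of whole numbers into strings without '.0'."""
    # int64 has no NaN, so missing values must be handled before casting.
    if s.isna().any():
        raise ValueError("column contains missing values; int64 cannot hold NaN")
    # astype('int64') would silently drop any fractional part.
    if not (s == s.round()).all():
        raise ValueError("column contains non-integer values")
    # Above 2**53 not every integer is representable in float64,
    # so the original digits may already be lost (this also rejects inf).
    if (s.abs() > 2**53).any():
        raise ValueError("values too large to be represented exactly")
    return s.astype("int64").astype(str)


s = pd.Series([2.04256e14, 2.04254e14, 2.04188e14, 2.04171e14], name="test_num")
print(float_col_to_int_str(s).tolist())
# ['204256000000000', '204254000000000', '204188000000000', '204171000000000']
```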
The question is: why does it convert this way?
I added this poor explanation, but I feel it could be better:
Poor explanation:
You can check the dtype of the column after reading the CSV; it is float64:
print (df['test_num'].dtype)
float64
After converting to strings, the exponential notation disappears, but the values are still formatted as floats, so a trailing .0 is added:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
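A possibly more precise way to put it: the exponent in the CSV is only input syntax and is not stored anywhere; after parsing, each value is just a float64. Turning a float into text then follows the float's own formatting rules, which (as the output above suggests) match Python's str(): fixed-point notation with a ".0" suffix for whole numbers below 1e16, scientific notation from 1e16 up. A small pure-Python sketch of those rules:

```python
# The "e+14" is only how the number was written; parsed, it is a plain float.
x = float("2.04256e+14")
print(x == 204256000000000)  # True

# str() of a float uses fixed-point notation with ".0" for whole numbers
# below 1e16, and switches to scientific notation from 1e16 upward.
print(str(x))                # 204256000000000.0
print(str(2.04256e16))       # 2.04256e+16

# Going through int first removes the fractional part, so no ".0" appears.
print(str(int(x)))           # 204256000000000
```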
When you use pd.read_csv to import data without defining dtypes, pandas makes an educated guess and, in this case, decides that column values like "2.04256e+14" are best represented as floats.
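A quick way to see this guess: in the made-up CSV below (an illustration, not data from the question), the same whole number is written once as plain digits and once in exponential notation. The plain digits are inferred as int64, while the exponent form is inferred as float64 even though its value is a whole number:

```python
import io

import pandas as pd

# Hypothetical CSV: one value, written two different ways.
csv = "plain,scientific\n204256000000000,2.04256e+14\n"
df = pd.read_csv(io.StringIO(csv))

print(df["plain"].dtype, df["scientific"].dtype)  # int64 float64
print(df["plain"][0] == df["scientific"][0])      # True
```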
Converting such a float back to a string adds a ".0". As you correctly wrote, converting to int64 first fixes this.
If you know before importing that the column holds only integer values (and no empty values, which np.int64 cannot represent), you can force this type on import and avoid the extra conversions.
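To illustrate the empty-value caveat, here is a sketch with a made-up CSV in which one test_num field is empty. In the pandas versions I am aware of, forcing np.int64 on such a column raises a ValueError at import time (the exact message differs between versions), whereas leaving the dtype to pandas turns the empty field into NaN and keeps the column as float64:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV: the second row has an empty test_num field.
csv = "Total,Price,test_num\n0,71.7,2.04256e+14\n1,39.5,\n"

try:
    pd.read_csv(io.StringIO(csv), dtype={"test_num": np.int64})
    failed = False
except ValueError as exc:
    failed = True
    print("forcing int64 failed:", exc)

# Without a forced dtype, the empty field becomes NaN and the column stays float64.
df = pd.read_csv(io.StringIO(csv))
print(df["test_num"].dtype)  # float64
```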
import io

import numpy as np
import pandas as pd

temp = """Total,Price,test_num
0,71.7,2.04256e+14
1,39.5,2.04254e+14
2,82.2,2.04188e+14
3,42.9,2.04171e+14"""
# force int64 for the test_num column at import time
df = pd.read_csv(io.StringIO(temp), dtype={"test_num": np.int64})
print(df)
This returns:
Total Price test_num
0 0 71.7 204256000000000
1 1 39.5 204254000000000
2 2 82.2 204188000000000
3 3 42.9 204171000000000