Parsing "NA" entries as NaN values when reading into a pandas DataFrame
I am new to pandas. I loaded a CSV with pandas.read_csv. I first tried not specifying dtype, but that was far too slow; since it is a very large file, I now specify the data types as well. However, some numeric columns occasionally contain "NA". I have used na_values = ['NA']. Will it affect my data frame? I still want to preserve these rows. My question is: if I specify the data types and add na_values = ['NA'], will the NA entries be tossed away? If so, how can I keep a similar processing time without losing them? Thank you very much!
From the pd.read_csv documentation:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', ... **'NA'**, ...
Bold emphasis mine. These values are not tossed away; rather, they are converted to NaN, and the rows that contain them are kept. Pandas recognises those values automatically, so you do not need to list 'NA' explicitly.
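As a minimal sketch of the behaviour described above, the snippet below reads a small in-memory CSV with an explicit dtype and checks that the row containing "NA" is still present. The sample data and the column names (id, score, count) are made up for illustration. One caveat that is my addition and not part of the quoted documentation: NaN is a floating-point value, so a column that may contain "NA" should be declared as a float type or as the nullable "Int64" type; a plain "int64" dtype raises an error when a missing value is encountered.

```python
import io

import pandas as pd

# Hypothetical sample data; "NA" appears in a numeric column.
csv_text = "id,score\n1,3.5\n2,NA\n3,7.0\n"

# "NA" is already one of the default missing-value markers, so passing
# na_values=["NA"] is redundant but harmless.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "int64", "score": "float64"},
    na_values=["NA"],
)

n_rows = len(df)                            # all rows are kept
n_missing = int(df["score"].isna().sum())   # the "NA" cell became NaN

# For integer-valued columns with gaps, the nullable "Int64" dtype
# (capital I) can hold missing values without falling back to float.
csv_counts = "id,count\n1,10\n2,NA\n"
df_int = pd.read_csv(
    io.StringIO(csv_counts),
    dtype={"id": "int64", "count": "Int64"},
)

print(df)
print(df_int.dtypes)
```

If you need to drop the affected rows later, that is a separate, explicit step (for example with dropna); read_csv on its own does not remove them.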