Parsing "NA" entries as NaN values when reading into a pandas DataFrame

Question:

I am new to pandas. I loaded a CSV with pandas.read_csv. I first tried not specifying dtype, but it was far too slow; since the file is very large, I now specify the data types as well. However, some numeric columns occasionally contain "NA". I have used na_values=['NA']; will it affect my data frame? I still want to keep these rows. My question is: if I specify the data types and add na_values=['NA'], will the NA entries be tossed away? If so, how can I keep a similar processing time without losing them? Thank you very much!

Answer:

From the pd.read_csv documentation:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', ... 'NA', ...
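To illustrate the dict form mentioned in the documentation, here is a minimal sketch. The column names and the "missing" marker are made up for the example; the point is only that a per-column entry in na_values applies to that column and leaves the same string untouched elsewhere.

```python
import io
import pandas as pd

# Hypothetical data: "missing" appears in both columns.
csv_text = "name,score\na,1.5\nmissing,missing\n"

# Treat "missing" as NaN only in the "score" column.
df_dict = pd.read_csv(io.StringIO(csv_text), na_values={"score": ["missing"]})

print(df_dict["name"].tolist())          # the string is kept in "name"
print(df_dict["score"].isna().tolist())  # but parsed as NaN in "score"
```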

Note that 'NA' is in the default list. These values are not tossed away; rather, they are converted to NaN, so the rows are preserved. pandas recognises these values automatically, without you having to state them explicitly.
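A small sketch of this behaviour, using made-up data in place of the large file. It assumes the column that may contain "NA" is given a float dtype, since a NumPy integer dtype cannot hold NaN. The rows with "NA" are kept, and passing na_values=['NA'] explicitly gives the same result as the defaults.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real CSV.
csv_text = "id,score\n1,10.5\n2,NA\n3,7.0\n"
dtypes = {"id": "int64", "score": "float64"}  # float so NaN can be stored

# Default behaviour: "NA" is already recognised as missing.
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

# Explicit na_values: redundant here, same outcome.
df_explicit = pd.read_csv(io.StringIO(csv_text), dtype=dtypes, na_values=["NA"])

print(len(df))                       # all three rows are still present
print(df["score"].isna().tolist())   # only the "NA" entry is NaN
print(df.equals(df_explicit))        # both reads give the same frame
```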
