Parsing "NA" entries as NaN values when reading into a pandas DataFrame

Problem description:

I am new to pandas. I loaded a CSV using pandas.read_csv. I tried not specifying dtype, but that was far too slow, so because the file is very large I also specified the data types. However, some numeric columns occasionally contain "NA". I have used na_values=['NA']; will that affect my DataFrame? I still want to keep these rows. My question is: if I specify the data types and add na_values=['NA'], will the NA entries be tossed away? If so, how can I keep a similar processing time without losing them? Thank you very much!

From the pd.read_csv docs:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', ... **'NA'**, ...
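A minimal sketch of what the quoted docs mean, assuming a small made-up CSV held in memory via io.StringIO (the column names and the string "missing" are illustrative only). "NA" is picked up as NaN with no na_values argument at all, and the dict form adds extra strings for one column on top of the defaults (with the default keep_default_na=True, other columns keep the default list).

```python
import io

import pandas as pd

# Hypothetical CSV: "NA" is in pandas' default NA list, "missing" is not.
csv_text = """id,price,label
1,10.5,NA
2,NA,missing
3,7.25,ok
"""

# Default behaviour: "NA" becomes NaN in every column, no na_values needed.
df_default = pd.read_csv(io.StringIO(csv_text))

# Dict form: per-column extra NA strings, appended to the defaults.
df_dict = pd.read_csv(io.StringIO(csv_text), na_values={"label": ["missing"]})

# Count of NaN cells per column for each parse.
print(df_default.isna().sum().to_dict())
print(df_dict.isna().sum().to_dict())
```

In the first parse "missing" stays an ordinary string; only the dict form turns it into NaN, while "NA" in price is still recognised because the defaults remain active.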

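Bold emphasis mine. These values are not tossed away; they are converted to NaN, and the rows containing them are kept. pandas recognises them automatically, so you don't need to pass na_values=['NA'] explicitly (passing it anyway is harmless, since 'NA' is already in the default list).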
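A sketch of the asker's situation under assumed data (the CSV contents and column names are hypothetical): explicit dtypes are passed for speed, and the "NA" and empty fields still come through as NaN with every row preserved. One caveat worth knowing: a plain NumPy integer dtype cannot represent NaN, so read_csv raises ValueError for an int64 column that contains "NA"; float64 or the nullable "Int64" dtype works instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data standing in for the large CSV.
csv_text = """id,amount
1,3.5
2,NA
3,
4,8.0
"""

# Explicit dtypes keep parsing fast; "NA" and the empty field become NaN.
# No na_values argument is needed: both are in the default NA list.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": np.int64, "amount": np.float64},
)
print(len(df))  # all 4 rows are kept
print(int(df["amount"].isna().sum()))

# A plain int64 column cannot hold NaN, so read_csv rejects it.
int_csv = "count\n1\nNA\n"
int64_rejected = False
try:
    pd.read_csv(io.StringIO(int_csv), dtype={"count": np.int64})
except ValueError as exc:
    int64_rejected = True
    print("ValueError:", exc)

# The nullable "Int64" extension dtype stores the gap as <NA> instead.
counts = pd.read_csv(io.StringIO(int_csv), dtype={"count": "Int64"})
print(counts["count"].isna().tolist())
```

For columns that may contain "NA", declaring them as float64 (or "Int64" when integer semantics matter) keeps the speed benefit of explicit dtypes without dropping anything.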