Replace empty CSV column values with zero

Problem description:

So I'm dealing with a CSV file that has missing values. What I want my script to do is:

#!/usr/bin/python

import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.


reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for x in row[:]:
                if len(x)< 1:
                         x = 0
                print x
print row

Here is an example of the data I'm trying it on; ideally it should work on any column length.

Before:
actnum,col2,col4
xxxxx ,    ,
xxxxx , 845   ,
xxxxx ,    ,545

After
actnum,col2,col4
xxxxx , 0  , 0
xxxxx , 845, 0
xxxxx , 0  ,545

Update: Here is what I have now (thanks):

reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
print row

However, it only seems to output one record. I will be piping the output to a new file on the command line.

Update 3: OK, now I have the opposite problem: I'm outputting duplicates of each record. Why is that happening?

After
actnum,col2,col4
actnum,col2,col4
xxxxx , 0  , 0
xxxxx , 0  , 0
xxxxx , 845, 0
xxxxx , 845, 0
xxxxx , 0  ,545
xxxxx , 0  ,545

OK, I fixed it (below). Thank you guys for your help.

#!/usr/bin/python

import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.


reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
    print ','.join(str(x) for x in row)


Change your code:

for row in reader:
    for x in row[:]:
                if len(x)< 1:
                         x = 0
                print x

into:

for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
                print x

Not sure what you think you're accomplishing by the print, but the key issue is that you need to modify row, and for that purpose you need an index into it, which enumerate gives you.
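
To illustrate the point about modifying row through its index, here is a minimal sketch in Python 3 (the code in the question is Python 2). The function name fill_blanks and the default "0" are hypothetical. Stripping whitespace before the test is also an assumption, based on the sample data, where the blank fields appear to contain spaces and so would not pass a len(x) < 1 check. It uses csv.writer instead of print so the output stays valid CSV.

```python
import csv
import io


def fill_blanks(rows, default="0"):
    """Yield each row with empty or whitespace-only fields replaced by default."""
    for row in rows:
        for i, x in enumerate(row):
            # Assign through the index; rebinding x alone leaves row unchanged.
            if not x.strip():
                row[i] = default
        yield row


# Small in-memory sample modelled on the data in the question.
sample = "actnum,col2,col4\nxxxxx ,    ,\nxxxxx , 845   ,\nxxxxx ,    ,545\n"
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(fill_blanks(csv.reader(io.StringIO(sample))))
result = out.getvalue()
print(result, end="")
```

With a real file, the two io.StringIO objects would be replaced by the opened input file and sys.stdout, so the output can still be piped to a new file on the command line.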

Note also that all other values, except the empty ones which you're changing into the number 0, will remain strings. If you want to turn them into ints you have to do that explicitly.
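
One way to do that explicit conversion might look like the sketch below (Python 3). The helper name to_int_or_keep is hypothetical, and leaving non-numeric fields such as the header or the account number as strings is an assumption about what is wanted, not something stated in the question.

```python
def to_int_or_keep(value, default=0):
    """Return default for blank fields, an int for numeric ones, else the original string."""
    text = value.strip()
    if not text:
        return default
    try:
        return int(text)
    except ValueError:
        # Not an integer (e.g. a header or an ID); leave it as a string.
        return value


row = ["xxxxx ", " 845   ", ""]
converted = [to_int_or_keep(x) for x in row]
print(converted)
```

Note that int() tolerates surrounding whitespace, so the strip() here is only needed to detect the blank fields.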