Select specific ranges of columns from a .CSV file
Question:
I have a CSV file which has 78000 columns. I am trying to select the columns 2-100, 102-200, and the last 300 columns. The rest of the columns need to be skipped.
I have used numpy.loadtxt to select a range of columns:
numpy.loadtxt(input_file_name, delimiter=",", skiprows = 1, usecols=range(1,99))
How can we select several blocks of columns in a similar way, for example:
numpy.loadtxt(input_file_name, delimiter=",", skiprows = 1, usecols=(range(1,99),range(101,199),range(74999,77999)))
Answer:
Use the numpy row selector, np.r_, which concatenates several ranges or slices into a single index array.
>>> np.r_[range(3), range(15, 18), range(100, 103)]
Or (using hpaulj's suggestion),
>>> np.r_[0:3, 15:18, 100:103]
array([ 0, 1, 2, 15, 16, 17, 100, 101, 102])
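Because usecols is zero-based and both range and slice exclude their stop value, it is easy to be off by one when translating "columns 2-100" into indices. The following is a minimal sketch of that translation for the column counts given in the question; the helper name block is hypothetical and not part of NumPy.

```python
import numpy as np

def block(first, last):
    """Zero-based indices for the 1-based, inclusive columns first..last.

    Hypothetical helper for illustration only.
    """
    return np.arange(first - 1, last)

n_cols = 78000  # total number of columns, as stated in the question

# Columns 2-100, columns 102-200, and the last 300 columns.
usecols = np.r_[block(2, 100), block(102, 200), n_cols - 300:n_cols]

print(usecols[:3], usecols[-3:], usecols.size)
```

np.r_ accepts arrays and slices side by side, so the last block can be written directly as a slice computed from the column count.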
For your code, this is how you'd call it:
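import numpy as np

# usecols is zero-based and each range excludes its stop value:
# columns 2-100, columns 102-200, and the last 300 of 78000 columns
np.loadtxt(
    input_file_name,
    delimiter=",",
    skiprows=1,
    usecols=np.r_[range(1, 100), range(101, 200), range(77700, 78000)],
)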
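To check the approach without the full 78000-column file, here is a small self-contained sketch that applies the same idea to a 12-column CSV held in memory. The data, the column count, and the chosen blocks are made up for illustration; only np.loadtxt, np.r_, and io.StringIO are real APIs.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: one header row and two data rows,
# 12 columns each. Row r holds the values r*100 + 1 ... r*100 + 12.
header = ",".join(f"c{i}" for i in range(1, 13))
rows = [",".join(str(r * 100 + i) for i in range(1, 13)) for r in (1, 2)]
csv_text = "\n".join([header, *rows]) + "\n"

n_cols = 12
# Columns 2-4, columns 6-8, and the last 3 columns (1-based),
# expressed as zero-based, half-open slices.
usecols = np.r_[1:4, 5:8, n_cols - 3:n_cols]

data = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skiprows=1,  # skip the header row
    usecols=usecols,
)

print(data.shape)
print(data[0])
```

Only the selected columns are kept in the result, so the returned array has one column per entry in usecols.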