Selecting specific ranges of columns from a .CSV file

Problem description:

I have a CSV file with 78000 columns. I am trying to select columns 2-100, columns 102-200, and the last 300 columns. The rest of the columns need to be skipped.

I have used numpy.loadtxt to select a single range of columns:

numpy.loadtxt(input_file_name, delimiter=",", skiprows = 1, usecols=range(1,99))

How can we select several blocks of columns in a similar way, for example:

numpy.loadtxt(input_file_name, delimiter=",", skiprows = 1, usecols=(range(1,99),range(101,199),range(74999,77999)))

Use NumPy's np.r_ index helper, which concatenates ranges and slices into a single index array.

>>> np.r_[range(3), range(15, 18), range(100, 103)]

Or (using hpaulj's suggestion),

>>> np.r_[0:3, 15:18, 100:103]

array([  0,   1,   2,  15,  16,  17, 100, 101, 102])
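As a quick check that the slice form and the range form build the same index array, here is a minimal sketch. The variable names are only illustrative; the one thing it relies on is that slices inside np.r_ are half-open, just like range().

```python
import numpy as np

# Slices inside np.r_ are half-open, like range(): 15:18 yields 15, 16, 17.
from_slices = np.r_[0:3, 15:18, 100:103]

# The same indices built from range objects.
from_ranges = np.r_[range(3), range(15, 18), range(100, 103)]

# An equivalent spelling without np.r_, using plain concatenation.
from_concat = np.concatenate(
    [np.arange(0, 3), np.arange(15, 18), np.arange(100, 103)]
)

print(from_slices)
print(np.array_equal(from_slices, from_ranges))
print(np.array_equal(from_slices, from_concat))
```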

For your code, this is how you'd call it:

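import numpy as np

# usecols is zero-based and each stop is exclusive:
# 1:100 -> columns 2-100, 101:200 -> columns 102-200,
# 77700:78000 -> the last 300 of 78000 columns.
np.loadtxt(
  input_file_name,
  delimiter=",",
  skiprows=1,
  usecols=np.r_[1:100, 101:200, 77700:78000]
)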
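If you would rather not hard-code the column count, the "last N columns" block can be computed from the file's width. Below is a small sketch on a 12-column stand-in file held in memory; block_columns is a hypothetical helper written for this example, not part of NumPy, and the block boundaries are made up to keep the data small.

```python
import io
import numpy as np

def block_columns(n_cols, blocks, last_n):
    """Return zero-based column indices for half-open (start, stop)
    blocks plus the last `last_n` columns of an `n_cols`-wide file."""
    parts = [np.arange(start, stop) for start, stop in blocks]
    parts.append(np.arange(n_cols - last_n, n_cols))
    return np.concatenate(parts)

# Stand-in for the real file: 1 header row, 2 data rows, 12 columns.
n_cols = 12
header = ",".join(f"c{i}" for i in range(n_cols))
rows = [",".join(str(r * 100 + i) for i in range(n_cols)) for r in range(2)]
csv_text = "\n".join([header] + rows)

# Columns 2-4, columns 6-7, and the last 3 columns (1-based numbering).
cols = block_columns(n_cols, blocks=[(1, 4), (5, 7)], last_n=3)

data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=cols)
print(cols)
print(data.shape)
```

For the real file, n_cols could be taken from the header line (for instance by splitting the first line on ","), assuming the header contains no quoted commas. The helper does not deduplicate, so the blocks and the tail should not overlap.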