Convert a row of a Pandas DataFrame into column headers

Problem description:

The data I have to work with is a bit messy: it has header names inside its data. How can I choose a row from an existing pandas DataFrame and make it (rename it to) the column header?

I want to do something like:

header = df[df['old_header_name1'] == 'new_header_name1']

df.columns = header

In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])

In [22]: df
Out[22]: 
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6

Set the column labels to the values in the 2nd row (index location 1):

In [23]: df.columns = df.iloc[1]

If the index has unique labels, you can drop the 2nd row (which now duplicates the header) using:

In [24]: df.drop(df.index[1])
Out[24]: 
1 foo bar baz
0   1   2   3
2   4   5   6
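
The two steps above can be combined into one helper. This is only a minimal sketch: the function name `promote_row_to_header` is hypothetical (not part of pandas), and it assumes the index labels are unique, as in the example. It also clears the column axis name (the stray `1` in the output above) and renumbers the rows, which the original steps do not do.

```python
import pandas as pd


def promote_row_to_header(df, pos):
    """Return a copy of df whose column labels come from the row at integer position pos.

    Hypothetical helper for illustration; assumes df.index has unique labels.
    """
    out = df.copy()
    out.columns = out.iloc[pos]        # use the chosen row as the column labels
    out = out.drop(out.index[pos])     # remove that row from the data (label-based)
    out.columns.name = None            # drop the leftover axis name taken from the row label
    return out.reset_index(drop=True)  # renumber the remaining rows from 0


df = pd.DataFrame([(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6)])
result = promote_row_to_header(df, 1)
print(result)
```

Because the original frame mixed strings and integers, the resulting columns keep the `object` dtype; calling `result.infer_objects()` afterwards is one way to ask pandas to re-infer numeric dtypes.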

If the index is not unique, you could instead select rows by position:

In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]: 
1 foo bar baz
0   1   2   3
2   4   5   6

With a non-unique index, df.drop(df.index[1]) removes all rows that share the second row's label, not just the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it's often better to make sure the index is unique (even though Pandas does not require it).
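
To make the difference concrete, here is a small sketch using a made-up frame whose index repeats the label `'b'` (the index labels are an assumption for illustration; the data matches the example above). It compares the label-based `drop` with the position-based `iloc` selection.

```python
import pandas as pd

# Same data as above, but with a deliberately non-unique index (illustrative labels).
dup = pd.DataFrame(
    [(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6)],
    index=['a', 'b', 'b'],
)

# Label-based: every row labelled 'b' is removed, not just the second row.
by_label = dup.drop(dup.index[1])

# Position-based: only the row at integer position 1 is removed.
by_position = dup.iloc[pd.RangeIndex(len(dup)).drop(1)]

print(len(by_label))        # rows left after the label-based drop
print(len(by_position))     # rows left after the position-based selection
print(dup.index.is_unique)  # quick check worth running before relying on drop
```

Checking `df.index.is_unique` first, or calling `df.reset_index(drop=True)` before dropping, is one way to avoid the surprise.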