Split multiple list columns into separate rows

Question:

I have a dataframe like this:

df = pd.DataFrame(
    {'key': [1, 2, 3, 4],
     'col1': [['apple','orange'], ['pineapple'], ['','','guava','',''], ['','','orange','apple','']],
     'col2': [['087','799'], ['681'], ['078'], ['816','018']]
     }
)

#   key                   col1        col2
#0    1        [apple, orange]  [087, 799]
#1    2            [pineapple]       [681]
#2    3        [, , guava, , ]       [078]
#3    4  [, , orange, apple, ]  [816, 018]

I need to split the columns 'col1' and 'col2' into separate rows, pairing the list elements by position. The desired output is this:

desired_df = pd.DataFrame(
    {'key': [1, 1, 2, 3, 4, 4],
     'col1': [['apple'],['orange'],['pineapple'], ['guava'], ['orange'],['apple']],
     'col2': [['087'],['799'], ['681'], ['078'], ['816'],['018']]
    }
)

In col1, some elements may be blank, but the number of non-empty elements in each col1 list will match the length of the corresponding col2 list. Examples: rows 2 and 3 of df.

I tried the following, but it did not work:

df.set_index(['key'])[['col1','col2']].apply(pd.Series).stack().reset_index(level=1, drop=True)

Answer:

Since you know that the number of non-empty elements in each list will always match, you can explode each column separately, filter out the blanks, and join the results back together. Add on a .reset_index() if you want 'key' back as a column.

import pandas as pd

pd.concat([df.set_index('key')[[col]].explode(col).query(f'{col} != ""')
           for col in ['col1', 'col2']], axis=1)

# Without the f-string
#pd.concat([df.set_index('key')[[col]].explode(col).query(col + ' != ""')
#           for col in ['col1', 'col2']], axis=1)

          col1 col2
key                
1        apple  087
1       orange  799
2    pineapple  681
3        guava  078
4       orange  816
4        apple  018
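
To illustrate the .reset_index() remark, here is a minimal self-contained sketch of the same steps that ends with 'key' as a regular column. The names `parts` and `result` are only illustrative. Note that the column-wise concat lines up here only because both exploded frames end up with an identical index (1, 1, 2, 3, 4, 4); that follows from the question's guarantee about matching non-empty counts and is an assumption of this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {'key': [1, 2, 3, 4],
     'col1': [['apple', 'orange'], ['pineapple'],
              ['', '', 'guava', '', ''], ['', '', 'orange', 'apple', '']],
     'col2': [['087', '799'], ['681'], ['078'], ['816', '018']]}
)

# Explode each list column on its own and drop the blank strings.
# Illustrative name: `parts` holds one single-column frame per list column.
parts = [
    df.set_index('key')[[col]].explode(col).query(f'{col} != ""')
    for col in ['col1', 'col2']
]

# Both frames now share the same (duplicated) 'key' index, so they can be
# placed side by side; reset_index() turns 'key' back into a column.
result = pd.concat(parts, axis=1).reset_index()
print(result)
```

The values come out as plain strings rather than the one-element lists shown in desired_df.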

If you are using an older version of pandas that doesn't have the explode method, use @BEN_YO's method to unnest. I'll copy the relevant code over here since there are a few different versions to choose from.

import numpy as np

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(columns=explode), how='left')

pd.concat([unnesting(df.set_index('key')[[col]], explode=[col]).query(f'{col} !=""')
           for col in ['col1', 'col2']], axis=1)
# Same output as above
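
Going the other way, on a recent pandas (1.3 or newer, where DataFrame.explode accepts a list of columns) a different route is possible: strip the blanks from col1 first, then explode both columns in one call. This is an alternative sketch, not the method from the answer above, and it relies on the same assumption that the filtered col1 lists and the col2 lists have equal lengths in every row; the name `cleaned` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'key': [1, 2, 3, 4],
     'col1': [['apple', 'orange'], ['pineapple'],
              ['', '', 'guava', '', ''], ['', '', 'orange', 'apple', '']],
     'col2': [['087', '799'], ['681'], ['078'], ['816', '018']]}
)

# Remove the blank strings from each col1 list so its length matches col2.
cleaned = df.assign(
    col1=df['col1'].map(lambda items: [x for x in items if x != ''])
)

# Explode both list columns together (requires pandas >= 1.3);
# elements are paired by position within each row.
result = cleaned.explode(['col1', 'col2'], ignore_index=True)
print(result)
```

Because 'key' never leaves the columns, no index alignment is involved, and pandas raises an error if a row's list lengths do not match.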
